Ruby - Nokogiri XML Parser with Bad Attribute Values


I can't find documentation on the difference between how Nokogiri (or, by implication, libxml) handles attribute values in XML vs. HTML. One of our projects is still using the defunct Hpricot gem because of its lax acceptance of attributes.

The crux of the problem seems to be that our XML input has both unquoted and missing attribute values. I'm not a spec lawyer, but I gather that some of the HTML variants allow these attribute patterns and XML does not.

If Nokogiri (or libxml) is going to be strict, shouldn't there be an option to make it less strict on attributes? If the HTML parser did not strip namespaces, maybe we could use that.

We can't be the only team that has XML-ish formats that are neither fish nor fowl but something in between. If we could fix this at the source we might do that, but in the meantime we have to handle the format as it is.

This is the hack we use to fix the attributes before sending the data to Nokogiri:

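    require 'nokogiri'

    # One attribute: a name, optionally followed by =value (bare or quoted).
    attr_re = /[^\s=>]+\s*(?:=(?:[^\s'">]+|\s*"[^"]*"|\s*'[^']*'))?/mo
    # An opening tag: "<name", its attributes, then ">".
    element_re = /(<\s*[:\w]+)((?:\s+#{attr_re})*)(\s*>)/mo

    # data holds the raw XML-ish input string.
    Nokogiri::XML(
      data.gsub(element_re) do |m|
        open, close = $1, $3
        ([open] +
         $2.scan(attr_re).map do |atr|
           if atr =~ /=[ '"]/
             atr                                # value is already quoted
           elsif atr =~ /=/
             "#{$`.strip}=\"#{$'.strip}\""      # unquoted value: add quotes
           else
             "#{atr.strip}=\"#{atr.strip}\""    # missing value: name="name"
           end
         end
        ) * ' ' + close
      end
    )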
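To make the rewrite easier to check on its own, here is a minimal sketch of the same substitution pulled out into a method that does not need Nokogiri. The method name `normalize_attributes`, the constant names, and the sample input are assumptions made for illustration. The regexes and the three branches are taken from the hack above.

```ruby
# Sketch only: the same regexes as the hack above, stored in constants.
ATTR_RE = /[^\s=>]+\s*(?:=(?:[^\s'">]+|\s*"[^"]*"|\s*'[^']*'))?/mo
ELEMENT_RE = /(<\s*[:\w]+)((?:\s+#{ATTR_RE})*)(\s*>)/mo

# Hypothetical helper: quotes bare attribute values and expands
# value-less attributes to name="name" inside opening tags.
def normalize_attributes(data)
  data.gsub(ELEMENT_RE) do
    # Copy the captures first; the scan below overwrites $1..$3.
    open, attrs, close = $1, $2, $3
    fixed = attrs.scan(ATTR_RE).map do |atr|
      if atr =~ /=[ '"]/
        atr                                # already quoted, keep as is
      elsif atr =~ /=/
        "#{$`.strip}=\"#{$'.strip}\""      # unquoted value
      else
        "#{atr.strip}=\"#{atr.strip}\""    # attribute without a value
      end
    end
    ([open] + fixed).join(' ') + close
  end
end

sample = '<root><item id=42 checked name="x">text</item></root>'
puts normalize_attributes(sample)
# prints: <root><item id="42" checked="checked" name="x">text</item></root>
```

The returned string can then be passed to `Nokogiri::XML` as before. Only simple opening tags were tried here; self-closing tags such as `<br />` were not checked and may need extra handling.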

