ruby - Nokogiri XML Parser with Bad Attribute Values

- August 15, 2013

I can't find any documentation on the difference between how Nokogiri (or, by implication, libxml) handles attribute values in XML vs. HTML. One of our projects is still using the defunct Hpricot gem because of its lax acceptance of attributes.

The crux of the problem seems to be that our XML input has both unquoted and missing attribute values. I'm not a spec lawyer, but I gather that some HTML variants allow these attribute patterns and XML does not.

If Nokogiri (or libxml) is going to be this strict, shouldn't there be an option to make it less strict about attributes? If the HTML parser did not strip namespaces, maybe I could use that.

We can't be the only team that has XML-ish formats that are neither fish nor fowl but something in between. If I could fix it at the source I might do that, but in the meantime I have to handle the format as it is.

This is my hack to fix the attributes before sending the data to Nokogiri:

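    require 'nokogiri'

    # One attribute: a name, optionally followed by an unquoted,
    # double-quoted, or single-quoted value.
    attr_re = /[^\s=>]+\s*(?:=(?:[^\s'">]+|\s*"[^"]*"|\s*'[^']*'))?/mo
    # One opening tag: element name, attribute list, closing bracket.
    # Limitation: a trailing "/" in a self-closing tag is picked up as
    # part of an attribute.
    element_re = /(<\s*[:\w]+)((?:\s+#{attr_re})*)(\s*>)/mo

    # data holds the raw XML-ish input string.
    Nokogiri::XML(
      data.gsub(element_re) do |m|
        open, close = $1, $3
        ([open] +
         $2.scan(attr_re).map do |atr|
           if atr =~ /=[ '"]/
             atr                                   # already quoted: keep as is
           elsif atr =~ /=/
             "#{$`.strip}=\"#{$'.strip}\""         # unquoted value: add quotes
           else
             "#{atr.strip}=\"#{atr.strip}\""       # no value: use name as value
           end
         end
        ) * ' ' + close
      end
    )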
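To see what the hack does to the input before Nokogiri gets involved, the same substitution can be wrapped in a method and run on a small sample. This is only a sketch: the method name `quote_attributes` and the sample element are made up for illustration, and it uses plain Ruby string handling with no Nokogiri calls.

```ruby
# Hypothetical wrapper around the regex hack; the name quote_attributes
# and the sample input below are illustrative, not part of any library.
ATTR_RE = /[^\s=>]+\s*(?:=(?:[^\s'">]+|\s*"[^"]*"|\s*'[^']*'))?/
ELEMENT_RE = /(<\s*[:\w]+)((?:\s+#{ATTR_RE})*)(\s*>)/

def quote_attributes(data)
  data.gsub(ELEMENT_RE) do
    # Copy the captures first: the scan below overwrites $1..$3.
    open, attrs, close = $1, $2, $3
    fixed = attrs.scan(ATTR_RE).map do |atr|
      if atr =~ /=[ '"]/
        atr                                  # already quoted
      elsif atr =~ /=/
        "#{$`.strip}=\"#{$'.strip}\""        # unquoted value
      else
        "#{atr.strip}=\"#{atr.strip}\""      # attribute without a value
      end
    end
    ([open] + fixed).join(' ') + close
  end
end

sample = '<ns:item id=42 checked label="ok">text</ns:item>'
puts quote_attributes(sample)
# => <ns:item id="42" checked="checked" label="ok">text</ns:item>
```

The namespace prefix on the element name passes through untouched, which is the reason for pre-processing the text rather than switching to the HTML parser.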
