[bug] SAX start_element behavior changed in libxml v2.12.0 #3148

Closed
flavorjones opened this issue Mar 12, 2024 · 7 comments · Fixed by #3151 or #3153

Comments

@flavorjones
Member

Please describe the bug

Originally reported at searls/eiwa#10

#! /usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", "~>1.15.0"
end

class Document < Nokogiri::XML::SAX::Document
  def start_element(name, attrs)
    puts "#{__FILE__}:#{__LINE__}:#{__method__}: name=#{name.inspect}, attrs=#{attrs.inspect}"
  end
end

fixture = <<~XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ATTLIST foo xml:lang CDATA "eng">
]>
<root>
  <foo xml:lang="ger">Ja</foo>
</root>
XML

parser = Nokogiri::XML::SAX::Parser.new(Document.new)
parser.parse(fixture)

# with nokogiri < 1.16.0:
# ./10-sax-issue.rb:12:start_element: name="root", attrs=[]
# ./10-sax-issue.rb:12:start_element: name="foo", attrs=[["xml:lang", "ger"]]
# 
# with nokogiri >= 1.16.0:
# ./10-sax-issue.rb:12:start_element: name="root", attrs=[]
# ./10-sax-issue.rb:12:start_element: name="foo", attrs=[["xml:lang", "ger"], ["xml:lang", "eng"]]
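
Until a fixed libxml2 is available, a handler could defensively drop the duplicate pair. This is a minimal sketch, assuming the explicitly specified attribute is always reported before the DTD default, as in the output above. `dedupe_sax_attrs` is a hypothetical helper, not part of Nokogiri:

```ruby
# Hypothetical helper (not part of Nokogiri): keep only the first
# [name, value] pair seen for each attribute name.
# Assumption: the explicit attribute precedes the DTD default, as in the
# output reported above.
def dedupe_sax_attrs(attrs)
  attrs.uniq { |attr_name, _value| attr_name }
end

affected = [["xml:lang", "ger"], ["xml:lang", "eng"]]
p dedupe_sax_attrs(affected)
```

Calling it at the top of `start_element` (for example `attrs = dedupe_sax_attrs(attrs)`) should be harmless on unaffected versions, since a well-formed element never carries the same attribute name twice.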
@flavorjones flavorjones added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Mar 12, 2024
@flavorjones
Member Author

Just confirming that this seems to be an upstream issue. I can reproduce it using xmllint and am going to git bisect.

@flavorjones
Member Author

flavorjones commented Mar 12, 2024

The upstream commit that introduced this change is https://gitlab.gnome.org/GNOME/libxml2/-/commit/e0dd330b, which first appeared in libxml2 v2.12.0:

commit e0dd330b (HEAD)
Author: Nick Wellnhofer <[email protected]>
Date:   2023-09-29 00:18:44 +0200

    parser: Use hash tables to avoid quadratic behavior

    Use a hash table to lookup namespaces by prefix. The hash table stores
    an index into the namespace table. Auxiliary data for namespaces is
    stored in a separate array along the main namespace table.

    Use a hash table to verify attribute uniqueness. The hash table stores
    an index into the attribute table.

    Reuse hash value from the dictionary to avoid computing them twice.

    See #346.

Linked issue is https://gitlab.gnome.org/GNOME/libxml2/-/issues/346
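
To illustrate the idea in the commit message (a hash table that maps each attribute name to its index, replacing a pairwise scan), here is a rough Ruby sketch. It is not a translation of the libxml2 C code; both method names are hypothetical and the input is simplified to an array of attribute names:

```ruby
# Quadratic check: each name is compared against every earlier name.
# Returns the first duplicated name, or nil if all names are unique.
def duplicate_attr_quadratic(names)
  names.each_with_index do |name, index|
    return name if names[0...index].include?(name)
  end
  nil
end

# Hash-based check: remember each name's index in a hash, so every
# lookup takes constant time on average.
# Returns the first duplicated name, or nil if all names are unique.
def duplicate_attr_hashed(names)
  index_by_name = {}
  names.each_with_index do |name, index|
    return name if index_by_name.key?(name)
    index_by_name[name] = index
  end
  nil
end

names = ["id", "xml:lang", "class", "xml:lang"]
p duplicate_attr_quadratic(names)
p duplicate_attr_hashed(names)
```

Both methods give the same answer. The difference is only in cost when an element has a very large number of attributes, which is the quadratic behavior the commit title refers to.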

@flavorjones
Member Author

flavorjones commented Mar 12, 2024

I've created an issue upstream: https://gitlab.gnome.org/GNOME/libxml2/-/issues/704

@flavorjones flavorjones added upstream/libxml2 and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Mar 12, 2024
@flavorjones flavorjones changed the title [bug] SAX parsing behavior changes [bug] SAX start_element behavior changed in libxml v2.12.0 Mar 12, 2024
@flavorjones
Member Author

Fixed upstream in https://gitlab.gnome.org/GNOME/libxml2/-/commit/186562a182d2e27f90631d1a1f63ad5079fe62fb

Not sure whether Nick will make a release soon, but if not I can patch this fix into the vendored version in a bugfix release.

@flavorjones
Member Author

The fix was released upstream in libxml2 v2.12.6. I'm working on a Nokogiri release that includes it (unrelated blockers exist, so it may take a day or two).

@flavorjones
Member Author

Release imminent, please follow #3151

flavorjones added a commit that referenced this issue Mar 15, 2024
**What problem is this PR intended to solve?**

Update to v2.12.6, see
https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.6

This also makes a small change to `XML::Reader` to accommodate changes
in how libxml2 reports the encoding of the Reader (see
https://gitlab.gnome.org/GNOME/libxml2/-/issues/697 for details).

Closes #3148
@flavorjones
Member Author

v1.16.3 has been released which fixes this: https://github.com/sparklemotion/nokogiri/releases/tag/v1.16.3
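
For anyone who needs to check whether a given environment is affected, here is a small sketch based on the versions named in this thread (regression introduced in libxml2 2.12.0, fix released in 2.12.6). `libxml2_affected?` is a hypothetical helper. In a real application the version string would come from Nokogiri, which I believe exposes it as `Nokogiri::VERSION_INFO["libxml"]["loaded"]`; treat that lookup as an assumption to verify against your installed version.

```ruby
require "rubygems" # provides Gem::Version; normally already loaded

# Version bounds taken from this thread.
FIRST_AFFECTED = Gem::Version.new("2.12.0")
FIRST_FIXED = Gem::Version.new("2.12.6")

# Hypothetical helper: true if the given libxml2 version string falls in
# the range that reports duplicate default attributes to SAX handlers.
def libxml2_affected?(version_string)
  version = Gem::Version.new(version_string)
  version >= FIRST_AFFECTED && version < FIRST_FIXED
end

%w[2.11.7 2.12.0 2.12.5 2.12.6].each do |version_string|
  puts "#{version_string}: #{libxml2_affected?(version_string)}"
end
```

Comparing with `Gem::Version` rather than plain strings avoids mis-ordering versions such as 2.9.14 and 2.12.0.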

flavorjones added a commit that referenced this issue Mar 15, 2024
**What problem is this PR intended to solve?**

Update to v2.12.6, see
https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.6

This also makes a small change to `XML::Reader` to accommodate changes
in how libxml2 reports the encoding of the Reader (see
https://gitlab.gnome.org/GNOME/libxml2/-/issues/697 for details).

Closes #3148

(see related #3151)