Skip to content

Commit d6c4a2b

Browse files
authored
Merge pull request #2602 from sparklemotion/flavorjones-fix-reader-node-gc_backport-v1.13.x
fix: XML::Reader XML::Attr garbage collection (backport to v1.13.x) --- **What problem is this PR intended to solve?** This is a proposed fix for #2598, see that issue for an extended explanation of the problem. This PR implements "option 2" from that issue's proposed solutions: - introduce a new `Reader#attribute_hash` that will return a `Hash<String ⇒ String>` (instead of an `Array<XML::Attr>`) - deprecate `Reader#attribute_nodes` with a plan to remove it entirely in a future release - re-implement `Reader#attributes` to use `#attribute_hash` (instead of `#attribute_nodes`) After this change, only applications calling `Reader#attribute_nodes` directly will be running the unsafe code. These users will see a deprecation warning and may use `#attribute_hash` as a replacement. I think it's very possible that `Reader#attribute_hash` won't meet the needs of people who are working with namespaced attributes and are using `#attribute_nodes` for this purpose. However, I'm intentionally deferring any attempt to solve that DX problem until someone who needs this functionality asks for it. **Have you included adequate test coverage?** I tried and failed to add test coverage to the suite that would reproduce the underlying GC bug. However, existing test coverage of `Reader#attributes` is sufficient for now. **Does this change affect the behavior of either the C or the Java implementations?** This PR modifies both the C and Java implementations to behave the same. Notably, the Java implementation contains a small bugfix which is that `Reader#namespaces` now returns an empty hash when there are no namespaces (it previously returned `nil`).
2 parents 88b4730 + 80e888c commit d6c4a2b

File tree

10 files changed

+215
-32
lines changed

10 files changed

+215
-32
lines changed

.github/workflows/ci.yml

+2-2
Original file line numberDiff line numberDiff line change
@@ -282,12 +282,12 @@ jobs:
282282
fail-fast: false
283283
matrix:
284284
sys: ["enable", "disable"]
285-
runs-on: macos-10.15
285+
runs-on: macos-12
286286
steps:
287287
- uses: actions/checkout@v2
288288
with:
289289
submodules: true
290-
- uses: vmactions/freebsd-vm@v0.1.5
290+
- uses: vmactions/freebsd-vm@v0.2.0
291291
with:
292292
usesh: true
293293
prepare: pkg install -y ruby devel/ruby-gems pkgconf libxml2 libxslt

.github/workflows/downstream.yml

+73
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
name: downstream
2+
concurrency:
3+
group: "${{github.workflow}}-${{github.ref}}"
4+
cancel-in-progress: true
5+
on:
6+
workflow_dispatch:
7+
schedule:
8+
- cron: "0 8 * * 1,3,5" # At 08:00 on Monday, Wednesday, and Friday # https://crontab.guru/#0_8_*_*_1,3,5
9+
push:
10+
branches:
11+
- main
12+
- v*.*.x
13+
tags:
14+
- v*.*.*
15+
pull_request:
16+
types: [opened, synchronize]
17+
branches:
18+
- '*'
19+
20+
jobs:
21+
downstream:
22+
name: downstream-${{matrix.name}}
23+
strategy:
24+
fail-fast: false
25+
matrix:
26+
include:
27+
- url: https://github.com/flavorjones/loofah
28+
name: loofah
29+
command: "bundle exec rake test"
30+
- url: https://github.com/rails/rails-html-sanitizer
31+
name: rails-html-sanitizer
32+
command: "bundle exec rake test"
33+
- url: https://github.com/rgrove/sanitize
34+
name: sanitize
35+
command: "bundle exec rake test"
36+
- url: https://github.com/ebeigarts/signer
37+
name: signer
38+
command: "bundle exec rake spec"
39+
- url: https://github.com/WinRb/Viewpoint
40+
name: viewpoint
41+
command: "bundle exec rspec spec"
42+
- url: https://github.com/rails/rails
43+
name: xmlmini
44+
command: "cd activesupport && bundle exec rake test TESTOPTS=-n/XmlMini/"
45+
- url: https://github.com/pythonicrubyist/creek
46+
name: creek
47+
command: "bundle exec rake spec"
48+
runs-on: ubuntu-latest
49+
container:
50+
image: ghcr.io/sparklemotion/nokogiri-test:mri-3.1
51+
steps:
52+
- uses: actions/checkout@v2
53+
with:
54+
submodules: true
55+
- uses: actions/cache@v2
56+
with:
57+
path: ports
58+
key: ports-ubuntu-${{hashFiles('dependencies.yml', 'patches/**/*.patch')}}
59+
- run: bundle install --local || bundle install
60+
- run: bundle exec rake compile
61+
- run: |
62+
git clone --depth=1 ${{matrix.url}} ${{matrix.name}}
63+
cd ${{matrix.name}}
64+
if grep nokogiri Gemfile ; then
65+
sed -i 's/\(.*nokogiri.*\)/\1, path: ".."/' Gemfile
66+
else
67+
echo "gem 'nokogiri', path: '..'" >> Gemfile
68+
fi
69+
if egrep "add_development_dependency.*\bbundler\b" *gemspec ; then
70+
sed -i 's/.*add_development_dependency.*\bbundler\b.*//' *gemspec
71+
fi
72+
bundle install --local || bundle install
73+
${{matrix.command}}

ext/java/nokogiri/XmlReader.java

+8
Original file line numberDiff line numberDiff line change
@@ -141,9 +141,17 @@ public class XmlReader extends RubyObject
141141
public IRubyObject
142142
attribute_nodes(ThreadContext context)
143143
{
144+
context.runtime.getWarnings().warn("Reader#attribute_nodes is deprecated and will be removed in a future version of Nokogiri. Please use Reader#attribute_hash instead.");
144145
return currentNode().getAttributesNodes();
145146
}
146147

148+
@JRubyMethod
149+
public IRubyObject
150+
attribute_hash(ThreadContext context)
151+
{
152+
return currentNode().getAttributes(context);
153+
}
154+
147155
@JRubyMethod(name = "attributes?")
148156
public IRubyObject
149157
attributes_p(ThreadContext context)

ext/java/nokogiri/internals/ReaderNode.java

+3-2
Original file line numberDiff line numberDiff line change
@@ -112,9 +112,10 @@ public abstract class ReaderNode
112112
getAttributes(ThreadContext context)
113113
{
114114
final Ruby runtime = context.runtime;
115-
if (attributeList == null) { return runtime.getNil(); }
116115
RubyHash hash = RubyHash.newHash(runtime);
116+
if (attributeList == null) { return hash; }
117117
for (int i = 0; i < attributeList.length; i++) {
118+
if (isNamespace(attributeList.names.get(i))) { continue; }
118119
IRubyObject k = stringOrBlank(runtime, attributeList.names.get(i));
119120
IRubyObject v = stringOrBlank(runtime, attributeList.values.get(i));
120121
hash.fastASetCheckString(runtime, k, v); // hash.op_aset(context, k, v)
@@ -150,8 +151,8 @@ public abstract class ReaderNode
150151
getNamespaces(ThreadContext context)
151152
{
152153
final Ruby runtime = context.runtime;
153-
if (namespaces == null) { return runtime.getNil(); }
154154
RubyHash hash = RubyHash.newHash(runtime);
155+
if (namespaces == null) { return hash; }
155156
for (Map.Entry<String, String> entry : namespaces.entrySet()) {
156157
IRubyObject k = stringOrBlank(runtime, entry.getKey());
157158
IRubyObject v = stringOrBlank(runtime, entry.getValue());

ext/nokogiri/extconf.rb

+1
Original file line numberDiff line numberDiff line change
@@ -974,6 +974,7 @@ def compile
974974
have_func("xmlSchemaSetValidStructuredErrors") # introduced in libxml 2.6.23
975975
have_func("xmlSchemaSetParserStructuredErrors") # introduced in libxml 2.6.23
976976
have_func("rb_gc_location") # introduced in Ruby 2.7
977+
have_func("rb_category_warning") # introduced in Ruby 3.0
977978

978979
have_func("vasprintf")
979980

ext/nokogiri/nokogiri.h

+6
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,12 @@ NOKOPUBFUN VALUE Nokogiri_wrap_xml_document(VALUE klass,
202202
#define DISCARD_CONST_QUAL(t, v) ((t)(uintptr_t)(v))
203203
#define DISCARD_CONST_QUAL_XMLCHAR(v) DISCARD_CONST_QUAL(xmlChar *, v)
204204

205+
#if HAVE_RB_CATEGORY_WARNING
206+
# define NOKO_WARN_DEPRECATION(message) rb_category_warning(RB_WARN_CATEGORY_DEPRECATED, message)
207+
#else
208+
# define NOKO_WARN_DEPRECATION(message) rb_warning(message)
209+
#endif
210+
205211
void Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state);
206212
void Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state, void *user_data,
207213
xmlStructuredErrorFunc handler);

ext/nokogiri/xml_node.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -1806,7 +1806,7 @@ rb_xml_node_new(int argc, VALUE *argv, VALUE klass)
18061806
}
18071807
if (!rb_obj_is_kind_of(rb_document_node, cNokogiriXmlDocument)) {
18081808
// TODO: deprecate allowing Node
1809-
rb_warn("Passing a Node as the second parameter to Node.new is deprecated. Please pass a Document instead, or prefer an alternative constructor like Node#add_child. This will become an error in a future release of Nokogiri.");
1809+
NOKO_WARN_DEPRECATION("Passing a Node as the second parameter to Node.new is deprecated. Please pass a Document instead, or prefer an alternative constructor like Node#add_child. This will become an error in a future release of Nokogiri.");
18101810
}
18111811
Noko_Node_Get_Struct(rb_document_node, xmlNode, c_document_node);
18121812

ext/nokogiri/xml_reader.c

+52-1
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ has_attributes(xmlTextReaderPtr reader)
3131
return (0);
3232
}
3333

34+
// TODO: merge this function into the `namespaces` method implementation
3435
static void
3536
Nokogiri_xml_node_namespaces(xmlNodePtr node, VALUE attr_hash)
3637
{
@@ -150,7 +151,11 @@ namespaces(VALUE self)
150151
/*
151152
:call-seq: attribute_nodes() → Array<Nokogiri::XML::Attr>
152153
153-
Get the attributes of the current node as an Array of Attr
154+
Get the attributes of the current node as an Array of XML:Attr
155+
156+
⚠ This method is deprecated and unsafe to use. It will be removed in a future version of Nokogiri.
157+
158+
See related: #attribute_hash, #attributes
154159
*/
155160
static VALUE
156161
rb_xml_reader_attribute_nodes(VALUE rb_reader)
@@ -160,6 +165,10 @@ rb_xml_reader_attribute_nodes(VALUE rb_reader)
160165
VALUE attr_nodes;
161166
int j;
162167

168+
// TODO: deprecated, remove in Nokogiri v1.15, see https://github.com/sparklemotion/nokogiri/issues/2598
169+
// After removal, we can also remove all the "node_has_a_document" special handling from xml_node.c
170+
NOKO_WARN_DEPRECATION("Reader#attribute_nodes is deprecated and will be removed in a future version of Nokogiri. Please use Reader#attribute_hash instead.");
171+
163172
Data_Get_Struct(rb_reader, xmlTextReader, c_reader);
164173

165174
if (! has_attributes(c_reader)) {
@@ -181,6 +190,47 @@ rb_xml_reader_attribute_nodes(VALUE rb_reader)
181190
return attr_nodes;
182191
}
183192

193+
/*
194+
:call-seq: attribute_hash() → Hash<String ⇒ String>
195+
196+
Get the attributes of the current node as a Hash of names and values.
197+
198+
See related: #attributes and #namespaces
199+
*/
200+
static VALUE
201+
rb_xml_reader_attribute_hash(VALUE rb_reader)
202+
{
203+
VALUE rb_attributes = rb_hash_new();
204+
xmlTextReaderPtr c_reader;
205+
xmlNodePtr c_node;
206+
xmlAttrPtr c_property;
207+
208+
Data_Get_Struct(rb_reader, xmlTextReader, c_reader);
209+
210+
if (!has_attributes(c_reader)) {
211+
return rb_attributes;
212+
}
213+
214+
c_node = xmlTextReaderExpand(c_reader);
215+
c_property = c_node->properties;
216+
while (c_property != NULL) {
217+
VALUE rb_name = NOKOGIRI_STR_NEW2(c_property->name);
218+
VALUE rb_value = Qnil;
219+
xmlChar *c_value = xmlNodeGetContent((xmlNode *)c_property);
220+
221+
if (c_value) {
222+
rb_value = NOKOGIRI_STR_NEW2(c_value);
223+
xmlFree(c_value);
224+
}
225+
226+
rb_hash_aset(rb_attributes, rb_name, rb_value);
227+
228+
c_property = c_property->next;
229+
}
230+
231+
return rb_attributes;
232+
}
233+
184234
/*
185235
* call-seq:
186236
* attribute_at(index)
@@ -696,6 +746,7 @@ noko_init_xml_reader()
696746
rb_define_method(cNokogiriXmlReader, "attribute_at", attribute_at, 1);
697747
rb_define_method(cNokogiriXmlReader, "attribute_count", attribute_count, 0);
698748
rb_define_method(cNokogiriXmlReader, "attribute_nodes", rb_xml_reader_attribute_nodes, 0);
749+
rb_define_method(cNokogiriXmlReader, "attribute_hash", rb_xml_reader_attribute_hash, 0);
699750
rb_define_method(cNokogiriXmlReader, "attributes?", attributes_eh, 0);
700751
rb_define_method(cNokogiriXmlReader, "base_uri", rb_xml_reader_base_uri, 0);
701752
rb_define_method(cNokogiriXmlReader, "default?", default_eh, 0);

lib/nokogiri/xml/reader.rb

+6-8
Original file line numberDiff line numberDiff line change
@@ -83,16 +83,14 @@ def initialize(source, url = nil, encoding = nil) # :nodoc:
8383
end
8484
private :initialize
8585

86-
# Get the attributes of the current node as a Hash
86+
# Get the attributes and namespaces of the current node as a Hash.
8787
#
88-
# [Returns] (Hash<String, String>) Attribute names and values
88+
# This is the union of Reader#attribute_hash and Reader#namespaces
89+
#
90+
# [Returns]
91+
# (Hash<String, String>) Attribute names and values, and namespace prefixes and hrefs.
8992
def attributes
90-
attrs_hash = attribute_nodes.each_with_object({}) do |node, hash|
91-
hash[node.name] = node.to_s
92-
end
93-
ns = namespaces
94-
attrs_hash.merge!(ns) if ns
95-
attrs_hash
93+
attribute_hash.merge(namespaces)
9694
end
9795

9896
###

test/xml/test_reader.rb

+63-18
Original file line numberDiff line numberDiff line change
@@ -228,23 +228,61 @@ def test_attributes?
228228
reader.map(&:attributes?))
229229
end
230230

231-
def test_attributes
232-
reader = Nokogiri::XML::Reader.from_memory(<<-eoxml)
233-
<x xmlns:tenderlove='http://tenderlovemaking.com/'
234-
xmlns='http://mothership.connection.com/'
235-
>
236-
<tenderlove:foo awesome='true'>snuggles!</tenderlove:foo>
237-
</x>
238-
eoxml
231+
def test_reader_attributes
232+
reader = Nokogiri::XML::Reader.from_memory(<<~XML)
233+
<x xmlns:tenderlove='http://tenderlovemaking.com/' xmlns='http://mothership.connection.com/'>
234+
<tenderlove:foo awesome='true'>snuggles!</tenderlove:foo>
235+
</x>
236+
XML
239237
assert_empty(reader.attributes)
240238
assert_equal([{ "xmlns:tenderlove" => "http://tenderlovemaking.com/",
241239
"xmlns" => "http://mothership.connection.com/", },
242-
{}, { "awesome" => "true" }, {}, { "awesome" => "true" }, {},
240+
{},
241+
{ "awesome" => "true" },
242+
{},
243+
{ "awesome" => "true" },
244+
{},
243245
{ "xmlns:tenderlove" => "http://tenderlovemaking.com/",
244246
"xmlns" => "http://mothership.connection.com/", },],
245247
reader.map(&:attributes))
246248
end
247249

250+
def test_reader_attributes_hash
251+
reader = Nokogiri::XML::Reader.from_memory(<<~XML)
252+
<x xmlns:tenderlove='http://tenderlovemaking.com/' xmlns='http://mothership.connection.com/'>
253+
<tenderlove:foo awesome='true'>snuggles!</tenderlove:foo>
254+
</x>
255+
XML
256+
assert_empty(reader.attribute_hash)
257+
assert_equal([{},
258+
{},
259+
{ "awesome" => "true" },
260+
{},
261+
{ "awesome" => "true" },
262+
{},
263+
{},],
264+
reader.map(&:attribute_hash))
265+
end
266+
267+
def test_reader_namespaces
268+
reader = Nokogiri::XML::Reader.from_memory(<<~XML)
269+
<x xmlns:tenderlove='http://tenderlovemaking.com/' xmlns='http://mothership.connection.com/'>
270+
<tenderlove:foo awesome='true'>snuggles!</tenderlove:foo>
271+
</x>
272+
XML
273+
assert_empty(reader.namespaces)
274+
assert_equal([{ "xmlns:tenderlove" => "http://tenderlovemaking.com/",
275+
"xmlns" => "http://mothership.connection.com/", },
276+
{},
277+
{},
278+
{},
279+
{},
280+
{},
281+
{ "xmlns:tenderlove" => "http://tenderlovemaking.com/",
282+
"xmlns" => "http://mothership.connection.com/", },],
283+
reader.map(&:namespaces))
284+
end
285+
248286
def test_attribute_roundtrip
249287
reader = Nokogiri::XML::Reader.from_memory(<<-eoxml)
250288
<x xmlns:tenderlove='http://tenderlovemaking.com/'
@@ -434,7 +472,9 @@ def test_reader_node_attributes_keep_a_reference_to_the_reader
434472

435473
reader = Nokogiri::XML::Reader.from_memory(xml)
436474
reader.each do |element|
437-
attribute_nodes += element.attribute_nodes
475+
assert_output(nil, /Reader#attribute_nodes is deprecated/) do
476+
attribute_nodes += element.attribute_nodes
477+
end
438478
end
439479
end
440480

@@ -451,7 +491,9 @@ def test_namespaced_attributes
451491
eoxml
452492
attr_ns = []
453493
while reader.read
454-
if reader.node_type == Nokogiri::XML::Node::ELEMENT_NODE
494+
next unless reader.node_type == Nokogiri::XML::Node::ELEMENT_NODE
495+
496+
assert_output(nil, /Reader#attribute_nodes is deprecated/) do
455497
reader.attribute_nodes.each { |attr| attr_ns << (attr.namespace.nil? ? nil : attr.namespace.prefix) }
456498
end
457499
end
@@ -555,14 +597,17 @@ def test_read_from_memory
555597
end
556598

557599
def test_large_document_smoke_test
558-
# simply run on a large document to verify that there no GC issues
559-
xml = []
560-
xml << "<elements>"
561-
10000.times { |j| xml << "<element id=\"#{j}\"/>" }
562-
xml << "</elements>"
563-
xml = xml.join("\n")
600+
skip_unless_libxml2("valgrind tests should only run with libxml2")
564601

565-
Nokogiri::XML::Reader.from_memory(xml).each(&:attributes)
602+
refute_valgrind_errors do
603+
xml = []
604+
xml << "<elements>"
605+
10000.times { |j| xml << "<element id=\"#{j}\"/>" }
606+
xml << "</elements>"
607+
xml = xml.join("\n")
608+
609+
Nokogiri::XML::Reader.from_memory(xml).each(&:attributes)
610+
end
566611
end
567612

568613
def test_correct_outer_xml_inclusion

0 commit comments

Comments
 (0)