-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't monkeypatch Nokogiri in 1.9 #40
Conversation
OMG I'd love to remove this so much. Let me make sure I have a test in here about it. |
Looks like the test didn't make it over unfortunately or maybe I never wrote one? In any case, this or the github/github should blow up all over if it's a problem. |
This would be so 🤘. I left it as a TODO because I didn't know what the cause of it was. |
❤️ |
@defunkt do you have a minimal failing test case to go with this? I expected this to fail on 1.8, but it seems to run fine with or without the monkey patch. # encoding: utf-8
require 'test_helper'
class NokogiriReplaceTest < Test::Unit::TestCase
def test_replace_utf8
s = 'Öregtagok'
doc = Nokogiri::HTML.fragment('<div>listázása</div>')
doc.children.first.children.first.replace(s)
assert_equal '<div>Öregtagok</div>', doc.to_s
end
end |
@jch IIRC the bug triggers when you create a fragment that has a top-level text node. i.e. without the |
Yep exactly. Removing the |
Still no go: # encoding: utf-8
require 'test_helper'
class NokogiriReplaceTest < Test::Unit::TestCase
def test_replace_utf8
s = 'Öregtagok'
doc = Nokogiri::HTML.fragment('listázása')
doc.children.first.replace(s)
assert_equal 'Öregtagok', doc.to_s
end
end Inspecting #<Nokogiri::HTML::DocumentFragment:0x828c94e8 name="#document-fragment" children=[#<Nokogiri::XML::Text:0x828c922c "list\303\241z\303\241sa">]> When you mean a top level text node, do you mean one that lives in just a |
That should repro.. maybe it depends on the nokogiri version also? |
@tmm1 I'm using |
Yep, that was it. The fix was introduced sometime after 1.4.4. I'll add another travis build to test for this. Merging this. |
Interesting. Try with https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline.rb#L49-L58 Also from the original comments:
test "utf-8 output 2" do
@comment.body = "And :-1: on the smileys – I prefer ASCII smileys :P"
assert @comment.body_html.include?('–')
end |
Awesome. |
Unfortunately, this isn't quite over yet. Removing the monkey patch in ruby 1.9 raises an exception unless it's used with the latest version of nokogiri (1.5.6, I scripted running the test against all released versions starting from 1.4.4). This same exception also appears in ruby 1.8, but is also fixed by nokogiri 1.5.6. test_replace_utf8(NokogiriReplaceTest):
RuntimeError: error parsing fragment (1)
nokogiri-1.5.5/lib/nokogiri/xml/node.rb:532:in `in_context'
nokogiri-1.5.5/lib/nokogiri/xml/node.rb:532:in `parse'
nokogiri-1.5.5/lib/nokogiri/html/document_fragment.rb:22:in `initialize'
nokogiri-1.5.5/lib/nokogiri/xml/node.rb:508:in `new'
nokogiri-1.5.5/lib/nokogiri/xml/node.rb:508:in `fragment'
nokogiri-1.5.5/lib/nokogiri/xml/node.rb:925:in `coerce'
nokogiri-1.5.5/lib/nokogiri/xml/node.rb:406:in `replace'
test/html/pipeline/nokogiri_replace_test.rb:8:in `test_replace_utf8' |
We use nokogiri 1.4.4 in .com. I wonder why it works there on 1.9. |
It works in .com with 1.9 because the monkey patch is still there. On Monday, February 25, 2013, Aman Gupta wrote:
-Jerry |
This monkeypatch doesn't seem necessary in 1.9 - without it everything works, and with it strings lose their encoding.