Don't monkeypatch Nokogiri in 1.9 #40

defunkt · 2013-02-16T00:08:19Z

This monkeypatch doesn't seem necessary in 1.9 - without it everything works, and with it strings lose their encoding.

rtomayko · 2013-02-16T00:15:13Z

OMG I'd love to remove this so much. Let me make sure I have a test in here about it.

rtomayko · 2013-02-16T00:17:57Z

Looks like the test didn't make it over unfortunately or maybe I never wrote one? In any case, this or the github/github should blow up all over if it's a problem.

jch · 2013-02-16T00:38:35Z

This would be so 🤘. I left it as a TODO because I didn't know what the cause of it was.

mtodd · 2013-02-16T01:23:22Z

❤️

jch · 2013-02-25T22:49:47Z

@defunkt do you have a minimal failing test case to go with this? I expected this to fail on 1.8, but it seems to run fine with or without the monkey patch.

# encoding: utf-8
require 'test_helper'

class NokogiriReplaceTest < Test::Unit::TestCase
  def test_replace_utf8
    s = 'Öregtagok'
    doc = Nokogiri::HTML.fragment('<div>listázása</div>')
    doc.children.first.children.first.replace(s)
    assert_equal '<div>Öregtagok</div>', doc.to_s
  end
end

tmm1 · 2013-02-25T22:57:31Z

@jch IIRC the bug triggers when you create a fragment that has a top-level text node. i.e. without the <div> wrapper.

rtomayko · 2013-02-25T23:00:39Z

Yep exactly. Removing the <div> should repro under 1.8.7.

jch · 2013-02-25T23:15:03Z

Still no go:

# encoding: utf-8
require 'test_helper'

class NokogiriReplaceTest < Test::Unit::TestCase
  def test_replace_utf8
    s = 'Öregtagok'
    doc = Nokogiri::HTML.fragment('listázása')
    doc.children.first.replace(s)
    assert_equal 'Öregtagok', doc.to_s
  end
end

Inspecting doc yields:

#<Nokogiri::HTML::DocumentFragment:0x828c94e8 name="#document-fragment" children=[#<Nokogiri::XML::Text:0x828c922c "list\303\241z\303\241sa">]>

When you mean a top level text node, do you mean one that lives in just a NodeSet? I tried Nokogiri::HTML.parse, but that also creates a full document.

tmm1 · 2013-02-25T23:18:34Z

That should repro.. maybe it depends on the nokogiri version also?

jch · 2013-02-25T23:19:42Z

@tmm1 I'm using bundle exec, so it should be the same. I'll check the nokogiri version from when the patch was first introduced.

Don't monkeypatch Nokogiri in 1.9

jch · 2013-02-25T23:26:52Z

Yep, that was it. The fix was introduced sometime after 1.4.4. I'll add another travis build to test for this. Merging this.

rtomayko · 2013-02-25T23:30:05Z

Interesting. Try with HTML::Pipeline.parse("Öregtagok") maybe:

https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline.rb#L49-L58

Also from the original comments:

See 'utf-8 output 2' in user_content_test.rb for details.

  test "utf-8 output 2" do
    @comment.body = "And :-1: on the smileys – I prefer ASCII smileys :P"
    assert @comment.body_html.include?('–')
  end

https://github.com/github/github/blob/master/test/models/user_content_test.rb#L73-L76

rtomayko · 2013-02-25T23:30:59Z

Awesome.

jch · 2013-02-26T00:49:59Z

Unfortunately, this isn't quite over yet. Removing the monkey patch in ruby 1.9 raises an exception unless it's used with the latest version of nokogiri (1.5.6, I scripted running the test against all released versions starting from 1.4.4). This same exception also appears in ruby 1.8, but is also fixed by nokogiri 1.5.6.

test_replace_utf8(NokogiriReplaceTest):
RuntimeError: error parsing fragment (1)
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:532:in `in_context'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:532:in `parse'
    nokogiri-1.5.5/lib/nokogiri/html/document_fragment.rb:22:in `initialize'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:508:in `new'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:508:in `fragment'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:925:in `coerce'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:406:in `replace'
    test/html/pipeline/nokogiri_replace_test.rb:8:in `test_replace_utf8'

tmm1 · 2013-02-26T00:52:17Z

We use nokogiri 1.4.4 in .com. I wonder why it works there on 1.9.

jch · 2013-02-26T01:17:28Z

It works in .com with 1.9 because the monkey patch is still there.

On Monday, February 25, 2013, Aman Gupta wrote:

We use nokogiri 1.4.4 in .com. I wonder why it works there on 1.9.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/40#issuecomment-14086210
.

-Jerry
@whatcodecraves http://twitter.com/whatcodecraves
github https://github.com/jch

Don't monkeypatch Nokogiri in 1.9

8ac4a82

jch added a commit that referenced this pull request Feb 25, 2013

Merge pull request #40 from jch/no-nokogiri-monkeypatch-in-19

d650b19

Don't monkeypatch Nokogiri in 1.9

jch merged commit d650b19 into master Feb 25, 2013

atmos deleted the no-nokogiri-monkeypatch-in-19 branch February 27, 2013 03:48

rtomayko referenced this pull request Mar 4, 2013

release 0.0.9

98fc8dd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't monkeypatch Nokogiri in 1.9 #40

Don't monkeypatch Nokogiri in 1.9 #40

defunkt commented Feb 16, 2013

rtomayko commented Feb 16, 2013

rtomayko commented Feb 16, 2013

jch commented Feb 16, 2013

mtodd commented Feb 16, 2013

jch commented Feb 25, 2013

tmm1 commented Feb 25, 2013

rtomayko commented Feb 25, 2013

jch commented Feb 25, 2013

tmm1 commented Feb 25, 2013

jch commented Feb 25, 2013

jch commented Feb 25, 2013

rtomayko commented Feb 25, 2013

rtomayko commented Feb 25, 2013

jch commented Feb 26, 2013

tmm1 commented Feb 26, 2013

jch commented Feb 26, 2013

Don't monkeypatch Nokogiri in 1.9 #40

Don't monkeypatch Nokogiri in 1.9 #40

Conversation

defunkt commented Feb 16, 2013

rtomayko commented Feb 16, 2013

rtomayko commented Feb 16, 2013

jch commented Feb 16, 2013

mtodd commented Feb 16, 2013

jch commented Feb 25, 2013

tmm1 commented Feb 25, 2013

rtomayko commented Feb 25, 2013

jch commented Feb 25, 2013

tmm1 commented Feb 25, 2013

jch commented Feb 25, 2013

jch commented Feb 25, 2013

rtomayko commented Feb 25, 2013

rtomayko commented Feb 25, 2013

jch commented Feb 26, 2013

tmm1 commented Feb 26, 2013

jch commented Feb 26, 2013