Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't monkeypatch Nokogiri in 1.9 #40

Merged
merged 1 commit into from
Feb 25, 2013
Merged

Conversation

defunkt
Copy link
Contributor

@defunkt defunkt commented Feb 16, 2013

This monkeypatch doesn't seem necessary in 1.9 - without it everything works, and with it strings lose their encoding.

@rtomayko
Copy link
Contributor

OMG I'd love to remove this so much. Let me make sure I have a test in here about it.

@rtomayko
Copy link
Contributor

Looks like the test didn't make it over unfortunately or maybe I never wrote one? In any case, this or the github/github should blow up all over if it's a problem.

@jch
Copy link
Contributor

jch commented Feb 16, 2013

This would be so 🤘. I left it as a TODO because I didn't know what the cause of it was.

@mtodd
Copy link
Contributor

mtodd commented Feb 16, 2013

❤️

@jch
Copy link
Contributor

jch commented Feb 25, 2013

@defunkt do you have a minimal failing test case to go with this? I expected this to fail on 1.8, but it seems to run fine with or without the monkey patch.

# encoding: utf-8
require 'test_helper'

class NokogiriReplaceTest < Test::Unit::TestCase
  def test_replace_utf8
    s = 'Öregtagok'
    doc = Nokogiri::HTML.fragment('<div>listázása</div>')
    doc.children.first.children.first.replace(s)
    assert_equal '<div>Öregtagok</div>', doc.to_s
  end
end

@tmm1
Copy link
Contributor

tmm1 commented Feb 25, 2013

@jch IIRC the bug triggers when you create a fragment that has a top-level text node. i.e. without the <div> wrapper.

@rtomayko
Copy link
Contributor

Yep exactly. Removing the <div> should repro under 1.8.7.

@jch
Copy link
Contributor

jch commented Feb 25, 2013

Still no go:

# encoding: utf-8
require 'test_helper'

class NokogiriReplaceTest < Test::Unit::TestCase
  def test_replace_utf8
    s = 'Öregtagok'
    doc = Nokogiri::HTML.fragment('listázása')
    doc.children.first.replace(s)
    assert_equal 'Öregtagok', doc.to_s
  end
end

Inspecting doc yields:

#<Nokogiri::HTML::DocumentFragment:0x828c94e8 name="#document-fragment" children=[#<Nokogiri::XML::Text:0x828c922c "list\303\241z\303\241sa">]>

When you mean a top level text node, do you mean one that lives in just a NodeSet? I tried Nokogiri::HTML.parse, but that also creates a full document.

@tmm1
Copy link
Contributor

tmm1 commented Feb 25, 2013

That should repro.. maybe it depends on the nokogiri version also?

@jch
Copy link
Contributor

jch commented Feb 25, 2013

@tmm1 I'm using bundle exec, so it should be the same. I'll check the nokogiri version from when the patch was first introduced.

jch added a commit that referenced this pull request Feb 25, 2013
@jch jch merged commit d650b19 into master Feb 25, 2013
@jch
Copy link
Contributor

jch commented Feb 25, 2013

Yep, that was it. The fix was introduced sometime after 1.4.4. I'll add another travis build to test for this. Merging this.

@rtomayko
Copy link
Contributor

Interesting. Try with HTML::Pipeline.parse("Öregtagok") maybe:

https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline.rb#L49-L58

Also from the original comments:

See 'utf-8 output 2' in user_content_test.rb for details.

  test "utf-8 output 2" do
    @comment.body = "And :-1: on the smileys – I prefer ASCII smileys :P"
    assert @comment.body_html.include?('–')
  end

https://github.com/github/github/blob/master/test/models/user_content_test.rb#L73-L76

@rtomayko
Copy link
Contributor

Awesome.

@jch
Copy link
Contributor

jch commented Feb 26, 2013

Unfortunately, this isn't quite over yet. Removing the monkey patch in ruby 1.9 raises an exception unless it's used with the latest version of nokogiri (1.5.6, I scripted running the test against all released versions starting from 1.4.4). This same exception also appears in ruby 1.8, but is also fixed by nokogiri 1.5.6.

test_replace_utf8(NokogiriReplaceTest):
RuntimeError: error parsing fragment (1)
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:532:in `in_context'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:532:in `parse'
    nokogiri-1.5.5/lib/nokogiri/html/document_fragment.rb:22:in `initialize'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:508:in `new'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:508:in `fragment'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:925:in `coerce'
    nokogiri-1.5.5/lib/nokogiri/xml/node.rb:406:in `replace'
    test/html/pipeline/nokogiri_replace_test.rb:8:in `test_replace_utf8'

@tmm1
Copy link
Contributor

tmm1 commented Feb 26, 2013

We use nokogiri 1.4.4 in .com. I wonder why it works there on 1.9.

@jch
Copy link
Contributor

jch commented Feb 26, 2013

It works in .com with 1.9 because the monkey patch is still there.

On Monday, February 25, 2013, Aman Gupta wrote:

We use nokogiri 1.4.4 in .com. I wonder why it works there on 1.9.


Reply to this email directly or view it on GitHubhttps://github.com//pull/40#issuecomment-14086210
.

-Jerry
@whatcodecraves http://twitter.com/whatcodecraves
github https://github.com/jch

@atmos atmos deleted the no-nokogiri-monkeypatch-in-19 branch February 27, 2013 03:48
rtomayko referenced this pull request Mar 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants