filter/xml fix for LOGSTASH-2246: extract non-ascii content with xpath #1790
|
Can one of the admins verify this patch? |
f86287d to a536eef
|
I can confirm that without the patch to filters/xml.rb, the test you added fails (as expected): |
|
I am reviewing to see if the mysterious |
|
Nokogiri::XML is shorthand for Nokogiri::XML::Document.parse(), which has this method signature:
|
Definitely fails! Nasty. I think the parse encoding is the way to fix this: |
Improves upon elastic#1790
|
jordansissel@d5dfc5a is my proposal to fix this. It appears to work and in testing this still passes your new specs. Woo! |
|
Closing in favor of #1803 |
|
@wiibaa thank you for figuring out the problem and solving it! <3 |
|
@jordansissel thank you, but the credit should go to the original reporter in Jira. |
|
Thank you. I've tried to submit it to #1804. |
As reported in LOGSTASH-2246, the xml filter's xpath extraction fails on non-ASCII content.
Here is a test case and a fix.
The cause is that Nokogiri's to_s calls to_xhtml or to_xml without an encoding parameter.
The to_str method extracts the content differently (the details are hidden in the C or Java implementation) and does not suffer from the encoding issue.