Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passed content must be valid XML to be filtered #100

Closed
benbalter opened this issue Jan 19, 2014 · 7 comments
Closed

Passed content must be valid XML to be filtered #100

benbalter opened this issue Jan 19, 2014 · 7 comments

Comments

@benbalter
Copy link
Contributor

Right now HTML::Pipeline::MentionFilter.new "test @benbalter test" will return the input string, while filter = HTML::Pipeline::MentionFilter.new "<p>test @benbalter test</p>" will return the expected @mentioned string.

I believe this is due to the doc.search('text()') pattern. Would be awesome if html-pipeline could support arbitrary strings, as right now I believe the input must be HTML, or the first filter must be the markdown filter for the expected behavior to occur.

At the very least, documentation could help clear things up for new users.

@jch
Copy link
Contributor

jch commented Jan 20, 2014

@benbalter good catch. I think was an assumption of how our filters were chained internally. If you create pull request changing the mention filter to work on arbitrary text, I'd be happy to review it.

@benbalter
Copy link
Contributor Author

I believe this is going to affect any filter that uses doc.search('text()'. FWIW, I ran into it with the EmojiFilter. My work around was to wrap the string with an arbitrary tag e.g., <emoji_filter>#{input}</emoji_filter>, which forced Nokogiri to treat the string as XML, and then return output.search('emoji_filter').children.as_xml. Is there a saner way?

@simeonwillbanks
Copy link
Contributor

@benbalter Thanks for reporting this issue.

Please, can you provide your app's Nokogiri::VERSION? Maybe Nokogiri::HTML::DocumentFragment is part of the problem.

@jch
Copy link
Contributor

jch commented Jan 22, 2014

@benbalter I understand what you're trying to do now. I think you want to add PlainTextInputFilter to your pipeline. It's a filter that normalizes text input into a html document for filters that expect it. Does that answer your question?

@benbalter
Copy link
Contributor Author

I think you want to add PlainTextInputFilter to your pipeline.

If I'm reading that correctly, that will escape all HTML. I'd want to be able to support any arbitrary string, especially markdown, which may contain HTML, or even more likely, may contain block quotes, for example, which would get HTML entity'd.

can you provide your app's Nokogiri::VERSION

Developing against Nokogiri 1.6.1

@benbalter
Copy link
Contributor Author

Here's the explicit test case, if it helps to articulate a bit better:

[9] pry(main)> s = HTML::Pipeline::MentionFilter.new("test @benbalter test").call.to_xml
=> "test @benbalter test"
[10] pry(main)> s = HTML::Pipeline::MentionFilter.new("<some_tag>test @benbalter test</some_tag>").call.to_xml
=> "<some_tag>test <a href=\"/benbalter\" class=\"user-mention\">@benbalter</a> test</some_tag>"
[11] pry(main)> s = HTML::Pipeline::MentionFilter.new("<some_tag>test @benbalter test</some_tag>").call.search('some_tag').children.to_xml
=> "test <a href=\"/benbalter\" class=\"user-mention\">@benbalter</a> test"

@jch
Copy link
Contributor

jch commented Jan 22, 2014

I'd want to be able to support any arbitrary string, especially markdown, which may contain HTML, or even more likely, may contain block quotes, for example, which would get HTML entity's.

This is a good point. For legacy reasons, the filter was written to only handle HTML input. I'm open to changing it, but it would change the existing behavior and require a major release. We'd also need to profile the performance of switching it to just regexes instead of using nokogiri to find text nodes. I don't think that will be too bad, but something to keep in mind.

It's not a high priority for the project right now, but if you'd like to hack on it, we'd appreciate a pull request 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants