HTML detector function #75
Comments
Another similar use case might be "strip HTML and keep only the text", e.g. go from
Just use the regular mode and set the Maybe we should add a
Right, that works. I thought that would remove the text in those elements, but it looks like it's kept.
It seems that
You need to parse it as a document fragment instead of as the whole document. That'll prevent html5ever from inferring the document structure.
@notriddle That happens even with
That's annoying. It's supposed to be parsing the fragment as if it was inside a
I opened servo/html5ever#323 about this.
We should probably use https://play.rust-lang.org/?gist=dd917d5f859ac115971d8355e5ab3dd2&version=stable instead, then. I was wrong about fragment parsers. Sorry.
Just a small warning. The code I whipped up is pretty basic, feel free to change it around. E.g. remove
139: Detect html (see #75) r=notriddle a=mozfreddyb Having looked at the TokenSink example in #75, this seemed mostly straightforward. I found a couple of other nits along the way that I figured I might just pick up and do within this pull request, but I'm happy to drop them or turn them into individual PRs, if I must. Co-authored-by: Frederik Braun Co-authored-by: Michael Howell
157: Add empty() constructor r=notriddle a=nrempel Hey there, This pull request adds a new `empty` constructor. This was mentioned in #75 but never implemented. I need this so that I can obtain `&Builder` and not `&mut Builder` which is returned when modifying the tags with `tags()`. I'm happy to add an additional test as well if you like but the doctest seemed sufficient to me. What do you think about this change? Thanks. Co-authored-by: Nicholas Rempel
As nice as it is to be able to remove the blatantly bad stuff, sometimes you don't want the user to be able to enter any HTML at all. You could do this by escaping the markup, but if having a database with `&lt;` in it doesn't appeal to you, or you're worried about double-escaping or similarly nasty accidents, you could use a function that just tells you if a string has any HTML tags in it. And in case you wonder why anybody would want a library for that, it's not actually that easy to detect HTML without any false positives or false negatives.

We should be able to do this pretty easily:
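The snippet that originally followed is not preserved here. To make the idea concrete, here is a rough, dependency-free sketch of what "detect HTML" could mean. It is not the TokenSink-based approach discussed in this thread and not the library's implementation; `looks_like_html` is a hypothetical name, and the rules are a hand-written approximation of how an HTML tokenizer reacts to `<` in plain text. It ignores character references such as `&amp;`, so treat it as an illustration of the problem rather than a replacement for running a real tokenizer.

```rust
/// Hypothetical helper (not part of any library): reports whether `input`
/// contains a sequence that an HTML tokenizer would likely read as markup.
fn looks_like_html(input: &str) -> bool {
    let bytes = input.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        // `<` is ASCII, so it never appears inside a multi-byte UTF-8 sequence.
        if b != b'<' {
            continue;
        }
        match bytes.get(i + 1).copied() {
            // `<a`, `<div`, ...: start of an opening tag.
            Some(c) if c.is_ascii_alphabetic() => return true,
            // `<!` opens a comment or doctype; `<?` is read as a bogus comment.
            Some(b'!') | Some(b'?') => return true,
            // `</` followed by anything is consumed by the tokenizer
            // (closing tag, bogus comment, or a dropped `</>`).
            Some(b'/') if i + 2 < bytes.len() => return true,
            // `<` followed by a space, a digit, or end of input stays plain text.
            _ => {}
        }
    }
    false
}
```

Under these assumptions, `looks_like_html("<b>bold</b>")` is true while `looks_like_html("1 < 2")` and `looks_like_html("I <3 Rust")` are false. Those last two are the kind of false positives a naive "contains `<`" check would produce.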