HTML Parsing #69

Open
mnutt opened this issue Feb 13, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@mnutt

mnutt commented Feb 13, 2022

I was curious if you had considered exposing something like DOMParser, or some other HTML parsing interface. I think in edge computing people often use it for rewriting outgoing HTML, though I was hoping to use it just to extract a few HTML attributes.

It looks like Rust has some decent HTML parsers (https://github.com/y21/tl), but looking at the other blueboat interfaces exposed to V8, most seem to be functional and don't hold any state. The interface I was imagining might tokenize HTML in Rust, also run queries in Rust, and just return the result to JS. But perhaps there's a better way to set up the interface?
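To make the stateless shape concrete, here is a minimal sketch of "one call in, plain data out", where no parser handle is kept on the JS side. The function name `queryAttr` and its parameters are hypothetical and not part of blueboat; the regular expressions are only a stand-in for a native parser such as tl and are not a robust way to parse HTML.

```javascript
// Hypothetical stateless interface: HTML string in, plain array of strings out.
// Nothing here is a blueboat API; the regex is a placeholder for native parsing.
function queryAttr(html, tagName, attrName) {
  const results = [];
  // Match opening tags of the requested element and capture their attribute text.
  const tagRe = new RegExp(`<${tagName}\\b([^>]*)>`, "gi");
  // Match a double-quoted attribute value inside the captured attribute text.
  const attrRe = new RegExp(`\\b${attrName}\\s*=\\s*"([^"]*)"`, "i");
  for (const match of html.matchAll(tagRe)) {
    const attr = attrRe.exec(match[1]);
    if (attr) results.push(attr[1]);
  }
  return results;
}

const links = queryAttr(
  '<p><a href="/one">1</a> <a class="x" href="/two">2</a></p>',
  "a",
  "href"
);
console.log(links);
```

The trade-off of this shape is that every query re-parses the input, which is the cost a stateful handle would avoid.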

@losfair
Owner

losfair commented Feb 14, 2022

I considered an HTMLRewriter-like API backed by https://github.com/cloudflare/lol-html but a streaming rewriter doesn't feel as intuitive as the browser DOM; a proper browser-like DOM interface would be preferred.

Support for stateful native APIs was recently added to blueboat (123cc0c), so the DOM interface can be built on it. tl looks like a nice foundation for that!

@losfair losfair added the enhancement New feature or request label Feb 14, 2022
@mnutt
Author

mnutt commented Feb 14, 2022

This is great! Agreed that a DOM interface is much nicer to use.

@losfair
Owner

losfair commented Feb 15, 2022

Basic DOM operations on HTML and XML documents are now implemented (#71).

The API looks like:

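// Parse an HTML fragment into a DOM node.
let dom = TextUtil.DOM.HTMLDOMNode.parse('<div><p class="some-class">Test</p></div>', { fragment: true });
// Visit each element that has the class "some-class".
dom.queryWithFilter({type: "hasClass", className: "some-class"}, elem => {
  // Read the element's properties, append an attribute, and write them back.
  const props = elem.get();
  props.attrs.push({name: "data-test", value: "42"});
  elem.update(props);
  return true;
});
// Serialize the document and decode the result to a string.
new TextDecoder().decode(dom.serialize());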

(not yet the final API, still need some design around it)
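For the attribute-extraction use case from the original request, a read-only helper on top of this shape could look like the sketch below. It assumes only what the example above shows: `queryWithFilter` takes a filter object and a callback, and `elem.get()` returns an object whose `attrs` is an array of `{name, value}` pairs. `collectAttr` and `makeFakeDom` are hypothetical names. The fake DOM exists only so the helper can run outside blueboat, and the meaning of the callback's return value is an assumption.

```javascript
// Collect one attribute's value from every element that has a given class.
// `dom` is any object exposing queryWithFilter(filter, callback) as shown above.
function collectAttr(dom, className, attrName) {
  const values = [];
  dom.queryWithFilter({ type: "hasClass", className }, (elem) => {
    const props = elem.get();
    const attr = props.attrs.find((a) => a.name === attrName);
    if (attr) values.push(attr.value);
    return true; // mirrors the example above; its exact meaning is an assumption
  });
  return values;
}

// Hypothetical stand-in for a parsed document, not the blueboat implementation.
// It assumes the class list is stored as a "class" entry in attrs.
function makeFakeDom(elements) {
  return {
    queryWithFilter(filter, callback) {
      for (const el of elements) {
        const cls = el.attrs.find((a) => a.name === "class");
        const classes = cls ? cls.value.split(/\s+/) : [];
        if (filter.type === "hasClass" && classes.includes(filter.className)) {
          callback({ get: () => el, update: (next) => Object.assign(el, next) });
        }
      }
    },
  };
}

const fakeDom = makeFakeDom([
  { attrs: [{ name: "class", value: "some-class" }, { name: "href", value: "/a" }] },
  { attrs: [{ name: "class", value: "other" }, { name: "href", value: "/b" }] },
]);
const hrefs = collectAttr(fakeDom, "some-class", "href");
console.log(hrefs);
```

Inside blueboat, the same helper would presumably be called with the node returned by `TextUtil.DOM.HTMLDOMNode.parse(...)` in place of `fakeDom`.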
