Of course, I'd like everybody's input on this :-)
Currently, the parser is able to parse documents, but it isn't capable of handling fragment cases.
A fragment case could be the part between "template" tags, or any other place where we need to parse HTML5 that is not a full document. The html5lib tests exercise this with the `#document-fragment` keyword, which indicates that the input is not a full document but a fragment case. The parser should therefore behave a little differently than it does for a regular document (not all "modes" are supported).

Our html5lib parser currently has an `is_fragment_case` flag that indicates whether or not the parser is in fragment-case mode. This is not completely implemented, but most of the work consists of checking this flag in the right places and acting a little differently, or jumping to another state, when it is set.

The current html5parser works like this:
- `html5parser::new`, which sets up the parser together with the input stream
- a `parse()` function that takes a document handle, into which the parser adds the nodes

This current system is not able to deal with fragment cases correctly.
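As an example of the kind of "jump to another state" that fragment-case checks involve, here is a minimal Rust sketch of one step of the HTML fragment parsing algorithm: choosing the tokenizer's initial state from the context element. `TokenizerState` and `initial_tokenizer_state` are hypothetical names, not the parser's actual API, and the sketch assumes the context element is in the HTML namespace.

```rust
/// Hypothetical subset of tokenizer states relevant to fragment setup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenizerState {
    Data,
    RcData,
    RawText,
    ScriptData,
    PlainText,
}

/// Hypothetical helper: pick the tokenizer's initial state.
/// `context` is `None` for a regular document parse, or the context element's
/// local name in the fragment case (assumed to be an HTML-namespace element).
fn initial_tokenizer_state(context: Option<&str>, scripting_enabled: bool) -> TokenizerState {
    match context {
        // Regular document: always start in the data state.
        None => TokenizerState::Data,
        // Fragment case: the context element decides the starting state.
        Some("title" | "textarea") => TokenizerState::RcData,
        Some("style" | "xmp" | "iframe" | "noembed" | "noframes") => TokenizerState::RawText,
        Some("script") => TokenizerState::ScriptData,
        // <noscript> is only treated as raw text when scripting is enabled.
        Some("noscript") if scripting_enabled => TokenizerState::RawText,
        Some("plaintext") => TokenizerState::PlainText,
        // Every other context element (including noscript without scripting).
        Some(_) => TokenizerState::Data,
    }
}
```

For an html5lib test marked `#document-fragment`, the element name given after that keyword would be passed as `context`; a regular document test would pass `None`.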
I would propose a small change in setup:
- `new()` without an input stream
- `parse()` with a token stream and a document (handle), for when we want to parse a regular document
- `parse_fragment()` with a token stream, a document (handle) to a document-fragment, and a context nodeId

@emwalker I think this interferes a bit with your ideas on setting up a pipeline from bytestream to parser? We currently initialize a tokenizer inside the html5parser, so we probably want to do this outside the parser itself, something like:
I'm not sure this is the way you envisioned it. I think we can keep the bytestream -> encoding -> tokenizer -> parser pipeline through interfaces/traits, but have the parser tie everything together.
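A minimal Rust sketch of the proposed shape, with the tokenizer kept outside the parser: `new()` takes no input, and both `parse()` and `parse_fragment()` receive an already-tokenized stream. Every type here (`Token`, `NodeId`, `Document`, `Html5Parser`) is a simplified, hypothetical stand-in rather than the project's real code, and the tree building in `run()` is a toy that only nests start tags.

```rust
/// Hypothetical, heavily simplified token type.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
    Eof,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(usize);

/// Hypothetical document: each node is (parent, label); NodeId(0) is the root.
#[derive(Debug, Default)]
struct Document {
    nodes: Vec<(Option<NodeId>, String)>,
}

impl Document {
    fn new(root: &str) -> Self {
        Document { nodes: vec![(None, root.to_string())] }
    }

    fn append(&mut self, parent: NodeId, label: String) -> NodeId {
        self.nodes.push((Some(parent), label));
        NodeId(self.nodes.len() - 1)
    }
}

/// The parser owns no input stream or tokenizer; it is created empty.
struct Html5Parser {
    is_fragment_case: bool,
    context_node: Option<NodeId>,
    open_elements: Vec<NodeId>,
}

impl Html5Parser {
    fn new() -> Self {
        Html5Parser { is_fragment_case: false, context_node: None, open_elements: Vec::new() }
    }

    /// Regular document parsing: nodes are added under the document root.
    fn parse<I: IntoIterator<Item = Token>>(&mut self, tokens: I, document: &mut Document) {
        self.is_fragment_case = false;
        self.context_node = None;
        self.open_elements = vec![NodeId(0)];
        self.run(tokens, document);
    }

    /// Fragment parsing: `document` is a document-fragment and `context` is the
    /// context node id. The context node lives outside `document`, so this
    /// sketch only records it for later fragment-case checks.
    fn parse_fragment<I: IntoIterator<Item = Token>>(
        &mut self,
        tokens: I,
        document: &mut Document,
        context: NodeId,
    ) {
        self.is_fragment_case = true;
        self.context_node = Some(context);
        self.open_elements = vec![NodeId(0)];
        self.run(tokens, document);
    }

    /// Toy tree builder: start tags nest, end tags pop, text becomes a leaf.
    /// A real parser would dispatch on insertion modes and consult
    /// `is_fragment_case` / `context_node` where the spec requires it.
    fn run<I: IntoIterator<Item = Token>>(&mut self, tokens: I, document: &mut Document) {
        for token in tokens {
            let current = *self.open_elements.last().expect("root stays open");
            match token {
                Token::StartTag(name) => {
                    let id = document.append(current, name);
                    self.open_elements.push(id);
                }
                Token::EndTag(_) => {
                    // Never pop the root itself.
                    if self.open_elements.len() > 1 {
                        self.open_elements.pop();
                    }
                }
                Token::Text(text) => {
                    document.append(current, format!("#text {}", text));
                }
                Token::Eof => break,
            }
        }
    }
}
```

Because the parser only sees an `IntoIterator<Item = Token>`, the bytestream, encoding, and tokenizer stages in front of it can be wired up outside the parser and swapped independently.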
@CharlesChen0823 Would you be OK with changing the parser's initial `new()` and `parse()` methods and adding a `parse_fragment()` as well (moving the input stream to `parse()` instead of `new()`)?