
fix(parser/html): fix whitespace being lexed as html literal #3908

Merged


dyc3 (Contributor) commented Sep 15, 2024

Summary

This fixes the HTML lexer treating whitespace trivia as HTML_LITERAL when it shouldn't.

We end up creating fewer syntax nodes by treating all HTML text that affects what is rendered on screen as part of the same HTML_LITERAL. See the updated snapshots for examples of how this affects lexing and parsing.

This also significantly restructures the HTML lexer, which should make it more straightforward to follow.
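A minimal sketch of the splitting rule described above, assuming the input is only the text between tags: a single space between two words stays inside the literal, while a newline or a run of two or more spaces ends the literal and becomes trivia. `Token`, `is_ws`, and `lex_text` are hypothetical names for illustration, not Biome's lexer API; treating a lone tab, or a space at the start or end of the text, as trivia is also an assumption.

```rust
/// Hypothetical token type for text between tags (not Biome's `HtmlSyntaxKind`).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    /// Text that affects rendering, including single spaces between words.
    Literal(&'a str),
    /// Whitespace lexed as trivia: newlines, tabs, or runs of two or more spaces.
    Trivia(&'a str),
}

/// ASCII whitespace recognized by this sketch.
fn is_ws(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Splits text content (assumed to contain no tags) into literal and trivia tokens.
fn lex_text(src: &str) -> Vec<Token<'_>> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        if is_ws(bytes[i]) {
            // Whitespace that is not a lone space between two words: trivia.
            while i < bytes.len() && is_ws(bytes[i]) {
                i += 1;
            }
            tokens.push(Token::Trivia(&src[start..i]));
        } else {
            while i < bytes.len() {
                let b = bytes[i];
                // A single space followed by a non-whitespace byte stays in the literal.
                let lone_space = b == b' ' && i + 1 < bytes.len() && !is_ws(bytes[i + 1]);
                if is_ws(b) && !lone_space {
                    // A newline or a second consecutive space ends the literal.
                    break;
                }
                i += 1;
            }
            tokens.push(Token::Literal(&src[start..i]));
        }
    }
    tokens
}
```

For example, `lex_text("a b\n  c d")` yields `Literal("a b")`, `Trivia("\n  ")`, `Literal("c d")`: one literal per run of rendered text instead of one node per word.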

Test Plan

Added/updated tests

github-actions bot added the A-Parser (Area: parser) and L-HTML (Language: HTML) labels Sep 15, 2024
dyc3 requested review from a team September 15, 2024 22:20
dyc3 force-pushed the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch from 272e65f to fb07a79

codspeed-hq bot commented Sep 15, 2024

CodSpeed Performance Report

Merging #3908 will not alter performance

Comparing 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal (7df8723) with main (300d6c6)

Summary

✅ 107 untouched benchmarks

```diff
@@ -150,17 +153,11 @@ fn element() {
}

#[test]
fn element_with_text() {
```
Member

Was this test changed for some reason?

dyc3 (Contributor, Author) Sep 16, 2024

This test was removed because lexing it correctly requires the lexer to switch contexts partway through the input, which the `assert_lex!` macro does not support. It was replaced with other tests that specify the context upfront.
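To illustrate why a single-context test cannot cover an element with text, here is a hypothetical sketch: `Context`, `Kind`, and `lex_in` are illustrative names, not Biome's lexer or the real `assert_lex!` macro. The same input lexes differently depending on the context chosen upfront, so an input such as `<div>hello world</div>` (an illustrative example) only lexes correctly if the lexer switches from tag context to text context after `>`.

```rust
/// Hypothetical lexing contexts (not Biome's actual API).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Context {
    /// Between `<` and `>`: tag names, attribute names, punctuation.
    InsideTag,
    /// Element content: runs of text up to the next `<`.
    OutsideTag,
}

/// Hypothetical token kinds for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    LAngle,
    RAngle,
    Slash,
    Name,
    Text,
    Whitespace,
}

/// Lexes all of `src` in one fixed context, as a test that specifies its context upfront would.
fn lex_in(ctx: Context, src: &str) -> Vec<(Kind, &str)> {
    let b = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < b.len() {
        let start = i;
        let kind = match (ctx, b[i]) {
            (_, b'<') => {
                i += 1;
                Kind::LAngle
            }
            (Context::InsideTag, b'>') => {
                i += 1;
                Kind::RAngle
            }
            (Context::InsideTag, b'/') => {
                i += 1;
                Kind::Slash
            }
            (Context::InsideTag, c) if c.is_ascii_whitespace() => {
                // Inside a tag, whitespace only separates names.
                while i < b.len() && b[i].is_ascii_whitespace() {
                    i += 1;
                }
                Kind::Whitespace
            }
            (Context::InsideTag, _) => {
                // A name runs until punctuation or whitespace.
                while i < b.len() && !b"<>/".contains(&b[i]) && !b[i].is_ascii_whitespace() {
                    i += 1;
                }
                Kind::Name
            }
            (Context::OutsideTag, _) => {
                // Outside a tag, everything up to the next `<` is text.
                while i < b.len() && b[i] != b'<' {
                    i += 1;
                }
                Kind::Text
            }
        };
        out.push((kind, &src[start..i]));
    }
    out
}
```

Lexing `hello world` in `OutsideTag` gives a single `Text` token, while lexing it in `InsideTag` gives two `Name` tokens separated by `Whitespace`. Fixing the context per test sidesteps the mid-input switch.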

Comment on lines 323 to 331
```rust
/// Consume HTML text literals outside of tags.
///
/// This includes text and single spaces between words. If a newline or a second
/// consecutive space is found, this stops consuming so the lexer can
/// switch to `consume_whitespace`.
fn consume_html_text(&mut self) -> HtmlSyntaxKind {
```
Member
Is there a link to the spec that justifies parsing newlines and double spaces differently? It would be useful to have that link in the function's docs.

dyc3 (Contributor, Author) Sep 16, 2024

dyc3 (Contributor, Author)

My reasoning is that trivia means the token doesn't affect the functionality of the code. The primary function of an HTML document is to be rendered in a web browser. Removing the space between two words in HTML text does not make the document invalid, but it does change how the document is rendered (that is, its functionality), so that space is not trivia. Multiple consecutive spaces, on the other hand, are collapsed into a single space when rendered, so the extra spaces are trivia.
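To make the rendering argument concrete, here is a simplified model of the browser's default whitespace handling (CSS `white-space: normal`), under which every run of whitespace renders as one space. `rendered` and `renders_the_same` are hypothetical helpers; the model assumes the default style (content in `<pre>` or under `white-space: pre` keeps its whitespace) and trims leading and trailing whitespace, which only approximates line-boundary behavior.

```rust
/// Approximates the rendered text of an inline run under `white-space: normal`:
/// every run of whitespace collapses to a single space (simplified model).
fn rendered(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// True if editing `before` into `after` leaves the rendered text unchanged,
/// i.e. the edit only touched whitespace that behaves like trivia.
fn renders_the_same(before: &str, after: &str) -> bool {
    rendered(before) == rendered(after)
}
```

Under this model, shrinking `"hello   world"` to `"hello world"` renders identically, while deleting the last space to get `"helloworld"` does not, which matches the split between trivia and literal text.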

dyc3 force-pushed the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch from fb07a79 to 96929a5 September 16, 2024 04:09
dyc3 force-pushed the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch from 96929a5 to 7df8723 September 16, 2024 04:14
dyc3 merged commit 4968fa5 into main Sep 16, 2024
15 checks passed
dyc3 deleted the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch September 16, 2024 11:12

2 participants