fix(parser/html): fix whitespace being lexed as html literal #3908

dyc3 · 2024-09-15T22:20:00Z

Summary

This fixes the HTML lexer treating whitespace trivia as HTML_LITERAL when it shouldn't.

We end up creating fewer syntax nodes by treating all HTML text that would affect what is rendered on the screen as part of the same HTML_LITERAL. See the updated snapshots for examples of how this affects lexing and parsing.

This also significantly restructures the HTML lexer, which should make it a bit more straightforward to follow.

Test Plan

Added/updated tests

codspeed-hq · 2024-09-15T22:54:34Z

CodSpeed Performance Report

Merging #3908 will not alter performance

Comparing 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal (7df8723) with main (300d6c6)

Summary

✅ 107 untouched benchmarks

ematipico · 2024-09-16T03:32:00Z

crates/biome_html_parser/src/lexer/tests.rs

@@ -150,17 +153,11 @@ fn element() {
 }

 #[test]
-fn element_with_text() {


Was this test changed for some reason?

This test was removed because lexing it correctly requires changing contexts, which is not supported by the assert_lex! macro. It was replaced with other tests that specify the context upfront.

ematipico · 2024-09-16T03:38:42Z

crates/biome_html_parser/src/lexer/mod.rs

+    /// Consume HTML text literals outside of tags.
+    ///
+    /// This includes text and single spaces between words. If a newline or a second
+    /// consecutive space is found, this will stop consuming to allow the lexer to
+    /// switch to `consume_whitespace`.
+    fn consume_html_text(&mut self) -> HtmlSyntaxKind {


Is there a link to the spec that justifies having newlines and double spaces parsed differently? It would be useful to have that link in the docs of the function.

I found these and added them to the doc comment:

https://html.spec.whatwg.org/#space-separated-tokens

https://infra.spec.whatwg.org/#strip-leading-and-trailing-ascii-whitespace

My reasoning is that trivia is a token that doesn't affect the functionality of the code. The primary function of an HTML document is to render in a web browser. Removing a space between words in HTML text does not make the document invalid, but it does change how the document is rendered (that is, its functionality), so that space is not trivia. Multiple consecutive spaces are collapsed into a single space, so the extra spaces are trivia.
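
To make the rule above concrete, here is a minimal standalone sketch of the splitting behaviour. It is not the code from this PR: `Kind` and `lex_text` are hypothetical names, and whitespace is returned as an ordinary token here rather than attached as trivia, purely to keep the example short. It assumes the input is text outside of tags.

```rust
/// Hypothetical token kinds for this sketch (not Biome's `HtmlSyntaxKind`).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Literal,
    Whitespace,
}

/// Split text outside of tags into literal and whitespace tokens.
///
/// A single space between two words stays inside the literal. A newline,
/// a second consecutive space, or a trailing space ends the literal, and the
/// following whitespace run is emitted separately.
fn lex_text(src: &str) -> Vec<(Kind, &str)> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let start = pos;
        if bytes[pos].is_ascii_whitespace() {
            // Consume the whole whitespace run as one token.
            while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
                pos += 1;
            }
            tokens.push((Kind::Whitespace, &src[start..pos]));
        } else {
            while pos < bytes.len() {
                match bytes[pos] {
                    b' ' => {
                        // Keep the space only if text follows it directly.
                        let next_is_text = bytes
                            .get(pos + 1)
                            .map_or(false, |b| !b.is_ascii_whitespace());
                        if next_is_text {
                            pos += 1;
                        } else {
                            break;
                        }
                    }
                    // Newlines, tabs, etc. always end the literal.
                    b if b.is_ascii_whitespace() => break,
                    _ => pos += 1,
                }
            }
            tokens.push((Kind::Literal, &src[start..pos]));
        }
    }
    tokens
}
```

With this sketch, `lex_text("hello world")` yields a single literal, while `lex_text("hello  world")` yields a literal, a two-space whitespace token, and a second literal.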

github-actions bot added A-Parser Area: parser L-HTML Language: HTML labels Sep 15, 2024

dyc3 requested review from a team September 15, 2024 22:20

dyc3 force-pushed the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch from 272e65f to fb07a79 Compare September 15, 2024 22:42

ematipico reviewed Sep 16, 2024

dyc3 force-pushed the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch from fb07a79 to 96929a5 Compare September 16, 2024 04:09

fix(parser/html): fix whitespace being lexed as html literal

7df8723

dyc3 force-pushed the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch from 96929a5 to 7df8723 Compare September 16, 2024 04:14

ematipico approved these changes Sep 16, 2024

dyc3 merged commit 4968fa5 into main Sep 16, 2024
15 checks passed

dyc3 deleted the 09-12-fix_parser_html_fix_whitespace_being_lexed_as_html_literal branch September 16, 2024 11:12
