refactor(html): component parsing by ematipico · Pull Request #8886 · biomejs/biome

ematipico · 2026-01-27T09:32:35Z

Summary

Closes #8880
Part of #8590

Note

Mostly a vibe-coded PR. I had to intervene multiple times to fix incorrect implementations in the parser and the formatter.

The reason why useImportType couldn't work in some cases is that the HTML parser couldn't properly parse component tags.

This PR adds new nodes to the language so that the HTML parser can parse components such as:

<Component />
<Component ></Component>
<Component.A />
// etc

After implementing the parser, I updated the document services to properly track the components.

Test Plan

Added new tests in various sections

Docs

Not needed

changeset-bot · 2026-01-27T09:32:39Z

⚠️ No Changeset found

Latest commit: 697bc09

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

codspeed-hq · 2026-01-27T09:38:17Z

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing feat/comoponent-parsing (697bc09) with next (39e7fd9)

Summary

✅ 58 untouched benchmarks
⏩ 96 skipped benchmarks¹

¹ 96 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

coderabbitai · 2026-01-27T13:22:38Z

Walkthrough

This PR adds component-aware HTML support end-to-end: introduces HtmlComponentName and HtmlMemberName and the union AnyHtmlTagName; updates lexer, grammar and parser to recognise component/member tag names; adapts element/tag APIs and formatter rules to use AnyHtmlTagName; updates embedded value-reference extraction to track component names; threads snippet-level document services into linting paths; and adds tests for component/member tag handling and useImportType behaviour in templates.
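
To make the three tag-name kinds in the walkthrough concrete, here is a minimal, self-contained sketch. It is not Biome's implementation: `TagNameKind` and `classify_tag_name` are hypothetical names, and the rule "a name starting with an ASCII uppercase letter is a component candidate" is an assumption made only for illustration.

```rust
/// Hypothetical stand-in for the `AnyHtmlTagName` union described above.
#[derive(Debug, PartialEq)]
enum TagNameKind {
    /// Plain HTML tag such as `div`.
    Tag,
    /// Component such as `Component`.
    Component,
    /// Member access such as `Component.A`.
    Member,
}

/// Classifies a raw tag name.
/// Assumption: a name is a component candidate when its first character
/// is an ASCII uppercase letter. Returns `None` for an empty name.
fn classify_tag_name(raw: &str) -> Option<TagNameKind> {
    let first = raw.chars().next()?;
    if !first.is_ascii_uppercase() {
        return Some(TagNameKind::Tag);
    }
    if raw.contains('.') {
        Some(TagNameKind::Member)
    } else {
        Some(TagNameKind::Component)
    }
}
```

Under these assumptions, `div` classifies as a plain tag, `Component` as a component, and `Component.A` as a member name.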

Possibly related PRs

biomejs/biome PR 7869: Modifies HTML tag-name handling for component frameworks and touches parsing/formatting paths for component/member tag kinds.
biomejs/biome PR 8804: Changes element/tag-name helpers and element_ext accessors that are adjusted by this PR.
biomejs/biome PR 8824: Adds embedded value-reference tracking and useImportType template tests closely related to the new service wiring and tests.

Suggested reviewers

dyc3

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'refactor(html): component parsing' accurately describes the primary change, which is refactoring HTML parsing to support component-style tags.
Description check	✅ Passed	The description clearly relates to the changeset by explaining the motivation (useImportType failures due to parser limitations), the solution (adding new nodes for component parsing), and references the linked issues.
Linked Issues check	✅ Passed	The PR successfully implements the objective from issue `#8880`: the HTML parser now correctly recognises and produces nodes for component-style tags like <Component /> and <Component.A />. Changes introduce HtmlComponentName and HtmlMemberName nodes, updated parsing logic, and extended document services to track components.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to component parsing support. Updates to formatters, analyse rules, and document services are all necessary to complete the component parsing feature. No unrelated refactoring or cleanup is present.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/biome_html_parser/src/syntax/mod.rs (1)

220-230: Use the full component member name when matching closing tags.
Line 221 captures only the first segment before member parsing. For <Foo.Bar></Foo.Baz>, the mismatch can slip through because the later comparison only sees Foo. Consider deriving opening_tag_name from the parsed node so it includes the full member path.

🛠️ Suggested fix

-    let opening_tag_name = p.cur_text().to_string();
-    let should_be_self_closing = VOID_ELEMENTS
-        .iter()
-        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()))
-        && !is_possible_component(p, opening_tag_name.as_str());
-    let is_embedded_language_tag = EMBEDDED_LANGUAGE_ELEMENTS
-        .iter()
-        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()));
-
-    parse_any_tag_name(p).or_add_diagnostic(p, expected_element_name);
+    let tag_name = parse_any_tag_name(p).or_add_diagnostic(p, expected_element_name);
+    let opening_tag_name = tag_name
+        .ok()
+        .map(|name| name.text(p).to_string())
+        .unwrap_or_default();
+    let should_be_self_closing = VOID_ELEMENTS
+        .iter()
+        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()))
+        && !is_possible_component(p, opening_tag_name.as_str());
+    let is_embedded_language_tag = EMBEDDED_LANGUAGE_ELEMENTS
+        .iter()
+        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()));
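
The mismatch described in this comment can be reproduced outside the parser. The sketch below is an illustration only: the helper names are hypothetical, and it compares component names case-sensitively as plain strings rather than as syntax nodes.

```rust
/// Returns the text before the first `.`, mimicking a name captured
/// before member parsing has consumed the rest of the path.
fn first_segment(name: &str) -> &str {
    name.split('.').next().unwrap_or(name)
}

/// Lenient check: only the first segment of each name is compared,
/// so different member paths on the same base are treated as equal.
fn closes_by_first_segment(opening: &str, closing: &str) -> bool {
    first_segment(opening) == first_segment(closing)
}

/// Stricter check: the full member path must be identical.
fn closes_by_full_name(opening: &str, closing: &str) -> bool {
    opening == closing
}
```

With `Foo.Bar` as the opening name and `Foo.Baz` as the closing name, the first check accepts the pair while the second rejects it, which is the behaviour the review asks for.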

🤖 Fix all issues with AI agents

In `@crates/biome_html_analyze/src/lint/nursery/no_sync_scripts.rs`:
- Around line 49-51: The comparison of the HTML tag name uses a case-sensitive
!= check on name_text (from token_text_trimmed()), which misses tags like
<SCRIPT>; change the condition to a case-insensitive check using
name_text.eq_ignore_ascii_case("script") (i.e., replace the current if name_text
!= "script" { return None; } with an inverted eq_ignore_ascii_case check) so the
rule treats tag names per HTML spec; locate the comparison near the name_text
binding in no_sync_scripts.rs and update it consistent with other rules like
use_vue_vapor.rs and no_script_url.rs.
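
A minimal sketch of the suggested comparison, using the standard library's `str::eq_ignore_ascii_case`. The function name `is_script_tag_name` is hypothetical and works on a plain string instead of the rule's token text.

```rust
/// Returns true when the tag name is `script` in any ASCII casing.
fn is_script_tag_name(name_text: &str) -> bool {
    name_text.eq_ignore_ascii_case("script")
}
```

In the rule itself, the early return would then become `if !name_text.eq_ignore_ascii_case("script") { return None; }`, as the comment describes.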

In `@crates/biome_html_syntax/src/element_ext.rs`:
- Around line 230-252: The is_style_tag and is_script_tag helpers currently
treat AnyHtmlTagName case-insensitively and therefore match component names like
<Script> or <Style>; update both functions (is_style_tag and is_script_tag) to
first ensure the tag name is specifically an HtmlTagName (rather than
AnyHtmlTagName) before calling get_tag_name_text and doing the
eq_ignore_ascii_case check; use the opening_element().and_then(|el| el.name())
flow but pattern-match or inspect the returned name to confirm it's the
HtmlTagName variant (or return false otherwise) so only actual HTML script/style
tags trigger the embedded parsing logic.
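
One way to picture the variant check is the sketch below. `AnyTagName` is a simplified, hypothetical stand-in that stores plain strings; the real `AnyHtmlTagName` wraps syntax nodes, so the actual code would differ.

```rust
/// Simplified stand-in for the tag-name union (hypothetical).
enum AnyTagName {
    /// Plain HTML tag name, matched case-insensitively.
    Tag(String),
    /// Component name such as `Script`.
    Component(String),
    /// Member name such as `Script.Inline`.
    Member(String),
}

/// Only a plain HTML tag named `script` counts as a script tag;
/// components and member names never do, whatever their text.
fn is_script_tag(name: &AnyTagName) -> bool {
    match name {
        AnyTagName::Tag(text) => text.eq_ignore_ascii_case("script"),
        AnyTagName::Component(_) | AnyTagName::Member(_) => false,
    }
}
```

Matching on the variant first means a component called `<Script>` does not trigger embedded-language handling, while `<SCRIPT>` as a plain tag still does.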

🧹 Nitpick comments (2)

crates/biome_html_parser/src/lexer/mod.rs (1)

679-682: Doc comments slightly duplicated.

Lines 679-682 contain duplicated doc comment content. The first line (679) appears to be a remnant.

🧹 Remove duplicate doc comment line

-    /// Consumes an HTML tag name token starting with the given byte.
-    /// Tag names can contain alphanumeric characters, hyphens, colons and dots.
     /// Consumes an HTML tag name token starting with the given byte.
     /// Tag names can contain alphanumeric characters, hyphens, and colons.
     /// In component contexts (Vue/Svelte/Astro), dots are excluded and lexed separately.
     fn consume_tag_name(&mut self, first: u8) -> HtmlSyntaxKind {

crates/biome_service/src/workspace/document/services/embedded_value_references.rs (1)

233-242: Swap println! for dbg! in tests.

Keeps debug output consistent with the project convention.

Proposed change

-        println!("Diagnostics: {:?}", parsed.diagnostics());
-        println!("Has errors: {}", !parsed.diagnostics().is_empty());
+        dbg!(parsed.diagnostics());
+        dbg!(!parsed.diagnostics().is_empty());

As per coding guidelines, prefer `dbg!` for debug output in Rust tests.
crates/biome_html_analyze/src/lint/nursery/no_sync_scripts.rs

crates/biome_html_syntax/src/element_ext.rs

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@crates/biome_html_syntax/src/element_ext.rs`:
- Around line 15-37: Change get_tag_name_text to return a
SyntaxResult<TokenText> (not Option) and stop using .ok()? to convert Results
into Options; instead use ? to propagate parser errors and return
Ok(token.token_text_trimmed()) for HtmlTagName and HtmlComponentName. For
HtmlMemberName handle it explicitly by extracting the member's identifier token
via its proper accessor (e.g., name_token/value_token) and return its full token
text per the docstring (trimmed or full as required) as Ok(...). Then update
callers such as HtmlSelfClosingElement::is_void_element (and the other
occurrences mentioned) to accept the SyntaxResult from get_tag_name_text and
propagate the Result (or map to bool as needed) so the code compiles and member
names no longer return None.

crates/biome_html_syntax/src/element_ext.rs

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@crates/biome_html_syntax/src/element_ext.rs`:
- Around line 15-28: get_tag_name_text currently returns None for
AnyHtmlTagName::HtmlMemberName, which contradicts the helper's docstring and
prevents component.member names from being surfaced; update the HtmlMemberName
arm in get_tag_name_text to extract and return the full member expression text
(similar to how HtmlTagName::HtmlTagName and
HtmlComponentName::HtmlComponentName obtain token_text_trimmed), by calling the
appropriate accessor on the HtmlMemberName node to get its value token/text and
returning token_text_trimmed (or the equivalent full-member text method) instead
of None so Component.Member names are preserved.
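
As a rough illustration of returning the full member text, the sketch below models a member name as a left-nested tree and flattens it back into a dotted string. `TagName` and `full_text` are hypothetical; the real nodes expose tokens rather than owned strings.

```rust
/// Simplified tag-name tree (hypothetical). A member name wraps the
/// name to its left, so `Component.A.B` nests as ((Component).A).B.
enum TagName {
    Tag(String),
    Component(String),
    Member { object: Box<TagName>, member: String },
}

/// Rebuilds the dotted text for any tag name, including member names.
fn full_text(name: &TagName) -> String {
    match name {
        TagName::Tag(text) | TagName::Component(text) => text.clone(),
        TagName::Member { object, member } => {
            format!("{}.{}", full_text(object), member)
        }
    }
}
```

Returning the joined text for the member case, instead of nothing, lets callers compare or report `Component.Member` names the same way as simple names.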

crates/biome_html_syntax/src/element_ext.rs

crates/biome_html_parser/src/syntax/mod.rs

crates/biome_html_parser/tests/quick_test.rs

crates/biome_html_formatter/tests/quick_test.rs

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/biome_html_parser/src/syntax/mod.rs (1)

215-226: Opening tag name is captured too early for member tags.

Line 216 reads p.cur_text() before parse_any_tag_name; when . is tokenised (Vue/component contexts), that only captures the first segment, so the later contains check can accept </Component.B> for <Component.A>. Please derive the name from the parsed AnyHtmlTagName node (token text) and compare the full text; note that get_tag_name_text returns None for HtmlMemberName, so you’ll need the node’s token text. Based on learnings, HtmlMemberName does not provide a unified tag-name string.

🤖 Fix all issues with AI agents

In `@crates/biome_html_parser/src/syntax/mod.rs`:
- Around line 152-207: The member-name loop in parse_any_tag_name assumes the
initial tag tokens were lexed with component_name_context, but bump_with_context
sets the next token's lexing context and closing tags/non-Vue files may have the
`.` still inside HTML_LITERAL; fix parse_any_tag_name to explicitly re-lex the
tag name under component_name_context when is_possible_component returns true
(use component_name_context(p) to re-tokenise the base name and any member parts
so `.` becomes T![.] tokens), ensure the re-lex happens for both opening and
closing tags, and immediately reset the lexing context back to
HtmlLexContext::InsideTag (or the original inside_tag_context) before parsing
attributes so subsequent attribute tokens are not lexed as InsideTagVue
directives; reference p.bump_with_context, component_name_context,
HtmlLexContext::InsideTagVue, parse_any_tag_name and ensure the member loop uses
tokens produced by the re-lexed context.

coderabbitai · 2026-01-27T15:19:20Z

crates/biome_html_parser/src/syntax/mod.rs

+/// Returns the lexer context to use when parsing component names and member expressions.
+/// This allows `.` to be lexed as a token for member expressions like Component.Member
+/// We reuse InsideTagVue context because it supports `.` lexing, but this is ONLY used
+/// for parsing component names, not for parsing attributes.
+#[inline(always)]
+fn component_name_context(p: &HtmlParser) -> HtmlLexContext {
+    if Vue.is_supported(p) || Astro.is_supported(p) || SingleTextExpressions.is_supported(p) {
+        // Use InsideTagVue for all component-supporting files when parsing component names
+        // This allows `.` to be lexed properly for member expressions
+        // Note: This is safe because we only use this context for tag names, not attributes
+        HtmlLexContext::InsideTagVue
+    } else {
+        HtmlLexContext::InsideTag
+    }
+}
+
+/// Parse a tag name, which returns AnyHtmlTagName (one of: HtmlTagName, HtmlComponentName, or HtmlMemberName)
+/// This follows the JSX parser pattern for handling member expressions like Component.Member
+fn parse_any_tag_name(p: &mut HtmlParser) -> ParsedSyntax {
+    if !is_at_start_literal(p) {
+        return Absent;
+    }
+
+    let tag_text = p.cur_text();
+
+    // Step 1: Parse base name (either component or regular tag)
+    let name = if is_possible_component(p, tag_text) {
+        // Parse as component name - use component_name_context to allow `.` for member expressions
+        let m = p.start();
+        p.bump_with_context(HTML_LITERAL, component_name_context(p));
+        Present(m.complete(p, HTML_COMPONENT_NAME))
+    } else {
+        // Parse as regular HTML tag
+        parse_literal(p, HTML_TAG_NAME)
+    };
+
+    // Step 2: Extend with member access if present (using .map() pattern from JSX parser)
+    name.map(|mut name| {
+        while p.at(T![.]) {
+            let m = name.precede(p); // Create marker BEFORE already-parsed name
+            p.bump_with_context(T![.], component_name_context(p)); // Use component context for `.`
+
+            // Parse member name - must use component_name_context to maintain `.` lexing
+            if is_at_start_literal(p) {
+                let member_m = p.start();
+                p.bump_with_context(HTML_LITERAL, component_name_context(p));
+                member_m.complete(p, HTML_TAG_NAME);
+            } else {
+                p.error(expected_element_name(p, p.cur_range()));
+            }
+
+            name = m.complete(p, HTML_MEMBER_NAME); // Wrap previous name
+        }
+        name
+    })
+}


⚠️ Potential issue | 🟠 Major

Member component tags can be mis-tokenised outside Vue / in closing tags.

Line 180 onward assumes . is tokenised, but the tag-name token is still lexed under inside_tag_context / HtmlLexContext::InsideTag, so in Astro/SingleText—and in all closing tags—the . can stay inside HTML_LITERAL and the member loop never runs. Also, bump_with_context(..., component_name_context) sets the next token’s lexing context, so attributes after component tags may be tokenised as Vue directives in non‑Vue files. Consider re‑lexing the tag name with component_name_context and resetting to inside_tag_context before parsing attributes. Based on learnings, the context parameter affects the next token.

Also applies to: 307-307
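
The effect of the lexing context on `.` can be shown with a toy lexer. This is a sketch under simplifying assumptions, not the Biome lexer: `LexContext`, `Token`, `next_token` and `lex_all` are hypothetical, one context is applied to the whole input, and it does not model how `bump_with_context` selects the context for the following token.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum LexContext {
    /// `.` is an ordinary character inside a literal.
    InsideTag,
    /// `.` is lexed as its own token.
    ComponentName,
}

#[derive(Debug, PartialEq)]
enum Token {
    Literal(String),
    Dot,
}

/// Lexes one token from the start of `input` under `ctx` and returns
/// it together with the remaining input.
fn next_token(input: &str, ctx: LexContext) -> Option<(Token, &str)> {
    if input.is_empty() {
        return None;
    }
    if ctx == LexContext::ComponentName && input.starts_with('.') {
        return Some((Token::Dot, &input[1..]));
    }
    // A literal ends at whitespace, `>` or `/`, and additionally at `.`
    // when the context treats dots as separate tokens.
    let end = input
        .find(|c: char| {
            c.is_whitespace()
                || c == '>'
                || c == '/'
                || (ctx == LexContext::ComponentName && c == '.')
        })
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((Token::Literal(input[..end].to_string()), &input[end..]))
}

/// Lexes the whole tag name under a single context.
fn lex_all(mut input: &str, ctx: LexContext) -> Vec<Token> {
    let mut tokens = Vec::new();
    while let Some((token, rest)) = next_token(input, ctx) {
        tokens.push(token);
        input = rest;
    }
    tokens
}
```

Under `InsideTag`, `Component.Member` comes out as a single literal, so a loop waiting for a dot token never runs; under `ComponentName` the same text yields a literal, a dot, and another literal.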

refactor(html): component parsing

d3c9241

github-actions bot added labels on Jan 27, 2026: A-CLI (Area: CLI), A-Project (Area: project), A-Linter (Area: linter), A-Parser (Area: parser), A-Formatter (Area: formatter), A-Tooling (Area: internal tools), L-JavaScript (Language: JavaScript and super languages), L-HTML (Language: HTML and super languages)

clippy and more tests

97f5a0b

ematipico marked this pull request as ready for review January 27, 2026 13:11

ematipico requested review from a team January 27, 2026 13:11

coderabbitai bot reviewed Jan 27, 2026

crates/biome_html_analyze/src/lint/nursery/no_sync_scripts.rs (resolved)

crates/biome_html_syntax/src/element_ext.rs (resolved)

address feedback

ff44678

coderabbitai bot reviewed Jan 27, 2026

crates/biome_html_syntax/src/element_ext.rs (outdated, resolved)

fix fix fix

f60ef66

coderabbitai bot reviewed Jan 27, 2026

crates/biome_html_syntax/src/element_ext.rs (resolved)

dyc3 reviewed Jan 27, 2026

crates/biome_html_parser/src/syntax/mod.rs (outdated, resolved)

crates/biome_html_parser/tests/quick_test.rs (resolved)

crates/biome_html_formatter/tests/quick_test.rs (resolved)

feddback

941d952

coderabbitai bot reviewed Jan 27, 2026

feedback

697bc09

ematipico requested a review from dyc3 January 27, 2026 16:51

dyc3 approved these changes Jan 27, 2026

ematipico merged commit fda74af into next Jan 28, 2026
19 checks passed

ematipico deleted the feat/comoponent-parsing branch January 28, 2026 05:55

coderabbitai bot mentioned this pull request Jan 28, 2026

feat(html): improved parsing spread attributes #8894

Merged

Netail mentioned this pull request Jan 30, 2026

feat(html_parser): add name reference identifier #8827

Closed

ematipico merged 6 commits into next from feat/comoponent-parsing


2 participants
