refactor(html): component parsing#8886

Merged
ematipico merged 6 commits into next from
feat/comoponent-parsing
Jan 28, 2026

Conversation

@ematipico ematipico commented Jan 27, 2026

Summary

Closes #8880
Part of #8590

Note

Vibe-coded PR, mostly. I had to intervene multiple times to fix incorrect implementations in the parser and the formatter.

The reason why useImportType couldn't work in some cases is that the HTML parser couldn't properly parse component tags.

This PR adds new nodes to the language so that HTML can parse components such as

<Component />
<Component ></Component>
<Component.A />
// etc

After implementing the parser, I updated the document services to properly track the components.

Test Plan

Added new tests in various sections

Docs

Not needed

changeset-bot bot commented Jan 27, 2026

⚠️ No Changeset found

Latest commit: 697bc09

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added the following labels on Jan 27, 2026: A-CLI (Area: CLI), A-Project (Area: project), A-Linter (Area: linter), A-Parser (Area: parser), A-Formatter (Area: formatter), A-Tooling (Area: internal tools), L-JavaScript (Language: JavaScript and super languages), L-HTML (Language: HTML and super languages)
codspeed-hq bot commented Jan 27, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing feat/comoponent-parsing (697bc09) with next (39e7fd9)

Summary

✅ 58 untouched benchmarks
⏩ 96 skipped benchmarks [1]

Footnotes

  1. 96 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@ematipico ematipico marked this pull request as ready for review January 27, 2026 13:11
@ematipico ematipico requested review from a team January 27, 2026 13:11
coderabbitai bot commented Jan 27, 2026

Walkthrough

This PR adds component-aware HTML support end-to-end: introduces HtmlComponentName and HtmlMemberName and the union AnyHtmlTagName; updates lexer, grammar and parser to recognise component/member tag names; adapts element/tag APIs and formatter rules to use AnyHtmlTagName; updates embedded value-reference extraction to track component names; threads snippet-level document services into linting paths; and adds tests for component/member tag handling and useImportType behaviour in templates.
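To make the three node kinds in the walkthrough concrete, here is a minimal sketch of how a raw tag name could be sorted into a plain tag, a component, or a member name. It is a simplified model, not Biome's generated syntax types: the `TagName` enum, `classify`, and the rule that a component starts with an ASCII uppercase letter are all assumptions made for illustration.

```rust
/// Simplified stand-in for the node kinds named in the walkthrough
/// (HtmlTagName, HtmlComponentName, HtmlMemberName). Not Biome's real types.
#[derive(Debug, PartialEq)]
enum TagName {
    /// Plain HTML tag such as `div`.
    Tag(String),
    /// Component such as `Component`.
    Component(String),
    /// Member access such as `Component.A`: an object name plus one member segment.
    Member { object: Box<TagName>, member: String },
}

/// Assumption: a name is a possible component when it starts with an
/// ASCII uppercase letter. The real heuristic may differ.
fn is_possible_component(name: &str) -> bool {
    name.chars().next().is_some_and(|c| c.is_ascii_uppercase())
}

/// Classify a raw tag name, wrapping the base name once per `.member` segment
/// so that `A.B.C` nests to the left: Member(Member(A, B), C).
fn classify(raw: &str) -> Option<TagName> {
    let mut segments = raw.split('.');
    let base = segments.next().filter(|s| !s.is_empty())?;
    let mut name = if is_possible_component(base) {
        TagName::Component(base.to_string())
    } else {
        TagName::Tag(base.to_string())
    };
    for member in segments {
        if member.is_empty() {
            // A trailing or doubled dot has no member name to attach.
            return None;
        }
        name = TagName::Member {
            object: Box::new(name),
            member: member.to_string(),
        };
    }
    Some(name)
}
```

The left-nested wrapping mirrors the shape a parser gets when it wraps the already-parsed name each time it sees another `.`.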

Possibly related PRs

  • biomejs/biome PR 7869: Modifies HTML tag-name handling for component frameworks and touches parsing/formatting paths for component/member tag kinds.
  • biomejs/biome PR 8804: Changes element/tag-name helpers and element_ext accessors that are adjusted by this PR.
  • biomejs/biome PR 8824: Adds embedded value-reference tracking and useImportType template tests closely related to the new service wiring and tests.

Suggested reviewers

  • dyc3
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'refactor(html): component parsing' accurately describes the primary change, which is refactoring HTML parsing to support component-style tags. |
| Description check | ✅ Passed | The description clearly relates to the changeset by explaining the motivation (useImportType failures due to parser limitations), the solution (adding new nodes for component parsing), and references the linked issues. |
| Linked Issues check | ✅ Passed | The PR successfully implements the objective from issue #8880: the HTML parser now correctly recognises and produces nodes for component-style tags such as <Component.A />. Changes introduce HtmlComponentName and HtmlMemberName nodes, updated parsing logic, and extended document services to track components. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to component parsing support. Updates to formatters, analyse rules, and document services are all necessary to complete the component parsing feature. No unrelated refactoring or cleanup is present. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_html_parser/src/syntax/mod.rs (1)

220-230: Use the full component member name when matching closing tags.
Line 221 captures only the first segment before member parsing. For <Foo.Bar></Foo.Baz>, the mismatch can slip through because the later comparison only sees Foo. Consider deriving opening_tag_name from the parsed node so it includes the full member path.

🛠️ Suggested fix
-    let opening_tag_name = p.cur_text().to_string();
-    let should_be_self_closing = VOID_ELEMENTS
-        .iter()
-        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()))
-        && !is_possible_component(p, opening_tag_name.as_str());
-    let is_embedded_language_tag = EMBEDDED_LANGUAGE_ELEMENTS
-        .iter()
-        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()));
-
-    parse_any_tag_name(p).or_add_diagnostic(p, expected_element_name);
+    let tag_name = parse_any_tag_name(p).or_add_diagnostic(p, expected_element_name);
+    let opening_tag_name = tag_name
+        .ok()
+        .map(|name| name.text(p).to_string())
+        .unwrap_or_default();
+    let should_be_self_closing = VOID_ELEMENTS
+        .iter()
+        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()))
+        && !is_possible_component(p, opening_tag_name.as_str());
+    let is_embedded_language_tag = EMBEDDED_LANGUAGE_ELEMENTS
+        .iter()
+        .any(|tag| tag.eq_ignore_ascii_case(opening_tag_name.as_str()));
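The mismatch described in this comment can be reproduced outside the parser. The sketch below is a hypothetical, string-only model (the function names are not from the Biome codebase, and it assumes exact, case-sensitive comparison): matching on the first segment accepts `<Foo.Bar></Foo.Baz>`, while matching on the full member path rejects it.

```rust
/// Returns the text before the first `.`, which is all that is visible
/// if the name is captured before member parsing.
fn first_segment(name: &str) -> &str {
    name.split('.').next().unwrap_or(name)
}

/// Models the reported bug: only the first segment is compared.
fn matches_first_segment(opening: &str, closing: &str) -> bool {
    first_segment(opening) == first_segment(closing)
}

/// Models the suggested behaviour: the full member path is compared.
fn matches_full_path(opening: &str, closing: &str) -> bool {
    opening == closing
}
```

With these helpers, `matches_first_segment("Foo.Bar", "Foo.Baz")` is true even though the tags differ, whereas `matches_full_path` only accepts identical paths.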
🤖 Fix all issues with AI agents
In `@crates/biome_html_analyze/src/lint/nursery/no_sync_scripts.rs`:
- Around line 49-51: The comparison of the HTML tag name uses a case-sensitive
!= check on name_text (from token_text_trimmed()), which misses tags like
<SCRIPT>; change the condition to a case-insensitive check using
name_text.eq_ignore_ascii_case("script") (i.e., replace the current if name_text
!= "script" { return None; } with an inverted eq_ignore_ascii_case check) so the
rule treats tag names per HTML spec; locate the comparison near the name_text
binding in no_sync_scripts.rs and update it consistent with other rules like
use_vue_vapor.rs and no_script_url.rs.

In `@crates/biome_html_syntax/src/element_ext.rs`:
- Around line 230-252: The is_style_tag and is_script_tag helpers currently
treat AnyHtmlTagName case-insensitively and therefore match component names like
<Script> or <Style>; update both functions (is_style_tag and is_script_tag) to
first ensure the tag name is specifically an HtmlTagName (rather than
AnyHtmlTagName) before calling get_tag_name_text and doing the
eq_ignore_ascii_case check; use the opening_element().and_then(|el| el.name())
flow but pattern-match or inspect the returned name to confirm it's the
HtmlTagName variant (or return false otherwise) so only actual HTML script/style
tags trigger the embedded parsing logic.
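Both points above (case-insensitive matching for real HTML tags, and not treating components named `Script` or `Style` as HTML tags) can be sketched together. The enum below is a hypothetical stand-in for the `AnyHtmlTagName` union, holding plain strings instead of syntax nodes; only `str::eq_ignore_ascii_case` is a real standard-library call.

```rust
/// Hypothetical stand-in for the tag-name union; real nodes carry tokens, not strings.
enum AnyTagName {
    /// A plain HTML tag name, e.g. `script` or `SCRIPT`.
    Html(String),
    /// A component name, e.g. `Script` imported from a framework.
    Component(String),
}

/// Only plain HTML tags can be script tags; the comparison ignores ASCII case
/// so that `<SCRIPT>` is treated the same as `<script>`.
fn is_script_tag(name: &AnyTagName) -> bool {
    match name {
        AnyTagName::Html(text) => text.eq_ignore_ascii_case("script"),
        AnyTagName::Component(_) => false,
    }
}
```

Checking the variant first means the decision about what counts as a component stays in the parser, and helpers like this one never have to guess from the text alone.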
🧹 Nitpick comments (2)
crates/biome_html_parser/src/lexer/mod.rs (1)

679-682: Doc comments slightly duplicated.

Lines 679-682 contain duplicated doc comment content. The first line (679) appears to be a remnant.

🧹 Remove duplicate doc comment line
-    /// Consumes an HTML tag name token starting with the given byte.
-    /// Tag names can contain alphanumeric characters, hyphens, colons and dots.
     /// Consumes an HTML tag name token starting with the given byte.
     /// Tag names can contain alphanumeric characters, hyphens, and colons.
     /// In component contexts (Vue/Svelte/Astro), dots are excluded and lexed separately.
     fn consume_tag_name(&mut self, first: u8) -> HtmlSyntaxKind {
crates/biome_service/src/workspace/document/services/embedded_value_references.rs (1)

233-242: Swap println! for dbg! in tests.
Keeps debug output consistent with the project convention.

Proposed change
-        println!("Diagnostics: {:?}", parsed.diagnostics());
-        println!("Has errors: {}", !parsed.diagnostics().is_empty());
+        dbg!(parsed.diagnostics());
+        dbg!(!parsed.diagnostics().is_empty());
As per coding guidelines, prefer `dbg!` for debug output in Rust tests.
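For context on why the swap is low-risk: `dbg!` writes the expression, its value, and the file and line to stderr, then returns the value, so it can wrap an expression in place. A minimal sketch, where `has_errors` is a hypothetical helper and the diagnostics are modelled as plain strings:

```rust
/// Hypothetical helper standing in for the test's error check.
fn has_errors(diagnostics: &[String]) -> bool {
    // dbg! prints to stderr and hands the value back, so it can be chained.
    let diagnostics = dbg!(diagnostics);
    dbg!(!diagnostics.is_empty())
}
```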

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/biome_html_syntax/src/element_ext.rs`:
- Around line 15-37: Change get_tag_name_text to return a
SyntaxResult<TokenText> (not Option) and stop using .ok()? to convert Results
into Options; instead use ? to propagate parser errors and return
Ok(token.token_text_trimmed()) for HtmlTagName and HtmlComponentName. For
HtmlMemberName handle it explicitly by extracting the member's identifier token
via its proper accessor (e.g., name_token/value_token) and return its full token
text per the docstring (trimmed or full as required) as Ok(...). Then update
callers such as HtmlSelfClosingElement::is_void_element (and the other
occurrences mentioned) to accept the SyntaxResult from get_tag_name_text and
propagate the Result (or map to bool as needed) so the code compiles and member
names no longer return None.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/biome_html_syntax/src/element_ext.rs`:
- Around line 15-28: get_tag_name_text currently returns None for
AnyHtmlTagName::HtmlMemberName, which contradicts the helper's docstring and
prevents component.member names from being surfaced; update the HtmlMemberName
arm in get_tag_name_text to extract and return the full member expression text
(similar to how HtmlTagName::HtmlTagName and
HtmlComponentName::HtmlComponentName obtain token_text_trimmed), by calling the
appropriate accessor on the HtmlMemberName node to get its value token/text and
returning token_text_trimmed (or the equivalent full-member text method) instead
of None so Component.Member names are preserved.
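One way to read this request is that a member name should yield its full dotted text rather than nothing. The sketch below shows that idea on a simplified model; the `TagName` enum and `tag_name_text` are hypothetical and work on strings, whereas the real helper operates on syntax tokens and may need to return a `SyntaxResult`.

```rust
/// Simplified model of the three tag-name kinds (not Biome's real node types).
#[derive(Debug)]
enum TagName {
    Tag(String),
    Component(String),
    Member { object: Box<TagName>, member: String },
}

/// Returns the trimmed text for every kind, joining member segments with `.`
/// so that `Component.Member` is surfaced in full instead of being dropped.
fn tag_name_text(name: &TagName) -> String {
    match name {
        TagName::Tag(text) | TagName::Component(text) => text.trim().to_string(),
        TagName::Member { object, member } => {
            format!("{}.{}", tag_name_text(object), member.trim())
        }
    }
}
```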

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_html_parser/src/syntax/mod.rs (1)

215-226: Opening tag name is captured too early for member tags.

Line 216 reads p.cur_text() before parse_any_tag_name; when . is tokenised (Vue/component contexts), that only captures the first segment, so the later contains check can accept </Component.B> for <Component.A>. Please derive the name from the parsed AnyHtmlTagName node (token text) and compare the full text; note that get_tag_name_text returns None for HtmlMemberName, so you’ll need the node’s token text. Based on learnings, HtmlMemberName does not provide a unified tag-name string.

🤖 Fix all issues with AI agents
In `@crates/biome_html_parser/src/syntax/mod.rs`:
- Around line 152-207: The member-name loop in parse_any_tag_name assumes the
initial tag tokens were lexed with component_name_context, but bump_with_context
sets the next token's lexing context and closing tags/non-Vue files may have the
`.` still inside HTML_LITERAL; fix parse_any_tag_name to explicitly re-lex the
tag name under component_name_context when is_possible_component returns true
(use component_name_context(p) to re-tokenise the base name and any member parts
so `.` becomes T![.] tokens), ensure the re-lex happens for both opening and
closing tags, and immediately reset the lexing context back to
HtmlLexContext::InsideTag (or the original inside_tag_context) before parsing
attributes so subsequent attribute tokens are not lexed as InsideTagVue
directives; reference p.bump_with_context, component_name_context,
HtmlLexContext::InsideTagVue, parse_any_tag_name and ensure the member loop uses
tokens produced by the re-lexed context.

Comment on lines +152 to +207
/// Returns the lexer context to use when parsing component names and member expressions.
/// This allows `.` to be lexed as a token for member expressions like Component.Member
/// We reuse InsideTagVue context because it supports `.` lexing, but this is ONLY used
/// for parsing component names, not for parsing attributes.
#[inline(always)]
fn component_name_context(p: &HtmlParser) -> HtmlLexContext {
    if Vue.is_supported(p) || Astro.is_supported(p) || SingleTextExpressions.is_supported(p) {
        // Use InsideTagVue for all component-supporting files when parsing component names
        // This allows `.` to be lexed properly for member expressions
        // Note: This is safe because we only use this context for tag names, not attributes
        HtmlLexContext::InsideTagVue
    } else {
        HtmlLexContext::InsideTag
    }
}

/// Parse a tag name, which returns AnyHtmlTagName (one of: HtmlTagName, HtmlComponentName, or HtmlMemberName)
/// This follows the JSX parser pattern for handling member expressions like Component.Member
fn parse_any_tag_name(p: &mut HtmlParser) -> ParsedSyntax {
    if !is_at_start_literal(p) {
        return Absent;
    }

    let tag_text = p.cur_text();

    // Step 1: Parse base name (either component or regular tag)
    let name = if is_possible_component(p, tag_text) {
        // Parse as component name - use component_name_context to allow `.` for member expressions
        let m = p.start();
        p.bump_with_context(HTML_LITERAL, component_name_context(p));
        Present(m.complete(p, HTML_COMPONENT_NAME))
    } else {
        // Parse as regular HTML tag
        parse_literal(p, HTML_TAG_NAME)
    };

    // Step 2: Extend with member access if present (using .map() pattern from JSX parser)
    name.map(|mut name| {
        while p.at(T![.]) {
            let m = name.precede(p); // Create marker BEFORE already-parsed name
            p.bump_with_context(T![.], component_name_context(p)); // Use component context for `.`

            // Parse member name - must use component_name_context to maintain `.` lexing
            if is_at_start_literal(p) {
                let member_m = p.start();
                p.bump_with_context(HTML_LITERAL, component_name_context(p));
                member_m.complete(p, HTML_TAG_NAME);
            } else {
                p.error(expected_element_name(p, p.cur_range()));
            }

            name = m.complete(p, HTML_MEMBER_NAME); // Wrap previous name
        }
        name
    })
}

⚠️ Potential issue | 🟠 Major

Member component tags can be mis-tokenised outside Vue / in closing tags.

Line 180 onward assumes . is tokenised, but the tag-name token is still lexed under inside_tag_context / HtmlLexContext::InsideTag, so in Astro/SingleText—and in all closing tags—the . can stay inside HTML_LITERAL and the member loop never runs. Also, bump_with_context(..., component_name_context) sets the next token’s lexing context, so attributes after component tags may be tokenised as Vue directives in non‑Vue files. Consider re‑lexing the tag name with component_name_context and resetting to inside_tag_context before parsing attributes. Based on learnings, the context parameter affects the next token.

Also applies to: 307-307
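The timing problem raised here (the context passed to a bump governs the next token, so the first tag-name token has already been lexed under the previous context) can be illustrated with a toy lexer. Everything below is a hypothetical model, not Biome's lexer: `Ctx`, `Tok`, `Lexer`, and `lex_all` are invented names, and the only rule modelled is that `.` is part of a literal in `InsideTag` but a separate token in `ComponentName`.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Ctx {
    /// `.` is allowed inside a tag-name literal.
    InsideTag,
    /// `.` is lexed as its own token so member names can be parsed.
    ComponentName,
}

#[derive(Debug, PartialEq)]
enum Tok {
    Literal(String),
    Dot,
}

struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    /// Lex one token under the given context.
    fn next_token(&mut self, ctx: Ctx) -> Option<Tok> {
        let src: &'a str = self.src;
        let rest = &src[self.pos..];
        let first = rest.chars().next()?;
        if first == '.' && ctx == Ctx::ComponentName {
            self.pos += 1;
            return Some(Tok::Dot);
        }
        // A literal runs until the first character that the context does not allow.
        let len = rest
            .char_indices()
            .find(|&(_, c)| {
                let allowed = c.is_ascii_alphanumeric()
                    || c == '-'
                    || c == ':'
                    || (c == '.' && ctx == Ctx::InsideTag);
                !allowed
            })
            .map_or(rest.len(), |(i, _)| i);
        if len == 0 {
            return None;
        }
        self.pos += len;
        Some(Tok::Literal(rest[..len].to_string()))
    }
}

/// Lex the whole input, using `first_ctx` for the first token and `rest_ctx`
/// afterwards. This models a context that was chosen by an earlier bump.
fn lex_all(src: &str, first_ctx: Ctx, rest_ctx: Ctx) -> Vec<Tok> {
    let mut lexer = Lexer { src, pos: 0 };
    let mut out = Vec::new();
    let mut ctx = first_ctx;
    while let Some(tok) = lexer.next_token(ctx) {
        out.push(tok);
        ctx = rest_ctx;
    }
    out
}
```

In this model, lexing `Foo.Bar` with `InsideTag` as the first context yields a single literal containing the dot, so a member loop waiting for a `Dot` token would never run; lexing it with `ComponentName` from the start yields three tokens.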

@ematipico ematipico requested a review from dyc3 January 27, 2026 16:51
@ematipico ematipico merged commit fda74af into next Jan 28, 2026
19 checks passed
@ematipico ematipico deleted the feat/comoponent-parsing branch January 28, 2026 05:55

2 participants