
refactor(core): embed detectors#9416

Merged
ematipico merged 4 commits intomainfrom
refactor/embeds-as-detectors
Mar 9, 2026

@ematipico
Member

@ematipico ematipico commented Mar 9, 2026

Summary

Closes #9131
Closes #9112
Closes #9166

This PR refines the detection of embedded snippets to be language-agnostic and const-friendly.

The new code is fully decoupled from DocumentFileService and introduces a few new concepts:

  • HostLanguage is the language that can contain embeds; GuestLanguage is the language hosted as a snippet.
  • EmbedDetector allows the creation of "detectors". A detector describes where, inside a host language, possible embeds can be found. Detectors are generic constructs with no logic of their own, other than a function called try_match that runs the matching logic against a candidate. Each detector exposes a target, which can be either static or dynamic. Dynamic targets exist for cases such as <script lang="ts">, where the guest language is only known once the candidate is inspected.
  • EmbedDetectorsRegistry is essentially the repository of detectors; its detect_match function determines whether an EmbedCandidate is a hit.
  • EmbedCandidate is a type that says "here we could have an embedded snippet". It contains the information needed to run a match and, eventually, to create the embedded snippet. The candidate is the dynamic counterpart of the detector: it has to be created while inspecting the AST of a host language.
  • Finally, detect_match returns an EmbedMatch containing the GuestLanguage, which tells the workspace how to parse the snippet.
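To make the relationships between these concepts concrete, here is a minimal, self-contained sketch. It borrows the type and function names from the description (HostLanguage, GuestLanguage, EmbedCandidate, EmbedDetector, EmbedTarget, EmbedMatch, EmbedDetectorsRegistry, try_match, detect_match), but every field, enum variant, detector entry and the `resolve_script_lang` helper are assumptions made for illustration. It is not Biome's actual implementation, which works on AST nodes rather than plain strings.

```rust
// Hypothetical, simplified model of the embed-detector architecture; not Biome's real types.

/// Languages that can contain embeds (illustrative variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostLanguage {
    Html,
    JavaScript,
}

/// Languages that can be hosted as snippets (illustrative variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestLanguage {
    Js,
    Ts,
    Css,
    GraphQl,
}

/// A possible embed found while walking the host AST.
pub struct EmbedCandidate<'a> {
    pub host: HostLanguage,
    /// Element name (`script`, `style`) or template tag (`css`, `gql`).
    pub name: &'a str,
    /// Value of a `lang` attribute, if the host has one.
    pub lang: Option<&'a str>,
    /// Raw snippet text; not needed for matching in this sketch.
    #[allow(dead_code)]
    pub content: &'a str,
}

/// The result of a successful match: how the workspace should parse the snippet.
#[derive(Debug, PartialEq, Eq)]
pub struct EmbedMatch {
    pub guest: GuestLanguage,
}

/// Static targets always yield the same guest; dynamic ones inspect the candidate.
pub enum EmbedTarget {
    Static(GuestLanguage),
    Dynamic(fn(&EmbedCandidate<'_>) -> Option<GuestLanguage>),
}

/// Plain data describing where an embed can appear; its only logic is `try_match`.
pub struct EmbedDetector {
    pub name: &'static str,
    pub target: EmbedTarget,
}

impl EmbedDetector {
    pub fn try_match(&self, candidate: &EmbedCandidate<'_>) -> Option<EmbedMatch> {
        if candidate.name != self.name {
            return None;
        }
        let guest = match self.target {
            EmbedTarget::Static(guest) => guest,
            // Dynamic resolution may reject the candidate (e.g. an unknown `lang`).
            EmbedTarget::Dynamic(resolve) => resolve(candidate)?,
        };
        Some(EmbedMatch { guest })
    }
}

/// `<script>` defaults to JS; `lang="ts"` switches the guest to TypeScript.
fn resolve_script_lang(candidate: &EmbedCandidate<'_>) -> Option<GuestLanguage> {
    match candidate.lang {
        None | Some("js") => Some(GuestLanguage::Js),
        Some("ts") => Some(GuestLanguage::Ts),
        Some(_) => None,
    }
}

pub struct EmbedDetectorsRegistry;

impl EmbedDetectorsRegistry {
    // Detectors are `const` data, so the registry needs no runtime setup.
    const HTML: &'static [EmbedDetector] = &[
        EmbedDetector { name: "script", target: EmbedTarget::Dynamic(resolve_script_lang) },
        EmbedDetector { name: "style", target: EmbedTarget::Static(GuestLanguage::Css) },
    ];
    const JS: &'static [EmbedDetector] = &[
        EmbedDetector { name: "css", target: EmbedTarget::Static(GuestLanguage::Css) },
        EmbedDetector { name: "gql", target: EmbedTarget::Static(GuestLanguage::GraphQl) },
    ];

    /// Picks the detectors for the candidate's host and returns the first hit.
    pub fn detect_match(candidate: &EmbedCandidate<'_>) -> Option<EmbedMatch> {
        let detectors = match candidate.host {
            HostLanguage::Html => Self::HTML,
            HostLanguage::JavaScript => Self::JS,
        };
        detectors.iter().find_map(|detector| detector.try_match(candidate))
    }
}

fn main() {
    let script = EmbedCandidate {
        host: HostLanguage::Html,
        name: "script",
        lang: Some("ts"),
        content: "const answer: number = 42;",
    };
    // prints: Some(EmbedMatch { guest: Ts })
    println!("{:?}", EmbedDetectorsRegistry::detect_match(&script));
}
```

The point the sketch tries to capture is the split between static data (detectors held in `const` slices per host) and dynamic data (candidates built while walking the AST), with all matching funnelled through a single `detect_match` entry point.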

I used AI to write the trivial code and docstring, but I wrote the whole architecture. Let me know if anything isn't clear.

Test Plan

Current tests should pass

Docs

N/A

@changeset-bot

changeset-bot bot commented Mar 9, 2026

🦋 Changeset detected

Latest commit: ddbe360

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages
Name Type
@biomejs/biome Patch
@biomejs/cli-win32-x64 Patch
@biomejs/cli-win32-arm64 Patch
@biomejs/cli-darwin-x64 Patch
@biomejs/cli-darwin-arm64 Patch
@biomejs/cli-linux-x64 Patch
@biomejs/cli-linux-arm64 Patch
@biomejs/cli-linux-x64-musl Patch
@biomejs/cli-linux-arm64-musl Patch
@biomejs/wasm-web Patch
@biomejs/wasm-bundler Patch
@biomejs/wasm-nodejs Patch
@biomejs/backend-jsonrpc Patch


@ematipico ematipico changed the title refactor: embed detectors refactor(core): embed detectors Mar 9, 2026
@ematipico ematipico requested review from a team March 9, 2026 14:53
@github-actions github-actions bot added the A-Project Area: project label Mar 9, 2026
@Netail
Member

Netail commented Mar 9, 2026

Would this resolve #9131, #9112 & #9166?

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

Walkthrough

Adds a registry-driven embed detection system and types: HostLanguage, GuestLanguage, EmbedCandidate/EmbedContent, TemplateTagKind; introduces a const-constructible EmbedDetector/EmbedTarget with pattern matching and dynamic resolution; provides EmbedDetectorsRegistry exposing host-specific detector arrays and a detect_match entrypoint; refactors HTML and JavaScript file handlers to build EmbedCandidates, query the registry, and dispatch parsing for matched embeds. No public API changes.

Possibly related PRs

Suggested labels

A-Tooling, A-Parser, A-Formatter, L-JavaScript, L-HTML

Suggested reviewers

  • dyc3
  • arendjr
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title 'refactor(core): embed detectors' directly summarises the main architectural change: introducing a new detector-based system for identifying embedded code snippets.
Description check: ✅ Passed. The description is well-related to the changeset, explaining the new HostLanguage/GuestLanguage concepts, EmbedDetector, EmbedDetectorsRegistry, and EmbedCandidate, all of which are present in the PR changes.


@ematipico
Member Author

ematipico commented Mar 9, 2026

Would this resolve #9131, #9112 & #9166?

I don't know, I haven't looked at those issues. I did think about adding the "indentation strategy" though, but I left it out for the time being.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
crates/biome_service/src/file_handlers/html.rs (1)

546-564: Clarification: First-JS-capture strategy

The code captures embedded_file_source from the first successful JS parse (Lines 559-561). If a Vue file has multiple <script> tags with different lang attributes (e.g., one plain JS and one TS), only the first source type is used for subsequent text expressions. This appears intentional for consistency, but it is worth documenting if that isn't already done elsewhere.
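The difference between the current first-capture behaviour and the alternative suggested below can be sketched as follows. `ScriptSource` is a hypothetical stand-in for `JsFileSource`, and ranking TS above JS through the enum's declaration order is an assumption about what "most specific" means; the PR does not define such an ordering.

```rust
/// Hypothetical stand-in for `JsFileSource`: variants are declared from least
/// to most specific, so the derived `Ord` ranks `Ts` above `Js`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ScriptSource {
    Js,
    Ts,
}

/// Behaviour described in the review: the first successful parse wins,
/// falling back to plain JS (mirroring the `JsFileSource::js_module()` default).
fn first_capture(parsed: &[ScriptSource]) -> ScriptSource {
    parsed.first().copied().unwrap_or(ScriptSource::Js)
}

/// Suggested alternative: keep the most specific source seen across all scripts.
fn most_specific(parsed: &[ScriptSource]) -> ScriptSource {
    parsed.iter().copied().max().unwrap_or(ScriptSource::Js)
}

fn main() {
    // A Vue file with a plain `<script>` followed by `<script lang="ts">`.
    let parsed = [ScriptSource::Js, ScriptSource::Ts];
    // prints: Js vs Ts
    println!("{:?} vs {:?}", first_capture(&parsed), most_specific(&parsed));
}
```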

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_service/src/file_handlers/html.rs` around lines 546 - 564, The
code currently sets embedded_file_source from the first successful JS parse (see
JsFileSource::js_module() initialization and the assignment inside the loop
where parse_matched_embed returns parsed.js_file_source), which means in
multi-<script> Vue files subsequent scripts with different lang (e.g., TS) are
ignored; either document this first-js-capture strategy clearly above the loop
(mention build_html_candidate, EmbedDetectorsRegistry::detect_match,
parse_matched_embed, elements, nodes, and ctx) so future readers know this is
intentional, or change the logic to choose the most specific/accurate source
(e.g., prefer TS over JS) by comparing parsed.js_file_source values instead of
taking the first—update comments/tests accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 726e3e85-62ca-4073-99d9-e920bb1e521f

📥 Commits

Reviewing files that changed from the base of the PR and between d212a35 and bf2001f.

📒 Files selected for processing (2)
  • crates/biome_service/src/file_handlers/html.rs
  • crates/biome_service/src/file_handlers/javascript.rs

@Netail
Member

Netail commented Mar 9, 2026

Would this resolve #9131, #9112 & #9166?

I don't know, I haven't looked at those issues. I did think about adding the "indentation strategy" though, but I left it out for the time being.

Basically when the source also contains a non-embedded template string, shit hits the fan.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_service/src/file_handlers/html.rs`:
- Around line 850-868: The loop only handles AnySvelteDirective which excludes
Svelte "on:" event directives, so event handlers like on:click are never parsed;
update the branch in the Pass 4 loop to also detect and handle Svelte event
directives by attempting a cast for the event-specific node (e.g.,
AnySvelteEventDirective or AnySvelteOnDirective depending on your syntax types)
in addition to AnySvelteDirective, then feed that directive's initializer
through build_svelte_directive_candidate, EmbedDetectorsRegistry::detect_match,
and parse_matched_embed exactly as existing code does (push parsed.node into
nodes) so "on:" handlers are parsed/linted/formatted like other directives.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab92c01c-aa97-49ba-986f-f4dec067e920

📥 Commits

Reviewing files that changed from the base of the PR and between bf2001f and 1c31136.

📒 Files selected for processing (1)
  • crates/biome_service/src/file_handlers/html.rs

@ematipico
Member Author

Would this resolve #9131, #9112 & #9166?

I don't know, I haven't looked at those issues. I did think about adding the "indentation strategy" though, but I left it out for the time being.

Basically when the source also contains a non-embedded template string, shit hits the fan.

Oh, ok. No, the issue belongs to the formatter and overlaps with the embedded snippet range. There's some issue there. I can fix it.

@github-actions github-actions bot added A-Formatter Area: formatter L-JavaScript Language: JavaScript and super languages labels Mar 9, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
crates/biome_js_formatter/src/js/auxiliary/template_chunk_element.rs (1)

51-58: Consider centralizing embed detector names to prevent silent formatting failures.

The hardcoded constants KNOWN_EMBED_TAGS and KNOWN_EMBED_OBJECTS duplicate the detector definitions from JS_DETECTORS, creating a cross-crate maintenance burden. If someone adds a detector without updating these constants, embeds will silently fail to format.

Given that EmbedDetector types and JS_DETECTORS are not publicly exported from biome_service, consider either:

  • Adding cross-references in JS_DETECTORS comments pointing to where these constants are duplicated
  • Exporting detector definitions to allow biome_js_formatter to derive these constants

At minimum, a test validating alignment would catch future drift.
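An alignment test of the kind suggested could look like the sketch below. Both lists are hypothetical placeholders: the real names live in `JS_DETECTORS` (biome_service) and `KNOWN_EMBED_TAGS` (biome_js_formatter), and the tag values shown here are illustrative rather than taken from the PR. The lists are compared as sets, so a reordering does not fail the check, but a name added to only one side does.

```rust
use std::collections::BTreeSet;

/// Hypothetical list of tag names declared by the detectors in `JS_DETECTORS`.
const DETECTOR_TAGS: &[&str] = &["css", "graphql", "gql"];

/// Hypothetical copy kept by the formatter (`KNOWN_EMBED_TAGS`).
const KNOWN_EMBED_TAGS: &[&str] = &["gql", "css", "graphql"];

/// Returns the names present in one list but not the other, in sorted order.
fn drift(a: &[&str], b: &[&str]) -> Vec<String> {
    let a: BTreeSet<&str> = a.iter().copied().collect();
    let b: BTreeSet<&str> = b.iter().copied().collect();
    a.symmetric_difference(&b).map(|name| name.to_string()).collect()
}

fn main() {
    // In the real crates this would be a `#[test]`; the same check would apply
    // to `KNOWN_EMBED_OBJECTS`.
    let drifted = drift(DETECTOR_TAGS, KNOWN_EMBED_TAGS);
    assert!(drifted.is_empty(), "KNOWN_EMBED_TAGS drifted from JS_DETECTORS: {drifted:?}");
    println!("tags aligned");
}
```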

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_js_formatter/src/js/auxiliary/template_chunk_element.rs` around
lines 51 - 58, The duplicated hardcoded arrays KNOWN_EMBED_TAGS and
KNOWN_EMBED_OBJECTS in template_chunk_element.rs can drift from the source
JS_DETECTORS in biome_service; update the crate to remove duplication by either
(1) importing/exporting the detector definitions from biome_service (expose
JS_DETECTORS or a small public list of tag/object names) and derive
KNOWN_EMBED_TAGS/KNOWN_EMBED_OBJECTS from that, or (2) if exporting is not
possible, add a unit/integration test that iterates JS_DETECTORS (or the
internal EmbedDetector definitions in biome_service) and asserts exact equality
with these constants (and add a clear comment linking JS_DETECTORS to these
constants) so future changes fail the test; reference the symbols
KNOWN_EMBED_TAGS, KNOWN_EMBED_OBJECTS, JS_DETECTORS, and EmbedDetector when
implementing the fix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dda92488-4002-4e84-a000-93cced4d4beb

📥 Commits

Reviewing files that changed from the base of the PR and between 1c31136 and ddbe360.

⛔ Files ignored due to path filters (1)
  • crates/biome_service/src/workspace/snapshots/biome_service__workspace__server__tests__issue_9131.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (3)
  • .changeset/fix-embedded-template-crash.md
  • crates/biome_js_formatter/src/js/auxiliary/template_chunk_element.rs
  • crates/biome_service/src/workspace/server.tests.rs
✅ Files skipped from review due to trivial changes (1)
  • .changeset/fix-embedded-template-crash.md

@codspeed-hq

codspeed-hq bot commented Mar 9, 2026

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 156 skipped benchmarks[1]


Comparing refactor/embeds-as-detectors (ddbe360) with main (69e1973)[2]


Footnotes

  1. 156 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

  2. No successful run was found on main (71c7df6) during the generation of this report, so 69e1973 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@ematipico
Member Author

Done, it's fixed @Netail :)

@Netail
Member

Netail commented Mar 9, 2026

Done, it's fixed @Netail :)

Legend 🙌

Contributor

@dyc3 dyc3 left a comment


This approach makes a bit more sense. LGTM

@ematipico ematipico merged commit f2581b8 into main Mar 9, 2026
18 checks passed
@ematipico ematipico deleted the refactor/embeds-as-detectors branch March 9, 2026 18:08
@github-actions github-actions bot mentioned this pull request Mar 9, 2026

Labels

A-Formatter Area: formatter A-Project Area: project L-JavaScript Language: JavaScript and super languages

Projects

None yet

3 participants