feat: svelte html super language support by gagansuie · Pull Request #6724 · biomejs/biome

gagansuie · 2025-07-05T19:40:17Z

Summary

This pull request introduces improvements to the SvelteFileHandler in the crates/biome_service module, focusing on better handling of embedded JavaScript content within Svelte files by leveraging HTML parsing instead of relying solely on regex. It also maintains backward compatibility for existing functionality.

Enhancements to Svelte file handling:

HTML Parsing Integration:
- Added support for parsing Svelte files as HTML, enabling more robust extraction of embedded JavaScript content. This is achieved using the HtmlFileSource and HtmlRoot types, along with the parse_html_with_cache function. ([[1]](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L5-R16), [[2]](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))
- Introduced the extract_scripts method to extract all embedded JavaScript content using HTML parsing, improving handling of HTML structure and indentation. ([crates/biome_service/src/file_handlers/svelte.rsL24-R55](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))
Improved Script Parsing:
- Updated the parse method to prioritize HTML-based script extraction. If no scripts are found, the method falls back to the previous regex-based approach for backward compatibility. ([crates/biome_service/src/file_handlers/svelte.rsL119-R163](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L119-R163))

Backward Compatibility:

Regex Updates:
- Modified the regex used for extracting <script> tags to improve handling of script content while retaining compatibility with existing code. ([crates/biome_service/src/file_handlers/svelte.rsL24-R55](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))

Formatting Updates:

Preserving HTML Context:
- Enhanced the format method to use the JavaScript formatter while preserving the HTML context in Svelte files. ([crates/biome_service/src/file_handlers/svelte.rsR174](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6R174))

Test Plan

Html formatter

changeset-bot · 2025-07-05T19:40:21Z

⚠️ No Changeset found

Latest commit: b951c2f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

dyc3

IMO, this isn't a meaningful improvement from the current implementation.

If I'm reading this right, it looks like you're only invoking the html parser to grab the script tag contents, and then do everything the same as we did before?

dyc3 · 2025-07-05T21:44:13Z

crates/biome_service/src/file_handlers/svelte.rs

 pub struct SvelteFileHandler;

-// https://regex101.com/r/E4n4hh/6
+// Kept for backward compatibility with existing code


Losing this comment is a regression

dyc3 · 2025-07-05T21:47:24Z

crates/biome_service/src/file_handlers/svelte.rs

+    // If we found scripts, use the first one; otherwise fall back to regex approach
+    if let Some(first_script) = scripts.first() {


This is wrong, we want to process all the script blocks.

ah! thats fascinating. i've never used a script module before. 🤔

gagansuie · 2025-07-06T06:40:22Z

IMO, this isn't a meaningful improvement from the current implementation.

If I'm reading this right, it looks like you're only invoking the html parser to grab the script tag contents, and then do everything the same as we did before?

ill make a better attempt tomorrow. Thanks for the feedback!

ematipico · 2025-07-06T08:01:05Z

crates/biome_service/src/file_handlers/svelte.rs

+    /// Extracts all embedded JavaScript content from a Svelte file using HTML parsing.
+    /// This provides better handling of HTML structure and script indentation.
+    pub fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {
+        // Parse the Svelte file as HTML
+        let html_file_source = HtmlFileSource::default();
+        let parse = parse_html_with_cache(text, html_file_source, cache);
+        let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");
+
+        // Extract embedded scripts using the HTML handler
+        html::extract_embedded_scripts(&html_root, cache)
+    }


This provides less functionality than the regex solution. The regex solution is able to understand if the internal code must be parsed as TypeScript or JavaScript. This solution doesn't do that.

coderabbitai · 2025-09-08T03:31:54Z

Walkthrough

Adds an HTML-driven extraction path for embedded JavaScript in Svelte files alongside a new public API SvelteFileHandler::extract_scripts(text, cache) -> Vec. The parse() method now prefers HTML-parsed scripts (using parse_html_with_cache, HtmlFileSource, HtmlRoot, EmbeddedJsContent) and falls back to the legacy regex path if none are found. The SVELTE_FENCE regex is updated to non-greedy matching. input() maintains backward compatibility with the regex path. Additional imports, NodeCache usage, AstNode, and debug logs are introduced. format() remains JavaScript-focused with a note on HTML context.

Possibly related PRs

feat(htmlish): implement indentScriptAndStyle option #7333: Also updates Svelte file handling with HTML-aware logic (HtmlLanguage/indentScriptAndStyle), overlapping with the new HTML parsing pathway.

Suggested labels

A-Formatter, L-HTML

Suggested reviewers

ematipico

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

crates/biome_service/src/file_handlers/svelte.rs (2)
26-26: Restore the lost explanatory comment.

Echoing prior feedback: losing the original comment here was a regression; please reinstate the intent/context it provided.

144-151: Propagate the correct JS file source (TS/JS detection) instead of the document’s file source.

Using language: Some(file_source) drops the ability to detect lang="ts" etc., which the regex path handled. Use the script’s own file source.
-        ParseResult {
-            any_parse: first_script.parse.clone(),
-            language: Some(file_source),
-        }
+        ParseResult {
+            any_parse: first_script.parse.clone(),
+            // Ensure this mirrors what the regex path does (EmbeddingKind::Svelte etc.)
+            language: Some(first_script.file_source.into()),
+        }
If EmbeddedJsContent exposes the field under a different name, adapt accordingly.
#!/bin/bash
# Verify `EmbeddedJsContent` fields
rg -nP 'struct\s+EmbeddedJsContent|impl\s+EmbeddedJsContent' -C3

🧹 Nitpick comments (4)

crates/biome_service/src/file_handlers/svelte.rs (4)
32-38: Docstring contradicts implementation.

The method currently uses the regex path for compatibility, not HTML parsing.
-    /// Extracts the JavaScript/TypeScript code contained in the script block of a Svelte file
-    /// using HTML parsing instead of regex for better handling of HTML structure.
+    /// Extracts the JavaScript/TypeScript code contained in the first `<script>` block of a Svelte file.
+    /// Backwards-compatible: currently uses the regex-based path.
28-28: Regex tweak looks reasonable; add coverage for edge cases.

Non-greedy content is good. Please add tests for multiple scripts, Windows newlines, lang="ts", and attributes with varying whitespace.

I can draft Svelte fixtures exercising these cases if you’d like.

174-176: Comment overpromises.

We’re still formatting the extracted JS; we’re not preserving HTML context in formatting output here. Rephrase to avoid confusion.
-    // Use JavaScript formatter but preserve HTML context
+    // Format the extracted JavaScript; HTML is not formatted here
46-54: Make extract_scripts crate-private.
Usage search shows it’s only called within svelte.rs, so narrow its visibility:
@@ crates/biome_service/src/file_handlers/svelte.rs:46
-    pub fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {
+    pub(crate) fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cb0b38 and b951c2f.

📒 Files selected for processing (1)

crates/biome_service/src/file_handlers/svelte.rs (4 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

crates/biome_*/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place core crates under /crates/biome_*/

Files:

crates/biome_service/src/file_handlers/svelte.rs

**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Format all Rust source files before committing (just f)

Files:

crates/biome_service/src/file_handlers/svelte.rs

🧠 Learnings (5)

📚 Learning: 2025-09-05T09:13:58.901Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-09-05T09:13:58.901Z
Learning: Applies to crates/biome_analyze/crates/biome_js_analyze/**/src/**/*.rs : Avoid deep indentation and unwrap/expect in CST navigation; prefer ?, map, filter, and_then

Applied to files:

crates/biome_service/src/file_handlers/svelte.rs

📚 Learning: 2025-08-11T11:48:27.774Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:27.774Z
Learning: Applies to crates/biome_formatter/biome_html_formatter/src/lib.rs : Expose a documented public function format_node(options: HtmlFormatOptions, root: &HtmlSyntaxNode) -> FormatResult<Formatted<HtmlFormatContext>> delegating to biome_formatter::format_node

Applied to files:

crates/biome_service/src/file_handlers/svelte.rs

📚 Learning: 2025-08-11T11:48:52.001Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Build an intermediate IR via the provided helper APIs when formatting

Applied to files:

crates/biome_service/src/file_handlers/svelte.rs

📚 Learning: 2025-08-11T11:48:52.001Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Import the FormatNode trait and implement it for your Node

Applied to files:

crates/biome_service/src/file_handlers/svelte.rs

📚 Learning: 2025-08-11T11:48:52.001Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Use the generic Format trait and FormatNode for AST nodes when implementing the formatter

Applied to files:

crates/biome_service/src/file_handlers/svelte.rs

🧬 Code graph analysis (1)

crates/biome_service/src/file_handlers/svelte.rs (3)

crates/biome_html_parser/src/lib.rs (1)

parse_html_with_cache (18-34)

crates/biome_js_parser/src/parse.rs (1)

parse_js_with_cache (272-283)

crates/biome_service/src/file_handlers/html.rs (2)

parse (258-284)

extract_embedded_scripts (298-312)

🔇 Additional comments (1)

crates/biome_service/src/file_handlers/svelte.rs (1)

140-165: Confirm end-to-end parity between HTML and fallback parsing

Both branches use the same parse_lang_from_script_opening_tag logic for TS/JS detection.

HTML-extracted scripts preserve original byte offsets; verify diagnostics from the regex fallback align exactly.

Benchmark against large Svelte files (e.g. >100 KB) to ensure no noticeable performance regression.

coderabbitai · 2025-09-08T03:38:49Z

crates/biome_service/src/file_handlers/svelte.rs

+use biome_html_parser::parse_html_with_cache;
+use biome_html_syntax::{HtmlFileSource, HtmlRoot};
 use biome_js_parser::{JsParserOptions, parse_js_with_cache};


⚠️ Potential issue

Wrong imports for HTML parsing; fix types and remove unused symbol.

parse_html_with_cache expects HtmlParseOptions, not HtmlFileSource. Also, HtmlFileSource becomes unused after the fix.

Apply:

-use biome_html_parser::parse_html_with_cache; -use biome_html_syntax::{HtmlFileSource, HtmlRoot}; +use biome_html_parser::{parse_html_with_cache, HtmlParseOptions}; +use biome_html_syntax::HtmlRoot;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

use biome_html_parser::parse_html_with_cache;

use biome_html_syntax::{HtmlFileSource, HtmlRoot};

use biome_js_parser::{JsParserOptions, parse_js_with_cache};

// crates/biome_service/src/file_handlers/svelte.rs

use biome_html_parser::{parse_html_with_cache, HtmlParseOptions};

use biome_html_syntax::HtmlRoot;

use biome_js_parser::{JsParserOptions, parse_js_with_cache};

🤖 Prompt for AI Agents

In crates/biome_service/src/file_handlers/svelte.rs around lines 11 to 13, the import list is wrong: parse_html_with_cache expects HtmlParseOptions (not HtmlFileSource) and HtmlFileSource becomes unused. Replace the HtmlFileSource import with HtmlParseOptions and remove the unused HtmlFileSource symbol from the file; keep HtmlRoot and other imports as-is so parse_html_with_cache is called with the correct type.

coderabbitai · 2025-09-08T03:38:49Z

crates/biome_service/src/file_handlers/svelte.rs

+        let html_file_source = HtmlFileSource::default();
+        let parse = parse_html_with_cache(text, html_file_source, cache);
+        let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");
+
+        // Extract embedded scripts using the HTML handler
+        html::extract_embedded_scripts(&html_root, cache)
+    }
+


⚠️ Potential issue

Fix parse_html_with_cache call and avoid panicking on cast.

Arguments are in the wrong order and of the wrong types. The expect will panic on malformed HTML; prefer a graceful fallback.

- let html_file_source = HtmlFileSource::default(); - let parse = parse_html_with_cache(text, html_file_source, cache); - let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot"); - - // Extract embedded scripts using the HTML handler - html::extract_embedded_scripts(&html_root, cache) + let parse = parse_html_with_cache(text, cache, HtmlParseOptions::default()); + let Some(html_root) = HtmlRoot::cast(parse.syntax()) else { + debug!("HTML parse produced no root; falling back to regex"); + return Vec::new(); + }; + html::extract_embedded_scripts(&html_root, cache)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let html_file_source = HtmlFileSource::default();

let parse = parse_html_with_cache(text, html_file_source, cache);

let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");

// Extract embedded scripts using the HTML handler

html::extract_embedded_scripts(&html_root, cache)

}

let parse = parse_html_with_cache(text, cache, HtmlParseOptions::default());

let Some(html_root) = HtmlRoot::cast(parse.syntax()) else {

debug!("HTML parse produced no root; falling back to regex");

return Vec::new();

};

html::extract_embedded_scripts(&html_root, cache)

🤖 Prompt for AI Agents

In crates/biome_service/src/file_handlers/svelte.rs around lines 48–55, the call to parse_html_with_cache passes arguments in the wrong order/types and then uses expect which panics on malformed HTML; change the parse call to use the correct parameter order and types (pass the cache and the HtmlFileSource in the expected positions — e.g., parse_html_with_cache(cache, text, &html_file_source) or parse_html_with_cache(cache, text, html_file_source) depending on the function signature) and replace the expect with a non-panicking branch: attempt HtmlRoot::cast(parse.syntax()) with an if let / match, and if the cast fails return a safe fallback (for example an empty Vec or None) so html::extract_embedded_scripts is only called with a valid &HtmlRoot.

coderabbitai · 2025-09-08T03:38:49Z

crates/biome_service/src/file_handlers/svelte.rs

+    // If we found scripts, use the first one; otherwise fall back to regex approach
+    if let Some(first_script) = scripts.first() {


🛠️ Refactor suggestion

Don’t drop additional <script> blocks; handle both instance and module scripts.

Only using the first script misses context="module" or additional blocks. At minimum prefer the instance script if present; ideally parse both and expose them to formatter/linter.

Short-term: pick the instance script when available; fall back to module. Long-term: model multiple embedded documents (module + instance) so lint/format can address both with proper source maps. Happy to sketch an API if useful.

🤖 Prompt for AI Agents

In crates/biome_service/src/file_handlers/svelte.rs around lines 143-144, the code currently picks only the first <script> block from the parsed scripts which drops additional blocks (notably the instance vs module script). Update the logic to prefer the instance/script without context="module" when present but if not present fall back to the module script; do not discard the other block—return or attach both script bodies (module and instance) to the caller so the formatter/linter can operate on each with correct source ranges; as a short-term change implement a selection that searches scripts for one without context="module" first and uses that, else uses the module, and as a follow-up surface both scripts in the parsed result (e.g., add fields for instance_script and module_script) so future formatting/linting can handle both with proper source maps.

gagansuie added 2 commits July 5, 2025 14:18

fix: html super language support

79bce3b

Merge pull request #3 from gagansuie/html-formatter

799b9e7

Html formatter

github-actions bot added the A-Project Area: project label Jul 5, 2025

Merge branch 'main' into main

0920272

gagansuie changed the title ~~feat: html super lagnauge support~~ feat: html super langauge support Jul 5, 2025

gagansuie changed the title ~~feat: html super langauge support~~ feat: svelte html super language support Jul 5, 2025

dyc3 requested changes Jul 5, 2025

View reviewed changes

ematipico reviewed Jul 6, 2025

View reviewed changes

Merge remote-tracking branch 'upstream/main'

b951c2f

coderabbitai bot reviewed Sep 8, 2025

View reviewed changes

gagansuie closed this Sep 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: svelte html super language support#6724

feat: svelte html super language support#6724
gagansuie wants to merge 4 commits intobiomejs:mainfrom
gagansuie:main

gagansuie commented Jul 5, 2025 •

edited

Loading

Uh oh!

changeset-bot bot commented Jul 5, 2025 •

edited

Loading

Uh oh!

dyc3 left a comment

Uh oh!

dyc3 Jul 5, 2025

Uh oh!

dyc3 Jul 5, 2025

Uh oh!

gagansuie Jul 6, 2025

Uh oh!

gagansuie commented Jul 6, 2025

Uh oh!

ematipico Jul 6, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 8, 2025

Uh oh!

coderabbitai bot Sep 8, 2025

Uh oh!

coderabbitai bot Sep 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		// If we found scripts, use the first one; otherwise fall back to regex approach
		if let Some(first_script) = scripts.first() {

Uh oh!

Conversation

gagansuie commented Jul 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Enhancements to Svelte file handling:

Backward Compatibility:

Formatting Updates:

Test Plan

Uh oh!

changeset-bot bot commented Jul 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

dyc3 left a comment

Choose a reason for hiding this comment

Uh oh!

dyc3 Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

dyc3 Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

gagansuie Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

gagansuie commented Jul 6, 2025

Uh oh!

ematipico Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gagansuie commented Jul 5, 2025 •

edited

Loading

changeset-bot bot commented Jul 5, 2025 •

edited

Loading

coderabbitai bot commented Sep 8, 2025 •

edited

Loading