feat: svelte html super language support#6724
feat: svelte html super language support#6724gagansuie wants to merge 4 commits intobiomejs:mainfrom
Conversation
|
dyc3
left a comment
There was a problem hiding this comment.
IMO, this isn't a meaningful improvement from the current implementation.
If I'm reading this right, it looks like you're only invoking the html parser to grab the script tag contents, and then do everything the same as we did before?
| pub struct SvelteFileHandler; | ||
|
|
||
| // https://regex101.com/r/E4n4hh/6 | ||
| // Kept for backward compatibility with existing code |
There was a problem hiding this comment.
Losing this comment is a regression
| // If we found scripts, use the first one; otherwise fall back to regex approach | ||
| if let Some(first_script) = scripts.first() { |
There was a problem hiding this comment.
This is wrong, we want to process all the script blocks.
ill make a better attempt tomorrow. Thanks for the feedback! |
| /// Extracts all embedded JavaScript content from a Svelte file using HTML parsing. | ||
| /// This provides better handling of HTML structure and script indentation. | ||
| pub fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> { | ||
| // Parse the Svelte file as HTML | ||
| let html_file_source = HtmlFileSource::default(); | ||
| let parse = parse_html_with_cache(text, html_file_source, cache); | ||
| let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot"); | ||
|
|
||
| // Extract embedded scripts using the HTML handler | ||
| html::extract_embedded_scripts(&html_root, cache) | ||
| } |
There was a problem hiding this comment.
This provides less functionality than the regex solution. The regex solution is able to understand if the internal code must be parsed as TypeScript or JavaScript. This solution doesn't do that.
WalkthroughAdds an HTML-driven extraction path for embedded JavaScript in Svelte files alongside a new public API SvelteFileHandler::extract_scripts(text, cache) -> Vec. The parse() method now prefers HTML-parsed scripts (using parse_html_with_cache, HtmlFileSource, HtmlRoot, EmbeddedJsContent) and falls back to the legacy regex path if none are found. The SVELTE_FENCE regex is updated to non-greedy matching. input() maintains backward compatibility with the regex path. Additional imports, NodeCache usage, AstNode, and debug logs are introduced. format() remains JavaScript-focused with a note on HTML context. Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
crates/biome_service/src/file_handlers/svelte.rs (2)
26-26: Restore the lost explanatory comment.Echoing prior feedback: losing the original comment here was a regression; please reinstate the intent/context it provided.
144-151: Propagate the correct JS file source (TS/JS detection) instead of the document’s file source.Using
language: Some(file_source)drops the ability to detectlang="ts"etc., which the regex path handled. Use the script’s own file source.- ParseResult { - any_parse: first_script.parse.clone(), - language: Some(file_source), - } + ParseResult { + any_parse: first_script.parse.clone(), + // Ensure this mirrors what the regex path does (EmbeddingKind::Svelte etc.) + language: Some(first_script.file_source.into()), + }If
EmbeddedJsContentexposes the field under a different name, adapt accordingly.#!/bin/bash # Verify `EmbeddedJsContent` fields rg -nP 'struct\s+EmbeddedJsContent|impl\s+EmbeddedJsContent' -C3
🧹 Nitpick comments (4)
crates/biome_service/src/file_handlers/svelte.rs (4)
32-38: Docstring contradicts implementation.The method currently uses the regex path for compatibility, not HTML parsing.
- /// Extracts the JavaScript/TypeScript code contained in the script block of a Svelte file - /// using HTML parsing instead of regex for better handling of HTML structure. + /// Extracts the JavaScript/TypeScript code contained in the first `<script>` block of a Svelte file. + /// Backwards-compatible: currently uses the regex-based path.
28-28: Regex tweak looks reasonable; add coverage for edge cases.Non-greedy content is good. Please add tests for multiple scripts, Windows newlines,
lang="ts", and attributes with varying whitespace.I can draft Svelte fixtures exercising these cases if you’d like.
174-176: Comment overpromises.We’re still formatting the extracted JS; we’re not preserving HTML context in formatting output here. Rephrase to avoid confusion.
- // Use JavaScript formatter but preserve HTML context + // Format the extracted JavaScript; HTML is not formatted here
46-54: Makeextract_scriptscrate-private.
Usage search shows it’s only called withinsvelte.rs, so narrow its visibility:@@ crates/biome_service/src/file_handlers/svelte.rs:46 - pub fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> { + pub(crate) fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
crates/biome_service/src/file_handlers/svelte.rs(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
crates/biome_*/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place core crates under /crates/biome_*/
Files:
crates/biome_service/src/file_handlers/svelte.rs
**/*.rs
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Format all Rust source files before committing (just f)
Files:
crates/biome_service/src/file_handlers/svelte.rs
🧠 Learnings (5)
📚 Learning: 2025-09-05T09:13:58.901Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-09-05T09:13:58.901Z
Learning: Applies to crates/biome_analyze/crates/biome_js_analyze/**/src/**/*.rs : Avoid deep indentation and unwrap/expect in CST navigation; prefer ?, map, filter, and_then
Applied to files:
crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:27.774Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:27.774Z
Learning: Applies to crates/biome_formatter/biome_html_formatter/src/lib.rs : Expose a documented public function format_node(options: HtmlFormatOptions, root: &HtmlSyntaxNode) -> FormatResult<Formatted<HtmlFormatContext>> delegating to biome_formatter::format_node
Applied to files:
crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:52.001Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Build an intermediate IR via the provided helper APIs when formatting
Applied to files:
crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:52.001Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Import the FormatNode trait and implement it for your Node
Applied to files:
crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:52.001Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Use the generic Format trait and FormatNode for AST nodes when implementing the formatter
Applied to files:
crates/biome_service/src/file_handlers/svelte.rs
🧬 Code graph analysis (1)
crates/biome_service/src/file_handlers/svelte.rs (3)
crates/biome_html_parser/src/lib.rs (1)
parse_html_with_cache(18-34)crates/biome_js_parser/src/parse.rs (1)
parse_js_with_cache(272-283)crates/biome_service/src/file_handlers/html.rs (2)
parse(258-284)extract_embedded_scripts(298-312)
🔇 Additional comments (1)
crates/biome_service/src/file_handlers/svelte.rs (1)
140-165: Confirm end-to-end parity between HTML and fallback parsing
- Both branches use the same
parse_lang_from_script_opening_taglogic for TS/JS detection.- HTML-extracted scripts preserve original byte offsets; verify diagnostics from the regex fallback align exactly.
- Benchmark against large Svelte files (e.g. >100 KB) to ensure no noticeable performance regression.
| use biome_html_parser::parse_html_with_cache; | ||
| use biome_html_syntax::{HtmlFileSource, HtmlRoot}; | ||
| use biome_js_parser::{JsParserOptions, parse_js_with_cache}; |
There was a problem hiding this comment.
Wrong imports for HTML parsing; fix types and remove unused symbol.
parse_html_with_cache expects HtmlParseOptions, not HtmlFileSource. Also, HtmlFileSource becomes unused after the fix.
Apply:
-use biome_html_parser::parse_html_with_cache;
-use biome_html_syntax::{HtmlFileSource, HtmlRoot};
+use biome_html_parser::{parse_html_with_cache, HtmlParseOptions};
+use biome_html_syntax::HtmlRoot;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| use biome_html_parser::parse_html_with_cache; | |
| use biome_html_syntax::{HtmlFileSource, HtmlRoot}; | |
| use biome_js_parser::{JsParserOptions, parse_js_with_cache}; | |
| // crates/biome_service/src/file_handlers/svelte.rs | |
| use biome_html_parser::{parse_html_with_cache, HtmlParseOptions}; | |
| use biome_html_syntax::HtmlRoot; | |
| use biome_js_parser::{JsParserOptions, parse_js_with_cache}; |
🤖 Prompt for AI Agents
In crates/biome_service/src/file_handlers/svelte.rs around lines 11 to 13, the
import list is wrong: parse_html_with_cache expects HtmlParseOptions (not
HtmlFileSource) and HtmlFileSource becomes unused. Replace the HtmlFileSource
import with HtmlParseOptions and remove the unused HtmlFileSource symbol from
the file; keep HtmlRoot and other imports as-is so parse_html_with_cache is
called with the correct type.
| let html_file_source = HtmlFileSource::default(); | ||
| let parse = parse_html_with_cache(text, html_file_source, cache); | ||
| let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot"); | ||
|
|
||
| // Extract embedded scripts using the HTML handler | ||
| html::extract_embedded_scripts(&html_root, cache) | ||
| } | ||
|
|
There was a problem hiding this comment.
Fix parse_html_with_cache call and avoid panicking on cast.
Arguments are in the wrong order and of the wrong types. The expect will panic on malformed HTML; prefer a graceful fallback.
- let html_file_source = HtmlFileSource::default();
- let parse = parse_html_with_cache(text, html_file_source, cache);
- let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");
-
- // Extract embedded scripts using the HTML handler
- html::extract_embedded_scripts(&html_root, cache)
+ let parse = parse_html_with_cache(text, cache, HtmlParseOptions::default());
+ let Some(html_root) = HtmlRoot::cast(parse.syntax()) else {
+ debug!("HTML parse produced no root; falling back to regex");
+ return Vec::new();
+ };
+ html::extract_embedded_scripts(&html_root, cache)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let html_file_source = HtmlFileSource::default(); | |
| let parse = parse_html_with_cache(text, html_file_source, cache); | |
| let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot"); | |
| // Extract embedded scripts using the HTML handler | |
| html::extract_embedded_scripts(&html_root, cache) | |
| } | |
| let parse = parse_html_with_cache(text, cache, HtmlParseOptions::default()); | |
| let Some(html_root) = HtmlRoot::cast(parse.syntax()) else { | |
| debug!("HTML parse produced no root; falling back to regex"); | |
| return Vec::new(); | |
| }; | |
| html::extract_embedded_scripts(&html_root, cache) |
🤖 Prompt for AI Agents
In crates/biome_service/src/file_handlers/svelte.rs around lines 48–55, the call
to parse_html_with_cache passes arguments in the wrong order/types and then uses
expect which panics on malformed HTML; change the parse call to use the correct
parameter order and types (pass the cache and the HtmlFileSource in the expected
positions — e.g., parse_html_with_cache(cache, text, &html_file_source) or
parse_html_with_cache(cache, text, html_file_source) depending on the function
signature) and replace the expect with a non-panicking branch: attempt
HtmlRoot::cast(parse.syntax()) with an if let / match, and if the cast fails
return a safe fallback (for example an empty Vec or None) so
html::extract_embedded_scripts is only called with a valid &HtmlRoot.
| // If we found scripts, use the first one; otherwise fall back to regex approach | ||
| if let Some(first_script) = scripts.first() { |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Don’t drop additional <script> blocks; handle both instance and module scripts.
Only using the first script misses context="module" or additional blocks. At minimum prefer the instance script if present; ideally parse both and expose them to formatter/linter.
Short-term: pick the instance script when available; fall back to module. Long-term: model multiple embedded documents (module + instance) so lint/format can address both with proper source maps. Happy to sketch an API if useful.
🤖 Prompt for AI Agents
In crates/biome_service/src/file_handlers/svelte.rs around lines 143-144, the
code currently picks only the first <script> block from the parsed scripts which
drops additional blocks (notably the instance vs module script). Update the
logic to prefer the instance/script without context="module" when present but if
not present fall back to the module script; do not discard the other
block—return or attach both script bodies (module and instance) to the caller so
the formatter/linter can operate on each with correct source ranges; as a
short-term change implement a selection that searches scripts for one without
context="module" first and uses that, else uses the module, and as a follow-up
surface both scripts in the parsed result (e.g., add fields for instance_script
and module_script) so future formatting/linting can handle both with proper
source maps.

Summary
This pull request introduces improvements to the
SvelteFileHandlerin thecrates/biome_servicemodule, focusing on better handling of embedded JavaScript content within Svelte files by leveraging HTML parsing instead of relying solely on regex. It also maintains backward compatibility for existing functionality.Enhancements to Svelte file handling:
HTML Parsing Integration:
HtmlFileSourceandHtmlRoottypes, along with theparse_html_with_cachefunction. ([[1]](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L5-R16),[[2]](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))extract_scriptsmethod to extract all embedded JavaScript content using HTML parsing, improving handling of HTML structure and indentation. ([crates/biome_service/src/file_handlers/svelte.rsL24-R55](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))Improved Script Parsing:
parsemethod to prioritize HTML-based script extraction. If no scripts are found, the method falls back to the previous regex-based approach for backward compatibility. ([crates/biome_service/src/file_handlers/svelte.rsL119-R163](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L119-R163))Backward Compatibility:
<script>tags to improve handling of script content while retaining compatibility with existing code. ([crates/biome_service/src/file_handlers/svelte.rsL24-R55](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))Formatting Updates:
formatmethod to use the JavaScript formatter while preserving the HTML context in Svelte files. ([crates/biome_service/src/file_handlers/svelte.rsR174](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6R174))Test Plan