Skip to content

feat: svelte html super language support#6724

Closed
gagansuie wants to merge 4 commits intobiomejs:mainfrom
gagansuie:main
Closed

feat: svelte html super language support#6724
gagansuie wants to merge 4 commits intobiomejs:mainfrom
gagansuie:main

Conversation

@gagansuie
Copy link

@gagansuie gagansuie commented Jul 5, 2025

Summary

This pull request introduces improvements to the SvelteFileHandler in the crates/biome_service module, focusing on better handling of embedded JavaScript content within Svelte files by leveraging HTML parsing instead of relying solely on regex. It also maintains backward compatibility for existing functionality.

Enhancements to Svelte file handling:

  • HTML Parsing Integration:

    • Added support for parsing Svelte files as HTML, enabling more robust extraction of embedded JavaScript content. This is achieved using the HtmlFileSource and HtmlRoot types, along with the parse_html_with_cache function. ([[1]](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L5-R16), [[2]](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))
    • Introduced the extract_scripts method to extract all embedded JavaScript content using HTML parsing, improving handling of HTML structure and indentation. ([crates/biome_service/src/file_handlers/svelte.rsL24-R55](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))
  • Improved Script Parsing:

    • Updated the parse method to prioritize HTML-based script extraction. If no scripts are found, the method falls back to the previous regex-based approach for backward compatibility. ([crates/biome_service/src/file_handlers/svelte.rsL119-R163](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L119-R163))

Backward Compatibility:

  • Regex Updates:
    • Modified the regex used for extracting <script> tags to improve handling of script content while retaining compatibility with existing code. ([crates/biome_service/src/file_handlers/svelte.rsL24-R55](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6L24-R55))

Formatting Updates:

  • Preserving HTML Context:
    • Enhanced the format method to use the JavaScript formatter while preserving the HTML context in Svelte files. ([crates/biome_service/src/file_handlers/svelte.rsR174](https://github.com/biomejs/biome/pull/6724/files#diff-80d4efb321e6e9ed68e57b1fd2167c08ea0e0f385d0978139dfa5ca294682cc6R174))

Test Plan

@changeset-bot
Copy link

changeset-bot bot commented Jul 5, 2025

⚠️ No Changeset found

Latest commit: b951c2f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot added the A-Project Area: project label Jul 5, 2025
@gagansuie gagansuie changed the title feat: html super lagnauge support feat: html super langauge support Jul 5, 2025
@gagansuie gagansuie changed the title feat: html super langauge support feat: svelte html super language support Jul 5, 2025
Copy link
Contributor

@dyc3 dyc3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this isn't a meaningful improvement from the current implementation.

If I'm reading this right, it looks like you're only invoking the html parser to grab the script tag contents, and then do everything the same as we did before?

pub struct SvelteFileHandler;

// https://regex101.com/r/E4n4hh/6
// Kept for backward compatibility with existing code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing this comment is a regression

Comment on lines +143 to +144
// If we found scripts, use the first one; otherwise fall back to regex approach
if let Some(first_script) = scripts.first() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong, we want to process all the script blocks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! thats fascinating. i've never used a script module before. 🤔

Screenshot from 2025-07-06 01-35-21

@gagansuie
Copy link
Author

IMO, this isn't a meaningful improvement from the current implementation.

If I'm reading this right, it looks like you're only invoking the html parser to grab the script tag contents, and then do everything the same as we did before?

ill make a better attempt tomorrow. Thanks for the feedback!

Comment on lines +44 to +54
/// Extracts all embedded JavaScript content from a Svelte file using HTML parsing.
/// This provides better handling of HTML structure and script indentation.
pub fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {
// Parse the Svelte file as HTML
let html_file_source = HtmlFileSource::default();
let parse = parse_html_with_cache(text, html_file_source, cache);
let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");

// Extract embedded scripts using the HTML handler
html::extract_embedded_scripts(&html_root, cache)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This provides less functionality than the regex solution. The regex solution is able to understand if the internal code must be parsed as TypeScript or JavaScript. This solution doesn't do that.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 8, 2025

Walkthrough

Adds an HTML-driven extraction path for embedded JavaScript in Svelte files alongside a new public API SvelteFileHandler::extract_scripts(text, cache) -> Vec. The parse() method now prefers HTML-parsed scripts (using parse_html_with_cache, HtmlFileSource, HtmlRoot, EmbeddedJsContent) and falls back to the legacy regex path if none are found. The SVELTE_FENCE regex is updated to non-greedy matching. input() maintains backward compatibility with the regex path. Additional imports, NodeCache usage, AstNode, and debug logs are introduced. format() remains JavaScript-focused with a note on HTML context.

Possibly related PRs

Suggested labels

A-Formatter, L-HTML

Suggested reviewers

  • ematipico
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
crates/biome_service/src/file_handlers/svelte.rs (2)

26-26: Restore the lost explanatory comment.

Echoing prior feedback: losing the original comment here was a regression; please reinstate the intent/context it provided.


144-151: Propagate the correct JS file source (TS/JS detection) instead of the document’s file source.

Using language: Some(file_source) drops the ability to detect lang="ts" etc., which the regex path handled. Use the script’s own file source.

-        ParseResult {
-            any_parse: first_script.parse.clone(),
-            language: Some(file_source),
-        }
+        ParseResult {
+            any_parse: first_script.parse.clone(),
+            // Ensure this mirrors what the regex path does (EmbeddingKind::Svelte etc.)
+            language: Some(first_script.file_source.into()),
+        }

If EmbeddedJsContent exposes the field under a different name, adapt accordingly.

#!/bin/bash
# Verify `EmbeddedJsContent` fields
rg -nP 'struct\s+EmbeddedJsContent|impl\s+EmbeddedJsContent' -C3
🧹 Nitpick comments (4)
crates/biome_service/src/file_handlers/svelte.rs (4)

32-38: Docstring contradicts implementation.

The method currently uses the regex path for compatibility, not HTML parsing.

-    /// Extracts the JavaScript/TypeScript code contained in the script block of a Svelte file
-    /// using HTML parsing instead of regex for better handling of HTML structure.
+    /// Extracts the JavaScript/TypeScript code contained in the first `<script>` block of a Svelte file.
+    /// Backwards-compatible: currently uses the regex-based path.

28-28: Regex tweak looks reasonable; add coverage for edge cases.

Non-greedy content is good. Please add tests for multiple scripts, Windows newlines, lang="ts", and attributes with varying whitespace.

I can draft Svelte fixtures exercising these cases if you’d like.


174-176: Comment overpromises.

We’re still formatting the extracted JS; we’re not preserving HTML context in formatting output here. Rephrase to avoid confusion.

-    // Use JavaScript formatter but preserve HTML context
+    // Format the extracted JavaScript; HTML is not formatted here

46-54: Make extract_scripts crate-private.
Usage search shows it’s only called within svelte.rs, so narrow its visibility:

@@ crates/biome_service/src/file_handlers/svelte.rs:46
-    pub fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {
+    pub(crate) fn extract_scripts(text: &str, cache: &mut NodeCache) -> Vec<EmbeddedJsContent> {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cb0b38 and b951c2f.

📒 Files selected for processing (1)
  • crates/biome_service/src/file_handlers/svelte.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
crates/biome_*/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place core crates under /crates/biome_*/

Files:

  • crates/biome_service/src/file_handlers/svelte.rs
**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Format all Rust source files before committing (just f)

Files:

  • crates/biome_service/src/file_handlers/svelte.rs
🧠 Learnings (5)
📚 Learning: 2025-09-05T09:13:58.901Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-09-05T09:13:58.901Z
Learning: Applies to crates/biome_analyze/crates/biome_js_analyze/**/src/**/*.rs : Avoid deep indentation and unwrap/expect in CST navigation; prefer ?, map, filter, and_then

Applied to files:

  • crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:27.774Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:27.774Z
Learning: Applies to crates/biome_formatter/biome_html_formatter/src/lib.rs : Expose a documented public function format_node(options: HtmlFormatOptions, root: &HtmlSyntaxNode) -> FormatResult<Formatted<HtmlFormatContext>> delegating to biome_formatter::format_node

Applied to files:

  • crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:52.001Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Build an intermediate IR via the provided helper APIs when formatting

Applied to files:

  • crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:52.001Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Import the FormatNode trait and implement it for your Node

Applied to files:

  • crates/biome_service/src/file_handlers/svelte.rs
📚 Learning: 2025-08-11T11:48:52.001Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_js_formatter/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:48:52.001Z
Learning: Applies to crates/biome_js_formatter/**/*.rs : Use the generic Format trait and FormatNode for AST nodes when implementing the formatter

Applied to files:

  • crates/biome_service/src/file_handlers/svelte.rs
🧬 Code graph analysis (1)
crates/biome_service/src/file_handlers/svelte.rs (3)
crates/biome_html_parser/src/lib.rs (1)
  • parse_html_with_cache (18-34)
crates/biome_js_parser/src/parse.rs (1)
  • parse_js_with_cache (272-283)
crates/biome_service/src/file_handlers/html.rs (2)
  • parse (258-284)
  • extract_embedded_scripts (298-312)
🔇 Additional comments (1)
crates/biome_service/src/file_handlers/svelte.rs (1)

140-165: Confirm end-to-end parity between HTML and fallback parsing

  • Both branches use the same parse_lang_from_script_opening_tag logic for TS/JS detection.
  • HTML-extracted scripts preserve original byte offsets; verify diagnostics from the regex fallback align exactly.
  • Benchmark against large Svelte files (e.g. >100 KB) to ensure no noticeable performance regression.

Comment on lines +11 to 13
use biome_html_parser::parse_html_with_cache;
use biome_html_syntax::{HtmlFileSource, HtmlRoot};
use biome_js_parser::{JsParserOptions, parse_js_with_cache};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Wrong imports for HTML parsing; fix types and remove unused symbol.

parse_html_with_cache expects HtmlParseOptions, not HtmlFileSource. Also, HtmlFileSource becomes unused after the fix.

Apply:

-use biome_html_parser::parse_html_with_cache;
-use biome_html_syntax::{HtmlFileSource, HtmlRoot};
+use biome_html_parser::{parse_html_with_cache, HtmlParseOptions};
+use biome_html_syntax::HtmlRoot;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
use biome_html_parser::parse_html_with_cache;
use biome_html_syntax::{HtmlFileSource, HtmlRoot};
use biome_js_parser::{JsParserOptions, parse_js_with_cache};
// crates/biome_service/src/file_handlers/svelte.rs
use biome_html_parser::{parse_html_with_cache, HtmlParseOptions};
use biome_html_syntax::HtmlRoot;
use biome_js_parser::{JsParserOptions, parse_js_with_cache};
🤖 Prompt for AI Agents
In crates/biome_service/src/file_handlers/svelte.rs around lines 11 to 13, the
import list is wrong: parse_html_with_cache expects HtmlParseOptions (not
HtmlFileSource) and HtmlFileSource becomes unused. Replace the HtmlFileSource
import with HtmlParseOptions and remove the unused HtmlFileSource symbol from
the file; keep HtmlRoot and other imports as-is so parse_html_with_cache is
called with the correct type.

Comment on lines +48 to +55
let html_file_source = HtmlFileSource::default();
let parse = parse_html_with_cache(text, html_file_source, cache);
let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");

// Extract embedded scripts using the HTML handler
html::extract_embedded_scripts(&html_root, cache)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix parse_html_with_cache call and avoid panicking on cast.

Arguments are in the wrong order and of the wrong types. The expect will panic on malformed HTML; prefer a graceful fallback.

-        let html_file_source = HtmlFileSource::default();
-        let parse = parse_html_with_cache(text, html_file_source, cache);
-        let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");
-        
-        // Extract embedded scripts using the HTML handler
-        html::extract_embedded_scripts(&html_root, cache)
+        let parse = parse_html_with_cache(text, cache, HtmlParseOptions::default());
+        let Some(html_root) = HtmlRoot::cast(parse.syntax()) else {
+            debug!("HTML parse produced no root; falling back to regex");
+            return Vec::new();
+        };
+        html::extract_embedded_scripts(&html_root, cache)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let html_file_source = HtmlFileSource::default();
let parse = parse_html_with_cache(text, html_file_source, cache);
let html_root = HtmlRoot::cast(parse.syntax()).expect("Failed to cast to HtmlRoot");
// Extract embedded scripts using the HTML handler
html::extract_embedded_scripts(&html_root, cache)
}
let parse = parse_html_with_cache(text, cache, HtmlParseOptions::default());
let Some(html_root) = HtmlRoot::cast(parse.syntax()) else {
debug!("HTML parse produced no root; falling back to regex");
return Vec::new();
};
html::extract_embedded_scripts(&html_root, cache)
🤖 Prompt for AI Agents
In crates/biome_service/src/file_handlers/svelte.rs around lines 48–55, the call
to parse_html_with_cache passes arguments in the wrong order/types and then uses
expect which panics on malformed HTML; change the parse call to use the correct
parameter order and types (pass the cache and the HtmlFileSource in the expected
positions — e.g., parse_html_with_cache(cache, text, &html_file_source) or
parse_html_with_cache(cache, text, html_file_source) depending on the function
signature) and replace the expect with a non-panicking branch: attempt
HtmlRoot::cast(parse.syntax()) with an if let / match, and if the cast fails
return a safe fallback (for example an empty Vec or None) so
html::extract_embedded_scripts is only called with a valid &HtmlRoot.

Comment on lines +143 to +144
// If we found scripts, use the first one; otherwise fall back to regex approach
if let Some(first_script) = scripts.first() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Don’t drop additional <script> blocks; handle both instance and module scripts.

Only using the first script misses context="module" or additional blocks. At minimum prefer the instance script if present; ideally parse both and expose them to formatter/linter.

Short-term: pick the instance script when available; fall back to module. Long-term: model multiple embedded documents (module + instance) so lint/format can address both with proper source maps. Happy to sketch an API if useful.

🤖 Prompt for AI Agents
In crates/biome_service/src/file_handlers/svelte.rs around lines 143-144, the
code currently picks only the first <script> block from the parsed scripts which
drops additional blocks (notably the instance vs module script). Update the
logic to prefer the instance/script without context="module" when present but if
not present fall back to the module script; do not discard the other
block—return or attach both script bodies (module and instance) to the caller so
the formatter/linter can operate on each with correct source ranges; as a
short-term change implement a selection that searches scripts for one without
context="module" first and uses that, else uses the module, and as a follow-up
surface both scripts in the parsed result (e.g., add fields for instance_script
and module_script) so future formatting/linting can handle both with proper
source maps.

@gagansuie gagansuie closed this Sep 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Project Area: project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants