
feat(css): implement shared CSS scanner for parsing SCSS interpolations and URL bodies#9948

Merged
denbezrukov merged 13 commits into main from db/scss-interpolation-4 on Apr 16, 2026

Conversation

@denbezrukov
Contributor

This PR was created with AI assistance (Codex).

Summary

Fix SCSS url(...) parsing for interpolation-based function names and simplify the shared lexer scan
layer.

The main bug was that valid SCSS function-call shapes inside url(...) were not preserved correctly. The
lexer could classify the URL body too early, and the parser only handled plain identifier-based function
heads there, so interpolation-based names were lost or rejected.

Now we parse these SCSS forms correctly:

a {
  expr4: url(foo#{1 + 1}(bar));
  expr5: url(#{name}(bar));
  expr6: url(#{prefix}-#{suffix}(bar));

  modifier-interpolated-mid: url("a.png" foo#{1 + 1}(bar));
  modifier-interpolated-start: url("a.png" #{format}(woff2));
}

This PR also refactors the CSS lexer scan layer so source navigation and CSS-aware scanning are shared
through SourceCursor and CssScanCursor instead of being duplicated between lexer and lookahead code.
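
The scan-then-commit flow described above can be sketched as follows. This is a minimal illustration under assumed names (`SourceCursor`, `scan_string_body`, and their signatures are not Biome's actual API): a cheap `Copy` cursor does speculative lookahead, and the caller commits the consumed length only if the scan succeeds.

```rust
// Sketch of the scan-then-commit pattern: the cursor is Copy, so a
// speculative scan works on a by-value copy and never mutates the
// lexer's real position. All names here are illustrative.

#[derive(Clone, Copy)]
struct SourceCursor<'src> {
    source: &'src [u8],
    position: usize,
}

impl<'src> SourceCursor<'src> {
    fn current_byte(&self) -> Option<u8> {
        self.source.get(self.position).copied()
    }

    fn advance(&mut self, n: usize) {
        self.position += n;
    }
}

/// Speculatively scan a double-quoted string body starting at the cursor.
/// Returns the number of bytes consumed when the string is terminated, or
/// None for an unterminated string (the caller can then emit an error
/// token without having moved its own cursor).
fn scan_string_body(mut cursor: SourceCursor) -> Option<usize> {
    let start = cursor.position;
    cursor.advance(1); // opening quote

    while let Some(byte) = cursor.current_byte() {
        match byte {
            b'"' => {
                cursor.advance(1); // closing quote
                return Some(cursor.position - start);
            }
            b'\\' => cursor.advance(2), // skip the escape pair
            _ => cursor.advance(1),
        }
    }
    None // reached EOF before the closing quote
}

fn main() {
    let cursor = SourceCursor { source: br#""a.png" rest"#, position: 0 };
    // The scan reports 7 consumed bytes for `"a.png"`; only now would the
    // real lexer commit that length.
    assert_eq!(scan_string_body(cursor), Some(7));
}
```

Because the cursor travels by value, a failed scan leaves the caller's position untouched, which is what lets the lexer and the lookahead code share one scanning routine instead of duplicating the loops.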

Test Plan

  • cargo test -p biome_css_parser

@changeset-bot

changeset-bot bot commented Apr 12, 2026

⚠️ No Changeset found

Latest commit: ee8b3a3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added A-Parser Area: parser L-CSS Language: CSS and super languages labels Apr 12, 2026
@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

Walkthrough

Extracts byte/char navigation into a new SourceCursor and centralises speculative, non-committing lexing in CssScanCursor (identifier scans, string bodies, SCSS interpolation, URL-body classification, and raw-value scans). Refactors CssLexer to hold a SourceCursor and delegate byte/position operations and mutation to it. Replaces in-lexer string and raw-URL loops with scan-then-commit flows, adds CssLexContext::UrlBody { scss_exclusive_syntax_allowed }, and introduces an is_at_scss_interpolated_function lexer lookahead used by parser URL handling and re-exports.
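
The new lex context mentioned in the walkthrough can be pictured roughly like this; the variant and field names come from the PR description, while the enum shape and the helper are illustrative assumptions:

```rust
// Illustrative sketch of a lex context that tells the lexer whether the
// URL body may contain SCSS-only syntax such as `#{...}` interpolation.
// Only the `UrlBody { scss_exclusive_syntax_allowed }` name is taken
// from the PR; everything else here is a stand-in.
#[derive(Clone, Copy)]
enum CssLexContext {
    Regular,
    UrlBody { scss_exclusive_syntax_allowed: bool },
}

/// Hypothetical helper: interpolation scanning is attempted only when
/// the URL-body context explicitly allows SCSS-exclusive syntax.
fn allows_interpolation(context: CssLexContext) -> bool {
    matches!(
        context,
        CssLexContext::UrlBody { scss_exclusive_syntax_allowed: true }
    )
}

fn main() {
    assert!(allows_interpolation(CssLexContext::UrlBody {
        scss_exclusive_syntax_allowed: true,
    }));
    assert!(!allows_interpolation(CssLexContext::Regular));
}
```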

Suggested reviewers

  • ematipico
  • dyc3
🚥 Pre-merge checks | ✅ 2 passed
  • Title check — ✅ Passed: the title accurately summarises the main change (introducing a shared CSS scanner for SCSS interpolations and URL-body parsing), which aligns with the refactoring described in the raw summary.
  • Description check — ✅ Passed: the description clearly outlines the bug fix for SCSS url(...) parsing with interpolation-based function names and explains the refactoring of the lexer scan layer into SourceCursor and CssScanCursor.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs (1)

421-427: Prefer lookup_byte for whitespace classification here.

This brings back a hand-rolled whitespace set in the middle of a scanner that otherwise already leans on lookup_byte. Using the dispatch table keeps the classification path consistent and saves one more byte list to babysit.

Based on learnings, "In the biomejs/biome codebase, when handling whitespace and character classification, use the whitespace/character classification dispatch tables from biome_unicode_table::lookup_byte (e.g., Dispatch::WHS) rather than ad-hoc raw byte literal comparisons (e.g., b <= b' ' or explicit b == b'\t')."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 421 - 427, In
skip_whitespace_and_block_comments, replace the ad‑hoc byte comparisons (the
matches!(byte, b'\t' | b' ' | b'\n' | b'\r' | 0x0C) loop) with the
biome_unicode_table::lookup_byte dispatch table check: call lookup_byte on the
current byte and test for Dispatch::WHS to classify whitespace, then advance as
before; this keeps classification consistent with other scanner logic and
removes the hand‑rolled byte set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_css_parser/src/lexer/scan_cursor.rs`:
- Around line 160-163: The escape-handling logic needs to treat form-feed
(U+000C) as a newline-class character: update is_valid_escape_at and
consume_string_escape to include '\u{000C}' (0x0C) alongside LF ('\n') and CR
('\r') in the newline checks so sequences like "\\\u{000C}" are not treated as
ordinary escapes; apply the same change to the other escape-related checks in
the same file (around the 310-346 region) so all places that special-case LF/CR
also consider form-feed as newline-class.
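
The requested change amounts to widening the newline class used by the escape checks to include form feed. A minimal sketch of the idea (the free-function signature over raw bytes is an assumption, not the actual SourceCursor method):

```rust
/// CSS Syntax defines the newline class as LF, CR, and form feed
/// (U+000C). The reported bug is that 0x0C fell through to the
/// ordinary-escape branch.
fn is_newline_class(byte: u8) -> bool {
    matches!(byte, b'\n' | b'\r' | b'\x0C')
}

/// Illustrative escape check: a backslash starts a valid escape only
/// when a following byte exists and is not newline-class.
fn is_valid_escape_at(source: &[u8], offset: usize) -> bool {
    if source.get(offset) != Some(&b'\\') {
        return false;
    }
    match source.get(offset + 1) {
        None => false,
        Some(&next) => !is_newline_class(next),
    }
}

fn main() {
    assert!(is_valid_escape_at(b"\\a", 0)); // `\a` is a valid escape
    assert!(!is_valid_escape_at(b"\\\n", 0)); // backslash + LF: not an escape
    assert!(!is_valid_escape_at(b"\\\x0C", 0)); // backslash + FF: not an escape either
}
```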


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9737d259-0733-4bdf-9fc5-479e4b08f996

📥 Commits

Reviewing files that changed from the base of the PR and between 9956f1d and 365f16f.

⛔ Files ignored due to path filters (2)
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss.snap is excluded by !**/*.snap and included by **
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (13)
  • crates/biome_css_parser/src/lexer/mod.rs
  • crates/biome_css_parser/src/lexer/scan_cursor.rs
  • crates/biome_css_parser/src/lexer/source_cursor.rs
  • crates/biome_css_parser/src/lexer/tests.rs
  • crates/biome_css_parser/src/syntax/scss/mod.rs
  • crates/biome_css_parser/src/syntax/scss/value/function.rs
  • crates/biome_css_parser/src/syntax/scss/value/mod.rs
  • crates/biome_css_parser/src/syntax/value/function/call.rs
  • crates/biome_css_parser/src/syntax/value/function/mod.rs
  • crates/biome_css_parser/src/syntax/value/url.rs
  • crates/biome_css_parser/src/token_source.rs
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss
💤 Files with no reviewable changes (1)
  • crates/biome_css_parser/src/syntax/value/function/call.rs

Comment thread crates/biome_css_parser/src/lexer/scan_cursor.rs
@codspeed-hq

codspeed-hq bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 29 untouched benchmarks
⏩ 227 skipped benchmarks [1]


Comparing db/scss-interpolation-4 (ee8b3a3) with main (9956f1d)

Open in CodSpeed

Footnotes

  1. 227 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions github-actions bot added the A-Formatter Area: formatter label Apr 12, 2026
@denbezrukov denbezrukov force-pushed the db/scss-interpolation-4 branch from 3c63dc4 to 58f624b on April 12, 2026 at 20:52
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
crates/biome_css_parser/src/lexer/source_cursor.rs (1)

93-96: ⚠️ Potential issue | 🟡 Minor

Treat form-feed as a newline-class escape terminator here too.

is_valid_escape_at() still accepts \\\u{000C} as a normal escape. That now leaks through every scanner that delegates to SourceCursor, including identifier and raw-URL scans.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/source_cursor.rs` around lines 93 - 96,
is_valid_escape_at currently treats a backslash followed by form-feed (U+000C)
as a valid escape; update the function to treat form-feed as a newline-class
terminator so such escapes are invalid. In the is_valid_escape_at(&self, offset:
usize) function, include the form-feed byte (b'\x0C') alongside b'\n' and b'\r'
in the matches!() check (and keep the existing None check) so sequences like
"\\\u{000C}" are rejected by SourceCursor and won't leak into identifier/raw-URL
scanners.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_css_parser/src/lexer/scan_cursor.rs`:
- Around line 785-794: scan_url_body_start() currently only classifies
interpolated functions via self.cursor.is_at_scss_interpolated_function(),
causing ordinary function heads (e.g. foo(...)) to be tokenised as raw values;
add an early check for ordinary function heads by calling a cursor predicate
like self.cursor.is_at_function_head() (or self.cursor.is_at_function()) and
return the appropriate UrlBodyStartScan variant (e.g. UrlBodyStartScan::Function
or UrlBodyStartScan::FunctionHead); if that UrlBodyStartScan variant doesn't
exist yet, add it and use it here before falling back to scan_url_raw_value().

In `@crates/biome_css_parser/src/syntax/value/url.rs`:
- Around line 88-103: The new URL parsing branches around
is_at_nth_url_modifier_function and the conditional bump with
CssLexContext::UrlBody (used by parse_url_value and
UrlModifierList::new(...).parse_list) need parser fixtures exercising both valid
and invalid cases; add snapshot tests that cover: valid interpolated heads and
modifier functions (e.g. url(#{name}(bar)), url(foo#{1 + 1}(bar))) and
malformed/interpolated-head errors (e.g. broken/missing interpolation inside the
function head), plus cases where the plain '(' branch vs UrlBody branch is
taken, placing them alongside existing parser tests for URL values so
AST/recovery behavior is asserted for both success and error snapshots.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d148a007-1f04-4640-908b-8355422e5871

📥 Commits

Reviewing files that changed from the base of the PR and between b058d22 and 58f624b.

⛔ Files ignored due to path filters (5)
  • crates/biome_css_formatter/tests/specs/css/scss/at-rule/container-general-enclosed.scss.snap is excluded by !**/*.snap and included by **
  • crates/biome_css_formatter/tests/specs/css/scss/at-rule/supports-general-enclosed.scss.snap is excluded by !**/*.snap and included by **
  • crates/biome_css_formatter/tests/specs/css/scss/declaration/url-interpolated-functions.scss.snap is excluded by !**/*.snap and included by **
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss.snap is excluded by !**/*.snap and included by **
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (15)
  • crates/biome_css_formatter/src/css/auxiliary/url_function.rs
  • crates/biome_css_formatter/tests/specs/css/scss/declaration/url-interpolated-functions.scss
  • crates/biome_css_parser/src/lexer/mod.rs
  • crates/biome_css_parser/src/lexer/scan_cursor.rs
  • crates/biome_css_parser/src/lexer/source_cursor.rs
  • crates/biome_css_parser/src/lexer/tests.rs
  • crates/biome_css_parser/src/syntax/scss/mod.rs
  • crates/biome_css_parser/src/syntax/scss/value/function.rs
  • crates/biome_css_parser/src/syntax/scss/value/mod.rs
  • crates/biome_css_parser/src/syntax/value/function/call.rs
  • crates/biome_css_parser/src/syntax/value/function/mod.rs
  • crates/biome_css_parser/src/syntax/value/url.rs
  • crates/biome_css_parser/src/token_source.rs
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss
💤 Files with no reviewable changes (1)
  • crates/biome_css_parser/src/syntax/value/function/call.rs
✅ Files skipped from review due to trivial changes (2)
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss
  • crates/biome_css_formatter/tests/specs/css/scss/declaration/url-interpolated-functions.scss
🚧 Files skipped from review as they are similar to previous changes (7)
  • crates/biome_css_parser/src/token_source.rs
  • crates/biome_css_formatter/src/css/auxiliary/url_function.rs
  • crates/biome_css_parser/src/syntax/scss/mod.rs
  • crates/biome_css_parser/src/syntax/scss/value/function.rs
  • crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss
  • crates/biome_css_parser/src/syntax/value/function/mod.rs
  • crates/biome_css_parser/src/syntax/scss/value/mod.rs

Comment thread crates/biome_css_parser/src/lexer/scan_cursor.rs
Comment thread crates/biome_css_parser/src/syntax/value/url.rs
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
crates/biome_css_parser/src/lexer/scan_cursor.rs (2)

851-859: ⚠️ Potential issue | 🟠 Major

Keep plain function heads out of the raw-URL path.

After skip_url_body_trivia(), a regular foo( still falls straight through to scan_url_raw_value(), so inputs like url(foo(bar)) or url(/*x*/foo(bar)) get tokenised as raw URL content and the parser never sees the function head. A plain identifier-function check needs to run before the raw scan and return Other.

Possible shape
+    fn is_at_plain_function(&self) -> bool {
+        let mut cursor = self.cursor;
+        if !cursor.is_ident_start() {
+            return false;
+        }
+
+        cursor.advance_ident_sequence(IdentifierScanMode::Standard);
+        cursor.current_byte() == Some(b'(')
+    }
+
     fn scan_url_body_start(&mut self, scss_exclusive_syntax_allowed: bool) -> UrlBodyStartScan {
         self.cursor.skip_url_body_trivia();
 
-        if scss_exclusive_syntax_allowed && self.cursor.is_at_scss_interpolated_function() {
+        if self.is_at_plain_function() {
+            UrlBodyStartScan::Other
+        } else if scss_exclusive_syntax_allowed && self.cursor.is_at_scss_interpolated_function() {
             UrlBodyStartScan::InterpolatedFunction
         } else {
             self.scan_url_raw_value()
                 .map_or(UrlBodyStartScan::Other, UrlBodyStartScan::RawValue)
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 851 - 859,
After calling self.cursor.skip_url_body_trivia() in scan_url_body_start, add a
check for a plain identifier followed by '(' (i.e. a non-interpolated function
head) before calling scan_url_raw_value: if the cursor indicates a simple
identifier-function (distinct from
self.cursor.is_at_scss_interpolated_function()), return UrlBodyStartScan::Other
so the parser will see the function head; otherwise proceed to call
self.scan_url_raw_value() and map to UrlBodyStartScan::RawValue as before.
Ensure you only short-circuit for plain identifier-function heads and keep the
existing interpolated-function branch that returns
UrlBodyStartScan::InterpolatedFunction.

380-392: ⚠️ Potential issue | 🟡 Minor

Treat form-feed as newline-class in string scanning.

\\\u{000C} still falls through the ordinary ASCII-escape branch, and a bare \u{000C} inside a quoted string does not stop with StringBodyScanStop::Newline. That lets malformed strings scan past their real boundary.

Small fix
         match self.current_byte() {
-            Some(b'\n') => {
+            Some(b'\n' | b'\x0C') => {
                 self.advance(1);
                 false
             }
             Some(b'\r') => {
                 let len = if self.peek_byte() == Some(b'\n') {
@@
-                WHS if matches!(byte, b'\n' | b'\r') => {
+                WHS if matches!(byte, b'\n' | b'\r' | b'\x0C') => {
                     let len = if byte == b'\r' && self.cursor.peek_byte() == Some(b'\n') {
                         2
                     } else {
                         1
                     };

Also applies to: 657-668

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 380 - 392, The
string-body scanning branch that matches self.current_byte() for newline
handling must also treat form-feed (0x0C) as newline-class so bare '\u{000C}'
stops with StringBodyScanStop::Newline; add a case for Some(b'\x0C') that
advances by 1 and returns the same false/newline behavior as the b'\n' arm.
Apply the same change to the duplicate match block around the 657-668 region so
both occurrences handle form-feed consistently. Ensure references to the match
on self.current_byte() and the StringBodyScanStop::Newline behavior are updated
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8bd12e01-a47a-437a-8fc9-ba4a95a625e3

📥 Commits

Reviewing files that changed from the base of the PR and between 7be6d4c and ce47131.

📒 Files selected for processing (3)
  • crates/biome_css_parser/src/lexer/mod.rs
  • crates/biome_css_parser/src/lexer/scan_cursor.rs
  • crates/biome_css_parser/src/lexer/tests.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_css_parser/src/lexer/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_css_parser/src/lexer/tests.rs

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs (1)

856-864: ⚠️ Potential issue | 🟠 Major

Keep non-interpolated function heads out of raw URL scanning.

Line 859 only special-cases interpolation-based heads. url(foo(bar)) and url(/*x*/foo(bar)) still fall through to scan_url_raw_value(), so the parser never sees foo(. Please return UrlBodyStartScan::Other for ordinary function heads before attempting raw scanning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 856 - 864, The
scanner currently only treats SCSS-interpolated function heads specially in
scan_url_body_start, so ordinary function heads like foo( or foo(/*...*/) fall
through to scan_url_raw_value; change scan_url_body_start to detect
non-interpolated function heads (i.e., an identifier/name followed by an opening
parenthesis possibly separated by URL-body trivia/comments) using the cursor API
and, when such an ordinary function head is found, return
UrlBodyStartScan::Other instead of calling scan_url_raw_value; keep the existing
special-case for self.cursor.is_at_scss_interpolated_function() that returns
UrlBodyStartScan::InterpolatedFunction and otherwise preserve the RawValue path
for true raw values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_css_parser/src/lexer/mod.rs`:
- Around line 1284-1316: The rewind for `-*` inside consume_ident_sequence is
currently gated on the parser option
`self.options.is_tailwind_directives_enabled()` which leaks parser state into
the lexer; change this so the lexer behavior is driven by lexer-local context
instead: add a lexer-context flag or an explicit parameter (e.g., a new bool
like `allow_tailwind_rewind` passed into or stored on the lexer) and replace the
check `if self.options.is_tailwind_directives_enabled() && scan.stop_byte ==
Some(b'*')` with a check against that lexer-local flag and `scan.stop_byte ==
Some(b'*')`; keep the existing rewind logic (inspect `byte_before`, adjust
`position` and `count`, call `self.set_position`) but do not reference
`self.options` from inside `consume_ident_sequence`.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 307ef4ee-c693-4a0b-bfde-1fca3065e89b

📥 Commits

Reviewing files that changed from the base of the PR and between 7be6d4c and 09d7d5d.

⛔ Files ignored due to path filters (1)
  • crates/biome_css_parser/tests/css_test_suite/error/scss/value/url-interpolated-function-name.scss.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (5)
  • crates/biome_css_parser/src/lexer/mod.rs
  • crates/biome_css_parser/src/lexer/scan_cursor.rs
  • crates/biome_css_parser/src/lexer/source_cursor.rs
  • crates/biome_css_parser/src/lexer/tests.rs
  • crates/biome_css_parser/tests/css_test_suite/error/scss/value/url-interpolated-function-name.scss
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_css_parser/src/lexer/source_cursor.rs

Comment thread crates/biome_css_parser/src/lexer/mod.rs
Member

@ematipico ematipico left a comment


Unfortunately I have difficulty understanding the changes; however, as long as we don't see any regressions, it should be safe to merge.

Great work!!

Comment threads (outdated, 5): crates/biome_css_parser/src/lexer/scan_cursor.rs
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs (1)

860-868: ⚠️ Potential issue | 🟠 Major

Don’t swallow plain function heads as raw URL text.

After trivia, this fast-path only exempts interpolated heads. Plain foo(bar) / format(woff2) still drop into scan_url_raw_value(), so the parser never sees the pre-existing plain function-head path. Please short-circuit ordinary function heads here as well, and pin it with a trivia-prefixed regression before this scanner gets any more clever than the rest of us.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 860 - 868,
After skipping trivia in scan_url_body_start, short-circuit ordinary
(non-interpolated) function heads so they are not treated as RawValue: add a
check after self.cursor.skip_url_body_trivia() (and after the existing
is_at_scss_interpolated_function check) that calls a cursor helper like
self.cursor.is_at_plain_function_head() (create it if missing to detect patterns
like identifier immediately followed by '(' with only trivia between) and if
true return UrlBodyStartScan::Other; otherwise fall through to the existing
self.scan_url_raw_value() path. This preserves the interpolated fast-path
(is_at_scss_interpolated_function), prevents swallowing plain
foo(bar)/format(woff2) as raw text, and keeps the current RawValue logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c6a6a7f3-cac0-42fd-ab6c-73d7fdade477

📥 Commits

Reviewing files that changed from the base of the PR and between 09d7d5d and 1fc4f29.

📒 Files selected for processing (3)
  • crates/biome_css_parser/src/lexer/mod.rs
  • crates/biome_css_parser/src/lexer/scan_cursor.rs
  • crates/biome_css_parser/src/lexer/tests.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_css_parser/src/lexer/mod.rs

Comment on lines +466 to +469
while let Some(current) = self.current_byte() {
    match current {
        b'\n' | b'\r' => return true,
        _ => self.advance_byte_or_char(current),
Contributor


⚠️ Potential issue | 🟡 Minor

Form-feed still doesn’t terminate // comments.

consume_line_comment() only stops on LF/CR. With // comment\u{000C}#{name}(bar), the scanner treats the comment as running to EOF, so both URL-trivia skipping and interpolation scanning miss the following tokens. Reusing is_newline_byte() here would keep FF aligned with the rest of this file’s newline-class handling.

🩹 Proposed fix
         while let Some(current) = self.current_byte() {
             match current {
-                b'\n' | b'\r' => return true,
+                byte if is_newline_byte(byte) => return true,
                 _ => self.advance_byte_or_char(current),
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 466 - 469,
consume_line_comment currently treats only LF/CR as line terminators; include
form-feed by reusing the existing newline-class helper: replace the match on
current in consume_line_comment() with a check using is_newline_byte(current) to
return true for any newline-class byte (including b'\x0C'), otherwise call
advance_byte_or_char(current); reference functions: consume_line_comment,
is_newline_byte, advance_byte_or_char, current_byte.
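The newline-class fix can be sketched in isolation. The following is a hypothetical standalone version: `is_newline_byte` and `line_comment_end` here are illustrative stand-ins for the review suggestion, not the actual `scan_cursor.rs` implementation.

```rust
// Sketch of the suggested newline class: LF, CR, and form feed all
// terminate a `//` line comment, so text after U+000C stays visible.
fn is_newline_byte(byte: u8) -> bool {
    matches!(byte, b'\n' | b'\r' | b'\x0C')
}

/// Returns the index just past the `//` comment body, stopping at any
/// newline-class byte instead of only LF/CR.
fn line_comment_end(input: &[u8], start: usize) -> usize {
    let mut pos = start;
    while pos < input.len() && !is_newline_byte(input[pos]) {
        pos += 1;
    }
    pos
}

fn main() {
    let src = b"// comment\x0C#{name}(bar)";
    let end = line_comment_end(src, 2);
    // The comment stops at the form feed, leaving `#{name}(bar)` visible.
    assert_eq!(end, 10);
    assert_eq!(&src[end + 1..], b"#{name}(bar)");
}
```

With the original LF/CR-only match, `line_comment_end` would run to the end of the input and the interpolation would be swallowed by the comment.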

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_css_parser/src/lexer/tests.rs`:
- Around line 120-146: Extend the checkpoint test to cover stateful scanner
state: create a lexer via CssLexer::from_str with an input that triggers a
multi-byte/stateful scan (e.g., a quoted string or a raw/url with spaces), call
next_token(CssLexContext::Regular) until you are mid-way through that stateful
scan, call checkpoint(), consume forward to produce tokens/ranges, call
rewind(checkpoint) and then assert that position(), current_byte(),
next_token(CssLexContext::Regular) and current_range() return the same token and
range as before rewinding; use the existing helpers (CssLexer::from_str,
next_token, checkpoint, rewind, position, current_byte, current_range) and
mirror the assert pattern from rewind_restores_lexer_source_cursor_position but
when checkpointing inside a stateful scan to verify scanner internal pending
state is restored.
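The checkpoint/rewind pattern the review asks the test to cover can be illustrated with a minimal cursor. `ByteCursor` below is a made-up stand-in for the lexer's `SourceCursor`, not the real API; only the checkpoint/rewind shape mirrors the suggestion.

```rust
// Minimal cursor with checkpoint/rewind, mirroring the test pattern
// suggested above. All names here are illustrative.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Checkpoint {
    position: usize,
}

struct ByteCursor<'a> {
    input: &'a [u8],
    position: usize,
}

impl<'a> ByteCursor<'a> {
    fn new(input: &'a [u8]) -> Self {
        Self { input, position: 0 }
    }
    fn current_byte(&self) -> Option<u8> {
        self.input.get(self.position).copied()
    }
    fn advance(&mut self) {
        self.position += 1;
    }
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { position: self.position }
    }
    fn rewind(&mut self, checkpoint: Checkpoint) {
        self.position = checkpoint.position;
    }
}

fn main() {
    let mut cursor = ByteCursor::new(b"url(\"a.png\" fmt(woff2))");
    // Advance mid-way through the url body, then checkpoint.
    for _ in 0..6 {
        cursor.advance();
    }
    let saved = cursor.checkpoint();
    let byte_before = cursor.current_byte();

    // Consume forward, then rewind and verify the state is restored.
    while cursor.current_byte() != Some(b')') {
        cursor.advance();
    }
    cursor.rewind(saved);
    assert_eq!(cursor.position, 6);
    assert_eq!(cursor.current_byte(), byte_before);
}
```

The point of checkpointing *inside* a stateful scan is that any pending scanner state (not just the byte position) must round-trip through `rewind`; the real test would assert on `next_token` and `current_range` as well.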

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17b7a875-dc23-452a-b6b6-eb4f1a9c8c14

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc4f29 and 4a7e808.

📒 Files selected for processing (2)
  • crates/biome_css_parser/src/lexer/scan_cursor.rs
  • crates/biome_css_parser/src/lexer/tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_css_parser/src/lexer/scan_cursor.rs

Comment thread crates/biome_css_parser/src/lexer/tests.rs
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
crates/biome_css_parser/src/lexer/scan_cursor.rs (2)

448-451: ⚠️ Potential issue | 🟡 Minor

Treat form-feed as a line terminator in // comments.

Line 450 currently stops only on LF/CR, so //...\u{000C} runs to EOF and can hide following tokens.

🩹 Proposed fix
         while let Some(current) = self.current_byte() {
             match current {
-                b'\n' | b'\r' => return true,
+                byte if is_newline_byte(byte) => return true,
                 _ => self.advance_byte_or_char(current),
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 448 - 451, the
line-comment scanning loop in scan_cursor (the while-let using current_byte()
and advance_byte_or_char) only treats LF and CR as terminators; add form-feed as
a third terminator so `//` comments stop on U+000C as well (i.e., include
b'\x0C' or b'\f' in the match arm alongside b'\n' and b'\r') to ensure `//...␌`
doesn't run to EOF and hide following tokens.

845-849: ⚠️ Potential issue | 🟠 Major

Plain function heads after skipped trivia still fall into raw URL scanning.

At Line 845-849, only interpolated function heads bypass raw-value scanning. Inputs like url(/*x*/foo(bar)) still become raw literals, so the parser never sees foo( as a function head.

🧭 Suggested direction
     fn scan_url_body_start(&mut self, scss_exclusive_syntax_allowed: bool) -> UrlBodyStartScan {
         self.cursor.skip_url_body_trivia();

         if scss_exclusive_syntax_allowed && self.cursor.is_at_scss_interpolated_function() {
             UrlBodyStartScan::InterpolatedFunction
+        } else if self.is_at_plain_function_head() {
+            UrlBodyStartScan::Other
         } else {
             self.scan_url_raw_value()
                 .map_or(UrlBodyStartScan::Other, UrlBodyStartScan::RawValue)
         }
     }

And add a small lookahead helper in UrlBodyScanner:

fn is_at_plain_function_head(&self) -> bool {
    let mut lookahead = self.cursor;
    if !lookahead.is_ident_start() {
        return false;
    }
    lookahead.advance_ident_sequence();
    lookahead.current_byte() == Some(b'(')
}
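The same lookahead can be demonstrated on a plain byte slice. This is a self-contained sketch of the idea, not the `UrlBodyScanner` API; the ident classification below is deliberately simplified (it ignores escapes and leading `-`).

```rust
// Detect a plain `ident(` function head by peeking, without committing
// the cursor. Simplified ident rules for illustration only.
fn is_ident_byte(byte: u8, first: bool) -> bool {
    byte == b'_'
        || byte.is_ascii_alphabetic()
        || (!first && (byte == b'-' || byte.is_ascii_digit()))
}

/// Reports whether `input[start..]` begins with an identifier
/// immediately followed by `(`.
fn is_at_plain_function_head(input: &[u8], start: usize) -> bool {
    let mut pos = start;
    // Require an identifier-start byte first.
    match input.get(pos) {
        Some(&byte) if is_ident_byte(byte, true) => pos += 1,
        _ => return false,
    }
    // Advance over the rest of the identifier sequence.
    while matches!(input.get(pos), Some(&byte) if is_ident_byte(byte, false)) {
        pos += 1;
    }
    input.get(pos) == Some(&b'(')
}

fn main() {
    assert!(is_at_plain_function_head(b"foo(bar)", 0));
    // After skipping `/*x*/`, `foo(` is a function head, not a raw value.
    assert!(is_at_plain_function_head(b"/*x*/foo(bar)", 5));
    assert!(!is_at_plain_function_head(b"a.png", 0));
}
```

Because the lookahead works on a copy of the position, a negative answer leaves the scanner free to fall back to `scan_url_raw_value()` unchanged.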
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 845 - 849, the
scan currently only treats scss interpolated function heads as functions,
causing inputs like `url(/*x*/foo(bar))` to be mis-scanned as raw; add a small
lookahead helper on UrlBodyScanner (e.g., fn is_at_plain_function_head(&self) ->
bool) that clones the cursor, skips the same trivia/comments you skip before
scanning, then checks lookahead.is_ident_start(), calls
lookahead.advance_ident_sequence(), and returns lookahead.current_byte() ==
Some(b'('); then update the condition at the UrlBodyStartScan decision site to
treat plain function heads as UrlBodyStartScan::InterpolatedFunction (or the
appropriate variant) when either is_at_scss_interpolated_function() or
is_at_plain_function_head() is true, otherwise fall back to scan_url_raw_value()
-> UrlBodyStartScan::RawValue/Other.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/biome_css_parser/src/lexer/scan_cursor.rs`:
- Around line 448-451: The line-comment scanning loop in scan_cursor (the
while-let using current_byte() and advance_byte_or_char) only treats LF and CR
as terminators; add form-feed as a third terminator so `//` comments stop on
U+000C as well (i.e., include b'\x0C' or b'\f' in the match arm alongside b'\n'
and b'\r') to ensure `//...␌` doesn't run to EOF and hide following tokens.
- Around line 845-849: The scan currently only treats scss interpolated function
heads as functions, causing inputs like `url(/*x*/foo(bar))` to be mis-scanned
as raw; add a small lookahead helper on UrlBodyScanner (e.g., fn
is_at_plain_function_head(&self) -> bool) that clones the cursor, skips the same
trivia/comments you skip before scanning, then checks
lookahead.is_ident_start(), calls lookahead.advance_ident_sequence(), and
returns lookahead.current_byte() == Some(b'('); then update the condition at the
UrlBodyStartScan decision site to treat plain function heads as
UrlBodyStartScan::InterpolatedFunction (or the appropriate variant) when either
is_at_scss_interpolated_function() or is_at_plain_function_head() is true,
otherwise fall back to scan_url_raw_value() -> UrlBodyStartScan::RawValue/Other.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f77f6c08-25cc-4332-a789-0db38e703233

📥 Commits

Reviewing files that changed from the base of the PR and between 4a7e808 and ee8b3a3.

📒 Files selected for processing (1)
  • crates/biome_css_parser/src/lexer/scan_cursor.rs

@denbezrukov denbezrukov merged commit 6a19cea into main Apr 16, 2026
28 checks passed
@denbezrukov denbezrukov deleted the db/scss-interpolation-4 branch April 16, 2026 08:24

Labels

A-Formatter Area: formatter A-Parser Area: parser L-CSS Language: CSS and super languages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants