feat(css): implement shared CSS scanner for parsing SCSS interpolations and URL bodies #9948
denbezrukov merged 13 commits into main
Conversation
Walkthrough: Extracts byte/char navigation into a new SourceCursor shared by the lexer and lookahead code.
🚥 Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs (1)
421-427: Prefer lookup_byte for whitespace classification here.
This brings back a hand-rolled whitespace set in the middle of a scanner that otherwise already leans on lookup_byte. Using the dispatch table keeps the classification path consistent and saves one more byte list to babysit. Based on learnings: "In the biomejs/biome codebase, when handling whitespace and character classification, use the whitespace/character classification dispatch tables from biome_unicode_table::lookup_byte (e.g., Dispatch::WHS) rather than ad-hoc raw byte literal comparisons (e.g., b <= b' ' or explicit b == b'\t')."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 421 - 427, In skip_whitespace_and_block_comments, replace the ad‑hoc byte comparisons (the matches!(byte, b'\t' | b' ' | b'\n' | b'\r' | 0x0C) loop) with the biome_unicode_table::lookup_byte dispatch table check: call lookup_byte on the current byte and test for Dispatch::WHS to classify whitespace, then advance as before; this keeps classification consistent with other scanner logic and removes the hand‑rolled byte set.
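The dispatch-table idea the reviewer references can be sketched in isolation. This is a minimal stand-in, not biome_unicode_table's actual API: `Dispatch`, `TABLE`, and `lookup_byte` below are illustrative, and the toy table only models the WHS (whitespace) class where the real one covers every byte class.

```rust
// Toy dispatch table modelling only the WHS class from the reviewer's note.
#[derive(Clone, Copy, PartialEq)]
enum Dispatch {
    Whs,
    Other,
}

const TABLE: [Dispatch; 256] = {
    let mut table = [Dispatch::Other; 256];
    table[b'\t' as usize] = Dispatch::Whs;
    table[b'\n' as usize] = Dispatch::Whs;
    table[0x0C] = Dispatch::Whs; // form feed
    table[b'\r' as usize] = Dispatch::Whs;
    table[b' ' as usize] = Dispatch::Whs;
    table
};

fn lookup_byte(byte: u8) -> Dispatch {
    TABLE[byte as usize]
}

/// One classification path instead of a hand-rolled `matches!` byte list:
/// returns the number of leading whitespace bytes.
fn skip_whitespace(input: &[u8]) -> usize {
    let mut position = 0;
    while input
        .get(position)
        .is_some_and(|&byte| lookup_byte(byte) == Dispatch::Whs)
    {
        position += 1;
    }
    position
}
```

The point of the table is that adding or removing a byte class happens in exactly one place, instead of in every `matches!` scattered through the scanner.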
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_css_parser/src/lexer/scan_cursor.rs`:
- Around line 160-163: The escape-handling logic needs to treat form-feed
(U+000C) as a newline-class character: update is_valid_escape_at and
consume_string_escape to include '\u{000C}' (0x0C) alongside LF ('\n') and CR
('\r') in the newline checks so sequences like "\\\u{000C}" are not treated as
ordinary escapes; apply the same change to the other escape-related checks in
the same file (around the 310-346 region) so all places that special-case LF/CR
also consider form-feed as newline-class.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9737d259-0733-4bdf-9fc5-479e4b08f996
⛔ Files ignored due to path filters (2)
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss.snap is excluded by !**/*.snap and included by **
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (13)
- crates/biome_css_parser/src/lexer/mod.rs
- crates/biome_css_parser/src/lexer/scan_cursor.rs
- crates/biome_css_parser/src/lexer/source_cursor.rs
- crates/biome_css_parser/src/lexer/tests.rs
- crates/biome_css_parser/src/syntax/scss/mod.rs
- crates/biome_css_parser/src/syntax/scss/value/function.rs
- crates/biome_css_parser/src/syntax/scss/value/mod.rs
- crates/biome_css_parser/src/syntax/value/function/call.rs
- crates/biome_css_parser/src/syntax/value/function/mod.rs
- crates/biome_css_parser/src/syntax/value/url.rs
- crates/biome_css_parser/src/token_source.rs
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss
💤 Files with no reviewable changes (1)
- crates/biome_css_parser/src/syntax/value/function/call.rs
Merging this PR will not alter performance
…ns and URL bodies
…rpolated expressions
…general-enclosed at-rules
Force-pushed from 3c63dc4 to 58f624b
Actionable comments posted: 2
♻️ Duplicate comments (1)
crates/biome_css_parser/src/lexer/source_cursor.rs (1)
93-96: ⚠️ Potential issue | 🟡 Minor — Treat form-feed as a newline-class escape terminator here too.
is_valid_escape_at() still accepts \\\u{000C} as a normal escape. That now leaks through every scanner that delegates to SourceCursor, including identifier and raw-URL scans.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/source_cursor.rs` around lines 93 - 96, is_valid_escape_at currently treats a backslash followed by form-feed (U+000C) as a valid escape; update the function to treat form-feed as a newline-class terminator so such escapes are invalid. In the is_valid_escape_at(&self, offset: usize) function, include the form-feed byte (b'\x0C') alongside b'\n' and b'\r' in the matches!() check (and keep the existing None check) so sequences like "\\\u{000C}" are rejected by SourceCursor and won't leak into identifier/raw-URL scanners.
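The fix the reviewer asks for boils down to the CSS Syntax rule for "check if two code points are a valid escape": a backslash followed by EOF or a newline-class byte is not a valid escape. A minimal, self-contained sketch of that check — `is_newline_byte` and `is_valid_escape_at` here are free functions over a byte slice, not the actual SourceCursor methods:

```rust
/// CSS treats LF, CR, and form feed (U+000C) as newline-class bytes.
fn is_newline_byte(byte: u8) -> bool {
    matches!(byte, b'\n' | b'\r' | b'\x0C')
}

/// A `\` starts a valid escape only if something follows it and that
/// something is not a newline-class byte.
fn is_valid_escape_at(input: &[u8], offset: usize) -> bool {
    if input.get(offset) != Some(&b'\\') {
        return false;
    }
    match input.get(offset + 1) {
        None => false, // backslash at EOF is not an escape
        Some(&byte) => !is_newline_byte(byte),
    }
}
```

With form feed included in `is_newline_byte`, the `"\\\u{000C}"` sequence from the review comment is rejected instead of being consumed as an ordinary escape.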
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_css_parser/src/lexer/scan_cursor.rs`:
- Around line 785-794: scan_url_body_start() currently only classifies
interpolated functions via self.cursor.is_at_scss_interpolated_function(),
causing ordinary function heads (e.g. foo(...)) to be tokenised as raw values;
add an early check for ordinary function heads by calling a cursor predicate
like self.cursor.is_at_function_head() (or self.cursor.is_at_function()) and
return the appropriate UrlBodyStartScan variant (e.g. UrlBodyStartScan::Function
or UrlBodyStartScan::FunctionHead); if that UrlBodyStartScan variant doesn't
exist yet, add it and use it here before falling back to scan_url_raw_value().
In `@crates/biome_css_parser/src/syntax/value/url.rs`:
- Around line 88-103: The new URL parsing branches around
is_at_nth_url_modifier_function and the conditional bump with
CssLexContext::UrlBody (used by parse_url_value and
UrlModifierList::new(...).parse_list) need parser fixtures exercising both valid
and invalid cases; add snapshot tests that cover: valid interpolated heads and
modifier functions (e.g. url(#{name}(bar)), url(foo#{1 + 1}(bar))) and
malformed/interpolated-head errors (e.g. broken/missing interpolation inside the
function head), plus cases where the plain '(' branch vs UrlBody branch is
taken, placing them alongside existing parser tests for URL values so
AST/recovery behavior is asserted for both success and error snapshots.
⛔ Files ignored due to path filters (5)
- crates/biome_css_formatter/tests/specs/css/scss/at-rule/container-general-enclosed.scss.snap is excluded by !**/*.snap and included by **
- crates/biome_css_formatter/tests/specs/css/scss/at-rule/supports-general-enclosed.scss.snap is excluded by !**/*.snap and included by **
- crates/biome_css_formatter/tests/specs/css/scss/declaration/url-interpolated-functions.scss.snap is excluded by !**/*.snap and included by **
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss.snap is excluded by !**/*.snap and included by **
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (15)
- crates/biome_css_formatter/src/css/auxiliary/url_function.rs
- crates/biome_css_formatter/tests/specs/css/scss/declaration/url-interpolated-functions.scss
- crates/biome_css_parser/src/lexer/mod.rs
- crates/biome_css_parser/src/lexer/scan_cursor.rs
- crates/biome_css_parser/src/lexer/source_cursor.rs
- crates/biome_css_parser/src/lexer/tests.rs
- crates/biome_css_parser/src/syntax/scss/mod.rs
- crates/biome_css_parser/src/syntax/scss/value/function.rs
- crates/biome_css_parser/src/syntax/scss/value/mod.rs
- crates/biome_css_parser/src/syntax/value/function/call.rs
- crates/biome_css_parser/src/syntax/value/function/mod.rs
- crates/biome_css_parser/src/syntax/value/url.rs
- crates/biome_css_parser/src/token_source.rs
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss
💤 Files with no reviewable changes (1)
- crates/biome_css_parser/src/syntax/value/function/call.rs
✅ Files skipped from review due to trivial changes (2)
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/interpolated-minus-line-break.scss
- crates/biome_css_formatter/tests/specs/css/scss/declaration/url-interpolated-functions.scss
🚧 Files skipped from review as they are similar to previous changes (7)
- crates/biome_css_parser/src/token_source.rs
- crates/biome_css_formatter/src/css/auxiliary/url_function.rs
- crates/biome_css_parser/src/syntax/scss/mod.rs
- crates/biome_css_parser/src/syntax/scss/value/function.rs
- crates/biome_css_parser/tests/css_test_suite/ok/scss/value/url.scss
- crates/biome_css_parser/src/syntax/value/function/mod.rs
- crates/biome_css_parser/src/syntax/scss/value/mod.rs
…rpolated functions
…ts for ASCII and Tailwind scenarios
…Tailwind `-*` fixup adjustments
♻️ Duplicate comments (2)
crates/biome_css_parser/src/lexer/scan_cursor.rs (2)
851-859: ⚠️ Potential issue | 🟠 Major — Keep plain function heads out of the raw-URL path.
After skip_url_body_trivia(), a regular foo( still falls straight through to scan_url_raw_value(), so inputs like url(foo(bar)) or url(/*x*/foo(bar)) get tokenised as raw URL content and the parser never sees the function head. A plain identifier-function check needs to run before the raw scan and return Other.
Possible shape:
```diff
+fn is_at_plain_function(&self) -> bool {
+    let mut cursor = self.cursor;
+    if !cursor.is_ident_start() {
+        return false;
+    }
+
+    cursor.advance_ident_sequence(IdentifierScanMode::Standard);
+    cursor.current_byte() == Some(b'(')
+}
+
 fn scan_url_body_start(&mut self, scss_exclusive_syntax_allowed: bool) -> UrlBodyStartScan {
     self.cursor.skip_url_body_trivia();

-    if scss_exclusive_syntax_allowed && self.cursor.is_at_scss_interpolated_function() {
+    if self.is_at_plain_function() {
+        UrlBodyStartScan::Other
+    } else if scss_exclusive_syntax_allowed && self.cursor.is_at_scss_interpolated_function() {
         UrlBodyStartScan::InterpolatedFunction
     } else {
         self.scan_url_raw_value()
             .map_or(UrlBodyStartScan::Other, UrlBodyStartScan::RawValue)
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 851 - 859, After calling self.cursor.skip_url_body_trivia() in scan_url_body_start, add a check for a plain identifier followed by '(' (i.e. a non-interpolated function head) before calling scan_url_raw_value: if the cursor indicates a simple identifier-function (distinct from self.cursor.is_at_scss_interpolated_function()), return UrlBodyStartScan::Other so the parser will see the function head; otherwise proceed to call self.scan_url_raw_value() and map to UrlBodyStartScan::RawValue as before. Ensure you only short-circuit for plain identifier-function heads and keep the existing interpolated-function branch that returns UrlBodyStartScan::InterpolatedFunction.
380-392: ⚠️ Potential issue | 🟡 Minor — Treat form-feed as newline-class in string scanning.
\\\u{000C} still falls through the ordinary ASCII-escape branch, and a bare \u{000C} inside a quoted string does not stop with StringBodyScanStop::Newline. That lets malformed strings scan past their real boundary.
Small fix:
```diff
 match self.current_byte() {
-    Some(b'\n') => {
+    Some(b'\n' | b'\x0C') => {
         self.advance(1);
         false
     }
     Some(b'\r') => {
         let len = if self.peek_byte() == Some(b'\n') {
@@
-    WHS if matches!(byte, b'\n' | b'\r') => {
+    WHS if matches!(byte, b'\n' | b'\r' | b'\x0C') => {
         let len = if byte == b'\r' && self.cursor.peek_byte() == Some(b'\n') {
             2
         } else {
             1
         };
```
Also applies to: 657-668
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 380 - 392, The string-body scanning branch that matches self.current_byte() for newline handling must also treat form-feed (0x0C) as newline-class so bare '\u{000C}' stops with StringBodyScanStop::Newline; add a case for Some(b'\x0C') that advances by 1 and returns the same false/newline behavior as the b'\n' arm. Apply the same change to the duplicate match block around the 657-668 region so both occurrences handle form-feed consistently. Ensure references to the match on self.current_byte() and the StringBodyScanStop::Newline behavior are updated accordingly.
📒 Files selected for processing (3)
- crates/biome_css_parser/src/lexer/mod.rs
- crates/biome_css_parser/src/lexer/scan_cursor.rs
- crates/biome_css_parser/src/lexer/tests.rs
✅ Files skipped from review due to trivial changes (1)
- crates/biome_css_parser/src/lexer/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/biome_css_parser/src/lexer/tests.rs
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs (1)
856-864: ⚠️ Potential issue | 🟠 Major — Keep non-interpolated function heads out of raw URL scanning.
Line 859 only special-cases interpolation-based heads. url(foo(bar)) and url(/*x*/foo(bar)) still fall through to scan_url_raw_value(), so the parser never sees foo(. Please return UrlBodyStartScan::Other for ordinary function heads before attempting raw scanning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 856 - 864, The scanner currently only treats SCSS-interpolated function heads specially in scan_url_body_start, so ordinary function heads like foo( or foo(/*...*/) fall through to scan_url_raw_value; change scan_url_body_start to detect non-interpolated function heads (i.e., an identifier/name followed by an opening parenthesis possibly separated by URL-body trivia/comments) using the cursor API and, when such an ordinary function head is found, return UrlBodyStartScan::Other instead of calling scan_url_raw_value; keep the existing special-case for self.cursor.is_at_scss_interpolated_function() that returns UrlBodyStartScan::InterpolatedFunction and otherwise preserve the RawValue path for true raw values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_css_parser/src/lexer/mod.rs`:
- Around line 1284-1316: The rewind for `-*` inside consume_ident_sequence is
currently gated on the parser option
`self.options.is_tailwind_directives_enabled()` which leaks parser state into
the lexer; change this so the lexer behavior is driven by lexer-local context
instead: add a lexer-context flag or an explicit parameter (e.g., a new bool
like `allow_tailwind_rewind` passed into or stored on the lexer) and replace the
check `if self.options.is_tailwind_directives_enabled() && scan.stop_byte ==
Some(b'*')` with a check against that lexer-local flag and `scan.stop_byte ==
Some(b'*')`; keep the existing rewind logic (inspect `byte_before`, adjust
`position` and `count`, call `self.set_position`) but do not reference
`self.options` from inside `consume_ident_sequence`.
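The separation the prompt above asks for can be sketched on a stripped-down lexer. `IdentScanContext` and `MiniLexer` are hypothetical stand-ins (not biome's types): the point is that the `-*` rewind is selected by a context value handed to the scan, not by reading parser options from inside the lexer.

```rust
/// Hypothetical lexer-local context replacing a parser-options check.
#[derive(Clone, Copy, PartialEq)]
enum IdentScanContext {
    Standard,
    /// Rewind before a trailing `-` when the next byte is `*` (Tailwind `-*`).
    TailwindRewind,
}

struct MiniLexer<'a> {
    source: &'a [u8],
    position: usize,
}

impl<'a> MiniLexer<'a> {
    /// Consume an identifier-like run; the fixup is driven by `context`,
    /// so the lexer never touches parser state.
    fn consume_ident_sequence(&mut self, context: IdentScanContext) -> &'a [u8] {
        let start = self.position;
        while self
            .source
            .get(self.position)
            .is_some_and(|b| b.is_ascii_alphanumeric() || *b == b'-')
        {
            self.position += 1;
        }
        // Context-driven fixup: in Tailwind mode, `variant-*` lexes as
        // `variant` with the `-*` left for the next token.
        if context == IdentScanContext::TailwindRewind
            && self.position > start
            && self.source[self.position - 1] == b'-'
            && self.source.get(self.position) == Some(&b'*')
        {
            self.position -= 1;
        }
        &self.source[start..self.position]
    }
}
```

The same scan then behaves differently per call site without the lexer ever inspecting `self.options`.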
⛔ Files ignored due to path filters (1)
- crates/biome_css_parser/tests/css_test_suite/error/scss/value/url-interpolated-function-name.scss.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (5)
- crates/biome_css_parser/src/lexer/mod.rs
- crates/biome_css_parser/src/lexer/scan_cursor.rs
- crates/biome_css_parser/src/lexer/source_cursor.rs
- crates/biome_css_parser/src/lexer/tests.rs
- crates/biome_css_parser/tests/css_test_suite/error/scss/value/url-interpolated-function-name.scss
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/biome_css_parser/src/lexer/source_cursor.rs
ematipico left a comment:
Unfortunately I have difficulty understanding the changes; however, if we don't have any sort of regressions, it should be safe to merge.
Great work!!
… body scanning, and clarify invalid escape handling
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs (1)
860-868: ⚠️ Potential issue | 🟠 Major — Don't swallow plain function heads as raw URL text.
After trivia, this fast-path only exempts interpolated heads. Plain foo(bar) / format(woff2) still drop into scan_url_raw_value(), so the parser never sees the pre-existing plain function-head path. Please short-circuit ordinary function heads here as well, and pin it with a trivia-prefixed regression test before this scanner gets any more clever than the rest of us.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 860 - 868, After skipping trivia in scan_url_body_start, short-circuit ordinary (non-interpolated) function heads so they are not treated as RawValue: add a check after self.cursor.skip_url_body_trivia() (and after the existing is_at_scss_interpolated_function check) that calls a cursor helper like self.cursor.is_at_plain_function_head() (create it if missing to detect patterns like identifier immediately followed by '(' with only trivia between) and if true return UrlBodyStartScan::Other; otherwise fall through to the existing self.scan_url_raw_value() path. This preserves the interpolated fast-path (is_at_scss_interpolated_function), prevents swallowing plain foo(bar)/format(woff2) as raw text, and keeps the current RawValue logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_css_parser/src/lexer/scan_cursor.rs`:
- Around line 466-469: consume_line_comment currently treats only LF/CR as line
terminators; include form-feed by reusing the existing newline-class helper:
replace the match on current in consume_line_comment() with a check using
is_newline_byte(current) to return true for any newline-class byte (including
b'\x0C'), otherwise call advance_byte_or_char(current); reference functions:
consume_line_comment, is_newline_byte, advance_byte_or_char, current_byte.
📒 Files selected for processing (3)
crates/biome_css_parser/src/lexer/mod.rscrates/biome_css_parser/src/lexer/scan_cursor.rscrates/biome_css_parser/src/lexer/tests.rs
✅ Files skipped from review due to trivial changes (1)
- crates/biome_css_parser/src/lexer/mod.rs
```rust
while let Some(current) = self.current_byte() {
    match current {
        b'\n' | b'\r' => return true,
        _ => self.advance_byte_or_char(current),
```
Form-feed still doesn't terminate // comments.
consume_line_comment() only stops on LF/CR. With // comment\u{000C}#{name}(bar), the scanner treats the comment as running to EOF, so both URL-trivia skipping and interpolation scanning miss the following tokens. Reusing is_newline_byte() here would keep FF aligned with the rest of this file's newline-class handling.
🩹 Proposed fix
```diff
 while let Some(current) = self.current_byte() {
     match current {
-        b'\n' | b'\r' => return true,
+        byte if is_newline_byte(byte) => return true,
         _ => self.advance_byte_or_char(current),
     }
 }
```
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
while let Some(current) = self.current_byte() {
    match current {
        byte if is_newline_byte(byte) => return true,
        _ => self.advance_byte_or_char(current),
    }
}
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_css_parser/src/lexer/tests.rs`:
- Around line 120-146: Extend the checkpoint test to cover stateful scanner
state: create a lexer via CssLexer::from_str with an input that triggers a
multi-byte/stateful scan (e.g., a quoted string or a raw/url with spaces), call
next_token(CssLexContext::Regular) until you are mid-way through that stateful
scan, call checkpoint(), consume forward to produce tokens/ranges, call
rewind(checkpoint) and then assert that position(), current_byte(),
next_token(CssLexContext::Regular) and current_range() return the same token and
range as before rewinding; use the existing helpers (CssLexer::from_str,
next_token, checkpoint, rewind, position, current_byte, current_range) and
mirror the assert pattern from rewind_restores_lexer_source_cursor_position but
when checkpointing inside a stateful scan to verify scanner internal pending
state is restored.
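The shape of the suggested checkpoint/rewind test is easier to see on a stripped-down cursor. This sketch is hypothetical — `Cursor` and `Checkpoint` are stand-ins for the lexer's source cursor, and the real test must also cover scanner-internal pending state, which a plain position snapshot like this one deliberately does not capture:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Checkpoint {
    position: usize,
}

struct Cursor<'a> {
    source: &'a [u8],
    position: usize,
}

impl<'a> Cursor<'a> {
    fn new(source: &'a [u8]) -> Self {
        Self { source, position: 0 }
    }

    fn current_byte(&self) -> Option<u8> {
        self.source.get(self.position).copied()
    }

    fn advance(&mut self, count: usize) {
        self.position = (self.position + count).min(self.source.len());
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { position: self.position }
    }

    /// Rewinding must restore everything `checkpoint()` captured; if the
    /// lexer carries extra scan state, `Checkpoint` must carry it too.
    fn rewind(&mut self, checkpoint: Checkpoint) {
        self.position = checkpoint.position;
    }
}
```

The suggested test strengthens exactly this contract: take the checkpoint mid-way through a stateful scan, consume forward, rewind, and assert the next token and range match what was observed before rewinding.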
📒 Files selected for processing (2)
- crates/biome_css_parser/src/lexer/scan_cursor.rs
- crates/biome_css_parser/src/lexer/tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/biome_css_parser/src/lexer/scan_cursor.rs
…anCursor` invalid escape handling
♻️ Duplicate comments (2)
crates/biome_css_parser/src/lexer/scan_cursor.rs (2)
448-451: ⚠️ Potential issue | 🟡 Minor — Treat form-feed as a line terminator in // comments.
Line 450 currently stops only on LF/CR, so //...\u{000C} runs to EOF and can hide following tokens.
🩹 Proposed fix
```diff
 while let Some(current) = self.current_byte() {
     match current {
-        b'\n' | b'\r' => return true,
+        byte if is_newline_byte(byte) => return true,
         _ => self.advance_byte_or_char(current),
     }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 448 - 451, The line-comment scanning loop in scan_cursor (the while-let using current_byte() and advance_byte_or_char) only treats LF and CR as terminators; add form-feed as a third terminator so `//` comments stop on U+000C as well (i.e., include b'\x0C' or b'\f' in the match arm alongside b'\n' and b'\r') to ensure `//...␌` doesn't run to EOF and hide following tokens.
845-849: ⚠️ Potential issue | 🟠 Major — Plain function heads after skipped trivia still fall into raw URL scanning.
At lines 845-849, only interpolated function heads bypass raw-value scanning. Inputs like url(/*x*/foo(bar)) still become raw literals, so the parser never sees foo( as a function head.
🧭 Suggested direction
```diff
 fn scan_url_body_start(&mut self, scss_exclusive_syntax_allowed: bool) -> UrlBodyStartScan {
     self.cursor.skip_url_body_trivia();

     if scss_exclusive_syntax_allowed && self.cursor.is_at_scss_interpolated_function() {
         UrlBodyStartScan::InterpolatedFunction
+    } else if self.is_at_plain_function_head() {
+        UrlBodyStartScan::Other
     } else {
         self.scan_url_raw_value()
             .map_or(UrlBodyStartScan::Other, UrlBodyStartScan::RawValue)
     }
 }
```
And add a small lookahead helper in UrlBodyScanner:
```rust
fn is_at_plain_function_head(&self) -> bool {
    let mut lookahead = self.cursor;
    if !lookahead.is_ident_start() {
        return false;
    }
    lookahead.advance_ident_sequence();
    lookahead.current_byte() == Some(b'(')
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_css_parser/src/lexer/scan_cursor.rs` around lines 845 - 849, The scan currently only treats scss interpolated function heads as functions, causing inputs like `url(/*x*/foo(bar))` to be mis-scanned as raw; add a small lookahead helper on UrlBodyScanner (e.g., fn is_at_plain_function_head(&self) -> bool) that clones the cursor, skips the same trivia/commments you skip before scanning, then checks lookahead.is_ident_start(), calls lookahead.advance_ident_sequence(), and returns lookahead.current_byte() == Some(b'('); then update the condition at the UrlBodyStartScan decision site to treat plain function heads as UrlBodyStartScan::InterpolatedFunction (or the appropriate variant) when either is_at_scss_interpolated_function() or is_at_plain_function_head() is true, otherwise fall back to scan_url_raw_value() -> UrlBodyStartScan::RawValue/Other.
📒 Files selected for processing (1)
crates/biome_css_parser/src/lexer/scan_cursor.rs
Summary
Fix SCSS url(...) parsing for interpolation-based function names and simplify the shared lexer scan layer.
The main bug was that valid SCSS function-call shapes inside url(...) were not preserved correctly. The lexer could classify the URL body too early, and the parser only handled plain identifier-based function heads there, so interpolation-based names were lost or rejected.
Now we parse these SCSS forms correctly, e.g. url(#{name}(bar)) and url(foo#{1 + 1}(bar)).
This PR also refactors the CSS lexer scan layer so source navigation and CSS-aware scanning are shared
through SourceCursor and CssScanCursor instead of being duplicated between lexer and lookahead code.
Test Plan