perf(parse/tw): avoid going into basename store trie when not needed#8528
perf(parse/tw): avoid going into basename store trie when not needed#8528
Conversation
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
77eeab3 to
b8d3bcb
Compare
8d85823 to
4ea9996
Compare
4ea9996 to
a020914
Compare
CodSpeed Performance ReportMerging #8528 will not alter performanceComparing Summary
Footnotes
|
WalkthroughThis PR optimises the Tailwind basename tokenisation by introducing an ASCII fast-path that scans for dashes and delimiters before consulting the trie-based BASENAME_STORE. The Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)
125-161: Add explicit test case for thedatakeyword.The fast-path logic is exercised through existing tests (e.g., "block", "hover:") and the property-based losslessness test. However, there's no explicit test for the "data" keyword, which receives special handling in
consume_base. Add a test case for "data" as a standalone token to document this behaviour.
🧹 Nitpick comments (2)
crates/biome_tailwind_parser/src/lexer/mod.rs (2)
131-150: Nice optimisation! Fast-path logic looks sound.The ASCII scan correctly handles undashed basenames before consulting the trie, with proper fallback when hitting a dash. The early exits for "data" and simple basenames are well-positioned.
One clarification: the boundary check on line 147 intentionally excludes dash (unlike the trie's
is_boundary_byte), ensuring dashed basenames like "border-t" fall through to the trie. Consider adding an inline comment noting this distinction for future maintainers.💡 Optional: Add clarifying comment
+ // Note: We only accept delimiter or end-of-input here (not dash). + // If we hit a dash, we defer to the trie to match dashed basenames like "border-t". if end > 0 && (end == slice.len() || is_delimiter(slice[end])) { self.advance(end); return TW_BASE; }
142-145: Duplicate "data" check—consider clarifying intent.The "data" keyword is checked at lines 142–145 (fast-path) and again at lines 156–157 (trie fallback). The second check appears redundant in typical cases since the fast-path catches "data" before a dash or delimiter. If it's defensive for edge cases, a brief comment would help future readers.
💡 Optional: Add comment explaining redundancy
// Fallback to dashed-basename trie matching for cases with '-' inside the basename let dashed_end = BASENAME_STORE.matcher(slice).base_end(); self.advance(dashed_end); + // Redundant safety check: the fast-path above typically handles "data", + // but we recheck here in case of unusual trie-fallback edge cases. if dashed_end == 4 && &slice[..dashed_end] == b"data" { DATA_KW } else { TW_BASE }Also applies to: 156-157
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs(1 hunks)crates/biome_tailwind_parser/src/lexer/mod.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Files:
crates/biome_tailwind_parser/src/lexer/base_name_store.rscrates/biome_tailwind_parser/src/lexer/mod.rs
🧠 Learnings (13)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`
Applied to files:
crates/biome_tailwind_parser/src/lexer/base_name_store.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file
Applied to files:
crates/biome_tailwind_parser/src/lexer/base_name_store.rscrates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/biome_rule_options/lib/**/*.rs : Use `rename_all = "camelCase"` in serde derive macro for rule options
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)
is_delimiter(152-158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
- GitHub Check: Test (depot-ubuntu-24.04-arm-16)
- GitHub Check: Test (depot-windows-2022-16)
- GitHub Check: Check Dependencies
- GitHub Check: End-to-end tests
- GitHub Check: Lint project (depot-windows-2022)
- GitHub Check: Documentation
- GitHub Check: Test Node.js API
- GitHub Check: Bench (biome_tailwind_parser)
- GitHub Check: autofix
🔇 Additional comments (2)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)
152-152: LGTM! Visibility change enables the fast-path optimisation.Exposing
is_delimiterat crate level allows the lexer to perform boundary checks during the ASCII fast-path without duplicating delimiter logic.crates/biome_tailwind_parser/src/lexer/mod.rs (1)
4-4: LGTM!Import addition enables the fast-path delimiter checks.

Summary
This PR makes it so we avoid touching the basename store if we can trivially check to see that a delimiter comes immediately after. This only works for ascii, but since tailwind is majority ascii characters, this works great.
On my machine, this results in anywhere from +2-18% throughput for uncached benches, and +2-36% for cached benches. No idea why codspeed isn't showing that.
Test Plan
Docs