perf(parse/tw): avoid going into basename store trie when not needed by dyc3 · Pull Request #8528 · biomejs/biome

dyc3 · 2025-12-20T20:39:29Z

Summary

This PR makes it so we avoid touching the basename store if we can trivially check to see that a delimiter comes immediately after. This only works for ascii, but since tailwind is majority ascii characters, this works great.

On my machine, this results in anywhere from +2-18% throughput for uncached benches, and +2-36% for cached benches. No idea why codspeed isn't showing that.

Test Plan

Docs

changeset-bot · 2025-12-20T20:39:34Z

⚠️ No Changeset found

Latest commit: a020914

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

dyc3 · 2025-12-20T20:39:48Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

codspeed-hq · 2025-12-20T20:56:29Z

CodSpeed Performance Report

Merging #8528 will not alter performance

_{Comparing dyc3/tw-parser-shortcircuit-trie (a020914) with main (b8d3bcb)}

Summary

✅ 10 untouched
⏩ 145 skipped¹

145 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

coderabbitai · 2025-12-20T20:57:55Z

Walkthrough

This PR optimises the Tailwind basename tokenisation by introducing an ASCII fast-path that scans for dashes and delimiters before consulting the trie-based BASENAME_STORE. The is_delimiter function in base_name_store.rs is made accessible at crate level via pub(crate) visibility, and the lexer's basename matching logic now leverages this for early termination. The handling of the "data" keyword (emitting DATA_KW) is preserved, with fallback to the existing dashed-basename trie when the fast-path doesn't yield a definitive result.

Possibly related PRs

feat(parse/tailwind): parse data attribute selectors as dedicated syntax #8514 — Modifies Tailwind lexer basename handling and DATA_KW emission logic in the same module.
perf(parse/tailwind): use compact trie for lexing base names instead of linear search #7977 — Adjusts the trie-based BASENAME_STORE approach that this PR builds upon, affecting the same lexer components.

Suggested reviewers

ematipico

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main performance optimisation: avoiding unnecessary trie lookups by checking for delimiters first in ASCII strings.
Description check	✅ Passed	The PR description clearly explains the motivation (avoiding unnecessary trie lookups for ASCII delimiters), the approach (ASCII fast-path check), and reports benchmark improvements.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dyc3/tw-parser-shortcircuit-trie

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/biome_tailwind_parser/src/lexer/mod.rs (1)

125-161: Add explicit test case for the data keyword.

The fast-path logic is exercised through existing tests (e.g., "block", "hover:") and the property-based losslessness test. However, there's no explicit test for the "data" keyword, which receives special handling in consume_base. Add a test case for "data" as a standalone token to document this behaviour.

🧹 Nitpick comments (2)

crates/biome_tailwind_parser/src/lexer/mod.rs (2)
131-150: Nice optimisation! Fast-path logic looks sound.

The ASCII scan correctly handles undashed basenames before consulting the trie, with proper fallback when hitting a dash. The early exits for "data" and simple basenames are well-positioned.

One clarification: the boundary check on line 147 intentionally excludes dash (unlike the trie's is_boundary_byte), ensuring dashed basenames like "border-t" fall through to the trie. Consider adding an inline comment noting this distinction for future maintainers.
💡 Optional: Add clarifying comment
+        // Note: We only accept delimiter or end-of-input here (not dash).
+        // If we hit a dash, we defer to the trie to match dashed basenames like "border-t".
         if end > 0 && (end == slice.len() || is_delimiter(slice[end])) {
             self.advance(end);
             return TW_BASE;
         }
142-145: Duplicate "data" check—consider clarifying intent.

The "data" keyword is checked at lines 142–145 (fast-path) and again at lines 156–157 (trie fallback). The second check appears redundant in typical cases since the fast-path catches "data" before a dash or delimiter. If it's defensive for edge cases, a brief comment would help future readers.
💡 Optional: Add comment explaining redundancy
         // Fallback to dashed-basename trie matching for cases with '-' inside the basename
         let dashed_end = BASENAME_STORE.matcher(slice).base_end();
         self.advance(dashed_end);
 
+        // Redundant safety check: the fast-path above typically handles "data",
+        // but we recheck here in case of unusual trie-fallback edge cases.
         if dashed_end == 4 && &slice[..dashed_end] == b"data" {
             DATA_KW
         } else {
             TW_BASE
         }
Also applies to: 156-157

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8d3bcb and a020914.

📒 Files selected for processing (2)

crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1 hunks)
crates/biome_tailwind_parser/src/lexer/mod.rs (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

crates/biome_tailwind_parser/src/lexer/base_name_store.rs
crates/biome_tailwind_parser/src/lexer/mod.rs

🧠 Learnings (13)

📓 Common learnings

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`

Applied to files:

crates/biome_tailwind_parser/src/lexer/base_name_store.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

crates/biome_tailwind_parser/src/lexer/base_name_store.rs
crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-12-19T12:53:30.399Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-12-19T12:53:30.399Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-12-19T12:53:30.399Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/biome_rule_options/lib/**/*.rs : Use `rename_all = "camelCase"` in serde derive macro for rule options

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:12.048Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

📚 Learning: 2025-11-24T18:06:03.545Z

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

crates/biome_tailwind_parser/src/lexer/mod.rs

🧬 Code graph analysis (1)

crates/biome_tailwind_parser/src/lexer/mod.rs (1)

crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)

is_delimiter (152-158)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
GitHub Check: Test (depot-ubuntu-24.04-arm-16)
GitHub Check: Test (depot-windows-2022-16)
GitHub Check: Check Dependencies
GitHub Check: End-to-end tests
GitHub Check: Lint project (depot-windows-2022)
GitHub Check: Documentation
GitHub Check: Test Node.js API
GitHub Check: Bench (biome_tailwind_parser)
GitHub Check: autofix

🔇 Additional comments (2)

crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)

152-152: LGTM! Visibility change enables the fast-path optimisation.

Exposing is_delimiter at crate level allows the lexer to perform boundary checks during the ASCII fast-path without duplicating delimiter logic.

crates/biome_tailwind_parser/src/lexer/mod.rs (1)

4-4: LGTM!

Import addition enables the fast-path delimiter checks.

dyc3 mentioned this pull request Dec 20, 2025

refactor(parse/tailwind): use trivia again #8513

Merged

dyc3 mentioned this pull request Dec 20, 2025

feat(parse/tailwind): parse data attribute selectors as dedicated syntax #8514

Merged

github-actions bot added A-Parser Area: parser L-Tailwind Language: Tailwind CSS labels Dec 20, 2025

dyc3 requested review from a team December 20, 2025 20:50

dyc3 marked this pull request as ready for review December 20, 2025 20:51

dyc3 changed the base branch from dyc3/tw-parser-fixes to graphite-base/8528 December 20, 2025 20:51

dyc3 force-pushed the graphite-base/8528 branch from 77eeab3 to b8d3bcb Compare December 20, 2025 20:52

dyc3 force-pushed the dyc3/tw-parser-shortcircuit-trie branch from 8d85823 to 4ea9996 Compare December 20, 2025 20:52

graphite-app bot changed the base branch from graphite-base/8528 to main December 20, 2025 20:53

perf(parse/tw): avoid going into basename store trie when not needed

a020914

dyc3 force-pushed the dyc3/tw-parser-shortcircuit-trie branch from 4ea9996 to a020914 Compare December 20, 2025 20:53

coderabbitai bot reviewed Dec 20, 2025

View reviewed changes

ematipico approved these changes Dec 21, 2025

View reviewed changes

dyc3 merged commit 83efa4d into main Dec 21, 2025
16 checks passed

dyc3 deleted the dyc3/tw-parser-shortcircuit-trie branch December 21, 2025 22:34

coderabbitai bot mentioned this pull request Feb 21, 2026

perf(parse/tailwind): use lookup_byte for slightly better throughput #9183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(parse/tw): avoid going into basename store trie when not needed#8528

perf(parse/tw): avoid going into basename store trie when not needed#8528
dyc3 merged 1 commit intomainfrom
dyc3/tw-parser-shortcircuit-trie

dyc3 commented Dec 20, 2025 •

edited

Loading

Uh oh!

changeset-bot bot commented Dec 20, 2025 •

edited

Loading

Uh oh!

dyc3 commented Dec 20, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Dec 20, 2025

Uh oh!

coderabbitai bot commented Dec 20, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dyc3 commented Dec 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Docs

Uh oh!

changeset-bot bot commented Dec 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

dyc3 commented Dec 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Dec 20, 2025

CodSpeed Performance Report

Merging #8528 will not alter performance

Summary

Footnotes

Uh oh!

coderabbitai bot commented Dec 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dyc3 commented Dec 20, 2025 •

edited

Loading

changeset-bot bot commented Dec 20, 2025 •

edited

Loading

dyc3 commented Dec 20, 2025 •

edited

Loading

coderabbitai bot commented Dec 20, 2025 •

edited

Loading