perf(parse/tailwind): use lookup_byte for slightly better throughput #9183
Merging this PR will not alter performance
No actionable comments were generated in the recent review. 🎉
Walkthrough
The PR refactors the Tailwind lexer to use the
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)
135-172: ⚠️ Potential issue | 🟠 Major
Fast path now ignores `-` and can skip dashed-basename matching.
The loop never stops on `-`, so inputs like `border-t-red-300` will take the fast path and consume the whole string as `TW_BASE`, bypassing the dashed-basename trie. That looks like a tokenisation regression.
💡 Suggested fix
-        let mut end = 0usize;
+        let mut end = 0usize;
+        let mut saw_dash = false;
         while end < slice.len() {
             let b = slice[end];
             let dispatched = lookup_byte(b);
-            if dispatched == COL || is_delimiter(dispatched) {
+            if dispatched == MIN {
+                saw_dash = true;
+                break;
+            }
+            if dispatched == COL || is_delimiter(dispatched) {
                 break;
             }
             end += 1;
         }
@@
-        if end > 0 && (end == slice.len() || is_delimiter(lookup_byte(slice[end]))) {
+        if !saw_dash
+            && end > 0
+            && (end == slice.len() || is_delimiter(lookup_byte(slice[end])))
+        {
             self.advance(end);
             return TW_BASE;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_tailwind_parser/src/lexer/mod.rs` around lines 135 - 172, The fast-path loop in consume_base is incorrectly treating '-' as non-delim so inputs like "border-t-red-300" are consumed as TW_BASE; modify the loop that computes end to break when encountering a dash so dashed basenames go through the trie: in consume_base check for b == b'-' (or dispatched == DASH if you have a DASH kind) alongside the existing COL/is_delimiter check, so the loop stops at '-' and the subsequent logic falls back to BASENAME_STORE.matcher(slice).base_end() to return DATA_KW or TW_BASE as appropriate; keep the DATA_KW special-case checks unchanged.
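The behaviour the review asks for can be sketched in isolation. The following is a minimal, self-contained illustration, not the code in `consume_base`: the `Class` enum, `classify`, and `fast_path_base_len` are hypothetical stand-ins for the lexer's dispatch kinds, `lookup_byte`, and the fast-path loop. It assumes the fast path should give up as soon as it sees a `-`, so that dashed names are left to the slower trie-based matcher.

```rust
/// Hypothetical byte classes standing in for the lexer's dispatch kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Class {
    Dash,
    Colon,
    Delimiter,
    Other,
}

/// Hypothetical stand-in for a byte-dispatch lookup.
fn classify(b: u8) -> Class {
    match b {
        b'-' => Class::Dash,
        b':' => Class::Colon,
        b' ' | b'\t' | b'\n' | b'\r' => Class::Delimiter,
        _ => Class::Other,
    }
}

/// Returns `Some(len)` when the first `len` bytes can be emitted as a plain
/// base name without consulting the trie, and `None` when the slower
/// matcher has to decide (for example because a dash was seen).
fn fast_path_base_len(slice: &[u8]) -> Option<usize> {
    let mut end = 0usize;
    while end < slice.len() {
        match classify(slice[end]) {
            // A dash means the name may be a dashed basename: bail out.
            Class::Dash => return None,
            Class::Colon | Class::Delimiter => break,
            Class::Other => end += 1,
        }
    }
    // Only accept a non-empty run that ends at end of input or a delimiter.
    if end > 0 && (end == slice.len() || classify(slice[end]) == Class::Delimiter) {
        Some(end)
    } else {
        None
    }
}
```

With this shape, `fast_path_base_len(b"flex p-4")` accepts the four bytes of `flex`, while `fast_path_base_len(b"border-t-red-300")` declines and leaves the decision to the fallback. Returning early on a dash has the same effect as the `saw_dash` flag in the suggested diff.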
bda3ca0 to db1ed69
Summary
This refactors the Tailwind parser to classify characters with byte lookups via `biome_unicode_table::lookup_byte`. On my machine, this results in about a 33% speedup and a throughput increase of anywhere from 40% to 60%.
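For readers unfamiliar with the technique, here is a rough sketch of table-driven byte classification in general. It is not the implementation in `biome_unicode_table`; the `ByteClass` enum, the `BYTE_CLASSES` table, and `classify_byte` are hypothetical names chosen for illustration. The idea is that a 256-entry table built at compile time turns each classification into a single indexed load rather than a chain of comparisons.

```rust
/// Hypothetical byte classes; a real dispatch table would have many more.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ByteClass {
    Whitespace,
    Colon,
    Dash,
    Other,
}

/// Builds the 256-entry table once, at compile time.
const fn build_table() -> [ByteClass; 256] {
    let mut table = [ByteClass::Other; 256];
    table[b' ' as usize] = ByteClass::Whitespace;
    table[b'\t' as usize] = ByteClass::Whitespace;
    table[b'\n' as usize] = ByteClass::Whitespace;
    table[b'\r' as usize] = ByteClass::Whitespace;
    table[b':' as usize] = ByteClass::Colon;
    table[b'-' as usize] = ByteClass::Dash;
    table
}

/// One entry per possible byte value.
static BYTE_CLASSES: [ByteClass; 256] = build_table();

/// Classifies a byte with a single table lookup. A `u8` can never exceed
/// 255, so the index is always in range for the 256-entry array.
#[inline]
pub fn classify_byte(byte: u8) -> ByteClass {
    BYTE_CLASSES[byte as usize]
}
```

A lexer loop can then call `classify_byte(slice[i])` for each byte and branch on the resulting class.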
Test Plan
No snapshot changes.
Docs