
perf(parse/tailwind): use lookup_byte for slightly better throughput #9183

Merged
dyc3 merged 1 commit into main from dyc3/tw-lexer-perf
Feb 22, 2026

@dyc3
Contributor

@dyc3 dyc3 commented Feb 21, 2026

Summary

This refactors the Tailwind parser to classify characters with byte lookups via biome_unicode_table::lookup_byte.

On my machine, this results in about a 33% speedup and a throughput increase of anywhere from 40% to 60%.
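The core idea is to replace chains of raw byte comparisons with a single table lookup. Below is a minimal sketch under stated assumptions: the `Class` enum, its variants, and both function names are hypothetical and are not biome_unicode_table's actual `Dispatch` enum or `lookup_byte` implementation. It makes no performance claim of its own; the numbers above are the author's measurements.

```rust
/// Hypothetical byte classes (illustrative only; not biome's `Dispatch` enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    Other,
    Whitespace,
    Colon,
    Minus,
}

/// Before: classify a byte with a chain of raw `u8` comparisons.
pub fn classify_by_comparison(b: u8) -> Class {
    if b == b' ' || b == b'\t' || b == b'\n' || b == b'\r' {
        Class::Whitespace
    } else if b == b':' {
        Class::Colon
    } else if b == b'-' {
        Class::Minus
    } else {
        Class::Other
    }
}

/// After: a 256-entry table computed at compile time, so classifying a byte
/// is a single indexed load no matter how many classes exist.
const TABLE: [Class; 256] = build_table();

const fn build_table() -> [Class; 256] {
    let mut table = [Class::Other; 256];
    table[b' ' as usize] = Class::Whitespace;
    table[b'\t' as usize] = Class::Whitespace;
    table[b'\n' as usize] = Class::Whitespace;
    table[b'\r' as usize] = Class::Whitespace;
    table[b':' as usize] = Class::Colon;
    table[b'-' as usize] = Class::Minus;
    table
}

/// Table-based classification; every `u8` is a valid index into `TABLE`.
#[inline]
pub fn classify_by_table(b: u8) -> Class {
    TABLE[b as usize]
}
```

Because `TABLE` is a `const`, adding a new class costs one more table entry rather than another branch at every call site, and both functions agree on every possible byte.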

Test Plan

no snapshot changes

Docs

@changeset-bot

changeset-bot bot commented Feb 21, 2026

⚠️ No Changeset found

Latest commit: db1ed69

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added A-Parser Area: parser L-Tailwind Language: Tailwind CSS labels Feb 21, 2026
@codspeed-hq

codspeed-hq bot commented Feb 21, 2026

Merging this PR will not alter performance

✅ 10 untouched benchmarks
⏩ 206 skipped benchmarks¹


Comparing dyc3/tw-lexer-perf (db1ed69) with main (b834078)


Footnotes

  1. 206 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@coderabbitai
Contributor

coderabbitai bot commented Feb 21, 2026

No actionable comments were generated in the recent review. 🎉


Walkthrough

The PR refactors the Tailwind lexer to use the Dispatch enum and lookup_byte for character classification instead of raw u8 comparisons. Both base_name_store.rs and lexer/mod.rs now obtain a Dispatch value for each byte and pass it to the updated helpers (is_delimiter, is_boundary_byte) and to the token-consumption paths. Several internal functions and branch points now accept or operate on Dispatch values; public APIs are unchanged.
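The walkthrough describes helpers that take a Dispatch value rather than a raw u8, so each byte is classified once and the result is reused by every predicate. A minimal sketch of that pattern follows. The enum, `lookup`, `is_boundary`, `segments`, and the delimiter and boundary rules are all hypothetical and do not reproduce biome's actual Tailwind lexer; `is_boundary` is deliberately not named `is_boundary_byte` to avoid implying it matches the real helper.

```rust
/// Hypothetical byte classes standing in for a `Dispatch`-style enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Dispatch {
    Whitespace,
    Colon,
    Minus,
    Other,
}

/// Classify a byte once; callers pass the result around instead of the raw `u8`.
fn lookup(b: u8) -> Dispatch {
    match b {
        b' ' | b'\t' | b'\n' | b'\r' => Dispatch::Whitespace,
        b':' => Dispatch::Colon,
        b'-' => Dispatch::Minus,
        _ => Dispatch::Other,
    }
}

/// Assumed rule for this sketch: whitespace ends a class candidate.
fn is_delimiter(d: Dispatch) -> bool {
    matches!(d, Dispatch::Whitespace)
}

/// Hypothetical rule: a boundary is a delimiter, a variant separator ':', or a '-'.
fn is_boundary(d: Dispatch) -> bool {
    is_delimiter(d) || matches!(d, Dispatch::Colon | Dispatch::Minus)
}

/// Split `input` into non-empty runs between boundaries. Each byte is
/// classified exactly once and that class is reused for every predicate.
/// Boundary bytes are ASCII, so the slice indices are always char boundaries.
fn segments(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &b) in bytes.iter().enumerate() {
        let d = lookup(b);
        if is_boundary(d) {
            if i > start {
                out.push(&input[start..i]);
            }
            start = i + 1;
        }
    }
    if start < bytes.len() {
        out.push(&input[start..]);
    }
    out
}
```

For example, `segments("hover:bg-red-500 flex")` yields `["hover", "bg", "red", "500", "flex"]`, with every byte looked up only once.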

Possibly related PRs

Suggested reviewers

  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: refactoring the Tailwind parser to use lookup_byte for performance improvements. |
| Description check | ✅ Passed | The description clearly explains the motivation (performance improvement via character byte lookups), provides measured results, and confirms no snapshot changes were made. |


Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)

135-172: ⚠️ Potential issue | 🟠 Major

The fast path now ignores '-' and can therefore skip dashed-basename matching.
The loop never stops on '-', so inputs like border-t-red-300 take the fast path and are consumed whole as TW_BASE, bypassing the dashed-basename trie. That looks like a tokenisation regression.

💡 Suggested fix
-        let mut end = 0usize;
+        let mut end = 0usize;
+        let mut saw_dash = false;
         while end < slice.len() {
             let b = slice[end];
             let dispatched = lookup_byte(b);
-            if dispatched == COL || is_delimiter(dispatched) {
+            if dispatched == MIN {
+                saw_dash = true;
+                break;
+            }
+            if dispatched == COL || is_delimiter(dispatched) {
                 break;
             }
             end += 1;
         }
@@
-        if end > 0 && (end == slice.len() || is_delimiter(lookup_byte(slice[end]))) {
+        if !saw_dash
+            && end > 0
+            && (end == slice.len() || is_delimiter(lookup_byte(slice[end])))
+        {
             self.advance(end);
             return TW_BASE;
         }
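The effect of the suggested fix can be checked in isolation. This minimal sketch restates the fast-path check as a standalone function that returns `None` whenever the caller must fall back to the trie; it uses an early return instead of the saw_dash flag, which is equivalent. The `Dispatch` enum, `lookup`, `is_delimiter`, and `fast_path_base_len` are hypothetical stand-ins, and treating only whitespace as a delimiter is an assumption, not biome's actual rule.

```rust
/// Hypothetical byte classes standing in for biome's `Dispatch` (COL, MIN, ...).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Dispatch {
    Whitespace,
    Colon,
    Minus,
    Other,
}

fn lookup(b: u8) -> Dispatch {
    match b {
        b' ' | b'\t' | b'\n' | b'\r' => Dispatch::Whitespace,
        b':' => Dispatch::Colon,
        b'-' => Dispatch::Minus,
        _ => Dispatch::Other,
    }
}

/// Assumption: only whitespace counts as a delimiter in this sketch.
fn is_delimiter(d: Dispatch) -> bool {
    d == Dispatch::Whitespace
}

/// Returns `Some(len)` if the fast path may take `slice[..len]` as a base name,
/// or `None` if the caller must fall back to the dashed-basename trie.
fn fast_path_base_len(slice: &[u8]) -> Option<usize> {
    let mut end = 0usize;
    while end < slice.len() {
        let dispatched = lookup(slice[end]);
        if dispatched == Dispatch::Minus {
            // A dash may start a dashed base name: defer to the trie.
            return None;
        }
        if dispatched == Dispatch::Colon || is_delimiter(dispatched) {
            break;
        }
        end += 1;
    }
    // Same acceptance condition as the original fast path.
    if end > 0 && (end == slice.len() || is_delimiter(lookup(slice[end]))) {
        Some(end)
    } else {
        None
    }
}
```

With the dash check, border-t-red-300 no longer qualifies for the fast path, while a plain name such as flex still does; a variant prefix like hover:flex also falls through because ':' is not a delimiter.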
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_tailwind_parser/src/lexer/mod.rs` around lines 135-172: the
fast-path loop in consume_base incorrectly treats '-' as a non-delimiter, so
inputs like "border-t-red-300" are consumed as TW_BASE. Modify the loop that
computes end to break when it encounters a dash, so dashed basenames go through
the trie: in consume_base, check for b == b'-' (or dispatched == DASH if you have
a DASH kind) alongside the existing COL/is_delimiter check, so the loop stops at
'-' and the subsequent logic falls back to
BASENAME_STORE.matcher(slice).base_end() to return DATA_KW or TW_BASE as
appropriate. Keep the DATA_KW special-case checks unchanged.

@dyc3 dyc3 requested review from a team February 21, 2026 22:19
@dyc3 dyc3 force-pushed the dyc3/tw-lexer-perf branch from bda3ca0 to db1ed69 February 22, 2026 12:21
@dyc3 dyc3 merged commit b76c42b into main Feb 22, 2026
14 checks passed
@dyc3 dyc3 deleted the dyc3/tw-lexer-perf branch February 22, 2026 12:42