perf(parser): Explore packed tokens #10933

Merged
overlookmotel merged 1 commit into oxc-project:main from tomgasson:packed_token
May 13, 2025

Conversation

@tomgasson
Contributor

@tomgasson tomgasson commented May 10, 2025

Pack the parser tokens into a single u128.

I was musing with o3 about compiler performance, and it told me that Hermes used packed tokens to improve parser performance so I thought I'd try and see if that would also work here. In local benchmarks it appears to have a decent improvement.

closes #9926
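The core idea can be sketched as follows. This is a minimal illustration under assumed field widths (kind, start, end, flags), not the PR's exact layout: all fields are packed into one integer with shifts and masks, so a token is a single `Copy` value instead of a multi-field struct.

```rust
// Hypothetical sketch of a packed token. Assumed layout (not oxc's actual one):
//   bits 0..8    token kind
//   bits 8..40   start offset
//   bits 40..72  end offset
//   bits 72..80  flags
#[derive(Clone, Copy, PartialEq, Debug)]
struct PackedToken(u128);

impl PackedToken {
    fn new(kind: u8, start: u32, end: u32, flags: u8) -> Self {
        Self(
            kind as u128
                | (start as u128) << 8
                | (end as u128) << 40
                | (flags as u128) << 72,
        )
    }

    fn kind(self) -> u8 {
        (self.0 & 0xFF) as u8
    }

    fn start(self) -> u32 {
        ((self.0 >> 8) & 0xFFFF_FFFF) as u32
    }

    fn end(self) -> u32 {
        ((self.0 >> 40) & 0xFFFF_FFFF) as u32
    }

    fn flags(self) -> u8 {
        ((self.0 >> 72) & 0xFF) as u8
    }
}
```

Because the whole token fits in one register-sized value, copying and comparing tokens is cheap and token buffers stay dense in cache.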


@github-actions github-actions bot added the A-parser (Area - Parser) and C-performance (Category - Solution not expected to change functional behavior, only performance) labels May 10, 2025
@camc314 camc314 requested a review from overlookmotel May 10, 2025 14:15
@codspeed-hq

codspeed-hq bot commented May 10, 2025

CodSpeed Instrumentation Performance Report

Merging #10933 will degrade performance by 3.85%

Comparing tomgasson:packed_token (18cc339) with main (a86cbb3)

Summary

⚡ 2 improvements
❌ 4 regressions
✅ 30 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark            BASE     HEAD      Change
lexer[antd.js]       24.1 ms  25 ms     -3.72%
lexer[cal.com.tsx]   5.8 ms   6.1 ms    -3.63%
lexer[checker.ts]    14.4 ms  14.9 ms   -3.2%
lexer[pdf.mjs]       3.9 ms   4 ms      -3.85%
parser[antd.js]      106 ms   102.8 ms  +3.07%
parser[cal.com.tsx]  28.7 ms  27.8 ms   +3.38%

@Boshen
Member

Boshen commented May 10, 2025

Hi Tom! Long time no see!

Very promising performance improvements:

[screenshot: benchmark results]

I'll get some of the API changes into main first to make this PR easier to review.

The previous attempt was #8153, which made similar API changes. I forgot to do that first ...

@Boshen Boshen self-assigned this May 10, 2025
@Boshen
Member

Boshen commented May 10, 2025

I was musing with o3 about compiler performance

Curious how you chatted with o3 ;-)

Boshen pushed a commit that referenced this pull request May 11, 2025
The goal of this PR is to reduce the diff for #10933
graphite-app bot pushed a commit that referenced this pull request May 11, 2025
The goal of this PR is to reduce the diff for #10933
@Boshen Boshen assigned overlookmotel and unassigned Boshen May 11, 2025
@magic-akari
Contributor

magic-akari commented May 11, 2025

After this modification, can it parse this library properly?
Just to highlight, there's a Base64-encoded WASM file embedded directly in the source code as a very long string.

https://www.npmjs.com/package/@shikijs/engine-oniguruma?activeTab=code
https://cdn.jsdelivr.net/npm/@shikijs/engine-oniguruma@3.4.0/dist/wasm-inlined.mjs


Based on experimentation and calculation, the shikijs oniguruma library size is approximately 600KB, and 20 bits correspond to about 1MB of data, so it's safe.
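The arithmetic behind that safety margin is simple to verify; a hypothetical helper (not part of the PR) makes it concrete:

```rust
/// Maximum value a 20-bit offset/length field can hold:
/// 2^20 - 1 = 1,048,575, i.e. roughly 1 MiB of source text.
const MAX_20_BIT: u32 = (1 << 20) - 1;

/// Would a single token of `len_bytes` bytes fit in a 20-bit field?
fn fits_in_20_bits(len_bytes: u32) -> bool {
    len_bytes <= MAX_20_BIT
}
```

The ~600 KB inlined WASM string is one giant token, yet it still sits comfortably under the 2^20-byte ceiling.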

Pack the parser tokens into a single u128
@tomgasson
Contributor Author

@magic-akari thanks for that example. I was aware of this new constraint, but struggled to identify where someone would actually have such a large individual token.

From some iteration locally, it seems that moving to u128 performs similarly.

I also couldn't get stable results between using start & len vs start & end, so I've reverted to start & end to reduce what's changing.

[screenshot: benchmark results]
baseline: main

@tomgasson
Contributor Author

Curious how you chatted with o3 ;-)

Just conversational exploration. The rough overall structure:

  • I want to build a language from scratch (in rust), ask me what features it should have
  • [answer questions like JS]
  • Read about the following [oxc, hermes, v8, javascriptcore, typescript, flow, hhvm, JVM, c#, go, mold, wild, 1brc, ripgrep, ninja, n2, buck2, bazel, simdjson, simdutf, ada, sep] (filling context with high perf insights)
  • Let's build my language. Starting at the [lexer|parser|printer]
  • [simple example]
  • What techniques is X using which I could use?

Relevant snippet of convo


Consider Hermes (Facebook/meta). Are there any tricks we can take from there to apply to our lexer/parser?

Yes — Hermes, Facebook/Meta’s JS engine, is famous for its focus on startup time, memory compactness, and ahead-of-time execution, especially on low-power devices. There are several powerful tricks in its lexer/parser/compiler that you could absolutely adapt to your high-performance parser.

Let’s break them down:

1. Compact Token & AST Representation

Hermes:

  • Uses bit-packed token headers (u32 or u64):
    • kind: 8 bits
    • start_offset: 24 bits
    • flags/metadata: remaining bits

AST nodes are tight structs, often with union-like variants.

Apply to your language:

  • You can bit-pack token metadata:
    • Use u32 with {kind: u8, start: u24}
  • Replace heap-allocated structs with:
    • flat arenas
    • small, Copy AST nodes
    • Possibly #[repr(C)] + enum-discriminant layouts for traversals

Benefit: Faster parsing, smaller memory footprint, better locality

2. Pre-parsed Bytecode: Skipping Parse at Runtime

...

Tell me more about bit-packing token

4. Bit-Packed Tokens: What, Why, and How

Why:

  • Your lexer is blazing fast — but each token is 12–24 bytes
  • That adds up fast in large files (>10K tokens)
  • Packing reduces memory, improves cache locality

Idea:

Represent a Token as a single u32 or u64:

Layout:

Field         Bits  Example
Kind          8     0–255 enum
Start offset  24    byte offset
End offset    24    byte offset
Flags         8     custom

Accessors:

// Assuming a u64-backed token with layout {kind: 8, start: 24, end: 24, flags: 8}:
#[derive(Clone, Copy)]
struct PackedToken(u64);

impl PackedToken {
    fn kind(self) -> TokenKind {
        TokenKind::from_u8((self.0 & 0xFF) as u8)
    }
    fn start(self) -> u32 {
        ((self.0 >> 8) & 0xFF_FFFF) as u32
    }
    fn end(self) -> u32 {
        ((self.0 >> 32) & 0xFF_FFFF) as u32
    }
}

Optionally make TokenKind: u8 with #[repr(u8)] to simplify.
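The snippet above leans on a `TokenKind::from_u8` helper that isn't spelled out. A minimal sketch of what it could look like with `#[repr(u8)]` (variant names here are illustrative, not Hermes' or oxc's actual enum):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(u8)]
enum TokenKind {
    Eof = 0,
    Identifier = 1,
    Number = 2,
    // ... remaining kinds
}

impl TokenKind {
    /// Safe decoding: fall back to `Eof` for out-of-range discriminants.
    /// (A `transmute`-based version would be faster but needs an unsafe
    /// invariant that the byte is always a valid discriminant.)
    fn from_u8(byte: u8) -> TokenKind {
        match byte {
            1 => TokenKind::Identifier,
            2 => TokenKind::Number,
            _ => TokenKind::Eof,
        }
    }
}
```

With `#[repr(u8)]`, the enum is guaranteed to be one byte, so `kind as u8` and `from_u8` round-trip without any packing logic.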

@Boshen Boshen mentioned this pull request May 12, 2025
@tomgasson
Contributor Author

From CodSpeed it looks like this has now shifted much of the work over to the lexer (more so with u128 than u64). This might need a bit more work on the lexing side to improve.

Member

@overlookmotel overlookmotel left a comment


Great!

On Mac Mini M4, I'm getting this result from benchmarks:

lexer/checker.ts        time:   [3.5806 ms 3.5878 ms 3.5953 ms]
                        change: [+2.8823% +3.1912% +3.4929%] (p = 0.00 < 0.05)
                        Performance has regressed.
lexer/cal.com.tsx       time:   [1.4846 ms 1.4899 ms 1.4961 ms]
                        change: [+3.5426% +4.5244% +5.7089%] (p = 0.00 < 0.05)
                        Performance has regressed.
lexer/RadixUI.jsx       time:   [1.5299 µs 1.5328 µs 1.5358 µs]
                        change: [−4.0315% −3.7330% −3.4303%] (p = 0.00 < 0.05)
                        Performance has improved.
lexer/pdf.mjs           time:   [918.41 µs 920.58 µs 923.02 µs]
                        change: [+1.2432% +1.7523% +2.2282%] (p = 0.00 < 0.05)
                        Performance has regressed.
lexer/antd.js           time:   [6.8587 ms 6.8660 ms 6.8756 ms]
                        change: [+3.8498% +3.9941% +4.1547%] (p = 0.00 < 0.05)
                        Performance has regressed.

parser/checker.ts       time:   [7.1211 ms 7.1489 ms 7.1802 ms]
                        change: [−3.8394% −3.3014% −2.7636%] (p = 0.00 < 0.05)
                        Performance has improved.
parser/cal.com.tsx      time:   [3.5647 ms 3.5728 ms 3.5822 ms]
                        change: [−6.5004% −6.2345% −5.9451%] (p = 0.00 < 0.05)
                        Performance has improved.
parser/RadixUI.jsx      time:   [4.3214 µs 4.3289 µs 4.3367 µs]
                        change: [−12.434% −12.188% −11.924%] (p = 0.00 < 0.05)
                        Performance has improved.
parser/pdf.mjs          time:   [2.0171 ms 2.0206 ms 2.0243 ms]
                        change: [−6.4245% −5.8924% −5.4479%] (p = 0.00 < 0.05)
                        Performance has improved.
parser/antd.js          time:   [13.776 ms 13.836 ms 13.905 ms]
                        change: [−4.3647% −3.9299% −3.5010%] (p = 0.00 < 0.05)
                        Performance has improved.

+12% perf improvement on the RadixUI benchmark - which I think is the one that's most representative of real-world code that gets thrown at Oxc, as all the other benchmark files are huge.

I'm really surprised that this change nets such a gain. I've previously tried making Token #[repr(C, align(16))] with padding u8 fields, so all 16 bytes are initialized, i.e. so the memory layout of Token is the same as a u128, with all the fields in the same locations as here. I'd have expected the compiler to boil that down to exactly the same assembly as this. But it seems not! 🤷
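For reference, the padded-struct variant described above can be sketched like this (field names and widths are illustrative, not oxc's actual `Token` definition):

```rust
// Illustrative sketch of a 16-byte, 16-aligned token struct whose layout
// mirrors a u128: every byte, including padding, is explicitly initialized.
#[repr(C, align(16))]
#[derive(Clone, Copy, Default)]
struct Token {
    kind: u8,            // offset 0
    flags: u8,           // offset 1
    _padding: [u8; 2],   // offsets 2..4, explicit instead of compiler padding
    start: u32,          // offset 4
    end: u32,            // offset 8
    _padding2: u32,      // offsets 12..16
}
```

Size and alignment both come out to 16 bytes, matching `u128`, yet per the comment above the generated code apparently still differed from the real `u128` version.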

@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label May 13, 2025
@graphite-app
Contributor

graphite-app bot commented May 13, 2025

Merge activity

  • May 13, 3:16 PM UTC: @tomgasson we removed the merge queue label because we could not find a Graphite account associated with your GitHub profile.

You must have a Graphite account in order to use the merge queue. Create an account and try again using this link

@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label May 13, 2025
@overlookmotel overlookmotel merged commit cfd1ed3 into oxc-project:main May 13, 2025
28 checks passed
graphite-app bot pushed a commit that referenced this pull request May 13, 2025
Follow-on after #10933. That PR made some cosmetic changes to the tests for `Token` which weren't necessary. Revert them.
graphite-app bot pushed a commit that referenced this pull request May 13, 2025
Follow-on after #10933. Cosmetic change only. Reformat comments to our standard style, with backticks around type/field names.
@Boshen
Copy link
Member

Boshen commented May 14, 2025

Thank you @tomgasson! Can't wait to see what you find next.

graphite-app bot pushed a commit that referenced this pull request May 14, 2025
Follow-on after #10933. It's preferable to include explicit type params in calls to `mem::transmute` rather than relying on type inference. This ensures any later changes won't inadvertently cause UB.

Also expand the safety comment.
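The point about explicit type parameters on `mem::transmute` can be illustrated with a standalone sketch (the `PackedToken` here is a placeholder type, not the actual oxc code):

```rust
#[derive(Clone, Copy)]
#[repr(transparent)]
struct PackedToken(u128);

fn to_bits(token: PackedToken) -> u128 {
    // Explicit type parameters instead of inference: if either type's
    // definition changes later, this line stops compiling rather than
    // silently transmuting between unintended types.
    // SAFETY: `PackedToken` is `#[repr(transparent)]` over `u128`,
    // so the two types have identical size and layout.
    unsafe { std::mem::transmute::<PackedToken, u128>(token) }
}
```

With inferred parameters, a later field change could redirect the transmute to a different (same-sized) type without any compiler diagnostic; the turbofish pins both ends down.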
@Conaclos

I'm really surprised that this change nets such a gain. I've previously tried making Token #[repr(C, align(16))] with padding u8 fields, so all 16 bytes are initialized. i.e. so the memory layout of Token is same as a u128 and with all the fields in same location as here. I'd have expected the compiler to boil that down to exactly the same assembly as this. But it seems not!

This is the kind of thing you could raise as an issue on the Rust repository.

graphite-app bot pushed a commit that referenced this pull request May 14, 2025
Follow-on after #10933. Make `u8` value of `Kind::Eof` (`Kind::default()`) be 0. This makes `Token::default()` a little cheaper, as all bits of default `Token` are 0.