
feat(parser): add token collection option#17024

Closed
lilnasy wants to merge 3 commits into 12-18-test_benchmarks_add_parser_tokens_benchmark from 12-17-feat_oxc_parser_store_tokens_in_lexer_

Conversation

@lilnasy
Contributor

@lilnasy lilnasy commented Dec 17, 2025

Part of #16207. This is just to measure the performance impact of unconditionally storing tokens in an oxc_allocator::Vec.

@github-actions github-actions bot added A-parser Area - Parser C-enhancement Category - New feature or request labels Dec 17, 2025
Contributor Author

lilnasy commented Dec 17, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@codspeed-hq

codspeed-hq bot commented Dec 17, 2025

CodSpeed Performance Report

Merging #17024 will degrade performance by 29.01%

Comparing 12-17-feat_oxc_parser_store_tokens_in_lexer_ (d9d0d73) with 12-18-test_benchmarks_add_parser_tokens_benchmark (08c5680)

Summary

❌ 12 regressions
✅ 34 untouched
⏩ 3 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Mode       | Benchmark                              | BASE     | HEAD       | Change  |
|------------|----------------------------------------|----------|------------|---------|
| Simulation | parser_tokens[react.development.js]    | 1.2 ms   | 1.8 ms     | -29.01% |
| Simulation | parser[RadixUIAdoptionSection.jsx]     | 80.6 µs  | 88.1 µs    | -8.48%  |
| Simulation | parser[binder.ts]                      | 3.2 ms   | 3.4 ms     | -6.32%  |
| Simulation | parser_tokens[RadixUIAdoptionSection.jsx] | 80.6 µs | 97.1 µs   | -17.03% |
| Simulation | parser_tokens[cal.com.tsx]             | 25.8 ms  | 32.8 ms    | -21.16% |
| Simulation | parser[cal.com.tsx]                    | 25.8 ms  | 27.7 ms    | -6.85%  |
| Simulation | parser[react.development.js]           | 1.2 ms   | 1.3 ms     | -6.23%  |
| Simulation | parser_tokens[binder.ts]               | 3.2 ms   | 4.3 ms     | -26.25% |
| Simulation | lexer[cal.com.tsx]                     | 5.5 ms   | 7.5 ms     | -27.47% |
| Simulation | lexer[RadixUIAdoptionSection.jsx]      | 21 µs    | 26.3 µs    | -20.05% |
| Simulation | lexer[binder.ts]                       | 884.6 µs | 1,149.6 µs | -23.05% |
| Simulation | lexer[react.development.js]            | 357.9 µs | 465.9 µs   | -23.18% |

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from e35dc7e to f58229f on December 17, 2025 23:46
@lilnasy lilnasy marked this pull request as ready for review December 18, 2025 00:26
Copilot AI review requested due to automatic review settings December 18, 2025 00:26
Contributor

Copilot AI left a comment


Pull request overview

This PR implements unconditional token storage in the Lexer as part of measuring the performance impact for issue #16207. The changes add a new tokens field to store all tokens encountered during parsing in an arena-allocated vector, and expose this collection through the ParserReturn struct.

  • Adds token storage infrastructure to the Lexer struct
  • Exports Token and Kind types publicly from the parser crate
  • Optimizes error collection by using append instead of extend

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tasks/track_memory_allocations/allocs_parser.snap Updates memory allocation benchmarks showing minimal increase in arena allocations and reallocations from storing tokens
crates/oxc_parser/src/lib.rs Adds public exports for Token and Kind, adds tokens field to ParserReturn, and optimizes error handling with append
crates/oxc_parser/src/lexer/mod.rs Adds tokens field to Lexer, updates checkpoint/rewind to track token vector length, and implements tokens() method to extract collected tokens


@Boshen Boshen self-requested a review December 18, 2025 03:10
@Boshen Boshen self-assigned this Dec 18, 2025
@lilnasy lilnasy marked this pull request as draft December 18, 2025 05:31
@overlookmotel
Member

overlookmotel commented Dec 18, 2025

I discussed with Boshen today. He wants to abandon the ParserConfig trait approach for now. Plan is to configure whether to store tokens or not with a runtime option. Boshen feels it's OK to take a small slowdown in compiler pipeline in return for less complicated code. If we find the perf hit is too much, we can look at bringing in ParserConfig trait later on.

Just to be transparent: Personally I am not entirely on board with this decision - Boshen and I have different priorities. I prioritize perf over all else (pretty much), whereas Boshen puts more weight on avoiding complex generics (readable code) and keeping compile times down. But we follow our fearless captain! So the decision is made.

I'd like to apologise to you. I thought that the approach was agreed already, but it seems not. So I've sent you down a pointless path of getting into the ParserConfig stuff (which was not trivial) and then reversed direction. Doing that is not cool - we should have reached agreement on the design first. I'm sorry.

So... here's what I suggest:

  • Add tokens: bool field to ParseOptions to enable/disable collecting tokens (default false).
  • Add tokens: ArenaVec<'a, Token> to ParserReturn (which will be empty if collecting tokens is disabled).
  • Stack this PR on top of test(benchmarks): add parser_tokens benchmark #17047.
  • Alter the parser_tokens benchmark to pass tokens: true in options.

I've added the parser_tokens benchmark in a separate PR, so we'll see the costs clearly in CodSpeed on this PR - it'll show cost to both compiler pipeline, and to linter.
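The runtime-option shape described in the bullet points above can be sketched as follows. This is illustrative only: std `Vec` stands in for oxc's arena-allocated `Vec`, the `Token` fields are simplified, and the "lexer" loop that records one token per whitespace-separated word is a pure stand-in for real lexing.

```rust
#[derive(Default, Clone, Copy)]
pub struct ParseOptions {
    pub parse_regular_expression: bool,
    /// Default `false`: tokens are not collected and `ParserReturn::tokens` stays empty.
    pub collect_tokens: bool,
}

#[derive(Debug, PartialEq)]
pub struct Token {
    pub start: u32,
    pub end: u32,
}

pub struct ParserReturn {
    /// Empty unless `collect_tokens` was enabled.
    pub tokens: Vec<Token>,
}

pub fn parse(source_text: &str, options: ParseOptions) -> ParserReturn {
    let mut tokens = Vec::new();
    if options.collect_tokens {
        let mut pos = 0usize;
        for word in source_text.split_whitespace() {
            // Locate each word to recover byte offsets (stand-in for real lexing).
            let start = pos + source_text[pos..].find(word).unwrap();
            let end = start + word.len();
            tokens.push(Token { start: start as u32, end: end as u32 });
            pos = end;
        }
    }
    ParserReturn { tokens }
}
```

With the option off, callers pay only a branch per token plus an empty `Vec`, which is the trade-off being measured in this PR's benchmarks.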


I think we can improve perf when collecting tokens. Simplest thing would be to pre-allocate capacity in tokens Vec for source_text.len() tokens. There can never be more tokens than there are bytes in source text, so this will guarantee that the Vec doesn't have to grow during the parsing process.

Growing a Vec is really expensive (especially if it's large) as all the contents of the Vec (pre-growth) have to be copied to the new allocation (post-growth). You can see this cost in the CodSpeed flame graphs for lexer benchmarks (RawVec::finish_grow::grow).

Pre-allocating a lot of space is a speed vs memory usage trade-off. I imagine it'll be worth it.
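A minimal sketch of that pre-allocation idea, using plain std `Vec` (oxc would use the arena equivalent, `oxc_allocator::Vec::with_capacity_in`). Since every token spans at least one byte, `source_text.len()` is an upper bound on the token count, so the buffer is guaranteed never to grow mid-parse:

```rust
fn collect_tokens_preallocated(source_text: &str) -> Vec<u32> {
    // Upper bound: one token per byte of source text, so no reallocation can occur.
    let mut tokens = Vec::with_capacity(source_text.len());
    let initial_ptr = tokens.as_ptr();

    for (index, _word) in source_text.split_whitespace().enumerate() {
        tokens.push(index as u32); // stand-in for pushing a real Token
    }

    // The buffer never moved: no grow-and-copy happened during the loop.
    assert_eq!(tokens.as_ptr(), initial_ptr);
    tokens
}
```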

But... since you have Graphite now, please make that optimization in a separate PR on top of this one, so we can see the effect it has on benchmarks in isolation.

@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch 2 times, most recently from e7c19d8 to 4ecf1ad on December 18, 2025 23:26
@github-actions github-actions bot added A-linter Area - Linter A-cli Area - CLI A-formatter Area - Formatter labels Dec 18, 2025
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch 4 times, most recently from 1ed0bc7 to 125e885 on December 19, 2025 00:32
@lilnasy lilnasy changed the title feat(oxc_parser): store tokens in Lexer feat(parser): add token collection option Dec 19, 2025
@lilnasy lilnasy changed the base branch from main to 12-18-test_benchmarks_add_parser_tokens_benchmark December 19, 2025 00:47
@github-actions github-actions bot added A-ast-tools Area - AST tools A-editor Area - Editor and Language Server labels Dec 19, 2025
@github-actions github-actions bot added the A-linter-plugins Area - Linter JS plugins label Dec 19, 2025
@lilnasy lilnasy force-pushed the 12-18-test_benchmarks_add_parser_tokens_benchmark branch from 3296252 to e7d7cab on December 19, 2025 00:48
@lilnasy lilnasy changed the base branch from 12-18-test_benchmarks_add_parser_tokens_benchmark to graphite-base/17024 December 19, 2025 00:51
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from 125e885 to 3ee6e9e on December 19, 2025 00:52
@lilnasy lilnasy force-pushed the graphite-base/17024 branch from 3296252 to 6d053b4 on December 19, 2025 00:52
@lilnasy lilnasy changed the base branch from graphite-base/17024 to main December 19, 2025 00:52
@lilnasy lilnasy changed the base branch from main to graphite-base/17024 December 19, 2025 00:58
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from 3ee6e9e to 7635cc4 on December 19, 2025 00:58
@lilnasy lilnasy changed the base branch from graphite-base/17024 to 12-18-test_benchmarks_add_parser_tokens_benchmark December 19, 2025 00:58
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from 7635cc4 to f59f6c8 on December 19, 2025 01:14
@lilnasy
Contributor Author

lilnasy commented Dec 19, 2025

Simplest thing would be to pre-allocate capacity in tokens Vec for source_text.len() tokens.

#17095 shows it improves performance by ~23% after this change degrades it by ~28%, but that is misleading because of how CodSpeed does its maths. When I had the preallocation change in this PR, the degradation only dropped to ~27%.

Parser::new(&allocator, source_text, source_type)
    .with_options(ParseOptions {
        parse_regular_expression: true,
        collect_tokens: true,
Member


This should be false here, and true on the other benchmark - bench_parser_with_tokens.

Member


But ergh! So the cost to parser with runtime option in parse-transform-minify-print pipeline is 6%-8%. That's a lot. We might have to bring back ParserConfig! :)

Member


I've pushed a commit to switch round which benchmark gets collect_tokens: true. I just want the bench results to be clear so I can discuss with Boshen.

I've not restacked rest of the stack - didn't want to touch any of the rest.

Member


Let's not worry about performance for now. If you're able to get it working and conformance tests passing, then we'll loop back and fix the perf. If necessary, we may have to go back to ParserConfig trait.

Comment on lines +274 to +275
let backtrace = std::backtrace::Backtrace::capture();
panic!("Can't retrieve tokens because they were not collected\n{backtrace}");
Member


I've never seen Backtrace used before. I think panic! automatically produces a backtrace, so it's not required. Any reason why you added this?

Contributor Author


I wasn't seeing the names of the methods in the call stack until I added this. It's possible I missed something.

We don't have to keep this, I needed it just for debugging.


Member

@overlookmotel overlookmotel Dec 19, 2025


Ah I see. Usually running with RUST_BACKTRACE=1 gives you stack traces.

Turning on debug temporarily can also help sometimes:

oxc/Cargo.toml

Lines 247 to 255 in 5a7fcd1

[profile.dev]
# Disabling debug info speeds up local and CI builds,
# and we don't rely on it for debugging that much.
debug = false
[profile.test]
# Disabling debug info speeds up local and CI builds,
# and we don't rely on it for debugging that much.
debug = false

Let me know if neither of those works.

Comment on lines 470 to 486
- errors.extend(self.lexer.errors);
- errors.extend(self.errors);
+ errors.append(&mut self.lexer.errors);
+ errors.append(&mut self.errors);
Member


This change is likely good, but it's incidental to this PR. Want to make a separate PR for this?

Contributor Author


It may be incidental for self.errors, but extend() moves self.lexer.errors out of self.lexer, and that partial move prevented the self.lexer.tokens() call below.
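A minimal, self-contained illustration of that ownership point (hypothetical two-field Lexer, not oxc's actual type): `extend(vec)` consumes the source Vec, a partial move out of its owner, while `append(&mut vec)` only borrows it mutably and drains it in place.

```rust
struct Lexer {
    errors: Vec<String>,
    tokens: Vec<u32>,
}

impl Lexer {
    // Consumes the whole Lexer, like the tokens() call discussed above.
    fn tokens(self) -> Vec<u32> {
        self.tokens
    }
}

fn main() {
    let mut lexer = Lexer {
        errors: vec!["unexpected token".to_string()],
        tokens: vec![1, 2, 3],
    };
    let mut errors: Vec<String> = Vec::new();

    // `errors.extend(lexer.errors)` would partially move out of `lexer`,
    // making the `lexer.tokens()` call below a compile error.
    // `append` borrows `lexer.errors` mutably and leaves `lexer` intact.
    errors.append(&mut lexer.errors);

    assert!(lexer.errors.is_empty());
    assert_eq!(errors.len(), 1);
    assert_eq!(lexer.tokens(), vec![1, 2, 3]); // still usable
}
```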

Member


Actually, maybe it's not good. I think extend is slightly cheaper.

But this code could be optimized in other way.

  1. module_record_errors.len() is not included in reservation.
  2. Could extend one of the existing Vecs instead of creating a new one.

Something like:

if !self.source_type.is_typescript() {
    module_record_errors.clear();
}

let mut errors = self.lexer.errors;
errors.reserve(self.errors.len() + module_record_errors.len());
errors.extend(self.errors);
errors.extend(module_record_errors);

Member

@overlookmotel overlookmotel Dec 19, 2025


It may be incidental for self.errors, but extend() moves self.lexer.errors out of self.lexer, and that partial move prevented the self.lexer.tokens() call below.

Ah ha! Sorry I was wrong. It's not incidental.

Member

@overlookmotel overlookmotel Dec 19, 2025


But could you just move the self.lexer.tokens() call to earlier, before the error-handling code, and then leave it using extend()?

@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from bdd6736 to d9d0d73 on December 19, 2025 14:59
@lilnasy lilnasy force-pushed the 12-18-test_benchmarks_add_parser_tokens_benchmark branch from f525989 to 08c5680 on December 19, 2025 14:59
@Boshen Boshen removed their assignment Feb 6, 2026
@camc314 camc314 closed this Feb 19, 2026
@overlookmotel overlookmotel deleted the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch February 27, 2026 00:42

Labels

A-ast-tools Area - AST tools A-cli Area - CLI A-editor Area - Editor and Language Server A-formatter Area - Formatter A-linter Area - Linter A-linter-plugins Area - Linter JS plugins A-parser Area - Parser C-enhancement Category - New feature or request
