feat(linter/plugins): serialize rust-parsed tokens #17025

lilnasy wants to merge 1 commit into `12-19-perf_oxc_parser_preallocate_tokens_vector`.
Conversation
CodSpeed Performance Report: merging #17025 will degrade performance by 25.15%.
```rust
#[estree(skip)]
pub tokens: Vec<'a, Token<'a>>,
```
Can we avoid adding this field to Program? Tokens can be returned in ParserReturn instead.
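The suggestion above can be sketched as follows. This is a hypothetical, self-contained illustration with simplified stand-in types, not the real oxc definitions: the point is only that the token list travels on the parser's return value, so `Program` stays a pure AST type.

```rust
// Stand-in types (not the real oxc definitions).
struct Span { start: u32, end: u32 }
struct Token { span: Span }
struct Program; // AST root: no `tokens` field here.

// Tokens are returned alongside the AST instead of being stored on it.
struct ParserReturn {
    program: Program,
    tokens: Vec<Token>,
}

fn parse(source: &str) -> ParserReturn {
    // Stand-in lexer: one token covering the whole source.
    let tokens = vec![Token { span: Span { start: 0, end: source.len() as u32 } }];
    ParserReturn { program: Program, tokens }
}

fn main() {
    let ret = parse("let x");
    assert_eq!(ret.tokens.len(), 1);
    assert_eq!(ret.tokens[0].span.end, 5);
    println!("tokens: {}", ret.tokens.len());
}
```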
```rust
use oxc_allocator::CloneIn;
use oxc_ast_macros::{ast, ast_meta};
use oxc_estree::ESTree;
use oxc_span::{Atom, ContentEq, GetSpan, GetSpanMut, Span};

#[ast]
#[generate_derive(CloneIn, ContentEq, ESTree, GetSpan, GetSpanMut)]
#[estree(add_fields(value = TokenValue), no_type, no_ts_def, no_parent)]
#[derive(Debug)]
/// Represents a token in the source code.
pub struct Token<'a> {
    /// Span.
    #[span]
    pub span: Span,
    /// Type.
    pub r#type: Atom<'a>,
    /// Flags.
    pub flags: Option<Atom<'a>>,
    /// Pattern.
    pub pattern: Option<Atom<'a>>,
}

/// Custom deserializer for `value` field of `Token`.
#[ast_meta]
#[generate_derive(CloneIn, ContentEq, ESTree)]
#[estree(ts_type = "string", raw_deser = "SOURCE_TEXT.slice(THIS.start, THIS.end)")]
pub struct TokenValue<'a, 'b>(pub &'b Token<'a>);
```
We shouldn't need this. We want the JS-side code to read the original `Vec<Token>` directly, without converting it.

Instead:

- Add the `#[ast]` attribute to the existing `Token` struct (and remove the `#[repr(transparent)]` attr - the `#[ast]` macro adds it automatically).
- Add `#[ast] #[generate_derive(ESTree)] #[estree(rename = "TokenKind")]` attributes to the existing `Kind` (token kind) enum.
- Implement the `ESTree` trait on `Token`.
- Add `crates/oxc_parser/src/lexer/token.rs` and `crates/oxc_parser/src/lexer/kind.rs` to the list of files that codegen processes in `ast_tools`.
- Add another type `#[ast] struct Tokens<'a>(Vec<'a, Token<'a>>);` in the same file. We don't need to use that type, but adding it should make `ast_tools` generate a `deserializeVecToken` function.
- Add a `TOKEN` flag to `FLAG_NAMES` in the `raw_transfer.rs` generator.
- Export `deserializeVecToken` from the deserializer when the `TOKEN` flag is enabled.
The `ESTree` impl on `Token` will need to be manually defined (including the `#[estree(raw_deser)]` attr). Shout if you have any difficulty with this. Writing `raw_deser` implementations is pretty dreadful - my fault, it's terribly designed - and you'll need to write the byte offsets manually (since `Token` doesn't have real fields).

But you should be able to use `deserializeTokenKind`, which codegen will generate, and also use the generated `ESTree` impl for `Kind`.

The conversion from `Kind` (token kind) to the ESTree token type is the tricky part. You should be able to get codegen to do most of the work by adding `#[estree(rename = "Keyword")]` (etc.) to all the variants of `Kind`.
No doubt, you didn't need me to explain every one of the steps above. I've just included them for clarity, to (hopefully) save you a little time. Please excuse me if I'm "teaching grandpa to suck eggs".
I've laid a bit of groundwork for this in #17050 and #17052.
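The `Kind`-to-ESTree-type conversion described above amounts to a plain mapping. Here is a self-contained sketch with a made-up, heavily reduced `Kind` enum (the real one lives in `crates/oxc_parser/src/lexer/kind.rs` and has many more variants); the string values are the standard ESLint/TSESLint token `type` names, which is what the codegen `#[estree(rename = ...)]` attributes would produce:

```rust
// Hypothetical, much-reduced stand-in for the real `Kind` enum.
#[derive(Clone, Copy)]
enum Kind { Ident, Str, Num, LParen, Let, RegExp }

// Maps a token kind to the ESTree/TSESLint token `type` string.
fn to_tseslint_type(kind: Kind) -> &'static str {
    match kind {
        Kind::Ident => "Identifier",
        Kind::Str => "String",
        Kind::Num => "Numeric",
        Kind::LParen => "Punctuator",
        Kind::Let => "Keyword",
        Kind::RegExp => "RegularExpression",
    }
}

fn main() {
    assert_eq!(to_tseslint_type(Kind::Let), "Keyword");
    println!("{}", to_tseslint_type(Kind::Ident)); // prints "Identifier"
}
```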
Oh, actually, maybe we can get codegen to generate everything for us by:

- Don't put the `#[ast]` attr on `Token`.
- Instead, add this:

```rust
/// Dummy type to communicate the content of `Token` to `oxc_ast_tools`.
#[ast(foreign = Token)]
#[generate_derive(ESTree)]
#[expect(dead_code)]
struct TokenAlias {
    pub span: Span,
    pub kind: Kind,
    #[estree(skip)]
    pub _align: U128Align,
}

/// Zero-sized type which has alignment of `u128`
#[repr(transparent)]
struct U128Align([u128; 0]);
```

This tells the codegen "treat `Token` as if it's defined like this".

You'd need to add a "special case" to codegen for `U128Align` (search `ast_tools` for `PointerAlign`).

EDIT: No, this is a bad idea. We need to implement `ESTree` and `raw_deser` manually to handle the special case of extra fields on regexp tokens.
```rust
let tokens = self.ast.vec_from_iter(self.lexer.tokens().iter().filter_map(|token| {
    token.kind().to_tseslint_type().map(|ty| ast::Token {
        span: token.span(),
        r#type: self.ast.atom(ty),
        flags: None,
        pattern: None,
    })
}));
```
This is probably where the perf regression is coming from. Should be able to remove it once my other comments are actioned.
… structs (#17052)

The `#[ast]` macro adds `#[repr(C)]` to structs. For structs with a single field, use `#[repr(transparent)]` instead. This does not alter the memory layout, but it can change the ABI, which can make it more performant to pass such structs to functions in some cases, as it can change the registers used for passing. This will help with making lexer tokens serializable (#17025), since `Token` will have `#[ast]` added to it and we need it to continue to be `#[repr(transparent)]`.
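The layout claim in that commit message is easy to check directly: a `#[repr(transparent)]` wrapper has exactly the size and alignment of its single field. A minimal sketch with an illustrative type (not oxc's):

```rust
use std::mem::{align_of, size_of};

// A single-field struct: `#[repr(transparent)]` guarantees it has
// exactly the layout and ABI of its inner type.
#[repr(transparent)]
struct Wrapper(u64);

fn main() {
    assert_eq!(size_of::<Wrapper>(), size_of::<u64>());
    assert_eq!(align_of::<Wrapper>(), align_of::<u64>());
    println!("size={}", size_of::<Wrapper>()); // prints "size=8"
}
```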
Previously `u128` was not supported in the `assert_layouts` generator, because Rust changed the alignment of `u128` from 8 to 16, so the alignment depended on the Rust version. That change happened some time ago, and all Rust versions above our MSRV have `u128` with an alignment of 16, so we can now support `u128` (and also `i128`, `NonZeroU128`, `NonZeroI128`). Also support `u128` in the `raw_transfer` and `raw_transfer_lazy` generators. This will help with making lexer tokens serializable (#17025), since `Token` contains a `u128`.
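These alignment facts can be verified on any toolchain above the stated MSRV. Note also that a zero-length array such as `[u128; 0]` is zero-sized but keeps the element type's alignment, which is what makes an align-marker field like the `U128Align([u128; 0])` idea above cost nothing:

```rust
use std::mem::{align_of, size_of};

fn main() {
    // On all Rust versions above oxc's MSRV, `u128` is 16-byte aligned.
    assert_eq!(size_of::<u128>(), 16);
    assert_eq!(align_of::<u128>(), 16);
    // A zero-length array is zero-sized but keeps the element alignment,
    // so a marker field wrapping `[u128; 0]` adds no space to a struct.
    assert_eq!(size_of::<[u128; 0]>(), 0);
    assert_eq!(align_of::<[u128; 0]>(), 16);
    println!("u128: size={} align={}", size_of::<u128>(), align_of::<u128>());
}
```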

No description provided.