
feat(linter/plugins): serialize rust-parsed tokens#17025

Closed
lilnasy wants to merge 1 commit into 12-19-perf_oxc_parser_preallocate_tokens_vector from 12-17-serialization

Conversation

@lilnasy
Contributor

@lilnasy lilnasy commented Dec 17, 2025

No description provided.

@github-actions github-actions bot added A-linter Area - Linter A-parser Area - Parser A-cli Area - CLI A-ast Area - AST A-isolated-declarations Isolated Declarations A-ast-tools Area - AST tools A-formatter Area - Formatter A-linter-plugins Area - Linter JS plugins labels Dec 17, 2025
Contributor Author

lilnasy commented Dec 17, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@codspeed-hq

codspeed-hq bot commented Dec 17, 2025

CodSpeed Performance Report

Merging #17025 will degrade performance by 25.15%

Comparing 12-17-serialization (7ff857c) with 12-17-feat_oxc_parser_store_tokens_in_lexer_ (e7c19d8)

Summary

❌ 4 regressions
✅ 38 untouched
⏩ 3 skipped ¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|
| Simulation | parser[cal.com.tsx] | 32.7 ms | 43.7 ms | -25.15% |
| Simulation | parser[RadixUIAdoptionSection.jsx] | 96.9 µs | 121.4 µs | -20.16% |
| Simulation | parser[react.development.js] | 1.8 ms | 2.3 ms | -22.42% |
| Simulation | parser[binder.ts] | 4.3 ms | 5.5 ms | -22.64% |

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Member

@overlookmotel overlookmotel left a comment


I imagine perf regression is coming from the loop which calls to_tseslint_type. Hopefully my suggestions below solve that.

Comment on lines +60 to +61
```rust
#[estree(skip)]
pub tokens: Vec<'a, Token<'a>>,
```
Member


Can we avoid adding this field to Program? Tokens can be returned in ParserReturn instead.

Comment on lines +1 to +27
```rust
use oxc_allocator::CloneIn;
use oxc_ast_macros::{ast, ast_meta};
use oxc_estree::ESTree;
use oxc_span::{Atom, ContentEq, GetSpan, GetSpanMut, Span};

#[ast]
#[generate_derive(CloneIn, ContentEq, ESTree, GetSpan, GetSpanMut)]
#[estree(add_fields(value = TokenValue), no_type, no_ts_def, no_parent)]
#[derive(Debug)]
/// Represents a token in the source code.
pub struct Token<'a> {
    /// Span.
    #[span]
    pub span: Span,
    /// Type.
    pub r#type: Atom<'a>,
    /// Flags.
    pub flags: Option<Atom<'a>>,
    /// Pattern.
    pub pattern: Option<Atom<'a>>,
}

/// Custom deserializer for `value` field of `Token`.
#[ast_meta]
#[generate_derive(CloneIn, ContentEq, ESTree)]
#[estree(ts_type = "string", raw_deser = "SOURCE_TEXT.slice(THIS.start, THIS.end)")]
pub struct TokenValue<'a, 'b>(pub &'b Token<'a>);
```
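To make the `raw_deser` expression concrete: the token's `value` is never stored; the deserializer recovers it by slicing the source text with the token's span, exactly as `SOURCE_TEXT.slice(THIS.start, THIS.end)` does on the JS side. A minimal Rust sketch of the same idea (the `token_value` helper is illustrative, not part of oxc):

```rust
// Recover a token's textual value by slicing the source with its span,
// mirroring the JS-side `SOURCE_TEXT.slice(THIS.start, THIS.end)`.
fn token_value(source: &str, start: usize, end: usize) -> &str {
    &source[start..end]
}

fn main() {
    let source = "let x = 42;";
    // Span of the numeric literal `42` in the source above.
    assert_eq!(token_value(source, 8, 10), "42");
    println!("ok");
}
```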
Member

@overlookmotel overlookmotel Dec 18, 2025


We shouldn't need this. We want JS side code to read the original Vec<Token> directly, without converting it.

Instead:

  • Add #[ast] attribute to the existing Token struct (and remove #[repr(transparent)] attr - #[ast] macro adds it automatically).
  • Add #[ast] #[generate_derive(ESTree)] #[estree(rename = "TokenKind")] attributes to the existing Kind (token kind) enum.
  • Implement ESTree trait on Token.
  • Add crates/oxc_parser/src/lexer/token.rs and crates/oxc_parser/src/lexer/kind.rs to the list of files that codegen processes at in ast_tools.
  • Add another type #[ast] struct Tokens<'a>(Vec<'a, Token<'a>>); in same file. We don't need to use that type, but adding it should make ast_tools generate a deserializeVecToken function.
  • Add a TOKEN flag to FLAG_NAMES in raw_transfer.rs generator.
  • Export deserializeVecToken from deserializer when TOKEN flag is enabled.

ESTree impl on Token will need to be manually defined (including the #[estree(raw_deser)] attr). Shout if you have any difficulty with this. Writing raw_deser implementations is pretty dreadful - my fault, it's terribly designed - and you'll need to write the byte offsets manually (since Token doesn't have real fields).

But you should be able to use deserializeTokenKind which codegen will generate, and also use the generated ESTree impl for Kind.

The conversion from Kind (token kind) to ESTree token type is the tricky part. Should be able to get codegen to do most of the work by adding #[estree(rename = "Keyword")] (etc) to all the variants of Kind.
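A minimal sketch of the Kind-to-ESTree-type mapping described above, with the codegen attributes stripped out so it runs standalone. The variant names and the `estree_type` helper are illustrative, not oxc's real API; the `type` strings ("Keyword", "Identifier", "Punctuator", "String") are the standard tseslint token types:

```rust
// Illustrative stand-in for oxc's lexer `Kind` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ident,
    Function, // a keyword
    Str,
    Eq,       // a punctuator
}

impl Kind {
    /// Maps a lexer token kind to the ESTree/tseslint token `type` string,
    /// which is what `#[estree(rename = "Keyword")]` etc. would express
    /// declaratively on each variant.
    fn estree_type(self) -> &'static str {
        match self {
            Kind::Ident => "Identifier",
            Kind::Function => "Keyword",
            Kind::Str => "String",
            Kind::Eq => "Punctuator",
        }
    }
}

fn main() {
    assert_eq!(Kind::Function.estree_type(), "Keyword");
    assert_eq!(Kind::Eq.estree_type(), "Punctuator");
    println!("ok");
}
```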

No doubt, you didn't need me to explain every one of the steps above. I've just included them for clarity, to (hopefully) save you a little time. Please excuse me if I'm "teaching grandpa to suck eggs".

I've laid a bit of groundwork for this in #17050 and #17052.

Member

@overlookmotel overlookmotel Dec 18, 2025


Oh actually maybe we can get codegen to generate everything for us by:

  • Don't put #[ast] attr on Token.
  • Instead, add this:
```rust
/// Dummy type to communicate the content of `Token` to `oxc_ast_tools`.
#[ast(foreign = Token)]
#[generate_derive(ESTree)]
#[expect(dead_code)]
struct TokenAlias {
    pub span: Span,
    pub kind: Kind,
    #[estree(skip)]
    pub _align: U128Align,
}

/// Zero-sized type which has alignment of `u128`
#[repr(transparent)]
struct U128Align([u128; 0]);
```

This tells the codegen "treat Token as if it's defined like this".

You'd need to add a "special case" to codegen for U128Align (search ast_tools for PointerAlign).

EDIT: No this is a bad idea. Need to implement ESTree and raw_deser manually to handle the special case of extra fields on regexp tokens.

Comment on lines +534 to +541
```rust
let tokens = self.ast.vec_from_iter(self.lexer.tokens().iter().filter_map(|token| {
    token.kind().to_tseslint_type().map(|ty| ast::Token {
        span: token.span(),
        r#type: self.ast.atom(ty),
        flags: None,
        pattern: None,
    })
}));
```
Member


This is probably where the perf regression is coming from. Should be able to remove it once my other comments are actioned.

graphite-app bot pushed a commit that referenced this pull request Dec 18, 2025
… structs (#17052)

`#[ast]` macro adds `#[repr(C)]` to structs. For structs with a single field, use `#[repr(transparent)]` instead. This does not alter memory layout, but can change the ABI, which can make it more performant to pass such structs to functions in some cases, as it can change the registers used for passing.

This will help with making lexer tokens serializable (#17025), since `Token` will have `#[ast]` added to it, and we need it to continue to be `#[repr(transparent)]`.
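The layout/ABI point in that commit message can be checked directly. A small sketch, assuming a `Token`-like wrapper around a `u128` (the wrapper name is illustrative): `#[repr(transparent)]` guarantees the newtype has exactly the inner type's size and alignment, and additionally gives it the inner type's ABI for function calls.

```rust
use std::mem::{align_of, size_of};

// `#[repr(transparent)]` makes this wrapper layout- and ABI-identical
// to its single non-zero-sized field.
#[repr(transparent)]
struct TokenRepr(u128); // illustrative stand-in for a token packed into a u128

fn main() {
    assert_eq!(size_of::<TokenRepr>(), size_of::<u128>());
    assert_eq!(align_of::<TokenRepr>(), align_of::<u128>());
    println!("ok");
}
```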
graphite-app bot pushed a commit that referenced this pull request Dec 18, 2025
Previously `u128` was not supported in `assert_layouts` generator, because Rust changed the alignment of `u128` from 8 to 16 - and therefore the alignment depended on Rust version.

That change happened some time ago now, and all Rust versions above our MSRV have `u128` with alignment of 16. So we can now support `u128` (and also `i128`, `NonZeroU128`, `NonZeroI128`).

Also support `u128` in `raw_transfer` and `raw_transfer_lazy` generators.

This will help with making lexer tokens serializable (#17025), since `Token` contains a `u128`.
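The alignment fact referenced above is observable on any recent toolchain; a one-line check (assuming a 64-bit target, where this change took effect):

```rust
use std::mem::align_of;

// On all Rust versions above oxc's MSRV, 128-bit integers are 16-byte
// aligned (they used to be 8-byte aligned on some older compilers).
fn main() {
    assert_eq!(align_of::<u128>(), 16);
    assert_eq!(align_of::<i128>(), 16);
    println!("ok");
}
```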
@lilnasy lilnasy changed the base branch from 12-17-feat_oxc_parser_store_tokens_in_lexer_ to graphite-base/17025 December 18, 2025 22:24
@lilnasy lilnasy force-pushed the graphite-base/17025 branch from f58229f to e7c19d8 Compare December 18, 2025 22:26
@lilnasy lilnasy force-pushed the 12-17-serialization branch from 3700cde to 7ff857c Compare December 18, 2025 22:26
@lilnasy lilnasy changed the base branch from graphite-base/17025 to 12-17-feat_oxc_parser_store_tokens_in_lexer_ December 18, 2025 22:26
@lilnasy lilnasy changed the base branch from 12-17-feat_oxc_parser_store_tokens_in_lexer_ to graphite-base/17025 December 18, 2025 23:26
@lilnasy lilnasy force-pushed the graphite-base/17025 branch from e7c19d8 to d6951b8 Compare December 19, 2025 00:07
@lilnasy lilnasy force-pushed the 12-17-serialization branch from 7ff857c to 1aa1abc Compare December 19, 2025 00:07
@lilnasy lilnasy changed the base branch from graphite-base/17025 to 12-17-feat_oxc_parser_store_tokens_in_lexer_ December 19, 2025 00:08
@lilnasy lilnasy changed the base branch from 12-17-feat_oxc_parser_store_tokens_in_lexer_ to graphite-base/17025 December 19, 2025 00:10
@lilnasy lilnasy force-pushed the 12-17-serialization branch from 1aa1abc to d75a44f Compare December 19, 2025 00:10
@lilnasy lilnasy force-pushed the graphite-base/17025 branch from d6951b8 to 8e5c3be Compare December 19, 2025 00:10
@lilnasy lilnasy changed the base branch from graphite-base/17025 to 12-17-feat_oxc_parser_store_tokens_in_lexer_ December 19, 2025 00:10
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from 8e5c3be to 1ed0bc7 Compare December 19, 2025 00:22
@lilnasy lilnasy force-pushed the 12-17-serialization branch from d75a44f to bfea179 Compare December 19, 2025 00:22
@lilnasy lilnasy changed the base branch from 12-17-feat_oxc_parser_store_tokens_in_lexer_ to graphite-base/17025 December 19, 2025 00:32
@lilnasy lilnasy force-pushed the 12-17-serialization branch from bfea179 to f75eb49 Compare December 19, 2025 00:32
@lilnasy lilnasy force-pushed the graphite-base/17025 branch from 1ed0bc7 to 125e885 Compare December 19, 2025 00:32
@lilnasy lilnasy changed the base branch from graphite-base/17025 to 12-17-feat_oxc_parser_store_tokens_in_lexer_ December 19, 2025 00:32
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from 125e885 to 3ee6e9e Compare December 19, 2025 00:52
@lilnasy lilnasy force-pushed the 12-17-serialization branch 2 times, most recently from e6b8780 to e2b7436 Compare December 19, 2025 00:58
@lilnasy lilnasy force-pushed the 12-17-feat_oxc_parser_store_tokens_in_lexer_ branch from 3ee6e9e to 7635cc4 Compare December 19, 2025 00:58
@lilnasy lilnasy changed the base branch from 12-17-feat_oxc_parser_store_tokens_in_lexer_ to graphite-base/17025 December 19, 2025 01:14
@lilnasy lilnasy force-pushed the graphite-base/17025 branch from 7635cc4 to 01dc729 Compare December 19, 2025 01:14
@lilnasy lilnasy force-pushed the 12-17-serialization branch from e2b7436 to 8d40ca3 Compare December 19, 2025 01:14
@lilnasy lilnasy changed the base branch from graphite-base/17025 to 12-19-perf_oxc_parser_preallocate_tokens_vector December 19, 2025 01:14
@camc314 camc314 closed this Feb 19, 2026