refactor(estree/tokens): replace stored token when re-lexing#19698
Conversation
Merging this PR will not alter performance
Comparing Footnotes
|
1eac4b3 to
7233548
Compare
a05a0b1 to
79a366f
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the TypeScript angle bracket re-lexing mechanism to fix token collection issues by preventing duplicate tokens rather than skipping them during serialization. Previously, when the parser speculatively tried parsing << as type arguments and failed, it would leave a duplicate < token in the stream that the ESTree serializer had to skip. Now, the parser pops the compound token before re-lexing and restores it if the speculative parse fails.
Changes:
- Refactored parser to pop compound tokens (e.g.,
<<) during TypeScript angle bracket re-lexing and restore them viarewrite_last_collected_tokenwhen speculative parsing fails - Removed special-case logic from ESTree tokens serializer that was skipping duplicate tokens
- Added comprehensive comments explaining the labyrinthine re-lexing logic
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| oxfmtrc.jsonc | Added new tokens test fixtures directory to formatting exclusion list |
| crates/oxc_parser/src/lexer/typescript.rs | Added detailed comments explaining re-lexing behavior and token popping logic for left angle brackets |
| crates/oxc_parser/src/lexer/mod.rs | Added rewrite_last_collected_token method to restore popped tokens after failed speculative parses |
| crates/oxc_parser/src/js/expression.rs | Restructured conditional logic to call rewrite_last_collected_token when type argument parsing fails |
| crates/oxc_estree_tokens/src/lib.rs | Removed special-case logic for skipping duplicate << tokens and related ts_type_parameter_starts tracking |
| apps/oxlint/test/fixtures/tokens/files/ts_angle_relex.ts | Added test cases for TypeScript angle bracket disambiguation scenarios |
| apps/oxlint/test/fixtures/tokens/output.snap.md | Updated test snapshots with expected token output for new test cases |
Merge activity
|
Previously the ESTree tokens serializer had a special case to skip a `<<` token which has same start as preceding `<` token. Instead, solve this problem at source - prevent this duplicate token from being in the `Vec<Token>` in the first place, by modifying the parser. This is important because we want to move to sending tokens to JS via raw transfer, and we need the `Vec<Token>` to have the right number of tokens in it to start with, so we can do lazy deserialization and just get a token at a specific index. This breaks down if there are extra tokens that need to be skipped. The logic around this in parser is quite labyrinthine, so add lengthy comments explaining it. This change also simplifies the main loop in ESTree serializer, so has side effect of +1% on `estree_tokens` benchmark. Conversely, it doesn't affect the `parser_tokens` benchmark perf, because the path in parser which got more complex is not commonly taken.
79a366f to
03bce9f
Compare

Previously the ESTree tokens serializer had a special case to skip a
<<token which has same start as preceding<token. Instead, solve this problem at source - prevent this duplicate token from being in theVec<Token>in the first place, by modifying the parser.This is important because we want to move to sending tokens to JS via raw transfer, and we need the
Vec<Token>to have the right number of tokens in it to start with, so we can do lazy deserialization and just get a token at a specific index. This breaks down if there are extra tokens that need to be skipped.The logic around this in parser is quite labyrinthine, so add lengthy comments explaining it.
This change also simplifies the main loop in ESTree serializer, so has side effect of +1% on
estree_tokensbenchmark. Conversely, it doesn't affect theparser_tokensbenchmark perf, because the path in parser which got more complex is not commonly taken.