perf(parser): introduce ParserConfig #19637
Merging this PR will improve performance by 18.18%
Pull request overview
This PR introduces a ParserConfig trait to control whether the parser collects tokens at compile-time or runtime, addressing a performance regression from #19497. The change enables zero-cost abstractions for token collection by making it a compile-time decision.
Changes:
- Introduced `ParserConfig` trait with three implementations: `NoTokensParserConfig` (default), `TokensParserConfig`, and `RuntimeParserConfig`
- Removed `collect_tokens` field from `ParseOptions` and replaced it with the config system
- Updated all parser and lexer implementations to be generic over the config type
- Migrated byte handler dispatch from a static array to per-config static arrays to enable better optimization
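
To make the first bullet concrete, here is a minimal, self-contained sketch of how a config trait can turn token collection into a compile-time decision. It is not the code from this PR: only the three type names and the `tokens` method name come from the PR description, while the `lex` function, the `RuntimeParserConfig::new` constructor, and the whitespace "tokenizer" are assumptions made up for illustration.

```rust
/// Hypothetical stand-in for the `ParserConfig` trait described in this PR.
/// Only the method name `tokens` is taken from the PR description.
pub trait ParserConfig {
    fn tokens(&self) -> bool;
}

/// Tokens are always off; the answer is known at compile time.
pub struct NoTokensParserConfig;

/// Tokens are always on; the answer is known at compile time.
pub struct TokensParserConfig;

/// Tokens are switched on or off by a value only known at runtime.
pub struct RuntimeParserConfig {
    tokens: bool,
}

impl RuntimeParserConfig {
    /// Assumed constructor, for illustration only.
    pub fn new(tokens: bool) -> Self {
        Self { tokens }
    }
}

impl ParserConfig for NoTokensParserConfig {
    #[inline(always)]
    fn tokens(&self) -> bool {
        false
    }
}

impl ParserConfig for TokensParserConfig {
    #[inline(always)]
    fn tokens(&self) -> bool {
        true
    }
}

impl ParserConfig for RuntimeParserConfig {
    #[inline(always)]
    fn tokens(&self) -> bool {
        self.tokens
    }
}

/// Toy "lexer" (an assumption, not the oxc lexer): splits on whitespace,
/// counts the pieces, and records them only if the config asks for tokens.
pub fn lex<C: ParserConfig>(source: &str, config: &C) -> (usize, Vec<String>) {
    let mut seen = 0;
    let mut collected = Vec::new();
    for word in source.split_whitespace() {
        seen += 1;
        // For the two compile-time configs this condition is a constant in
        // each monomorphized copy of `lex`, so the branch can be optimized out.
        if config.tokens() {
            collected.push(word.to_string());
        }
    }
    (seen, collected)
}

fn main() {
    let source = "let x = 1 ;";

    let (seen, collected) = lex(source, &NoTokensParserConfig);
    println!("NoTokens: saw {seen}, collected {}", collected.len());

    let (seen, collected) = lex(source, &TokensParserConfig);
    println!("Tokens: saw {seen}, collected {}", collected.len());

    // The runtime config reads its flag from the command line here.
    let want_tokens = std::env::args().any(|arg| arg == "--tokens");
    let (seen, collected) = lex(source, &RuntimeParserConfig::new(want_tokens));
    println!("Runtime: saw {seen}, collected {}", collected.len());
}
```

In this sketch, `lex` is compiled once per config type. The copies for `NoTokensParserConfig` and `TokensParserConfig` each see a constant `tokens()` result, whereas the `RuntimeParserConfig` copy keeps an ordinary runtime check.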
Reviewed changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/oxc_parser/src/config.rs | New module defining ParserConfig and LexerConfig traits with three concrete implementations |
| crates/oxc_parser/src/lib.rs | Updated Parser struct to be generic over ParserConfig, added with_config method, removed collect_tokens from ParseOptions |
| crates/oxc_parser/src/lexer/mod.rs | Updated Lexer to be generic over LexerConfig, changed config field from bool to generic type |
| crates/oxc_parser/src/lexer/byte_handlers.rs | Converted static BYTE_HANDLERS array to per-config static arrays in byte_handler_tables module |
| crates/oxc_parser/src/js/*.rs | Added generic Config parameter to all ParserImpl implementations in JS parsing modules |
| crates/oxc_parser/src/ts/*.rs | Added generic Config parameter to all ParserImpl implementations in TS parsing modules |
| crates/oxc_parser/src/jsx/mod.rs | Added generic Config parameter to ParserImpl implementation for JSX |
| crates/oxc_parser/src/lexer/*.rs | Added generic Config parameter to all Lexer implementations in lexer submodules |
| tasks/coverage/src/tools.rs | Updated to use RuntimeParserConfig for token collection in coverage tests |
| tasks/benchmark/benches/lexer.rs | Updated to use NoTokensLexerConfig for benchmarks |
| napi/playground/src/lib.rs | Removed collect_tokens field from ParseOptions struct initialization |
| crates/oxc_formatter/src/service/mod.rs | Removed collect_tokens field from ParseOptions struct initialization |
@overlookmotel I think we should move this below #19497 so we can monitor the perf change more clearly?
Yes, I agree that'd be preferable. I tried, but it was a bit of a nightmare because the 2 PRs touch all the same code. I've checked the numbers on CodSpeed and they're exactly back to where they were before the preceding PR.
Force-pushed from a62c8da to fcc54a9.

What this PR does
Introduce `ParserConfig` trait (another try at #16785).
The aim is to remove the large performance regression in parser that #19497 created, by making whether the parser generates tokens or not a compile-time option.
`ParserConfig::tokens` method replaces `ParseOptions::collect_tokens` property. The former can be const-folded at compile time, where the latter couldn't.
3 options
This PR also introduces 3 different config types that users can pass to the parser:
- `NoTokensParserConfig` (default)
- `TokensParserConfig`
- `RuntimeParserConfig`
The first 2 set whether tokens are collected or not at compile time. The last sets it at runtime.
All 3 implement `ParserConfig`.
`NoTokensParserConfig` is the default, and is what's used in compiler pipeline. It switches tokens off in the parser, and makes all the tokens-related code dead code, which the compiler eliminates. This makes the ability of the parser to generate tokens zero cost when it's not used (in the compiler pipeline).
`TokensParserConfig` is the one to use where you always want tokens. This is probably the config that linter will use.
`RuntimeParserConfig` is the one to use when an application decides whether to generate tokens or not at runtime. This config avoids compiling the parser twice, at the cost of runtime checks. This is what NAPI parser package will use.
Future extension
Supporting additional features
In future we intend to build the UTF-8 to UTF-16 offsets conversion table in the parser. This will be more performant than searching through the source text for Unicode characters in a 2nd pass later on. But this feature is only required for uses of the parser where we're interacting with JS side (NAPI parser package, linter with JS plugins).
`ParserConfig` can be extended to toggle this feature on/off at compile time or runtime, in the same way as you toggle on/off tokens.
Options and configs
This PR introduces `ParserConfig` but leaves `ParseOptions` as it is. So we now have 2 sets of options, passed to `Parser` with `with_options(...)` and `with_config(...)`. This is confusing.
We could merge the 2 by making `ParseOptions` implement `ParserConfig`, so then you'd define all options with one `with_options` call.
This would have the side effect of making all other parser options (e.g. `preserve_parens`) able to be set at either runtime or compile time, depending on the use case.
For users consuming `oxc_parser` as a library, with specific needs, they could also configure `Parser` to their needs e.g. create a parser which only handles plain JS code with all the code paths for JSX and TS shaken out as dead code. This would likely improve parsing speed significantly for these use cases.
Implementation details
Why a trait instead of a cargo feature?
IMO a trait is preferable for the following reasons:
`#[cfg_attr(feature = "whatever", expect(clippy::unused_async))]` etc.
The introduction of a trait does not seem to significantly affect compile time:
Measured on Mac Mini M4 Pro, `cargo clean` run before each. The difference appears to be mostly within the noise threshold.
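
As a rough illustration of the future extension described under "Supporting additional features", the sketch below gates a UTF-8 to UTF-16 offset table on a config method. Everything in it is hypothetical: the `ExtendedParserConfig` trait, its `utf16_offsets` method, the two config types, the table layout, and both functions are assumptions, not part of `oxc_parser`. A real implementation would presumably fill the table inside the lexer as it goes rather than in a separate loop.

```rust
/// Hypothetical config trait with a second toggle next to `tokens`.
pub trait ExtendedParserConfig {
    fn tokens(&self) -> bool;
    fn utf16_offsets(&self) -> bool;
}

/// Assumed config for a pure-Rust pipeline: both features off at compile time.
pub struct CompilerConfig;

/// Assumed config for JS-facing uses: both features on at compile time.
pub struct JsInteropConfig;

impl ExtendedParserConfig for CompilerConfig {
    fn tokens(&self) -> bool {
        false
    }
    fn utf16_offsets(&self) -> bool {
        false
    }
}

impl ExtendedParserConfig for JsInteropConfig {
    fn tokens(&self) -> bool {
        true
    }
    fn utf16_offsets(&self) -> bool {
        true
    }
}

/// Each entry is (UTF-8 offset just past a non-ASCII char,
/// total UTF-8 bytes minus UTF-16 code units up to that point).
pub type OffsetTable = Vec<(u32, u32)>;

/// Builds the table in one pass, or returns an empty table when the
/// config switches the feature off.
pub fn build_offset_table<C: ExtendedParserConfig>(source: &str, config: &C) -> OffsetTable {
    let mut table = Vec::new();
    if !config.utf16_offsets() {
        return table;
    }
    let mut diff = 0u32;
    for (start, ch) in source.char_indices() {
        let utf8_len = ch.len_utf8() as u32;
        let utf16_len = ch.len_utf16() as u32;
        // Only non-ASCII chars differ in length between the two encodings.
        if utf8_len != utf16_len {
            diff += utf8_len - utf16_len;
            table.push((start as u32 + utf8_len, diff));
        }
    }
    table
}

/// Converts a UTF-8 offset on a char boundary to a UTF-16 offset by
/// subtracting the difference accumulated before that offset.
pub fn utf8_to_utf16(table: &[(u32, u32)], utf8_offset: u32) -> u32 {
    let idx = table.partition_point(|&(end, _)| end <= utf8_offset);
    let diff = if idx == 0 { 0 } else { table[idx - 1].1 };
    utf8_offset - diff
}

fn main() {
    let source = "aé😀b";
    let config = JsInteropConfig;
    println!("tokens: {}, utf16 offsets: {}", config.tokens(), config.utf16_offsets());

    let table = build_offset_table(source, &config);
    for offset in [0u32, 1, 3, 7, 8] {
        println!("UTF-8 offset {offset} -> UTF-16 offset {}", utf8_to_utf16(&table, offset));
    }

    // With the feature off, no table is built at all.
    println!("compiler table entries: {}", build_offset_table(source, &CompilerConfig).len());
}
```

The binary search keeps lookups cheap, and sources that are pure ASCII produce an empty table, so the conversion degenerates to the identity.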