refactor(oxc_parser): ParserConfig for Lexer and Parser#16785
refactor(oxc_parser): ParserConfig for Lexer and Parser#16785lilnasy wants to merge 1 commit intooxc-project:mainfrom
ParserConfig for Lexer and Parser#16785Conversation
Lexer and ParserParserConfig for Lexer and Parser
CodSpeed Performance ReportMerging #16785 will not alter performanceComparing Summary
Footnotes
|
There was a problem hiding this comment.
Pull request overview
This PR introduces a generic ParserConfig trait to the Lexer and Parser types as preparation for conditional compilation features (token storage and UTF-8 to UTF-16 translation for oxlint). The refactoring uses zero-cost abstractions via PhantomData and is intended to have no behavioral or performance changes at this stage.
Key Changes
- Added
ParserConfigtrait andStandardParserConfigimplementation for compile-time configuration - Made
Parser,ParserImpl, andLexergeneric overC: ParserConfigwithStandardParserConfigas the default - Updated all impl blocks across parser and lexer modules to include the generic parameter
- Converted byte handlers to use a const fn that's parameterized by config type
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
crates/oxc_parser/src/lib.rs |
Added ParserConfig trait, made Parser and ParserImpl generic, added PhantomData markers |
crates/oxc_parser/src/ts/types.rs |
Updated ParserImpl impl block to include generic C: ParserConfig parameter |
crates/oxc_parser/src/ts/statement.rs |
Updated ParserImpl impl block to include generic C: ParserConfig parameter |
crates/oxc_parser/src/modifiers.rs |
Updated ParserImpl impl blocks and nested function to include generic parameter |
crates/oxc_parser/src/lexer/mod.rs |
Made Lexer and LexerCheckpoint generic, added PhantomData markers |
crates/oxc_parser/src/lexer/byte_handlers.rs |
Converted byte handlers table to const fn, updated all handler functions to be generic |
crates/oxc_parser/src/lexer/whitespace.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/unicode.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/typescript.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/template.rs |
Updated Lexer impl block and test to use generic parameter |
crates/oxc_parser/src/lexer/string.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/regex.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/punctuation.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/numeric.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/jsx.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/identifier.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/lexer/comment.rs |
Updated Lexer impl block to include generic parameter |
crates/oxc_parser/src/jsx/mod.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/statement.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/object.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/module.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/grammar.rs |
Updated CoverGrammar trait and all implementations to include generic parameter |
crates/oxc_parser/src/js/function.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/expression.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/declaration.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/class.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/binding.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/js/arrow.rs |
Updated ParserImpl impl block to include generic parameter |
crates/oxc_parser/src/error_handler.rs |
Updated ParserImpl impl blocks to include generic parameter |
crates/oxc_parser/src/cursor.rs |
Made ParserCheckpoint generic, updated method signatures and closures |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Create a new [`Parser`] | ||
| /// | ||
| /// # Parameters | ||
| /// - `allocator`: [Memory arena](oxc_allocator::Allocator) for allocating AST nodes | ||
| /// - `source_text`: Source code to parse | ||
| /// - `source_type`: Source type (e.g. JavaScript, TypeScript, JSX, ESM Module, Script) | ||
| pub fn new_with_config( | ||
| allocator: &'a Allocator, | ||
| source_text: &'a str, | ||
| source_type: SourceType, | ||
| ) -> Self { |
There was a problem hiding this comment.
The documentation for this method is incomplete - it's described as "Create a new Parser" but doesn't explain how it differs from the new method, or when this method should be used instead of new. The documentation should clarify that this is for creating a Parser with a custom ParserConfig implementation.
| /// Create a new `ParserImpl`. | ||
| /// | ||
| /// Requiring a `UniquePromise` to be provided guarantees only 1 `ParserImpl` can exist | ||
| /// on a single thread at one time. | ||
| #[inline] | ||
| pub fn new_with_config( | ||
| allocator: &'a Allocator, | ||
| source_text: &'a str, | ||
| source_type: SourceType, | ||
| options: ParseOptions, | ||
| unique: UniquePromise, |
There was a problem hiding this comment.
The documentation for this method is incomplete - it duplicates the documentation from the new method without explaining the difference. The documentation should clarify that this is for creating a ParserImpl with a custom ParserConfig implementation, and explain when to use this instead of new.
| use crate::diagnostics; | ||
|
|
||
| use super::{Kind, Lexer, RegExpFlags, Token}; | ||
| use super::{Kind, Lexer, ParserConfig, RegExpFlags, Token}; |
There was a problem hiding this comment.
The import path for ParserConfig is incorrect. ParserConfig is defined in the crate root (crate), not in the parent module (super). This should be use crate::ParserConfig; in a separate use statement, similar to the pattern used in other lexer modules like numeric.rs, string.rs, etc.
| /// Compile-time configuration for the parser and lexer. | ||
| /// | ||
| /// Will be used to configure token collection and UTF-8 to UTF-16 translation only for oxlint. |
There was a problem hiding this comment.
The ParserConfig trait documentation should provide more details about its purpose and usage. Consider adding:
- An explanation that this trait is a marker trait for zero-cost compile-time configuration
- Examples of when and how to implement custom configs
- A note that most users should use
StandardParserConfigviaParser::new() - A reference to the related
new_with_config()methods for custom configs
| /// Compile-time configuration for the parser and lexer. | |
| /// | |
| /// Will be used to configure token collection and UTF-8 to UTF-16 translation only for oxlint. | |
| /// Marker trait for zero-cost compile-time configuration of the parser and lexer. | |
| /// | |
| /// # Purpose | |
| /// | |
| /// `ParserConfig` is a marker trait used to enable compile-time configuration of the parser and lexer | |
| /// without incurring runtime overhead. It is primarily used to control features such as token collection | |
| /// and UTF-8 to UTF-16 translation, which are only needed in specialized scenarios (e.g., for linters like Oxlint). | |
| /// | |
| /// # Usage | |
| /// | |
| /// Most users should use the default [`StandardParserConfig`] by constructing a parser with [`Parser::new()`]. | |
| /// Custom configurations are only necessary if you need to enable or disable specific compile-time features. | |
| /// | |
| /// To use a custom configuration, implement the `ParserConfig` trait for your type and use the | |
| /// [`Parser::new_with_config()`] or [`Parser::with_options_and_config()`] methods. | |
| /// | |
| /// # Example | |
| /// | |
| /// ``` | |
| /// use oxc_parser::{Parser, ParserConfig, StandardParserConfig}; | |
| /// | |
| /// // Define a custom config if you need to enable special features. | |
| /// pub struct MyCustomConfig; | |
| /// | |
| /// impl ParserConfig for MyCustomConfig {} | |
| /// | |
| /// // Use the custom config with the parser. | |
| /// let parser = Parser::new_with_config(source_text, MyCustomConfig); | |
| /// ``` | |
| /// | |
| /// # Note | |
| /// | |
| /// For most use cases, prefer [`StandardParserConfig`] and [`Parser::new()`]. | |
| /// | |
| /// # See also | |
| /// | |
| /// - [`Parser::new_with_config()`] | |
| /// - [`Parser::with_options_and_config()`] |
| /// Will be used to configure token collection and UTF-8 to UTF-16 translation only for oxlint. | ||
| pub trait ParserConfig {} | ||
|
|
||
| /// Parser configuration intended for general use. |
There was a problem hiding this comment.
The StandardParserConfig type should have documentation explaining that it's the default configuration for the parser and lexer, used for general-purpose parsing. Add a doc comment describing when to use this versus implementing a custom ParserConfig.
| /// Parser configuration intended for general use. | |
| /// The default parser and lexer configuration for general-purpose parsing. | |
| /// | |
| /// Use `StandardParserConfig` when you want the standard behavior of the parser and lexer. | |
| /// This is suitable for most use cases. If you need to customize token collection, | |
| /// UTF-8 to UTF-16 translation, or other parser behaviors (e.g., for integration with | |
| /// tools like linters or IDEs), implement your own `ParserConfig`. |
| options: ParseOptions, | ||
|
|
||
| pub(crate) lexer: Lexer<'a>, | ||
| // TODO: investigate whether this needs to be `pub(crate)` |
There was a problem hiding this comment.
This TODO comment should be expanded to explain why this investigation is needed. What are the trade-offs of making the lexer field pub(crate) versus private? Is there a plan to address this, or should it be tracked in an issue?
| // TODO: investigate whether this needs to be `pub(crate)` | |
| /// TODO: Investigate whether this field needs to be `pub(crate)` or can be made private. | |
| /// | |
| /// Rationale: Currently, the `lexer` field is `pub(crate)`, which allows access from other modules within this crate. | |
| /// This may be necessary for benchmarks or internal tooling, but increases the surface area for unintended usage. | |
| /// Making it private would better encapsulate the parser's internals, but could break existing code that relies on direct access. | |
| /// | |
| /// Action: Review all usages of `lexer` outside this module to determine if `pub(crate)` is required. | |
| /// If not, consider making it private for better encapsulation. | |
| /// If it is required, document the use cases and consider whether a more controlled API is appropriate. | |
| /// | |
| /// Tracking: If this investigation is not immediately actionable, consider filing a tracking issue. |
| } | ||
|
|
||
| impl<'a> Parser<'a> { | ||
| impl<'a, C: ParserConfig> Parser<'a, C> { |
There was a problem hiding this comment.
Making this impl block generic over C: ParserConfig introduces a bug. The methods inside this block (parse and parse_expression) call ParserImpl::new, which is only implemented for StandardParserConfig. When C is not StandardParserConfig, these calls will fail to compile. The calls should use ParserImpl::new_with_config instead.
There was a problem hiding this comment.
Yes, this is a good callout.
|
I have a thought for how to fix the perf regression. It's not pretty, but hopefully may work! I'll try it out tomorrow. |
5f97c54 to
545d09b
Compare
|
Why does this need to be a trait? I thought using some options should be enough, if the goal is to store tokens. I'm always against using traits or macros or whatever that slows down compilation, Oxc codebase is massive, I don't want the parser to get compiled twice. This also bloats binary size if one day we decide to compile the linter and formatter together, which is very likely. Have we also confirmed that theses tokens align with eslint? |
### What this PR does Introduce `ParserConfig` trait (another try at #16785). The aim is to remove the large performance regression in parser that #19497 created, by making whether the parser generates tokens or not a compile-time option. `ParserConfig::tokens` method replaces `ParseOptions::collect_tokens` property. The former can be const-folded at compile time, where the latter couldn't. ### 3 options This PR also introduces 3 different config types that users can pass to the parser: * `NoTokensParserConfig` (default) * `TokensParserConfig` * `RuntimeParserConfig` The first 2 set whether tokens are collected or not at compile time. The last sets it at runtime. All 3 implement `ParserConfig`. `NoTokensParserConfig` is the default, and is what's used in compiler pipeline. It switches tokens off in the parser, and makes all the tokens-related code dead code, which the compiler eliminates. This makes the ability of the parser to generate tokens zero cost when it's not used (in the compiler pipeline). `TokensParserConfig` is the one to use where you *always* want tokens. This is probably the config that linter will use. `RuntimeParserConfig` is the one to use when an application decides whether to generate tokens or not at runtime. This config avoids compiling the parser twice, at the cost of runtime checks. This is what NAPI parser package will use. ### Future extension #### Supporting additional features In future we intend to build the UTF-8 to UTF-16 offsets conversion table in the parser. This will be more performant than searching through the source text for unicode characters in a 2nd pass later on. But this feature is only required for uses of the parser where we're interacting with JS side (NAPI parser package, linter with JS plugins). `ParserConfig` can be extended to toggle this feature on/off at compile time or runtime, in the same way as you toggle on/off tokens. #### Options and configs This PR introduces `ParserConfig` but leaves `ParseOptions` as it is. So we now have 2 sets of options, passed to `Parser` with `with_options(...)` and `with_config(...)`. This is confusing. We could merge the 2 by making `ParseOptions` implement `ParserConfig`, so then you'd define all options with one `with_options` call. This would have the side effect of making all other parser options (e.g. `preserve_parens`) able to be set at either runtime or compile time, depending on the use case. For users consuming `oxc_parser` as a library, with specific needs, they could also configure `Parser` to their needs e.g. create a parser which only handles plain JS code with all the code paths for JSX and TS shaken out as dead code. This would likely improve parsing speed significantly for these use cases. ### Implementation details Why a trait instead of a cargo feature? IMO a trait is preferable for the following reasons: 1. We have 3 different use cases we need to support (the 3 provided configs). 3 different Cargo features would be unwieldy. 2. This situation would become far worse once we introduce more features e.g. UTF-8 -> UTF-16 conversion. 3. No problems around feature unification. We found Cargo features caused headaches when we used them in linter for toggling on/off JS plugins support. 4. No clippy errors which only appear when the feature is/isn't disabled, requiring complex `#[cfg_attr(feature = "whatever", expect(clippy::unused_async))]` etc. The introduction of a trait does not seem to significantly affect compile time: ``` # Before this PR cargo build -p oxc_parser 15.33s user 3.27s system 254% cpu 7.316 total cargo build -p oxc_parser --release 17.36s user 2.26s system 231% cpu 8.477 total cargo build -p oxc_parser --example parser 18.43s user 3.75s system 271% cpu 8.156 total cargo build -p oxc_parser --example parser --release 32.52s user 2.59s system 180% cpu 19.454 total # After this PR cargo build -p oxc_parser 15.00s user 3.24s system 272% cpu 6.692 total cargo build -p oxc_parser --release 16.71s user 2.12s system 287% cpu 6.539 total cargo build -p oxc_parser --example parser 18.50s user 3.91s system 285% cpu 7.845 total cargo build -p oxc_parser --example parser --release 33.48s user 2.63s system 169% cpu 21.263 total ``` Measured on Mac Mini M4 Pro, `cargo clean` run before each. The difference appears to be mostly within the noise threshold.
Part of #16207. Preparation for the generic
ParserConfigtrait forLexer,Parser, and `ParserImpl. This is a generic interface that concretely does nothing for most compilation targets and will store tokens in an arena when compiled for oxlint.Of particular note for reviewing is the
BYTE_HANDLERSarray. It had to be switched from astaticarray to a returned array from aconstfunction to safely handle the generic. This seems to be the cause of a huge parsing performance downtick.In future, this abstraction will also be used to conditionally compile UTF-8 to UTF-16 translation only in oxlint.
The oxlint-exclusive storage of tokens is not implemented in this PR. At this point, the goal is to ensure there is no change in behavior or performance.
ParserConfiglilnasy/oxc#2.