Skip to content

refactor(oxc_parser): ParserConfig for Lexer and Parser#16785

Closed
lilnasy wants to merge 1 commit intooxc-project:mainfrom
lilnasy:refactor/parseconfig
Closed

refactor(oxc_parser): ParserConfig for Lexer and Parser#16785
lilnasy wants to merge 1 commit intooxc-project:mainfrom
lilnasy:refactor/parseconfig

Conversation

@lilnasy
Copy link
Contributor

@lilnasy lilnasy commented Dec 12, 2025

Part of #16207. Preparation for the generic ParserConfig trait for Lexer, Parser, and `ParserImpl. This is a generic interface that concretely does nothing for most compilation targets and will store tokens in an arena when compiled for oxlint.

Of particular note for reviewing is the BYTE_HANDLERS array. It had to be switched from a static array to a returned array from a const function to safely handle the generic. This seems to be the cause of a huge parsing performance downtick.

In future, this abstraction will also be used to conditionally compile UTF-8 to UTF-16 translation only in oxlint.

The oxlint-exclusive storage of tokens is not implemented in this PR. At this point, the goal is to ensure there is no change in behavior or performance.

Copilot AI review requested due to automatic review settings December 12, 2025 22:37
@github-actions github-actions bot added A-parser Area - Parser C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior labels Dec 12, 2025
@lilnasy lilnasy changed the title refactor(oxc_parser): generic Lexer and Parser refactor(oxc_parser): ParserConfig for Lexer and Parser Dec 12, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Dec 12, 2025

CodSpeed Performance Report

Merging #16785 will not alter performance

Comparing lilnasy:refactor/parseconfig (545d09b) with main (c897794)1

Summary

✅ 42 untouched
⏩ 3 skipped2

Footnotes

  1. No successful run was found on main (d77e22d) during the generation of this report, so c897794 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a generic ParserConfig trait to the Lexer and Parser types as preparation for conditional compilation features (token storage and UTF-8 to UTF-16 translation for oxlint). The refactoring uses zero-cost abstractions via PhantomData and is intended to have no behavioral or performance changes at this stage.

Key Changes

  • Added ParserConfig trait and StandardParserConfig implementation for compile-time configuration
  • Made Parser, ParserImpl, and Lexer generic over C: ParserConfig with StandardParserConfig as the default
  • Updated all impl blocks across parser and lexer modules to include the generic parameter
  • Converted byte handlers to use a const fn that's parameterized by config type

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
crates/oxc_parser/src/lib.rs Added ParserConfig trait, made Parser and ParserImpl generic, added PhantomData markers
crates/oxc_parser/src/ts/types.rs Updated ParserImpl impl block to include generic C: ParserConfig parameter
crates/oxc_parser/src/ts/statement.rs Updated ParserImpl impl block to include generic C: ParserConfig parameter
crates/oxc_parser/src/modifiers.rs Updated ParserImpl impl blocks and nested function to include generic parameter
crates/oxc_parser/src/lexer/mod.rs Made Lexer and LexerCheckpoint generic, added PhantomData markers
crates/oxc_parser/src/lexer/byte_handlers.rs Converted byte handlers table to const fn, updated all handler functions to be generic
crates/oxc_parser/src/lexer/whitespace.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/unicode.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/typescript.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/template.rs Updated Lexer impl block and test to use generic parameter
crates/oxc_parser/src/lexer/string.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/regex.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/punctuation.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/numeric.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/jsx.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/identifier.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/lexer/comment.rs Updated Lexer impl block to include generic parameter
crates/oxc_parser/src/jsx/mod.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/statement.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/object.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/module.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/grammar.rs Updated CoverGrammar trait and all implementations to include generic parameter
crates/oxc_parser/src/js/function.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/expression.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/declaration.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/class.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/binding.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/js/arrow.rs Updated ParserImpl impl block to include generic parameter
crates/oxc_parser/src/error_handler.rs Updated ParserImpl impl blocks to include generic parameter
crates/oxc_parser/src/cursor.rs Made ParserCheckpoint generic, updated method signatures and closures

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +280 to +290
/// Create a new [`Parser`]
///
/// # Parameters
/// - `allocator`: [Memory arena](oxc_allocator::Allocator) for allocating AST nodes
/// - `source_text`: Source code to parse
/// - `source_type`: Source type (e.g. JavaScript, TypeScript, JSX, ESM Module, Script)
pub fn new_with_config(
allocator: &'a Allocator,
source_text: &'a str,
source_type: SourceType,
) -> Self {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for this method is incomplete - it's described as "Create a new Parser" but doesn't explain how it differs from the new method, or when this method should be used instead of new. The documentation should clarify that this is for creating a Parser with a custom ParserConfig implementation.

Copilot uses AI. Check for mistakes.
Comment on lines +450 to +460
/// Create a new `ParserImpl`.
///
/// Requiring a `UniquePromise` to be provided guarantees only 1 `ParserImpl` can exist
/// on a single thread at one time.
#[inline]
pub fn new_with_config(
allocator: &'a Allocator,
source_text: &'a str,
source_type: SourceType,
options: ParseOptions,
unique: UniquePromise,
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for this method is incomplete - it duplicates the documentation from the new method without explaining the difference. The documentation should clarify that this is for creating a ParserImpl with a custom ParserConfig implementation, and explain when to use this instead of new.

Copilot uses AI. Check for mistakes.
use crate::diagnostics;

use super::{Kind, Lexer, RegExpFlags, Token};
use super::{Kind, Lexer, ParserConfig, RegExpFlags, Token};
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import path for ParserConfig is incorrect. ParserConfig is defined in the crate root (crate), not in the parent module (super). This should be use crate::ParserConfig; in a separate use statement, similar to the pattern used in other lexer modules like numeric.rs, string.rs, etc.

Copilot uses AI. Check for mistakes.
Comment on lines +186 to +188
/// Compile-time configuration for the parser and lexer.
///
/// Will be used to configure token collection and UTF-8 to UTF-16 translation only for oxlint.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ParserConfig trait documentation should provide more details about its purpose and usage. Consider adding:

  1. An explanation that this trait is a marker trait for zero-cost compile-time configuration
  2. Examples of when and how to implement custom configs
  3. A note that most users should use StandardParserConfig via Parser::new()
  4. A reference to the related new_with_config() methods for custom configs
Suggested change
/// Compile-time configuration for the parser and lexer.
///
/// Will be used to configure token collection and UTF-8 to UTF-16 translation only for oxlint.
/// Marker trait for zero-cost compile-time configuration of the parser and lexer.
///
/// # Purpose
///
/// `ParserConfig` is a marker trait used to enable compile-time configuration of the parser and lexer
/// without incurring runtime overhead. It is primarily used to control features such as token collection
/// and UTF-8 to UTF-16 translation, which are only needed in specialized scenarios (e.g., for linters like Oxlint).
///
/// # Usage
///
/// Most users should use the default [`StandardParserConfig`] by constructing a parser with [`Parser::new()`].
/// Custom configurations are only necessary if you need to enable or disable specific compile-time features.
///
/// To use a custom configuration, implement the `ParserConfig` trait for your type and use the
/// [`Parser::new_with_config()`] or [`Parser::with_options_and_config()`] methods.
///
/// # Example
///
/// ```
/// use oxc_parser::{Parser, ParserConfig, StandardParserConfig};
///
/// // Define a custom config if you need to enable special features.
/// pub struct MyCustomConfig;
///
/// impl ParserConfig for MyCustomConfig {}
///
/// // Use the custom config with the parser.
/// let parser = Parser::new_with_config(source_text, MyCustomConfig);
/// ```
///
/// # Note
///
/// For most use cases, prefer [`StandardParserConfig`] and [`Parser::new()`].
///
/// # See also
///
/// - [`Parser::new_with_config()`]
/// - [`Parser::with_options_and_config()`]

Copilot uses AI. Check for mistakes.
/// Will be used to configure token collection and UTF-8 to UTF-16 translation only for oxlint.
pub trait ParserConfig {}

/// Parser configuration intended for general use.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StandardParserConfig type should have documentation explaining that it's the default configuration for the parser and lexer, used for general-purpose parsing. Add a doc comment describing when to use this versus implementing a custom ParserConfig.

Suggested change
/// Parser configuration intended for general use.
/// The default parser and lexer configuration for general-purpose parsing.
///
/// Use `StandardParserConfig` when you want the standard behavior of the parser and lexer.
/// This is suitable for most use cases. If you need to customize token collection,
/// UTF-8 to UTF-16 translation, or other parser behaviors (e.g., for integration with
/// tools like linters or IDEs), implement your own `ParserConfig`.

Copilot uses AI. Check for mistakes.
options: ParseOptions,

pub(crate) lexer: Lexer<'a>,
// TODO: investigate whether this needs to be `pub(crate)`
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO comment should be expanded to explain why this investigation is needed. What are the trade-offs of making the lexer field pub(crate) versus private? Is there a plan to address this, or should it be tracked in an issue?

Suggested change
// TODO: investigate whether this needs to be `pub(crate)`
/// TODO: Investigate whether this field needs to be `pub(crate)` or can be made private.
///
/// Rationale: Currently, the `lexer` field is `pub(crate)`, which allows access from other modules within this crate.
/// This may be necessary for benchmarks or internal tooling, but increases the surface area for unintended usage.
/// Making it private would better encapsulate the parser's internals, but could break existing code that relies on direct access.
///
/// Action: Review all usages of `lexer` outside this module to determine if `pub(crate)` is required.
/// If not, consider making it private for better encapsulation.
/// If it is required, document the use cases and consider whether a more controlled API is appropriate.
///
/// Tracking: If this investigation is not immediately actionable, consider filing a tracking issue.

Copilot uses AI. Check for mistakes.
}

impl<'a> Parser<'a> {
impl<'a, C: ParserConfig> Parser<'a, C> {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this impl block generic over C: ParserConfig introduces a bug. The methods inside this block (parse and parse_expression) call ParserImpl::new, which is only implemented for StandardParserConfig. When C is not StandardParserConfig, these calls will fail to compile. The calls should use ParserImpl::new_with_config instead.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good callout.

@overlookmotel
Copy link
Member

I have a thought for how to fix the perf regression. It's not pretty, but hopefully may work! I'll try it out tomorrow.

@overlookmotel overlookmotel self-assigned this Dec 15, 2025
@lilnasy lilnasy force-pushed the refactor/parseconfig branch from 5f97c54 to 545d09b Compare December 15, 2025 18:49
@Boshen
Copy link
Member

Boshen commented Dec 16, 2025

Why does this need to be a trait? I thought using some options should be enough, if the goal is to store tokens.

I'm always against using traits or macros or whatever that slows down compilation, Oxc codebase is massive, I don't want the parser to get compiled twice. This also bloats binary size if one day we decide to compile the linter and formatter together, which is very likely.

Have we also confirmed that theses tokens align with eslint?

@Boshen Boshen marked this pull request as draft December 16, 2025 02:03
@camc314 camc314 closed this Feb 19, 2026
graphite-app bot pushed a commit that referenced this pull request Feb 24, 2026
### What this PR does

Introduce `ParserConfig` trait (another try at #16785).

The aim is to remove the large performance regression in parser that #19497 created, by making whether the parser generates tokens or not a compile-time option.

`ParserConfig::tokens` method replaces `ParseOptions::collect_tokens` property. The former can be const-folded at compile time, where the latter couldn't.

### 3 options

This PR also introduces 3 different config types that users can pass to the parser:

* `NoTokensParserConfig` (default)
* `TokensParserConfig`
* `RuntimeParserConfig`

The first 2 set whether tokens are collected or not at compile time. The last sets it at runtime.

All 3 implement `ParserConfig`.

`NoTokensParserConfig` is the default, and is what's used in compiler pipeline. It switches tokens off in the parser, and makes all the tokens-related code dead code, which the compiler eliminates. This makes the ability of the parser to generate tokens zero cost when it's not used (in the compiler pipeline).

`TokensParserConfig` is the one to use where you *always* want tokens. This is probably the config that linter will use.

`RuntimeParserConfig` is the one to use when an application decides whether to generate tokens or not at runtime. This config avoids compiling the parser twice, at the cost of runtime checks. This is what NAPI parser package will use.

### Future extension

#### Supporting additional features

In future we intend to build the UTF-8 to UTF-16 offsets conversion table in the parser. This will be more performant than searching through the source text for unicode characters in a 2nd pass later on. But this feature is only required for uses of the parser where we're interacting with JS side (NAPI parser package, linter with JS plugins).

`ParserConfig` can be extended to toggle this feature on/off at compile time or runtime, in the same way as you toggle on/off tokens.

#### Options and configs

This PR introduces `ParserConfig` but leaves `ParseOptions` as it is. So we now have 2 sets of options, passed to `Parser` with `with_options(...)` and `with_config(...)`. This is confusing.

We could merge the 2 by making `ParseOptions` implement `ParserConfig`, so then you'd define all options with one `with_options` call.

This would have the side effect of making all other parser options (e.g. `preserve_parens`) able to be set at either runtime or compile time, depending on the use case.

For users consuming `oxc_parser` as a library, with specific needs, they could also configure `Parser` to their needs e.g. create a parser which only handles plain JS code with all the code paths for JSX and TS shaken out as dead code. This would likely improve parsing speed significantly for these use cases.

### Implementation details

Why a trait instead of a cargo feature?

IMO a trait is preferable for the following reasons:

1. We have 3 different use cases we need to support (the 3 provided configs). 3 different Cargo features would be unwieldy.
2. This situation would become far worse once we introduce more features e.g. UTF-8 -> UTF-16 conversion.
3. No problems around feature unification. We found Cargo features caused headaches when we used them in linter for toggling on/off JS plugins support.
4. No clippy errors which only appear when the feature is/isn't disabled, requiring complex `#[cfg_attr(feature = "whatever", expect(clippy::unused_async))]` etc.

The introduction of a trait does not seem to significantly affect compile time:

```
# Before this PR
cargo build -p oxc_parser                             15.33s user 3.27s system 254% cpu 7.316 total
cargo build -p oxc_parser --release                   17.36s user 2.26s system 231% cpu 8.477 total
cargo build -p oxc_parser --example parser            18.43s user 3.75s system 271% cpu 8.156 total
cargo build -p oxc_parser --example parser --release  32.52s user 2.59s system 180% cpu 19.454 total

# After this PR
cargo build -p oxc_parser                             15.00s user 3.24s system 272% cpu 6.692 total
cargo build -p oxc_parser --release                   16.71s user 2.12s system 287% cpu 6.539 total
cargo build -p oxc_parser --example parser            18.50s user 3.91s system 285% cpu 7.845 total
cargo build -p oxc_parser --example parser --release  33.48s user 2.63s system 169% cpu 21.263 total
```

Measured on Mac Mini M4 Pro, `cargo clean` run before each. The difference appears to be mostly within the noise threshold.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-parser Area - Parser C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants