feat(parser): add parser by Boshen · Pull Request #5 · oxc-project/oxc

Boshen · 2023-02-11T13:03:02Z

No description provided.

MichaReiser · 2023-02-11T13:43:11Z

crates/oxc_parser/src/js/class.rs

@@ -0,0 +1,462 @@
+use oxc_allocator::{Box, Vec};


Impressive how you managed to write the parsing in less than 500 lines. The class parsing is like 2000 something lines in Rome!

- Convert section headers to use backticks (e.g., "Expression Statement" → "`ExpressionStatement`") - Fix type references to use proper `[`Type`]` format instead of `[description](Type)` - Update cross-references in js.rs, ts.rs to be consistent Part 1 of AST documentation improvements addressing issue #5 (inconsistent formatting) Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

- Fix remaining section headers in ts.rs to use consistent capitalization - Address TypeScript-specific documentation formatting issues - Final cleanup for inconsistent formatting (Issue #5) Ready to proceed with second PR addressing pointless comments. Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

Replace multi-function calls and multiple enum variant checks with simple range checks, reducing assembly instructions in hot paths. ## Changes **`is_any_keyword()`**: - Before: Called 4 separate functions (`is_reserved_keyword()`, `is_contextual_keyword()`, `is_strict_mode_contextual_keyword()`, `is_future_reserved_keyword()`) checking 70+ enum variants with early returns - After: Single range check `Await..=Yield` since all keywords are contiguous in the enum **`is_number()`**: - Before: Matched 11 separate enum variants - After: Single range check `Decimal..=HexBigInt` since all numeric literals are contiguous ## Assembly Impact Multi-function approach generated **5 instructions** with complex bitmask setup: ```asm mov x8, #992 movk x8, #992, lsl #16 movk x8, #240, lsl #32 lsr x8, x8, x0 and w0, w8, #0x1 ``` Range check generates **4 instructions** with simple arithmetic: ```asm and w8, w0, #0xff sub w8, w8, #5 cmp w8, #39 cset w0, lo ``` ## Performance - `is_any_keyword()` is called from `advance()` on **every single token** - 20% fewer instructions (5 → 4) - Simpler logic enables better branch prediction - Eliminates complex constant loading Added tests to verify enum layout assumptions remain valid. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Reorder Kind enum to make all keywords truly contiguous, enabling a pure range check without additional OR clauses. ## Changes **Enum Reordering**: Moved `True`, `False`, `Null` from the literals section (after punctuation) to immediately after `Yield`, making all keywords contiguous from `Await..=Null`. **Before**: ```rust pub fn is_any_keyword(self) -> bool { matches!(self as u8, x if x >= Await as u8 && x <= Yield as u8) || matches!(self, True | False | Null) // Extra check needed! } ``` **After**: ```rust pub fn is_any_keyword(self) -> bool { matches!(self as u8, x if x >= Await as u8 && x <= Null as u8) // Pure range check! } ``` ## Assembly Impact ### Before (range + OR) ```asm ; Range check for Await..=Yield and w8, w0, #0xff sub w8, w8, #5 cmp w8, #86 cset w9, lo ; Additional checks for True/False/Null cmp w0, #168 ccmp w0, #171, #0, ne cset w0, lo orr w0, w9, w0 ; Combine results ``` **7 instructions** ### After (pure range) ```asm and w8, w0, #0xff sub w8, w8, #5 cmp w8, #89 cset w0, lo ``` **4 instructions** (43% reduction!) ## Why This Works `True`, `False`, and `Null` are **both** keywords AND literals per ECMAScript spec: - Keywords: Reserved words that cannot be used as identifiers - Literals: Values with specific meanings Grouping them with keywords is semantically correct and enables better optimization. The `is_literal()` function explicitly checks for these tokens, so their position in the enum doesn't affect correctness. ## Performance - Called from `advance()` on **every token** - **3 fewer instructions** on the hottest path - **Simpler control flow** = better branch prediction - **No OR operation** = faster execution Added comprehensive tests including verification that True/False/Null work correctly as both keywords and literals. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…14410) ## Summary Replace multi-function calls and multiple enum variant checks with simple range checks, reducing assembly instructions in hot paths. ## Changes ### `is_any_keyword()` **Before**: Called 4 separate functions checking 70+ enum variants: - `is_reserved_keyword()` - 38 variants - `is_contextual_keyword()` - 39 variants - `is_strict_mode_contextual_keyword()` - 8 variants - `is_future_reserved_keyword()` - 7 variants **After**: Single range check `Await..=Yield` since all keywords are contiguous in the enum ### `is_number()` **Before**: Matched 11 separate enum variants **After**: Single range check `Decimal..=HexBigInt` since numeric literals are contiguous ## Assembly Analysis ### Before (with scattered checks) ```asm mov x8, #992 ; Load bitmask constant movk x8, #992, lsl #16 ; More bitmask setup movk x8, #240, lsl #32 ; Even more bitmask setup lsr x8, x8, x0 ; Shift by kind value and w0, w8, #0x1 ; Extract result bit ``` **5 instructions** with complex constant loading ### After (with range check) ```asm and w8, w0, #0xff ; Extract byte sub w8, w8, #5 ; Subtract range start cmp w8, #39 ; Compare to range size cset w0, lo ; Set result ``` **4 instructions** with simple arithmetic ## Performance Impact - **20% fewer instructions** (5 → 4) - **Simpler logic** = better CPU pipeline utilization - **No complex constants** = smaller code size - **Better branch prediction** with single comparison This is particularly important because: - `is_any_keyword()` is called from `advance()` on **every single token** - This is one of the hottest code paths in the entire parser ## Testing Added unit tests to verify that: - All keywords remain contiguous in the enum (`Await..=Yield`) - All numeric literals remain contiguous (`Decimal..=HexBigInt`) These tests will catch any future enum reordering that would break the optimization. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

Boshen force-pushed the parser branch 4 times, most recently from 70893ff to 8294e9f Compare February 11, 2023 13:20

feat(parser): add parser

5c1622a

Boshen force-pushed the parser branch from 8294e9f to 5c1622a Compare February 11, 2023 13:22

Boshen merged commit 1fdc635 into main Feb 11, 2023

Boshen deleted the parser branch February 11, 2023 13:26

MichaReiser reviewed Feb 11, 2023

View reviewed changes

Copilot AI mentioned this pull request Aug 2, 2025

[WIP] Improve documentation of AST #12763

Closed

20 tasks

Copilot AI mentioned this pull request Aug 2, 2025

Fix AST documentation formatting and remove pointless comments #12767

Closed

Copilot AI added a commit that referenced this pull request Aug 2, 2025

Fix inconsistent formatting in AST documentation (Issue #5)

d279aff

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

overlookmotel mentioned this pull request Jan 16, 2026

Replace SemanticBuilder::enter_kind + leave_kind with visit methods #18098

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

feat(parser): add parser#5

feat(parser): add parser#5
Boshen merged 1 commit intomainfrom
parser

Boshen commented Feb 11, 2023

Uh oh!

MichaReiser Feb 11, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Comments

Conversation

Boshen commented Feb 11, 2023

Uh oh!

MichaReiser Feb 11, 2023

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants