perf(parser): remove bounds check for getting strings from lexer#16135

Closed
camchenry wants to merge 1 commit into main from 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer

Conversation

@camchenry
Member

@camchenry camchenry commented Nov 26, 2025

One profile of the parser showed that this function took up 1% of the time on binder.ts. This seemed like a case where we could remove the bounds checks, since our tokens are guaranteed to be contained within the source string. Since this is called very often, it's worth optimizing.

+1% on all parser benchmarks:

[screenshot: parser benchmark results]

Member Author

camchenry commented Nov 26, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@camchenry camchenry changed the base branch from 11-25-feat_parser_add_diagnostic_for_jsx_identifiers_with_hyphens to graphite-base/16135 November 26, 2025 02:59
@camchenry camchenry force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 53034be to 9d60b34 Compare November 26, 2025 03:00
@camchenry camchenry force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 9d60b34 to 5c73320 Compare November 26, 2025 03:05
@codspeed-hq

codspeed-hq bot commented Nov 26, 2025

CodSpeed Performance Report

Merging #16135 will not alter performance

Comparing 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer (fa1ebb5) with main (82d784f)

Summary

✅ 42 untouched
⏩ 3 skipped¹

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@camchenry camchenry changed the base branch from graphite-base/16135 to main November 26, 2025 03:18
@camchenry camchenry marked this pull request as ready for review November 26, 2025 03:18
Copilot AI review requested due to automatic review settings November 26, 2025 03:18
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes the get_string method in the lexer by replacing safe string slicing operations with unsafe get_unchecked to eliminate bounds checking. The optimization targets a hot path that profiling showed took 1% of parsing time on binder.ts, achieving a +1% performance improvement across all parser benchmarks.

Key Changes:

  • Refactored string extraction to compute adjusted start/end positions before slicing
  • Replaced safe slice operations with unsafe { get_unchecked(start..end) }
  • Added comprehensive debug assertions to validate safety invariants
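The change described in the overview can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual oxc lexer code: the function shape, the `source`/`start`/`end` names, and the exact assertions are assumptions based on the review summary above.

```rust
/// Hypothetical sketch of replacing a checked slice with `get_unchecked`.
/// The safe version `&source[start..end]` performs a bounds check (and
/// panics on an invalid range); the unsafe version elides it.
fn get_string(source: &str, start: usize, end: usize) -> &str {
    // Validate the safety invariants in debug builds only; these checks
    // compile away entirely in release builds.
    debug_assert!(start <= end);
    debug_assert!(end <= source.len());
    debug_assert!(source.is_char_boundary(start));
    debug_assert!(source.is_char_boundary(end));

    // SAFETY: token spans are assumed to always lie within `source` and
    // fall on UTF-8 character boundaries, so the bounds check is elided.
    unsafe { source.get_unchecked(start..end) }
}
```

Note that if those invariants are ever violated, this is undefined behavior rather than a panic, which is exactly the concern raised in the review below.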


@graphite-app graphite-app bot changed the base branch from main to graphite-base/16135 November 26, 2025 04:34
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 5c73320 to 099a5ce Compare November 26, 2025 04:39
@graphite-app graphite-app bot changed the base branch from graphite-base/16135 to main November 26, 2025 04:39
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 099a5ce to 811c1a4 Compare November 26, 2025 04:39
@graphite-app graphite-app bot changed the base branch from main to graphite-base/16135 November 26, 2025 09:23
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 811c1a4 to 1fca5af Compare November 26, 2025 09:28
@graphite-app graphite-app bot changed the base branch from graphite-base/16135 to main November 26, 2025 09:29
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 1fca5af to 45740bd Compare November 26, 2025 09:29
@overlookmotel
Member

overlookmotel commented Dec 1, 2025

I've been um-ing and ah-ing about this. Yes, token.start() and token.end() should make a valid slice, but can we prove it? And can we prove that the Token's Kind corresponds to its span, so we're offsetting start / end correctly?

When using unsafe code, IMO you should have more confidence than "it should be right".

I'd like to see how far we can get without resorting to unsafe. Here's a start: #16317.

I also wonder if we can remove the logic for altering start / end based on Kind from get_string, by moving that logic into the call sites. Each call site I think knows what the Kind is, so knows what to do. The logic doesn't need to be centralized.

We could also see if putting #[inline] on this function makes a difference. The gain in this PR might be down to reducing the size of the function, so it gets inlined, rather than the change in logic itself. Maybe...
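The call-site refactor suggested here might look roughly like this. `Kind`, the quote-trimming offsets, and the function names are assumptions about what the centralized logic does, not the real oxc code:

```rust
// Hypothetical sketch of moving Kind-based offset logic to call sites.
enum Kind { Ident, Str }

struct Token { kind: Kind, start: usize, end: usize }

// Before: one centralized function that branches on Kind at runtime.
fn get_string<'a>(source: &'a str, token: &Token) -> &'a str {
    match token.kind {
        // Trim the surrounding quotes from string literals.
        Kind::Str => &source[token.start + 1..token.end - 1],
        _ => &source[token.start..token.end],
    }
}

// After: each call site already knows the Kind statically, so it can
// call a branch-free accessor directly and the runtime match disappears.
fn raw_slice<'a>(source: &'a str, token: &Token) -> &'a str {
    &source[token.start..token.end]
}

fn str_literal_contents<'a>(source: &'a str, token: &Token) -> &'a str {
    &source[token.start + 1..token.end - 1]
}
```

Both accessors still use safe indexing, so an invalid span panics instead of causing undefined behavior; the saving is only the eliminated `Kind` branch.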

@overlookmotel
Member

Note: #16283 would also change this function.

@camchenry camchenry force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 45740bd to fa1ebb5 Compare December 1, 2025 03:11
@overlookmotel
Member

> We could also see if putting #[inline] on this function makes a difference. The gain in this PR might be down to reducing the size of the function, so it gets inlined, rather than the change in logic itself. Maybe...

Nope. Tried this in #16329 and it had no effect.

@camchenry
Member Author

> I've been um-ing and ah-ing about this. Yes, token.start() and token.end() should make a valid slice, but can we prove it? And can we prove that the Token's Kind corresponds to its span, so we're offsetting start / end correctly?
>
> When using unsafe code, IMO you should have more confidence than "it should be right".
>
> I'd like to see how far we can get without resorting to unsafe. Here's a start: #16317.
>
> I also wonder if we can remove the logic for altering start / end based on Kind from get_string, by moving that logic into the call sites. Each call site I think knows what the Kind is, so knows what to do. The logic doesn't need to be centralized.
>
> We could also see if putting #[inline] on this function makes a difference. The gain in this PR might be down to reducing the size of the function, so it gets inlined, rather than the change in logic itself. Maybe...

Yeah, I can't think of a proof off the top of my head that would definitively show it's always correct. Spans on tokens should always be correct, since they refer to offsets within the program, but there's nothing technically stopping us from creating an invalid span accidentally (or on purpose). I think the reasoning is sound, but it's only about a 1% performance improvement according to our benchmarks, so I don't feel it's absolutely critical to change.

I've also been thinking about this issue recently: oxc-project/backlog#46. I feel like some of the ideas in that thread have the potential to be much more impactful than just a meager 1% in exchange for adding an unsafe block. I'd be more inclined to merge this if we were seeing 5% improvements or something.

@overlookmotel
Member

overlookmotel commented Dec 1, 2025

1% is definitely worth having. It's really tempting.

But my personal view about unsafe code is that it's about using constraints in the type system to guarantee that the code is sound. You should be able to put a comment above the unsafe { } block explaining why it's impossible for your assumptions to be violated. In this case, as you say, it's trivial to break the rules; we haven't locked down the types to prevent you doing that.

So, on balance, I'm afraid that I don't think we should do this.

However, I do think there's probably space to get at least some of the perf improvement without unsafe. This idea, for example:

> I also wonder if we can remove the logic for altering start / end based on Kind from get_string, by moving that logic into the call sites. Each call site I think knows what the Kind is, so knows what to do. The logic doesn't need to be centralized.

There's probably other ways too that I've not thought of.


> I've also been thinking about this issue recently: oxc-project/backlog#46

Yes, I agree, I think there's a lot of potential there. In particular, an Ident type which stores the hash inline would make all the hashmap lookups in oxc_semantic a good bit faster. I've also been slowly laying the groundwork for oxc-project/backlog#46 (comment) in my work on the allocator and raw transfer. When we finally replace Bumpalo with our own allocator (I hope early next year), we'll store all strings in a single contiguous block at the start of the allocator, and then we can try this out.

If you're interested in working on our string types, please shout on that issue, and let's discuss.
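A minimal sketch of the "Ident type which stores the hash inline" idea, assuming a precomputed u64 hash field and std's DefaultHasher; the real design would likely differ (field names, hashing scheme, and interning strategy are all assumptions):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical identifier type that pays the cost of hashing its
// string bytes once, at construction time.
struct Ident<'a> {
    text: &'a str,
    hash: u64, // precomputed when the identifier is lexed
}

impl<'a> Ident<'a> {
    fn new(text: &'a str) -> Self {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        Self { text, hash: hasher.finish() }
    }
}

impl Hash for Ident<'_> {
    // Hashmap lookups just feed the stored hash into the hasher
    // instead of re-hashing the string bytes on every lookup.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

impl PartialEq for Ident<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash comparison first, then the string check.
        self.hash == other.hash && self.text == other.text
    }
}
impl Eq for Ident<'_> {}
```

With string interning (a single contiguous string block, as described above), equality could even become a pointer comparison.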

@Boshen
Member

Boshen commented Dec 2, 2025

> So, on balance, I'm afraid that I don't think we should do this.

@Boshen Boshen closed this Dec 2, 2025
@camchenry camchenry deleted the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch December 2, 2025 03:27

Labels

A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance
