perf(parser): remove bounds check for getting strings from lexer#16135

Closed
camchenry wants to merge 1 commit into main from 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer

Conversation

@camchenry
Member

@camchenry camchenry commented Nov 26, 2025

One profile of the parser showed that this function took up 1% of the time on binder.ts. This seemed like a case where we could remove the bounds checks, since our tokens are guaranteed to be contained within the source string. Since this is called very often, it's worth optimizing.

+1% on all parser benchmarks:

[screenshot: parser benchmark results]

Member Author

camchenry commented Nov 26, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@camchenry camchenry changed the base branch from 11-25-feat_parser_add_diagnostic_for_jsx_identifiers_with_hyphens to graphite-base/16135 November 26, 2025 02:59
@camchenry camchenry force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 53034be to 9d60b34 Compare November 26, 2025 03:00
@camchenry camchenry force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 9d60b34 to 5c73320 Compare November 26, 2025 03:05
@codspeed-hq

codspeed-hq bot commented Nov 26, 2025

CodSpeed Performance Report

Merging #16135 will not alter performance

Comparing 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer (fa1ebb5) with main (82d784f)

Summary

✅ 42 untouched
⏩ 3 skipped¹

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@camchenry camchenry changed the base branch from graphite-base/16135 to main November 26, 2025 03:18
@camchenry camchenry marked this pull request as ready for review November 26, 2025 03:18
Copilot AI review requested due to automatic review settings November 26, 2025 03:18
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes the get_string method in the lexer by replacing safe string slicing operations with unsafe get_unchecked to eliminate bounds checking. The optimization targets a hot path that profiling showed took 1% of parsing time on binder.ts, achieving a +1% performance improvement across all parser benchmarks.

Key Changes:

  • Refactored string extraction to compute adjusted start/end positions before slicing
  • Replaced safe slice operations with unsafe { get_unchecked(start..end) }
  • Added comprehensive debug assertions to validate safety invariants
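The change described in the overview can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual oxc lexer code: the function shape, the `source`/`start`/`end` names, and the exact assertions are assumptions based on the review summary above.

```rust
/// Hypothetical sketch of replacing a checked slice with `get_unchecked`.
/// The safe version `&source[start..end]` performs a bounds check (and
/// panics on an invalid range); the unsafe version elides it.
fn get_string(source: &str, start: usize, end: usize) -> &str {
    // Validate the safety invariants in debug builds only; these checks
    // compile away entirely in release builds.
    debug_assert!(start <= end);
    debug_assert!(end <= source.len());
    debug_assert!(source.is_char_boundary(start));
    debug_assert!(source.is_char_boundary(end));

    // SAFETY: token spans are assumed to always lie within `source` and
    // fall on UTF-8 character boundaries, so the bounds check is elided.
    unsafe { source.get_unchecked(start..end) }
}
```

Note that if those invariants are ever violated, this is undefined behavior rather than a panic, which is exactly the concern raised in the review below.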


@graphite-app graphite-app bot changed the base branch from main to graphite-base/16135 November 26, 2025 04:34
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 5c73320 to 099a5ce Compare November 26, 2025 04:39
@graphite-app graphite-app bot changed the base branch from graphite-base/16135 to main November 26, 2025 04:39
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 099a5ce to 811c1a4 Compare November 26, 2025 04:39
@graphite-app graphite-app bot changed the base branch from main to graphite-base/16135 November 26, 2025 09:23
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 811c1a4 to 1fca5af Compare November 26, 2025 09:28
@graphite-app graphite-app bot changed the base branch from graphite-base/16135 to main November 26, 2025 09:29
@graphite-app graphite-app bot force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 1fca5af to 45740bd Compare November 26, 2025 09:29
@overlookmotel
Member

overlookmotel commented Dec 1, 2025

I've been um-ing and ah-ing about this. Yes, token.start() and token.end() should make a valid slice, but can we prove it? And can we prove that the Token's Kind corresponds to its span, so we're offsetting start / end correctly?

When using unsafe code, IMO you should have more confidence than "it should be right".

I'd like to see how far we can get without resorting to unsafe. Here's a start: #16317.

I also wonder if we can remove the logic for altering start / end based on Kind from get_string, by moving that logic into the call sites. Each call site I think knows what the Kind is, so knows what to do. The logic doesn't need to be centralized.

We could also see if putting #[inline] on this function makes a difference. The gain in this PR might be down to reducing the size of the function, so it gets inlined, rather than the change in logic itself. Maybe...
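The call-site refactor suggested here might look roughly like this. `Kind`, the quote-trimming offsets, and the function names are assumptions about what the centralized logic does, not the real oxc code:

```rust
// Hypothetical sketch of moving Kind-based offset logic to call sites.
enum Kind { Ident, Str }

struct Token { kind: Kind, start: usize, end: usize }

// Before: one centralized function that branches on Kind at runtime.
fn get_string<'a>(source: &'a str, token: &Token) -> &'a str {
    match token.kind {
        // Trim the surrounding quotes from string literals.
        Kind::Str => &source[token.start + 1..token.end - 1],
        _ => &source[token.start..token.end],
    }
}

// After: each call site already knows the Kind statically, so it can
// call a branch-free accessor directly and the runtime match disappears.
fn raw_slice<'a>(source: &'a str, token: &Token) -> &'a str {
    &source[token.start..token.end]
}

fn str_literal_contents<'a>(source: &'a str, token: &Token) -> &'a str {
    &source[token.start + 1..token.end - 1]
}
```

Both accessors still use safe indexing, so an invalid span panics instead of causing undefined behavior; the saving is only the eliminated `Kind` branch.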

@overlookmotel
Member

Note: #16283 would also change this function.

@camchenry camchenry force-pushed the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch from 45740bd to fa1ebb5 Compare December 1, 2025 03:11
@overlookmotel
Member

> We could also see if putting #[inline] on this function makes a difference. The gain in this PR might be down to reducing the size of the function, so it gets inlined, rather than the change in logic itself. Maybe...

Nope. Tried this in #16329 and it had no effect.

@camchenry
Member Author

> I've been um-ing and ah-ing about this. Yes, token.start() and token.end() should make a valid slice, but can we prove it? And can we prove that the Token's Kind corresponds to its span, so we're offsetting start / end correctly?
>
> When using unsafe code, IMO you should have more confidence than "it should be right".
>
> I'd like to see how far we can get without resorting to unsafe. Here's a start: #16317.
>
> I also wonder if we can remove the logic for altering start / end based on Kind from get_string, by moving that logic into the call sites. Each call site I think knows what the Kind is, so knows what to do. The logic doesn't need to be centralized.
>
> We could also see if putting #[inline] on this function makes a difference. The gain in this PR might be down to reducing the size of the function, so it gets inlined, rather than the change in logic itself. Maybe...

Yeah, I can't think of a proof off the top of my head that would definitively show it's always correct. Spans on tokens should always be correct, since they refer to offsets within the program, but there's nothing technically stopping us from creating an invalid span accidentally (or on purpose). I think the reasoning is sound, but it's only about a 1% performance improvement according to our benchmarks, so I don't feel it's absolutely critical to change.

I've also been thinking about this issue recently: oxc-project/backlog#46. I feel like some of the ideas in that thread have the potential to be much more impactful than just a meager 1% in exchange for adding an unsafe block. I'd be more inclined to merge this if we were seeing 5% improvements or something.

@overlookmotel
Member

overlookmotel commented Dec 1, 2025

1% is definitely worth having. It's really tempting.

But my personal view about unsafe code is that it's about using constraints in the type system to guarantee that the code is sound. You should be able to put a comment above the unsafe { } block explaining why it's impossible for your assumptions to be violated. In this case, as you say, it's trivial to break the rules; we haven't locked down the types to prevent you doing that.

So, on balance, I'm afraid that I don't think we should do this.

However, I do think there's probably space to get at least some of the perf improvement without unsafe. This idea, for example:

> I also wonder if we can remove the logic for altering start / end based on Kind from get_string, by moving that logic into the call sites. Each call site I think knows what the Kind is, so knows what to do. The logic doesn't need to be centralized.

There's probably other ways too that I've not thought of.


> I've also been thinking about this issue recently: oxc-project/backlog#46

Yes, I agree, I think there's a lot of potential there. In particular, an Ident type which stores the hash inline would make all the hashmap lookups in oxc_semantic a good bit faster. I've also been slowly laying the groundwork for oxc-project/backlog#46 (comment) in my work on the allocator and raw transfer. When we finally replace Bumpalo with our own allocator (I hope early next year), we'll store all strings in a single contiguous block at the start of the allocator, and then we can try this out.

If you're interested in working on our string types, please shout on that issue, and let's discuss.
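A minimal sketch of the "Ident type which stores the hash inline" idea, assuming a precomputed u64 hash field and std's DefaultHasher; the real design would likely differ (field names, hashing scheme, and interning strategy are all assumptions):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical identifier type that pays the cost of hashing its
// string bytes once, at construction time.
struct Ident<'a> {
    text: &'a str,
    hash: u64, // precomputed when the identifier is lexed
}

impl<'a> Ident<'a> {
    fn new(text: &'a str) -> Self {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        Self { text, hash: hasher.finish() }
    }
}

impl Hash for Ident<'_> {
    // Hashmap lookups just feed the stored hash into the hasher
    // instead of re-hashing the string bytes on every lookup.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

impl PartialEq for Ident<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash comparison first, then the string check.
        self.hash == other.hash && self.text == other.text
    }
}
impl Eq for Ident<'_> {}
```

With string interning (a single contiguous string block, as described above), equality could even become a pointer comparison.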

@Boshen
Member

Boshen commented Dec 2, 2025

> So, on balance, I'm afraid that I don't think we should do this.

@Boshen Boshen closed this Dec 2, 2025
@camchenry camchenry deleted the 11-25-perf_parser_remove_bounds_check_for_getting_strings_from_lexer branch December 2, 2025 03:27

Labels

A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance
