perf(parser): remove bounds check for getting strings from lexer#16135
CodSpeed Performance Report: Merging #16135 will not alter performance.
Pull request overview
This PR optimizes the get_string method in the lexer by replacing safe string slicing operations with unsafe get_unchecked to eliminate bounds checking. The optimization targets a hot path that profiling showed took 1% of parsing time on binder.ts, achieving a +1% performance improvement across all parser benchmarks.
Key Changes:
- Refactored string extraction to compute adjusted start/end positions before slicing
- Replaced safe slice operations with `unsafe { get_unchecked(start..end) }`
- Added comprehensive debug assertions to validate safety invariants
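As a rough illustration of the pattern summarized above, here is a minimal sketch rather than the PR's actual code. The function names `slice_checked` and `slice_unchecked` and the standalone `source`/`start`/`end` parameters are hypothetical; only `str::get_unchecked`, `str::is_char_boundary`, and `debug_assert!` are real standard-library items.

```rust
/// Safe baseline: indexing panics if the range is out of bounds or splits a
/// UTF-8 character, so the compiler has to keep the checks in the hot path.
/// (Hypothetical helper, not the lexer's real function.)
fn slice_checked(source: &str, start: usize, end: usize) -> &str {
    &source[start..end]
}

/// Unchecked variant: the checks are removed in release builds and survive
/// only as debug assertions.
///
/// # Safety
/// The caller must guarantee `start <= end <= source.len()` and that both
/// offsets lie on UTF-8 character boundaries.
unsafe fn slice_unchecked(source: &str, start: usize, end: usize) -> &str {
    // These assertions compile to nothing when debug assertions are off.
    debug_assert!(start <= end);
    debug_assert!(end <= source.len());
    debug_assert!(source.is_char_boundary(start));
    debug_assert!(source.is_char_boundary(end));
    // SAFETY: upheld by the caller, per the contract documented above.
    unsafe { source.get_unchecked(start..end) }
}
```

The trade-off is that the invariant moves out of the type system and into a documented contract: if a caller ever passes a bad range in a release build, the result is undefined behavior rather than a panic.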
I've been um-ing and ah-ing about this. Yes, When using unsafe code, IMO you should have more confidence than "it should be right". I'd like to see how far we can get without resorting to unsafe. Here's a start: #16317. I also wonder if we can remove the logic for altering We could also see if putting
Note: #16283 would also change this function.
Nope. Tried this in #16329 and it had no effect.
Yeah, I'm not sure I can think of a proof off the top of my head that would definitely show it is always correct. In the same sense, spans on tokens should always be correct, since they should refer to offsets within the program, but there's nothing technically stopping us from creating an invalid span accidentally (or on purpose). I think the reasoning is sound, but it's only a 1% performance improvement or so according to our benchmarks, so I don't feel like it's absolutely critical to change. I've also been thinking about this issue recently: oxc-project/backlog#46. I feel like some of the ideas in that thread have the potential to be much more impactful than just a meager 1% in exchange for adding an
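To make the "nothing stops us from creating an invalid span" point concrete, here is a small sketch. The `Span` struct and `span_text` helper are hypothetical stand-ins, not oxc's real types; `str::get` is the standard-library method that returns `None` instead of panicking. It still performs the bounds and boundary checks, so it illustrates the failure modes rather than recovering the 1%.

```rust
/// Hypothetical span type: a pair of byte offsets with no tie to any
/// particular source string, so nothing in the type prevents a bad value.
#[derive(Clone, Copy)]
struct Span {
    start: u32,
    end: u32,
}

/// Safe lookup: returns `None` if the span is out of bounds, reversed, or
/// cuts through a multi-byte UTF-8 character.
fn span_text(source: &str, span: Span) -> Option<&str> {
    source.get(span.start as usize..span.end as usize)
}
```

With `get_unchecked`, each of the `None` cases here would be undefined behavior instead, which is why the soundness argument has to rest on how spans are constructed.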
1% is definitely worth having. It's really tempting. But my personal view about unsafe code is that it's about using constraints in the type system to guarantee that the code is sound. You should be able to put a comment above the So, on balance, I'm afraid that I don't think we should do this. However, I do think there's probably space to get at least some of the perf improvement without unsafe. This idea, for example:
There are probably other ways too that I've not thought of.
Yes, I agree, I think there's a lot of potential there. In particular, an If you're interested in working on our string types, please shout on that issue, and let's discuss.
One profile of the parser showed that this function took up 1% of the time on binder.ts. It seemed like this is a case where we could possibly remove the bounds checks, since our tokens are guaranteed to be contained within the source string. Since this is called really often, it's worth optimizing.

+1% on all parser benchmarks: