perf(lexer): remove branches from unicode handling#15328
perf(lexer): remove branches from unicode handling#15328graphite-app[bot] merged 1 commit intomainfrom
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
CodSpeed Performance ReportMerging #15328 will not alter performanceComparing Summary
Footnotes |
There was a problem hiding this comment.
Pull Request Overview
This PR refactors Unicode character handling in the lexer's Source module to improve performance by providing better hints to the compiler through assert_unchecked!. The changes replace unwrap_unchecked() calls with unwrap() paired with assert_unchecked! to communicate invariants more explicitly.
- Replaced unsafe
unwrap_unchecked()with safeunwrap()after informing the compiler of invariants viaassert_unchecked! - Refactored Unicode character handling functions to remove the
byteparameter and make them fully responsible for handling Unicode - Restructured control flow to use if-else expressions instead of early returns
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a17b0e4 to
05102f0
Compare
05102f0 to
cd66ab0
Compare
cd66ab0 to
e3986b3
Compare
0093db6 to
6c09c1f
Compare
e3986b3 to
f39e645
Compare
Merge activity
|
#12768 split `next_char`, `next_2_chars`, and `peek_char` into separate functions for the hot and cold paths. That was a good change, but had one side-effect - because the unicode branch is now in a separate function which isn't inlined, the compiler loses knowledge of the context - that `Source` isn't at EOF, and that (in 2 of the 3 methods) the next character is known not to be ASCII. Add unchecked assertions to inform compiler of the known facts, so it can remove 2 branches when calling `chars.next().unwrap()`. This code is on a cold path, so will likely not make a noticeable difference in files which don't contain many Unicode chars (like our benchmark fixtures), but why not?
f39e645 to
5f08c69
Compare

#12768 split
next_char,next_2_chars, andpeek_charinto separate functions for the hot and cold paths.That was a good change, but had one side-effect - because the unicode branch is now in a separate function which isn't inlined, the compiler loses knowledge of the context - that
Sourceisn't at EOF, and that (in 2 of the 3 methods) the next character is known not to be ASCII.Add unchecked assertions to inform compiler of the known facts, so it can remove 2 branches when calling
chars.next().unwrap().This code is on a cold path, so will likely not make a noticeable difference in files which don't contain many Unicode chars (like our benchmark fixtures), but why not?