perf(lexer): remove branches from unicode handling #15328

Merged
graphite-app[bot] merged 1 commit into main from 11-01-perf_lexer_remove_branches_from_unicode_handling
Nov 5, 2025

@overlookmotel (Member) commented Nov 5, 2025

#12768 split `next_char`, `next_2_chars`, and `peek_char` into separate functions for the hot and cold paths.

That was a good change, but it had one side effect: because the Unicode branch now lives in a separate function which isn't inlined, the compiler loses knowledge of the calling context, namely that `Source` isn't at EOF and that (in 2 of the 3 methods) the next character is known not to be ASCII.
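To make the side effect concrete, here is a minimal sketch of a hot/cold split. The `Source` struct, its fields, and `next_unicode_char` are hypothetical stand-ins written for illustration, not the actual oxc code. Inside the cold function, nothing tells the compiler that the caller already ruled out EOF, so `unwrap()` presumably keeps its `None` check.

```rust
/// Hypothetical stand-in for the lexer's `Source` (not the real oxc type).
struct Source<'a> {
    text: &'a str,
    /// Byte offset into `text`; always on a UTF-8 char boundary.
    pos: usize,
}

impl<'a> Source<'a> {
    fn new(text: &'a str) -> Self {
        Self { text, pos: 0 }
    }

    /// Hot path: EOF and ASCII are handled inline.
    #[inline]
    fn next_char(&mut self) -> Option<char> {
        // Returns `None` at EOF.
        let byte = *self.text.as_bytes().get(self.pos)?;
        if byte.is_ascii() {
            self.pos += 1;
            Some(byte as char)
        } else {
            Some(self.next_unicode_char())
        }
    }

    /// Cold path: only reached when not at EOF and the next byte is non-ASCII,
    /// but that context is not visible from inside this un-inlined function.
    #[cold]
    #[inline(never)]
    fn next_unicode_char(&mut self) -> char {
        let mut chars = self.text[self.pos..].chars();
        // Without extra hints, `unwrap` still has to handle the empty case.
        let c = chars.next().unwrap();
        self.pos += c.len_utf8();
        c
    }
}
```

Calling `next_char` repeatedly on `Source::new("aé!")` walks through `'a'`, `'é'`, `'!'`, and then returns `None`.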

Add unchecked assertions to inform the compiler of these known facts, so it can remove 2 branches when calling `chars.next().unwrap()`.
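As a rough sketch of the idea, assuming the standard library's `std::hint::assert_unchecked` (stable since Rust 1.81) in place of the `assert_unchecked!` macro used in the codebase: the cold function restates what the caller already checked, which may let the optimizer drop the corresponding branches. The function names and the `&str` parameter are hypothetical.

```rust
use std::hint::assert_unchecked;

/// Hypothetical cold-path helper.
///
/// # Safety
/// `rest` must be non-empty and its first byte must be non-ASCII.
#[cold]
#[inline(never)]
unsafe fn first_unicode_char(rest: &str) -> char {
    // SAFETY: the caller guarantees both conditions.
    unsafe {
        // Fact 1: not at EOF, so `next()` cannot return `None`.
        assert_unchecked(!rest.is_empty());
        // Fact 2: the first byte is not ASCII, so the char is multi-byte.
        assert_unchecked(!rest.as_bytes()[0].is_ascii());
    }
    rest.chars().next().unwrap()
}

/// Hypothetical hot-path caller that establishes both facts before the call.
#[inline]
fn first_char(rest: &str) -> Option<char> {
    let byte = *rest.as_bytes().first()?;
    if byte.is_ascii() {
        Some(byte as char)
    } else {
        // SAFETY: `rest` is non-empty (checked above) and `byte` is non-ASCII.
        Some(unsafe { first_unicode_char(rest) })
    }
}
```

If either assertion were ever false, the behavior would be undefined, so the caller's checks are load-bearing. Whether the branches actually disappear depends on the compiler version and optimization level.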

This code is on a cold path, so it will likely make no noticeable difference in files which don't contain many Unicode chars (like our benchmark fixtures), but why not?

@github-actions github-actions bot added the A-parser (Area - Parser) and C-performance (Category - Solution not expected to change functional behavior, only performance) labels Nov 5, 2025
@codspeed-hq bot commented Nov 5, 2025

CodSpeed Performance Report

Merging #15328 will not alter performance

Comparing 11-01-perf_lexer_remove_branches_from_unicode_handling (f39e645) with main (70bf817) [1]

Summary

✅ 37 untouched

Footnotes

  1. No successful run was found on main (6c09c1f) during the generation of this report, so 70bf817 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@overlookmotel overlookmotel marked this pull request as ready for review November 5, 2025 14:54
Copilot AI review requested due to automatic review settings November 5, 2025 14:54
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors Unicode character handling in the lexer's Source module to improve performance by providing better hints to the compiler through assert_unchecked!. The changes replace unwrap_unchecked() calls with unwrap() paired with assert_unchecked! to communicate invariants more explicitly.

  • Replaced unsafe unwrap_unchecked() with safe unwrap() after informing the compiler of invariants via assert_unchecked!
  • Refactored Unicode character handling functions to remove the byte parameter and make them fully responsible for handling Unicode
  • Restructured control flow to use if-else expressions instead of early returns
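For comparison, the two styles mentioned in the first bullet might look like the sketch below. Both functions are hypothetical illustrations that use `std::hint::assert_unchecked` rather than the project's `assert_unchecked!` macro. They have the same safety contract and return the same value; the difference is that the second states the invariant itself, which the compiler can also apply to later code that depends on it.

```rust
use std::hint::assert_unchecked;

/// Style 1: skip the `None` check directly on the `Option`.
///
/// # Safety
/// `rest` must be non-empty.
unsafe fn first_char_unwrap_unchecked(rest: &str) -> char {
    // SAFETY: caller guarantees `rest` is non-empty, so `next()` is `Some`.
    unsafe { rest.chars().next().unwrap_unchecked() }
}

/// Style 2: state the invariant up front, then use the safe `unwrap`.
///
/// # Safety
/// `rest` must be non-empty.
unsafe fn first_char_assert_then_unwrap(rest: &str) -> char {
    // SAFETY: caller guarantees `rest` is non-empty.
    unsafe { assert_unchecked(!rest.is_empty()) };
    rest.chars().next().unwrap()
}
```

In debug builds `assert_unchecked` may check the condition and panic on violation, which can help catch a broken invariant during testing.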

@overlookmotel overlookmotel changed the base branch from main to graphite-base/15328 November 5, 2025 15:03
@overlookmotel overlookmotel force-pushed the 11-01-perf_lexer_remove_branches_from_unicode_handling branch from a17b0e4 to 05102f0 Compare November 5, 2025 15:03
@overlookmotel overlookmotel changed the base branch from graphite-base/15328 to 11-05-test_transformer_update_transformer_conformance_snapshots November 5, 2025 15:04
@overlookmotel overlookmotel force-pushed the 11-01-perf_lexer_remove_branches_from_unicode_handling branch from 05102f0 to cd66ab0 Compare November 5, 2025 15:05
@graphite-app graphite-app bot changed the base branch from 11-05-test_transformer_update_transformer_conformance_snapshots to graphite-base/15328 November 5, 2025 15:06
@graphite-app graphite-app bot force-pushed the 11-01-perf_lexer_remove_branches_from_unicode_handling branch from cd66ab0 to e3986b3 Compare November 5, 2025 15:12
@graphite-app graphite-app bot force-pushed the graphite-base/15328 branch from 0093db6 to 6c09c1f Compare November 5, 2025 15:12
@graphite-app graphite-app bot changed the base branch from graphite-base/15328 to main November 5, 2025 15:12
@graphite-app graphite-app bot force-pushed the 11-01-perf_lexer_remove_branches_from_unicode_handling branch from e3986b3 to f39e645 Compare November 5, 2025 15:13
@overlookmotel overlookmotel self-assigned this Nov 5, 2025
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Nov 5, 2025
@graphite-app graphite-app bot force-pushed the 11-01-perf_lexer_remove_branches_from_unicode_handling branch from f39e645 to 5f08c69 Compare November 5, 2025 15:26
@graphite-app graphite-app bot merged commit 5f08c69 into main Nov 5, 2025
21 checks passed
@graphite-app graphite-app bot deleted the 11-01-perf_lexer_remove_branches_from_unicode_handling branch November 5, 2025 15:32
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Nov 5, 2025