
refactor(ast_visit): Utf8ToUtf16 record end offset of multi-byte chars#13339

Merged
graphite-app[bot] merged 1 commit into main from 08-27-refactor_ast_visit_utf8toutf16_record_end_offset_of_multi-byte_chars on Aug 28, 2025


@overlookmotel (Member) commented Aug 27, 2025

The `Utf8ToUtf16` converter builds a table that translates UTF-8 offsets to UTF-16 offsets.

Previously, for each multi-byte character, it recorded the UTF-8 offset of the character's start + 1. This PR changes that to record the offset of the *end* of the character.

This costs a couple more instructions when adding records to the table, but makes no difference to the amount of work involved in converting offsets.
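To make the change concrete, here is a minimal Rust sketch of such a table. It is not oxc's actual code: the `Translation` struct, its field names, and the `build_table` / `utf8_to_utf16` functions are hypothetical. It assumes that offsets passed in always fall on character boundaries and that the source is under 4 GiB, so offsets fit in `u32`. Each multi-byte character gets one entry keyed by the UTF-8 offset of its *end*, together with the running difference between UTF-8 and UTF-16 lengths.

```rust
/// One entry per multi-byte (non-ASCII) character.
/// Hypothetical layout for illustration; not oxc's actual type.
#[derive(Debug, Clone, Copy)]
struct Translation {
    /// UTF-8 byte offset of the *end* of the multi-byte character.
    utf8_offset: u32,
    /// Running total of (UTF-8 bytes - UTF-16 units) up to and including this character.
    utf16_difference: u32,
}

/// Scan the source once, recording an entry for every multi-byte character.
/// Assumes the source is shorter than 4 GiB so offsets fit in `u32`.
fn build_table(source: &str) -> Vec<Translation> {
    let mut table = Vec::new();
    let mut difference = 0u32;
    for (start, ch) in source.char_indices() {
        let len_utf8 = ch.len_utf8() as u32;
        if len_utf8 > 1 {
            // 2- and 3-byte chars are 1 UTF-16 unit; 4-byte chars are a surrogate pair (2 units).
            difference += len_utf8 - ch.len_utf16() as u32;
            table.push(Translation {
                utf8_offset: start as u32 + len_utf8, // end of the character, not start + 1
                utf16_difference: difference,
            });
        }
    }
    table
}

/// Convert a UTF-8 offset that lies on a char boundary to a UTF-16 offset.
fn utf8_to_utf16(table: &[Translation], utf8_offset: u32) -> u32 {
    // Entries are sorted by `utf8_offset`; count those ending at or before `utf8_offset`.
    let idx = table.partition_point(|t| t.utf8_offset <= utf8_offset);
    let difference = if idx == 0 { 0 } else { table[idx - 1].utf16_difference };
    utf8_offset - difference
}

fn main() {
    // Byte ranges: 'a' 0..1, 'é' 1..3, '😀' 3..7, 'b' 7..8.
    let table = build_table("aé😀b");
    assert_eq!(utf8_to_utf16(&table, 3), 2); // start of '😀'
    assert_eq!(utf8_to_utf16(&table, 7), 4); // start of 'b'
    assert_eq!(utf8_to_utf16(&table, 8), 5); // end of source
}
```

Because a lookup offset always lies on a character boundary, the test "end <= offset" selects exactly the same entries as the old "start + 1 <= offset" test, which fits the point above that converting offsets costs the same: only the value stored in each entry changes.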

The advantages are:

  1. Reduces complexity, as there are no longer edge cases where arithmetic overflow was possible.
  2. Makes it possible to use the same table to convert back from UTF-16 to UTF-8 (next PR in this stack).
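As an illustration of point 2, here is a hedged sketch of how an end-offset table could also serve the reverse direction. It does not reproduce the follow-up PR; the `Translation` struct, `build_table`, and `utf16_to_utf8` are hypothetical names, and it assumes UTF-16 offsets never point inside a surrogate pair and that the source is under 4 GiB. Since each entry marks the end of a character, its UTF-16 position is simply `utf8_offset - utf16_difference`, and those positions are also sorted, so the same binary search works on UTF-16 offsets.

```rust
/// Hypothetical table entry, one per multi-byte character.
#[derive(Clone, Copy)]
struct Translation {
    utf8_offset: u32,      // UTF-8 offset of the END of a multi-byte char
    utf16_difference: u32, // running total of (UTF-8 bytes - UTF-16 units)
}

/// Build the table in a single pass over the source (assumes < 4 GiB).
fn build_table(source: &str) -> Vec<Translation> {
    let mut table = Vec::new();
    let mut difference = 0u32;
    for (start, ch) in source.char_indices() {
        let len_utf8 = ch.len_utf8() as u32;
        if len_utf8 > 1 {
            difference += len_utf8 - ch.len_utf16() as u32;
            table.push(Translation { utf8_offset: start as u32 + len_utf8, utf16_difference: difference });
        }
    }
    table
}

/// Convert a UTF-16 offset (not pointing inside a surrogate pair) back to UTF-8.
fn utf16_to_utf8(table: &[Translation], utf16_offset: u32) -> u32 {
    // Each entry's UTF-16 end position is `utf8_offset - utf16_difference`;
    // these positions increase along the table, so a binary search applies.
    let idx = table.partition_point(|t| t.utf8_offset - t.utf16_difference <= utf16_offset);
    let difference = if idx == 0 { 0 } else { table[idx - 1].utf16_difference };
    utf16_offset + difference
}

fn main() {
    // UTF-16 ranges: 'a' 0..1, 'é' 1..2, '😀' 2..4, 'b' 4..5.
    let table = build_table("aé😀b");
    assert_eq!(utf16_to_utf8(&table, 2), 3); // start of '😀'
    assert_eq!(utf16_to_utf8(&table, 4), 7); // start of 'b'
}
```

In this sketch the same trick would not work with the old start + 1 key: for '😀' it gives 4 - 3 = 1, which falls before the character's UTF-16 start of 2.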

As multi-byte (non-ASCII) characters are rare in source code, and the extra cost is small anyway, I think this trade-off is worth it.

@github-actions github-actions bot added the C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior label Aug 27, 2025

codspeed-hq bot commented Aug 27, 2025

CodSpeed Instrumentation Performance Report

Merging #13339 will not alter performance

Comparing 08-27-refactor_ast_visit_utf8toutf16_record_end_offset_of_multi-byte_chars (cd5a9ca) with main (33e0e8b) [1]

Summary

✅ 34 untouched benchmarks

Footnotes

  1. No successful run was found on main (cd5a9ca) during the generation of this report, so 33e0e8b was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@overlookmotel overlookmotel marked this pull request as ready for review August 27, 2025 22:21
@overlookmotel overlookmotel force-pushed the 08-27-refactor_ast_visit_utf8toutf16_record_end_offset_of_multi-byte_chars branch from e579c71 to 7554d37 Compare August 27, 2025 23:01
@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Aug 28, 2025
@graphite-app graphite-app bot force-pushed the 08-27-refactor_ast_visit_utf8toutf16_record_end_offset_of_multi-byte_chars branch from 7554d37 to cd5a9ca Compare August 28, 2025 09:24
@graphite-app graphite-app bot merged commit cd5a9ca into main Aug 28, 2025
24 checks passed
@graphite-app graphite-app bot deleted the 08-27-refactor_ast_visit_utf8toutf16_record_end_offset_of_multi-byte_chars branch August 28, 2025 09:29
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Aug 28, 2025