perf(allocator/vec2): resolve performance regression for extend by marking reserve as #[cold] and #[inline(never)] #10675
Merged
graphite-app[bot] merged 1 commit into main on May 3, 2025
overlookmotel approved these changes on May 3, 2025
…marking reserve as `#[cold]` and `#[inline(never)]` (#10675)

I suspect the performance regression comes from the current implementation having more instructions than before. It uses the lower bound of `size_hint` to reserve space, which bloats the loop body. Also, a `for` loop is easier for the compiler to optimize.

The `reserve` call inside `extend` is rarely taken, so this change marks it as `#[cold]` and `#[inline(never)]`, which reduces the number of instructions in the `while` loop.

We got a 3%-4% performance improvement in the `minifier`, but transformer performance did not fully return to where it was before #10670. Still, I think we can accept a regression of less than 1%: this change unblocks further work on the `Vec` improvement, and we will win the difference back by the end of the stack. See #9856
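The idea described above can be illustrated with a minimal sketch. This is not the code from this PR: it uses the standard library `Vec` rather than the allocator's `Vec`, and the names `reserve_slow` and `extend_desugared` are hypothetical, chosen only for illustration. It shows the general pattern of keeping a rarely taken growth path out of line so the hot loop stays small.

```rust
/// Slow path: grow the buffer.
/// `#[cold]` tells the compiler this function is unlikely to be called, and
/// `#[inline(never)]` keeps its body out of the caller's loop.
/// (Hypothetical helper, not the function from the PR.)
#[cold]
#[inline(never)]
fn reserve_slow<T>(v: &mut Vec<T>, additional: usize) {
    v.reserve(additional);
}

/// Hypothetical stand-in for an `extend` implementation: pull items in a
/// `while` loop and only take the out-of-line growth path when the buffer
/// is full.
fn extend_desugared<T, I: Iterator<Item = T>>(v: &mut Vec<T>, mut iter: I) {
    while let Some(item) = iter.next() {
        if v.len() == v.capacity() {
            // Rarely taken: reserve for the lower bound of `size_hint`,
            // plus one slot for the item already pulled from the iterator.
            let (lower, _) = iter.size_hint();
            reserve_slow(v, lower.saturating_add(1));
        }
        // Capacity is now sufficient, so this push does not reallocate.
        // (A real implementation would likely write the element directly;
        // `push` keeps this sketch free of `unsafe`.)
        v.push(item);
    }
}
```

Calling `extend_desugared(&mut v, 4..=10)` on `vec![1, 2, 3]` leaves `v` holding the values 1 through 10. Whether this layout actually produces a tighter loop depends on the compiler and target, so any gain should be confirmed with benchmarks.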
Base automatically changed from 04-28-feat_allocator_vec2_introduce_extend_desugared_method_as_extend_internal_implementation to main on May 3, 2025 13:09