
perf(allocator/bitset): store bits as `usize`s #13450

Merged
graphite-app[bot] merged 1 commit into main from 08-30-perf_allocator_bitset_store_bits_as_usize_s
Aug 31, 2025

@overlookmotel (Member) commented Aug 30, 2025

`BitSet` now stores its bits as `usize`s instead of `u8`s. This serves two purposes:

  1. Data in the arena is almost all pointer-aligned, so there is little point in storing odd-length arrays of bytes: they will likely be padded to a multiple of 8 bytes anyway. Higher alignment also makes the contents of a `BitSet` less likely to span multiple cache lines.

  2. It makes creating a new `BitSet` faster. `Vec::from_iter_in` is unfortunately not very efficient and does a bounds check on each iteration of the loop, so writing `bits / 64` items (`usize`s, on 64-bit targets) is faster than writing `bits / 8` items (`u8`s).
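For readers who want to see the word-based layout concretely, here is a minimal sketch. It is not the code from this PR: the method names (`new`, `set`, `has`, `word_count`) are hypothetical, and it uses a plain `std` `Vec` rather than the arena-backed `Vec` the description refers to.

```rust
/// Number of bits held by one storage word (64 on 64-bit targets).
const WORD_BITS: usize = usize::BITS as usize;

/// Hypothetical, simplified bit set backed by `usize` words.
/// Assumption: heap-allocated `std::vec::Vec`, not an arena allocation.
pub struct BitSet {
    words: Vec<usize>,
}

impl BitSet {
    /// Create a set able to hold `bits` bits, all initially unset.
    /// Only `ceil(bits / WORD_BITS)` words are written, rather than
    /// `ceil(bits / 8)` bytes.
    pub fn new(bits: usize) -> Self {
        let word_count = bits.div_ceil(WORD_BITS);
        Self { words: vec![0; word_count] }
    }

    /// Set the bit at index `bit`. Panics if `bit` is out of range.
    pub fn set(&mut self, bit: usize) {
        self.words[bit / WORD_BITS] |= 1usize << (bit % WORD_BITS);
    }

    /// Return `true` if the bit at index `bit` is set.
    pub fn has(&self, bit: usize) -> bool {
        (self.words[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1 == 1
    }

    /// Number of `usize` words backing the set.
    pub fn word_count(&self) -> usize {
        self.words.len()
    }
}
```

With this layout, a set of 130 bits occupies 3 words on a 64-bit target, where a byte-based layout would need 17 `u8`s.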


@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Aug 30, 2025
codspeed-hq bot commented Aug 30, 2025

CodSpeed Instrumentation Performance Report

Merging #13450 will improve performance by 31.74%

Comparing 08-30-perf_allocator_bitset_store_bits_as_usize_s (cdfa48d) with main (edeebc6) [1]

Summary

⚡ 3 improvements
✅ 34 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| mangler[binder.ts] | 895 µs | 811.9 µs | +10.24% |
| mangler[cal.com.tsx] | 4.1 ms | 3.1 ms | +31.74% |
| mangler[react.development.js] | 292.8 µs | 279.4 µs | +4.78% |
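The report does not state how the Change column is computed. The figures look consistent with `BASE / HEAD - 1`, and the sketch below checks that assumption against the displayed values. The function name `change_percent` is hypothetical, and because the table shows rounded timings, the recomputed numbers only match approximately (the `cal.com.tsx` row, shown to one decimal place, is the least precise).

```rust
/// Relative change as a percentage, assuming the report uses
/// `(base / head - 1) * 100`. This formula is inferred from the
/// table, not taken from CodSpeed documentation.
pub fn change_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

For example, `change_percent(895.0, 811.9)` lands within rounding distance of the reported +10.24% for `mangler[binder.ts]`.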

Footnotes

  1. No successful run was found on main (afa0877) during the generation of this report, so edeebc6 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@overlookmotel overlookmotel marked this pull request as ready for review August 30, 2025 19:16
@graphite-app graphite-app bot force-pushed the 08-30-feat_allocator_introduce_bitset_type branch from 0d98998 to afa0877 Compare August 31, 2025 04:59
@graphite-app graphite-app bot force-pushed the 08-30-perf_allocator_bitset_store_bits_as_usize_s branch from 036d148 to cdfa48d Compare August 31, 2025 05:00
Base automatically changed from 08-30-feat_allocator_introduce_bitset_type to main August 31, 2025 05:04
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Aug 31, 2025
@graphite-app graphite-app bot merged commit cdfa48d into main Aug 31, 2025
27 checks passed
@graphite-app graphite-app bot deleted the 08-30-perf_allocator_bitset_store_bits_as_usize_s branch August 31, 2025 05:05
graphite-app bot pushed a commit that referenced this pull request Sep 6, 2025
Implement `CloneIn` for `Box<[T]>`, using an efficient implementation that performs no bounds checks. Hopefully, where `T` is `Copy`, the compiler will be able to boil this down to a single `memcpy` call.

(this stack is mostly leftovers from #13450 which turned out not to be required for that PR, but are still useful for other purposes)
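To illustrate the idea in that commit message, here is a rough sketch of copying a slice of `Copy` elements into a new boxed slice without a per-element, bounds-checked loop. It is not the `CloneIn` implementation itself: `clone_boxed_slice` is a hypothetical helper, and it allocates on the global heap with `std` types rather than in an arena.

```rust
/// Copy `src` into a freshly allocated `Box<[T]>`.
///
/// Assumption: `T: Copy`, so the elements can be duplicated bytewise.
/// `extend_from_slice` reserves once and copies the whole slice in bulk,
/// which the compiler can typically lower to a single `memcpy`.
pub fn clone_boxed_slice<T: Copy>(src: &[T]) -> Box<[T]> {
    let mut out = Vec::with_capacity(src.len());
    out.extend_from_slice(src);
    out.into_boxed_slice()
}
```

Whether the optimizer actually emits one `memcpy` depends on the element type and build settings, so treat that as an expectation rather than a guarantee.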
Copilot AI pushed a commit that referenced this pull request Sep 8, 2025