Optimize staked_nodes() for 3-4x performance improvement #8516
HaoranYi merged 4 commits into anza-xyz:master
Conversation
Force-pushed from f8e5f71 to 89499d9 (compare)
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##           master    #8516    +/-   ##
========================================
  Coverage    83.1%    83.2%
========================================
  Files         846      846
  Lines      368573   368576       +3
========================================
+ Hits       306652   306709      +57
+ Misses      61921    61867      -54
vadorovsky
left a comment
Great stuff! I think there is a chance to use the pubkey hasher.
)
// Pre-allocate HashMap with estimated capacity to reduce reallocations
let mut staked_nodes =
    HashMap::with_capacity(self.vote_accounts.len().saturating_div(2));
Given that this is a hash map of validators, I think we could use PubkeyHasherBuilder instead of the default hasher:

- HashMap::with_capacity(self.vote_accounts.len().saturating_div(2));
+ HashMap::with_capacity_and_hasher(self.vote_accounts.len().saturating_div(2), PubkeyHasherBuilder::default());

That should speed it up even more. 🙂 You'll need to add the PubkeyHasherBuilder generic to the return type in the function signature.
PubkeyHasherBuilder is behind the "rand" feature, and the vote package doesn't enable it.
What's your experience with enabling the rand feature? Would it be a broader change and require more testing to ensure no regression?
How about doing it as a separate optimization in a follow-up PR?
Sounds good, we can do it separately.
Regarding adding the rand feature: in PR #7307 I added the rand feature to solana-pubkey in solana-accounts-db. The only randomness that comes with it is the randomization of the 8-byte subslice of the pubkey used as the hash map key:
https://github.com/anza-xyz/solana-sdk/blob/a7e12b1d4af8fba2a43d447af31aebdc3dbe8a1d/address/src/lib.rs#L13-L14
https://github.com/anza-xyz/solana-sdk/blob/a7e12b1d4af8fba2a43d447af31aebdc3dbe8a1d/address/src/hasher.rs#L57-L81
No other module is pulled in by this feature, so I don't think there should be any unexpected impact on the validator.
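The custom-hasher idea from the review thread can be sketched with the standard library alone. Note that `ByteSliceHasherBuilder` below is a hypothetical stand-in for the real `PubkeyHasherBuilder`: it hashes only an 8-byte subslice of the 32-byte key (without the randomized offset the real builder adds), relying on pubkeys already being uniformly distributed.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

// Hypothetical stand-in for PubkeyHasherBuilder (not the real solana-pubkey
// type): uses 8 bytes of the already-random 32-byte key as the hash.
#[derive(Default, Clone)]
struct ByteSliceHasherBuilder;

struct ByteSliceHasher(u64);

impl Hasher for ByteSliceHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Keep at most the first 8 bytes of the last write (the key data).
        let mut buf = [0u8; 8];
        let n = bytes.len().min(8);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.0 = u64::from_le_bytes(buf);
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

impl BuildHasher for ByteSliceHasherBuilder {
    type Hasher = ByteSliceHasher;
    fn build_hasher(&self) -> ByteSliceHasher {
        ByteSliceHasher(0)
    }
}

// Build a map keyed by 32-byte "pubkeys" using the cheap hasher, mirroring
// HashMap::with_capacity_and_hasher from the suggestion above.
fn demo_map() -> HashMap<[u8; 32], u64, ByteSliceHasherBuilder> {
    let mut map =
        HashMap::with_capacity_and_hasher(4, ByteSliceHasherBuilder::default());
    map.insert([1u8; 32], 100);
    map.insert([2u8; 32], 200);
    map
}

fn main() {
    let map = demo_map();
    assert_eq!(map[&[1u8; 32]], 100);
    assert_eq!(map.len(), 2);
}
```

As noted in the thread, the third `HashMap` generic parameter must then appear in the function's return type as well.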
Replace the itertools grouping_map().aggregate() pattern with direct HashMap construction using entry().and_modify().or_insert() for better performance and reduced allocations.

Performance improvement:
- Before: 27,206 ns/iter
- After: 12,586 ns/iter
- Speedup: 2.16x (53.7% faster)

The benchmark simulates a realistic scenario with 100 validator nodes, each having 3-5 vote accounts, measuring stake aggregation by node pubkey.

Changes:
- Remove unused itertools::Itertools import
- Pre-allocate HashMap with estimated capacity
- Use direct entry API for stake aggregation
- Add benchmark for staked_nodes computation
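The entry-API aggregation described in the commit message can be sketched as below. The `Pubkey` alias, the flat `(node_pubkey, stake)` slice, and the function signature are illustrative simplifications, not the actual `VoteAccounts` internals.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the real 32-byte pubkey type.
type Pubkey = [u8; 32];

// Aggregate stake by node pubkey in a single pass with the entry API,
// instead of itertools' into_grouping_map().aggregate().
fn staked_nodes(vote_accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    // Pre-allocate with an estimated capacity to reduce reallocations.
    let mut staked_nodes =
        HashMap::with_capacity(vote_accounts.len().saturating_div(2));
    for &(node_pubkey, stake) in vote_accounts {
        if stake != 0 {
            staked_nodes
                .entry(node_pubkey)
                .and_modify(|s| *s += stake)
                .or_insert(stake);
        }
    }
    staked_nodes
}

fn main() {
    let a = [1u8; 32];
    let b = [2u8; 32];
    // Two vote accounts on node a, one zero-stake and one staked on node b.
    let nodes = staked_nodes(&[(a, 10), (a, 5), (b, 0), (b, 7)]);
    assert_eq!(nodes[&a], 15);
    assert_eq!(nodes[&b], 7);
}
```

The single pass avoids the intermediate per-group state that a group-then-aggregate pattern builds up.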
Force-pushed from 89499d9 to 71e573f (compare)
Replace saturating_div(2) with exact count of non-zero stake accounts for optimal memory allocation.

Based on mainnet data showing ~14% of vote accounts have stake:
- Old: capacity = vote_accounts.len() / 2 (~3350 for 6700 accounts)
- New: capacity = non_zero_count (~970 for 6700 accounts)

Trade-offs:
- Pro: Exact capacity, zero reallocation, saves ~2380 pre-allocated entries
- Pro: Better memory efficiency (86% of vote accounts have zero stake)
- Con: Adds ~800ns overhead from counting iteration (~6% slower)

Benchmark results:
- Before: 12,516 ns/iter
- After: 13,353 ns/iter
- Trade-off: Slightly slower but uses exact memory
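The pre-scan-for-exact-capacity variant from this commit can be sketched as follows; again, `Pubkey` and the flat input slice are illustrative, not the real `VoteAccounts` layout.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the real 32-byte pubkey type.
type Pubkey = [u8; 32];

fn staked_nodes_exact(vote_accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    // Pre-scan: count non-zero-stake accounts. On mainnet only ~14% of vote
    // accounts carry stake, so this bound is far tighter than len() / 2.
    let non_zero = vote_accounts.iter().filter(|(_, stake)| *stake != 0).count();
    // Exact upper bound on the number of entries: zero reallocation.
    let mut staked_nodes = HashMap::with_capacity(non_zero);
    for &(node_pubkey, stake) in vote_accounts {
        if stake != 0 {
            *staked_nodes.entry(node_pubkey).or_insert(0) += stake;
        }
    }
    staked_nodes
}

fn main() {
    // 100 synthetic accounts; every 7th one carries stake.
    let accounts: Vec<(Pubkey, u64)> = (0u8..100)
        .map(|i| ([i; 32], if i % 7 == 0 { 1 } else { 0 }))
        .collect();
    let nodes = staked_nodes_exact(&accounts);
    // 0, 7, 14, ..., 98 → 15 staked nodes
    assert_eq!(nodes.len(), 15);
}
```

The count is an upper bound on the map's final size (several staked vote accounts can share a node), so the allocation never needs to grow.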
I updated the benchmark with realistic mainnet data (6700 vote accounts, 970 with stakes). Benchmark result:
The pre-scan is 10% slower than the div(2) estimate, but it prevents us from wasting memory as the number of zero-stake vote accounts grows.
Force-pushed from 6e9e91b to 8c566a0 (compare)
@brooksprumo I updated the benchmark, which dismissed your approval. Can you re-approve it? No actual prod code change.
I'm not sure the new benchmarks should be added. They compare different implementations, which is valuable for this PR and for choosing one, but beyond this PR I don't see the value. I would think we only want benches in the repo for our current code. We could include the comparison benchmarks as text/source in this PR, which would make them useful if we want to revisit in the future.
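In the spirit of that suggestion, a self-contained comparison of the two implementations could live in the PR text itself. The sketch below is illustrative only: `old_impl` mimics a group-then-aggregate pattern (like itertools' `into_grouping_map().aggregate()`, but without the dependency), `new_impl` is the single-pass entry-API version, and the synthetic data is not real mainnet input.

```rust
use std::collections::HashMap;
use std::time::Instant;

type Pubkey = [u8; 32];

// Old style: build per-key groups first, then aggregate them.
// The per-key Vec is the kind of intermediate allocation the PR removes.
fn old_impl(accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    let mut groups: HashMap<Pubkey, Vec<u64>> = HashMap::new();
    for &(k, v) in accounts {
        groups.entry(k).or_default().push(v);
    }
    groups
        .into_iter()
        .filter_map(|(k, vs)| {
            let sum: u64 = vs.iter().sum();
            (sum != 0).then_some((k, sum))
        })
        .collect()
}

// New style: single pass with the entry API, skipping zero stakes.
fn new_impl(accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    let mut out = HashMap::with_capacity(accounts.len() / 2);
    for &(k, v) in accounts {
        if v != 0 {
            *out.entry(k).or_insert(0) += v;
        }
    }
    out
}

fn main() {
    // Roughly mainnet-shaped synthetic data: 6700 accounts, sparse stakes.
    let accounts: Vec<(Pubkey, u64)> = (0u32..6700)
        .map(|i| {
            let mut k = [0u8; 32];
            k[..4].copy_from_slice(&i.to_le_bytes());
            (k, if i % 7 == 0 { i as u64 } else { 0 })
        })
        .collect();

    let t = Instant::now();
    let old = old_impl(&accounts);
    let t_old = t.elapsed();
    let t = Instant::now();
    let new = new_impl(&accounts);
    let t_new = t.elapsed();

    // Both implementations must agree before timings mean anything.
    assert_eq!(old, new);
    println!("old: {t_old:?}, new: {t_new:?}");
}
```

A one-shot `Instant` timing is much noisier than the Criterion-style ns/iter numbers quoted in the PR, but it is enough to preserve the comparison in text form.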
Force-pushed from 8c566a0 to 29ecdc7 (compare)
OK, reverted the bench commit. The benchmark results are already in the PR comments; if we want to revisit, we can look at the PR.
brooksprumo
left a comment
If we need to make future changes, I think we should rename the non_zero_ stuff to staked_.
* Optimize staked_nodes() by replacing itertools with manual HashMap

Replace the itertools grouping_map().aggregate() pattern with direct HashMap construction using entry().and_modify().or_insert() for better performance and reduced allocations.

Performance improvement:
- Before: 27,206 ns/iter
- After: 12,586 ns/iter
- Speedup: 2.16x (53.7% faster)

The benchmark simulates a realistic scenario with 100 validator nodes, each having 3-5 vote accounts, measuring stake aggregation by node pubkey.

Changes:
- Remove unused itertools::Itertools import
- Pre-allocate HashMap with estimated capacity
- Use direct entry API for stake aggregation
- Add benchmark for staked_nodes computation

* pr feedback

* Simplify stake aggregation logic per PR feedback

* Use exact non-zero count for HashMap capacity

Replace saturating_div(2) with exact count of non-zero stake accounts for optimal memory allocation.

Based on mainnet data showing ~14% of vote accounts have stake:
- Old: capacity = vote_accounts.len() / 2 (~3350 for 6700 accounts)
- New: capacity = non_zero_count (~970 for 6700 accounts)

Trade-offs:
- Pro: Exact capacity, zero reallocation, saves ~2380 pre-allocated entries
- Pro: Better memory efficiency (86% of vote accounts have zero stake)
- Con: Adds ~800ns overhead from counting iteration (~6% slower)

Benchmark results:
- Before: 12,516 ns/iter
- After: 13,353 ns/iter
- Trade-off: Slightly slower but uses exact memory
Problem
The staked_nodes() method in VoteAccounts was using itertools' into_grouping_map().aggregate() pattern, which creates intermediate allocations and adds unnecessary overhead for a hot path in validator operations.

Summary of Changes

Replace the itertools grouping_map implementation with direct HashMap construction for better performance:
- Remove the unused itertools::Itertools import
- Aggregate stake with entry().and_modify().or_insert()

Performance Impact
Benchmark with realistic scenario (400 validator nodes):
Running on mainnet shows a 3-4x speedup (see log).