Optimize staked_nodes() for 3-4x performance improvement #8516
HaoranYi merged 4 commits into anza-xyz:master
Conversation
Force-pushed from f8e5f71 to 89499d9 (compare)
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##           master    #8516    +/-   ##
========================================
  Coverage    83.1%    83.2%
========================================
  Files         846      846
  Lines      368573   368576       +3
========================================
+ Hits       306652   306709      +57
+ Misses      61921    61867      -54
vadorovsky
left a comment
Great stuff! I think there is a chance to use the pubkey hasher.
)
// Pre-allocate HashMap with estimated capacity to reduce reallocations
let mut staked_nodes =
    HashMap::with_capacity(self.vote_accounts.len().saturating_div(2));
Given that this is a hash map of validators, I think we could use PubkeyHasherBuilder instead of the default hasher:

- HashMap::with_capacity(self.vote_accounts.len().saturating_div(2));
+ HashMap::with_capacity_and_hasher(self.vote_accounts.len().saturating_div(2), PubkeyHasherBuilder::default());

That should speed it up even more. 🙂 You'll need to add the PubkeyHasherBuilder generic to the return type in the function signature.
PubkeyHasherBuilder is behind the "rand" feature, and the vote package doesn't enable it.
What's your experience with enabling the rand feature? Would it be a broader change and require more testing to ensure no regression?
How about doing it as a separate optimization in a follow-up PR?
Sounds good, we can do it separately.
Regarding adding the rand feature: in PR #7307 I added the rand feature to solana-pubkey in solana-accounts-db. The only randomness that comes with it is the randomization of the 8-byte subslice of the pubkey used as the hash map key:
https://github.com/anza-xyz/solana-sdk/blob/a7e12b1d4af8fba2a43d447af31aebdc3dbe8a1d/address/src/lib.rs#L13-L14
https://github.com/anza-xyz/solana-sdk/blob/a7e12b1d4af8fba2a43d447af31aebdc3dbe8a1d/address/src/hasher.rs#L57-L81
No other module is pulled in by this feature, so I don't think there should be any unexpected impact on the validator.
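The custom-hasher idea from the review thread can be sketched with the standard library alone. Note that `ByteSliceHasherBuilder` below is a hypothetical stand-in for the real `PubkeyHasherBuilder`: it hashes only an 8-byte subslice of the 32-byte key (without the randomized offset the real builder adds), relying on pubkeys already being uniformly distributed.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

// Hypothetical stand-in for PubkeyHasherBuilder (not the real solana-pubkey
// type): uses 8 bytes of the already-random 32-byte key as the hash.
#[derive(Default, Clone)]
struct ByteSliceHasherBuilder;

struct ByteSliceHasher(u64);

impl Hasher for ByteSliceHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Keep at most the first 8 bytes of the last write (the key data).
        let mut buf = [0u8; 8];
        let n = bytes.len().min(8);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.0 = u64::from_le_bytes(buf);
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

impl BuildHasher for ByteSliceHasherBuilder {
    type Hasher = ByteSliceHasher;
    fn build_hasher(&self) -> ByteSliceHasher {
        ByteSliceHasher(0)
    }
}

// Build a map keyed by 32-byte "pubkeys" using the cheap hasher, mirroring
// HashMap::with_capacity_and_hasher from the suggestion above.
fn demo_map() -> HashMap<[u8; 32], u64, ByteSliceHasherBuilder> {
    let mut map =
        HashMap::with_capacity_and_hasher(4, ByteSliceHasherBuilder::default());
    map.insert([1u8; 32], 100);
    map.insert([2u8; 32], 200);
    map
}

fn main() {
    let map = demo_map();
    assert_eq!(map[&[1u8; 32]], 100);
    assert_eq!(map.len(), 2);
}
```

As noted in the thread, the third `HashMap` generic parameter must then appear in the function's return type as well.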
Replace the itertools grouping_map().aggregate() pattern with direct HashMap construction using entry().and_modify().or_insert() for better performance and reduced allocations.

Performance improvement:
- Before: 27,206 ns/iter
- After: 12,586 ns/iter
- Speedup: 2.16x (53.7% faster)

The benchmark simulates a realistic scenario with 100 validator nodes, each having 3-5 vote accounts, measuring stake aggregation by node pubkey.

Changes:
- Remove unused itertools::Itertools import
- Pre-allocate HashMap with estimated capacity
- Use direct entry API for stake aggregation
- Add benchmark for staked_nodes computation
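The entry-API aggregation described in the commit message can be sketched as below. The `Pubkey` alias, the flat `(node_pubkey, stake)` slice, and the function signature are illustrative simplifications, not the actual `VoteAccounts` internals.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the real 32-byte pubkey type.
type Pubkey = [u8; 32];

// Aggregate stake by node pubkey in a single pass with the entry API,
// instead of itertools' into_grouping_map().aggregate().
fn staked_nodes(vote_accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    // Pre-allocate with an estimated capacity to reduce reallocations.
    let mut staked_nodes =
        HashMap::with_capacity(vote_accounts.len().saturating_div(2));
    for &(node_pubkey, stake) in vote_accounts {
        if stake != 0 {
            staked_nodes
                .entry(node_pubkey)
                .and_modify(|s| *s += stake)
                .or_insert(stake);
        }
    }
    staked_nodes
}

fn main() {
    let a = [1u8; 32];
    let b = [2u8; 32];
    // Two vote accounts on node a, one zero-stake and one staked on node b.
    let nodes = staked_nodes(&[(a, 10), (a, 5), (b, 0), (b, 7)]);
    assert_eq!(nodes[&a], 15);
    assert_eq!(nodes[&b], 7);
}
```

The single pass avoids the intermediate per-group state that a group-then-aggregate pattern builds up.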
Force-pushed from 89499d9 to 71e573f (compare)
Replace saturating_div(2) with exact count of non-zero stake accounts for optimal memory allocation.

Based on mainnet data showing ~14% of vote accounts have stake:
- Old: capacity = vote_accounts.len() / 2 (~3350 for 6700 accounts)
- New: capacity = non_zero_count (~970 for 6700 accounts)

Trade-offs:
- Pro: Exact capacity, zero reallocation, saves ~2380 pre-allocated entries
- Pro: Better memory efficiency (86% of vote accounts have zero stake)
- Con: Adds ~800ns overhead from counting iteration (~6% slower)

Benchmark results:
- Before: 12,516 ns/iter
- After: 13,353 ns/iter
- Trade-off: Slightly slower but uses exact memory
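The pre-scan-for-exact-capacity variant from this commit can be sketched as follows; again, `Pubkey` and the flat input slice are illustrative, not the real `VoteAccounts` layout.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the real 32-byte pubkey type.
type Pubkey = [u8; 32];

fn staked_nodes_exact(vote_accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    // Pre-scan: count non-zero-stake accounts. On mainnet only ~14% of vote
    // accounts carry stake, so this bound is far tighter than len() / 2.
    let non_zero = vote_accounts.iter().filter(|(_, stake)| *stake != 0).count();
    // Exact upper bound on the number of entries: zero reallocation.
    let mut staked_nodes = HashMap::with_capacity(non_zero);
    for &(node_pubkey, stake) in vote_accounts {
        if stake != 0 {
            *staked_nodes.entry(node_pubkey).or_insert(0) += stake;
        }
    }
    staked_nodes
}

fn main() {
    // 100 synthetic accounts; every 7th one carries stake.
    let accounts: Vec<(Pubkey, u64)> = (0u8..100)
        .map(|i| ([i; 32], if i % 7 == 0 { 1 } else { 0 }))
        .collect();
    let nodes = staked_nodes_exact(&accounts);
    // 0, 7, 14, ..., 98 → 15 staked nodes
    assert_eq!(nodes.len(), 15);
}
```

The count is an upper bound on the map's final size (several staked vote accounts can share a node), so the allocation never needs to grow.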
I updated the benchmark with realistic mainnet data (6700 vote accounts, 970 with stakes). Benchmark result:
The pre-scan is 10% slower than the div(2) estimate, but it prevents us from wasting memory as the number of zero-stake vote accounts grows.
Force-pushed from 6e9e91b to 8c566a0 (compare)
@brooksprumo I updated the benchmark, which dismissed your approval. Can you re-approve it? No actual prod code change.
I'm not sure the new benchmarks should be added. They compare different implementations, which is valuable for this PR and for choosing one, but beyond this PR I don't see the value. I would think we only want benches in the repo for our current code. We could include the comparison benchmarks as text/source in this PR, which would make them useful if we want to revisit in the future.
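In the spirit of that suggestion, a self-contained comparison of the two implementations could live in the PR text itself. The sketch below is illustrative only: `old_impl` mimics a group-then-aggregate pattern (like itertools' `into_grouping_map().aggregate()`, but without the dependency), `new_impl` is the single-pass entry-API version, and the synthetic data is not real mainnet input.

```rust
use std::collections::HashMap;
use std::time::Instant;

type Pubkey = [u8; 32];

// Old style: build per-key groups first, then aggregate them.
// The per-key Vec is the kind of intermediate allocation the PR removes.
fn old_impl(accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    let mut groups: HashMap<Pubkey, Vec<u64>> = HashMap::new();
    for &(k, v) in accounts {
        groups.entry(k).or_default().push(v);
    }
    groups
        .into_iter()
        .filter_map(|(k, vs)| {
            let sum: u64 = vs.iter().sum();
            (sum != 0).then_some((k, sum))
        })
        .collect()
}

// New style: single pass with the entry API, skipping zero stakes.
fn new_impl(accounts: &[(Pubkey, u64)]) -> HashMap<Pubkey, u64> {
    let mut out = HashMap::with_capacity(accounts.len() / 2);
    for &(k, v) in accounts {
        if v != 0 {
            *out.entry(k).or_insert(0) += v;
        }
    }
    out
}

fn main() {
    // Roughly mainnet-shaped synthetic data: 6700 accounts, sparse stakes.
    let accounts: Vec<(Pubkey, u64)> = (0u32..6700)
        .map(|i| {
            let mut k = [0u8; 32];
            k[..4].copy_from_slice(&i.to_le_bytes());
            (k, if i % 7 == 0 { i as u64 } else { 0 })
        })
        .collect();

    let t = Instant::now();
    let old = old_impl(&accounts);
    let t_old = t.elapsed();
    let t = Instant::now();
    let new = new_impl(&accounts);
    let t_new = t.elapsed();

    // Both implementations must agree before timings mean anything.
    assert_eq!(old, new);
    println!("old: {t_old:?}, new: {t_new:?}");
}
```

A one-shot `Instant` timing is much noisier than the Criterion-style ns/iter numbers quoted in the PR, but it is enough to preserve the comparison in text form.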
Force-pushed from 8c566a0 to 29ecdc7 (compare)
OK, reverted the bench commit. The benchmark results are already in the PR comments; if we want to revisit, we can look at the PR.
brooksprumo
left a comment
If we need to make future changes, I think we should rename the non_zero_ stuff to staked_.
* Optimize staked_nodes() by replacing itertools with manual HashMap

Replace the itertools grouping_map().aggregate() pattern with direct HashMap construction using entry().and_modify().or_insert() for better performance and reduced allocations.

Performance improvement:
- Before: 27,206 ns/iter
- After: 12,586 ns/iter
- Speedup: 2.16x (53.7% faster)

The benchmark simulates a realistic scenario with 100 validator nodes, each having 3-5 vote accounts, measuring stake aggregation by node pubkey.

Changes:
- Remove unused itertools::Itertools import
- Pre-allocate HashMap with estimated capacity
- Use direct entry API for stake aggregation
- Add benchmark for staked_nodes computation

* pr feedback

* Simplify stake aggregation logic per PR feedback

* Use exact non-zero count for HashMap capacity

Replace saturating_div(2) with exact count of non-zero stake accounts for optimal memory allocation.

Based on mainnet data showing ~14% of vote accounts have stake:
- Old: capacity = vote_accounts.len() / 2 (~3350 for 6700 accounts)
- New: capacity = non_zero_count (~970 for 6700 accounts)

Trade-offs:
- Pro: Exact capacity, zero reallocation, saves ~2380 pre-allocated entries
- Pro: Better memory efficiency (86% of vote accounts have zero stake)
- Con: Adds ~800ns overhead from counting iteration (~6% slower)

Benchmark results:
- Before: 12,516 ns/iter
- After: 13,353 ns/iter
- Trade-off: Slightly slower but uses exact memory
Problem
The staked_nodes() method in VoteAccounts was using itertools' into_grouping_map().aggregate() pattern, which creates intermediate allocations and adds unnecessary overhead for a hot path in validator operations.

Summary of Changes

Replace the itertools grouping_map implementation with direct HashMap construction for better performance:
- Remove the unused itertools::Itertools import
- Aggregate stake with entry().and_modify().or_insert()

Performance Impact
Benchmark with realistic scenario (400 validator nodes):
Running on mainnet shows a 3-4x speedup (see log).