runtime: Avoid locking during stake vote rewards calculation #6900

vadorovsky merged 5 commits into anza-xyz:master
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master    #6900    +/-   ##
=========================================
  Coverage    83.4%    83.4%
=========================================
  Files         813      813
  Lines      366220   366291     +71
=========================================
+ Hits       305691   305806    +115
+ Misses      60529    60485     -44
```
Pull Request Overview
This PR optimizes the stake vote rewards calculation by replacing the contention-heavy DashMap approach with a custom thread pool implementation that provides per-thread mutable states. This eliminates the need for locking during parallel reward calculations, improving performance from ~208ms to ~12ms.
Key Changes
- Introduces a custom scoped thread pool with per-thread worker states to avoid locking
- Replaces `DashMap` with thread-local `HashMap` collections for vote account rewards
- Removes `ThreadPool` dependency from reward calculation functions
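The "per-thread mutable state" idea from the overview can be sketched with standard library scoped threads: each worker owns a plain `HashMap` that no other thread can touch, and the partial maps are merged only after all workers have joined. This is an illustrative sketch, not the PR's actual implementation; the `Delegation` type and field names are stand-ins.

```rust
use std::collections::HashMap;
use std::thread;

// Stand-in for a stake delegation; `vote_account` models a vote account pubkey.
struct Delegation {
    vote_account: u64,
    reward: u64,
}

fn accumulate(delegations: &[Delegation], num_threads: usize) -> HashMap<u64, u64> {
    // Split the delegations into one contiguous chunk per worker.
    let chunk = ((delegations.len() + num_threads - 1) / num_threads).max(1);
    let partials: Vec<HashMap<u64, u64>> = thread::scope(|s| {
        delegations
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    // Per-thread mutable state: no locks, no sharing.
                    let mut local = HashMap::new();
                    for d in part {
                        *local.entry(d.vote_account).or_insert(0) += d.reward;
                    }
                    local
                })
            })
            .collect::<Vec<_>>()
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect()
    });
    // Single-threaded merge after the parallel phase.
    let mut total = HashMap::new();
    for partial in partials {
        for (k, v) in partial {
            *total.entry(k).or_insert(0) += v;
        }
    }
    total
}
```

The key property is that contention is impossible during the hot loop; the only cross-thread cost is the final merge over ~1,000 vote accounts, which is tiny compared to the 1,000,000+ delegations.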
Reviewed Changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| runtime/src/thread_pool.rs | New custom thread pool implementation with per-thread mutable states |
| runtime/src/lib.rs | Adds thread_pool module to public interface |
| runtime/src/bank/partitioned_epoch_rewards/calculation.rs | Refactors reward calculation to use custom thread pool and eliminates DashMap |
| runtime/src/bank.rs | Updates vote rewards type and calc_vote_accounts_to_store signature |
| runtime/src/bank/tests.rs | Updates tests to use HashMap instead of DashMap |
| runtime/Cargo.toml | Adds crossbeam-deque dependency |
| Cargo.toml | Adds crossbeam-deque to workspace dependencies |
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which was then used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on the `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionately small and each has thousands of delegations, such a solution results in high contention, with some threads spending most of their time waiting for locks.

The time spent on these calculations was ~208.47ms:

```
redeem_rewards_us=208475i
```

Fix that by:

* Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`.
* Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool.
* Pulling the optimization of `StakeHistory::get` into `solana-stake-interface`: solana-program/stake#81

The time spent goes down to ~48.78ms:

```
redeem_rewards_us=48781i
```
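The fold/reduce shape described in the first bullet can be illustrated single-threaded with plain iterators (rayon's parallel `fold`/`reduce` follows the same contract): each chunk is folded into a private `HashMap`, and `reduce` merges the per-chunk maps, so no shared, locked map is ever touched. This is a sketch; the `(vote_account, lamports)` tuples are illustrative, not the PR's real types.

```rust
use std::collections::HashMap;

fn fold_reduce(rewards: &[(u64, u64)], chunk_size: usize) -> HashMap<u64, u64> {
    rewards
        .chunks(chunk_size)
        // "fold": build one private map per chunk, with no synchronization.
        .map(|chunk| {
            let mut local = HashMap::new();
            for &(vote_account, lamports) in chunk {
                *local.entry(vote_account).or_insert(0) += lamports;
            }
            local
        })
        // "reduce": merge the private maps pairwise into the final result.
        .reduce(|mut a, b| {
            for (k, v) in b {
                *a.entry(k).or_insert(0) += v;
            }
            a
        })
        .unwrap_or_default()
}
```

Because addition is associative and commutative here, the merge order does not matter, which is exactly what makes the pattern safe to parallelize.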
@jstarry @HaoranYi It's ready for review.
I ended up doing exactly that. That said - aggregating
Which makes sense, there are over 1,000,000 delegations. That's why in 27519db, I ended up pre-allocating the `stake_rewards` vector.
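The pre-allocation trick mentioned above (and in the commit using `spare_capacity_mut`) can be sketched as: allocate the output vector once, expose its uninitialized tail as `&mut [MaybeUninit<T>]` for workers to write into, then mark the initialized prefix with `set_len`. This is an assumption-laden sketch, not the PR's code; `Reward` is a stand-in type, and a single loop stands in for the parallel writers.

```rust
use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct Reward(u64);

fn fill_rewards(count: usize) -> Vec<Reward> {
    let mut out: Vec<Reward> = Vec::with_capacity(count);
    // Uninitialized capacity, writable in place without re-allocation.
    let slots: &mut [MaybeUninit<Reward>] = out.spare_capacity_mut();
    // In the real code each thread would write a disjoint sub-slice of `slots`;
    // here one loop stands in for the parallel writers.
    for (i, slot) in slots.iter_mut().take(count).enumerate() {
        slot.write(Reward(i as u64));
    }
    // SAFETY: all `count` slots were initialized in the loop above.
    unsafe { out.set_len(count) };
    out
}
```

Splitting the `MaybeUninit` slice into disjoint chunks gives each worker an exclusive destination, so results land directly in their final positions with no per-item allocation or locking.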
```rust
    ) {
        let epoch_rewards_sysvar = self.get_epoch_rewards_sysvar();
        if epoch_rewards_sysvar.active {
            let thread_pool = ThreadPoolBuilder::new()
```
why do we want to lower the thread pool creation? tests should be able to run on a smaller pool. there might even be an idle pool given we're between both slots and epochs
Reward calculation is the only place down from `Bank::new_from_fields` which needs a thread pool. And we calculate rewards only if the `epoch_rewards_sysvar` is active.
In the previous place (runtime/src/bank.rs:1828) there was even a TODO:
// TODO: Only create the thread pool if we need to recalculate rewards,
// i.e. epoch_reward_status is active. Currently, this thread pool is
// always created and used for recalculate_partitioned_rewards and
// lt_hash calculation. Once lt_hash feature is active, lt_hash won't
// need the thread pool. Thereby, after lt_hash feature activation, we
// can change to create the thread pool only when we need to recalculate
// rewards.
By moving this thread pool here, we make sure that if someone starts a validator during the time this sysvar is inactive, that validator doesn't waste time on spawning threads, which don't end up being used.
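The "only spawn threads when the sysvar is active" idea above amounts to gating pool construction on a flag. A minimal sketch, with a hypothetical `RewardsPool` type and thread count standing in for the real pool:

```rust
// Hypothetical pool type; the real code uses rayon's ThreadPoolBuilder.
struct RewardsPool {
    num_threads: usize,
}

// Build a pool only when epoch rewards are being distributed, so a validator
// started while the sysvar is inactive never spawns unused threads.
fn maybe_build_pool(epoch_rewards_active: bool) -> Option<RewardsPool> {
    epoch_rewards_active.then(|| RewardsPool { num_threads: 8 })
}
```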
sure. my contention is more that we lose control over pool configuration the lower it's instantiated. buried, ad hoc pools like this are how we got a billion pools in the first place, and the inability to configure them was a major CI slowdown
maybe just punt the change to be addressed in its own PR so we don't hold up the rest of the wins here?
Fair, I moved the pool back.
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
(cherry picked from commit e752ae6)

# Conflicts:
#	Cargo.toml
#	programs/sbf/Cargo.toml
jstarry left a comment
Sorry but you'll have to revert this. Looks like the calculation in `get_reward_distribution_num_blocks` for `num_partitions` will be altered by this change because the length of `PartitionedStakeRewards` now includes items for stake accounts that do not have rewards because of the internal `Option`.
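The bug jstarry describes can be shown with a toy partition-count helper: if the rewards vector holds one entry per stake account (`Some` or `None`), its `len()` no longer equals the number of actual rewards, so any partition count derived from `len()` is inflated. The helper name and ceiling division are illustrative, not the exact runtime code.

```rust
// Hypothetical stand-in for get_reward_distribution_num_blocks:
// ceil(num_rewards / rewards_per_block).
fn num_partitions(num_rewards: usize, rewards_per_block: usize) -> usize {
    (num_rewards + rewards_per_block - 1) / rewards_per_block
}
```

With `vec![Some(1), None, None, Some(2)]` and 2 rewards per block, `len()` yields 2 partitions while the 2 real rewards need only 1, which is exactly the altered behavior the review flags.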
This reverts commit e752ae6 ("runtime: Avoid locking during stake vote rewards calculation (anza-xyz#6900)").


Problem
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which was then used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processed one of the stake delegations and tried to acquire a lock on the `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionately small and each has thousands of delegations, such a solution resulted in high contention, with some threads spending most of their time waiting for locks.

The time spent on these calculations was ~232.21ms. Threads spent 65% of their time waiting for locks.
Summary of Changes
Fix that by:

* Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`.
* Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool.
* Pulling in the optimization of `StakeHistory::get` in `solana-stake-interface`: Optimize the `StakeHistory::get` function solana-program/stake#81

The time spent on reward calculations goes down to ~48.78ms. Threads spend most of their time doing actual calculations.
Fixes #6899