runtime: Avoid locking during stake vote rewards calculation by vadorovsky · Pull Request #7742 · anza-xyz/agave

vadorovsky · 2025-08-27T10:37:44Z

Problem

calculate_stake_vote_rewards stored the accumulated rewards per vote account in a DashMap, which was then accessed from a parallel iterator over all stake delegations.

There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one stake delegation at a time and tries to acquire the lock on the DashMap shard corresponding to that delegation's validator. Because the number of validators is disproportionately small and they have thousands of delegations, this approach results in high contention, with some threads spending most of their time waiting for a lock.

The time spent on these calculations was ~208.47ms:

redeem_rewards_us=208475i

Threads spent 65% of their time waiting for locks.

Summary of Changes

Fix that by:

Removing the DashMap and instead using fold and reduce operations to build a regular HashMap.
Pre-allocating the stake_rewards vector and passing &mut [MaybeUninit<PartitionedStakeReward>] to the thread pool.
Pulling in the optimization of StakeHistory::get from solana-stake-interface (solana-program/stake#81, "interface: Optimize the StakeHistory::get function").
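For readers who want to see the fold-and-reduce idea in isolation, here is a minimal sketch. It is not the code from this PR: it uses std::thread::scope from the standard library in place of a parallel iterator's fold and reduce, and the Delegation alias and accumulate_rewards function are hypothetical names made up for the illustration. Each worker folds its chunk into a private HashMap, so no lock is taken while accumulating, and the per-worker maps are merged at the end.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a stake delegation: (vote account id, reward).
type Delegation = (u32, u64);

/// Sums rewards per vote account without any shared lock.
/// Each worker folds its chunk into a private map; the maps are then reduced into one.
fn accumulate_rewards(delegations: &[Delegation], num_threads: usize) -> HashMap<u32, u64> {
    let workers = num_threads.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` panics.
    let chunk_size = ((delegations.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // "fold": one private HashMap per chunk, no synchronization needed.
        let handles: Vec<_> = delegations
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    let mut local: HashMap<u32, u64> = HashMap::new();
                    for &(vote_account, reward) in chunk {
                        *local.entry(vote_account).or_insert(0) += reward;
                    }
                    local
                })
            })
            .collect();

        // "reduce": merge the per-worker maps on the calling thread.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .fold(HashMap::new(), |mut merged, local| {
                for (vote_account, reward) in local {
                    *merged.entry(vote_account).or_insert(0) += reward;
                }
                merged
            })
    })
}
```

The merge step touches only as many entries as there are distinct vote accounts per worker, which is small compared to the number of delegations, so the sequential reduce is cheap relative to the parallel fold.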

The time spent on the calculation decreased to ~49ms:

redeem_rewards_us=48781i

Threads now spend most of their time doing actual calculations.

Fixes #6899

vadorovsky · 2025-08-27T10:39:32Z

+        // SAFETY: We initialized all the `stake_rewards` elements up to the capacity.
+        unsafe {
+            stake_rewards.set_len(stake_rewards.capacity());
+            stake_rewards.set_len_some(len_stake_rewards_some);


This addresses #6900 (review)
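As a rough illustration of the pre-allocation pattern (not the PR's actual code), the sketch below reserves the vector up front, hands disjoint &mut [MaybeUninit<u64>] chunks to worker threads, and only then sets the length. The parallel_fill function and its compute parameter are hypothetical, std::thread::scope stands in for the thread pool, and the length is set to the requested element count rather than to the capacity, since an allocator is allowed to return more capacity than requested.

```rust
use std::mem::MaybeUninit;
use std::thread;

/// Fills a pre-allocated vector in parallel by handing disjoint
/// `&mut [MaybeUninit<u64>]` chunks to worker threads.
/// `compute` is a hypothetical per-index calculation.
fn parallel_fill(len: usize, num_threads: usize, compute: fn(usize) -> u64) -> Vec<u64> {
    let mut out: Vec<u64> = Vec::with_capacity(len);
    let workers = num_threads.max(1);
    // Ceiling division, clamped to 1 because `chunks_mut(0)` panics.
    let chunk_size = ((len + workers - 1) / workers).max(1);
    {
        // Take only the first `len` spare slots; capacity may exceed `len`.
        let slots: &mut [MaybeUninit<u64>] = &mut out.spare_capacity_mut()[..len];
        thread::scope(|scope| {
            for (chunk_index, chunk) in slots.chunks_mut(chunk_size).enumerate() {
                scope.spawn(move || {
                    for (offset, slot) in chunk.iter_mut().enumerate() {
                        // Each slot is written exactly once by exactly one worker.
                        slot.write(compute(chunk_index * chunk_size + offset));
                    }
                });
            }
        });
    }
    // SAFETY: all of the first `len` slots were initialized above, and
    // `thread::scope` joined every worker before returning.
    unsafe { out.set_len(len) };
    out
}
```

Because the chunks are disjoint mutable slices, workers never need a lock to write their results, and no element is default-initialized only to be overwritten.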

vadorovsky · 2025-08-27T10:40:57Z

+    }
+
+    /// Number of `Some` elements.
+    pub(crate) fn len_some(&self) -> usize {


This method addresses #6900 (review)

I deliberately didn't implement Deref and DerefMut. This way, the len method from the inner Vec is not available, so consumers of PartitionedStakeRewards are forced to use len_some.
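To make the design choice concrete, here is a small sketch of a wrapper that hides Vec::len by not implementing Deref. OptionalRewards, its fields, and iter_some are hypothetical names for the illustration and are not the PartitionedStakeRewards type from this PR; only the len_some name is taken from the diff above.

```rust
/// Hypothetical wrapper around a vector of optional rewards.
/// It deliberately does not implement `Deref`, so `Vec::len` is not reachable
/// and callers have to ask for `len_some` explicitly.
pub struct OptionalRewards<T> {
    rewards: Vec<Option<T>>,
    num_some: usize,
}

impl<T> OptionalRewards<T> {
    /// Wraps the vector and counts the `Some` elements once, up front.
    pub fn new(rewards: Vec<Option<T>>) -> Self {
        let num_some = rewards.iter().filter(|r| r.is_some()).count();
        Self { rewards, num_some }
    }

    /// Number of `Some` elements.
    pub fn len_some(&self) -> usize {
        self.num_some
    }

    /// Iterates over the `Some` elements only.
    pub fn iter_some(&self) -> impl Iterator<Item = &T> {
        self.rewards.iter().flatten()
    }
}
```

With this shape, writing rewards.len() on the wrapper fails to compile, so a caller cannot accidentally count the None slots.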

vadorovsky · 2025-08-27T10:41:32Z

+    /// * there is no payout or if any deserved payout is < 1 lamport
+    /// * corresponding vote account was not found in cache and accounts-db
+    #[test]
+    fn test_get_reward_distribution_num_blocks_none() {


A test to make sure we don't break get_reward_distribution_num_blocks

vadorovsky · 2025-08-27T10:43:32Z

-            .0
-            .bank;
+        // Delegations with sufficient stake to get rewards (2 SOL).
+        let delegations_with_rewards = 100;


I was tempted to put 1_000_000 to match the numbers we're seeing on mainnet, but unfortunately, even 1_000 causes create_reward_bank_with_specific_stakes to execute for over a minute... I will try to fix that and improve the test in a separate PR.

codecov-commenter · 2025-08-27T11:47:39Z

Codecov Report

❌ Patch coverage is 98.10811% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.0%. Comparing base (680bb32) to head (53c3121).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #7742    +/-   ##
========================================
  Coverage    83.0%    83.0%            
========================================
  Files         815      815            
  Lines      357726   357966   +240     
========================================
+ Hits       297171   297412   +241     
+ Misses      60555    60554     -1


vadorovsky · 2025-09-05T14:59:26Z

@jstarry all your comments are addressed, PTAL

jstarry · 2025-09-05T15:51:54Z

 pub fn redeem_rewards(
    rewarded_epoch: Epoch,
-    stake_state: &mut StakeStateV2,
+    stake_account: &StakeAccount<Delegation>,


nit: looks like we only need to pass stake_state as before, not the full stake_account

If we pass stake_state like before, then we are going to lose the win that you commented on below. 😅 We would need to call stake_account.stake_state().unwrap() in the caller (redeem_delegation_rewards in calculation.rs). I prefer matching on the whole stake account here and returning a copy of the stake.

jstarry · 2025-09-05T16:28:57Z

Looks good! Just a bunch of small things that you can take or leave.

vadorovsky · 2025-09-11T07:22:21Z

@jstarry All comments should be addressed now. I disagree with only one of them, #7742 (comment); all the others are fixed in the way you proposed.

jstarry

Looks very solid. Sorry have a few other suggestions still but the PR looks correct and ready to go otherwise


mergify · 2025-09-11T13:35:36Z

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

(cherry picked from commit 8aa41ea) # Conflicts: # runtime/src/bank/partitioned_epoch_rewards/calculation.rs

…ackport of #7742) (#8012) (cherry picked from commit 8aa41ea) Co-authored-by: Michal R <vad.sol@proton.me>


vadorovsky requested review from alessandrod and jstarry August 27, 2025 12:09

vadorovsky marked this pull request as ready for review August 27, 2025 12:10

vadorovsky force-pushed the epoch-threadpool-v2 branch from 0f44e4b to 9915bc5 Compare August 27, 2025 12:16

jstarry reviewed Sep 3, 2025

vadorovsky force-pushed the epoch-threadpool-v2 branch 3 times, most recently from c49991e to 3d61dfb Compare September 5, 2025 12:49

jstarry reviewed Sep 5, 2025

vadorovsky force-pushed the epoch-threadpool-v2 branch from 3e3311a to 8ef764d Compare September 8, 2025 07:36

jstarry reviewed Sep 11, 2025

Comment thread runtime/src/inflation_rewards/mod.rs Outdated

Comment thread runtime/src/bank/partitioned_epoch_rewards/mod.rs Outdated

Comment thread runtime/src/bank/partitioned_epoch_rewards/calculation.rs

jstarry previously approved these changes Sep 11, 2025

vadorovsky dismissed jstarry’s stale review via 53c3121 September 11, 2025 11:40

vadorovsky force-pushed the epoch-threadpool-v2 branch from f7c3158 to 53c3121 Compare September 11, 2025 11:40

jstarry approved these changes Sep 11, 2025

vadorovsky merged commit 8aa41ea into anza-xyz:master Sep 11, 2025
43 checks passed

vadorovsky added the v3.0 label Sep 11, 2025

mergify Bot mentioned this pull request Sep 11, 2025

v3.0: runtime: Avoid locking during stake vote rewards calculation (backport of #7742) #8012

Merged

vadorovsky mentioned this pull request Feb 23, 2026

runtime: bench stakes cache #10760

Merged
