runtime: Avoid locking during stake vote rewards calculation#7742
runtime: Avoid locking during stake vote rewards calculation#7742vadorovsky merged 1 commit intoanza-xyz:masterfrom
Conversation
| // SAFETY: We initialized all the `stake_rewards` elements up to the capacity. | ||
| unsafe { | ||
| stake_rewards.set_len(stake_rewards.capacity()); | ||
| stake_rewards.set_len_some(len_stake_rewards_some); |
| } | ||
|
|
||
| /// Number of `Some` elements. | ||
| pub(crate) fn len_some(&self) -> usize { |
There was a problem hiding this comment.
This method addresses #6900 (review)
I didn't implement Deref and DerefMut on purpose - this way, the len method from the inner Vec is not available, so consumers of PartitionedStakeRewards are forced to use len_some.
| /// * there is no payout or if any deserved payout is < 1 lamport | ||
| /// * corresponding vote account was not found in cache and accounts-db | ||
| #[test] | ||
| fn test_get_reward_distribution_num_blocks_none() { |
There was a problem hiding this comment.
A test to make sure we don't break get_reward_distribution_num_blocks
| .0 | ||
| .bank; | ||
| // Delegations with sufficient stake to get rewards (2 SOL). | ||
| let delegations_with_rewards = 100; |
There was a problem hiding this comment.
I was tempted to put 1_000_000 to match the numbers we're seeing on mainnet, but unfortunately, even 1_000 causes create_reward_bank_with_specific_stakes to execute for over a minute... I will try to fix that and improve the test in a separate PR.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7742 +/- ##
========================================
Coverage 83.0% 83.0%
========================================
Files 815 815
Lines 357726 357966 +240
========================================
+ Hits 297171 297412 +241
+ Misses 60555 60554 -1 🚀 New features to boost your workflow:
|
0f44e4b to
9915bc5
Compare
c49991e to
3d61dfb
Compare
|
@jstarry all your comments are addressed, PTAL |
| pub fn redeem_rewards( | ||
| rewarded_epoch: Epoch, | ||
| stake_state: &mut StakeStateV2, | ||
| stake_account: &StakeAccount<Delegation>, |
There was a problem hiding this comment.
nit: looks like we only need to pass stake_state as before, not the full stake_account
There was a problem hiding this comment.
If we pass stake_state like before, then we are going to lose the win that you commented below. 😅 We would need to do stake_account.stake_state().unwrap() in the caller (calculation.rs redeem_delegation_rewards). I prefer matching on the whole stake account here and returning a copy of stake here.
|
Looks good! Just a bunch of small things that you can take or leave. |
3e3311a to
8ef764d
Compare
|
@jstarry All comments should be addressed now. I disagree with only one of them #7742 (comment), all others are fixed in the way you proposed. |
jstarry
left a comment
There was a problem hiding this comment.
Looks very solid. Sorry have a few other suggestions still but the PR looks correct and ready to go otherwise
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which then was used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock. The time spent on these calculations was ~208.47ms: ``` redeem_rewards_us=208475i ``` Fix that by: * Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`. * Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool. * Pulling the optimization of `StakeHistory::get` in `solana-stake-interface`. solana-program/stake#81 ``` redeem_rewards_us=48781i ```
f7c3158 to
53c3121
Compare
|
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which then was used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock. The time spent on these calculations was ~208.47ms: ``` redeem_rewards_us=208475i ``` Fix that by: * Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`. * Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool. * Pulling the optimization of `StakeHistory::get` in `solana-stake-interface`. solana-program/stake#81 ``` redeem_rewards_us=48781i ``` (cherry picked from commit 8aa41ea) # Conflicts: # runtime/src/bank/partitioned_epoch_rewards/calculation.rs
…z#7742) `calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which then was used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock. The time spent on these calculations was ~208.47ms: ``` redeem_rewards_us=208475i ``` Fix that by: * Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`. * Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool. * Pulling the optimization of `StakeHistory::get` in `solana-stake-interface`. solana-program/stake#81 ``` redeem_rewards_us=48781i ``` (cherry picked from commit 8aa41ea)
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which then was used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock. The time spent on these calculations was ~208.47ms: ``` redeem_rewards_us=208475i ``` Fix that by: * Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`. * Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool. * Pulling the optimization of `StakeHistory::get` in `solana-stake-interface`. solana-program/stake#81 ``` redeem_rewards_us=48781i ``` (cherry picked from commit 8aa41ea)
…z#7742) `calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which then was used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock. The time spent on these calculations was ~208.47ms: ``` redeem_rewards_us=208475i ``` Fix that by: * Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`. * Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool. * Pulling the optimization of `StakeHistory::get` in `solana-stake-interface`. solana-program/stake#81 ``` redeem_rewards_us=48781i ``` (cherry picked from commit 8aa41ea) Conflicts: runtime/src/bank/partitioned_epoch_rewards/calculation.rs
…ackport of #7742) (#8012) `calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which then was used in a parallel iterator over all stake delegations. There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock. The time spent on these calculations was ~208.47ms: ``` redeem_rewards_us=208475i ``` Fix that by: * Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`. * Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool. * Pulling the optimization of `StakeHistory::get` in `solana-stake-interface`. solana-program/stake#81 ``` redeem_rewards_us=48781i ``` (cherry picked from commit 8aa41ea) Co-authored-by: Michal R <vad.sol@proton.me>
Problem
calculate_stake_vote_rewardswas storing accumulated rewards per vote account in aDashMap, which then was used in a parallel iterator over all stake delegations.There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on a
DashMapshard corresponding to a validator. Given that the number of validators is disproportionally small and they have thousands of delegations, such solution results in high contention, with some threads spending the most of their time on waiting for lock.The time spent on these calculations was ~208.47ms:
Threads spent 65% of their time on waiting for locks:
Summary of Changes
Fix that by:
DashMapand instead usingfoldandreduceoperations to build a regularHashMap.stake_rewardsvector and passing&mut [MaybeUninit<PartitionedStakeReward>]to the thread pool.StakeHistory::getinsolana-stake-interface. interface: Optimize theStakeHistory::getfunction solana-program/stake#81The time spent on the calculation decreased to ~49ms:
Threads spend the most of time doing actual calculations:
Fixes #6899