v3.0: runtime: Avoid locking during stake vote rewards calculation (backport of #7742) #8012

vadorovsky merged 2 commits into v3.0 from
Conversation
Cherry-pick of 8aa41ea has failed. To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Force-pushed from c75ce81 to 9b2ce48
Codecov Report

❌ Patch coverage is

```
@@           Coverage Diff            @@
##             v3.0     #8012   +/-  ##
========================================
  Coverage    83.4%     83.4%
========================================
  Files         810       810
  Lines      365464    365704   +240
========================================
+ Hits       305114    305344   +230
- Misses      60350     60360    +10
```
Force-pushed from 9b2ce48 to b250989
@vadorovsky for backports we typically keep the backported commit untouched even if it has conflicts. The conflicts should be resolved in a follow-up commit so that backport reviewers can easily see what the conflicts were and how they were resolved.
OK, will do that next time.
Please do it for this one too. You can reset your branch back to c75ce81 and then add a commit for the conflict resolution changes. FYI, in order for CI to pass we will need to have this PR rebased after #8040 gets merged. So feel free to hold off on fixing up the branch until that's in |
#8040 is merged now!
@vadorovsky can you get this one fixed up?
I'm trying, but... rebasing the original backport commit c75ce81 onto the current v3.0 adds even more conflicts. What should I do? Squash the new conflicts into the original backport commit, then resolve all conflicts together in the next commit? The problem with that is that the diff after that rebase looks quite awkward, as the conflicts become nested:

```
diff --cc runtime/src/bank/partitioned_epoch_rewards/calculation.rs
index 64076afa07,a5a27551cd..0000000000
--- a/runtime/src/bank/partitioned_epoch_rewards/calculation.rs
+++ b/runtime/src/bank/partitioned_epoch_rewards/calculation.rs
@@@ -580,10 -688,9 +688,14 @@@ mod tests
stake_account::StakeAccount,
stakes::Stakes,
},
+ agave_feature_set::FeatureSet,
rayon::ThreadPoolBuilder,
solana_account::{accounts_equal, state_traits::StateMut, ReadableAccount},
++<<<<<<< HEAD
+ solana_genesis_config::GenesisConfig,
++=======
+ solana_accounts_db::partitioned_rewards::PartitionedEpochRewardsConfig,
++>>>>>>> c75ce81ed9 (runtime: Avoid locking during stake vote rewards calculation (#7742))
solana_native_token::LAMPORTS_PER_SOL,
solana_reward_info::RewardType,
solana_stake_interface::state::{Delegation, StakeStateV2},
@@@ -1131,6 -1258,8 +1263,11 @@@
bank.recalculate_partitioned_rewards(null_tracer(), &thread_pool);
assert_eq!(bank.epoch_reward_status, EpochRewardStatus::Inactive);
}
+ <<<<<<< HEAD
+ =======
++<<<<<<< HEAD
++=======
++>>>>>>> c75ce81ed9 (runtime: Avoid locking during stake vote rewards calculation (#7742))
#[test]
fn test_initialize_after_snapshot_restore() {
@@@ -1165,9 -1294,7 +1302,13 @@@
// Run post snapshot restore initialization which should first apply
// active features and then recalculate rewards
let thread_pool = ThreadPoolBuilder::new().num_threads(1).build().unwrap();
++<<<<<<< HEAD
+ bank.initialize_after_snapshot_restore(&GenesisConfig::default(), None, false, || {
+ &thread_pool
+ });
++=======
+ bank.initialize_after_snapshot_restore(|| &thread_pool);
++>>>>>>> c75ce81ed9 (runtime: Avoid locking during stake vote rewards calculation (#7742))
let EpochRewardStatus::Active(EpochRewardPhase::Distribution(distribution_status)) =
bank.epoch_reward_status.clone()
@@@ -1184,8 -1311,93 +1325,100 @@@
distribution_status.distribution_starting_block_height
);
assert_eq!(
++<<<<<<< HEAD
+ calculation_status.all_stake_rewards.len(),
+ expected_num_stake_rewards
+ );
+ }
++=======
+ calculation_status.all_stake_rewards.num_rewards(),
+ expected_num_stake_rewards
+ );
+ }
+
+ #[test]
+ fn test_reward_accumulator() {
+ let mut accumulator1 = RewardsAccumulator::default();
+ let mut accumulator2 = RewardsAccumulator::default();
+
+ let vote_pubkey_a = Pubkey::new_unique();
+ let vote_account_a =
+ vote_state::create_account(&vote_pubkey_a, &Pubkey::new_unique(), 20, 100);
+ let vote_pubkey_b = Pubkey::new_unique();
+ let vote_account_b =
+ vote_state::create_account(&vote_pubkey_b, &Pubkey::new_unique(), 20, 100);
+ let vote_pubkey_c = Pubkey::new_unique();
+ let vote_account_c =
+ vote_state::create_account(&vote_pubkey_c, &Pubkey::new_unique(), 20, 100);
+
+ accumulator1.add_reward(
+ vote_pubkey_a,
+ VoteReward {
+ vote_account: vote_account_a.clone(),
+ commission: 10,
+ vote_rewards: 50,
+ },
+ 50,
+ );
+ accumulator1.add_reward(
+ vote_pubkey_b,
+ VoteReward {
+ vote_account: vote_account_b.clone(),
+ commission: 10,
+ vote_rewards: 50,
+ },
+ 50,
+ );
+ accumulator2.add_reward(
+ vote_pubkey_b,
+ VoteReward {
+ vote_account: vote_account_b,
+ commission: 10,
+ vote_rewards: 30,
+ },
+ 30,
+ );
+ accumulator2.add_reward(
+ vote_pubkey_c,
+ VoteReward {
+ vote_account: vote_account_c,
+ commission: 10,
+ vote_rewards: 50,
+ },
+ 50,
+ );
+
+ assert_eq!(accumulator1.num_stake_rewards, 2);
+ assert_eq!(accumulator1.total_stake_rewards_lamports, 100);
+ let vote_reward_a_1 = accumulator1.vote_rewards.get(&vote_pubkey_a).unwrap();
+ assert_eq!(vote_reward_a_1.commission, 10);
+ assert_eq!(vote_reward_a_1.vote_rewards, 50);
+ let vote_reward_b_1 = accumulator1.vote_rewards.get(&vote_pubkey_b).unwrap();
+ assert_eq!(vote_reward_b_1.commission, 10);
+ assert_eq!(vote_reward_b_1.vote_rewards, 50);
+
+ let vote_reward_b_2 = accumulator2.vote_rewards.get(&vote_pubkey_b).unwrap();
+ assert_eq!(vote_reward_b_2.commission, 10);
+ assert_eq!(vote_reward_b_2.vote_rewards, 30);
+ let vote_reward_c_2 = accumulator2.vote_rewards.get(&vote_pubkey_c).unwrap();
+ assert_eq!(vote_reward_c_2.commission, 10);
+ assert_eq!(vote_reward_c_2.vote_rewards, 50);
+
+ let accumulator = accumulator1 + accumulator2;
+
+ assert_eq!(accumulator.num_stake_rewards, 4);
+ assert_eq!(accumulator.total_stake_rewards_lamports, 180);
+ let vote_reward_a = accumulator.vote_rewards.get(&vote_pubkey_a).unwrap();
+ assert_eq!(vote_reward_a.commission, 10);
+ assert_eq!(vote_reward_a.vote_rewards, 50);
+ let vote_reward_b = accumulator.vote_rewards.get(&vote_pubkey_b).unwrap();
+ assert_eq!(vote_reward_b.commission, 10);
+ // sum of the vote rewards from both accumulators
+ assert_eq!(vote_reward_b.vote_rewards, 80);
+ let vote_reward_c = accumulator.vote_rewards.get(&vote_pubkey_c).unwrap();
+ assert_eq!(vote_reward_c.commission, 10);
+ assert_eq!(vote_reward_c.vote_rewards, 50);
+ }
+ >>>>>>> 8aa41ea86 (runtime: Avoid locking during stake vote rewards calculation (#7742))
++>>>>>>> c75ce81ed9 (runtime: Avoid locking during stake vote rewards calculation (#7742))
}
```

Another option (which makes more sense to me) would be starting completely from scratch: cherry-picking the commit from master, committing it with conflicts, then resolving them in the next commit. Would that work for you?
Yeah, it's the same thing Mergify does. Just be sure to force-push the result back up to this branch.
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which was then used in a parallel iterator over all stake delegations.

There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on the `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionately small and each has thousands of delegations, such a solution results in high contention, with some threads spending most of their time waiting for locks. The time spent on these calculations was ~208.47ms:

```
redeem_rewards_us=208475i
```

Fix that by:

* Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`.
* Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool.
* Pulling in the `StakeHistory::get` optimization from `solana-stake-interface` (solana-program/stake#81).

The time spent decreased to ~49ms:

```
redeem_rewards_us=48781i
```

(cherry picked from commit 8aa41ea)

Conflicts:
	runtime/src/bank/partitioned_epoch_rewards/calculation.rs
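The fold/reduce approach can be sketched in isolation. Below is a minimal, hypothetical illustration of the pattern: `VotePubkey`, `Delegation`, and the toy reward formula are simplified stand-ins invented for this sketch, not the actual Agave types; only the shape of the rayon pipeline (private per-thread `HashMap`s merged at the end, no locks) mirrors the change.

```
use rayon::prelude::*;
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct VotePubkey(u64);

struct Delegation {
    vote_pubkey: VotePubkey,
    stake: u64,
}

// Accumulates per-vote-account rewards without any shared locking:
// each rayon worker folds into its own private HashMap, and the
// per-thread maps are then merged pairwise in `reduce`.
fn accumulate_vote_rewards(delegations: &[Delegation]) -> HashMap<VotePubkey, u64> {
    delegations
        .par_iter()
        .fold(HashMap::new, |mut acc: HashMap<VotePubkey, u64>, d| {
            // Toy reward formula, purely for illustration.
            *acc.entry(d.vote_pubkey).or_insert(0) += d.stake / 100;
            acc
        })
        .reduce(HashMap::new, |mut a, b| {
            for (pubkey, reward) in b {
                *a.entry(pubkey).or_insert(0) += reward;
            }
            a
        })
}

fn main() {
    let delegations = vec![
        Delegation { vote_pubkey: VotePubkey(1), stake: 1_000 },
        Delegation { vote_pubkey: VotePubkey(1), stake: 2_000 },
        Delegation { vote_pubkey: VotePubkey(2), stake: 5_000 },
    ];
    let rewards = accumulate_vote_rewards(&delegations);
    assert_eq!(rewards[&VotePubkey(1)], 30);
    assert_eq!(rewards[&VotePubkey(2)], 50);
}
```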
Force-pushed from b250989 to 7a5206f
done
```
impl Add for RewardsAccumulator {
    type Output = Self;

    fn add(self, rhs: Self) -> Self::Output {
```

nit: prefer descriptive method names to overriding arithmetic operators with unintuitive behavior. Doesn't need to be fixed here, but a master change would be much appreciated.
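For illustration, here is one way the suggestion could look: a descriptively named `merge` method instead of an `Add` impl. The fields below are simplified stand-ins; the real `RewardsAccumulator` internals are not reproduced here.

```
use std::collections::HashMap;

struct RewardsAccumulator {
    num_stake_rewards: u64,
    total_stake_rewards_lamports: u64,
    // vote pubkey -> accumulated lamports (simplified stand-in fields)
    vote_rewards: HashMap<u64, u64>,
}

impl RewardsAccumulator {
    // Consumes `other` and folds its totals into `self`; the name states
    // the merge semantics explicitly instead of overloading `+`.
    fn merge(mut self, other: Self) -> Self {
        self.num_stake_rewards += other.num_stake_rewards;
        self.total_stake_rewards_lamports += other.total_stake_rewards_lamports;
        for (pubkey, lamports) in other.vote_rewards {
            *self.vote_rewards.entry(pubkey).or_insert(0) += lamports;
        }
        self
    }
}

fn main() {
    let a = RewardsAccumulator {
        num_stake_rewards: 2,
        total_stake_rewards_lamports: 100,
        vote_rewards: HashMap::from([(1, 50), (2, 50)]),
    };
    let b = RewardsAccumulator {
        num_stake_rewards: 1,
        total_stake_rewards_lamports: 30,
        vote_rewards: HashMap::from([(2, 30)]),
    };
    let merged = a.merge(b);
    assert_eq!(merged.num_stake_rewards, 3);
    assert_eq!(merged.total_stake_rewards_lamports, 130);
    assert_eq!(merged.vote_rewards[&2], 80);
}
```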
```
self.rewards
    .iter()
    .enumerate()
    .filter_map(|(index, reward)| Some((index, reward.as_ref()?)))
```

nit: I think this would have been more intuitive (don't change here):

```
-    .filter_map(|(index, reward)| Some((index, reward.as_ref()?)))
+    .filter_map(|(index, reward)| reward.map(|reward| (index, reward)))
```
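A small self-contained demo of the two equivalent forms, using a `Copy` element type so the suggested variant compiles as written (for non-`Copy` elements it would also need an `as_ref()`); this is purely illustrative, not the PR's actual code:

```
fn main() {
    let rewards: Vec<Option<u64>> = vec![Some(10), None, Some(30)];

    // Original form: `?` short-circuits the closure on `None`.
    let a: Vec<(usize, u64)> = rewards
        .iter()
        .enumerate()
        .filter_map(|(index, reward)| Some((index, *reward.as_ref()?)))
        .collect();

    // Suggested form: `Option::map` pairs the index with present values.
    let b: Vec<(usize, u64)> = rewards
        .iter()
        .enumerate()
        .filter_map(|(index, reward)| reward.map(|reward| (index, reward)))
        .collect();

    assert_eq!(a, b);
    assert_eq!(a, vec![(0, 10), (2, 30)]);
}
```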
Problem
`calculate_stake_vote_rewards` was storing accumulated rewards per vote account in a `DashMap`, which was then used in a parallel iterator over all stake delegations.

There are over 1,000,000 stake delegations and around 1,000 validators. Each thread processes one of the stake delegations and tries to acquire the lock on the `DashMap` shard corresponding to a validator. Given that the number of validators is disproportionately small and each has thousands of delegations, such a solution results in high contention, with some threads spending most of their time waiting for locks.

The time spent on these calculations was ~208.47ms:

```
redeem_rewards_us=208475i
```

Threads spent 65% of their time waiting for locks:
Summary of Changes
Fix that by:

* Removing the `DashMap` and instead using `fold` and `reduce` operations to build a regular `HashMap`.
* Pre-allocating the `stake_rewards` vector and passing `&mut [MaybeUninit<PartitionedStakeReward>]` to the thread pool (sketched below).
* Pulling in the `StakeHistory::get` optimization from `solana-stake-interface` (interface: Optimize the `StakeHistory::get` function, solana-program/stake#81).

The time spent on the calculation decreased to ~49ms:

```
redeem_rewards_us=48781i
```
Threads spend most of their time doing actual calculations:
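As a companion to the second bullet above, here is a hedged sketch of the pre-allocation idea: workers fill disjoint slots of a pre-sized `MaybeUninit` buffer, so no reallocation or per-push synchronization is needed. `StakeReward` and the reward math are hypothetical stand-ins, not the actual `PartitionedStakeReward` code.

```
use rayon::prelude::*;
use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct StakeReward {
    lamports: u64,
}

fn compute_rewards(stakes: &[u64]) -> Vec<StakeReward> {
    // Pre-size the output buffer with uninitialized slots.
    let mut buf: Vec<MaybeUninit<StakeReward>> = Vec::with_capacity(stakes.len());
    buf.resize_with(stakes.len(), MaybeUninit::uninit);

    // Each slot is written exactly once, by exactly one thread, so the
    // workers never contend on a shared collection.
    buf.par_iter_mut()
        .zip(stakes.par_iter())
        .for_each(|(slot, &stake)| {
            slot.write(StakeReward { lamports: stake / 100 }); // toy formula
        });

    // SAFETY: every one of the `stakes.len()` slots was initialized above.
    buf.into_iter()
        .map(|slot| unsafe { slot.assume_init() })
        .collect()
}

fn main() {
    let rewards = compute_rewards(&[1_000, 2_500]);
    assert_eq!(
        rewards,
        vec![StakeReward { lamports: 10 }, StakeReward { lamports: 25 }]
    );
}
```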
Fixes #6899
This is an automatic backport of pull request #7742 done by [Mergify](https://mergify.com).