Add more reporting for invalid stake cache members and prune them #21654
jstarry merged 2 commits into solana-labs:master
Force-pushed from 123ee58 to 162a14a
Codecov Report
@@            Coverage Diff            @@
##           master   #21654     +/-   ##
=========================================
- Coverage    81.6%    81.6%   -0.1%
=========================================
  Files         511      511
  Lines      143085   143132    +47
=========================================
- Hits       116848   116820    -28
- Misses      26237    26312    +75
brooksprumo left a comment
I'm still reading through load_vote_and_stake_accounts_with_thread_pool(); wanted to post these few comments now.
brooksprumo left a comment
Overall the changes look good to me. Personally, the load_vote_and_stake_accounts_with_thread_pool() function seems quite large and could benefit from some simplification, but I don't think that needs to gate this PR. Additionally, I don't know all the context surrounding this code, so I'm not sure if there are any corner cases that I've missed.
As a follow-up, I think that more-knowing eyes should approve the changes before the backports go through.
    invalid_stake_keys_set: DashSet<Pubkey>,
    invalid_vote_keys_set: DashSet<Pubkey>,
If we're looking to prevent the cache from entering an inconsistent state, it might be valuable to tag the addresses with the reason they are invalid: missing, wrong owner, or bad state.
Added reasons, thanks for the suggestion
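The suggestion above (tagging each invalid address with a reason) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code merged in this PR: `InvalidReason`, `count_by_reason`, and the `Pubkey` alias are hypothetical names, and a plain `HashMap` stands in for the concurrent `DashSet` shown in the diff.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte account address (hypothetical; not the real `Pubkey` type).
type Pubkey = [u8; 32];

/// Why a cached entry was judged invalid, mirroring the three cases
/// named in the review comment. The enum name is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum InvalidReason {
    Missing,
    WrongOwner,
    BadState,
}

/// Count how many invalid keys fall under each reason, e.g. to report
/// one metric per reason instead of a single undifferentiated total.
fn count_by_reason(invalid: &HashMap<Pubkey, InvalidReason>) -> HashMap<InvalidReason, usize> {
    let mut counts = HashMap::new();
    for reason in invalid.values() {
        *counts.entry(*reason).or_insert(0) += 1;
    }
    counts
}
```

Keying the map by address keeps the set semantics of the original fields, while the value records why the entry was rejected, so a per-reason breakdown can be derived with `count_by_reason(&invalid_stake_keys)`.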
Pull request has been modified.
Force-pushed from 6fff7a7 to 6d28303
automerge label removed due to a CI failure
…ckport #21654) (#21740)
* Add more reporting for invalid stake cache members and prune them (#21654)
* Add more reporting for invalid stake cache members
* feedback
(cherry picked from commit 6fc3291)
# Conflicts:
#   Cargo.lock
#   programs/bpf/Cargo.lock
#   runtime/Cargo.toml
#   runtime/src/bank.rs
* resolve conflicts
Co-authored-by: Justin Starry <justin@solana.com>
Tests?
…them (backport solana-labs#21654) (solana-labs#21740)" This reverts commit 43e7368.
…d prune them (backport solana-labs#21654) (solana-labs#21740)"" This reverts commit 8028f21.
    if let Some(stakes_cache) = maybe_stakes_cache.as_mut() {
        stakes_cache.remove_stake_delegation(&stake_pubkey);
    }
@jstarry
Looks like this is pruning stake-account if the voter-pubkey does not exist. But in this scenario:
- create a vote account
- delegate a stake to it
- delete the vote account
- stake account is pruned from the cache at epoch boundary after rewards calculation.
- recreate the vote account with same pubkey
Doesn't this make the cache inconsistent with accounts-db? That is, you end up with a valid stake account in accounts-db that is not cached.
Or is the above scenario not possible?
cc @joncinque
It doesn't prune stake accounts if the voter pubkey doesn't exist. It only prunes them if they don't exist in accounts db or are not valid stake accounts.
oh, seems right, my mistake 👍
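The pruning rule clarified in this thread can be sketched as below: a cached stake entry is dropped only when the stake account itself is missing from accounts db or is not a valid stake account, and the existence of the vote account is never consulted. This is a simplified illustration, not the logic in `load_vote_and_stake_accounts_with_thread_pool()`; the `Account` struct, `STAKE_PROGRAM_ID` placeholder, `should_prune_stake`, `prune_invalid_stakes`, and the `HashMap`-based cache and accounts db are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte account address (hypothetical).
type Pubkey = [u8; 32];

/// Placeholder owner id for the stake program (not the real program id).
const STAKE_PROGRAM_ID: Pubkey = [7; 32];

/// Hypothetical, heavily simplified view of an account stored in accounts db.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Account {
    owner: Pubkey,
    /// `Some(voter)` when the data decodes as a delegated stake state.
    delegated_to: Option<Pubkey>,
}

/// True if the cached stake entry should be pruned. Only the stake account
/// is inspected; whether the voter account exists plays no role here.
fn should_prune_stake(accounts_db: &HashMap<Pubkey, Account>, stake_pubkey: &Pubkey) -> bool {
    match accounts_db.get(stake_pubkey) {
        None => true, // missing from accounts db
        Some(account) => {
            // wrong owner, or data that is not a valid delegated stake state
            account.owner != STAKE_PROGRAM_ID || account.delegated_to.is_none()
        }
    }
}

/// Remove invalid entries from the cache (stake pubkey -> voter pubkey)
/// and return the pruned keys in sorted order.
fn prune_invalid_stakes(
    cache: &mut HashMap<Pubkey, Pubkey>,
    accounts_db: &HashMap<Pubkey, Account>,
) -> Vec<Pubkey> {
    let mut pruned: Vec<Pubkey> = cache
        .keys()
        .filter(|key| should_prune_stake(accounts_db, key))
        .copied()
        .collect();
    pruned.sort();
    for key in &pruned {
        cache.remove(key);
    }
    pruned
}
```

Under this sketch, the scenario raised in the question (vote account deleted and later recreated) leaves the stake entry in the cache, because the stake account remains present and valid throughout.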
Problem
Due to a lack of confidence that the stakes cache is consistent with accounts db, we don't make use of it when calculating stake rewards on epoch boundaries. We have some warning metrics for some scenarios where the cache is inconsistent, but not all cases are covered. We also don't have a way to remove any new invalid stakes cache entries.
Summary of Changes
Fixes #