Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Add more reporting for invalid stake cache members and prune them#21654

Merged
jstarry merged 2 commits intosolana-labs:masterfrom
jstarry:clean-up-stakes-cache
Dec 9, 2021
Merged

Add more reporting for invalid stake cache members and prune them#21654
jstarry merged 2 commits intosolana-labs:masterfrom
jstarry:clean-up-stakes-cache

Conversation

@jstarry
Copy link
Copy Markdown
Contributor

@jstarry jstarry commented Dec 6, 2021

Problem

Due to a lack of confidence that the stakes cache is consistent with accounts db, we don't make use of it when calculating stake rewards on epoch boundaries. We have some warning metrics for some scenarios where the cache is inconsistent, but not all cases are covered. We also don't have a way to remove any new invalid stakes cache entries.

Summary of Changes

  • Add feature to evict detected invalid stakes cache entries
  • Report metrics for more scenarios when stakes cache entries are invalid

Fixes #

@jstarry jstarry force-pushed the clean-up-stakes-cache branch from 123ee58 to 162a14a Compare December 7, 2021 00:11
@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 7, 2021

Codecov Report

Merging #21654 (6d28303) into master (a1adcb2) will decrease coverage by 0.0%.
The diff coverage is 64.5%.

@@            Coverage Diff            @@
##           master   #21654     +/-   ##
=========================================
- Coverage    81.6%    81.6%   -0.1%     
=========================================
  Files         511      511             
  Lines      143085   143132     +47     
=========================================
- Hits       116848   116820     -28     
- Misses      26237    26312     +75     

@jstarry jstarry requested review from brooksprumo, ryoqun and t-nelson and removed request for t-nelson December 7, 2021 02:57
Copy link
Copy Markdown
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reading through load_vote_and_stake_accounts_with_thread_pool(); wanted to post these few comments now.

Comment thread runtime/src/bank.rs
Comment thread runtime/src/bank.rs Outdated
Comment thread runtime/src/bank.rs
@brooksprumo brooksprumo self-requested a review December 8, 2021 18:08
brooksprumo
brooksprumo previously approved these changes Dec 8, 2021
Copy link
Copy Markdown
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes look good to me. Personally, the load_vote_and_stake_accounts_with_thread_pool() function seems quite large and could benefit from some simplification, but I don't think that needs to gate this PR. Additionally, I don't know all the context surrounding this code, so I'm not sure if there are any corner cases that I've missed.

@brooksprumo
Copy link
Copy Markdown
Contributor

As a follow-up, I think that more-knowing eyes should approve the changes before the backports go through.

@jstarry
Copy link
Copy Markdown
Contributor Author

jstarry commented Dec 8, 2021

As a follow-up, I think that more-knowing eyes should approve the changes before the backports go through.

Agreed, @ryoqun or @t-nelson do you have bandwidth to do a pass?

t-nelson
t-nelson previously approved these changes Dec 8, 2021
Copy link
Copy Markdown
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Comment thread runtime/src/bank.rs Outdated
Comment on lines +1059 to +1060
invalid_stake_keys_set: DashSet<Pubkey>,
invalid_vote_keys_set: DashSet<Pubkey>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're looking to prevent the cache entering an inconsistent state, it might be valuable to tag the addresses with the reason they are invalid; missing, wrong owner, bad state

Copy link
Copy Markdown
Contributor Author

@jstarry jstarry Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added reasons, thanks for the suggestion

@mergify mergify Bot dismissed stale reviews from brooksprumo and t-nelson December 9, 2021 15:11

Pull request has been modified.

@jstarry jstarry force-pushed the clean-up-stakes-cache branch from 6fff7a7 to 6d28303 Compare December 9, 2021 15:13
@jstarry jstarry added the automerge Merge this Pull Request automatically once CI passes label Dec 9, 2021
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Dec 9, 2021

automerge label removed due to a CI failure

@mergify mergify Bot removed the automerge Merge this Pull Request automatically once CI passes label Dec 9, 2021
@jstarry jstarry merged commit 6fc3291 into solana-labs:master Dec 9, 2021
@jstarry jstarry deleted the clean-up-stakes-cache branch December 9, 2021 19:12
mergify Bot pushed a commit that referenced this pull request Dec 9, 2021
…1654)

* Add more reporting for invalid stake cache members

* feedback

(cherry picked from commit 6fc3291)

# Conflicts:
#	Cargo.lock
#	programs/bpf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/bank.rs
mergify Bot pushed a commit that referenced this pull request Dec 9, 2021
…1654)

* Add more reporting for invalid stake cache members

* feedback

(cherry picked from commit 6fc3291)
mergify Bot added a commit that referenced this pull request Dec 10, 2021
…1654) (#21741)

* Add more reporting for invalid stake cache members

* feedback

(cherry picked from commit 6fc3291)

Co-authored-by: Justin Starry <justin@solana.com>
jstarry added a commit that referenced this pull request Dec 10, 2021
…ckport #21654) (#21740)

* Add more reporting for invalid stake cache members and prune them (#21654)

* Add more reporting for invalid stake cache members

* feedback

(cherry picked from commit 6fc3291)

# Conflicts:
#	Cargo.lock
#	programs/bpf/Cargo.lock
#	runtime/Cargo.toml
#	runtime/src/bank.rs

* resolve conflicts

Co-authored-by: Justin Starry <justin@solana.com>
@sakridge
Copy link
Copy Markdown
Contributor

Tests?

t-nelson added a commit to t-nelson/solana that referenced this pull request Dec 13, 2021
tao-stones pushed a commit that referenced this pull request Dec 13, 2021
CriesofCarrots pushed a commit to CriesofCarrots/solana that referenced this pull request Dec 16, 2021
CriesofCarrots pushed a commit that referenced this pull request Dec 17, 2021
@brooksprumo brooksprumo mentioned this pull request Jan 5, 2022
Comment thread runtime/src/bank.rs
Comment on lines +2303 to +2305
if let Some(stakes_cache) = maybe_stakes_cache.as_mut() {
stakes_cache.remove_stake_delegation(&stake_pubkey);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jstarry
Looks like this is pruning stake-account if the voter-pubkey does not exist. But in this scenario:

  • create a vote account
  • delegate a stake to it
  • delete the vote account
  • stake account is pruned from the cache at epoch boundary after rewards calculation.
  • recreate the vote account with same pubkey

doesn't this make the cache inconsistent with accounts-db? i.e. you end up with a valid stake account in accounts-db which is not cached.
Or is above scenario not possible.
cc @joncinque

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't prune stake accounts if the voter pubkey doesn't exist. It only prunes them if they don't exist in accounts db or are not valid stake accounts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, seems right, my mistake 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants