Skip to content

Gets snapshot storages before calling verify_accounts_hash()#1202

Merged
brooksprumo merged 1 commit into
anza-xyz:masterfrom
brooksprumo:verify-accounts-hash/get-storages
May 7, 2024
Merged

Gets snapshot storages before calling verify_accounts_hash()#1202
brooksprumo merged 1 commit into
anza-xyz:masterfrom
brooksprumo:verify-accounts-hash/get-storages

Conversation

@brooksprumo
Copy link
Copy Markdown

Problem

On pop256, one of the nodes kept failing during accounts hash verification. The root cause was due to the incremental accounts hash calculation getting wrong snapshot storages to use for the actual calculation.

Accounts hash verification happens in the background, and getting the snapshot storages is concurrent with normal node activity. Verification also checks the full accounts hash first; so by the time incremental accounts hash is checked, it has been many minutes later on pop256, which has over 5 billion accounts.

Getting the snapshot storages happens lazily—right before calculating each accounts hash. Thus, calculating the incremental accounts hash doesn't get the snapshot storages it needs until the node has been running for a while, and clean has run many times. In this case, the storages were marked as dead, and thus the IAH did not get all the storages it needed for the calculation.

We need to get the snapshot storages before calling verify/before the foreground processing begins, to ensure we have all the correct storages.

Summary of Changes

Pass in the snapshot storages as a function parameter to verify_accounts_hash_and_lamports() as the fix for startup verification accounts hash mismatch issues due to storages getting cleaned away prematurely.

This change fixed the issue seen on pop256.

@brooksprumo brooksprumo self-assigned this May 6, 2024
@brooksprumo brooksprumo marked this pull request as ready for review May 6, 2024 19:10
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 97.29730% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 82.1%. Comparing base (aa2f078) to head (a1d7153).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #1202     +/-   ##
=========================================
- Coverage    82.1%    82.1%   -0.1%     
=========================================
  Files         886      886             
  Lines      236417   236442     +25     
=========================================
- Hits       194266   194256     -10     
- Misses      42151    42186     +35     

Comment thread runtime/src/bank.rs
Comment on lines +5623 to +5630
// The snapshot storages must be captured *before* starting the background verification.
// Otherwise, it is possible that a delayed call to `get_snapshot_storages()` will *not*
// get the correct storages required to calculate and verify the accounts hashes.
let snapshot_storages = self
.rc
.accounts
.accounts_db
.get_snapshot_storages(RangeFull);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get the snapshot storages here, before starting the background thread to verify the account hashes.


assert_matches!(
db.verify_accounts_hash_and_lamports(some_slot, 1, None, config.clone()),
db.verify_accounts_hash_and_lamports_for_tests(some_slot, 1, config.clone()),
Copy link
Copy Markdown
Author

@brooksprumo brooksprumo May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this one in the previous PR.

))
assert!(accounts
.accounts_db
.verify_accounts_hash_and_lamports_for_tests(
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark needed to be updated too.

Copy link
Copy Markdown

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding! lgtm!

@brooksprumo brooksprumo merged commit 9c8b621 into anza-xyz:master May 7, 2024
@brooksprumo brooksprumo deleted the verify-accounts-hash/get-storages branch May 7, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants