This repository was archived by the owner on Jan 22, 2025. It is now read-only.
Hack to skip cleanup_dead_slots upon snapshot load#8561
Merged
mvines merged 1 commit intosolana-labs:masterfrom Mar 2, 2020
Merged
Hack to skip cleanup_dead_slots upon snapshot load#8561mvines merged 1 commit intosolana-labs:masterfrom
mvines merged 1 commit intosolana-labs:masterfrom
Conversation
Contributor
Contributor
Author
Contributor
Author
|
This PR is first worked on the Bad: Good: |
ryoqun
commented
Mar 2, 2020
| let (accounts, skip_account_assertion) = f(accounts, current_slot); | ||
|
|
||
| assert_eq!(4, accounts.accounts_index.read().unwrap().roots.len()); | ||
| if skip_account_assertion { |
Contributor
Author
There was a problem hiding this comment.
The following assertion causes failure currently for test_accounts_purge_chained_purge_after_snapshot_restore.
This should be fixed by #8337
Contributor
|
Bench failure is unrelated, |
Contributor
Author
|
This got needed in the v0.23 branch for #8436 |
Contributor
Author
|
@Mergifyio refresh |
solana-grimes
pushed a commit
that referenced
this pull request
Mar 3, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A validator started from snapshot may create bad snapshots, which cause bank hash mismatch error when consumed by other validators. This is because the validator internally removes too much AccountsStorage and bad snapshots don't include some necessary ones anymore.
Detail of problem
generate_indexhas been broken for long time about maintaining the correct count of storage's counts, which confusespurge_zero_lamport_accounts()(demonstrating assertion are currently disabled in this pr, real fix #8337 is pending).This causes too many slots to be incorrectly marked as dead.
However, it hadn't caused problems because
cleanup_dead_slots()had been broken too when called immediately after snapshot restoration. That function didn't removedead_slotsat all in that case.In short, the caller wrongly requested too many
slots to remove and the callee wrongly didn't anything. So, there had been a peace...But it changed since #8148. This PR is needed as one of fixes for the enlarged snapshot issue #8168. And that PR was thought rather harmless..
After #8148, the callee started to remove actually. But things don't break immediately. Snapshots load successfully. And validator just works.
Its rather subtle ramification of unbalanced fix is at the future snapshot creation: Because
purge_zero_lamport_accounts()incorrectly removes too many slots as dead in some corner case, newly-created snapshots silently start to omit to include that removedstorages backing that slots, causing the deadlyBankHashMismatcherror because of obvious incorrect view into the accounts set.Summary of Changes
Until #8337 lands, just disable erroneous too eager removing behavior as an ugly hack...
This PR should have the same effect as reverting #8148, with better merge-conflict-free characteristics. #8337 should revert most of changes in this PR.
Fixes #8560
Backport only to v1.0 because v0.23 was branched before #8148.