Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Propagate dead slots up to replay#17227

Merged
carllin merged 3 commits intosolana-labs:masterfrom
carllin:FixBlockstoreProcessor
May 25, 2021
Merged

Propagate dead slots up to replay#17227
carllin merged 3 commits intosolana-labs:masterfrom
carllin:FixBlockstoreProcessor

Conversation

@carllin
Copy link
Copy Markdown
Contributor

@carllin carllin commented May 14, 2021

Problem

  1. The state machine cluster_slot_state_verifier in replay needs dead slots in the form of banks to make decisions about duplicate slot (detecting a confirmed version, dumping their descendants, repairing the correct version, etc.). However, currently get_slots_since() currently ignores dead slots if they were marked dead from a previous replay (and then the validator restarted for instance), so those slots never become banks.

  2. Another issue is blockstore processor currently replays banks, but if they error/are marked dead, it continues here:

    , which means those banks are dropped and not added to initial_forks via the following call
    &mut initial_forks,
    , and don't end up in the resulting BankForks. However, the call to new_from_parent() while constructing these dead banks already modified the account state by adding sysvars to the accounts state for that dead slot.

This inconsistency means if we then try to reconstruct the dead banks during replay via another call to new_from_parent(), we encounter errors/conflicts with that existing state.

Summary of Changes

Blockstore processor now returns the tips of dead slots unless they have been outdated by a root, and then clean will sweep up all the dead state once a new root is set

Fixes #

@carllin carllin requested review from jstarry, ryoqun and sakridge May 14, 2021 10:28
@carllin carllin force-pushed the FixBlockstoreProcessor branch 2 times, most recently from abdef06 to ce586c8 Compare May 14, 2021 22:06
@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2021

Codecov Report

Merging #17227 (2d1be87) into master (d8bc56f) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #17227   +/-   ##
=======================================
  Coverage    82.7%    82.7%           
=======================================
  Files         425      425           
  Lines      118809   118803    -6     
=======================================
+ Hits        98265    98270    +5     
+ Misses      20544    20533   -11     

@carllin carllin marked this pull request as ready for review May 15, 2021 00:07
Comment thread ledger/src/blockstore.rs Outdated
Comment on lines 2758 to 2759
let valid_next_slots: Vec<u64> = meta.next_slots.to_vec();
(*height, valid_next_slots)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: fold into (*height, meta.next_slots.to_vec())?

Comment on lines 955 to 964
// Block must be frozen by this point, otherwise `process_single_slot` would
// have errored above
Copy link
Copy Markdown
Contributor

@ryoqun ryoqun May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: // Bank must be frozen by this point because any errors should have been bailed out above.

Comment thread ledger/src/blockstore.rs
@@ -2755,12 +2755,7 @@ impl Blockstore {
.zip(slot_metas)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like get_slots_since isn't used for rpc. So, it's only used by replay stage.

Comment thread ledger/src/blockstore_processor.rs Outdated
Comment thread ledger/src/blockstore_processor.rs Outdated
);
// Insert even dead banks into the BankForks so that cleanup
// of account state created on Bank creation via `Bank::new_from_parent()`
// will occur when the bank is dropped. Otherwise, this state is not
Copy link
Copy Markdown
Contributor

@ryoqun ryoqun May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lazy question: So, purge_slot() called inside Bank::drop() isn't enough for cleanup of account state?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, maybe this?: #17269

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun with #17319, I think this change is no longer necessary, so reverted the blockstore_processor changes!

Comment thread ledger/src/blockstore_processor.rs
// 1) Validator can't vote on earlier ancestor of last vote due to switch threshold (can't vote
// on ancestors of last vote)
// 2) Won't reset to this earlier ancestor becasue reset can only happen on same voted fork if
// it's for the last vote slot or later
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@carllin carllin force-pushed the FixBlockstoreProcessor branch from ce586c8 to 74d7f29 Compare May 24, 2021 21:57
@carllin carllin force-pushed the FixBlockstoreProcessor branch from 74d7f29 to 2d1be87 Compare May 25, 2021 00:31
@carllin carllin merged commit 3dfe879 into solana-labs:master May 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants