[POC] heal the worldstate#4972
Conversation
f173297 to
d440264
Compare
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
750c53b to
fd31a2d
Compare
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
6593ccd to
1f1f96b
Compare
39f1ed8 to
dead4f8
Compare
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
3103951 to
1311342
Compare
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
20f76e2 to
3f3000d
Compare
Signed-off-by: Matt Nelson <monels11@gmail.com>
3f3000d to
5cab1c2
Compare
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
90bfed2 to
960a8bb
Compare
| public boolean isInitialSyncPhaseDone() { | ||
| return isInitialSyncPhaseDone; | ||
| public boolean isResyncNeeded() { | ||
| return isResyncNeeded; |
There was a problem hiding this comment.
we never reset this value afaict. once we need a resync, we always need a resync ;)
There was a problem hiding this comment.
fixed I switch this flag on markInitialSyncPhaseAsDone
| != BlockHeader.GENESIS_BLOCK_NUMBER) { | ||
| LOG.info( | ||
| "Checkpoint sync was requested, but cannot be enabled because the local blockchain is not empty."); | ||
| "Snap sync was requested, but cannot be enabled because the local blockchain is not empty."); |
There was a problem hiding this comment.
nit: should be "checkpoint"
| trieLogStorage, | ||
| fallbackNodeFinder); | ||
| final KeyValueStorage trieLogStorage) { | ||
| super(accountStorage, codeStorage, storageStorage, trieBranchStorage, trieLogStorage); |
| static FinalBlockConfirmation ancestorConfirmation(final MutableBlockchain blockchain) { | ||
| return firstHeader -> blockchain.contains(firstHeader.getParentHash()); | ||
| return firstHeader -> | ||
| blockchain.getChainHeadHeader().getHash().equals(firstHeader.getParentHash()); |
There was a problem hiding this comment.
This might this break backward sync for a reorg. e.g. blockhain.contains() will return true if we have block A' but it isn't the chain head. This check requires it to be the chain head...
There was a problem hiding this comment.
I'm using the blck number. Should be fine now but. if you can verify
There was a problem hiding this comment.
when I pull this branch here I see:
return firstHeader -> blockchain.getChainHeadBlockNumber() + 1 >= firstHeader.getNumber();
I might be missing something, but I read the firstAncestor block as being the earliest block fetched by backward sync. and finalBlockConfirmation.ancestorHeaderReached as the termination condition for fetching further back in BackwardSyncAlgorithm.pickNextStep.
If we are backward syncing because of a fork from 5 or 10 blocks behind head, I think this condition could terminate early. it is almost logically identical to the conditional above it:
return firstHeader -> blockchain.getChainHeadBlockNumber() + 1 >= firstHeader.getNumber();
The first condition makes sense to me b/c we really just need to ensure this backward path resolves to something we have, rather than a particular height.
There was a problem hiding this comment.
Ok I think I understand your remark. On the other hand, I have to keep this check because it allows me to have the backardsync works after the heal in case we rewind. Because we will have the blocks but I still want to go back to import them again . So I think I should add the initial condition + this one. In the case of a REORG the first condition will make sure that it does not stop too soon
There was a problem hiding this comment.
I proposed another fix on this PR https://github.com/hyperledger/besu/pull/5059/files#diff-0361d2ff12cd43f507bf88a3cb4d781a537ca765329e0811f60a9507527cbe13R47
Thanks
garyschulte
left a comment
There was a problem hiding this comment.
Solid, some of the backward sync changes are concerning.
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
siladu
left a comment
There was a problem hiding this comment.
Should ethereum/referencetests/src/reference-test/external-resources
be changing as part of this PR?
| private final WorldStateArchive worldStateArchive; | ||
| private final ConsensusContext consensusContext; | ||
|
|
||
| private Optional<Synchronizer> synchronizer; |
There was a problem hiding this comment.
Think we should consider alternatives to bloating ProtocolContext, a global object.
There was a problem hiding this comment.
yes, this is indeed a problem. I tried a lot to do otherwise but this was the cleanest I found with the current code structure
| if (worldStateStorage instanceof BonsaiWorldStateKeyValueStorage) { | ||
| LOG.info("Clearing bonsai flat account db"); | ||
| worldStateStorage.clearFlatDatabase(); | ||
| worldStateStorage.clearTrieLog(); |
There was a problem hiding this comment.
we will not be able to go back before this pivot block. In addition, we could have bad trielog so it is preferable to heal them too. Finally keeping old trielog could make a false positive for the isWorldStateAvailable.
| protocolContext.getBlockchain().getChainHeadHash().equals(pivotBlockHeader.getHash()); | ||
| if (!isValidChainHead) { | ||
| if (protocolContext.getBlockchain().contains(pivotBlockHeader.getHash())) { | ||
| protocolContext.getBlockchain().rewindToBlock(pivotBlockHeader.getHash()); |
There was a problem hiding this comment.
Could the pivot block be > the max allowable bonsai trie layers away? If so, could this rewind take an unfeasible amount of time or maybe error? If so, then what state is the user left in and is it recoverable?
There was a problem hiding this comment.
this part rewind only the blockchain part not the worldstate part.
If the pivot block is before the current head, we do a rewind otherwise we do nothing. In any case the worldstate will be healed with the new pivot block without doing a rollback or rollforward because we cannot do it if it is corrupted
it is important to choose a new pivot block and not to heal the current head. Because we are not sure that block will remain on the canonical chain
| fastSyncStateStorage.loadState(ScheduleBasedBlockHeaderFunctions.create(protocolSchedule)); | ||
|
|
||
| if (isResync) { | ||
| worldStateStorage.clear(); |
There was a problem hiding this comment.
why no need to clear the worldStateStorage now?
There was a problem hiding this comment.
For a resync there is already a clear in the Worldstate downloader part https://github.com/hyperledger/besu/pull/4972/files#diff-28d8c7a62dddded52099a727bf58aa864d58b055795b95ebfeee6ac8e2a8cea2R173
So it's clearly useless to do it again @garyschulte
yes good catch I fixed that |
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
|
Because of some release difficulties with this PR , I created another one @garyschulte @siladu -> #5059 |
Signed-off-by: Karim TAAM karim.t2am@gmail.com
PR description
This PR add a heal mechanism fo the worldtstate in case of inconsistency (unable to trie node). To fix this, we start a quick heal of the worldstate automatically and once the fix is done we restart the block import.
After the detection of an invalid path
This feature can also heal a node that has been inconsistent for a long time, but it will take longer because there will be more nodes to heal. With this PR the healing will be done as soon as the problem is detected so there will not a lot to heal and it will be fast
Performed tests
Fixed Issue(s)
#4379
#4785
#4768
Documentation
doc-change-requiredlabel to this PR ifupdates are required.
Changelog