[POC] heal the worldstate by matkt · Pull Request #4972 · besu-eth/besu

matkt · 2023-01-22T10:19:21Z

Signed-off-by: Karim TAAM karim.t2am@gmail.com

PR description

This PR adds a heal mechanism for the worldstate in case of inconsistency (unable to load a trie node). To fix this, we automatically start a quick heal of the worldstate, and once the fix is done we restart the block import.

After an invalid path is detected:

we delete this path to force the healing of this part of the trie,
then we delete the trielogs,
we select a pivot block (before or after the heal),
we move the blockchain to this pivot block (rewind, or download the missing blocks),
then we launch a worldstate heal.
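
The ordering of these steps can be sketched as follows. This is a minimal illustration only: `HealSequenceSketch`, the `Node` interface, and all of its method names are hypothetical stand-ins invented for this example, not Besu classes or APIs, and block numbers are used in place of real headers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Besu.
final class HealSequenceSketch {

  /** Stand-in for the storage and chain operations named in the steps above. */
  interface Node {
    void deleteTriePath(String path);

    void clearTrieLogs();

    long selectPivotBlock();

    long chainHead();

    void rewindTo(long blockNumber);

    void downloadUpTo(long blockNumber);

    void healWorldStateAt(long blockNumber);
  }

  /** Runs the steps in the order given in the PR description. */
  static void healAfterInvalidPath(final Node node, final String invalidPath) {
    node.deleteTriePath(invalidPath); // force this part of the trie to be healed
    node.clearTrieLogs();
    final long pivot = node.selectPivotBlock();
    final long head = node.chainHead();
    if (pivot < head) {
      node.rewindTo(pivot); // pivot is behind the head: rewind the blockchain
    } else if (pivot > head) {
      node.downloadUpTo(pivot); // pivot is ahead: fetch the missing blocks
    }
    node.healWorldStateAt(pivot);
  }

  /** Records each mutating call so the order can be inspected. */
  static final class RecordingNode implements Node {
    final List<String> calls = new ArrayList<>();
    private final long head;
    private final long pivot;

    RecordingNode(final long head, final long pivot) {
      this.head = head;
      this.pivot = pivot;
    }

    @Override
    public void deleteTriePath(final String path) {
      calls.add("delete:" + path);
    }

    @Override
    public void clearTrieLogs() {
      calls.add("clearTrieLogs");
    }

    @Override
    public long selectPivotBlock() {
      return pivot;
    }

    @Override
    public long chainHead() {
      return head;
    }

    @Override
    public void rewindTo(final long blockNumber) {
      calls.add("rewind:" + blockNumber);
    }

    @Override
    public void downloadUpTo(final long blockNumber) {
      calls.add("download:" + blockNumber);
    }

    @Override
    public void healWorldStateAt(final long blockNumber) {
      calls.add("heal:" + blockNumber);
    }
  }
}
```

With a `RecordingNode` whose head is 100 and whose pivot is 90, the recorded calls are the path deletion, the trielog clear, a rewind to 90, and a heal at 90, in that order.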

This feature can also heal a node that has been inconsistent for a long time, but it will take longer because there will be more nodes to heal. With this PR the healing is done as soon as the problem is detected, so there will not be a lot to heal and it will be fast.

Performed tests

Trigger multiple inconsistencies to fix (passed)
Fixed a node that has been inconsistent for a long time (passed)
Run a snapsync from scratch on goerli (passed)
Run a checkpoint sync from scratch on goerli (passed)
Run a checkpoint sync from scratch on main (in progress)
Run a validator teku+besu on goerli (passed)
Profile performance on this node to avoid perf regression (passed)

Fixed Issue(s)

#4379
#4785
#4768

Documentation

I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

I thought about the changelog and included a changelog update if required.

Signed-off-by: Matt Nelson <monels11@gmail.com>

Signed-off-by: garyschulte <garyschulte@gmail.com>

garyschulte · 2023-02-04T02:47:55Z

-  public boolean isInitialSyncPhaseDone() {
-    return isInitialSyncPhaseDone;
+  public boolean isResyncNeeded() {
+    return isResyncNeeded;


we never reset this value afaict. once we need a resync, we always need a resync ;)

Fixed: I now switch this flag in markInitialSyncPhaseAsDone.
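
A minimal sketch of the behaviour discussed here, assuming the flag is cleared when the initial sync phase is marked as done. `SyncStateFlagSketch` and `markResyncNeeded` are hypothetical names for this illustration; only `isResyncNeeded` and `markInitialSyncPhaseAsDone` appear in the thread, and the real class has more state than this.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the flag lifecycle; not the Besu sync state class.
final class SyncStateFlagSketch {
  private final AtomicBoolean resyncNeeded = new AtomicBoolean(false);

  /** Hypothetical setter: called when an inconsistency requires a resync. */
  void markResyncNeeded() {
    resyncNeeded.set(true);
  }

  /** Clears the flag, so a later inconsistency can request a new resync. */
  void markInitialSyncPhaseAsDone() {
    resyncNeeded.set(false);
  }

  boolean isResyncNeeded() {
    return resyncNeeded.get();
  }
}
```

Without the reset in `markInitialSyncPhaseAsDone`, the flag would stay set after the first resync, which is the issue raised in the review comment.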

garyschulte · 2023-02-04T02:54:53Z

            != BlockHeader.GENESIS_BLOCK_NUMBER) {
      LOG.info(
-          "Checkpoint sync was requested, but cannot be enabled because the local blockchain is not empty.");
+          "Snap sync was requested, but cannot be enabled because the local blockchain is not empty.");


nit: should be "checkpoint"

done thanks

garyschulte · 2023-02-04T02:57:20Z

-        trieLogStorage,
-        fallbackNodeFinder);
+      final KeyValueStorage trieLogStorage) {
+    super(accountStorage, codeStorage, storageStorage, trieBranchStorage, trieLogStorage);


garyschulte · 2023-02-04T03:07:04Z

  static FinalBlockConfirmation ancestorConfirmation(final MutableBlockchain blockchain) {
-    return firstHeader -> blockchain.contains(firstHeader.getParentHash());
+    return firstHeader ->
+        blockchain.getChainHeadHeader().getHash().equals(firstHeader.getParentHash());


This might break backward sync for a reorg, e.g. blockchain.contains() will return true if we have block A' but it isn't the chain head. This check requires it to be the chain head...

I'm using the block number now. It should be fine, but please verify if you can.

when I pull this branch here I see:
return firstHeader -> blockchain.getChainHeadBlockNumber() + 1 >= firstHeader.getNumber();
I might be missing something, but I read the firstAncestor block as being the earliest block fetched by backward sync. and finalBlockConfirmation.ancestorHeaderReached as the termination condition for fetching further back in BackwardSyncAlgorithm.pickNextStep.

If we are backward syncing because of a fork from 5 or 10 blocks behind head, I think this condition could terminate early. it is almost logically identical to the conditional above it:
return firstHeader -> blockchain.getChainHeadBlockNumber() + 1 >= firstHeader.getNumber();

The first condition makes sense to me b/c we really just need to ensure this backward path resolves to something we have, rather than a particular height.

Ok, I think I understand your remark. On the other hand, I have to keep this check because it lets the backward sync work after the heal in case we rewind: we will still have the blocks, but I want to go back and import them again. So I think I should add the initial condition + this one. In the case of a reorg, the first condition will make sure that it does not stop too soon.

I proposed another fix on this PR https://github.com/hyperledger/besu/pull/5059/files#diff-0361d2ff12cd43f507bf88a3cb4d781a537ca765329e0811f60a9507527cbe13R47
Thanks
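
One possible reading of "the initial condition + this one" is a conjunction of the two checks quoted in this thread. The sketch below illustrates that reading only; it is not the fix that was merged in #5059, and `AncestorConfirmationSketch`, `Header`, and `Chain` are simplified hypothetical types, not Besu classes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: simplified stand-ins for the header and blockchain types.
final class AncestorConfirmationSketch {

  static final class Header {
    final long number;
    final String parentHash;

    Header(final long number, final String parentHash) {
      this.number = number;
      this.parentHash = parentHash;
    }
  }

  static final class Chain {
    private final Set<String> storedHashes = new HashSet<>();
    private long headNumber;

    void store(final String hash) {
      storedHashes.add(hash);
    }

    void setHeadNumber(final long number) {
      headNumber = number;
    }

    boolean contains(final String hash) {
      return storedHashes.contains(hash);
    }

    long headNumber() {
      return headNumber;
    }
  }

  /**
   * Stops fetching backward only when the parent is already stored (guards the
   * reorg case) and the first header is at most one block above the chain head
   * (guards the rewind-after-heal case, where stored blocks must be re-imported).
   */
  static Predicate<Header> ancestorConfirmation(final Chain chain) {
    return first ->
        chain.contains(first.parentHash) && chain.headNumber() + 1 >= first.number;
  }
}
```

Under this reading, a fork header whose parent is unknown never terminates the search, and after a rewind the search continues past blocks that are stored but sit above the new chain head.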

garyschulte

Solid, some of the backward sync changes are concerning.

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

siladu

Should ethereum/referencetests/src/reference-test/external-resources be changing as part of this PR?

siladu · 2023-02-04T22:35:01Z

  private final WorldStateArchive worldStateArchive;
  private final ConsensusContext consensusContext;

+  private Optional<Synchronizer> synchronizer;


Think we should consider alternatives to bloating ProtocolContext, a global object.

yes, this is indeed a problem. I tried a lot to do otherwise but this was the cleanest I found with the current code structure

siladu · 2023-02-04T23:28:59Z

    if (worldStateStorage instanceof BonsaiWorldStateKeyValueStorage) {
      LOG.info("Clearing bonsai flat account db");
      worldStateStorage.clearFlatDatabase();
+      worldStateStorage.clearTrieLog();


why do we need to do this?

We will not be able to go back before this pivot block. In addition, we could have bad trielogs, so it is preferable to heal them too. Finally, keeping old trielogs could cause a false positive in isWorldStateAvailable.

siladu · 2023-02-04T23:36:39Z

+        protocolContext.getBlockchain().getChainHeadHash().equals(pivotBlockHeader.getHash());
+    if (!isValidChainHead) {
+      if (protocolContext.getBlockchain().contains(pivotBlockHeader.getHash())) {
+        protocolContext.getBlockchain().rewindToBlock(pivotBlockHeader.getHash());


Could the pivot block be > the max allowable bonsai trie layers away? If so, could this rewind take an unfeasible amount of time or maybe error? If so, then what state is the user left in and is it recoverable?

This part rewinds only the blockchain, not the worldstate.
If the pivot block is before the current head, we do a rewind; otherwise we do nothing. In either case the worldstate will be healed against the new pivot block without a rollback or rollforward, because we cannot do either if it is corrupted.

It is important to choose a new pivot block and not to heal the current head, because we are not sure that block will remain on the canonical chain.
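
The check in the diff above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `PivotRewindSketch` and its `Chain` class are hypothetical, hashes are plain strings, and `rewindTo` moves only a head pointer, mirroring the point that the worldstate is not touched by this step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory chain, not the Besu blockchain API.
final class PivotRewindSketch {

  /** Maps block hash to block number and tracks the current head hash. */
  static final class Chain {
    private final Map<String, Long> blocks = new LinkedHashMap<>();
    private String headHash;

    void append(final String hash, final long number) {
      blocks.put(hash, number);
      headHash = hash;
    }

    boolean contains(final String hash) {
      return blocks.containsKey(hash);
    }

    String headHash() {
      return headHash;
    }

    /** Moves the head pointer only; no state is rolled back. */
    void rewindTo(final String hash) {
      headHash = hash;
    }
  }

  /** Returns true if the chain head was moved back to the pivot block. */
  static boolean rewindToPivotIfKnown(final Chain chain, final String pivotHash) {
    final boolean headIsPivot = pivotHash.equals(chain.headHash());
    if (!headIsPivot && chain.contains(pivotHash)) {
      chain.rewindTo(pivotHash);
      return true;
    }
    return false; // head already at pivot, or pivot not stored locally
  }
}
```

When the pivot is not stored locally, this sketch leaves the head unchanged; fetching the missing blocks is outside its scope.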

siladu · 2023-02-04T23:40:30Z

        fastSyncStateStorage.loadState(ScheduleBasedBlockHeaderFunctions.create(protocolSchedule));

-    if (isResync) {
-      worldStateStorage.clear();


why no need to clear the worldStateStorage now?

For a resync there is already a clear in the Worldstate downloader part https://github.com/hyperledger/besu/pull/4972/files#diff-28d8c7a62dddded52099a727bf58aa864d58b055795b95ebfeee6ac8e2a8cea2R173
So it's clearly useless to do it again @garyschulte

matkt · 2023-02-05T08:56:59Z

Should ethereum/referencetests/src/reference-test/external-resources be changing as part of this PR?

yes good catch I fixed that

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt · 2023-02-06T07:24:21Z

Because of some release difficulties with this PR , I created another one @garyschulte @siladu -> #5059

matkt force-pushed the feature/heal-worldstate branch 2 times, most recently from f173297 to d440264 Compare January 22, 2023 10:53

init heal worldstate implementation

9d179f9

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt force-pushed the feature/heal-worldstate branch 4 times, most recently from 750c53b to fd31a2d Compare January 25, 2023 09:02

update implementation

d3b6941

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt force-pushed the feature/heal-worldstate branch 2 times, most recently from 6593ccd to 1f1f96b Compare January 30, 2023 18:59

clean

dead4f8

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt force-pushed the feature/heal-worldstate branch from 39f1ed8 to dead4f8 Compare January 31, 2023 09:22

matkt and others added 3 commits January 31, 2023 10:33

Merge branch 'main' into feature/heal-worldstate

e5d62cc

added logging changes

3103951

added logging changes

1311342

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt force-pushed the feature/heal-worldstate branch from 3103951 to 1311342 Compare February 1, 2023 13:56

matkt added 5 commits February 1, 2023 14:57

fix heal issues

0d72427

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

Merge commit 'main' into feature/heal-worldstate

e511dab

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

fix build

f76b4c5

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

clean

21bf689

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

clean

20f76e2

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

non-fungible-nelson force-pushed the feature/heal-worldstate branch from 20f76e2 to 3f3000d Compare February 1, 2023 17:02

added logging changes

5cab1c2

Signed-off-by: Matt Nelson <monels11@gmail.com>

non-fungible-nelson force-pushed the feature/heal-worldstate branch from 3f3000d to 5cab1c2 Compare February 1, 2023 17:10

matkt and others added 6 commits February 2, 2023 11:09

Merge branch 'main' into feature/heal-worldstate

1297c9a

Merge branch main

808643c

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

fixing some issues

e2179d4

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

Merge branch 'main' into feature/heal-worldstate

1bdb63c

add another path

8c4c437

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

spotless

d4ec504

Signed-off-by: garyschulte <garyschulte@gmail.com>

garyschulte marked this pull request as ready for review February 3, 2023 20:54

fix tests

960a8bb

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt force-pushed the feature/heal-worldstate branch from 90bfed2 to 960a8bb Compare February 3, 2023 22:41

Merge branch 'main' into feature/heal-worldstate

744f772

garyschulte reviewed Feb 4, 2023

matkt and others added 3 commits February 4, 2023 10:23

fix backward sync for reorg

2b25930

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

fix metrics tests

bb2b0b2

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

Merge branch 'main' into feature/heal-worldstate

5a0587d

siladu reviewed Feb 4, 2023

fix review comment

686a9cc

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>

matkt mentioned this pull request Feb 5, 2023

Critical Exception Processing Transaction #4768

Closed

matkt closed this Feb 6, 2023

matkt mentioned this pull request Feb 6, 2023

Add worldstate heal mechanism #5059

Merged

2 tasks

garyschulte mentioned this pull request Feb 14, 2023

Bonsai consistency check startup subcommand #4925

Closed

2 tasks

matkt deleted the feature/heal-worldstate branch July 5, 2023 07:23
