Client: keep syncing at tip of chain by jochem-brouwer · Pull Request #1132 · ethereumjs/ethereumjs-monorepo

jochem-brouwer · 2021-02-28T18:28:07Z

This PR intends to keep the fullsync-syncer syncing at the tip of the chain. (Note: current branch target is the YoloV3 branch)

Add enqueueTask to fetcher. This allows one to manually enqueue a task, and possibly restart the fetcher
Add an option to not destroy the fetcher when it has processed all tasks
Optimize full sync: in case no peers are available, then wait 5 seconds and try again, to get a best peer (this possibly fixes the re-bootstrap at some points)
Fullsync sync method never resolves.
Enqueue new blocks when the best peer announces new blocks

This is very WIP. Here are some things which need to be resolved:

Add logic when there is a gap in requested blocks (possibly, we just change the block fetcher, such that we can just update the highest block number on-demand, and not add all this extra logic)
Add tests
Fix the error as described below

Currently, it seems to randomly work for some blocks (when we are at tip of chain, i.e. on YoloV3, every 15 seconds it imports and executes a new block), but then at some point it throws:

  devp2p:eth Received NEW_BLOCK_HASHES message from 3.9.20.133:30303: e6e5a0dcc51743c5579b7bcdfadda2209ee2280e16e3e065d74333ba9ee8... +12s
  devp2p:eth Send GET_BLOCK_HEADERS message to 3.9.20.133:30303: c7830150d5018080 +5ms
  devp2p:eth Received BLOCK_HEADERS message from 3.9.20.133:30303: f9025cf90259a0d28799ec1625a5d9853afc3e10f415dd05071191af1bbf... +15ms
  devp2p:eth Send GET_BLOCK_BODIES message to 3.9.20.133:30303: e1a0dcc51743c5579b7bcdfadda2209ee2280e16e3e065d74333ba9ee836... +5ms
  devp2p:eth Received BLOCK_BODIES message from 3.9.20.133:30303: c3c2c0c0 +15ms
ERROR [02-28|18:27:21] Error [ERR_STREAM_PUSH_AFTER_EOF]: stream.push() after EOF
    at readableAddChunk (_stream_readable.js:260:30)
    at BlockFetcher.Readable.push (_stream_readable.js:213:10)
    at BlockFetcher.dequeue (/Volumes/Ethereum/ethereumjs-vm/packages/client/lib/sync/fetcher/fetcher.ts:163:17)
    at BlockFetcher.success (/Volumes/Ethereum/ethereumjs-vm/packages/client/lib/sync/fetcher/fetcher.ts:219:14)
    at /Volumes/Ethereum/ethereumjs-vm/packages/client/lib/sync/fetcher/fetcher.ts:268:44
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
  devp2p:eth Send GET_BLOCK_HEADERS message to 3.9.20.133:30303: c7830150d0048080 +28ms

codecov · 2021-02-28T18:29:31Z

Codecov Report

Merging #1132 (909b3aa) into master (3d9774d) will increase coverage by 0.77%.
The diff coverage is 77.60%.

Flag	Coverage Δ
block	`86.76% <ø> (-0.17%)`	⬇️
blockchain	`83.56% <ø> (ø)`
client	`83.41% <77.60%> (-0.87%)`	⬇️
common	`94.39% <ø> (-0.11%)`	⬇️
devp2p	`82.86% <ø> (-0.11%)`	⬇️
ethash	`86.56% <ø> (ø)`
tx	`88.24% <ø> (-0.12%)`	⬇️
vm	`82.78% <ø> (+2.60%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

holgerd77 · 2021-03-05T11:09:07Z

Rebased this one respectively cherry-picked the new commit.

jochem-brouwer · 2021-03-05T14:49:49Z

Will take a look at the tests ASAP, they probably fail because this Promise in full sync now never resolves.

jochem-brouwer · 2021-03-27T17:12:00Z

Yolo's bootnode is down =(

holgerd77 · 2021-03-27T17:24:49Z

Yes, I guess Yolo is shut down completely 😕, a bit sad, right? Maybe we want to get to a situation quickly where we can spin up our own networks between two plus clients, then we can do some proper integrated testing.

jochem-brouwer · 2021-03-27T17:30:54Z

Yeah, I think we need some tooling to import custom genesis blocks so we can setup a development environment to sync with private nodes.

holgerd77 · 2021-08-16T11:12:45Z

Rebased this.

holgerd77 · 2021-08-16T11:47:17Z

Short writeup on how to get to a quick test scenario with two local clients:

# 1. Start a dev1 client to fill in a first few hundred blocks, cut off quickly (goerli works more streamlined than mainnet)
npm run client:start -- --network=goerli --datadir=datadir-dev1

# 2. Restart dev1 client in a useful debug scenario and with discovery switched off, keep running
DEBUG=devp2p:ETH npm run client:start:dev1 -- --datadir=datadir-dev1

# 3. Start dev2 client from another window to connect to dev1 client, should sync these first couple of hundred blocks
rm -Rf ./datadir-dev2 && DEBUG=devp2p:ETH npm run client:start:dev1 -- --datadir=datadir-dev1

Should lead to something like this:

acolytec3 · 2021-08-16T11:54:39Z

Short writeup on how to get to a quick test scenario with two local clients:

# 1. Start a dev1 client to fill in a first few hundred blocks, cut off quickly (goerli works more streamlined than mainnet)
npm run client:start -- --network=goerli --datadir=datadir-dev1

# 2. Restart dev1 client in a useful debug scenario and with discovery switched off, keep running
DEBUG=devp2p:ETH npm run client:start:dev1 -- --datadir=datadir-dev1

# 3. Start dev2 client from another window to connect to dev1 client, should sync these first couple of hundred blocks
rm -Rf ./datadir-dev2 && DEBUG=devp2p:ETH npm run client:start:dev1 -- --datadir=datadir-dev1

Should lead to something like this:

Nice! I've been trying to conceive of how to approach this.

holgerd77 · 2021-08-16T15:43:01Z

Ok, this should already bring us some good way in the right direction.

I've now reworked larger parts of this PR and then integrated some more fine-grained and practical setting of the SYNCHRONIZED status. This has now been moved to config so that it is more widely available. A status update is now not working for a Promise from the Peer sync to resolve. Instead on every chain update it is checked if we are reaching a targeted head of the chain which is taken and updated from the NEW_BLOCK_HASHES methods. If the chain receives no further updates for some certain amount of time (set this to 60s for a start) the SYNCHRONIZED status is reset to false.

I am not claiming that I've always found the optimal structures here everywhere, so @jochem-brouwer and others feel free to evolve on this if you have some strong feeling for further reworking things.

Hope this is some good start though. 😄

//cc @gabrocheleau @acolytec3

ryanio · 2021-08-17T00:37:57Z

-          this.pool.size
-        }`
+    // eslint-disable-next-line no-async-promise-executor
+    return await new Promise(async () => {


hm there is no resolve or reject on this promise

hm, maybe this is a place where Event.SYNC_SYNCHRONIZED would be useful, we could resolve when received if the typedoc above is still accurate: * @return Resolves when sync completed

(edit: tentatively added this, although we can reevaluate if it's appropriate here, if not please feel free to remove, but we should consider how to have at least one path to resolve(true))

ryanio · 2021-08-17T03:21:06Z

Ok I was able to accomplish quite a few things, please browse my added commits for details on each one.

The integration tests still aren't fully working for me yet, my last commit helped but now [Integration:FullSync] is hanging after finishing, have to stop for today but I can continue tomorrow.

holgerd77 · 2021-08-17T11:36:25Z

-      })
-      if (min.eqn(-1)) {
-        return
+  handleNewBlockHashes(data: [Buffer, BN][]) {


Ah, thanks, that's a lot better! 😄

holgerd77 · 2021-08-17T11:47:56Z

-    return await new Promise(async () => {
-      if (!peer) return false
+    return await new Promise(async (resolve, reject) => {
+      if (!peer) return resolve(false)


Still no Promise expert, what would happen if we skip this whole Promise wrapping? 🤔

Generally this whole Promise resolving has gotten less important with the extraction of the sync status determination by listening and acting to chain updates (I already thougth about just returning Promise<void> on this function).

We just need to make sure that in the Sync.start() call no concurrent sync() calls are triggered.

Update: tested this with removing the Promise wrapper, this then stops at another test in sync.spec.ts unit test, couldn't investigate further but still not completely getting why (if) this is needed.

I believe it's part of some promise chain that awaits for results, but I'm sure we could find an alternative architecture, I think it is probably a bit redundant now since it just awaits the SYNC'd event we can probably just listen for that wherever this promise is functionally being used.

…eumClient to config

…e, get latest block number from status msg

…ompile error

holgerd77 · 2021-08-19T10:22:09Z

Rebased this.

…UPDATED handling to also work for light clients

holgerd77 · 2021-08-19T11:11:33Z

Ok, tests are processing further with the latest changes in 28bf3b4, still not finishing the # should sync with best peer tests in the [Integration:LightSync] tests (npm run tape -- test/integration/lightsync.spec.ts).

holgerd77 · 2021-08-19T12:00:12Z

+        } else if (best?.les) {
+          targetHeight = new BN(best.les.status.headNum)
+        }
+      }


I guess we want to have this for directly setting the sync target height on a first iteration.

Have given this a bit more thought. The timestamp from the block is actually also not a good indicator for having the chain synced. For example for a stalled chain with 3 participants and 2 signers where we are the one signer and the other signer dropped some time ago (and we therefore would be on charge to produce a next block) there just won't be any new NEW_BLOCK_HASHES triggers that would update the chain and bring us in a synced state so that we could finally decide to start on block producing.

I guess with this combination (setting to the peer height and then soon after to the block height from NEW_BLOCK_HASHES) we should already cover a good flavour of scenarios (in the "normal" case there should also somewhat soon be a NEW_BLOCK_HASHES message coming in).

One scenario which is not covered yet though: how to decide in a two peer setup where we are the block producer and the other peer is just waiting for us that we "are synced" and therefore should actually start/continue producing blocks? 🤔

Have added this in ccc0e09.

Ok, I guess this b24dc4a is my first take on this, to wait for some time if we find a best peer with a higher target height (and in the meantime there is also the chance that some NEW_BLOCK_HASHES come in) and otherwise set the sync target height to the local DB height and emit a SYNCHRONIZED event.

This should now also cover this 2 peer scenario and give us the basics to start on block producing.

(really tricky edge cases here though, I wonder e.g. if we should add an explicit flag to allow block production from genesis onwards to not accidentally start new chains if there is no network connectivity)

… on local DB is found

…ds test fixes before activation)

holgerd77 · 2021-08-20T12:33:42Z

Phew, I had the most intense debug session on this I guess I ever had this year, roughly three hours straight and testing close to everything to see why these tests are hanging in test/integration/fullsync.spec.ts.

I did try to get hold of the hanging processes by using ps aux and then going down with lsof -p {process.pid} which I got from this leaked-handles package, but this was not very informative.

I manually injected some console.log(obj) into the node_modules/leaked-handles sources to get the full stream objects hanging and not only this summarized version. This at least revealed that these were actually two WriteStream and one ReadStream handles hanging, still not sure if the TCP hint from leaked-handles was therefore generally misguiding or if a TCP stream is also a WriteStream.

I digged super deep into the mocks/mockserver.ts and mocks/network.ts code and entered debug output in the respective destroy() methods to see if all streams got destroyed. Seems that everything is good there.

Finally I went down all the commits. The complete hanging was actually already there after the first commit from Jochem starting the work here.

So as some side note and (re-)learning: we really really should do client work and commits on a very granular basis and extremely regularly run the tests. This is getting so hard otherwise to track down things, especially since there often additionally come things like timing/handlers/unresolved Promises issues into play.

I am now at least at a point where the tests pass every second time or so. Lol.

Hope that this can be resolved with some timer or so. Very much open for everyone to be investigated and continued.

I would have a strong tendency to then merge this PR once we have got the tests reliably running. This is otherwise getting too extensive and I think we have got a good basis here. This can then already be used to continue work e.g. on the tx pool or the block builder and things from here can also still later be revised or evolved. These kind of changes from here will also get easier to debug once we have the local chain sync functionality (so tx pool + block builder) in place. That will allow us to very easily simulate various edge cases locally.

holgerd77

Ok, I would now merge this in so that we can build on top of this in other PRs, also see my latest comment on that.

This has also been cross-reviewed along the work process by having had three people working on the PR.

jochem-brouwer added PR state: WIP package: client labels Feb 28, 2021

jochem-brouwer mentioned this pull request Mar 1, 2021

Implement EIP2718 and EIP2930 #1048

Merged

4 tasks

holgerd77 force-pushed the yolov3-client branch from 8acd154 to 0bc81f7 Compare March 5, 2021 10:04

Base automatically changed from yolov3-client to master March 5, 2021 11:04

holgerd77 force-pushed the continous-syncing branch from 8aa8341 to 1626309 Compare March 5, 2021 11:08

holgerd77 reviewed Mar 5, 2021

View reviewed changes

Comment thread packages/client/lib/sync/fullsync.ts Outdated

holgerd77 mentioned this pull request Mar 22, 2021

Implement EIP1559 Fee Market + EIP3198 BaseFee #1148

Merged

holgerd77 mentioned this pull request Mar 30, 2021

Update ETH peer status via announcements #988

Closed

holgerd77 mentioned this pull request Jun 22, 2021

Client: Calaveras Sync Improvements + Working Branch #1310

Merged

gabrocheleau mentioned this pull request Jul 7, 2021

Client: eth_chainID and eth_syncing RPC implementations #1314

Merged

2 tasks

This was referenced Aug 10, 2021

Release new version of @ethereumjs/client #1365

Closed

Client: add basic TxPool #1176

Merged

holgerd77 force-pushed the continous-syncing branch from 1626309 to 99fe9fc Compare August 16, 2021 11:11

holgerd77 reviewed Aug 16, 2021

View reviewed changes

Comment thread packages/client/lib/sync/fetcher/fetcher.ts Outdated

ryanio reviewed Aug 17, 2021

View reviewed changes

Comment thread packages/client/lib/sync/lightsync.ts Outdated

holgerd77 reviewed Aug 17, 2021

View reviewed changes

ryanio added 9 commits August 19, 2021 12:21

update client.synchronized to client.config.synchronized

06f43c1

tidy up handleNewBlockHashes code

725135e

add resolve and reject conditions to syncWithPeer, fix tests

4cd53b6

clean up interval use, stops memory leak

719f98e

remove redundant test now that synchronized property moved from Ether…

dcff9c4

…eumClient to config

emit event to finish sync

e12fc3c

to help finish integration tests, if syncTargetHeight is not availabl…

6b78a52

…e, get latest block number from status msg

implement newly added abstract method to SynchronizerTest to fix ts c…

a570a73

…ompile error

fix promise that was throwing due to undefined

cd2245c

holgerd77 force-pushed the continous-syncing branch from 0947669 to cd2245c Compare August 19, 2021 10:21

holgerd77 added 2 commits August 19, 2021 12:27

client: align destroyWhenDone and destroyOnFinish naming in Fetcher

d5473fe

client: switched from blocks to headers height in Synchronizer CHAIN_…

28bf3b4

…UPDATED handling to also work for light clients

client: use best peer status height for a first sync target height

ccc0e09

holgerd77 reviewed Aug 19, 2021

View reviewed changes

holgerd77 added 5 commits August 19, 2021 14:23

client: unify full and light sync() method in Synchronizer

a9350ac

client: update sync state if no peer with a higher status height than…

b24dc4a

… on local DB is found

client: fixed sync unit test

298f787

client: comment out sync status update on stale peer connections (nee…

7ccc23f

…ds test fixes before activation)

client: comment out whole sync reconnection logic until tests are fixed

b607574

client: make full sync integration tests more robust

909b3aa

holgerd77 marked this pull request as ready for review August 20, 2021 13:54

holgerd77 added PR state: needs review and removed PR state: WIP labels Aug 20, 2021

holgerd77 approved these changes Aug 20, 2021

View reviewed changes

holgerd77 merged commit cbcb7ac into master Aug 20, 2021

holgerd77 deleted the continous-syncing branch August 20, 2021 13:56

holgerd77 mentioned this pull request Aug 23, 2021

Client: Invalid remainder Sync Error #1416

Closed

Uh oh!

Conversation

jochem-brouwer commented Feb 28, 2021

Uh oh!

codecov Bot commented Feb 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

holgerd77 commented Mar 5, 2021

Uh oh!

jochem-brouwer commented Mar 5, 2021

Uh oh!

Uh oh!

jochem-brouwer commented Mar 27, 2021

Uh oh!

holgerd77 commented Mar 27, 2021

Uh oh!

jochem-brouwer commented Mar 27, 2021

Uh oh!

holgerd77 commented Aug 16, 2021

Uh oh!

Uh oh!

holgerd77 commented Aug 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

acolytec3 commented Aug 16, 2021

Uh oh!

holgerd77 commented Aug 16, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryanio Aug 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ryanio commented Aug 17, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

holgerd77 commented Aug 19, 2021

Uh oh!

holgerd77 commented Aug 19, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

holgerd77 commented Aug 20, 2021

Uh oh!

holgerd77 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented Feb 28, 2021 •

edited

Loading

holgerd77 commented Aug 16, 2021 •

edited

Loading

ryanio Aug 17, 2021 •

edited

Loading