Client: keep syncing at tip of chain #1132
Force-pushed from 8aa8341 to 1626309.
Rebased this one, or rather cherry-picked the new commit.

I will take a look at the tests ASAP; they probably fail because this Promise in the full sync now never resolves.

Yolo's bootnode is down =(

Yes, I guess Yolo has been shut down completely 😕, a bit sad, right? Maybe we want to get quickly to a situation where we can spin up our own networks between two or more clients; then we can do some proper integration testing.

Yeah, I think we need some tooling to import custom genesis blocks so that we can set up a development environment that syncs with private nodes.
Force-pushed from 1626309 to 99fe9fc.

Rebased this.

Short write-up on how to get to a quick test scenario with two local clients:

```shell
# 1. Start a dev1 client to fill in the first few hundred blocks, then cut it off quickly
#    (Goerli works more smoothly than mainnet here)
npm run client:start -- --network=goerli --datadir=datadir-dev1
# 2. Restart the dev1 client in a useful debug setup with discovery switched off, and keep it running
DEBUG=devp2p:ETH npm run client:start:dev1 -- --datadir=datadir-dev1
# 3. From another window, start a dev2 client that connects to the dev1 client;
#    it should sync those first few hundred blocks
rm -Rf ./datadir-dev2 && DEBUG=devp2p:ETH npm run client:start:dev2 -- --datadir=datadir-dev2
```

This should lead to something like this:
Nice! I've been trying to conceive of how to approach this. |
Ok, this should already take us a good way in the right direction. I've now reworked larger parts of this PR and then integrated some more fine-grained and practical setting of the […]. I am not claiming that I've found the optimal structures everywhere here, so @jochem-brouwer and others, feel free to evolve this further if you have a strong feeling about reworking things. Hope this is a good start, though. 😄 //cc @gabrocheleau @acolytec3
```ts
this.pool.size
}`
// eslint-disable-next-line no-async-promise-executor
return await new Promise(async () => {
```
Hm, there is no `resolve` or `reject` on this Promise.
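To illustrate why the missing `resolve`/`reject` matters, here is a minimal standalone sketch (not client code): a Promise built with an `async` executor that never calls `resolve` stays pending forever, so anything awaiting it, such as a test, simply hangs. The `withTimeout`, `neverSettles` and `settles` helpers are hypothetical and only show one way to make such a hang visible.

```typescript
// Settles like `p`, but rejects if `p` has not settled within `ms` milliseconds.
// Hypothetical helper for illustration; not part of the client codebase.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms)
  })
  // Clear the timer whichever promise wins, so the process can exit cleanly.
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer))
}

// Mirrors the reviewed pattern: an async executor that never calls resolve/reject.
function neverSettles(): Promise<boolean> {
  // eslint-disable-next-line no-async-promise-executor
  return new Promise<boolean>(async () => {
    await Promise.resolve() // some async work...
    // ...but resolve() is never called, so this Promise stays pending forever
  })
}

// Same shape, but with an explicit path to resolve(true).
function settles(): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    setTimeout(() => resolve(true), 10)
  })
}
```

In a test, wrapping the awaited call in something like `withTimeout` turns a silent hang into a failing assertion with a message.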
Hm, maybe this is a place where `Event.SYNC_SYNCHRONIZED` would be useful: we could resolve when it is received, if the TypeDoc above is still accurate (`@return Resolves when sync completed`).
(Edit: I tentatively added this, although we can reevaluate whether it's appropriate here. If not, please feel free to remove it, but we should consider how to have at least one path to `resolve(true)`.)
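A minimal sketch of what "resolve when `SYNC_SYNCHRONIZED` is received" could look like, assuming the events go through a Node `EventEmitter`. The event-name strings, the `sync` signature and the simplified `peer` type are hypothetical placeholders, not the client's actual API.

```typescript
import { EventEmitter } from 'events'

// Hypothetical stand-ins for the client's event names; the real Event enum may differ.
const Event = {
  SYNC_SYNCHRONIZED: 'sync:synchronized',
  SYNC_ERROR: 'sync:error',
} as const

// Sketch of a sync() that resolves once the synchronized event arrives.
// `events` and `peer` are simplified placeholders, not the client's real types.
function sync(events: EventEmitter, peer: { id: string } | undefined): Promise<boolean> {
  // A plain (non-async) executor, so no-async-promise-executor needs no disabling.
  return new Promise<boolean>((resolve, reject) => {
    // No peer to sync from: settle immediately instead of hanging.
    if (!peer) return resolve(false)

    const onSynced = () => {
      events.off(Event.SYNC_ERROR, onError) // drop the now-unneeded error listener
      resolve(true)
    }
    const onError = (err: Error) => {
      events.off(Event.SYNC_SYNCHRONIZED, onSynced)
      reject(err)
    }
    events.once(Event.SYNC_SYNCHRONIZED, onSynced)
    events.once(Event.SYNC_ERROR, onError)
    // ...start fetching here (omitted in this sketch)...
  })
}
```

Removing the other listener on each path keeps repeated `sync()` calls from accumulating stale handlers on the shared emitter.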
Ok, I was able to accomplish quite a few things; please browse my added commits for details on each one. The integration tests still aren't fully working for me yet; my last commit helped, but now […]
```ts
})
if (min.eqn(-1)) {
  return
handleNewBlockHashes(data: [Buffer, BN][]) {
```
Ah, thanks, that's a lot better! 😄
```diff
-return await new Promise(async () => {
-  if (!peer) return false
+return await new Promise(async (resolve, reject) => {
+  if (!peer) return resolve(false)
```
I'm still no Promise expert; what would happen if we skipped this whole Promise wrapping? 🤔
Generally, this whole Promise resolving has become less important since the sync status determination was extracted into listening and reacting to chain updates (I already thought about just returning `Promise<void>` from this function).
We just need to make sure that no concurrent `sync()` calls are triggered in the `Sync.start()` call.
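One way to make sure `start()` never triggers overlapping `sync()` runs is to remember the in-flight Promise and hand it back to any caller that arrives while it is still pending. This is a hedged sketch: `Synchronizer`, `syncInProgress`, `doSync` and `runs` are hypothetical names, not the client's classes.

```typescript
// Minimal sketch of guarding against overlapping sync() runs.
// All names here are hypothetical, chosen for illustration only.
class Synchronizer {
  private syncInProgress: Promise<boolean> | null = null
  private readonly doSync: () => Promise<boolean>
  runs = 0 // number of sync runs actually started (for illustration)

  constructor(doSync: () => Promise<boolean>) {
    this.doSync = doSync
  }

  // May be called repeatedly from start() (timer, new peer, ...);
  // never runs two syncs at the same time.
  sync(): Promise<boolean> {
    if (this.syncInProgress) {
      // A sync is already running: share its result instead of starting another.
      return this.syncInProgress
    }
    this.runs++
    this.syncInProgress = this.doSync().finally(() => {
      // Clear the guard once this run settles (resolved or rejected).
      this.syncInProgress = null
    })
    return this.syncInProgress
  }
}
```

Sharing the pending Promise (rather than returning early with `false`) means every caller still learns the outcome of the run that is actually happening.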
Update: I tested removing the Promise wrapper; the run then stops at another test in the `sync.spec.ts` unit test. I couldn't investigate further, and I still don't completely understand why (or whether) this is needed.
I believe it's part of some Promise chain that awaits results, but I'm sure we could find an alternative architecture. I think it is probably a bit redundant now: since it just awaits the SYNC'd event, we can probably just listen for that event wherever this Promise is actually being used.
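A sketch of the "just listen for the event" alternative, assuming a Node `EventEmitter` and Node's `events.once()` helper: `sync()` returns `Promise<void>` and the caller awaits the synchronized event directly. `EventDrivenSyncer`, `runUntilSynced` and the event-name string are hypothetical, and the event emission inside `sync()` is simulated with a timer.

```typescript
import { EventEmitter, once } from 'events'

const SYNC_SYNCHRONIZED = 'sync:synchronized' // hypothetical event name

// Hypothetical syncer whose sync() only kicks off work and returns Promise<void>;
// the "we are synced" signal travels via the event, not the return value.
class EventDrivenSyncer {
  readonly events: EventEmitter

  constructor(events: EventEmitter) {
    this.events = events
  }

  async sync(): Promise<void> {
    // Simulated: pretend the chain caught up shortly after fetching started,
    // and a chain-update handler emits the event with the new height (42).
    setTimeout(() => this.events.emit(SYNC_SYNCHRONIZED, 42), 10)
  }
}

// Consumer: wherever the old Promise result was used, wait for the event instead.
async function runUntilSynced(syncer: EventDrivenSyncer): Promise<number> {
  const synced = once(syncer.events, SYNC_SYNCHRONIZED) // subscribe before triggering
  await syncer.sync()
  const [height] = await synced
  return height as number
}
```

Subscribing before calling `sync()` avoids a race where the event fires before anyone is listening.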
…eumClient to config
…e, get latest block number from status msg
Force-pushed from 0947669 to cd2245c.

Rebased this.
…UPDATED handling to also work for light clients
Ok, the tests are progressing further with the latest changes in 28bf3b4, but they are still not finishing the […]
```ts
  } else if (best?.les) {
    targetHeight = new BN(best.les.status.headNum)
  }
}
```
I guess we want to have this to directly set the sync target height on the first iteration.
I have given this a bit more thought. The timestamp from the block is actually not a good indicator of the chain being synced either. For example, take a stalled chain with 3 participants and 2 signers, where we are one signer and the other signer dropped out some time ago (so it would be our turn to produce the next block): there simply won't be any new NEW_BLOCK_HASHES triggers that would update the chain and bring us into a synced state, so we would never decide to start producing blocks.
I guess with this combination (first setting the target to the peer height and then, soon after, to the block height from NEW_BLOCK_HASHES), we should already cover a good range of scenarios (in the "normal" case, a NEW_BLOCK_HASHES message should also come in fairly soon).
One scenario that is not covered yet, though: in a two-peer setup where we are the block producer and the other peer is just waiting for us, how do we decide that we "are synced" and should therefore start/continue producing blocks? 🤔
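The two-step target update discussed above (first the best peer's reported height, then any higher number announced via NEW_BLOCK_HASHES) can be sketched as follows. This is a hedged illustration: `SyncTarget` and `setFromPeer` are hypothetical, and it uses plain `number` instead of `BN` and `Uint8Array` instead of `Buffer` to stay dependency-free.

```typescript
// Sketch: track a sync target height that only ever moves forward.
// SyncTarget/setFromPeer are hypothetical; `number` stands in for BN.
type BlockHashAnnouncement = [Uint8Array, number] // [block hash, block number]

class SyncTarget {
  height = 0

  // First iteration: take the best peer's reported head height, if known.
  setFromPeer(peerHeight: number | undefined): void {
    if (peerHeight !== undefined && peerHeight > this.height) this.height = peerHeight
  }

  // Later: raise the target to the highest block number announced by peers.
  handleNewBlockHashes(data: BlockHashAnnouncement[]): void {
    for (const [, num] of data) {
      if (num > this.height) this.height = num
    }
  }
}
```

Note that this alone does not solve the two-peer producer case: if no peer is ahead and nobody announces anything, the target never changes.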
Ok, b24dc4a is my first take on this: wait for some time to see whether we find a best peer with a higher target height (in the meantime there is also a chance that some NEW_BLOCK_HASHES come in), and otherwise set the sync target height to the local DB height and emit a SYNCHRONIZED event.
This should now also cover the two-peer scenario and give us the basics to start on block production.
(Really tricky edge cases here, though. I wonder, e.g., whether we should add an explicit flag to allow block production from genesis onwards, so that we don't accidentally start new chains when there is no network connectivity.)
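The wait-then-fall-back decision described above could look roughly like this. It is a sketch under stated assumptions, not the code from b24dc4a: `determineSyncTarget`, the `bestPeerHeight` lookup, the polling interval and the event-name string are all hypothetical.

```typescript
import { EventEmitter } from 'events'

const SYNC_SYNCHRONIZED = 'sync:synchronized' // hypothetical event name

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

// Poll for a best peer that is ahead of us for up to `waitMs`. If one shows up,
// sync towards its height; otherwise use the local DB height as the target and
// declare ourselves synchronized (e.g. the two-peer case where we produce blocks).
async function determineSyncTarget(
  localHeight: number,
  bestPeerHeight: () => number | undefined, // hypothetical peer-pool lookup
  events: EventEmitter,
  waitMs = 1000,
  pollMs = 100
): Promise<{ target: number; synchronized: boolean }> {
  const deadline = Date.now() + waitMs
  while (Date.now() < deadline) {
    const peerHeight = bestPeerHeight()
    if (peerHeight !== undefined && peerHeight > localHeight) {
      return { target: peerHeight, synchronized: false } // keep syncing
    }
    await sleep(pollMs)
  }
  // Nobody is ahead of us: treat the local chain as the tip.
  events.emit(SYNC_SYNCHRONIZED, localHeight)
  return { target: localHeight, synchronized: true }
}
```

The length of `waitMs` is the key trade-off: too short and a node with flaky connectivity may wrongly consider itself synced (the genesis edge case above), too long and block production is delayed.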
… on local DB is found
…ds test fixes before activation)
Phew, this was the most intense debug session I've had this year, I guess: roughly three hours straight, testing close to everything to see why these tests are hanging in […]. I did try to get hold of the hanging processes by using […]. I manually injected some […]. I dug super deep into the […]. Finally, I went back through all the commits. The complete hanging was actually already there after the first commit from Jochem that started the work here.

So, as a side note and (re-)learning: we really should do client work and commits on a very granular basis and run the tests extremely regularly. Otherwise it gets very hard to track things down, especially since timing, handler, and unresolved-Promise issues often come into play as well.

I am now at least at a point where the tests pass every second time or so. Lol. Hopefully this can be resolved with a timer or something similar. This is very much open for everyone to investigate and continue.

I would strongly tend to merge this PR once we have the tests running reliably. Otherwise this is getting too extensive, and I think we have a good basis here. This can then already be used to continue work, e.g. on the tx pool or the block builder, and things from here can still be revised or evolved later. These kinds of changes will also get easier to debug once we have the local chain sync functionality (so tx pool + block builder) in place, which will let us simulate various edge cases locally very easily.
holgerd77 left a comment
Ok, I would now merge this in so that we can build on top of it in other PRs; also see my latest comment on that.
This has also been cross-reviewed during the work process, since three people worked on the PR.

This PR intends to keep the fullsync-syncer syncing at the tip of the chain. (Note: current branch target is the YoloV3 branch)
- Add `enqueueTask` to the fetcher. This allows one to manually enqueue a task, and possibly restart the fetcher.
- The `sync` method never resolves.

This is very WIP. Here are some things which need to be resolved:

- Currently, it seems to work at random for some blocks (when we are at the tip of the chain, i.e. on YoloV3, it imports and executes a new block every 15 seconds), but then at some point it throws: