Client: Calaveras Sync Improvements + Working Branch by holgerd77 · Pull Request #1310 · ethereumjs/ethereumjs-monorepo

holgerd77 · 2021-06-21T08:11:21Z

Related: #1306

Working branch to sync with the Calaveras network, also coming with some calaveras related improvements (feel free to directly push on this branch or in doubt make a PR towards the branch if you have got things to add).

First thing I added here is an official maxPerRequest CLI option, then we don't always have to monkey-patch if running with a 50 maxPerRequest setting doesn't work but can just run npm run client:start -- --network=calaveras --maxPerRequest=1.

The option should be generally useful (so: not just a monkey-patch-ease option 🙂) since it will be likely useful for peopel to play a bit with this, this will likely allow to get to somewhat higher download rates if adjusted due to the specific network and ones own internet connection speed and things like that.

codecov · 2021-06-21T08:12:59Z

Codecov Report

Merging #1310 (2192982) into master (855e154) will decrease coverage by 0.17%.
The diff coverage is 44.44%.

Flag	Coverage Δ
block	`86.20% <ø> (-0.25%)`	⬇️
blockchain	`83.46% <ø> (+0.06%)`	⬆️
client	`84.00% <44.44%> (-0.68%)`	⬇️
common	`88.12% <ø> (ø)`
devp2p	`84.00% <ø> (ø)`
ethash	`82.83% <ø> (ø)`
tx	`89.49% <ø> (-0.12%)`	⬇️
vm	`76.83% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

holgerd77 · 2021-06-21T08:23:27Z

This should also be helpful and give some more orientation and feeling for the network behavior: outputting the base fee during sync along london blocks or later:

holgerd77 · 2021-06-21T10:28:30Z

Yeah! 🎉

This db72a6d actually fixes the maxPerRequest issue and sync is now working with the normal 50 block requests set 😄, we can keep the maxPerRequest option nevertheless I guess.

The block fetcher didn't allow for result sets lower than the requested number before and on Calaveras sync a cut happened on the result set in the first spam-blocks-range (2000-2200 or so) due to the large block sizes (as happened on Baikal before). The fix now allows for results with a lower lengths to be processed (a zero length is still not allowed though to prevent circular requests) and then re-calcuates the first block numbers of the outstanding jobs in the request queue. This should make the client sync generally more resilient for these kind of situations.

holgerd77 · 2021-06-21T11:09:51Z

Ah, also wanted to post this screenshot along:

Sync is incredibly slow along these blocks. But at the end, these are also an awful lot of transactions. Might nevertheless be a good test bed to have a look for some bootlenecks (I guess these are all simple value transfer txs to the same account. I did some tests here with this permanent cache (surviving checkpointing) for MPT idea (//cc @gabrocheleau) but this had not really an effect here (which makes sense since there is just not much state touched with this).

holgerd77 · 2021-06-21T12:01:37Z

Don't want to side track too much, but I did some rough performance analysis by running the client on the spammed calaveras blocks with the VM debugger DEBUG=vm:* npm run client:start -- --network=calaveras and had a closer look where the ms diff outputs are (this is actually a not too shabby method to get some pointers on where bootlenecks are. The places for such diffs are very much repeatable, one has to look closely if the debug output is actually before or after an associated method call (at some point we should unify and always call after the execution, makes more sense).

Here is a picture of some associated debug output where I then added some console.log() statements outputting time diffs for closer analysis.

This is the associated code in VM runBlock.ts:

// Add to total block gas usage
gasUsed = gasUsed.add(txRes.gasUsed)
if (this.DEBUG) {
  debug(`Add tx gas used (${txRes.gasUsed}) to total block gas usage (-> ${gasUsed})`)
}
console.log(`B1${(new Date()).getSeconds()}:${(new Date()).getMilliseconds()}`)
// Combine blooms via bitwise OR
bloom.or(txRes.bloom)
console.log(`B2${(new Date()).getSeconds()}:${(new Date()).getMilliseconds()}`)

// Add receipt to trie to later calculate receipt root
receipts.push(txRes.receipt)
console.log(`B3${(new Date()).getSeconds()}:${(new Date()).getMilliseconds()}`)
const encodedReceipt = encodeReceipt(tx, txRes.receipt)
console.log(`B4${(new Date()).getSeconds()}:${(new Date()).getMilliseconds()}`)
const encoded = encode(txIdx)
console.log(`B4.2${(new Date()).getSeconds()}:${(new Date()).getMilliseconds()}`)
await receiptTrie.put(encoded, encodedReceipt)
console.log(`B5${(new Date()).getSeconds()}:${(new Date()).getMilliseconds()}`)

The large time diff is happening between B4.2 and B5 and can therefore clearly attributed to the update of the receiptTrie.

I find this pretty interesting. On the one hand it is the Trie library, so the usual suspects. 😋 On the other hand I totally not had these smaller trie usages like the receipt trie - might be the same for the tx trie e.g. - on the radar, so we might also want to give this some focus. This is interesting because these tries should be pretty small.

So too things which might be drawn from this short experiment:

We might want to look if especially the execution of small trie setups can be improved (made more performant)
We might dedicately want to look at the trie usage code parts and integrations for these smaller tries

holgerd77 · 2021-06-21T12:02:11Z

(side note: I have no idea why this stupid client integration test is failing 🙁)

holgerd77 · 2021-06-21T12:31:42Z

But apart from that: Calaveras is actually syncing, just synced up successfully up to block 60.142 (see block exporer for reference on latest block). 😄

holgerd77 · 2021-06-21T12:36:22Z

Still not sure about the test failure though, this is happening between commit 3b8d315 and 2a00695, if someone wants to give it a try run:

git checkout 3b8d315
npx tape -r ts-node/register test/integration/fullsync.spec.ts

Should work.

git checkout 2a00695
npx tape -r ts-node/register test/integration/fullsync.spec.ts

Should fail.

Really strange though, if I didn't miss something the extra if else clause from 2a00695 is not even run once along the test run. 🤨

ryanio · 2021-06-21T23:07:16Z

The large time diff is happening between B4.2 and B5 and can therefore clearly attributed to the update of the receiptTrie.

I might be missing something but doesn't your screenshot show:

B4.245:486
B545:487

Doesn't that show only :001 ms elapsed between them?

ryanio · 2021-06-22T01:07:18Z

Ok this failing integration test was definitely an interesting and tricky one. So it sets up two peers, one with 9 blocks and one with 10 blocks, and wants to ensure that it syncs all 10 blocks.

In the code prior to this PR, it would reject the 9 block result since it requested 10. This is actually not really how we would have wanted the test to behave, so I'm happy we have some new and improved code here.

The test fails because it receives the 'synchronized' result with 9 blocks first and fails expecting 10.

I updated the integration test to wait to listen for a possible next event and check that the length is correct.

I also cleaned up some of the no-unnecessary-condition since it is safe to just check the length of the object.

Will rebase on master.

…ith BlockFetcherBase version)

…requested job count in BlockFetcher and HeaderFetcher

holgerd77 · 2021-06-22T07:27:21Z

The large time diff is happening between B4.2 and B5 and can therefore clearly attributed to the update of the receiptTrie.

I might be missing something but doesn't your screenshot show:
B4.245:486
B545:487
Doesn't that show only :001 ms elapsed between them?

Yes, I guess that's the time diff range we are operating in here. That's 1 ms and if you do the math that means for a single block with 1000 txs this makes a difference of 1s (or half a second if we would let's say double the performance). That's not nothing respectively substantial, if we catch 3-4 of such somewhat smaller bottlenecks and could improve here that would already mean a performance increase of some seconds per block. Correct me if I got something wrong here? 🤔

Thanks a lot for digging into this test issue, really grateful here 🙂 🌻, I guess I would likely not have found this myself. I also have the impression that these test code parts produced problems before occasionally, so would be great if we now have a structural improvement here which prevents this on future test runs.

Since you commented on all the changes I've done and Calaveras is actually syncing successfully I guess I will just take this as an informal review and then just merge here. There are not really outstanding tasks and one benefit of this might be - if someone finds the time - that it would be a good occasion to test and improve on the on-tip-of-the-chain behavior from #1132 as long as Calaveras is still up and running. 😄

holgerd77

Will take approving comments from @ryanio as an informal review and merge here (see previous comment for some reasoning).

holgerd77 added type: feature type: meta PR state: WIP type: enhancement package: client labels Jun 21, 2021

holgerd77 force-pushed the calaveras-sync branch from f770293 to db72a6d Compare June 21, 2021 10:22

holgerd77 force-pushed the calaveras-sync branch from db72a6d to 2a00695 Compare June 21, 2021 10:30

holgerd77 and others added 8 commits June 21, 2021 18:07

client: removed redundant maxPerRequest setting in BlockFetcherBase

38a9c98

client: added maxPerRequest as a configuration option

6078c48

client: output basefee on block sync being on london HF or greater

6c9e94f

client: maxPerRequest related test fix

99ecbe7

client: removed redundant tasks() method in BlockFetcher (identical w…

413fb0e

…ith BlockFetcherBase version)

client: allow processing of result sets with a length lower than the …

619cf7b

…requested job count in BlockFetcher and HeaderFetcher

fix integration test

dd239d0

remove no-unnecessary-condition

2192982

ryanio force-pushed the calaveras-sync branch from 2a00695 to 2192982 Compare June 22, 2021 01:08

holgerd77 commented Jun 22, 2021

View reviewed changes

holgerd77 merged commit 75aff49 into master Jun 22, 2021

holgerd77 deleted the calaveras-sync branch June 22, 2021 07:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Client: Calaveras Sync Improvements + Working Branch#1310

Client: Calaveras Sync Improvements + Working Branch#1310
holgerd77 merged 8 commits into
masterfrom
calaveras-sync

holgerd77 commented Jun 21, 2021

Uh oh!

codecov Bot commented Jun 21, 2021 •

edited

Loading

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

ryanio commented Jun 21, 2021

Uh oh!

ryanio commented Jun 22, 2021 •

edited

Loading

Uh oh!

holgerd77 commented Jun 22, 2021

Uh oh!

holgerd77 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

holgerd77 commented Jun 21, 2021

Uh oh!

codecov Bot commented Jun 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

holgerd77 commented Jun 21, 2021

Uh oh!

ryanio commented Jun 21, 2021

Uh oh!

ryanio commented Jun 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

holgerd77 commented Jun 22, 2021

Uh oh!

holgerd77 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 21, 2021 •

edited

Loading

ryanio commented Jun 22, 2021 •

edited

Loading