p2p: make dial faster by streamlined discovery process #31678
cskiraly wants to merge 2 commits into ethereum:master
|
Figure: First hour of dial progress before this PR (number of outgoing peers). Figure: First hour of dial progress after this PR and #31592 (number of outgoing peers). |
Force-pushed from a5b1b9d to 016bf44.
|
@fjl I was thinking about the increased UDP traffic. Consider a very small network, smaller than maxpeers, for example a small test network with only a few nodes. In that case discovery might run at full speed continuously, looking for new dial candidates, and this PR would increase the background UDP traffic by roughly 3x.

If we think this is a problem, we could use discoveryParallelLookups=2 to be more conservative. Or should we introduce an explicit "smallnet" flag and tune parameters accordingly? Or can we just expect people to set maxpeers wisely for a small network? I would say the last option is fine, but we need some modifications to make that work. See below.

A short description of what is supposed to happen in detail. There are two cases:
In discovery, we only have duplicate filtering in lookup, by the … Dial pulls from this, and has its own duplicate detection through …

If instead maxpeers is set correctly, dial will try to slow down by reducing slots. But even with a single slot, it will pull and discard candidates in a loop, keeping discovery running at full speed. Should we introduce some time-based mechanism here, e.g. throttle if checkDial fails too many times in a row? |
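The "throttle if checkDial fails too many times in a row" idea could look roughly like the sketch below. This is only an illustration: `failureThrottle`, its fields, and the threshold and pause values are hypothetical and are not part of go-ethereum. It counts consecutive rejected candidates and, past a threshold, returns a pause that doubles per further failure up to a cap.

```go
package main

import (
	"fmt"
	"time"
)

// failureThrottle is a hypothetical helper (not a go-ethereum type).
// It tracks consecutive rejected dial candidates and suggests a pause
// once a threshold is crossed.
type failureThrottle struct {
	threshold int           // consecutive failures tolerated before pausing
	base      time.Duration // first pause duration
	max       time.Duration // upper bound for the pause
	failures  int           // current run of consecutive failures
}

// observe records the outcome of one candidate check and returns how long
// the caller should wait before pulling the next candidate from discovery.
func (t *failureThrottle) observe(ok bool) time.Duration {
	if ok {
		t.failures = 0 // any accepted candidate resets the run
		return 0
	}
	t.failures++
	if t.failures < t.threshold {
		return 0
	}
	// Double the pause for every failure beyond the threshold, capped at max.
	d := t.base
	for i := t.threshold; i < t.failures && d < t.max; i++ {
		d *= 2
	}
	if d > t.max {
		d = t.max
	}
	return d
}

func main() {
	// Illustrative values only.
	th := &failureThrottle{threshold: 3, base: 100 * time.Millisecond, max: time.Second}
	for i := 1; i <= 8; i++ {
		fmt.Println("failure", i, "pause", th.observe(false))
	}
	fmt.Println("after success, pause", th.observe(true))
}
```

In a dial loop, the returned duration would be slept (or fed to a timer) before the next candidate is pulled, so a network with no new nodes to find stops driving discovery continuously.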
|
Separated from #31944. Now this PR only contains the parallelized lookup part. |
|
I think we have all of this in already. |
|
Actually the parallelised lookup part is not yet merged. I'm reopening this to rebase what's still missing. |
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
|
We have discussed implementing this in a different way. Closing. |






This PR improves the speed of Disc/v4 and Disc/v5 based discovery. It builds on #31592. To be rebased after merging that PR.
Our dial process is rate-limited, but until now, during startup, our discovery was too slow to serve dial. So the bottleneck was discovery, not the rate limit in dial, resulting in slow ramp-up of outgoing peer count.
This PR makes discovery fast enough to serve dial's needs. Dial's rate limit still bounds the discovery process, so we are not risking being too aggressive in our discovery.
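The interplay described above, several lookups running in parallel while the consumer's pace still bounds them, can be sketched as follows. This is a minimal illustration and not the PR's code: `runLookups`, `lookupFunc`, and `fakeLookup` are hypothetical names, and the worker count of 3 is only an example. The workers send into an unbuffered channel, so each send blocks until the reader (standing in for dial) takes the candidate.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for one discovery lookup (hypothetical signature);
// it returns a batch of node identifiers.
type lookupFunc func(worker, round int) []string

// runLookups starts `parallel` workers that each perform `rounds` lookups
// and send the results into an unbuffered channel. A send blocks until the
// consumer receives, so the consumer's pace bounds how fast lookups proceed.
func runLookups(parallel, rounds int, lookup lookupFunc) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for w := 0; w < parallel; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				for _, id := range lookup(w, r) {
					out <- id // blocks while the consumer is busy
				}
			}
		}(w)
	}
	// Close the channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fakeLookup fabricates two distinct IDs per call, for demonstration only.
func fakeLookup(worker, round int) []string {
	return []string{
		fmt.Sprintf("node-%d-%d-a", worker, round),
		fmt.Sprintf("node-%d-%d-b", worker, round),
	}
}

func main() {
	seen := make(map[string]bool)
	for id := range runLookups(3, 2, fakeLookup) {
		seen[id] = true // a real consumer would rate-limit and dial here
	}
	fmt.Println("unique candidates:", len(seen))
}
```

With a slow reader the workers spend most of their time blocked on the send, which is the sense in which a rate-limited consumer keeps parallel discovery from running unbounded.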
The PR adds: