fix(p2p): return Head() early when enough peers confirm the same header#372
Conversation
|
How it works now: We request all peers and give them 90% of the deadline to respond. Those who gave responses within the window are judged for the bestHead. How it works with PR: We await the first 2 responses with the same hash and return asap. Both should work, and there is a test proving that the existing solution works as well. The difference is that the original solution intentionally tries to get as many responses as possible to maximise security. What I think actually broke is that the 10% given for the rest of the operation was not enough for whatever else the node was doing after the Head request, leading to ctx deadline. |
|
You're right that the current approach intentionally maximizes responses within the 90% window. The problem is exactly what you identified and the remaining 10% isn't enough for what follows. I think the early return is actually the better approach here. The security threshold (minHeadResponses) is what defines how many agreeing peers we need to trust a head and once that's satisfied, collecting more responses doesn't meaningfully improve security. It just eats into the budget that downstream operations (GetByHeight, syncer init) need to complete within the startup deadline. So rather than trying to tune the 90/10 split, we should let the security threshold be the only thing that governs when Head returns. Fasterstartup and the same security guarantee. |
|
I tend to agree with you here on relying on the threshold. One thing we should probably do then is to increase the threshold and make it more dynamic based on the number of peers/responses we got. Additionally, the current code keeps both code paths for fast and best heads. The fast head goes on top of bestHead, and bestHead, in fact, never gets triggered. If we go for one way only, we should update the code to do one thing only. |
0d1f33a to
08d2d8f
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #372 +/- ##
==========================================
+ Coverage 52.99% 54.90% +1.91%
==========================================
Files 41 41
Lines 4663 4759 +96
==========================================
+ Hits 2471 2613 +142
+ Misses 2007 1951 -56
- Partials 185 195 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Head() currently waits for ALL trusted peers to respond or timeout before returning. With many bootstrappers, if even one peer is slow to dial (~18s), the entire Head() call takes ~18s — dangerously close to typical startup deadlines. This change tracks received headers by hash in real-time during the collection loop. Once a header hash reaches minHeadResponses (2) confirmations, the function cancels outstanding peer requests via the shared request context and returns immediately.
6edb3b6 to
3c11a8b
Compare
Summary
Head()now tracks received headers by hash in real-time during the collection loopminHeadResponses(2) confirmations, outstanding peer requests are cancelled via the shared request context and the function returns immediatelyHead()waited for ALL trusted peers to respond or timeout — with many bootstrappers, a single slow peer could delay the entire call close to startup deadlinesContext
Light nodes on mocha fail to start because
exchange.Head()waits for all trusted peers (bootstrappers) to respond or timeout before returning. With 7 mocha bootstrappers, if even 1 peer is slow to dial (~18s timeout), the entireHead()call takes ~18s — dangerously close to the 20s startup deadline. The 4 working peers respond within <230ms, so the node should return as soon as it has consensus.Closes #373
Closes https://linear.app/celestia/issue/DA-1157