fix(p2p): return Head() early when enough peers confirm the same header by walldiss · Pull Request #372 · celestiaorg/go-header

walldiss · 2026-03-04T18:00:12Z

Summary

Head() now tracks received headers by hash in real time during the collection loop.
Once a header hash reaches minHeadResponses (2) confirmations, outstanding peer requests are cancelled via the shared request context and the function returns immediately.
Previously, Head() waited for ALL trusted peers to respond or time out. With many bootstrappers, a single slow peer could delay the entire call until it ran close to startup deadlines.

Context

Light nodes on mocha fail to start because exchange.Head() waits for all trusted peers (bootstrappers) to respond or time out before returning. With 7 mocha bootstrappers, if even 1 peer is slow to dial (~18s timeout), the entire Head() call takes ~18s, which is dangerously close to the 20s startup deadline. The 4 working peers respond in under 230ms, so the node should return as soon as it has consensus.

Closes #373
Closes https://linear.app/celestia/issue/DA-1157

p2p/exchange.go

Wondertan · 2026-03-05T14:42:33Z

How it works now: We request all peers and give them 90% of the deadline to respond. Those who gave responses within the window are judged for the bestHead.

How it works with PR: We await the first 2 responses with the same hash and return asap.

Both should work, and there is a test proving that the existing solution works as well. The difference is that the original solution intentionally tries to get as many responses as possible to maximise security.

What I think actually broke is that the 10% given for the rest of the operation was not enough for whatever else the node was doing after the Head request, so the ctx deadline was exceeded.

walldiss · 2026-03-05T16:39:21Z

You're right that the current approach intentionally maximizes responses within the 90% window. The problem is exactly what you identified: the remaining 10% isn't enough for what follows.

I think the early return is actually the better approach here. The security threshold (minHeadResponses) is what defines how many agreeing peers we need to trust a head and once that's satisfied, collecting more responses doesn't meaningfully improve security. It just eats into the budget that downstream operations (GetByHeight, syncer init) need to complete within the startup deadline.

So rather than trying to tune the 90/10 split, we should let the security threshold be the only thing that governs when Head returns. Faster startup and the same security guarantee.

Wondertan · 2026-03-05T16:53:38Z

I tend to agree with you here on relying on the threshold. One thing we should probably do then is to increase the threshold and make it more dynamic based on the number of peers/responses we got.

Additionally, the current code keeps both code paths for fast and best heads. The fast head goes on top of bestHead, and bestHead, in fact, never gets triggered. If we go for one way only, we should update the code to do one thing only.
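
One possible shape for the dynamic threshold suggested above is sketched below. The rule used here (a simple majority of the queried peers, never below the old constant of 2, never above the number of peers) is an assumption chosen for illustration. This page does not show the formula that was eventually merged, and `dynamicThreshold` is a hypothetical name.

```go
package main

// minHeadResponses mirrors the fixed threshold of 2 mentioned in the PR summary.
const minHeadResponses = 2

// dynamicThreshold returns how many matching responses to require for a
// given number of queried peers. Assumed rule, not the merged one:
// a simple majority, at least minHeadResponses, at most the peer count.
func dynamicThreshold(peers int) int {
	if peers <= 0 {
		return 0
	}
	t := peers/2 + 1 // simple majority
	if t < minHeadResponses {
		t = minHeadResponses
	}
	if t > peers {
		t = peers // cannot require more responses than peers asked
	}
	return t
}
```

Under this assumed rule, 7 bootstrappers would require 4 matching responses, 3 peers would require 2, and a single peer would require 1.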

codecov-commenter · 2026-03-06T16:01:09Z

Codecov Report

❌ Patch coverage is 89.65517% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.90%. Comparing base (aa5c4a6) to head (bf80e89).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
p2p/exchange.go	88.00%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
+ Coverage   52.99%   54.90%   +1.91%     
==========================================
  Files          41       41              
  Lines        4663     4759      +96     
==========================================
+ Hits         2471     2613     +142     
+ Misses       2007     1951      -56     
- Partials      185      195      +10

p2p/exchange.go

Head() currently waits for ALL trusted peers to respond or timeout before returning. With many bootstrappers, if even one peer is slow to dial (~18s), the entire Head() call takes ~18s — dangerously close to typical startup deadlines. This change tracks received headers by hash in real-time during the collection loop. Once a header hash reaches minHeadResponses (2) confirmations, the function cancels outstanding peer requests via the shared request context and returns immediately.

renaynay

Logic + test LGTM - might want to revisit minHeadResponses threshold @walldiss

p2p/exchange.go

renaynay

Ok w me

walldiss requested review from Wondertan, renaynay and vgonkivs as code owners March 4, 2026 18:00

walldiss mentioned this pull request Mar 4, 2026

Light nodes fail to start on mocha: Head() waits for all trusted peers instead of returning on consensus #373

Closed

walldiss self-assigned this Mar 4, 2026

renaynay reviewed Mar 5, 2026

walldiss force-pushed the fix/head-early-return branch from 0d1f33a to 08d2d8f Compare March 6, 2026 15:16

Wondertan reviewed Mar 9, 2026

walldiss added 5 commits March 9, 2026 18:05

simplify Head: remove bestHead, return matching header directly

0cb1607

nolint gosec G118 for struct-stored context cancels

6b3c729

remove unnecessary 90% deadline split in Head

3836802

make minHeadResponses dynamic based on peer count

3c11a8b

walldiss force-pushed the fix/head-early-return branch from 6edb3b6 to 3c11a8b Compare March 9, 2026 15:06

Wondertan approved these changes Mar 9, 2026

renaynay reviewed Mar 9, 2026

walldiss enabled auto-merge (squash) March 9, 2026 15:47

renaynay approved these changes Mar 9, 2026

walldiss merged commit 9747578 into celestiaorg:main Mar 9, 2026
2 checks passed

walldiss mentioned this pull request Mar 9, 2026

chore(deps): bump go-header to v0.8.4-rc celestiaorg/celestia-node#4837

Merged
