
Conversation


@alpe alpe commented Nov 27, 2025

Overview

Resolves #2609

The basic idea is to store an additional DAHeightHint field within the p2p store.

As SignedHeader and Data are used in other places too, I added a DAHeightHintContainer type to wrap the tuple for the store only.
The DA height hint is added by the da_submitter and read in the syncer to fetch the missing DA header/data for the most recent block as required.

Please note: this is a breaking change to the p2p network and store.
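
A minimal sketch of the wrapper idea (simplified; the Entry field name is an assumption, and header.Header comes from the go-header library):

type DAHeightHintContainer[H header.Header[H]] struct {
    Entry        H      // the wrapped SignedHeader or Data
    DAHeightHint uint64 // DA layer height known to contain the entry, 0 if unknown
}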


claude bot commented Nov 27, 2025

Claude finished @alpe's task in 4m 43s


Comprehensive Code Review: DA Hints in P2P

I've completed a thorough review of the DA hints implementation. Overall, this is a well-designed feature with strong architectural decisions, particularly the binary compatibility approach. Below are my detailed findings organized by focus area.


1. Code Quality & Architecture ⭐

Strengths

  • Excellent binary compatibility design: The P2PSignedHeader and P2PData types with optional protobuf fields elegantly solve the migration problem
  • Clean separation of concerns: DA hints are isolated to p2p layer while preserving existing types.SignedHeader and types.Data API
  • Generic interface design: EntityWithDAHint[V] interface in sync_service.go:39-43 is well-abstracted
  • Async DA retrieval: The worker pool pattern in AsyncDARetriever is a solid performance optimization (see the sketch after this list)
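
For context, a minimal sketch of that worker-pool shape (field names follow this review; exact signatures are assumptions):

type AsyncDARetriever struct {
    ctx      context.Context
    wg       sync.WaitGroup
    workCh   chan uint64         // buffered (100 in the PR)
    mu       sync.Mutex
    inFlight map[uint64]struct{} // dedupes heights already being fetched
}

func (r *AsyncDARetriever) start(workers int) { // 5 workers in the PR
    for i := 0; i < workers; i++ {
        r.wg.Add(1)
        go func() {
            defer r.wg.Done()
            for {
                select {
                case height := <-r.workCh:
                    r.processRetrieval(height) // fetch from DA, forward events
                case <-r.ctx.Done():
                    return
                }
            }
        }()
    }
}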

Issues & Recommendations

🔴 CRITICAL: Unused DAHeightHintContainer type

Location: pkg/sync/da_hint_container.go

This entire file (65 lines) appears to be dead code:

  • Not referenced anywhere in the codebase except comments
  • The PR initially introduced this as a wrapper type but then switched to P2PSignedHeader/P2PData
  • The MarshalBinary implementation silently drops the DA hint (lines 58-60), which would be a serious bug if used

Recommendation: Remove this file entirely in a follow-up commit to avoid confusion.

🟡 Interface over-delegation in da_hint_container.go

If this type were to be used, it delegates every header.Header[H] method to the wrapped entry. This is boilerplate-heavy. Go does not allow embedding the type parameter H directly, but embedding the instantiated interface achieves the same method promotion:

type DAHeightHintContainer[H header.Header[H]] struct {
    header.Header[H] // promoted methods replace the hand-written delegation
    DAHeightHint uint64
}

🟡 Inconsistent naming: DaHeight vs DAHeight

  • Protobuf uses da_height_hint (snake_case)
  • Go uses DAHeightHint (correct per Go conventions)
  • Code has mixed usage: DaHeightHints (syncer.go:513) vs DAHeightHint elsewhere

Recommendation: Ensure consistent casing in Go code (use DAHeightHint everywhere).


2. Error Handling & Resilience ⭐⭐

Strengths

  • AsyncDARetriever gracefully handles failures and logs them (async_da_retriever.go:99-101)
  • P2PHandler validates proposer address and data hash consistency (p2p_handler.go:90-108)
  • Context cancellation properly handled throughout

Issues & Recommendations

🔴 Silent error swallowing in DA hint append

Location: da_submitter.go:201-204, da_submitter.go:250-253

if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append da height hint in header p2p store")
    // ignoring error here, since we don't want to block the block submission'
}

Issues:

  1. Typo in comment: "block the block submission'" (trailing apostrophe)
  2. This silently creates inconsistent state where DA submission succeeds but p2p store isn't updated
  3. No metrics recorded for this failure

Recommendation:

if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append DA height hint to header p2p store")
    s.recordFailure(common.DAHintAppendFailure) // Add new metric
    // Continue - don't block submission, but track this degraded state
}

🟡 Missing context deadline in AsyncDARetriever

Location: async_da_retriever.go:98

The RetrieveFromDA call uses the global context without timeout. If DA retrieval hangs, workers are blocked indefinitely.

Recommendation:

func (r *AsyncDARetriever) processRetrieval(height uint64) {
    defer func() {
        r.mu.Lock()
        delete(r.inFlight, height)
        r.mu.Unlock()
    }()

    // Add timeout for DA retrieval
    ctx, cancel := context.WithTimeout(r.ctx, 30*time.Second)
    defer cancel()

    events, err := r.retriever.RetrieveFromDA(ctx, height)
    if err != nil {
        r.logger.Debug().Err(err).Uint64("height", height).Msg("async DA retrieval failed")
        return
    }
    // ...
}

🟡 No error handling for dropped requests

Location: async_da_retriever.go:73-75

When the work channel is full, requests are silently dropped with only a debug log. This could lead to data loss.

Recommendation: Add metrics and consider a buffered pending queue or backpressure mechanism.
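
A sketch of tracked drops (the droppedRequests counter and metric wiring are assumptions):

select {
case r.workCh <- height:
default:
    // Queue full: undo the in-flight reservation and account for the drop
    // instead of losing the request silently.
    r.mu.Lock()
    delete(r.inFlight, height)
    r.mu.Unlock()
    r.droppedRequests.Add(1) // e.g. an atomic.Int64 or a prometheus counter
    r.logger.Warn().Uint64("height", height).Msg("DA retrieval request dropped: work queue full")
}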


3. Security 🔒

Strengths

  • Proposer address validation prevents unauthorized blocks (p2p_handler.go:132-137)
  • Data hash validation prevents data/header inconsistency (p2p_handler.go:104-108)
  • Signature validation deferred to proper execution context (p2p_handler.go:110-111)

Issues & Recommendations

🟢 No critical security issues found

Minor observations:

  • DA hints are untrusted data from p2p peers. The code correctly treats them as hints, not authoritative (good!)
  • No hint validation (e.g., DA height should be ≥ block height), but this is acceptable since hints are advisory

4. Performance & Resource Efficiency ⚡

Strengths

  • Worker pool design: 5 concurrent workers (async_da_retriever.go:43) is reasonable
  • Deduplication: In-flight map prevents redundant DA fetches (async_da_retriever.go:65-67)
  • Non-blocking hint requests: RequestRetrieval is non-blocking (async_da_retriever.go:59-76)
  • Buffered channels: Work channel has 100 buffer (async_da_retriever.go:33)

Issues & Recommendations

🟡 Worker count hardcoded

Location: async_da_retriever.go:43

for i := 0; i < 5; i++ {

Recommendation: Make this configurable:

const defaultWorkerCount = 5

func NewAsyncDARetriever(
    retriever DARetriever,
    resultCh chan<- common.DAHeightEvent,
    logger zerolog.Logger,
    workerCount int, // Add parameter
) *AsyncDARetriever {
    if workerCount <= 0 {
        workerCount = defaultWorkerCount
    }
    // ...
}

🟡 Unbounded in-flight map

Location: async_da_retriever.go:16

The inFlight map could grow unbounded if DA retrieval consistently fails. Consider adding a maximum size or TTL.
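
One way to bound it, sketched with request timestamps in place of struct{} values (the sweep cadence and field change are assumptions):

// inFlight map[uint64]time.Time: height -> when the fetch was requested
func (r *AsyncDARetriever) sweepInFlight(maxAge time.Duration) {
    r.mu.Lock()
    defer r.mu.Unlock()
    for height, requested := range r.inFlight {
        if time.Since(requested) > maxAge {
            // allow the height to be requested again and keep the map bounded
            delete(r.inFlight, height)
        }
    }
}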

🟡 Potential lock contention

Location: async_da_retriever.go:62-63

Every RequestRetrieval call acquires a mutex. Under heavy load, this could become a bottleneck.

Recommendation: Consider using atomic operations or a lock-free design for the common path.
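
For example, sync.Map's LoadOrStore removes the mutex from the common path (a sketch, not a drop-in replacement):

var inFlight sync.Map // effectively map[uint64]struct{}

func requestRetrieval(workCh chan<- uint64, height uint64) {
    if _, loaded := inFlight.LoadOrStore(height, struct{}{}); loaded {
        return // this height is already queued or being fetched
    }
    select {
    case workCh <- height:
    default:
        inFlight.Delete(height) // dropped; allow a later retry
    }
}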


5. Testing & Reliability ✅

Test Coverage Analysis

From codecov report: 60.53% patch coverage with 103 lines missing coverage.

Well-Tested Components

  • async_da_retriever_test.go: Excellent tests for deduplication and worker pool limits
  • binary_compatibility_test.go: Critical tests ensuring migration safety
  • ✅ Syncer tests updated to handle DA hints

Missing Test Coverage

🔴 da_hint_container.go: 0% coverage

All 27 new lines are untested. However, since this file should be deleted (see above), this is moot.

🟡 sync_service.go:184-198 - AppendDAHint method: Low coverage

This critical method has 11 missing lines. Need tests for:

  • Hash not found scenario
  • Partial updates (some hashes exist, others don't)
  • Error propagation from store.Append
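
A sketch of the first case (newTestSyncService and the expected not-found behavior are assumptions):

func TestAppendDAHintHashNotFound(t *testing.T) {
    svc := newTestSyncService(t) // hypothetical helper wiring a SyncService over an empty store

    err := svc.AppendDAHint(context.Background(), 42, types.Hash("unknown"))
    require.Error(t, err) // or assert whichever not-found behavior is intended
}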

🟡 p2p_signed_header.go & p2p_data.go: 47% coverage

Missing test cases:

  • Nil pointer handling in FromProto (lines 69, 64)
  • Error cases in unmarshaling
  • Round-trip with DA hints set

Recommendation: Add test:

func TestP2PSignedHeaderRoundTrip(t *testing.T) {
    header, err := types.GetRandomSignedHeader("chain-id")
    require.NoError(t, err)
    
    p2pHeader := &types.P2PSignedHeader{
        SignedHeader: *header,
        DAHeightHint: 12345,
    }
    
    bytes, err := p2pHeader.MarshalBinary()
    require.NoError(t, err)
    
    var decoded types.P2PSignedHeader
    err = decoded.UnmarshalBinary(bytes)
    require.NoError(t, err)
    
    assert.Equal(t, p2pHeader.DAHeightHint, decoded.DAHeightHint)
    assert.Equal(t, p2pHeader.SignedHeader, decoded.SignedHeader)
}

6. Concurrency & Goroutine Safety 🔄

Strengths

  • ✅ Proper WaitGroup usage in AsyncDARetriever (async_da_retriever.go:19, 55)
  • ✅ Context cancellation propagation (async_da_retriever.go:51-54)
  • ✅ Mutex protection for shared state (async_da_retriever.go:62-63, 92-95)
  • ✅ Channel-based coordination (async_da_retriever.go:104-110)

Issues & Recommendations

🟡 Potential goroutine leak on context cancel

Location: async_da_retriever.go:104-110

for _, event := range events {
    select {
    case r.resultCh <- event:
    case <-r.ctx.Done():
        return
    }
}

If the resultCh consumer is slow and the context is canceled mid-loop, some events are dropped silently.

Assessment: This is acceptable for graceful shutdown, but should be documented.

🟢 P2PHandler processedHeight: Excellent CAS loop

Location: p2p_handler.go:63-72

The CompareAndSwap loop with bounded retries (1000) is well-designed. Minor suggestion: log if loop exhausts iterations (indicates severe contention).
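
The suggested logging, sketched onto that bounded-CAS shape (variable names are assumptions):

const maxCASRetries = 1000

for i := 0; ; i++ {
    if i == maxCASRetries {
        h.logger.Warn().Msg("processedHeight CAS loop exhausted; severe contention")
        break
    }
    old := h.processedHeight.Load()
    if height <= old {
        break // another goroutine already advanced past this height
    }
    if h.processedHeight.CompareAndSwap(old, height) {
        break
    }
}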


7. Documentation & Developer Experience 📚

Strengths

  • ✅ Clear comments explaining binary compatibility (types/p2p_signed_header.go)
  • ✅ Good inline documentation in p2p_handler.go
  • ✅ Comprehensive review comments from author

Issues & Recommendations

🟡 Missing godoc for exported types

Location: types/p2p_signed_header.go:14, types/p2p_data.go:14

// P2PSignedHeader wraps SignedHeader with an optional DA height hint for p2p gossip.
// The hint indicates which DA layer height contains this block's header,
// enabling targeted retrieval instead of sequential scanning.
// This type is binary-compatible with SignedHeader via optional protobuf fields.
type P2PSignedHeader struct {
    SignedHeader
    DAHeightHint uint64
}

🟡 TODO comment should be removed

Location: syncer.go:584

if event.Source == common.SourceDA { // TODO(@julienrbrt): To be reverted once DA Hints are merged (https://github.com/evstack/ev-node/pull/2891)

This PR IS the DA hints merge. This TODO is now stale. The logic should either:

  1. Be removed if DA hints fully replace this pattern
  2. Have the TODO updated to clarify what needs reverting

Recommendation: Clarify the migration plan in the comment or remove if obsolete.


Migration Strategy & Breaking Changes 🚧

Binary Compatibility: EXCELLENT ✅

The use of optional protobuf fields is the right approach:

message P2PSignedHeader {
  Header header = 1;
  bytes signature = 2;
  Signer signer = 3;
  optional uint64 da_height_hint = 4;  // ← Optional field
}

Why this works:

  • Old nodes: Unmarshal P2PSignedHeader → ignore unknown field 4 → still get SignedHeader data ✅
  • New nodes: Can read old messages (field 4 absent → defaults to 0) ✅
  • binary_compatibility_test.go validates this bidirectional compatibility ✅
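
To make the mechanism concrete, a sketch with google.golang.org/protobuf (pb.SignedHeader and pb.P2PSignedHeader stand in for the generated types):

newMsg := &pb.P2PSignedHeader{
    Header:       hdr,
    Signature:    sig,
    Signer:       signer,
    DaHeightHint: proto.Uint64(123), // optional field 4
}
raw, err := proto.Marshal(newMsg)
// ...

var old pb.SignedHeader // generated from the pre-hint schema: fields 1-3 only
err = proto.Unmarshal(raw, &old)
// err == nil: field 4 is preserved as an unknown field, and fields 1-3 decode as before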

Store Migration

🟡 No migration path for existing p2p stores

Location: See PR description comment from @alpe (Dec 5)

"This PR is missing a migration strategy for the p2p store (and ideally network)"

Current status: Binary compatibility solves network compatibility, and existing stored data is also handled:

  • Old stores can be read by new code (DA hint = 0)
  • New nodes writing to old stores works fine

Recommendation: Document that no migration is needed (stores are forward/backward compatible), but DA hints will only be populated for new blocks.


Specific Code Issues

syncer.go:510-548 - DA Hint Processing Logic

Good: The logic correctly handles 3 cases:

  1. Both hints present
  2. Only header hint present
  3. Only data hint present

Issue: Complex nested conditionals make this hard to follow.

Recommendation: Extract to helper method:

func (s *Syncer) extractDAHintsToRetrieve(event *common.DAHeightEvent) []uint64 {
    if event.DaHeightHints == [2]uint64{0, 0} {
        return nil
    }
    
    var hints []uint64
    
    if event.DaHeightHints[0] != 0 {
        if _, exists := s.cache.GetHeaderDAIncluded(event.Header.Hash().String()); !exists {
            hints = append(hints, event.DaHeightHints[0])
        }
    }
    
    if event.DaHeightHints[1] != 0 {
        if _, exists := s.cache.GetDataDAIncluded(event.Data.Hash().String()); !exists {
            // Deduplicate if both hints are same
            if len(hints) == 0 || hints[0] != event.DaHeightHints[1] {
                hints = append(hints, event.DaHeightHints[1])
            }
        }
    }
    
    return hints
}

Summary & Recommendations

Overall Assessment: ⭐⭐⭐⭐ (4/5)

This is a high-quality implementation of a non-trivial feature. The binary compatibility strategy is exemplary.

Must Address Before Merge

  1. 🔴 Remove pkg/sync/da_hint_container.go - dead code
  2. 🔴 Fix error handling in DA hint append (da_submitter.go:201-204)
  3. 🔴 Update/remove stale TODO (syncer.go:584)

Strongly Recommended

  1. 🟡 Add timeout to AsyncDARetriever DA calls
  2. 🟡 Improve test coverage for AppendDAHint and P2P types
  3. 🟡 Add metrics for dropped DA retrieval requests
  4. 🟡 Document migration strategy in PR description or CHANGELOG

Nice to Have

  1. Make worker count configurable
  2. Add godoc to exported P2P types
  3. Refactor DA hint extraction logic in syncer.go

Test Plan Verification

The PR adds significant test coverage:

  • ✅ Binary compatibility tests
  • ✅ AsyncDARetriever worker pool tests
  • ✅ Integration tests updated

Missing: E2E test demonstrating full flow:

  1. Block submitted to DA
  2. DA hint propagated via p2p
  3. Syncing node uses hint for targeted retrieval

Excellent work on a complex feature! The binary compatibility approach shows strong architectural thinking. Address the critical issues and this will be ready to merge.


github-actions bot commented Nov 27, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Dec 16, 2025, 9:28 AM |

@alpe alpe changed the title from Alex/hints to feat: DA hints in p2p Nov 28, 2025

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 60.53640% with 103 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.25%. Comparing base (95140e4) to head (f0a505f).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/sync/da_hint_container.go | 0.00% | 27 Missing ⚠️ |
| types/p2p_signed_header.go | 47.61% | 16 Missing and 6 partials ⚠️ |
| types/p2p_data.go | 47.36% | 15 Missing and 5 partials ⚠️ |
| block/internal/syncing/syncer.go | 58.62% | 12 Missing ⚠️ |
| pkg/sync/sync_service.go | 69.44% | 7 Missing and 4 partials ⚠️ |
| block/internal/syncing/async_da_retriever.go | 86.53% | 6 Missing and 1 partial ⚠️ |
| block/internal/submitting/da_submitter.go | 80.95% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2891      +/-   ##
==========================================
+ Coverage   60.14%   60.25%   +0.11%     
==========================================
  Files          88       92       +4     
  Lines        8427     8647     +220     
==========================================
+ Hits         5068     5210     +142     
- Misses       2787     2851      +64     
- Partials      572      586      +14     
Flag combined: 60.25% <60.53%> (+0.11%) ⬆️


alpe added 3 commits November 28, 2025 17:20
* main:
  refactor: omit unnecessary reassignment (#2892)
  build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  fix: ensure consistent network ID usage in P2P subscriber (#2884)
cache.SetHeaderDAIncluded(headerHash.String(), res.Height, header.Height())
hashes[i] = headerHash
}
if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
@alpe (author) commented:

This is where the DA height is passed to the sync service to update the p2p store

Msg("P2P event with DA height hint, triggering targeted DA retrieval")

// Trigger targeted DA retrieval in background via worker pool
s.asyncDARetriever.RequestRetrieval(daHeightHint)
@alpe (author) commented:

This is where the "fetch from DA" is triggered for the current block event height

type SignedHeaderWithDAHint = DAHeightHintContainer[*types.SignedHeader]
type DataWithDAHint = DAHeightHintContainer[*types.Data]

type DAHeightHintContainer[H header.Header[H]] struct {
@alpe commented Dec 1, 2025:

This is a data container to persist the DA hint together with the block header or data.
types.SignedHeader and types.Data are used all over the place, so I did not modify them but introduced this type for the p2p store and transfer only.

It may make sense to make this a Proto type. WDYT?

return nil
}

func (s *SyncService[V]) AppendDAHint(ctx context.Context, daHeight uint64, hashes ...types.Hash) error {
@alpe (author) commented:

Stores the DA height hints

@alpe alpe marked this pull request as ready for review December 1, 2025 09:32
@tac0turtle (Contributor) commented:

If the DA hint is not in the proto, how do other nodes learn about the hint?

Also, how would an existing network handle using this feature? It's breaking, so is it safe to upgrade?

"github.com/evstack/ev-node/block/internal/cache"
"github.com/evstack/ev-node/block/internal/common"
"github.com/evstack/ev-node/block/internal/da"
coreda "github.com/evstack/ev-node/core/da"
A Member commented:

nit: gci linter

@julienrbrt (Member) left a comment:

Nice! It really makes sense.

However, I share the same concern as @tac0turtle about the upgrade strategy, given it is p2p-breaking.

julienrbrt previously approved these changes Dec 2, 2025

alpe commented Dec 2, 2025

If the DA hint is not in the proto, how do other nodes learn about the hint?

The sync_service wraps the header/data payload in a DAHeightHintContainer object that is passed upstream to the p2p layer. When the DA height is known, the store is updated.

Also, how would an existing network handle using this feature? It's breaking, so is it safe to upgrade?

It is a breaking change: instead of the signed header or data types, the p2p network would exchange DAHeightHintContainer, which is incompatible. The existing p2p stores would also need a migration to work.


julienrbrt commented Dec 4, 2025

Could we broadcast both until every network is updated? Then, in a final version, we can basically discard the previous one.


alpe commented Dec 5, 2025

FYI: this PR is missing a migration strategy for the p2p store (and ideally the network).

* main:
  refactor(sequencers): persist prepended batch (#2907)
  feat(evm): add force inclusion command (#2888)
  feat: DA client, remove interface part 1: copy subset of types needed for the client using blob rpc. (#2905)
  feat: forced inclusion (#2797)
  fix: fix and cleanup metrics (sequencers + block) (#2904)
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
alpe added 3 commits December 15, 2025 10:52
* main:
  chore: execute goimports to format the code (#2924)
  refactor(block)!: remove GetLastState from components (#2923)
  feat(syncing): add grace period for missing force txs inclusion (#2915)
  chore: minor improvement for docs (#2918)
  feat: DA Client remove interface part 2,  add client for celestia blob api   (#2909)
  chore: update rust deps (#2917)
  feat(sequencers/based): add based batch time (#2911)
  build(deps): Bump golangci/golangci-lint-action from 9.1.0 to 9.2.0 (#2914)
  refactor(sequencers): implement batch position persistance (#2908)
github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2025

## Overview

Temporary fix until #2891.
After #2891 the verification for p2p blocks will be done in the
background.

ref: #2906


alpe commented Dec 15, 2025

I have added two new types for the p2p store that are binary-compatible with types.Data and SignedHeader. With this, we should be able to roll this out without breaking the in-flight p2p data and store.
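
For reference, the shape of the two wrapper types (P2PSignedHeader matches the review above; P2PData is assumed to mirror it):

type P2PSignedHeader struct {
    SignedHeader        // embedded, so consumers see the same API as before
    DAHeightHint uint64 // 0 when the DA height is not yet known
}

type P2PData struct {
    Data
    DAHeightHint uint64
}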

alpe added 3 commits December 15, 2025 14:49
* main:
  fix(syncing): skip forced txs checks for p2p blocks (#2922)
  build(deps): Bump the all-go group across 5 directories with 5 updates (#2919)
  chore: loosen syncer state check (#2927)
@alpe alpe requested a review from julienrbrt December 15, 2025 15:00
julienrbrt previously approved these changes Dec 15, 2025
@julienrbrt (Member) left a comment:

lgtm! I can see how useful the async retriever will be for force inclusion verification as well. We should have @auricom verify if p2p still works with Eden.

A Member commented:

This is going to be really useful for force inclusion checks as well.

* main:
  build(deps): Bump actions/cache from 4 to 5 (#2934)
  build(deps): Bump actions/download-artifact from 6 to 7 (#2933)
  build(deps): Bump actions/upload-artifact from 5 to 6 (#2932)
  feat: DA Client remove interface part 3, replace types with new code (#2910)
  DA Client remove interface: Part 2.5, create e2e test to validate that a blob is posted in DA layer. (#2920)


Development

Successfully merging this pull request may close these issues.

sync: P2P should provide da inclusion hints

4 participants