
Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Oct 9, 2025

Overview

Extract fixes from #2736 to make that PR clearer:

  • Remove unused arguments from the DA retriever and P2P retriever -> improvement
  • Reuse the shared function for the empty data hash, and retrieve the last data hash directly in the process loop so it happens for both DA and P2P (previously P2P only) -> bug fix
  • Send P2P-retrieved info directly to the cache instead of batching it -> improvement

@github-actions
Contributor

github-actions bot commented Oct 9, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Oct 9, 2025, 4:19 PM

@claude
Contributor

claude bot commented Oct 9, 2025

Claude finished @julienrbrt's task.


Comprehensive Code Review

I've completed a thorough review of this PR. The refactoring achieves its stated goals, but there are concurrency and error-handling concerns that must be addressed before merging.


🔴 Critical Issues

1. Alleged Goroutine Leak in syncLoop (syncer.go:243-260)

Severity: Not an issue - the goroutine leak reported by the earlier automated review does not exist in the current code.

There is no WaitGroup and no goroutine being created in the loop at all. Looking at the current code:

func (s *Syncer) syncLoop() {
    // ... setup ...
    
    for {
        select {
        case <-s.ctx.Done():
            return
        default:
        }

        s.processPendingEvents()
        s.tryFetchFromP2P()
        s.tryFetchFromDA(nextDARequestAt)

        // Prevent busy-waiting when no events are processed
        select {
        case <-s.ctx.Done():
            return
        case <-time.After(min(10*time.Millisecond, s.config.Node.BlockTime.Duration)):
        }
    }
}

This is actually CORRECT - the methods are called synchronously, not as goroutines. There is no goroutine leak here. The Gemini review appears to have been reviewing a different version or made an error.

However, I do see a potential race condition concern with the channel writes in P2PHandler methods.


2. Potential Deadlock in P2P Handler Methods (p2p_handler.go:94-98, 146-150)

Severity: High - Risk of deadlock if heightInCh is full

Both ProcessHeaderRange and ProcessDataRange have this pattern:

select {
case heightInCh <- event:
default:
    h.cache.SetPendingEvent(event.Header.Height(), &event)
}

Issue: If the heightInCh is full, events are written to cache. However, this could lead to subtle race conditions:

  • The event might be processed out of order
  • If processPendingEvents() runs concurrently, it might try to send the same event twice
  • There's no guarantee the pending event will be retrieved if the channel becomes available later in the same iteration

Recommendation:

  • Document this behavior clearly in both methods
  • Consider adding metrics to track how often events are cached vs. sent directly (see the sketch after this list)
  • Verify that processPendingEvents() handles potential duplicates correctly (it appears to with the GetNextPendingEvent pattern, but this should be verified)
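
A minimal sketch of the metrics idea, assuming plain sync/atomic counters rather than the project's real metrics plumbing; the helper name sendOrCache and the counter names are hypothetical:

var (
    // Illustrative counters only; wire these into the project's metrics system as appropriate.
    directSendTotal    atomic.Int64
    cacheFallbackTotal atomic.Int64
)

// sendOrCache wraps the existing non-blocking send and records which path was taken.
func (h *P2PHandler) sendOrCache(heightInCh chan<- common.DAHeightEvent, event common.DAHeightEvent) {
    select {
    case heightInCh <- event:
        directSendTotal.Add(1)
    default:
        // Channel full: park the event so processPendingEvents can pick it up later.
        h.cache.SetPendingEvent(event.Header.Height(), &event)
        cacheFallbackTotal.Add(1)
    }
}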

3. LastDataHash Bug Fix Implementation (syncer.go:369-376)

Good: This correctly fixes the bug mentioned in the PR description. The code now properly retrieves LastDataHash from the previous block for empty data:

// LastDataHash must be gotten from store when the data hash is empty.
if bytes.Equal(event.Header.DataHash, common.DataHashForEmptyTxs) && currentHeight > 0 {
    _, lastData, err := s.store.GetBlockData(s.ctx, currentHeight)
    if err != nil {
        s.logger.Error().Err(err).Msg("failed to get last data")
        return
    }
    event.Data.LastDataHash = lastData.Hash()
}

Concern: What happens if GetBlockData fails? The event is dropped silently. This should probably re-queue the event or handle the error more gracefully.

Location: syncer.go:371-374
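
A hedged sketch of the re-queue suggestion, assuming the syncer can reach the same pending-event cache used by the P2P handler (the s.cache field and its suitability as a retry queue are assumptions, not the PR's current behavior):

// Sketch only: keep the event for a later retry instead of dropping it.
if bytes.Equal(event.Header.DataHash, common.DataHashForEmptyTxs) && currentHeight > 0 {
    _, lastData, err := s.store.GetBlockData(s.ctx, currentHeight)
    if err != nil {
        s.logger.Error().Err(err).Msg("failed to get last data; re-queueing event")
        s.cache.SetPendingEvent(event.Header.Height(), &event)
        return
    }
    event.Data.LastDataHash = lastData.Hash()
}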


⚠️ High Priority Issues

4. Context Handling in P2P Methods

Both ProcessHeaderRange and ProcessDataRange check context cancellation in the loop:

for height := startHeight; height <= endHeight; height++ {
    select {
    case <-ctx.Done():
        return
    default:
    }
    // ... processing ...
}

Issue: If context is cancelled, the function returns immediately, potentially leaving some heights unprocessed. Partially processed ranges could lead to gaps in block synchronization.

Recommendation: Either:

  • Document that caller must handle partial processing
  • Or track which heights were processed and return that information (sketched after this list)
  • Or ensure idempotency by having the caller re-call with adjusted range

Location: p2p_handler.go:50-53, 110-113
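
One way to surface partial processing to the caller, as a sketch only (the extra return value does not exist in the PR; the per-height body is elided):

func (h *P2PHandler) ProcessHeaderRange(ctx context.Context, startHeight, endHeight uint64, heightInCh chan<- common.DAHeightEvent) (lastProcessed uint64) {
    for height := startHeight; height <= endHeight; height++ {
        select {
        case <-ctx.Done():
            // Caller can resume from lastProcessed+1 on the next invocation.
            return lastProcessed
        default:
        }
        // ... existing per-height processing ...
        lastProcessed = height
    }
    return lastProcessed
}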


5. Removal of BlockOptions Parameter

The PR removes the common.BlockOptions parameter from DARetriever and P2PHandler constructors.

Observation: Previously P2PHandler stored options but never used them (confirmed by the diff), so this removal is correct.

However: The Syncer still has and uses options (lines 48 and 525 in syncer.go). Verify this is intentional and whether the retrievers might need access to these options in the future.

Location: da_retriever.go:48, p2p_handler.go:32


💡 Architecture & Design

6. Centralized Empty Data Creation ✅

Excellent: Moving createEmptyDataForHeader and isEmptyDataExpected to package-level functions shared between DA and P2P handlers is a great refactoring:

func createEmptyDataForHeader(ctx context.Context, header *types.SignedHeader) *types.Data {
    return &types.Data{
        Txs: make(types.Txs, 0),
        Metadata: &types.Metadata{
            ChainID:      header.ChainID(),
            Height:       header.Height(),
            Time:         header.BaseHeader.Time,
            LastDataHash: nil, // Must be filled in syncer
        },
    }
}

This correctly sets LastDataHash to nil with a comment explaining it must be filled by the syncer, which it now does (fix #3 above).

Location: da_retriever.go:332-343


7. Direct Channel Communication vs. Batching ✅

Good: The change from returning batched events to directly sending to channels is a solid optimization:

Before:

func ProcessHeaderRange(...) []common.DAHeightEvent {
    var events []common.DAHeightEvent
    // ... collect ...
    return events
}

After:

func ProcessHeaderRange(..., heightInCh chan<- common.DAHeightEvent) {
    // ... send directly to channel ...
}

Benefits:

  • Reduced memory allocation
  • Earlier processing of events
  • Better backpressure handling

Tradeoff: Slightly more complex error handling (see issue #2)


🔒 Security & Error Handling

8. Error Handling in tryFetchFromDA ✅

Good error classification and backoff strategy:

if errors.Is(err, coreda.ErrBlobNotFound) {
    s.SetDAHeight(daHeight + 1)
    *nextDARequestAt = time.Time{}
    return
}
// Back off on other errors
backoffDelay := s.config.DA.BlockTime.Duration
if backoffDelay <= 0 {
    backoffDelay = 2 * time.Second
}
*nextDARequestAt = now.Add(backoffDelay)

Good: Distinguishes between "no data at height" (expected, continue) vs. other errors (back off).

Location: syncer.go:278-296


9. Proposer Validation Consistency ✅

Both DA retriever and P2P handler validate proposer addresses consistently:

func (r *DARetriever) assertExpectedProposer(proposerAddr []byte) error {
    if string(proposerAddr) != string(r.genesis.ProposerAddress) {
        return fmt.Errorf("unexpected proposer: got %x, expected %x", ...)
    }
    return nil
}

Note: Using string() comparison for byte slices works but bytes.Equal() would be more idiomatic (though functionally equivalent).

Location: da_retriever.go:292-297, p2p_handler.go:156-162
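
For reference, the bytes.Equal form would read as follows (behavior unchanged; the error arguments simply mirror the format verbs):

func (r *DARetriever) assertExpectedProposer(proposerAddr []byte) error {
    if !bytes.Equal(proposerAddr, r.genesis.ProposerAddress) {
        return fmt.Errorf("unexpected proposer: got %x, expected %x", proposerAddr, r.genesis.ProposerAddress)
    }
    return nil
}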


🧪 Testing

10. Test Coverage ⚠️

The PR updates tests appropriately, but coverage decreased (62.15% → 62.08% per Codecov).

New uncovered code (from Codecov report):

  • syncer.go: 13 lines missing coverage (61.76% patch coverage)
  • p2p_handler.go: 11 lines missing (31.25% patch coverage)
  • syncer_mock.go: 27 lines missing (10.00% patch coverage)

Recommendation: Add tests for:

  1. tryFetchFromP2P with concurrent header/data updates
  2. Edge cases in processHeightEvent when GetBlockData fails
  3. Channel backpressure scenarios in P2P handler (see the test sketch below)

Location: Test files across the package
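
A hedged sketch of the backpressure case from item 3, assuming a test constructor newTestHandler, testify's require package, and a cache accessor following the GetNextPendingEvent pattern (all assumptions; the real fixtures will differ):

func TestProcessHeaderRange_ChannelFull_FallsBackToCache(t *testing.T) {
    ctx := context.Background()
    h, cache := newTestHandler(t) // hypothetical helper returning a handler plus its cache

    ch := make(chan common.DAHeightEvent, 1)
    ch <- common.DAHeightEvent{} // pre-fill so the next send takes the default branch

    h.ProcessHeaderRange(ctx, 5, 5, ch)

    // The event for height 5 should have been parked in the cache instead of dropped.
    require.NotNil(t, cache.GetNextPendingEvent(5))
}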


11. Test Refactoring ✅

The test changes properly reflect the new channel-based API:

// Before
events := p2pData.Handler.ProcessHeaderRange(ctx, 5, 5)

// After  
ch := make(chan common.DAHeightEvent, 10)
p2pData.Handler.ProcessHeaderRange(ctx, 5, 5, ch)
events := collectEvents(t, ch, 100*time.Millisecond)

The new collectEvents helper is well-designed with proper timeout handling.

Location: p2p_handler_test.go:111-131
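
The helper is roughly of this shape (a sketch; the PR's actual implementation may differ): drain the channel until it stays quiet for the given timeout.

func collectEvents(t *testing.T, ch <-chan common.DAHeightEvent, timeout time.Duration) []common.DAHeightEvent {
    t.Helper()
    var events []common.DAHeightEvent
    for {
        select {
        case ev := <-ch:
            events = append(events, ev)
        case <-time.After(timeout):
            return events
        }
    }
}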


⚡ Performance

12. Channel Buffer Size ✅

The heightInCh channel is buffered with 10,000 capacity:

heightInCh:  make(chan common.DAHeightEvent, 10_000),

Good: Large buffer prevents blocking in most cases.

Consideration: Under extreme load, this could still fill up. The fallback to cache (issue #2) handles this, but monitoring is recommended.

Location: syncer.go:105


13. Busy-Wait Prevention ✅

Good use of minimum sleep duration:

select {
case <-s.ctx.Done():
    return
case <-time.After(min(10*time.Millisecond, s.config.Node.BlockTime.Duration)):
}

Prevents CPU spinning while maintaining responsiveness.

Location: syncer.go:255-259


📋 Summary

Must Fix Before Merge:

  1. Address the potential deadlock/race in P2P handler channel writes (Issue #2) - add documentation and verify behavior
  2. Improve error handling for GetBlockData failure (Issue #3) - should re-queue the event
  3. Add tests for new code paths to restore coverage (Issue #10)

Recommended Improvements:

  1. ⚠️ Document context cancellation behavior in P2P methods (Issue #4)
  2. ⚠️ Consider using bytes.Equal instead of string conversion (Issue #9)
  3. ⚠️ Add metrics for cache fallback frequency (Issue #2)

Excellent Work:

  • ✅ Centralized empty data handling
  • ✅ Removed unused parameters
  • ✅ Direct channel communication
  • ✅ Proper error classification and backoff
  • ✅ Correct LastDataHash fix

Overall Assessment: This is a solid refactoring that achieves its goals, but the critical issues around error handling and testing must be addressed before merging. The architectural improvements are well-thought-out.


@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors and optimizes the block synchronization mechanism. It addresses a bug related to the LastDataHash for empty blocks by centralizing its retrieval within the syncer. Furthermore, it introduces performance improvements by parallelizing DA fetching and P2P event processing, and by streamlining P2P event handling to directly utilize the cache. These changes aim to make the synchronization process more robust, efficient, and maintainable.

Highlights

  • Refactored Data Handling for Empty Blocks: The logic for creating empty data blocks has been consolidated into a shared, package-level function, ensuring consistent behavior across both DA and P2P retrievers.
  • Optimized P2P Event Processing: P2P retrieved header and data events are now directly sent to the cache via a channel, removing intermediate batching and improving efficiency.
  • Parallelized Sync Loop Operations: The main synchronization loop now concurrently fetches data from the DA layer and processes P2P events, enhancing overall sync performance.
  • Streamlined Retriever Initialization: Unused common.BlockOptions arguments have been removed from the constructors of DARetriever and P2PHandler, simplifying their interfaces.
  • Improved Last Data Hash Retrieval: The syncer now explicitly retrieves the LastDataHash for empty data blocks directly from the store, ensuring accuracy for both DA and P2P sources.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several significant refactorings and a key bug fix. It removes unused arguments, parallelizes parts of the sync loop for better performance, and centralizes the logic for handling the LastDataHash of empty blocks, which resolves an important bug. The changes are generally well-implemented and improve the codebase. However, I've found a critical issue in the new parallel sync loop that could lead to a goroutine leak and should be addressed.

@julienrbrt julienrbrt requested a review from tac0turtle October 9, 2025 13:03
@codecov
Copy link

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 45.74468% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.08%. Comparing base (1ff30ea) to head (abd24f7).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
block/internal/syncing/syncer_mock.go | 10.00% | 27 Missing ⚠️
block/internal/syncing/syncer.go | 61.76% | 10 Missing and 3 partials ⚠️
block/internal/syncing/p2p_handler.go | 31.25% | 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2747      +/-   ##
==========================================
- Coverage   62.15%   62.08%   -0.07%     
==========================================
  Files          80       80              
  Lines        8518     8459      -59     
==========================================
- Hits         5294     5252      -42     
+ Misses       2733     2715      -18     
- Partials      491      492       +1     
Flag | Coverage Δ
combined | 62.08% <45.74%> (-0.07%) ⬇️



tac0turtle previously approved these changes Oct 9, 2025
@julienrbrt julienrbrt enabled auto-merge October 9, 2025 13:50
select {
case heightInCh <- event:
default:
    h.cache.SetPendingEvent(event.Header.Height(), &event)
}
Contributor


Should SetPendingEvent check for duplicates, or whether the height has already passed? Right now we add the event to the cache with no checks.

Member Author


That makes sense. We can evict all previous heights when we call GetNextPendingEvent; checking the height before adding a pending event could slow down the system.
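
A rough sketch of that eviction, assuming the pending events live in a height-keyed map behind a mutex (field names and the exact signature are assumptions):

func (c *Cache) GetNextPendingEvent(height uint64) *common.DAHeightEvent {
    c.mu.Lock()
    defer c.mu.Unlock()
    // Evict everything below the requested height instead of replaying it later.
    for h := range c.pendingEvents {
        if h < height {
            delete(c.pendingEvents, h)
        }
    }
    if ev, ok := c.pendingEvents[height]; ok {
        delete(c.pendingEvents, height)
        return ev
    }
    return nil
}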

Contributor

@tac0turtle tac0turtle left a comment


Left one question, otherwise utACK.

@julienrbrt julienrbrt added this pull request to the merge queue Oct 9, 2025
@julienrbrt julienrbrt removed this pull request from the merge queue due to a manual request Oct 9, 2025
@julienrbrt julienrbrt added this pull request to the merge queue Oct 9, 2025
Merged via the queue into main with commit ac00304 Oct 9, 2025
33 checks passed
@julienrbrt julienrbrt deleted the julien/syncer branch October 9, 2025 19:57
@github-project-automation github-project-automation bot moved this to Done in Evolve Oct 9, 2025
alpe added a commit that referenced this pull request Oct 10, 2025
* main:
  build(deps): Bump alpine from 3.22.1 to 3.22.2 in the patch-updates group (#2749)
  refactor(syncer): fix last data check for both da and syncer + optimize (#2747)