
Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Oct 9, 2025

Overview

Extract fixes from #2736 to make that PR clearer:

  • Remove unused arguments from the DA retriever and P2P retriever -> improvement
  • Reuse the shared function for the empty data hash, and retrieve the last data hash directly in the process loop so it happens for both DA and P2P (previously P2P only) -> bug fix
  • Send P2P-retrieved info directly to the cache instead of batching it -> improvement

@github-actions
Contributor

github-actions bot commented Oct 9, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Oct 9, 2025, 4:19 PM

@claude
Contributor

claude bot commented Oct 9, 2025

Claude finished @julienrbrt's task.


Comprehensive Code Review

I've completed a thorough review of this PR. The refactoring achieves its stated goals, but there are concurrency and error-handling concerns that must be addressed before merging.


🔴 Critical Issues

1. Alleged Goroutine Leak in syncLoop (syncer.go:243-260)

Severity: Not an issue - the goroutine leak reported by the earlier automated review does not exist in the current code.

There is no WaitGroup and no goroutine being created in the loop at all. Looking at the current code:

func (s *Syncer) syncLoop() {
    // ... setup ...
    
    for {
        select {
        case <-s.ctx.Done():
            return
        default:
        }

        s.processPendingEvents()
        s.tryFetchFromP2P()
        s.tryFetchFromDA(nextDARequestAt)

        // Prevent busy-waiting when no events are processed
        select {
        case <-s.ctx.Done():
            return
        case <-time.After(min(10*time.Millisecond, s.config.Node.BlockTime.Duration)):
        }
    }
}

This is actually CORRECT - the methods are called synchronously, not as goroutines. There is no goroutine leak here. The Gemini review appears to have been reviewing a different version or made an error.

However, I do see a potential race condition concern with the channel writes in P2PHandler methods.


2. Potential Deadlock in P2P Handler Methods (p2p_handler.go:94-98, 146-150)

Severity: High - Risk of deadlock if heightInCh is full

Both ProcessHeaderRange and ProcessDataRange have this pattern:

select {
case heightInCh <- event:
default:
    h.cache.SetPendingEvent(event.Header.Height(), &event)
}

Issue: If the heightInCh is full, events are written to cache. However, this could lead to subtle race conditions:

  • The event might be processed out of order
  • If processPendingEvents() runs concurrently, it might try to send the same event twice
  • There's no guarantee the pending event will be retrieved if the channel becomes available later in the same iteration

Recommendation:

  • Document this behavior clearly in both methods
  • Consider adding metrics to track how often events are cached vs. sent directly (see the sketch after this list)
  • Verify that processPendingEvents() handles potential duplicates correctly (it appears to with the GetNextPendingEvent pattern, but this should be verified)
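
A minimal sketch of the metrics idea, assuming plain sync/atomic counters rather than the project's real metrics plumbing; the helper name sendOrCache and the counter names are hypothetical:

var (
    // Illustrative counters only; wire these into the project's metrics system as appropriate.
    directSendTotal    atomic.Int64
    cacheFallbackTotal atomic.Int64
)

// sendOrCache wraps the existing non-blocking send and records which path was taken.
func (h *P2PHandler) sendOrCache(heightInCh chan<- common.DAHeightEvent, event common.DAHeightEvent) {
    select {
    case heightInCh <- event:
        directSendTotal.Add(1)
    default:
        // Channel full: park the event so processPendingEvents can pick it up later.
        h.cache.SetPendingEvent(event.Header.Height(), &event)
        cacheFallbackTotal.Add(1)
    }
}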

3. LastDataHash Bug Fix Implementation (syncer.go:369-376)

Good: This correctly fixes the bug mentioned in the PR description. The code now properly retrieves LastDataHash from the previous block for empty data:

// LastDataHash must be gotten from store when the data hash is empty.
if bytes.Equal(event.Header.DataHash, common.DataHashForEmptyTxs) && currentHeight > 0 {
    _, lastData, err := s.store.GetBlockData(s.ctx, currentHeight)
    if err != nil {
        s.logger.Error().Err(err).Msg("failed to get last data")
        return
    }
    event.Data.LastDataHash = lastData.Hash()
}

Concern: What happens if GetBlockData fails? The event is dropped silently. This should probably re-queue the event or handle the error more gracefully.

Location: syncer.go:371-374
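
A hedged sketch of the re-queue suggestion, assuming the syncer can reach the same pending-event cache used by the P2P handler (the s.cache field and its suitability as a retry queue are assumptions, not the PR's current behavior):

// Sketch only: keep the event for a later retry instead of dropping it.
if bytes.Equal(event.Header.DataHash, common.DataHashForEmptyTxs) && currentHeight > 0 {
    _, lastData, err := s.store.GetBlockData(s.ctx, currentHeight)
    if err != nil {
        s.logger.Error().Err(err).Msg("failed to get last data; re-queueing event")
        s.cache.SetPendingEvent(event.Header.Height(), &event)
        return
    }
    event.Data.LastDataHash = lastData.Hash()
}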


⚠️ High Priority Issues

4. Context Handling in P2P Methods

Both ProcessHeaderRange and ProcessDataRange check context cancellation in the loop:

for height := startHeight; height <= endHeight; height++ {
    select {
    case <-ctx.Done():
        return
    default:
    }
    // ... processing ...
}

Issue: If context is cancelled, the function returns immediately, potentially leaving some heights unprocessed. Partially processed ranges could lead to gaps in block synchronization.

Recommendation: Either:

  • Document that caller must handle partial processing
  • Or track which heights were processed and return that information (sketched after this list)
  • Or ensure idempotency by having the caller re-call with adjusted range

Location: p2p_handler.go:50-53, 110-113
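
One way to surface partial processing to the caller, as a sketch only (the extra return value does not exist in the PR; the per-height body is elided):

func (h *P2PHandler) ProcessHeaderRange(ctx context.Context, startHeight, endHeight uint64, heightInCh chan<- common.DAHeightEvent) (lastProcessed uint64) {
    for height := startHeight; height <= endHeight; height++ {
        select {
        case <-ctx.Done():
            // Caller can resume from lastProcessed+1 on the next invocation.
            return lastProcessed
        default:
        }
        // ... existing per-height processing ...
        lastProcessed = height
    }
    return lastProcessed
}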


5. Removal of BlockOptions Parameter

The PR removes the common.BlockOptions parameter from DARetriever and P2PHandler constructors.

Observation: Previously P2PHandler stored options but never used them (confirmed by the diff), so this removal is correct.

However: The Syncer still has and uses options (lines 48 and 525 in syncer.go). Verify this is intentional and whether the retrievers might need access to these options in the future.

Location: da_retriever.go:48, p2p_handler.go:32


💡 Architecture & Design

6. Centralized Empty Data Creation ✅

Excellent: Moving createEmptyDataForHeader and isEmptyDataExpected to package-level functions shared between DA and P2P handlers is a great refactoring:

func createEmptyDataForHeader(ctx context.Context, header *types.SignedHeader) *types.Data {
    return &types.Data{
        Txs: make(types.Txs, 0),
        Metadata: &types.Metadata{
            ChainID:      header.ChainID(),
            Height:       header.Height(),
            Time:         header.BaseHeader.Time,
            LastDataHash: nil, // Must be filled in syncer
        },
    }
}

This correctly sets LastDataHash to nil with a comment explaining it must be filled by the syncer, which it now does (fix #3 above).

Location: da_retriever.go:332-343


7. Direct Channel Communication vs. Batching ✅

Good: The change from returning batched events to directly sending to channels is a solid optimization:

Before:

func ProcessHeaderRange(...) []common.DAHeightEvent {
    var events []common.DAHeightEvent
    // ... collect ...
    return events
}

After:

func ProcessHeaderRange(..., heightInCh chan<- common.DAHeightEvent) {
    // ... send directly to channel ...
}

Benefits:

  • Reduced memory allocation
  • Earlier processing of events
  • Better backpressure handling

Tradeoff: Slightly more complex error handling (see issue #2)


🔒 Security & Error Handling

8. Error Handling in tryFetchFromDA ✅

Good error classification and backoff strategy:

if errors.Is(err, coreda.ErrBlobNotFound) {
    s.SetDAHeight(daHeight + 1)
    *nextDARequestAt = time.Time{}
    return
}
// Back off on other errors
backoffDelay := s.config.DA.BlockTime.Duration
if backoffDelay <= 0 {
    backoffDelay = 2 * time.Second
}
*nextDARequestAt = now.Add(backoffDelay)

Good: Distinguishes between "no data at height" (expected, continue) vs. other errors (back off).

Location: syncer.go:278-296


9. Proposer Validation Consistency ✅

Both DA retriever and P2P handler validate proposer addresses consistently:

func (r *DARetriever) assertExpectedProposer(proposerAddr []byte) error {
    if string(proposerAddr) != string(r.genesis.ProposerAddress) {
        return fmt.Errorf("unexpected proposer: got %x, expected %x", ...)
    }
    return nil
}

Note: Using string() comparison for byte slices works but bytes.Equal() would be more idiomatic (though functionally equivalent).

Location: da_retriever.go:292-297, p2p_handler.go:156-162
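
For reference, the bytes.Equal form would read as follows (behavior unchanged; the error arguments simply mirror the format verbs):

func (r *DARetriever) assertExpectedProposer(proposerAddr []byte) error {
    if !bytes.Equal(proposerAddr, r.genesis.ProposerAddress) {
        return fmt.Errorf("unexpected proposer: got %x, expected %x", proposerAddr, r.genesis.ProposerAddress)
    }
    return nil
}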


🧪 Testing

10. Test Coverage ⚠️

The PR updates tests appropriately, but coverage decreased (62.15% → 62.08% per Codecov).

New uncovered code (from Codecov report):

  • syncer.go: 13 lines missing coverage (61.76% patch coverage)
  • p2p_handler.go: 11 lines missing (31.25% patch coverage)
  • syncer_mock.go: 27 lines missing (10.00% patch coverage)

Recommendation: Add tests for:

  1. tryFetchFromP2P with concurrent header/data updates
  2. Edge cases in processHeightEvent when GetBlockData fails
  3. Channel backpressure scenarios in P2P handler (see the test sketch below)

Location: Test files across the package
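
A hedged sketch of the backpressure case from item 3, assuming a test constructor newTestHandler, testify's require package, and a cache accessor following the GetNextPendingEvent pattern (all assumptions; the real fixtures will differ):

func TestProcessHeaderRange_ChannelFull_FallsBackToCache(t *testing.T) {
    ctx := context.Background()
    h, cache := newTestHandler(t) // hypothetical helper returning a handler plus its cache

    ch := make(chan common.DAHeightEvent, 1)
    ch <- common.DAHeightEvent{} // pre-fill so the next send takes the default branch

    h.ProcessHeaderRange(ctx, 5, 5, ch)

    // The event for height 5 should have been parked in the cache instead of dropped.
    require.NotNil(t, cache.GetNextPendingEvent(5))
}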


11. Test Refactoring ✅

The test changes properly reflect the new channel-based API:

// Before
events := p2pData.Handler.ProcessHeaderRange(ctx, 5, 5)

// After  
ch := make(chan common.DAHeightEvent, 10)
p2pData.Handler.ProcessHeaderRange(ctx, 5, 5, ch)
events := collectEvents(t, ch, 100*time.Millisecond)

The new collectEvents helper is well-designed with proper timeout handling.

Location: p2p_handler_test.go:111-131
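
The helper is roughly of this shape (a sketch; the PR's actual implementation may differ): drain the channel until it stays quiet for the given timeout.

func collectEvents(t *testing.T, ch <-chan common.DAHeightEvent, timeout time.Duration) []common.DAHeightEvent {
    t.Helper()
    var events []common.DAHeightEvent
    for {
        select {
        case ev := <-ch:
            events = append(events, ev)
        case <-time.After(timeout):
            return events
        }
    }
}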


⚡ Performance

12. Channel Buffer Size ✅

The heightInCh channel is buffered with 10,000 capacity:

heightInCh:  make(chan common.DAHeightEvent, 10_000),

Good: Large buffer prevents blocking in most cases.

Consideration: Under extreme load, this could still fill up. The fallback to cache (issue #2) handles this, but monitoring is recommended.

Location: syncer.go:105


13. Busy-Wait Prevention ✅

Good use of minimum sleep duration:

select {
case <-s.ctx.Done():
    return
case <-time.After(min(10*time.Millisecond, s.config.Node.BlockTime.Duration)):
}

Prevents CPU spinning while maintaining responsiveness.

Location: syncer.go:255-259


📋 Summary

Must Fix Before Merge:

  1. Address the potential deadlock/race in P2P handler channel writes (Issue #2) - add documentation and verify behavior
  2. Improve error handling for GetBlockData failure (Issue #3) - should re-queue the event
  3. Add tests for new code paths to restore coverage (Issue #10)

Recommended Improvements:

  1. ⚠️ Document context cancellation behavior in P2P methods (Issue #4)
  2. ⚠️ Consider using bytes.Equal instead of string conversion (Issue #9)
  3. ⚠️ Add metrics for cache fallback frequency (Issue #2)

Excellent Work:

  • ✅ Centralized empty data handling
  • ✅ Removed unused parameters
  • ✅ Direct channel communication
  • ✅ Proper error classification and backoff
  • ✅ Correct LastDataHash fix

Overall Assessment: This is a solid refactoring that achieves its goals, but the critical issues around error handling and testing must be addressed before merging. The architectural improvements are well-thought-out.


@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors and optimizes the block synchronization mechanism. It addresses a bug related to the LastDataHash for empty blocks by centralizing its retrieval within the syncer. Furthermore, it introduces performance improvements by parallelizing DA fetching and P2P event processing, and by streamlining P2P event handling to directly utilize the cache. These changes aim to make the synchronization process more robust, efficient, and maintainable.

Highlights

  • Refactored Data Handling for Empty Blocks: The logic for creating empty data blocks has been consolidated into a shared, package-level function, ensuring consistent behavior across both DA and P2P retrievers.
  • Optimized P2P Event Processing: P2P retrieved header and data events are now directly sent to the cache via a channel, removing intermediate batching and improving efficiency.
  • Parallelized Sync Loop Operations: The main synchronization loop now concurrently fetches data from the DA layer and processes P2P events, enhancing overall sync performance.
  • Streamlined Retriever Initialization: Unused common.BlockOptions arguments have been removed from the constructors of DARetriever and P2PHandler, simplifying their interfaces.
  • Improved Last Data Hash Retrieval: The syncer now explicitly retrieves the LastDataHash for empty data blocks directly from the store, ensuring accuracy for both DA and P2P sources.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several significant refactorings and a key bug fix. It removes unused arguments, parallelizes parts of the sync loop for better performance, and centralizes the logic for handling the LastDataHash of empty blocks, which resolves an important bug. The changes are generally well-implemented and improve the codebase. However, I've found a critical issue in the new parallel sync loop that could lead to a goroutine leak and should be addressed.

@julienrbrt julienrbrt requested a review from tac0turtle October 9, 2025 13:03
@codecov
Copy link

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 45.74468% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.08%. Comparing base (1ff30ea) to head (abd24f7).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
block/internal/syncing/syncer_mock.go | 10.00% | 27 Missing ⚠️
block/internal/syncing/syncer.go | 61.76% | 10 Missing and 3 partials ⚠️
block/internal/syncing/p2p_handler.go | 31.25% | 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2747      +/-   ##
==========================================
- Coverage   62.15%   62.08%   -0.07%     
==========================================
  Files          80       80              
  Lines        8518     8459      -59     
==========================================
- Hits         5294     5252      -42     
+ Misses       2733     2715      -18     
- Partials      491      492       +1     
Flag | Coverage Δ
combined | 62.08% <45.74%> (-0.07%) ⬇️



tac0turtle previously approved these changes Oct 9, 2025
@julienrbrt julienrbrt enabled auto-merge October 9, 2025 13:50
select {
case heightInCh <- event:
default:
    h.cache.SetPendingEvent(event.Header.Height(), &event)
}
Contributor


Should SetPendingEvent check for duplicates, or whether the height has already passed? Right now we add the event to the cache with no checks.

Member Author


That makes sense. We can evict all previous heights when we call GetNextPendingEvent; checking the height before adding a pending event could slow down the system.
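
A rough sketch of that eviction, assuming the pending events live in a height-keyed map behind a mutex (field names and the exact signature are assumptions):

func (c *Cache) GetNextPendingEvent(height uint64) *common.DAHeightEvent {
    c.mu.Lock()
    defer c.mu.Unlock()
    // Evict everything below the requested height instead of replaying it later.
    for h := range c.pendingEvents {
        if h < height {
            delete(c.pendingEvents, h)
        }
    }
    if ev, ok := c.pendingEvents[height]; ok {
        delete(c.pendingEvents, height)
        return ev
    }
    return nil
}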

Contributor

@tac0turtle tac0turtle left a comment


Left one question, otherwise utACK.

@julienrbrt julienrbrt added this pull request to the merge queue Oct 9, 2025
@julienrbrt julienrbrt removed this pull request from the merge queue due to a manual request Oct 9, 2025
@julienrbrt julienrbrt added this pull request to the merge queue Oct 9, 2025
Merged via the queue into main with commit ac00304 Oct 9, 2025
33 checks passed
@julienrbrt julienrbrt deleted the julien/syncer branch October 9, 2025 19:57
@github-project-automation github-project-automation bot moved this to Done in Evolve Oct 9, 2025
alpe added a commit that referenced this pull request Oct 10, 2025
* main:
  build(deps): Bump alpine from 3.22.1 to 3.22.2 in the patch-updates group (#2749)
  refactor(syncer): fix last data check for both da and syncer + optimize (#2747)