
Conversation

@julienrbrt (Member) commented Nov 20, 2025

PR adding retry logic and better payload verification (as found in the audit).
It was one-shot by Claude Sonnet, so I'm still checking it :D Done and updated.

@claude bot (Contributor) commented Nov 20, 2025

Claude finished @julienrbrt's task.


Comprehensive Code Review

I've completed a thorough review of the retry logic and payload verification implementation. Overall, this is a solid improvement that implements retry logic for transient failures in line with the Engine API specification. The test coverage is excellent. However, I've identified several areas for improvement across code quality, design, and maintainability.


🔴 High Priority Issues

1. Unreachable Code in retryWithBackoffOnPayloadStatus (execution/evm/execution.go:110)

Issue: Line 110 is unreachable. The for loop will always return early via one of these paths:

  • Success (line 83)
  • Non-retriable error (lines 88, 93)
  • Max retries exceeded (line 98)
return fmt.Errorf("max retries (%d) exceeded for %s", maxRetries, operation)

Fix: Remove line 110 entirely.

Rationale: Dead code creates confusion and suggests the logic may not be fully understood. The gemini-code-assist reviewer also caught this.
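For reference, a minimal sketch (simplified names, not the exact PR implementation) of how the helper can be shaped so that every exit path is an explicit return inside the loop, leaving no trailing statement behind to become dead code:

package evm

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrPayloadSyncing mirrors the transient-status sentinel the PR defines;
// it is repeated here only to keep the sketch self-contained.
var ErrPayloadSyncing = errors.New("execution client payload status is SYNCING/ACCEPTED")

// retryOnPayloadStatus retries op with exponential backoff while the error is
// the transient ErrPayloadSyncing. Because the loop has no condition, the
// compiler does not require a return statement after it.
func retryOnPayloadStatus(ctx context.Context, op func() error, maxRetries int, backoff time.Duration, operation string) error {
	for attempt := 0; ; attempt++ {
		err := op()
		if err == nil {
			return nil // success
		}
		if !errors.Is(err, ErrPayloadSyncing) {
			return err // permanent failure (e.g. INVALID): fail immediately
		}
		if attempt == maxRetries {
			return fmt.Errorf("max retries (%d) exceeded for %s: %w", maxRetries, operation, err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // caller cancelled or deadline exceeded
		case <-time.After(backoff):
			backoff *= 2 // exponential backoff before the next attempt
		}
	}
}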


2. Inconsistent Error Handling Between InitChain and ExecuteTxs/setFinal

Problem: The retry pattern is clean in InitChain (lines 189-214) but becomes convoluted in ExecuteTxs and setFinal. Specifically:

  • In ExecuteTxs (lines 293-332): The forkchoiceUpdatedV3 RPC call happens, then the closure captures and modifies payloadID from the outer scope
  • Similar issue with newPayloadV4 call (lines 342-368)
  • In setFinal (lines 399-416): Same pattern

Recommendation: Refactor to move the entire RPC operation inside the retry closure for consistency and clarity, matching the InitChain pattern:

// Current (complex):
var payloadID *engine.PayloadID
err = retryWithBackoffOnPayloadStatus(ctx, func() error {
    var forkchoiceResult engine.ForkChoiceResponse
    err := c.engineClient.CallContext(...)
    // ... validation ...
    payloadID = forkchoiceResult.PayloadID  // Modifying captured variable
    return nil
}, ...)

// Better (consistent with InitChain):
var payloadID *engine.PayloadID
err = retryWithBackoffOnPayloadStatus(ctx, func() error {
    var forkchoiceResult engine.ForkChoiceResponse
    err := c.engineClient.CallContext(...)
    if err != nil {
        return fmt.Errorf("forkchoice update failed: %w", err)
    }
    
    if err := validatePayloadStatus(forkchoiceResult.PayloadStatus); err != nil {
        // ... logging ...
        return err
    }
    
    if forkchoiceResult.PayloadID == nil {
        // ... logging ...
        return fmt.Errorf("returned nil PayloadID")
    }
    
    payloadID = forkchoiceResult.PayloadID
    return nil
}, ...)

This is the same feedback from gemini-code-assist and I agree it would significantly improve maintainability.


3. Missing Retry Logic for engine_getPayloadV4 (execution/evm/execution.go:336)

Question from Review: Should line 336's engine_getPayloadV4 call also have retry logic?

err = c.engineClient.CallContext(ctx, &payloadResult, "engine_getPayloadV4", *payloadID)

Analysis: The ExecutionPayloadEnvelope returned by engine_getPayloadV4 does not carry a payload status, so the status-based retry cannot be applied directly; however, the RPC itself could still fail transiently while the EL is syncing.

Recommendation: Add retry wrapper for consistency and resilience, or document why it's safe to skip (e.g., payload is already built, just retrieval).


🟡 Medium Priority Issues

4. Function Parameter operation Could Be More Descriptive

The operation parameter in retryWithBackoffOnPayloadStatus is used for error messages. Consider more descriptive operation names:

  • Current: "InitChain", "ExecuteTxs forkchoice", "ExecuteTxs newPayload"
  • Better: "InitChain.forkchoiceUpdatedV3", "ExecuteTxs.forkchoiceUpdatedV3", "ExecuteTxs.newPayloadV4"

This provides clearer context in logs when debugging.


5. Error Channel Usage Pattern (errorCh chan<- error)

Observation: The errorCh parameter is added to executor.go, submitter.go, and syncer.go but the actual retry logic doesn't use it.

Question: What's the intended use of errorCh? Should permanent failures (ErrInvalidPayloadStatus) be sent to this channel to trigger node shutdown? The comment says "Channel to report critical execution client failures" but the retry logic doesn't leverage it.

Suggestion: If the intent is to report INVALID payloads as critical failures, consider:

if errors.Is(err, ErrInvalidPayloadStatus) {
    select {
    case s.errorCh <- err:
    default:
    }
    return err
}

Otherwise, clarify in comments when errorCh should be used.


🟢 Code Quality & Best Practices

6. Excellent Test Coverage

The execution_status_test.go file is comprehensive:

  • ✅ Status validation for all Engine API states (VALID, INVALID, SYNCING, ACCEPTED)
  • ✅ Retry scenarios (success on first attempt, after retries, exhaustion)
  • ✅ Context cancellation and timeout handling
  • ✅ RPC error handling (both wrapped and unwrapped)
  • ✅ Proper use of table-driven tests

Praise: This is production-quality test coverage.


7. Constants and Documentation

  • ✅ Well-defined constants (MaxPayloadStatusRetries, InitialRetryBackoff)
  • ✅ Clear error types (ErrInvalidPayloadStatus, ErrPayloadSyncing)
  • ✅ Comprehensive function documentation explaining Engine API spec compliance

8. Concurrency Safety

The mutex usage in setFinal (lines 381-396) properly protects shared state before constructing the forkchoice state. Good pattern.


🔒 Security Review

9. Logging of Sensitive Data

Check: The logging statements include latestValidHash and validationError from the Engine API. These are safe to log and don't contain secrets. ✅

10. Context Handling

✅ Proper context propagation and cancellation handling throughout
✅ Context timeouts respected in retry logic


⚡ Performance Considerations

11. Exponential Backoff Configuration

Current settings:

  • Max retries: 3
  • Initial backoff: 1 second
  • Backoff multiplier: 2x

Timeline: 1s + 2s + 4s = 7 seconds maximum delay

Assessment: Reasonable for block production use case. However:

Consideration: If the EL is syncing from a cold start, 3 retries may not be enough. The PR description mentions "preventing unnecessary node shutdowns during transient execution engine sync operations."

Question: Should these be configurable? Or should there be different retry policies (see the sketch after this list) for:

  • InitChain (node startup - more retries?)
  • ExecuteTxs (block production - fast failure?)
  • SetFinal (finalization - more retries?)
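If configurability is wanted, one lightweight option is a small per-operation policy, sketched below (hypothetical names and defaults, not part of the PR; assumes it sits next to the existing constants in the same package with time imported):

// RetryPolicy groups the knobs that are currently package-level constants
// (MaxPayloadStatusRetries, InitialRetryBackoff) so they can vary per operation.
type RetryPolicy struct {
	MaxRetries     int
	InitialBackoff time.Duration
}

var (
	// Node startup can tolerate a longer wait while the EL finishes syncing.
	initChainRetry = RetryPolicy{MaxRetries: 10, InitialBackoff: time.Second}
	// Block production should fail fast so the caller can react quickly.
	executeTxsRetry = RetryPolicy{MaxRetries: 3, InitialBackoff: time.Second}
	// Finalization sits in between.
	setFinalRetry = RetryPolicy{MaxRetries: 5, InitialBackoff: time.Second}
)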

📋 Architecture & Design

12. Response to Previous Feedback

Alex's suggestion (alpe): Extract retry logic to utility or use decorator pattern

Julien's response: "This is a separate Go module; as it depends only on core, we have nothing else available."

My take: While I understand the module isolation constraint, consider:

  1. The retry logic is generic enough that it could live in a pkg/retry or internal/retry package within the same module
  2. However, since it's currently only used in one place and specifically tailored to Engine API payload status semantics, the current approach is acceptable
  3. If retry patterns emerge elsewhere in the execution layer, then extract to a shared package

Decision: Current approach is fine given the constraints. Consider extraction if usage expands.


🎯 Summary & Recommendations

Overall Assessment: This PR successfully implements Engine API-compliant retry logic with excellent test coverage. The core logic is sound, but consistency and maintainability could be improved.

Required Changes:

  1. ✅ Remove unreachable code (line 110)
  2. ✅ Refactor ExecuteTxs and setFinal retry patterns to match InitChain consistency
  3. ⚠️ Evaluate and add retry logic for engine_getPayloadV4 or document why it's safe to skip

Recommended Improvements:
4. Clarify errorCh usage pattern or leverage it for critical failures
5. More descriptive operation names in retry error messages
6. Consider whether retry configuration should vary by operation

Strengths:

  • ✅ Proper Engine API specification compliance
  • ✅ Comprehensive test coverage
  • ✅ Clear documentation and error types
  • ✅ Appropriate concurrency safety measures

Files Reviewed

  • execution/evm/execution.go - Main implementation (454 additions)
  • execution/evm/execution_status_test.go - Test coverage (253 additions)
  • block/internal/executing/executor.go - Error channel integration
  • block/internal/submitting/submitter.go - Error channel integration
  • block/internal/syncing/syncer.go - Error channel integration
  • CHANGELOG.md - Documentation

@github-actions bot (Contributor) commented Nov 20, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Nov 24, 2025, 9:33 AM

@gemini-code-assist bot (Contributor)

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and resilience of the EVM execution client by introducing robust payload status validation and a sophisticated retry mechanism. By adhering to the Engine API specification, the system now intelligently handles transient SYNCING or ACCEPTED states with exponential backoff, preventing premature node shutdowns. Conversely, it ensures immediate failure for permanent INVALID statuses, streamlining error handling and improving overall stability.

Highlights

  • Improved EVM Payload Status Validation: Enhanced the validation of EVM execution client payload statuses to align with the Engine API specification, ensuring more robust error handling.
  • Retry Logic for Transient States: Implemented exponential backoff retry logic for SYNCING and ACCEPTED payload statuses in InitChain, ExecuteTxs, and SetFinal methods, preventing unnecessary node shutdowns during temporary execution engine unavailability.
  • Immediate Failure on Invalid Status: Configured the system to immediately fail on INVALID payload statuses, preventing unnecessary retries for permanent errors and improving efficiency.
  • New Utility Functions for Status Handling: Introduced validatePayloadStatus to categorize Engine API payload statuses and retryWithBackoff to provide a generic, context-aware mechanism for retrying operations with exponential backoff (a sketch of validatePayloadStatus follows this list).
  • Comprehensive Testing: Added a new dedicated test file with unit tests for the payload status validation and retry logic, covering various scenarios including successful retries, exhaustion of retries, immediate failures, and context cancellation/timeout.
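As a rough sketch of the categorization described in the highlight above (types and constants from go-ethereum's beacon/engine package; the exact signature and error wrapping in the PR may differ), validatePayloadStatus could look roughly like this:

// Sketch only: maps an Engine API payload status onto the error classes the
// retry helper understands. ErrInvalidPayloadStatus and ErrPayloadSyncing are
// the error types introduced by the PR.
func validatePayloadStatus(status engine.PayloadStatusV1) error {
	switch status.Status {
	case engine.VALID:
		return nil // accepted: nothing to retry
	case engine.INVALID:
		// permanent failure: fail immediately, never retry
		validationErr := ""
		if status.ValidationError != nil {
			validationErr = *status.ValidationError
		}
		return fmt.Errorf("%w: latestValidHash=%v validationError=%s",
			ErrInvalidPayloadStatus, status.LatestValidHash, validationErr)
	case engine.SYNCING, engine.ACCEPTED:
		// transient: the EL is still catching up; retry with backoff
		return ErrPayloadSyncing
	default:
		return fmt.Errorf("unexpected payload status %q", status.Status)
	}
}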

@gemini-code-assist bot left a comment

Code Review

This pull request introduces retry logic for handling transient SYNCING statuses from the EVM execution client, which is a great improvement for node stability. The core changes involve a new retryWithBackoff helper and its integration into InitChain, ExecuteTxs, and SetFinal.

The overall approach is solid, and the new tests are comprehensive. However, I've identified a few areas for improvement:

  • The implementation of the retry logic in ExecuteTxs and setFinal is unnecessarily complex and inconsistent with the cleaner pattern used in InitChain. I've left comments suggesting a refactoring to improve clarity and robustness.
  • The retryWithBackoff helper contains some unreachable code that should be removed.
  • A minor cleanup in validatePayloadStatus is also suggested.

Addressing these points will make the new retry mechanism more maintainable and consistent across the codebase.

@codecov

codecov bot commented Nov 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.82%. Comparing base (bfa745c) to head (a00d4ea).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2863      +/-   ##
==========================================
- Coverage   64.93%   64.82%   -0.11%     
==========================================
  Files          81       81              
  Lines        7350     7350              
==========================================
- Hits         4773     4765       -8     
- Misses       2036     2043       +7     
- Partials      541      542       +1     
Flag: combined · Coverage: 64.82% <ø> (Δ -0.11% ⬇️)


@julienrbrt julienrbrt marked this pull request as draft November 20, 2025 09:07
@julienrbrt julienrbrt marked this pull request as ready for review November 20, 2025 10:31
@alpe (Contributor) left a comment

The retry/backoff code looks good.
It looks similar to other retry/backoff code that already exists. Would it make sense to extract it to some utility?

Have you thought about encapsulating the retry logic in a decorator around engineClient.CallContext? This way, the retry would be hidden and the code would be slightly more readable.
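For illustration, such a decorator could look roughly like the sketch below (assuming the wrapped client is a go-ethereum rpc.Client; the actual engineClient type and retry conditions in this repo may differ):

// retryingEngineClient wraps an RPC client and retries CallContext on
// transport-level errors with exponential backoff. Hypothetical sketch.
type retryingEngineClient struct {
	inner      *rpc.Client
	maxRetries int
	backoff    time.Duration
}

func (r *retryingEngineClient) CallContext(ctx context.Context, result any, method string, args ...any) error {
	delay := r.backoff
	var err error
	for attempt := 0; ; attempt++ {
		if err = r.inner.CallContext(ctx, result, method, args...); err == nil {
			return nil
		}
		if attempt == r.maxRetries {
			return fmt.Errorf("%s failed after %d retries: %w", method, r.maxRetries, err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
			delay *= 2 // exponential backoff before retrying the call
		}
	}
}

One caveat: the VALID/INVALID/SYNCING verdict arrives in the response body rather than as an RPC error, so a decorator at this level would still need the payload-status check on top; it mainly hides retries of transport failures.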

// get payload
var payloadResult engine.ExecutionPayloadEnvelope
- err = c.engineClient.CallContext(ctx, &payloadResult, "engine_getPayloadV4", *forkchoiceResult.PayloadID)
+ err = c.engineClient.CallContext(ctx, &payloadResult, "engine_getPayloadV4", *payloadID)
Contributor commented on this diff:

Should there be a retry for this call, too?

julienrbrt (Member Author) replied:

I have checked and ExecutionPayloadEnvelope doesn't return a payload status, so it cannot be retried with the same logic. I have updated the API naming to clarify the retry function usage (a00d4ea).

@julienrbrt (Member Author) commented:

The retry/backoff code looks good.
It looks similar to other retry/backoff code that already exists. Would it make sense to extract it to some utility?

This is a separate Go module; as it depends only on core, we have nothing else available.

Let me check for your other point.

tac0turtle previously approved these changes Nov 24, 2025
@tac0turtle (Contributor) left a comment

utACK, there is the one question from alex

@julienrbrt julienrbrt added this pull request to the merge queue Nov 24, 2025
Merged via the queue into main with commit c161967 Nov 24, 2025
28 of 30 checks passed
@julienrbrt julienrbrt deleted the julien/ai-execution-evm branch November 24, 2025 10:56
alpe added a commit that referenced this pull request Nov 24, 2025
* main:
  chore: remove extra github action yml file (#2882)
  fix(execution/evm): verify payload status (#2863)
  feat: fetch included da height from store (#2880)
  chore: better output on errors (#2879)
  refactor!: create da client and split cache interface (#2878)
  chore!: rename `evm-single` and `grpc-single` (#2839)
  build(deps): Bump golang.org/x/crypto from 0.42.0 to 0.45.0 in /tools/da-debug in the go_modules group across 1 directory (#2876)
  chore: parallel cache de/serialization (#2868)
  chore: bump blob size (#2877)