
engine/blobs: move partial return support to mandatory engine_getBlobsV3 #674

Closed
raulk wants to merge 5 commits into ethereum:main from raulk:raulk/getblobsv3

@raulk
Member

@raulk raulk commented Jul 8, 2025

Context

The current specification is too lax: it does not explicitly require EL clients to support partial blob returns. Consequently, most EL clients (if not all) are omitting this support in Osaka. This gap will require additional coordination work in the future when the CL actually wants to use this feature. To prevent this, we propose biting the bullet now so that no future effort needs to be invested.

In addition, the current approach does not allow the CL (or another party) to determine which return mode the EL supports. This is suboptimal, given that the Engine API already supports capability exchange to signal feature support.

Proposed changes

After posting concerns on Discord and discussing separately with @MariusVanDerWijden, we propose:

  • Making partial blob return support mandatory via the engine_getBlobsV3 method.

  • Restoring the all-or-nothing return behavior for engine_getBlobsV2.

This approach seems to hit the optimal tradeoff:

  • Mandatorily adds functional support for partial blob returns.
  • Avoids the serde penalty while the CL consumes V2.
  • Enables the CL to adopt V3 when optimizations are implemented, without further ACD coordination.
  • Leverages built-in capability exchange mechanisms.
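
As a rough illustration of the capability-exchange point, a CL could pick its blob-fetching method from the list of methods the EL advertises (for example via `engine_exchangeCapabilities`). This is a minimal sketch, not client code: the function name `select_get_blobs_method` and the preference order are assumptions made for illustration; only the two method names come from this PR.

```python
from typing import Iterable, Optional

# Method names are taken from the PR; the preference order (V3 first) is an
# illustrative assumption about what a CL might want.
GET_BLOBS_PREFERENCE = ("engine_getBlobsV3", "engine_getBlobsV2")


def select_get_blobs_method(el_capabilities: Iterable[str]) -> Optional[str]:
    """Return the most preferred getBlobs method the EL advertises, or None."""
    advertised = set(el_capabilities)
    for method in GET_BLOBS_PREFERENCE:
        if method in advertised:
            return method
    return None
```

A CL that has not yet implemented partial-response handling could simply put `engine_getBlobsV2` first in the preference tuple and keep the rest unchanged.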

Implementation

This should be straightforward to implement. For reference, it's already done in Geth: ethereum/go-ethereum#32170.

Misc

Also fixed a bug in the OpenRPC spec that did not account for literal null returns from engine_getBlobsV2.

- Introduce mandatory `engine_getBlobsV3` method to support partial
  responses when fetching blobs from the EL blob pool.
- Add associated BlobAndProofV3 alias for BlobAndProofV2.
- Restore `engine_getBlobsV2` to all-or-nothing.

This mandates partial blob return support in Osaka, but avoids
extra serde in V2 while unutilized by the CL. The CL can switch to
V3 when p2p optimizations ship, without having to coordinate with
EL devs.
@raulk raulk changed the title engine/blobs: move partial return support to mandatory engine_getBlobsV3 engine/blobs: move partial return support to mandatory engine_getBlobsV3 Jul 8, 2025
2. Given an array of blob versioned hashes, if client software has every one of the requested blobs, it **MUST** return an array of _`BlobAndProofV3`_ objects whose order exactly matches the input array. For instance, if the request is `[A_versioned_hash, B_versioned_hash, C_versioned_hash]` and client software has `A`, `B` and `C` available, the response **MUST** be `[A, B, C]`.
3. If one or more of the requested blobs are unavailable, _the client **MUST** return an array of the same length and order, inserting `null` only at the positions of the missing blobs._ For instance, if the request is `[A_versioned_hash, B_versioned_hash, C_versioned_hash]` and client software has data for blobs `A` and `C`, but doesn't have data for `B`, _the response **MUST** be `[A, null, C]`. If all blobs are missing, the client software must return an array of matching length, filled with `null` at all positions._
4. Client software **MUST** support request sizes of at least 128 blob versioned hashes. The client **MUST** return `-38004: Too large request` error if the number of requested blobs is too large.
5. Client software **MUST** return `null` if syncing or otherwise unable to generally serve blob pool data.
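
The rules above can be sketched as a small in-memory model. This is a hedged illustration only: the blob pool is modeled as a plain dict from versioned hash to a placeholder value, `EngineError` and both function names are hypothetical, the order in which the syncing and size checks run is an assumption, and the V2 all-or-nothing behavior is modeled as a single `None` whenever any blob is missing.

```python
from typing import Dict, List, Optional

MAX_REQUEST_SIZE = 128       # minimum size the spec text requires clients to support
TOO_LARGE_REQUEST = -38004   # error code quoted in the spec text


class EngineError(Exception):
    """Hypothetical stand-in for a JSON-RPC error response."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def get_blobs_v3(
    pool: Dict[str, str], hashes: List[str], syncing: bool = False
) -> Optional[List[Optional[str]]]:
    """Partial return: same length and order as the request, None where missing."""
    if syncing:
        return None  # rule 5: generally unable to serve blob pool data
    if len(hashes) > MAX_REQUEST_SIZE:
        raise EngineError(TOO_LARGE_REQUEST, "Too large request")  # rule 4
    return [pool.get(h) for h in hashes]  # rules 2 and 3


def get_blobs_v2(
    pool: Dict[str, str], hashes: List[str], syncing: bool = False
) -> Optional[List[str]]:
    """All-or-nothing (assumed model): None if any requested blob is missing."""
    result = get_blobs_v3(pool, hashes, syncing)
    if result is None or any(item is None for item in result):
        return None
    return result
```

With a pool holding only `A` and `C`, a request for `[A, B, C]` yields `[A, None, C]` from the V3 sketch and `None` from the V2 sketch.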
Contributor

This allows an EL client to still return null as a response. From the CL's perspective V3 is identical to main's V2.

Member Author

@raulk raulk Jul 8, 2025

V3 guarantees that partial responses will be served (unless no responses can be served at all, and this is not due to an internal error, which is what the null literal case covers here). Main's V2 does not make any such behavioural guarantee.

Contributor

V3 changes the EL behavior relative to V2, no argument there.

> From the CL's perspective V3 is identical to main's V2.

This is my point. The CL has to handle receiving null or a list of partial responses, just as it does with V2. This is not a point against the PR, just a note.

Member Author

This was a deliberate choice. V1 would return a null filled array, but it’s clearer semantics to return a single null literal that translates to an Option or pointer type in Rust and Go, given this outcome affects the whole request anyway. IMO it’s even cleaner to return an error, but I assumed there was a reason V1 didn’t from the get-go.
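
The two outcomes discussed in this thread (a single null literal for the whole request versus a list with null entries) might be handled on the CL side roughly like this. It is a sketch under stated assumptions: `split_response` is a hypothetical helper, blobs are placeholder strings, and Python's `Optional` stands in for the Option or pointer types mentioned above.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_response(
    requested: Sequence[str],
    response: Optional[Sequence[Optional[str]]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split a getBlobs response into (found blobs by hash, missing hashes)."""
    if response is None:
        # Null literal: the EL could not serve blob pool data at all.
        return {}, list(requested)
    if len(response) != len(requested):
        raise ValueError("response length does not match request length")
    found = {h: blob for h, blob in zip(requested, response) if blob is not None}
    missing = [h for h, blob in zip(requested, response) if blob is None]
    return found, missing
```

The `missing` list is what a CL would still need to obtain from elsewhere, for example from peers.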

Contributor

@mkalinin mkalinin left a comment

In general, looks good to me 👍
Let’s get some approvals from EL and CL devs and then merge!

@kasey

kasey commented Jul 14, 2025

> In addition, the current approach does not allow the CL (or another party) to determine which return mode the EL supports.

I can think of 2 advantages to the changes this implies to v2:

  1. Save resources if the CL isn't able to do anything meaningful with partial responses.
     • As stated in discord, EL clients can already decide to save this overhead by just returning a nil value. So this seems like we aren't gaining anything.
  2. Mitigate risk of a bug in a dormant and insufficiently tested CL code path that gets triggered when an EL starts sending partial responses after an upgrade.
     • The extra version mitigates this risk and the need for extra testing at the fusaka fork.

TL;DR the choice is between extra testing at fusaka or a small bit of tech debt. If we are confident the partial response code in CL clients is safe, we wind up with simpler specs and code. If we want to avoid risk, the extra engine api version just adds a bit of boilerplate and complicates the getblobs code path.

@raulk
Member Author

raulk commented Jul 14, 2025

@kasey thanks for chiming in. The risk argument is compelling -- I hadn't considered that advantage. Yes, in the case something goes wrong with partial responses (V3), CL clients can roll back to V2 (and the associated unoptimized CL code) without requiring any further action in the EL (e.g. changing a config flag). That's a benefit of stricter and more deterministic/explicit behaviour definitions for each method.

@raulk
Member Author

raulk commented Jul 21, 2025

Not sure what happened in this commit, but some old wording crept in and it went unnoticed. Restored the correct text.

@raulk
Member Author

raulk commented Jul 21, 2025

Tracking implementation status.

@raulk
Member Author

raulk commented Jul 21, 2025

Closing as per #676.

@raulk raulk closed this Jul 21, 2025
rjl493456442 pushed a commit to ethereum/go-ethereum that referenced this pull request Dec 31, 2025
This is used by cell-level dissemination (aka partial messages) to give
the CL all blobs the EL knows about and let CL communicate efficiently
about any other missing blobs. In other words, partial responses from
the EL are useful now.

See the related (closed) PR:
ethereum/execution-apis#674 and the new PR:
ethereum/execution-apis#719
weiihann added a commit to weiihann/go-ethereum that referenced this pull request Jan 8, 2026
commit 64d22fd
Author: LittleBingoo <zpksdhr@gmail.com>
Date:   Thu Jan 8 11:49:13 2026 +0800

    internal/flags: update copyright year to 2026 (ethereum#33550)

commit 9623dcb
Author: rjl493456442 <garyrong0905@gmail.com>
Date:   Thu Jan 8 11:48:45 2026 +0800

    core/state: add cache statistics of contract code reader (ethereum#33532)

commit 01b39c9
Author: Ng Wei Han <47109095+weiihann@users.noreply.github.com>
Date:   Thu Jan 8 11:07:19 2026 +0800

    core/state, core/tracing: new state update hook (ethereum#33490)

    ### Description
    Add a new `OnStateUpdate` hook which gets invoked after state is
    committed.

    ### Rationale
    For our particular use case, we need to obtain the state size metrics at
    every single block when fully syncing from genesis. With the current
    state sizer, whenever the node is stopped, the background process must
    be freshly initialized. During this re-initialization, it can skip some
    blocks while the node continues executing blocks, causing gaps in the
    recorded metrics.

    Using this state update hook allows us to customize our own data
    persistence logic, and we would never skip blocks upon node restart.

    ---------

    Co-authored-by: Gary Rong <garyrong0905@gmail.com>

commit 957a360
Author: cui <cuiweixie@gmail.com>
Date:   Wed Jan 7 10:02:27 2026 +0800

    core/vm: avoid escape to heap (ethereum#33537)

commit 7100084
Author: Csaba Kiraly <cskiraly@users.noreply.github.com>
Date:   Wed Jan 7 02:52:50 2026 +0100

    eth: txs fetch/send log at trace level only (ethereum#33541)

    This logging was too intensive at debug level, it is better to have it
    at trace level only.

    Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>

commit eaaa5b7
Author: rjl493456442 <garyrong0905@gmail.com>
Date:   Tue Jan 6 15:09:15 2026 +0800

    core: re-organize the stats category (ethereum#33525)

    Check out https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ for more details

commit a8a4804
Author: Andrew Davis <1709934+Savid@users.noreply.github.com>
Date:   Tue Jan 6 03:49:30 2026 +1100

    ethstats: report newPayload processing time to stats server (ethereum#33395)

    Add NewPayloadEvent to track engine API newPayload block processing
    times and report them to ethstats. This enables monitoring of block
    processing performance.

    https://notes.ethereum.org/@savid/block-observability

    related: ethereum#33231

    ---------

    Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>

commit de5ea2f
Author: Mask Weller <Wellermask@gmail.com>
Date:   Sun Jan 4 13:47:28 2026 +0700

    core/rawdb: add trienode freezer support to InspectFreezerTable (ethereum#33515)

    Adds missing trienode freezer case to InspectFreezerTable, making it
    consistent with InspectFreezer which already supports it.

    Co-authored-by: m6xwzzz <maskk.weller@gmail.com>

commit b635e06
Author: rjl493456442 <garyrong0905@gmail.com>
Date:   Thu Jan 1 02:52:25 2026 +0800

    eth/fetcher: improve the condition to stall peer in tx fetcher (ethereum#32725)

    Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
    Co-authored-by: Csaba Kiraly <csaba.kiraly@gmail.com>

commit 32fea00
Author: shhhh <g1siddharthr@gmail.com>
Date:   Wed Dec 31 11:32:44 2025 +0530

    core/blockchain.go: cleanup finalized block on rewind in setHeadBeyondRoot (ethereum#33486)

    Fix ethereum#33390

    `setHeadBeyondRoot` was failing to invalidate finalized blocks because
    it compared against the original head instead of the rewound root. This
    fix updates the comparison to use the post-rewind block number,
    preventing the node from reporting a finalized block that no longer
    exists. Also added relevant test cases for it.

commit b2843a1
Author: Marco Munizaga <marco@marcopolo.io>
Date:   Tue Dec 30 17:48:50 2025 -0800

    eth/catalyst: implement getBlobsV3 (ethereum#33404)

    This is used by cell-level dissemination (aka partial messages) to give
    the CL all blobs the EL knows about and let CL communicate efficiently
    about any other missing blobs. In other words, partial responses from
    the EL are useful now.

    See the related (closed) PR:
    ethereum/execution-apis#674 and the new PR:
    ethereum/execution-apis#719

commit 25439aa
Author: Bashmunta <georgebashmunta@gmail.com>
Date:   Wed Dec 31 03:40:43 2025 +0200

    core/state/snapshot: fix storageList memory accounting (ethereum#33505)

commit 52ae75a
Author: Rim Dinov <rdin35051@gmail.com>
Date:   Wed Dec 31 01:04:38 2025 +0500

    cmd/geth: remove deprecated vulnerability check command (ethereum#33498)

    This PR removes the version-check command and its associated logic as
    discussed in issue ethereum#31222.

    Removed versionCheckCommand from misccmd.go and main.go.

    Deleted version_check.go and its corresponding tests.

    Cleaned up testdata/vcheck directory (~800 lines of JSON/signatures
    removed).

    Verified build with make geth

commit d9aaab1
Author: Fibonacci747 <albertofibonacci12@gmail.com>
Date:   Tue Dec 30 18:27:11 2025 +0100

    beacon/light/sync: clear reqFinalityEpoch on server unregistration (ethereum#33483)

    HeadSync kept reqFinalityEpoch entries for servers after receiving
    EvUnregistered, while other per-server maps were cleared. This left
    stale request.Server keys reachable from HeadSync, which can lead to a
    slow memory leak in setups that dynamically register and unregister
    servers.

    The fix adds deletion of the reqFinalityEpoch entry in the
    EvUnregistered handler. This aligns HeadSync with the cleanup pattern
    used by other sync modules and keeps the finality request bookkeeping
    strictly limited to currently registered servers.

commit b3e7d9e
Author: rjl493456442 <garyrong0905@gmail.com>
Date:   Tue Dec 30 23:05:13 2025 +0800

    triedb/pathdb: optimize history indexing efficiency (ethereum#33303)

    This pull request optimizes history indexing by splitting a single large
    database batch into multiple smaller chunks.

    Originally, the indexer will resolve a batch of state histories and
    commit all corresponding index entries atomically together with the
    indexing marker.

    While indexing more state histories in a single batch improves
    efficiency, excessively large batches can cause significant memory
    issues.

    To mitigate this, the pull request splits the mega-batch into several
    smaller batches and flushes them independently during indexing. However,
    this introduces a potential inconsistency that some index entries may be
    flushed while the indexing marker is not, and an unclean shutdown may
    leave the database in a partially updated state. This can corrupt index
    data.

    To address this, head truncation is introduced. After a restart, any
    excessive index entries beyond the expected indexing marker are removed,
    ensuring the index remains consistent after an unclean shutdown.

commit b84097d
Author: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Date:   Tue Dec 30 14:43:45 2025 +0100

    .github/workflows: preventively close PRs that seem AI-generated (ethereum#33414)

    This is a new step in my crusade against the braindead fad of starting
    PR titles with a word that is completely redundant with github labels,
    thus wasting prime first-line real-estate for something that isn't
    necessary.

    I noticed that every single one of these PRs are low-quality AI-slop, so
    I think there is a strong case to be made for these PRs to be
    auto-closed. A message is added before closing the PR, redirecting to
    our contribution guidelines, so I expect quality first-time contributors
    to read them and reopen the PR. In the case of spam PRs, the author is
    unlikely to revisit a given PR, and so auto-closing might have a
    positive impact. That's an experiment worth trying, imo.

commit 3f641db
Author: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Date:   Tue Dec 30 13:44:04 2025 +0100

    trie, go.mod: remove all references to go-verkle and go-ipa (ethereum#33461)

    In order to reduce the amount of code that is embedded into the keeper
    binary, I am removing all the verkle code that uses go-verkle and
    go-ipa. This will be followed by further PRs that are more like stubs to
    replace code when the keeper build is detected.

    I'm keeping the binary tree of course. This means that you will still
    see `isVerkle` variables all over the codebase, but they will be renamed
    when code is touched (i.e. this is not an invitation for 30+ AI slop
    PRs).

    ---------

    Co-authored-by: Gary Rong <garyrong0905@gmail.com>

commit 57f8486
Author: Archkon <180910180+Archkon@users.noreply.github.com>
Date:   Mon Dec 29 20:57:29 2025 +0800

    params: fix wrong comment (ethereum#33503)

    It seems that the comment for CopyGas was wrongly associated to
    SloadGas.

commit b9702ed
Author: oooLowNeoNooo <ooolowneonooo@gmail.com>
Date:   Mon Dec 29 09:23:51 2025 +0100

    console/prompt: use PromptInput in PromptConfirm method (ethereum#33445)

commit 4531bfe
Author: rjl493456442 <garyrong0905@gmail.com>
Date:   Mon Dec 29 16:13:30 2025 +0800

    eth/downloader: fix stale beacon header deletion (ethereum#33481)

    In this PR, two things have been fixed:

    ---

    (a) truncate the stale beacon headers with latest snap block

    Originally, b.filled is used as the indicator for deleting stale beacon headers.
    This field is set only after synchronization has been scheduled, under the
    assumption that the skeleton chain is already linked to the local chain.

    However, the local chain can be mutated via `debug_setHead`, which may
    cause `b.filled` to become outdated. For instance, `b.filled` refers to the last head snap block
    in the last sync cycle, while after `debug_setHead` the head snap block has been
    rewound to 1.

    As a result, Geth can enter an unintended loop: it repeatedly downloads
    the missing beacon headers for the skeleton chain and attempts to schedule the
    actual synchronization, but in the final step, all recently fetched headers are removed
    by `cleanStales` due to the stale `b.filled` value.

    This issue is addressed by always using the latest snap block as the indicator,
    without relying on any cached value. However, note that before the skeleton
    chain is linked to the local chain, the latest snap block will always be below
    skeleton.tail, and this condition should not be treated as an error.

    ---

    (b) merge the subchains once the skeleton chain links to local chain

    Once the skeleton chain links with the local one, it will try to schedule the
    synchronization by fetching the missing blocks and then importing them.
    It's possible that the last subchain already overwrites the previous subchain,
    leaving two subchains behind. As a result, an error log will be printed
    https://github.com/ethereum/go-ethereum/blob/master/eth/downloader/skeleton.go#L1074
weiihann pushed a commit to weiihann/go-ethereum that referenced this pull request Jan 14, 2026
weiihann pushed a commit to weiihann/go-ethereum that referenced this pull request Jan 16, 2026