Skip to content

fix(storage): pin RocksDB snapshot for cross-store read consistency#23305

Closed
joshieDo wants to merge 9 commits into
joshie/defer-changeset-prunefrom
joshie/pin-rocksdb-snapshot
Closed

fix(storage): pin RocksDB snapshot for cross-store read consistency#23305
joshieDo wants to merge 9 commits into
joshie/defer-changeset-prunefrom
joshie/pin-rocksdb-snapshot

Conversation

@joshieDo
Copy link
Copy Markdown
Collaborator

Stacked on #23291.

Fixes block gas mismatch during aggressive reorgs with storage_v2. HistoricalStateProvider takes a fresh RocksDB snapshot per query, but its MDBX read tx is from creation time. If RemoveBlocksAbove commits between the two, the reader sees unwound RocksDB indices with pre-unwind MDBX state — InPlainState fallback returns wrong values.

Fix: pin the RocksDB snapshot at DatabaseProvider creation alongside the MDBX read tx, with a last_txnid retry loop to ensure consistency. Thread it through to HistoricalStateProvider for all history lookups.

…tency

HistoricalStateProvider takes its MDBX read tx at creation time (MVCC
snapshot) but previously took a fresh RocksDB snapshot on every
account_history_lookup/storage_history_lookup call. During aggressive
reorgs, RemoveBlocksAbove commits MDBX then RocksDB sequentially —
a reader with an older MDBX tx could see unwound RocksDB indices,
causing InPlainState fallback to return state at the wrong tip.

Fix: capture both MDBX read tx and RocksDB snapshot together in
ProviderFactory::provider() with a retry loop that verifies no write
commit landed between the two (via last_txnid check). The pinned
snapshot is threaded through DatabaseProvider → HistoricalStateProvider
and used for all history lookups instead of per-query snapshots.

This only affects storage_v2 where history indices live in RocksDB.
With storage_v1, history indices are in MDBX and share the read tx's
MVCC snapshot automatically.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d4393-7938-71ab-a3fd-0132444583bd
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 31, 2026

⚠️ Changelog not found.

A changelog entry is required before merging. We've generated a suggested changelog based on your changes:

Preview
---
reth-provider: minor
---

Made `RocksReadSnapshot` an owned type (no lifetime parameter) by cloning the `RocksDBProvider` to keep the underlying DB alive, and pinned a matching snapshot alongside the MDBX read transaction in `ProviderFactory::provider()` for `storage_v2` nodes. Added `pinned_rocksdb_snapshot()` to `RocksDBProviderFactory` so read-only providers reuse their pinned snapshot instead of creating a fresh one per read.

Add changelog to commit this to your branch.

Comment on lines +295 to +297
if let Some(snapshot) = pinned_rocksdb_snapshot {
provider = provider.with_pinned_rocksdb_snapshot(snapshot);
}
Copy link
Copy Markdown
Collaborator Author

@joshieDo joshieDo Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

option since storage.v1 doesnt need/use it

@joshieDo joshieDo marked this pull request as ready for review March 31, 2026 17:09
@joshieDo joshieDo added C-bug An unexpected or incorrect behavior A-db Related to the database A-rocksdb Related to rocksdb integration labels Mar 31, 2026
Copy link
Copy Markdown
Member

@Rjected Rjected left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one comment otherwise i think this does fix the stated bug

Comment on lines +245 to +247
/// The helper retries if an MDBX write commit lands between opening the read transaction and
/// taking the `RocksDB` snapshot. This keeps both stores aligned for historical reads that
/// span MDBX state and `RocksDB` history indices.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to note that this:

  1. allows rocksdb to be ahead of mdbx, ie mdbx at block 7 and rocksdb up to block 10, and
  2. we need docs that outline what behavior is safe, for example the engine is safe because it creates txs sequentially, and we only ever perform SaveBlocks after we have closed a tx and added a block to the tree. I am not sure a single historical provider should outlive both a remove_blocks mdbx commit and a subsequent save_blocks

@joshieDo joshieDo closed this Apr 2, 2026
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Reth Tracker Apr 2, 2026
@emmajam emmajam deleted the joshie/pin-rocksdb-snapshot branch May 1, 2026 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-db Related to the database A-rocksdb Related to rocksdb integration C-bug An unexpected or incorrect behavior

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants