
fix(rocksdb): proper shard logic for history indices#21064

Closed
yongkangc wants to merge 17 commits into yk/rocksdb-either-reader from yk/rocksdb-shard-logic

Conversation


@yongkangc commented Jan 14, 2026

Summary
Adds proper shard management for RocksDB history indices.

Changes

  • Treat empty RocksDB tables as first-run scenario in invariants
  • Handle sentinel-only entries in consistency checks
  • Use proper shard logic for history indices in write_blocks_data
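To make the "proper shard logic" change concrete, here is a hedged, self-contained sketch of the shard-append rule for history indices. It is illustrative only: the names (`append_to_shards`, `MAX_BLOCKS_PER_SHARD`) and the in-memory `Vec` representation are assumptions, not the actual reth API. The idea is that history shards are keyed by their highest block number, with `u64::MAX` marking the still-open shard; appending must extend the open shard and close it at capacity rather than blindly writing to the `u64::MAX` key.

```rust
// Illustrative shard size; the real value is much larger.
const MAX_BLOCKS_PER_SHARD: usize = 4;

/// Append new block numbers to a shard list, closing full shards.
/// Shards are `(highest_block_number, blocks)`; the open shard uses u64::MAX.
fn append_to_shards(shards: &mut Vec<(u64, Vec<u64>)>, new_blocks: &[u64]) {
    // Take the current open shard (key == u64::MAX), or start a fresh one.
    let mut open = match shards.last() {
        Some(&(u64::MAX, _)) => shards.pop().unwrap().1,
        _ => Vec::new(),
    };
    open.extend_from_slice(new_blocks);
    // Close shards that reached capacity; their key is their highest block.
    while open.len() > MAX_BLOCKS_PER_SHARD {
        let closed: Vec<u64> = open.drain(..MAX_BLOCKS_PER_SHARD).collect();
        let key = *closed.last().unwrap();
        shards.push((key, closed));
    }
    // The remainder stays in the open shard under the sentinel key.
    shards.push((u64::MAX, open));
}
```

With this rule, two consecutive appends of `[1, 2, 3]` and `[4, 5, 6]` produce one closed shard keyed by `4` holding `[1, 2, 3, 4]` and an open `u64::MAX` shard holding `[5, 6]`.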

Testing
Invariant checks pass.


PR Stack

main
  ↑
joshie/par-save-blocks-rocksdb (base)
  ↑
#21063 (rocksdb-either-reader)
  ↑
#21064 (rocksdb-shard-logic) ◀
  ↑
#21065 (rocksdb-stage-utils)
  ↑
#21066 (rocksdb-index-history-stages)
  ↑
#21068 (rocksdb-init-txlookup)
  ↑
#21069 (rocksdb-cli-testutils)
  ↑
#21070 (rocksdb-cli)
  ↑
#21071 (rocksdb-docs)

Co-authored-by: Sergei Shulepov <pep@tempo.xyz>
@joshieDo joshieDo force-pushed the joshie/par-save-blocks-rocksdb branch from e042e16 to fff917c Compare January 14, 2026 21:31
joshieDo and others added 7 commits January 15, 2026 10:01
This wires RocksDB into the history lookup paths:

- Adds account_history_info and storage_history_info methods to EitherReader
- Updates HistoricalStateProviderRef to use EitherReader for lookups
- Adds RocksDBProviderFactory trait bounds to provider impls
- Uses the rank/select pattern for efficient binary search in shards
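The rank/select lookup the last bullet refers to can be sketched as follows. This is a hedged illustration, not the reth implementation: shard keys are the highest block number each shard covers, sorted ascending with `u64::MAX` for the open shard, and the lookup takes the first key `>= block` (a "rank" computed by binary search, then a "select" of that shard). The function name is hypothetical.

```rust
/// Find the index of the shard that may cover `block`.
/// `shard_keys` are highest-block-number keys, sorted ascending.
fn find_shard_for_block(shard_keys: &[u64], block: u64) -> Option<usize> {
    // partition_point returns the number of keys < block: the rank.
    let rank = shard_keys.partition_point(|&k| k < block);
    // Selecting at `rank` yields the first shard whose range includes `block`.
    (rank < shard_keys.len()).then_some(rank)
}
```

For keys `[100, 200, u64::MAX]`, block `150` resolves to the shard keyed `200`, and block `250` falls through to the open shard.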
**Problem**
- EitherReader had unnecessary PhantomData markers
- RocksDB transaction setup was duplicated in historical.rs with cfg-gated blocks
- Addressed joshieDo's feedback about RocksDB logic leaking into historical provider

**Solution**
- Remove PhantomData from EitherReader enum variants (lifetime already captured by RocksDB reference)
- Add with_rocksdb_tx helper method to RocksDBProviderFactory trait
- Refactor historical.rs to use trait method instead of duplicated cfg-gated blocks

**Changes**
- Remove PhantomData from EitherReader enum and all constructors/match arms
- Add with_rocksdb_tx to RocksDBProviderFactory trait with default implementation
- Refactor account_history_lookup and storage_history_lookup to use with_rocksdb_tx helper
- Make RocksTxRefArg type alias public for trait method

**Expected Impact**
- Cleaner EitherReader API without unnecessary PhantomData
- RocksDB transaction setup encapsulated in trait method
- Reduced cfg-gated block duplication in historical.rs
- No behavioral changes; 96/97 existing tests pass
When rocksdb feature is disabled, the RocksDB variant is compiled out,
leaving the lifetime 'a unused and causing E0392 error.

Add PhantomData<&'a ()> to StaticFile variant to ensure the lifetime
is always used regardless of feature flags.
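A minimal reproduction of the E0392 situation and the fix described above, with illustrative types rather than the actual reth enum: when the `rocksdb` feature is off, the only remaining variant would leave `'a` unused, so `PhantomData<&'a ()>` keeps the lifetime "used" under every feature combination.

```rust
use std::marker::PhantomData;

enum EitherReader<'a> {
    // PhantomData<&'a ()> ensures 'a is used even when the RocksDb
    // variant below is compiled out by the feature flag (avoids E0392).
    StaticFile(PhantomData<&'a ()>),
    #[cfg(feature = "rocksdb")]
    RocksDb(&'a [u8]),
}
```

`PhantomData` is zero-sized, so the variant's runtime representation is unchanged; it only records the lifetime for the type checker.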
…lication

- Add compute_history_rank() function for shared rank/select logic
- Simplify EitherReader::storage_history_info and account_history_info
- Simplify RocksTx::history_info by using the shared helper
- Import PhantomData directly instead of using std::marker::PhantomData
- Fix clippy doc_markdown warnings for RocksDB in stub module
- Make PendingRocksDBBatches pub(crate) to fix unreachable_pub warning

Reduces duplicated rank/select code across 3 locations while preserving comments.
Previously, when RocksDB tables were empty but MDBX had checkpoints > 0,
the consistency check would return Some(0), triggering an assertion
failure because unwinding to block 0 is considered destructive.

This is the expected state when RocksDB is enabled for the first time
alongside existing MDBX data. The fix treats empty RocksDB tables as a
first-run/migration scenario, logging a warning instead of requesting
an unwind. The pipeline will naturally populate the tables during sync.
When RocksDB history tables have entries but ALL entries are sentinel
entries (highest_block_number == u64::MAX), the consistency check was
incorrectly returning Some(0) as the unwind target. This would trigger
an assertion failure during node startup.

Sentinel entries represent "open" shards that haven't been completed
yet, meaning no actual history has been indexed. This is equivalent to
the empty table case and should be treated as a first-run scenario.

Added tests to verify this edge case is handled correctly.
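The rule from these two commits (empty tables, and sentinel-only tables, both count as a first run) can be sketched like this. The types and names here are illustrative assumptions, not the real consistency-check code: an entry is a sentinel when its `highest_block_number` is `u64::MAX`, and no unwind is requested unless at least one closed shard exists.

```rust
/// A table is a first-run/migration case if it is empty or contains
/// only open-shard sentinel keys (u64::MAX): no closed history exists.
fn is_first_run(shard_keys: &[u64]) -> bool {
    shard_keys.iter().all(|&k| k == u64::MAX)
}

/// Only request an unwind when real (closed) shards are present.
fn unwind_target(shard_keys: &[u64], mdbx_checkpoint: u64) -> Option<u64> {
    if is_first_run(shard_keys) {
        // First run: log a warning instead; sync will populate the tables.
        return None;
    }
    // Simplified: unwind to the highest closed shard, capped by the
    // MDBX checkpoint.
    let highest = shard_keys.iter().filter(|&&k| k != u64::MAX).max().copied()?;
    Some(highest.min(mdbx_checkpoint))
}
```

Note how both problem cases from the commit messages map to `None` here, so the destructive "unwind to block 0" path is never taken on a fresh RocksDB database.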
@yongkangc yongkangc force-pushed the yk/rocksdb-shard-logic branch from 456b7fe to 7a9e46f Compare January 15, 2026 10:12
@yongkangc yongkangc changed the base branch from joshie/par-save-blocks-rocksdb to yk/rocksdb-either-reader January 15, 2026 11:18
@yongkangc yongkangc force-pushed the yk/rocksdb-either-reader branch 2 times, most recently from feb3c69 to 585bcc1 Compare January 15, 2026 19:08

@gakonst (Member) left a comment


Code Review: PR #21064

CI Status

  • fmt: failure (needs cargo fmt)
  • lint: failure (caused by the fmt failure)
  • ⚠️ Merge conflicts detected

Correctness Concerns

🔴 Major: Write-path shard append race condition

append_account_history_shard() and append_storage_history_shard() read existing shards from the committed DB, but their writes are deferred into pending_batches. If save_blocks() can run multiple times before commit():

  1. Call #1 appends shard → produces batch A (pending)
  2. Call #2 appends shard → reads old committed shard (not batch A)
  3. Call #2 overwrites the shard key(s) in batch B
  4. Indices from batch A are lost

Recommendation: Either:

  • Commit RocksDB history writes immediately in write_blocks_data (simplest)
  • Use a transactional write context that supports read-your-writes
  • Maintain an in-memory per-tx accumulator for open shard values
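The third option above (an in-memory accumulator for open shard values) could look roughly like this. Everything here is a hypothetical sketch of the reviewer's suggestion, not reth code: the key type is simplified to `u64`, and `committed` stands in for whatever the read path returns from the committed DB. The point is that a second `append` for the same key sees the first call's pending data instead of re-reading stale committed state.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct PendingShards {
    // key -> accumulated block numbers for that key's open shard
    open: BTreeMap<u64, Vec<u64>>,
}

impl PendingShards {
    /// Read-your-writes append: the committed value seeds the accumulator
    /// only on first touch; later calls extend the pending value.
    fn append(&mut self, key: u64, blocks: &[u64], committed: &[u64]) -> Vec<u64> {
        let entry = self.open.entry(key).or_insert_with(|| committed.to_vec());
        entry.extend_from_slice(blocks);
        entry.clone()
    }
}
```

At commit time the accumulator would be drained into a single batch per key, which removes the lost-update window between two `save_blocks()` calls.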

🟡 Medium: Missing "open shard" edge case

Both append functions assume the "open shard" key (highest_block_number == u64::MAX) exists or history is empty. If only closed shards exist (partial state), get(last_key) returns None and the append starts a fresh open shard, potentially creating overlaps.

Recommendation: Consider a prefix scan for the last shard if u64::MAX lookup fails, or add an invariant assertion.

🟡 Medium: Rechunk never deletes obsolete shards

Rechunking writes new closed shard keys plus a new u64::MAX shard, but never deletes pre-existing shard keys. This is probably fine if only the u64::MAX shard can grow and closed shards are immutable, but worth confirming this invariant.

Nits

  1. Thread naming: RocksDB thread panic uses format!("rocksdb write thread {i} panicked"). Consider naming by component (tx_hash_numbers, account_history, storage_history) like StaticFile does for debuggability.

  2. Import cleanup: There are some duplicated/unused imports in the diff. Worth a cleanup pass.

  3. RocksDBBatch docs vs behavior: Docs say "does NOT support read-your-writes," but append_*_history_shard is fundamentally a read-modify-write operation. Add docs clarifying this works only if called once per commit cycle.

Summary

The read-path logic (compute_history_rank, prev-shard check) looks correct. The parallel write infrastructure for static files and RocksDB is well-designed. The main blocker is the write-path shard append correctness issue - please address before merge.


Review by @tempo_ai

…ks_data

Add append_account_history_shard and append_storage_history_shard methods
to RocksDBBatch that properly handle shard boundaries. Update write_blocks_data
to use these methods instead of the naive approach that just wrote to u64::MAX
shard without checking existing shards.
@yongkangc yongkangc force-pushed the yk/rocksdb-shard-logic branch from 7a9e46f to 4b3b865 Compare January 16, 2026 08:35
@yongkangc (Member, Author) commented:

Merged into #21063 to simplify the PR stack. The 3 shard logic commits have been cherry-picked into the EitherReader PR.

@yongkangc yongkangc closed this Jan 16, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in Reth Tracker Jan 16, 2026

Labels

A-rocksdb Related to rocksdb integration

Projects

Status: Done


4 participants