fix(rocksdb): proper shard logic for history indices#21064
fix(rocksdb): proper shard logic for history indices#21064yongkangc wants to merge 17 commits intoyk/rocksdb-either-readerfrom
Conversation
1986ff4 to
111d45e
Compare
Co-authored-by: Sergei Shulepov <pep@tempo.xyz>
e042e16 to
fff917c
Compare
fff917c to
d8a600c
Compare
111d45e to
456b7fe
Compare
This wires RocksDB into the history lookup paths: - Adds account_history_info and storage_history_info methods to EitherReader - Updates HistoricalStateProviderRef to use EitherReader for lookups - Adds RocksDBProviderFactory trait bounds to provider impls - Uses the rank/select pattern for efficient binary search in shards
**Problem** - EitherReader had unnecessary PhantomData markers - RocksDB transaction setup was duplicated in historical.rs with cfg-gated blocks - Addressed joshieDo's feedback about RocksDB logic leaking into historical provider **Solution** - Remove PhantomData from EitherReader enum variants (lifetime already captured by RocksDB reference) - Add with_rocksdb_tx helper method to RocksDBProviderFactory trait - Refactor historical.rs to use trait method instead of duplicated cfg-gated blocks **Changes** - Remove PhantomData from EitherReader enum and all constructors/match arms - Add with_rocksdb_tx to RocksDBProviderFactory trait with default implementation - Refactor account_history_lookup and storage_history_lookup to use with_rocksdb_tx helper - Make RocksTxRefArg type alias public for trait method **Expected Impact** - Cleaner EitherReader API without unnecessary PhantomData - RocksDB transaction setup encapsulated in trait method - Reduced cfg-gated block duplication in historical.rs - No behavioral changes, all existing tests pass (96/97)
When rocksdb feature is disabled, the RocksDB variant is compiled out, leaving the lifetime 'a unused and causing E0392 error. Add PhantomData<&'a ()> to StaticFile variant to ensure the lifetime is always used regardless of feature flags.
…lication - Add compute_history_rank() function for shared rank/select logic - Simplify EitherReader::storage_history_info and account_history_info - Simplify RocksTx::history_info by using the shared helper - Import PhantomData directly instead of using std::marker::PhantomData - Fix clippy doc_markdown warnings for RocksDB in stub module - Make PendingRocksDBBatches pub(crate) to fix unreachable_pub warning Reduces duplicated rank/select code across 3 locations while preserving comments.
Previously, when RocksDB tables were empty but MDBX had checkpoints > 0, the consistency check would return Some(0), triggering an assertion failure because unwinding to block 0 is considered destructive. This is the expected state when RocksDB is enabled for the first time alongside existing MDBX data. The fix treats empty RocksDB tables as a first-run/migration scenario, logging a warning instead of requesting an unwind. The pipeline will naturally populate the tables during sync.
When RocksDB history tables have entries but ALL entries are sentinel entries (highest_block_number == u64::MAX), the consistency check was incorrectly returning Some(0) as the unwind target. This would trigger an assertion failure during node startup. Sentinel entries represent "open" shards that haven't been completed yet, meaning no actual history has been indexed. This is equivalent to the empty table case and should be treated as a first-run scenario. Added tests to verify this edge case is handled correctly.
456b7fe to
7a9e46f
Compare
feb3c69 to
585bcc1
Compare
gakonst
left a comment
There was a problem hiding this comment.
Code Review: PR #21064
CI Status
- ❌ fmt failure - needs
cargo fmt - ❌ lint success failure (due to fmt)
⚠️ Merge conflicts detected
Correctness Concerns
🔴 Major: Write-path shard append race condition
append_account_history_shard() and append_storage_history_shard() read existing shards from the committed DB but writes are deferred into pending_batches. If save_blocks() can run multiple times before commit():
- Call #1 appends shard → produces batch A (pending)
- Call #2 appends shard → reads old committed shard (not batch A)
- Call #2 overwrites the shard key(s) in batch B
- Indices from batch A are lost
Recommendation: Either:
- Commit RocksDB history writes immediately in
write_blocks_data(simplest) - Use a transactional write context that supports read-your-writes
- Maintain an in-memory per-tx accumulator for open shard values
🟡 Medium: Missing "open shard" edge case
Both append functions assume the "open shard" key (highest_block_number == u64::MAX) exists or history is empty. If only closed shards exist (partial state), get(last_key) returns None and the append starts a fresh open shard, potentially creating overlaps.
Recommendation: Consider a prefix scan for the last shard if u64::MAX lookup fails, or add an invariant assertion.
🟡 Medium: Rechunk never deletes obsolete shards
Rechunking writes new closed shard keys plus a new u64::MAX shard, but never deletes pre-existing shard keys. This is probably fine if only the u64::MAX shard can grow and closed shards are immutable, but worth confirming this invariant.
Nits
-
Thread naming: RocksDB thread panic uses
format!("rocksdb write thread {i} panicked"). Consider naming by component (tx_hash_numbers,account_history,storage_history) like StaticFile does for debuggability. -
Import cleanup: There are some duplicated/unused imports in the diff. Worth a cleanup pass.
-
RocksDBBatchdocs vs behavior: Docs say "does NOT support read-your-writes," butappend_*_history_shardis fundamentally a read-modify-write operation. Add docs clarifying this works only if called once per commit cycle.
Summary
The read-path logic (compute_history_rank, prev-shard check) looks correct. The parallel write infrastructure for static files and RocksDB is well-designed. The main blocker is the write-path shard append correctness issue - please address before merge.
Review by @tempo_ai
…ks_data Add append_account_history_shard and append_storage_history_shard methods to RocksDBBatch that properly handle shard boundaries. Update write_blocks_data to use these methods instead of the naive approach that just wrote to u64::MAX shard without checking existing shards.
7a9e46f to
4b3b865
Compare
|
Merged into #21063 to simplify the PR stack. The 3 shard logic commits have been cherry-picked into the EitherReader PR. |
Summary
Adds proper shard management for RocksDB history indices.
Changes
Testing
Invariant checks pass.
PR Stack