perf(provider): optimize RocksDB history pruning with changeset-based approach#21358
Closed
perf(provider): optimize RocksDB history pruning with changeset-based approach#21358
Conversation
… approach Use MDBX changesets to identify affected (address, storage_key) pairs instead of iterating the entire StoragesHistory/AccountsHistory tables. This significantly improves pruning performance for mainnet-scale databases. Closes #20417 Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Verify no excess entries remain after optimized pruning via `last()`. Falls back to full scan if entries are missed due to incomplete changesets. Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Add tests verifying the defensive check catches RocksDB entries missed by changeset-based pruning when MDBX data is incomplete. - test_storages_history_defensive_check_catches_missed_entries - test_accounts_history_defensive_check_catches_missed_entries Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
The iter_from method was incorrectly accessing `.db` field directly on RocksDBProviderInner, which is an enum and doesn't expose db as a public field. This was introduced during merge conflict resolution. Use the existing iterator_cf() helper method that properly handles both ReadWrite and ReadOnly variants. Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
yongkangc
reviewed
Jan 23, 2026
| /// excess block range, then only deletes History entries for those specific accounts. | ||
| /// This is more efficient than iterating the whole table. | ||
| /// | ||
| /// TODO(<https://github.com/paradigmxyz/reth/issues/20417>): this iterates the whole table, |
Member
There was a problem hiding this comment.
also can u verify logical equivalence?
Member
Author
There was a problem hiding this comment.
Logical equivalence verified:
Before (original):
for result in self.iter::<tables::StoragesHistory>()? {
let (key, _) = result?;
let highest_block = key.sharded_key.highest_block_number;
if max_block == 0 || (highest_block != u64::MAX && highest_block > max_block) {
to_delete.push(key);
}
}After (optimized path with changeset-based approach + fallback):
| Case | Original Behavior | New Behavior |
|---|---|---|
max_block == 0 |
Deletes all except sentinels | Calls prune_*_all() which deletes all except sentinels ✅ |
highest_block > max_block (with changesets) |
Full scan, delete matching | Changeset lookup → seek → delete matching ✅ |
highest_block > max_block (no changesets) |
Full scan | Falls back to full scan ✅ |
Sentinel entries (u64::MAX) |
Preserved | Preserved ✅ |
highest_block <= max_block |
Not deleted | Not deleted ✅ |
The defensive check after the optimized path ensures any entries missed by changesets are still caught via full scan fallback.
- Restore accidentally deleted methods in provider.rs (table_stats, raw_iter, account_history_shards, storage_history_shards, unwind_account_history_indices, unwind_storage_history_indices) - Move imports to top of file in invariants.rs - Remove redundant comments - Simplify variable patterns (use !is_empty() instead of len() > 0) - Use HashSet for O(1) deduplication in defensive check
yongkangc
reviewed
Jan 23, 2026
Update docstrings to reflect that provider.changed_storages_with_range() and provider.changed_accounts_with_range() correctly route to static files when storage_changesets_in_static_files / account_changesets_in_static_files are enabled, rather than always querying MDBX.
- Clarify that changeset queries route to static files or MDBX based on storage settings - Clarify that _all() functions preserve sentinel entries (u64::MAX) - Update 'clearing all entries' to 'clearing all non-sentinel entries'
24d2326 to
9667546
Compare
yongkangc
reviewed
Jan 23, 2026
Address review feedback - make comments more clear and concise
The previous defensive check used last() to detect missed entries, but with key ordering (address, storage_key, block), the lexicographically last key may not have the highest block number. This could cause entries to be missed. Now we scan all entries and early-exit on the first missed entry, then fall back to full scan only if needed. This is correct regardless of key ordering.
The defensive full-table scan was running on every prune, making the changeset-based optimization pointless (O(changeset) + O(table) = O(table)). Changesets are authoritative for which entries changed - if they're incomplete, that's a data integrity issue that should be caught elsewhere. The fallback to full scan when changesets are empty handles the edge cases correctly.
5fa2be2 to
2ef8cf5
Compare
The tests expected a defensive verification pass after the optimized changeset-based pruning path to catch entries that were missed due to incomplete changesets (e.g., entries in RocksDB that don't have corresponding MDBX/static file changesets). This adds a full table scan after the optimized path to ensure no entries are missed, using HashSet deduplication to avoid double-deletes.
2ef8cf5 to
b84b368
Compare
Member
Author
|
Closing this PR - the main branch already has the correct changeset-based implementation. Analysis: The current main branch
This PR regresses the implementation by:
The issue #20417 has already been addressed on main via the static file changeset integration (PR #18882). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Optimizes RocksDB history pruning by using a changeset-based approach instead of full table scans.
Changes
changed_storages_with_range/changed_accounts_with_range) which routes to static files or MDBX based on storage settingsiter_from()methodThis significantly improves pruning performance for mainnet-scale databases.
Closes #20417
Based on PR #20674 by @meetrick