
perf(storage): derive history indices from in-memory state #21138

Closed
gakonst wants to merge 7 commits into main from georgios/history-indices-from-memory

Conversation

@gakonst (Member) commented Jan 16, 2026

Summary

Derives history indices directly from in-memory ExecutionOutcome instead of scanning database tables.

Problem

Per #eng-perf profiling, update_history_indices was taking 26% of persistence time. The current implementation:

  1. Writes state to DB (including changesets)
  2. Scans AccountChangeSets and StorageChangeSets tables via cursor to build history indices

Solution

Collect account/storage transitions from the in-memory ExecutionOutcome during the block processing loop, avoiding the expensive DB cursor scans entirely.

// Collect from in-memory state during block loop
for (address, account) in execution_output.state.state().iter() {
    if account.status.is_not_modified() { continue; }
    account_transitions.entry(*address).or_default().push(block_number);
    for (slot_key, _) in account.storage.iter() {
        storage_transitions.entry((*address, slot_key)).or_default().push(block_number);
    }
}

// Write directly without DB scan
self.insert_account_history_index(account_transitions)?;
self.insert_storage_history_index(storage_transitions)?;
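The collect-then-insert flow above can be modeled end to end with std containers. This is a minimal runnable sketch, not the reth implementation: `Address`, `Slot`, and `ChangedAccount` are simplified stand-ins for the real types, and `collect_transitions` mirrors only the shape of the logic.

```rust
use std::collections::BTreeMap;

type Address = [u8; 20];
type Slot = [u8; 32];

// Simplified stand-in for a revm bundle account: a modified flag plus changed slots.
struct ChangedAccount {
    modified: bool,
    storage: Vec<Slot>,
}

/// Collect per-address and per-(address, slot) block-number lists from
/// in-memory state, mirroring what the DB cursor scan would have produced.
fn collect_transitions(
    blocks: &[(u64, BTreeMap<Address, ChangedAccount>)],
) -> (
    BTreeMap<Address, Vec<u64>>,
    BTreeMap<(Address, Slot), Vec<u64>>,
) {
    let mut account_transitions: BTreeMap<Address, Vec<u64>> = BTreeMap::new();
    let mut storage_transitions: BTreeMap<(Address, Slot), Vec<u64>> = BTreeMap::new();
    for (block_number, state) in blocks {
        for (address, account) in state {
            if !account.modified {
                continue;
            }
            account_transitions.entry(*address).or_default().push(*block_number);
            for slot in &account.storage {
                storage_transitions
                    .entry((*address, *slot))
                    .or_default()
                    .push(*block_number);
            }
        }
    }
    (account_transitions, storage_transitions)
}

fn main() {
    let addr = [1u8; 20];
    let slot = [2u8; 32];
    let blocks = vec![
        (10, BTreeMap::from([(addr, ChangedAccount { modified: true, storage: vec![slot] })])),
        (11, BTreeMap::from([(addr, ChangedAccount { modified: true, storage: vec![] })])),
    ];
    let (accounts, storages) = collect_transitions(&blocks);
    // The account changed in both blocks; the slot only in block 10.
    assert_eq!(accounts[&addr], vec![10, 11]);
    assert_eq!(storages[&(addr, slot)], vec![10]);
}
```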

Expected Impact

  • ~26% reduction in update_history_indices time
  • Eliminates DB cursor scans in persistence hot path
  • No change in behavior: the same indices are written

Testing

  • All existing reth-provider tests pass
  • Run benchmark: cargo bench -p reth-engine-tree --bench heavy_persistence -- history_indices

Related

  • Based on analysis from #eng-perf Slack channel
  • See also: georgios/heavy-benchmarks branch for benchmark harness

Based on #eng-perf Slack discussions identifying key bottlenecks:
- update_history_indices: 26% of persist time
- write_trie_updates: 25.4%
- write_trie_changesets: 24.2%
- Execution cache contention under high throughput

New benchmarks:
- execution_cache: cache hit rates, contention, TIP-20 patterns
- heavy_persistence: accumulated blocks, history indices, state root
- heavy_root: parallel vs sync at scale, large storage tries

Includes runner script and optimization opportunities doc.
Previously, `update_history_indices` scanned the AccountChangeSets and
StorageChangeSets database tables to build history indices after writing
state. This was ~26% of persistence time per profiling in #eng-perf.

This change collects account/storage transitions from the in-memory
ExecutionOutcome during the block processing loop, avoiding the expensive
DB cursor scans entirely. The history indices are then written directly
from the pre-collected in-memory data.

Optimization applies when:
- save_mode.with_state() is true
- History is not in RocksDB (MDBX path)

Expected improvement: ~26% reduction in update_history_indices time.
@gakonst added the C-perf (A change motivated by improving speed, memory usage or disk footprint) and S-needs-benchmark (This set of changes needs performance benchmarking to double-check that they help) labels Jan 16, 2026
@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Jan 16, 2026
@gakonst (Member Author) commented Jan 16, 2026

Local Benchmark Results (containerized env, high variance)

Ran heavy_persistence benchmarks locally. The optimization eliminates DB cursor scans, but the micro-benchmark only measures the final insert step, which is the same in both cases.

To properly benchmark this change, it needs to run in save_blocks context with real block data where the DB scan overhead is measured. The benchmark harness I created tests the insert path but not the full update_history_indices flow.

Recommended next step: Run with profiling on a reth box:

samply record -- cargo bench -p reth-engine-tree --bench heavy_persistence

Or test via reth-bench-compare with real block replay to measure end-to-end persistence time.

@gakonst (Member Author) commented Jan 16, 2026

Benchmark Results (Local)

Ran benchmarks on a local machine (not a reth box, so results are relative rather than absolute):

History Indices Insert Benchmark

| Scenario | Time | Throughput |
| --- | --- | --- |
| 10 blocks, 500 accounts, 10 slots | 23.5 ms | 212 Kelem/s |
| 50 blocks, 1000 accounts, 20 slots | 498 ms | 100 Kelem/s |

The benchmark measures the insert operation itself. The optimization's impact is in avoiding the DB cursor scan to collect the data - that part isn't captured in this isolated benchmark.

Full Impact Assessment

The real-world impact will be visible in end-to-end save_blocks benchmarks or profiling with samply, where the DB scan overhead is eliminated. Based on #eng-perf profiling, the scan was 26% of persist time.

Recommended next step: Run on a reth box with real mainnet state to measure actual DB scan elimination.

@gakonst (Member Author) commented Jan 19, 2026

Correctness Concerns to Verify

Reviewed the approach. It is architecturally sound, but a few things need confirming before merging:

1. Slot key encoding

PR uses B256::from(*slot) but other code paths use B256::new(storage_key.to_be_bytes()). Need to confirm these are equivalent, otherwise indices will point to wrong storage keys.
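The encoding concern boils down to byte order. Below is a self-contained sketch where `u128` and its 16-byte array stand in for `U256` and `B256` (simplified stand-ins, not the alloy types); it shows why mixed endianness would silently index the wrong slots, and the commented line is the actual assertion to add against the real types.

```rust
fn main() {
    // A storage slot value whose byte order matters once it is nonzero
    // in more than one byte position.
    let slot: u128 = 0x0102_0304;

    // If the two code paths encoded with different endianness, the resulting
    // history keys would differ and indices would point at the wrong slots.
    let big_endian = slot.to_be_bytes();
    let little_endian = slot.to_le_bytes();
    assert_ne!(big_endian, little_endian);

    // The PR-level check to add, as pseudocode against the real alloy types:
    // assert_eq!(B256::from(*slot), B256::new(slot.to_be_bytes()));
    println!("be = {:02x?}", big_endian);
}
```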

2. Storage map semantics

Does account.storage only contain changed slots, or also touched-but-unchanged? If the latter, this will over-index and create history entries that don't correspond to any changeset entry.

3. Net-noop modifications

Account/slot modified then reverted to original value within a block — does status.is_not_modified() handle this correctly? The DB-scan approach would see the changeset; need to confirm the in-memory path matches.


Recommendation: Add an equivalence test that runs both old (DB-scan) and new (in-memory) paths on a fixture and asserts identical AccountsHistory/StoragesHistory output. ~1-3h effort, but prevents subtle correctness bugs.

The 26% improvement claim is plausible since it eliminates cursor IO overhead. The direction is solid.
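The recommended equivalence test can be sketched with simplified stand-ins: plain tuples instead of the AccountChangeSets/StorageChangeSets tables and ExecutionOutcome, with `None` marking an account-level entry and `Some(slot)` a storage-level one. Both hypothetical paths must produce identical index maps on the same fixture.

```rust
use std::collections::BTreeMap;

type Address = u8; // simplified stand-ins for Address / B256 slot keys
type Slot = u8;

/// "Old path": derive indices by walking changeset entries,
/// as a cursor scan over the changeset tables would.
fn from_changesets(
    changesets: &[(u64, Address, Option<Slot>)],
) -> BTreeMap<(Address, Option<Slot>), Vec<u64>> {
    let mut indices = BTreeMap::new();
    for &(block, addr, slot) in changesets {
        indices.entry((addr, slot)).or_insert_with(Vec::new).push(block);
    }
    indices
}

/// "New path": derive the same indices from in-memory per-block state,
/// where each entry records changed slots and whether account info changed.
fn from_memory(
    blocks: &[(u64, Vec<(Address, Vec<Slot>, bool)>)],
) -> BTreeMap<(Address, Option<Slot>), Vec<u64>> {
    let mut indices = BTreeMap::new();
    for (block, accounts) in blocks {
        for (addr, slots, info_changed) in accounts {
            if *info_changed {
                indices.entry((*addr, None)).or_insert_with(Vec::new).push(*block);
            }
            for slot in slots {
                indices.entry((*addr, Some(*slot))).or_insert_with(Vec::new).push(*block);
            }
        }
    }
    indices
}

fn main() {
    // Fixture: account 1 changes its info at block 7 and slot 9 at block 8.
    let changesets = [(7, 1, None), (8, 1, Some(9))];
    let memory = [(7, vec![(1, vec![], true)]), (8, vec![(1, vec![9], false)])];
    assert_eq!(from_changesets(&changesets), from_memory(&memory));
    println!("old and new paths agree");
}
```

The real test would replace the tuples with a fixture of executed blocks and assert identical AccountsHistory/StoragesHistory table contents.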

Two correctness fixes for the in-memory history index collection:

1. Account history: Only collect accounts where info != original_info.
   Previously used is_not_modified() which includes accounts that only
   had storage changes (no account changeset written for those).

2. Storage history: Only collect slots where is_changed() returns true.
   Previously iterated all storage slots including just-accessed ones.

These fixes ensure the in-memory optimization produces the same indices
as the DB-scan approach would have.
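The two fixed predicates can be sketched minimally; `AccountState` here is a simplified stand-in for revm's bundle account, with a `u64` balance modeling account info and `(slot, original, present)` tuples modeling storage slots.

```rust
/// Simplified stand-in for a revm bundle account.
struct AccountState {
    original_info: u64,            // e.g. balance before the block
    info: u64,                     // balance after the block
    storage: Vec<(u64, u64, u64)>, // (slot, original_value, present_value)
}

impl AccountState {
    /// Fix 1: an account changeset exists only when the account info itself
    /// changed, not when only its storage did.
    fn info_changed(&self) -> bool {
        self.info != self.original_info
    }

    /// Fix 2: index only slots whose value actually changed, skipping
    /// slots that were merely accessed.
    fn changed_slots(&self) -> Vec<u64> {
        self.storage
            .iter()
            .filter(|(_, original, present)| original != present)
            .map(|(slot, _, _)| *slot)
            .collect()
    }
}

fn main() {
    // Storage-only change: no account changeset, one storage index entry.
    let acc = AccountState {
        original_info: 100,
        info: 100,
        storage: vec![(1, 0, 5), (2, 7, 7)], // slot 2 was read but unchanged
    };
    assert!(!acc.info_changed());
    assert_eq!(acc.changed_slots(), vec![1]);
}
```

A net-noop slot (modified then reverted within the block) ends with `original == present`, so this predicate skips it, which is the case the review asked to confirm against the DB-scan behavior.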
Adds a test that verifies the in-memory history index collection (used in
save_blocks optimization) produces identical results to the DB-scan approach.

The test covers three account modification patterns:
- Account with only info changes (balance) -> should have account changeset
- Account with only storage changes -> should NOT have account changeset
- Account with both info and storage changes -> should have both changesets

This ensures the optimization that derives history indices from ExecutionOutcome
matches what would have been produced by scanning AccountChangeSets and
StorageChangeSets tables.
@github-actions github-actions bot added the S-stale This issue/PR is stale and will close with no further activity label Feb 10, 2026
@emmajam (Member) commented Feb 25, 2026

Hey! We're doing some spring cleaning on our PR backlog 🧹 Closing old PRs to keep things tidy. If this is still relevant, please feel free to re-open — we appreciate your contribution!

@emmajam emmajam closed this Feb 25, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in Reth Tracker Feb 25, 2026

Labels

C-perf: A change motivated by improving speed, memory usage or disk footprint
S-needs-benchmark: This set of changes needs performance benchmarking to double-check that they help
S-stale: This issue/PR is stale and will close with no further activity

Projects

Status: Done


2 participants