perf(storage): batch trie updates across blocks in save_blocks #21142
Merged
Conversation
Previously, `write_trie_updates_sorted` was called once per block in the `save_blocks` loop, opening and closing cursors N times for N blocks. This change accumulates trie updates across all blocks using `extend_ref` and writes them in a single batch at the end. This reduces:

- Cursor open/close overhead from N to 1
- MDBX transaction overhead

For back-to-back block processing with 75-250 accumulated blocks (per #eng-perf profiling), this significantly reduces the ~25% of persist time spent in `write_trie_updates`. Expected improvement: ~50% reduction in `write_trie_updates` time for b2b scenarios.
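The before/after shape can be sketched as follows. `SortedUpdates` and the cursor-open counting are simplified stand-ins for reth's `TrieUpdatesSorted` and MDBX cursors, not the actual API:

```rust
use std::collections::BTreeMap;

// Simplified stand-in for TrieUpdatesSorted; the real type holds sorted
// account/storage trie node updates rather than a plain key/value map.
#[derive(Default, Clone)]
pub struct SortedUpdates {
    entries: BTreeMap<u64, u64>,
}

impl SortedUpdates {
    // Merge another block's updates; later blocks win on key collisions.
    pub fn extend_ref(&mut self, other: &Self) {
        self.entries.extend(other.entries.iter().map(|(k, v)| (*k, *v)));
    }
}

// Before: one cursor open (and one sorted write) per block -> N opens.
pub fn save_blocks_per_block(blocks: &[SortedUpdates]) -> usize {
    let mut cursor_opens = 0;
    for _block in blocks {
        cursor_opens += 1; // open cursor, write _block's updates, close cursor
    }
    cursor_opens
}

// After: accumulate across all blocks, then one batched write -> 1 open.
pub fn save_blocks_batched(blocks: &[SortedUpdates]) -> (SortedUpdates, usize) {
    let mut acc = SortedUpdates::default();
    for block in blocks {
        acc.extend_ref(block);
    }
    (acc, 1) // single cursor open for the whole batch
}
```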
Member
This is OK, but it's the same as the other "accumulation" functions: we shouldn't accumulate into a sorted vec, because it sorts on each iteration.
Address review feedback: replace iterative `extend_ref` (which re-sorts on each call, O(n*k)) with `merge_batch`, which uses a k-way merge for O(n log k) complexity. Also removes unnecessary intermediate clones by collecting Arc refs and passing them directly to `merge_batch`.

Amp-Thread-ID: https://ampcode.com/threads/T-019bc863-a64e-73bd-93d5-fc65229fa862
Co-authored-by: Amp <amp@ampcode.com>
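The complexity point can be illustrated with a generic k-way merge over already-sorted inputs. This is a sketch on plain `Vec<u64>`, not reth's `merge_batch` itself:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// k-way merge of k sorted inputs in O(n log k): a min-heap tracks the
// current head of each input, so each of the n elements costs one
// O(log k) heap push/pop instead of a full re-sort per merged input.
pub fn kway_merge(inputs: &[Vec<u64>]) -> Vec<u64> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with (head value, input index, position) per input.
    for (i, v) in inputs.iter().enumerate() {
        if let Some(&head) = v.first() {
            heap.push(Reverse((head, i, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(inputs.iter().map(|v| v.len()).sum());
    while let Some(Reverse((val, i, j))) = heap.pop() {
        out.push(val);
        // Advance input i to its next element, if any.
        if let Some(&next) = inputs[i].get(j + 1) {
            heap.push(Reverse((next, i, j + 1)));
        }
    }
    out
}
```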
Add `merge_batch_hybrid()` to `TrieUpdatesSorted` and `HashedPostStateSorted`, which uses a hybrid algorithm:

- Small k (< 64): `extend_ref` loop with low constant factors
- Large k (≥ 64): k-way `merge_batch` for O(n log k) complexity

This consolidates the threshold logic that was duplicated in lazy_overlay.rs and makes it available for `save_blocks` batching.

Amp-Thread-ID: https://ampcode.com/threads/T-019bc863-a64e-73bd-93d5-fc65229fa862
Co-authored-by: Amp <amp@ampcode.com>
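A self-contained sketch of what such a hybrid dispatch looks like, on plain `Vec<u64>` stand-ins; only the threshold value 64 is taken from the commit:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

const MERGE_BATCH_THRESHOLD: usize = 64;

pub fn merge_batch_hybrid(inputs: &[Vec<u64>]) -> Vec<u64> {
    if inputs.len() < MERGE_BATCH_THRESHOLD {
        // Small k: a plain extend loop plus one final sort;
        // constant factors dominate at this size.
        let mut acc: Vec<u64> = Vec::new();
        for v in inputs {
            acc.extend_from_slice(v);
        }
        acc.sort_unstable();
        acc
    } else {
        // Large k: O(n log k) k-way merge over the already-sorted inputs.
        let mut heap = BinaryHeap::new();
        for (i, v) in inputs.iter().enumerate() {
            if let Some(&head) = v.first() {
                heap.push(Reverse((head, i, 0usize)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((val, i, j))) = heap.pop() {
            out.push(val);
            if let Some(&next) = inputs[i].get(j + 1) {
                heap.push(Reverse((next, i, j + 1)));
            }
        }
        out
    }
}
```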
yongkangc
reviewed
Jan 16, 2026
```rust
    prefix_sets: Default::default(),
}
// Collect all trie data first (blocks are newest-to-oldest)
let trie_data: Vec<_> = blocks.iter().map(|b| b.wait_cloned()).collect();
```
Member
not sure if we can remove a collect here?
yongkangc
reviewed
Jan 16, 2026
```rust
pub fn merge_batch_hybrid<'a>(states: impl IntoIterator<Item = &'a Self>) -> Self {
    const MERGE_BATCH_THRESHOLD: usize = 64;

    let states: Vec<_> = states.into_iter().collect();
```
Member
We can skip this collect again, because we collected before this already?
yongkangc
reviewed
Jan 16, 2026
crates/trie/common/src/updates.rs (outdated)
```rust
pub fn merge_batch_hybrid<'a>(updates: impl IntoIterator<Item = &'a Self>) -> Self {
    const MERGE_BATCH_THRESHOLD: usize = 64;

    let updates: Vec<_> = updates.into_iter().collect();
```
Member
Same here, I think we can probably remove this collect.
yongkangc
reviewed
Jan 16, 2026
```rust
if save_mode.with_state() {
    let start = Instant::now();
    // Collect Arc refs first to extend their lifetime
    let trie_updates: Vec<_> = blocks.iter().map(|b| b.trie_updates()).collect();
```
Member
Same here re: the collect. Check for a perf regression compared to the previous version.
Addresses review feedback from yongkangc: avoid the internal `collect()` in `merge_batch_hybrid` by changing the signature to accept a `&[&Self]` slice. Callers that already have collected data (lazy_overlay, provider) now pass the slice directly, avoiding a redundant collection inside the merge function.

Amp-Thread-ID: https://ampcode.com/threads/T-019bc863-a64e-73bd-93d5-fc65229fa862
Co-authored-by: Amp <amp@ampcode.com>
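The shape of that signature change can be sketched on a simplified stand-in type; `Updates` here is illustrative, not the reth type:

```rust
// Before (sketch): `impl IntoIterator<Item = &Self>` forced an internal
// `let items: Vec<_> = iter.into_iter().collect();` just to get len/indexing.
// After: accept `&[&Self]`, so callers that already collected pass it through.
#[derive(Default, Clone, Debug, PartialEq)]
pub struct Updates(pub Vec<u64>);

impl Updates {
    pub fn merge_batch_hybrid(items: &[&Self]) -> Self {
        match items {
            [] => Self::default(),
            [single] => (*single).clone(),
            _ => {
                let mut merged: Vec<u64> =
                    items.iter().flat_map(|u| u.0.iter().copied()).collect();
                merged.sort_unstable();
                Updates(merged)
            }
        }
    }
}
```

A caller that already holds a `Vec<&Updates>` (as save_blocks does after collecting Arc refs) passes `&refs` with no extra collection inside the merge.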
yongkangc
reviewed
Jan 16, 2026
```rust
}

if states.len() == 1 {
    return (*states[0]).clone();
```
Member
is this clone here bad? not 100% sure why we need to return a clone.
Address yongkangc review feedback:

- Remove unused `merge_batch_hybrid` functions from reth-trie-common
- Inline optimized hybrid logic in lazy_overlay and provider
- Use `Arc::make_mut` for copy-on-write in the small-k path (avoids a clone if the refcount is 1)
- Use `Arc::try_unwrap` to avoid the final clone when possible
- Only collect for the large-k path, where the k-way merge needs materialized data
- The single-block case returns data directly without allocation

Amp-Thread-ID: https://ampcode.com/threads/T-019bc863-a64e-73bd-93d5-fc65229fa862
Co-authored-by: Amp <amp@ampcode.com>
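The `Arc::make_mut` / `Arc::try_unwrap` pattern from this commit, sketched on a plain `Vec<u64>` (the real code applies it to the sorted trie-update types):

```rust
use std::sync::Arc;

// Small-k accumulation with copy-on-write: Arc::make_mut clones the inner
// value only when the Arc is shared (refcount > 1), and Arc::try_unwrap
// hands back the inner value without a final clone when it is uniquely owned.
pub fn merge_small_k(mut acc: Arc<Vec<u64>>, rest: &[Arc<Vec<u64>>]) -> Vec<u64> {
    for r in rest {
        // No clone happens here if `acc` is uniquely owned.
        Arc::make_mut(&mut acc).extend_from_slice(r);
    }
    // Avoid the final clone unless someone else still holds the Arc.
    Arc::try_unwrap(acc).unwrap_or_else(|shared| (*shared).clone())
}
```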
yongkangc
reviewed
Jan 16, 2026
yongkangc
approved these changes
Jan 16, 2026
mediocregopher
approved these changes
Jan 16, 2026
mattsse
approved these changes
Jan 17, 2026
gakonst
added a commit
that referenced
this pull request
Jan 17, 2026
This refactor addresses code duplication identified in PR #21142 by extracting the hybrid merge algorithm into a reusable trait and helper functions.

## Changes

### New HybridMergeSorted trait (reth_trie::utils)
- Trait for sorted types supporting hybrid merging (extend_ref + merge_batch)
- Implemented for TrieUpdatesSorted and HashedPostStateSorted
- Two helper functions:
  - `hybrid_merge_arc_slice`: merges Arc-wrapped items from a slice
  - `hybrid_merge_arc_vec`: consumes a Vec for the Arc::try_unwrap optimization

### Algorithm details
- 0 items: returns default
- 1 item: returns an Arc clone (no data allocation)
- < threshold: uses an extend_ref loop with Arc::make_mut (copy-on-write)
- >= threshold: uses a k-way merge_batch for O(n log k) complexity

### Updated call sites
1. **lazy_overlay.rs**: merge_blocks now uses hybrid_merge_arc_slice
2. **provider.rs**: save_blocks now batches trie updates using hybrid_merge_arc_vec, reducing cursor open/close overhead from N to 1

## Allocation behavior
- No regression vs original code
- Small batches: Arc::make_mut provides copy-on-write
- Large batches: single allocation for the k-way merge result
- Arc::try_unwrap used to avoid a clone when the refcount is 1

## Expected performance impact
- ~50% reduction in write_trie_updates time for b2b scenarios (batched writes)
- Maintains the same allocation characteristics as the original code
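A sketch of that trait shape follows. The trait and helper names come from the commit message; the bodies and the `Vec<u64>` impl are simplified illustrations, not reth's implementation:

```rust
use std::sync::Arc;

// Sorted types that support both merge strategies.
pub trait HybridMergeSorted: Clone + Default {
    fn extend_ref(&mut self, other: &Self);
    fn merge_batch(items: &[&Self]) -> Self;
}

const MERGE_BATCH_THRESHOLD: usize = 64;

// Merges Arc-wrapped items from a slice using the hybrid strategy.
pub fn hybrid_merge_arc_slice<T: HybridMergeSorted>(items: &[Arc<T>]) -> T {
    match items.len() {
        0 => T::default(),
        1 => (*items[0]).clone(),
        k if k < MERGE_BATCH_THRESHOLD => {
            // Small k: extend_ref loop, seeded from a clone of the first item.
            let mut acc = (*items[0]).clone();
            for item in &items[1..] {
                acc.extend_ref(item);
            }
            acc
        }
        _ => {
            // Large k: materialize refs once and hand off to the k-way merge.
            let refs: Vec<&T> = items.iter().map(|a| a.as_ref()).collect();
            T::merge_batch(&refs)
        }
    }
}

// Toy impl so the helper is runnable; the real impls are for
// TrieUpdatesSorted and HashedPostStateSorted.
impl HybridMergeSorted for Vec<u64> {
    fn extend_ref(&mut self, other: &Self) {
        self.extend_from_slice(other);
        self.sort_unstable();
    }
    fn merge_batch(items: &[&Self]) -> Self {
        let mut out: Vec<u64> = items.iter().flat_map(|v| v.iter().copied()).collect();
        out.sort_unstable();
        out
    }
}
```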
gakonst
added a commit
that referenced
this pull request
Jan 17, 2026
Batches trie updates across all blocks in `save_blocks` instead of writing per-block, reducing cursor open/close overhead from N to 1.

## Changes

### lazy_overlay.rs
- Move the MERGE_BATCH_THRESHOLD constant inside the function
- Use data directly instead of Arc::clone (avoids an unnecessary refcount bump)
- Move Arc::make_mut into the loop for proper copy-on-write semantics

### provider.rs
Add batched trie updates with a hybrid merge algorithm:
- 0 blocks: default
- 1 block: Arc::try_unwrap to avoid a clone if the refcount is 1
- < 30 blocks: extend_ref with Arc::make_mut (copy-on-write)
- >= 30 blocks: k-way merge_batch for O(n log k) complexity

## Allocation Behavior (No Regression)
Small batches avoid collect() by using Arc::make_mut directly in the loop. Only large batches (>= 30) collect Arcs for the k-way merge.

## Expected Impact
- ~50% reduction in write_trie_updates time for b2b scenarios
- Maintains the same allocation characteristics as the original code

## Related
- Based on optimizations from the Slack #eng-perf thread
- Learned from PR #21142 review: avoid unnecessary collect(), use Arc::make_mut
gakonst
added a commit
that referenced
this pull request
Jan 18, 2026
Member
Closes #20611
Vui-Chee
added a commit
to okx/reth
that referenced
this pull request
Jan 20, 2026
* tag 'v1.10.1': (49 commits)
  - chore: bump version to 1.10.1 (paradigmxyz#21188)
  - chore: rename extend_ref methods on sorted data structures (paradigmxyz#21043)
  - fix(flashblocks): Add flashblock ws connection retry period (paradigmxyz#20510)
  - chore(bench): add --disable-tx-gossip to benchmark node args (paradigmxyz#21171)
  - refactor(stages): reuse history index cache buffers in `collect_history_indices` (paradigmxyz#21017)
  - feat(download): resumable snapshot downloads with auto-retry (paradigmxyz#21161)
  - ci: update to tempoxyz (paradigmxyz#21176)
  - chore: apply spelling and typo fixes (paradigmxyz#21182)
  - docs: document minimal storage mode in pruning FAQ (paradigmxyz#21025)
  - chore(deps): weekly `cargo update` (paradigmxyz#21167)
  - feat(execution-types): add receipts_iter helper (paradigmxyz#21162)
  - revert: undo Chain crate, add LazyTrieData to trie-common (paradigmxyz#21155)
  - feat(engine): add new_payload_interval metric (start-to-start) (paradigmxyz#21159)
  - feat(engine): add time_between_new_payloads metric (paradigmxyz#21158)
  - fix(storage-api): gate reth-chain dependency behind std feature
  - perf(storage): batch trie updates across blocks in save_blocks (paradigmxyz#21142)
  - refactor: use ExecutionOutcome::single instead of tuple From (paradigmxyz#21152)
  - chore(chain-state): reorganize deferred_trie.rs impl blocks (paradigmxyz#21151)
  - feat(primitives-traits): add try_recover_signers for parallel batch recovery (paradigmxyz#21103)
  - perf: make Chain use DeferredTrieData (paradigmxyz#21137)
  - ...
Summary

Batches trie updates across all blocks in `save_blocks` instead of writing per-block.

Problem

Per #eng-perf profiling, `write_trie_updates` was taking ~25% of persistence time. The current implementation calls `write_trie_updates_sorted` once per block, opening/closing cursors N times. In back-to-back (b2b) scenarios with 75-250 accumulated blocks, this overhead compounds significantly.
Solution

Accumulate trie updates across blocks using the existing `extend_ref` method, then write them all in a single batch.

Expected Impact

- ~50% reduction in `write_trie_updates` time for b2b scenarios

Testing

- `reth-provider` tests pass
- `reth-bench-compare`

Closes RETH-168

Related