feat: sparse trie as cache #21583

Merged
klkvr merged 94 commits into main from klkvr/sparse-trie-cache-update-leaves on Jan 29, 2026

Member

@klkvr klkvr commented Jan 29, 2026

Adds a new SparseTrieCacheTask that uses the in-memory sparse trie to drive proof fetching instead of relying on the MultiProofTask.

klkvr and others added 30 commits January 28, 2026 23:31
Add prune() method to SparseTrieInterface trait for cross-payload
sparse trie caching. The method converts nodes beyond a specified
depth to Hash nodes, reducing memory while preserving root hash.

Implementation details:
- SerialSparseTrie: DFS traversal with node-depth semantics
- ParallelSparseTrie: Prunes across upper/lower subtries
- ConfiguredSparseTrie: Delegates to inner implementation
- SparseStateTrie: Evicts storage tries by node count, then prunes

Also adds revealed_node_count() to track non-Hash nodes and
DEFAULT_SPARSE_TRIE_PRUNE_DEPTH/DEFAULT_MAX_PRESERVED_STORAGE_TRIES
constants for configuration.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfa26-6d57-72df-9b0e-89dc32090861
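
The depth-based pruning described in the commit above can be illustrated with a toy tree. This is only a sketch under stated assumptions: `Node`, its `u64` hashes, and the rule "root is depth 0, anything deeper than `max_depth` becomes a stub" are hypothetical simplifications, not reth's actual types or exact depth semantics.

```rust
/// Toy trie node. The `u64` stands in for a real 32-byte node hash.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Node {
    /// Blinded stub: only the hash of the subtree is kept.
    Hash(u64),
    Leaf { hash: u64 },
    Branch { hash: u64, children: Vec<Node> },
}

impl Node {
    /// Cached hash of the subtree rooted at this node.
    fn hash(&self) -> u64 {
        match self {
            Node::Hash(hash) | Node::Leaf { hash } | Node::Branch { hash, .. } => *hash,
        }
    }

    /// Number of nodes that are not `Hash` stubs.
    fn revealed_node_count(&self) -> usize {
        match self {
            Node::Hash(_) => 0,
            Node::Leaf { .. } => 1,
            Node::Branch { children, .. } => {
                1 + children.iter().map(Node::revealed_node_count).sum::<usize>()
            }
        }
    }

    /// Replaces every revealed node deeper than `max_depth` (root is depth 0,
    /// an assumption of this sketch) with a `Hash` stub and returns how many
    /// subtrees were replaced. The root hash is left untouched.
    fn prune(&mut self, max_depth: usize) -> usize {
        self.prune_at(0, max_depth)
    }

    fn prune_at(&mut self, depth: usize, max_depth: usize) -> usize {
        if depth > max_depth {
            if matches!(self, Node::Hash(_)) {
                return 0; // already blinded
            }
            let hash = self.hash();
            *self = Node::Hash(hash); // drops the whole subtree below
            return 1;
        }
        match self {
            Node::Branch { children, .. } => children
                .iter_mut()
                .map(|child| child.prune_at(depth + 1, max_depth))
                .sum::<usize>(),
            _ => 0,
        }
    }
}
```

Because each stub keeps the hash of the subtree it replaces, the parent hashes (and therefore the root) stay valid without recomputation, which is the property the commit message relies on.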
- Save cleared revealed_paths HashSets to cleared_revealed_paths for reuse
  instead of discarding them (matching the pattern used by cleared_tries)
- Add metrics recording for prune operations:
  - prune_account_nodes_converted
  - prune_storage_nodes_converted
  - prune_storage_tries_cleared
  - prune_storage_tries_retained
  - post_prune_account_nodes
  - post_prune_storage_nodes

Amp-Thread-ID: https://ampcode.com/threads/T-019bfad0-85ab-703d-a712-01dbf2098358
Extract branch_changes_on_leaf_removal and extension_changes_on_leaf_removal
into a shared leaf_removal module in reth-trie-sparse. These pure functions
compute structural transformations needed when removing leaves from sparse
tries.

Both SerialSparseTrie and ParallelSparseTrie now use the shared helpers,
eliminating ~94 lines of duplicated logic while maintaining identical behavior.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfaf4-f717-7331-80bb-95da8f58ac9b
- Fix branch_node_masks retention in parallel prune to match serial
  (use starts_with_pruned instead of is_strict_descendant)
- Use sort_unstable for pruned roots (stability not needed)
- Improve prune() trait doc with edge case behavior
- Move TrieMask import to test module where it's used

Amp-Thread-ID: https://ampcode.com/threads/T-019bfaf4-e801-71ce-b83b-4f2fadf0dd37
After pruning retained storage tries, their revealed_paths sets were
not being cleared. This could cause subsequent multiproof/witness
reveals to incorrectly skip nodes that were pruned away, leading to
blinded-node errors.

Also clarified docstring: precondition requires root() specifically,
and documented that prune clears update tracking state.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfd7a-bb64-732f-b725-3df2431ab50b
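
The stale `revealed_paths` problem from the commit above can be reproduced in miniature. The sketch below is hypothetical: `ToyStorageTrie`, its fields, and the `clear_revealed` flag exist only to contrast the buggy and fixed behavior and are not part of reth.

```rust
use std::collections::HashSet;

/// Toy storage trie: `nodes` holds the paths currently in memory,
/// `revealed_paths` remembers which paths were already revealed.
#[derive(Default)]
struct ToyStorageTrie {
    nodes: HashSet<Vec<u8>>,
    revealed_paths: HashSet<Vec<u8>>,
}

impl ToyStorageTrie {
    /// Reveals a node from a proof. Returns `false` when the reveal is
    /// skipped because the path is already recorded as revealed.
    fn reveal(&mut self, path: &[u8]) -> bool {
        if !self.revealed_paths.insert(path.to_vec()) {
            return false;
        }
        self.nodes.insert(path.to_vec());
        true
    }

    /// Drops nodes whose path is longer than `max_len`.
    /// `clear_revealed = true` models the fix, `false` the old behavior.
    fn prune(&mut self, max_len: usize, clear_revealed: bool) {
        self.nodes.retain(|path| path.len() <= max_len);
        if clear_revealed {
            self.revealed_paths.clear();
        }
    }
}
```

With `clear_revealed = false`, a second `reveal` of a pruned path is skipped and the node stays missing, which corresponds to the blinded-node errors mentioned in the commit. With `true`, the node is revealed again.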
Move prune() from SparseTrie trait to a new SparseTrieExt extension trait
as specified in RETH-178. This makes pruning an opt-in capability:

- Create SparseTrieExt trait extending SparseTrie in traits.rs
- Only ParallelSparseTrie implements SparseTrieExt
- SerialSparseTrie keeps prune() as inherent method (not trait)
- SparseStateTrie::prune() now requires A: SparseTrieExt, S: SparseTrieExt bounds

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdab-5e65-71ac-a8d2-73ccb6eb6409
Combine the two-phase prune algorithm (collect roots, then convert) into
a single DFS pass that converts eligible nodes to Hash stubs during
traversal. Uses SmallVec to collect children before mutation to satisfy
the borrow checker.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdc9-30c0-7328-b1c9-2f9ac8df5b7b
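
A rough sketch of the single-pass idea, assuming an arena-backed toy trie: child indices are copied into a temporary buffer first, so the shared borrow of the arena ends before any node is overwritten. `ArenaNode` and `prune_single_pass` are hypothetical names, a plain `Vec` stands in for the SmallVec mentioned in the commit, and descendants of converted nodes are simply left unreachable here rather than freed.

```rust
/// Toy arena-backed trie node: children are indices into the same slice.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ArenaNode {
    Hash,
    Leaf,
    Branch(Vec<usize>),
}

/// Converts every node deeper than `max_depth` (root is depth 0) into a
/// `Hash` stub during one DFS and returns the number of converted nodes.
fn prune_single_pass(arena: &mut [ArenaNode], root: usize, max_depth: u8) -> usize {
    let mut converted = 0;
    let mut stack: Vec<(usize, u8)> = vec![(root, 0)];

    while let Some((index, depth)) = stack.pop() {
        // Copy the child indices out first; the shared borrow of `arena`
        // ends with this statement, so the loop below may mutate it.
        let children: Vec<usize> = match &arena[index] {
            ArenaNode::Branch(children) => children.clone(),
            _ => continue,
        };

        for child in children {
            if depth >= max_depth {
                // The child sits at depth + 1 > max_depth: convert in place.
                if arena[child] != ArenaNode::Hash {
                    arena[child] = ArenaNode::Hash;
                    converted += 1;
                }
            } else {
                stack.push((child, depth + 1));
            }
        }
    }
    converted
}
```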
…rseTrie impl

- Remove SerialSparseTrie::prune() inherent method (~120 lines)
- Remove serial prune tests from sparse/trie.rs (~290 lines)
- Update parallel tests to use only ParallelSparseTrie
- Update SparseTrieExt trait doc to reflect that only ParallelSparseTrie implements it
- Add large_account_value helper to parallel tests

This eliminates code duplication since prune() is only needed for
ParallelSparseTrie in production (via SparseStateTrie::prune).

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdba-500a-76ed-a7ce-91506d242e24
Remove the shared leaf_removal.rs module and keep the original inline
code in SerialSparseTrie. The helper functions are kept as methods
only in ParallelSparseTrie.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdde-ac35-76ae-9881-df28f26ddbe0
Removes prune_storage_tries_retained as it's derivable from other metrics.
The post_prune_storage_nodes metric already captures retained size, and
prune_storage_tries_cleared captures eviction activity.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdec-0ab8-75bc-959d-05f705fb701a
- Remove ShrinkConfig struct and DEFAULT_SHRINK_* constants (not in RETH-178 spec)
- Remove shrink_config field and related methods from ParallelSparseTrie
- Restore #[derive(Eq)] on ParallelSparseTrie (no more f64 equality issues)
- Fix early return bug: clear updates/prefix_set at start of prune()
  to ensure bookkeeping is always reset even when nothing is pruned

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdfa-8ad0-756e-a93e-58bffd7c0db2
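
The early-return fix in the last bullet above amounts to resetting bookkeeping before the first possible `return`. A minimal sketch, assuming a hypothetical `ToyTrie` whose `prefix_set` and `updates` fields only mimic the names in the commit text:

```rust
use std::collections::HashSet;

/// Toy trie holding only what the sketch needs.
#[derive(Default)]
struct ToyTrie {
    /// Paths of the nodes currently held in memory.
    nodes: Vec<Vec<u8>>,
    /// Paths touched since the last root computation.
    prefix_set: HashSet<Vec<u8>>,
    /// Node updates recorded since the last root computation.
    updates: Vec<Vec<u8>>,
}

impl ToyTrie {
    /// Drops nodes whose path is longer than `max_len` and returns how many
    /// were dropped.
    fn prune(&mut self, max_len: usize) -> usize {
        // Reset bookkeeping before any early return, so it runs on every call.
        self.prefix_set.clear();
        self.updates.clear();

        if self.nodes.iter().all(|path| path.len() <= max_len) {
            return 0; // nothing to prune, but bookkeeping is already reset
        }

        let before = self.nodes.len();
        self.nodes.retain(|path| path.len() <= max_len);
        before - self.nodes.len()
    }
}
```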
- Add explicit preconditions section (must call root() first)
- Document max_depth == 0 behavior
- Clarify depth counts nodes, not nibbles (extension nodes count as 1)
- Document that prefix_set and updates are cleared
- Simplify inline comment for sort

Amp-Thread-ID: https://ampcode.com/threads/T-019bfdff-fab0-77e2-b9c5-d10321e0b243
Remove prune-related metrics to simplify PR for review:
- Remove prune_account_nodes_converted, prune_storage_nodes_converted,
  prune_storage_tries_cleared, post_prune_account_nodes,
  post_prune_storage_nodes fields and histograms
- Remove record_prune() method
- Simplify prune() implementation by removing #[cfg(feature = "metrics")] blocks

Metrics can be tuned and added back after core algorithm review.

Amp-Thread-ID: https://ampcode.com/threads/T-019bfe0d-5ace-71b4-b66a-415b3962dd97
…parallelization

- Use bit manipulation to iterate only set bits in branch state_mask
  (trailing_zeros + clear lowest bit pattern), avoiding 16 iterations per branch
- Collect revealed subtrie indices before parallelization, only use rayon
  when >=4 subtries need processing to reduce scheduling overhead
- Add stronger fast-path: clear entire lower subtries when upper prune root
  is a prefix of subtrie path (O(1) vs O(n) retain scan)

Amp-Thread-ID: https://ampcode.com/threads/T-019bfe03-dfc5-7772-a3d9-a582075d3175
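
Two of the optimizations in the commit above are small enough to sketch with the standard library alone: visiting only the set bits of a 16-bit branch mask, and the prefix test behind the whole-subtrie fast path. The function names `set_nibbles` and `covers_subtrie`, and the use of `u16` and `&[u8]` for the mask and paths, are assumptions made for illustration.

```rust
/// Returns the nibbles (0..=15) whose bit is set in `state_mask`, lowest
/// first. The loop runs once per set bit instead of 16 times per branch.
fn set_nibbles(mut state_mask: u16) -> Vec<u8> {
    let mut nibbles = Vec::with_capacity(state_mask.count_ones() as usize);
    while state_mask != 0 {
        // Index of the lowest set bit.
        nibbles.push(state_mask.trailing_zeros() as u8);
        // Clear the lowest set bit.
        state_mask &= state_mask - 1;
    }
    nibbles
}

/// Fast-path check: if a pruned root is a prefix of a subtrie's path, the
/// whole subtrie lies below the pruned node and can be cleared at once.
fn covers_subtrie(prune_root: &[u8], subtrie_path: &[u8]) -> bool {
    subtrie_path.starts_with(prune_root)
}
```

For a sparsely populated branch the first loop does one or two iterations rather than sixteen, and the prefix check replaces a scan over every node in the subtrie with a single slice comparison.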
- Replace O(n) revealed_node_count() scan with O(1) nodes.len() for capacity estimation
- Fix SmallVec bulk-initialization: use new() + push instead of from_buf_and_len
- Narrow stack depth type from usize (8 bytes) to u8 (1 byte)

Amp-Thread-ID: https://ampcode.com/threads/T-019bfe23-0ba2-75ce-8d23-69b24212af5b

Member

@mediocregopher mediocregopher left a comment

Two small things but LGTM for merging, we can optimize after

/// Those are being moved into `account_updates` once storage roots
/// are revealed and/or calculated.
///
/// Invariant: for each entry in `pending_account_updates` account must either be already

Member

Is this comment right? I thought the account couldn't be in account_updates until we had a storage root, and if we had a storage root the update wouldn't be pending

Member Author

yeah it could be there as Touched or Changed with outdated storage root

MultiProofMessage::StateUpdate(_, state) => {
self.on_state_update(state);
}
MultiProofMessage::EmptyProof { sequence_number: _, state } => {

Member

are empty proofs possible if we're bypassing the multiproof task?

Member Author

@klkvr klkvr Jan 29, 2026

yeah it's unreachable now. we can clean it up later via a new message enum or by removing multiprooftask and message entirely

@github-project-automation github-project-automation Bot moved this from Backlog to In Progress in Reth Tracker Jan 29, 2026

Projects

Status: Done

4 participants