Skip to content

feat(trie): Proof rewrite: implement stack-based algorithm for calculating trie nodes from leaves#19863

Merged
mediocregopher merged 37 commits into
mainfrom
mediocregopher/proof-rewrite-state-root
Nov 20, 2025
Merged

feat(trie): Proof rewrite: implement stack-based algorithm for calculating trie nodes from leaves#19863
mediocregopher merged 37 commits into
mainfrom
mediocregopher/proof-rewrite-state-root

Conversation

@mediocregopher
Copy link
Copy Markdown
Member

Towards #19512

This PR implements the next step of the proof calculation rewrite. The ProofCalculator can now take a series of leaf values, sorted by key, and calculate all trie nodes in the MPT which is built from those leaves.

Not all nodes are kept in memory, but rather a stack-based approach is used to only keep those nodes in-memory which are required to calculate subsequent nodes for future keys. This approach is essentially identical to that used by HashBuilder but is (imo) much easier to follow.

At the moment no retention of proof nodes is done; passed in proof targets are ignored and the root node is always returned. Proof retention will be implemented in the next PR.

Testing

Proptests are used to generate random account tries and compare the root node generated by the legacy reth-trie::Proof implementation with that generated by this one.

Towards #19512

In order to implement a reusable proof calculator we first need the
underlying cursors used by the calculator to be reusable.

For account cursors this isn't so difficult, a new `reset` method is
introduced which resets the ForwardInMemoryCursor and other state
fields.

For storage cursors the situation is complicated because reusing the
cursor might involve using it for a completely different hashed address.

To handle this the storage cursors are given a `set_hashed_address`
method which effectively resets them, as well as pulls out the correct
overlay for the chosen address. Implementing this requires that the
cursors now hold onto the full
`&TrieUpdatesSorted`/`&HashedPostStateSorted`.

It also requires slightly
different handling of the wiped case; before we were simply storing an
underlying cursor in an Option, with None indicating wiped, but now we
need to always have an underlying cursors (so we can reuse it for a
future non-wiped storage). A new boolean is introduced instead.
Towards #19512

In order to implement a reusable proof calculator we first need the
underlying cursors used by the calculator to be reusable.

For account cursors this isn't so difficult, a new `reset` method is
introduced which resets the ForwardInMemoryCursor and other state
fields.

For storage cursors the situation is complicated because reusing the
cursor might involve using it for a completely different hashed address.

To handle this the storage cursors are given a `set_hashed_address`
method which effectively resets them, as well as pulls out the correct
overlay for the chosen address. Implementing this requires that the
cursors now hold onto the full
`&TrieUpdatesSorted`/`&HashedPostStateSorted`.

It also requires slightly
different handling of the wiped case; before we were simply storing an
underlying cursor in an Option, with None indicating wiped, but now we
need to always have an underlying cursors (so we can reuse it for a
future non-wiped storage). A new boolean is introduced instead.
Towards #19512

This implements the skeleton of the new proof calculator rewrite. No actual logic is implemented yet, this only sets up most of the new types which will be involved.

* RevealedSparseNode is renamed to SparseTrieNode and moved to reth-trie-common. This is the type which will be returned from the calculator for proofs (letting avoid a translation step during sparse trie revealing). The rename helps to denote that this type is no longer just for revealing.

* ValueEncoder is defined. This is the primary mechanism by which we deal with the differences between storage and account tries, and which allows us to inject behavior in the future like dispatching storage root calculation to other threads for account proofs.

  * ValueEncoder::Value - Either Account or U256 for account and storage tries, respectively. This is the value returned from the DB.

  * ValueEncoder::Fut - A future-like type which will be called-upon later to encode the Value into its RLP form. For storage tries (U256) this is trivial. For account tries we need some mechanism to obtain the storage root of the account. A default `SyncAccountValueEncoder` is provided which synchronously computes the storage root when the future is invoked, but in later PRs we can to proof workers, add in caching, etc...

* ProofCalculator is where the actual logic of calculating proofs is going to live. For the moment it is un-implemented.
Comment thread crates/trie/trie/src/hashed_cursor/mock.rs
Comment thread crates/trie/trie/src/trie_cursor/mock.rs
Copy link
Copy Markdown
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, even with my rather limited trie knowledge these comments made it easy to follow along.

all of this makes sense to me

would like @shekhirin and @yongkangc to also take a look here

/// # Panics
///
/// Panics if the given `len` is greater than the length of the `Nibbles`.
pub(crate) fn trim_nibbles_prefix(n: &Nibbles, len: usize) -> Nibbles {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also upstream this to nybbles?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, it's fairly trivial, was just nice to have here to clarify what's happening

/// This method expects that there already exists a child on the `child_stack`, and that that
/// child has a non-zero short key. The new branch is constructed based on the top child from
/// the `child_stack` and the given leaf.
fn push_new_branch(&mut self, leaf_key: Nibbles, leaf_val: VE::DeferredEncoder) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I fully get the encoder + deferred encoding now

///
/// # Panics
///
/// This method panics if `branch_stack` is empty.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would there ever be a situation that branch_stack is empty?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there are a few:

  • For the first two leaves processed the branch_stack will be empty, only after the second leaf is done processing will there be a branch under construction.
  • If (for example) the third leaf is then a child of that branch, then that branch will be popped (leaving branch_stack empty) and a new one created with that previous branch+extension as a child and the third leaf as the other child (leaving branch_stack with a single branch again).

In this implementation pop_branch is only called in the case where the incoming leaf's shared prefix with the branch_path is shorter than the branch_path. For that to happen branch_path must itself be non-empty, which means there must be a branch on the stack.

Comment thread crates/trie/trie/src/proof_v2/mod.rs Outdated
/// This method expects that there already exists a child on the `child_stack`, and that that
/// child has a non-zero short key. The new branch is constructed based on the top child from
/// the `child_stack` and the given leaf.
fn push_new_branch(&mut self, leaf_key: Nibbles, leaf_val: VE::DeferredEncoder) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so this basically inserts an intermediate branch when the current branch already has a child on the same nibble as the incoming leaf node

Comment thread crates/trie/trie/src/proof_v2/mod.rs Outdated
} else {
// When there is a current branch then trim off its path as well as the nibble that it
// has set for this leaf.
leaf_key.slice_unchecked(self.branch_path.len() + 1, leaf_key.len())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see because the branch path length is < key length, so len() + 1 is in-bounds.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's impossible that a branch path would have a length of 32, which will always be the key length, so this will always be in-bounds

// The new branch's first child is the child already on the top of the stack, for which
// we've already adjusted its short key.
self.child_stack
.push(ProofTrieBranchChild::Leaf { short_key: leaf_short_key, value: leaf_val });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is why deffered encoding is important right? so we can encode only when we want to do pop_branch

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, basically we have the time between pushing a leaf onto the stack and calling pop_branch for its branch for the leaf's value to be fully resolved. For the accounts trie this gives us time to potentially dispatch a task to storage workers for missed leaves, so we don't have to do those synchronously like we do currently.

Copy link
Copy Markdown
Contributor

@yongkangc yongkangc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont get 100%, but nothing blocking stands out to me and was able to understand the branching logic

good docs

@github-project-automation github-project-automation Bot moved this from Backlog to In Progress in Reth Tracker Nov 20, 2025
Comment thread crates/trie/trie/src/proof_v2/node.rs
Comment thread crates/trie/trie/src/proof_v2/mod.rs Outdated
Base automatically changed from mediocregopher/mock-cursors-set-hashed-addr to main November 20, 2025 12:22
Copy link
Copy Markdown
Member

@shekhirin shekhirin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could follow along, really appreciate the comments

Comment thread crates/trie/trie/src/proof_v2/node.rs Outdated
mediocregopher and others added 2 commits November 20, 2025 14:04
Co-authored-by: Alexey Shekhirin <5773434+shekhirin@users.noreply.github.com>
@mediocregopher mediocregopher added this pull request to the merge queue Nov 20, 2025
Merged via the queue into main with commit b72bb67 Nov 20, 2025
42 checks passed
@mediocregopher mediocregopher deleted the mediocregopher/proof-rewrite-state-root branch November 20, 2025 14:07
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Reth Tracker Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-trie Related to Merkle Patricia Trie implementation

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants