feat(trie): Proof rewrite: implement stack-based algorithm for calculating trie nodes from leaves#19863
Conversation
Towards #19512 In order to implement a reusable proof calculator we first need the underlying cursors used by the calculator to be reusable. For account cursors this isn't so difficult, a new `reset` method is introduced which resets the ForwardInMemoryCursor and other state fields. For storage cursors the situation is complicated because reusing the cursor might involve using it for a completely different hashed address. To handle this the storage cursors are given a `set_hashed_address` method which effectively resets them, as well as pulls out the correct overlay for the chosen address. Implementing this requires that the cursors now hold onto the full `&TrieUpdatesSorted`/`&HashedPostStateSorted`. It also requires slightly different handling of the wiped case; before we were simply storing an underlying cursor in an Option, with None indicating wiped, but now we need to always have an underlying cursors (so we can reuse it for a future non-wiped storage). A new boolean is introduced instead.
Towards #19512 In order to implement a reusable proof calculator we first need the underlying cursors used by the calculator to be reusable. For account cursors this isn't so difficult, a new `reset` method is introduced which resets the ForwardInMemoryCursor and other state fields. For storage cursors the situation is complicated because reusing the cursor might involve using it for a completely different hashed address. To handle this the storage cursors are given a `set_hashed_address` method which effectively resets them, as well as pulls out the correct overlay for the chosen address. Implementing this requires that the cursors now hold onto the full `&TrieUpdatesSorted`/`&HashedPostStateSorted`. It also requires slightly different handling of the wiped case; before we were simply storing an underlying cursor in an Option, with None indicating wiped, but now we need to always have an underlying cursors (so we can reuse it for a future non-wiped storage). A new boolean is introduced instead.
…able-hashed-trie-cursors
Towards #19512 This implements the skeleton of the new proof calculator rewrite. No actual logic is implemented yet, this only sets up most of the new types which will be involved. * RevealedSparseNode is renamed to SparseTrieNode and moved to reth-trie-common. This is the type which will be returned from the calculator for proofs (letting avoid a translation step during sparse trie revealing). The rename helps to denote that this type is no longer just for revealing. * ValueEncoder is defined. This is the primary mechanism by which we deal with the differences between storage and account tries, and which allows us to inject behavior in the future like dispatching storage root calculation to other threads for account proofs. * ValueEncoder::Value - Either Account or U256 for account and storage tries, respectively. This is the value returned from the DB. * ValueEncoder::Fut - A future-like type which will be called-upon later to encode the Value into its RLP form. For storage tries (U256) this is trivial. For account tries we need some mechanism to obtain the storage root of the account. A default `SyncAccountValueEncoder` is provided which synchronously computes the storage root when the future is invoked, but in later PRs we can to proof workers, add in caching, etc... * ProofCalculator is where the actual logic of calculating proofs is going to live. For the moment it is un-implemented.
…f-rewrite-skeleton
…her/proof-rewrite-leaf-only
mattsse
left a comment
There was a problem hiding this comment.
cool, even with my rather limited trie knowledge these comments made it easy to follow along.
all of this makes sense to me
would like @shekhirin and @yongkangc to also take a look here
| /// # Panics | ||
| /// | ||
| /// Panics if the given `len` is greater than the length of the `Nibbles`. | ||
| pub(crate) fn trim_nibbles_prefix(n: &Nibbles, len: usize) -> Nibbles { |
There was a problem hiding this comment.
should we also upstream this to nybbles?
There was a problem hiding this comment.
not sure, it's fairly trivial, was just nice to have here to clarify what's happening
| /// This method expects that there already exists a child on the `child_stack`, and that that | ||
| /// child has a non-zero short key. The new branch is constructed based on the top child from | ||
| /// the `child_stack` and the given leaf. | ||
| fn push_new_branch(&mut self, leaf_key: Nibbles, leaf_val: VE::DeferredEncoder) { |
There was a problem hiding this comment.
oh I fully get the encoder + deferred encoding now
| /// | ||
| /// # Panics | ||
| /// | ||
| /// This method panics if `branch_stack` is empty. |
There was a problem hiding this comment.
would there ever be a situation that branch_stack is empty?
There was a problem hiding this comment.
Yes there are a few:
- For the first two leaves processed the branch_stack will be empty, only after the second leaf is done processing will there be a branch under construction.
- If (for example) the third leaf is then a child of that branch, then that branch will be popped (leaving branch_stack empty) and a new one created with that previous branch+extension as a child and the third leaf as the other child (leaving branch_stack with a single branch again).
In this implementation pop_branch is only called in the case where the incoming leaf's shared prefix with the branch_path is shorter than the branch_path. For that to happen branch_path must itself be non-empty, which means there must be a branch on the stack.
| /// This method expects that there already exists a child on the `child_stack`, and that that | ||
| /// child has a non-zero short key. The new branch is constructed based on the top child from | ||
| /// the `child_stack` and the given leaf. | ||
| fn push_new_branch(&mut self, leaf_key: Nibbles, leaf_val: VE::DeferredEncoder) { |
There was a problem hiding this comment.
I see, so this basically inserts an intermediate branch when the current branch already has a child on the same nibble as the incoming leaf node
| } else { | ||
| // When there is a current branch then trim off its path as well as the nibble that it | ||
| // has set for this leaf. | ||
| leaf_key.slice_unchecked(self.branch_path.len() + 1, leaf_key.len()) |
There was a problem hiding this comment.
i see because the branch path length is < key length, so len() + 1 is in-bounds.
There was a problem hiding this comment.
Right, it's impossible that a branch path would have a length of 32, which will always be the key length, so this will always be in-bounds
| // The new branch's first child is the child already on the top of the stack, for which | ||
| // we've already adjusted its short key. | ||
| self.child_stack | ||
| .push(ProofTrieBranchChild::Leaf { short_key: leaf_short_key, value: leaf_val }); |
There was a problem hiding this comment.
this is why deffered encoding is important right? so we can encode only when we want to do pop_branch
There was a problem hiding this comment.
Exactly, basically we have the time between pushing a leaf onto the stack and calling pop_branch for its branch for the leaf's value to be fully resolved. For the accounts trie this gives us time to potentially dispatch a task to storage workers for missed leaves, so we don't have to do those synchronously like we do currently.
yongkangc
left a comment
There was a problem hiding this comment.
i dont get 100%, but nothing blocking stands out to me and was able to understand the branching logic
good docs
…f-rewrite-state-root
Co-authored-by: YK <chiayongkang@hotmail.com>
shekhirin
left a comment
There was a problem hiding this comment.
Could follow along, really appreciate the comments
Co-authored-by: Alexey Shekhirin <5773434+shekhirin@users.noreply.github.com>
Towards #19512
This PR implements the next step of the proof calculation rewrite. The ProofCalculator can now take a series of leaf values, sorted by key, and calculate all trie nodes in the MPT which is built from those leaves.
Not all nodes are kept in memory, but rather a stack-based approach is used to only keep those nodes in-memory which are required to calculate subsequent nodes for future keys. This approach is essentially identical to that used by HashBuilder but is (imo) much easier to follow.
At the moment no retention of proof nodes is done; passed in proof targets are ignored and the root node is always returned. Proof retention will be implemented in the next PR.
Testing
Proptests are used to generate random account tries and compare the root node generated by the legacy
reth-trie::Proofimplementation with that generated by this one.