feat(stateless): make witness generation conform to the draft specs#22289
Conversation
mattsse
left a comment
There was a problem hiding this comment.
ty, some suggestions
also please let me know if you have any suggestions for how to make this easier, will try my best to keep the stateless crate and this here in sync
mattsse
left a comment
There was a problem hiding this comment.
makes sense
maybe @mediocregopher wants to do a quick sanity review
|
Regarding the backwards compat question: I don't really have enough context on how TrieWitness is being used by other clients to understand how disruptive this would be for others. Since it's just a matter of doing upserts or removals first I think including some kind of flag from RPC wouldn't be significantly disruptive. Regarding changes as a whole I don't see any problems at a logical level. However since #22922 the TrieWitness has been completely rewritten and you'll need to redo your changes to it, unfortunately. I had already separated upserts/removals in that PR, but in the opposite order in order to retain maximum backwards compatibility. As already mentioned we can reverse this easily enough if that's the direction the winds are blowing. I don't think there's any other features mentioned here that will be difficult to accomplish with the new implementation, it just needs to be done. |
…nd retrieval through RPC
|
@mediocregopher, thanks! I'll be working to rebase this PR on top of latest main and add a flag to support both legacy and canonical witness. I'll ping again whenever it's ready for some eyes. |
3a797fe to
6eb8948
Compare
There was a problem hiding this comment.
@mediocregopher, this is ready for new eyes:
- Rebased on top of
master - Adjusted for the new V2 stuff
- Added
ExecutionWitnessModeso both stylesLegacy(current) andCanonical(new) behavior can co-exist.
I resolved all previous comments, and created new ones so you only see fresh ones after this re-work.
cc @figtracer
| &self, | ||
| _input: TrieInput, | ||
| _target: HashedPostState, | ||
| _mode: reth_trie::ExecutionWitnessMode, |
There was a problem hiding this comment.
ExecutionWitnessMode is a new enum to allow generating/using both witness types: Legacy (current), and Canonical (new).
At the debug_executionWitness level there is a new parameter that allows to select this. It is an Option so not providing it defaults to Legacy for backwards compat.
| ) | ||
| .collect(); | ||
| /// Records the state after execution using the given witness generation mode. | ||
| pub fn record_executed_state<DB>(&mut self, statedb: &State<DB>, mode: ExecutionWitnessMode) { |
| .map(|hm| hm.into_values().collect()) | ||
| .with_execution_witness_mode(mode); | ||
| let witness = | ||
| if mode.is_canonical() { witness } else { witness.always_include_root_node() }; |
There was a problem hiding this comment.
Note always_include_root_node() now depends on mode.
| if mode.is_canonical() { witness } else { witness.always_include_root_node() }; | ||
| witness.compute(target).map_err(ProviderError::from).map(|hm| { | ||
| let mut values: Vec<_> = hm.into_values().collect(); | ||
| if mode.is_canonical() { |
| let canonical_empty_witness = TrieWitness::new( | ||
| DatabaseTrieCursorFactory::<_, A>::new(provider.tx_ref()), | ||
| DatabaseHashedCursorFactory::new(provider.tx_ref()), | ||
| ) | ||
| .with_execution_witness_mode(ExecutionWitnessMode::Canonical) | ||
| .compute(HashedPostState { | ||
| accounts: HashMap::from_iter([(hashed_address, Some(Account::default()))]), | ||
| storages: HashMap::default(), | ||
| }) | ||
| .unwrap(); | ||
| assert!(canonical_empty_witness.is_empty()); |
There was a problem hiding this comment.
As a refresher: compared to Legacy above, the empty root hash must not be in the witness since only by the hash clients can know which is the RLP-representation.
| let target_state = HashedPostState { | ||
| accounts: HashMap::from_iter([( | ||
| hashed_address, | ||
| Some(Account { balance: U256::from(2), ..Default::default() }), | ||
| )]), | ||
| storages: HashMap::default(), | ||
| }; | ||
|
|
||
| let legacy_witness = TrieWitness::new( | ||
| DatabaseTrieCursorFactory::<_, A>::new(provider.tx_ref()), | ||
| DatabaseHashedCursorFactory::new(provider.tx_ref()), | ||
| ) | ||
| .with_execution_witness_mode(ExecutionWitnessMode::Legacy) | ||
| .compute(target_state.clone()) | ||
| .unwrap(); | ||
| assert!(legacy_witness.contains_key(&state_root)); | ||
| for node in multiproof.account_subtree.values() { | ||
| assert_eq!(legacy_witness.get(&keccak256(node)), Some(node)); | ||
| } | ||
| assert!(legacy_witness.contains_key(&storage_root)); | ||
|
|
||
| let canonical_witness = TrieWitness::new( | ||
| DatabaseTrieCursorFactory::<_, A>::new(provider.tx_ref()), | ||
| DatabaseHashedCursorFactory::new(provider.tx_ref()), | ||
| ) | ||
| .with_execution_witness_mode(ExecutionWitnessMode::Canonical) | ||
| .compute(target_state) | ||
| .unwrap(); | ||
| assert!(canonical_witness.contains_key(&state_root)); | ||
| for node in multiproof.account_subtree.values() { | ||
| assert_eq!(canonical_witness.get(&keccak256(node)), Some(node)); | ||
| } | ||
| assert!(!canonical_witness.contains_key(&storage_root)); |
There was a problem hiding this comment.
This test is covering another behavior difference between Legacy and Canonical: if the storage trie of an account doesn't change:
Legacyalways include the storage trie root node RLP (not sure why, but that's how it works)Canonical, doesn't since isn't required for anything.
| } | ||
|
|
||
| #[test] | ||
| fn canonical_mode_handles_mixed_storage_inserts_and_removals() { |
There was a problem hiding this comment.
The last remaining diff between Legacy and Canonical: how apply state diffs when calculating post-state root.
As quick context:
Legacy: applies removals first and then insert/updates -- my interpretation is that this is done to provide the most "compatibility" with users since it will maximize the number of potential nodes that might be required.Canonical: applies insert/updates first and then removals. This is done to minimize branch compressions, thus minimize the witness size as much as possible. This implies that whoever is using this witness should also be a "canonical" one in the sense (i.e. apply insert/updates firsta nd removals after).
| let address = Address::random(); | ||
| let hashed_address = keccak256(address); | ||
|
|
||
| // Pre-state storage root is a branch with two children at nibbles 1 and 2. |
There was a problem hiding this comment.
TL;DR: we have a branch node with two children, and we have some state-diff that will add a new children and remove an existing one.
This setup allows to see if the sibiling is or isn't in the witness depending on the mode.
mediocregopher
left a comment
There was a problem hiding this comment.
One small nit but lgtm
Co-authored-by: Brian Picciano <me@mediocregopher.com>
This PR contains changes to Reth to conform to the witness generations specs draft. These specs are still in flux (PR 1 and PR 3). We are working with the EF STEEL team to officialize them.
I'm leaving this PR in draft for a bit to get some feedback about the changes, mostly reg the last paragraph of this PR description.
You don't need to understand the specs fully to review this PR, since all changes must make sense on their own, which I'll help to explain in this PR description and comments.
Note that other required fixes aren't in this repo but in the newly created paradigmxyz/stateless (PR there to be created soon).
Below, I’ll summarise the changes, but I’ll also create PR comments extending the explanation a bit more for each case:
0x80).0x80).statelessrepo thus in other PR) post-root calculation to apply insertions/updates first and deletions after to minimise MPT nodes included in the witness — this is related to minimising cases of sibling branch compression inclusions.nodes,bytecodesandancestorsinExecutionWitnessas defined in the spec (basically, lexicographic order for the first two, andblockNum ascfor the latter).There is only one change that makes this new execution witness version a breaking change: the new post-state root update ordering (5th bullet point above). If someone uses an execution witness generated by Reth done with this PR, with an older stateless trie implementation, it might find that the post-state root can’t be calculated because the older version of the trie might ask for more MPT nodes than strictly needed.
We could fix this by adding more code to still support both witness generation styles for a while until people update their code (prob also involves modifying RPC
debug_executionWitnessto have an extra optional param to ask for “canonical form” or the current/mainnet style, or create a new one). If we also want to keep the behavior for other fields (i.e. still keep the bytecodes bloating and other cases, it can become a bit messy since basically all code fixes in this PR should be guarded in some kind ofif canonical_witnessor similar. I guess doable but can get tricky.Note that the reverse is okay — the new stateless trie (that now lives in paradigmxyz/stateless) can support older-style execution witnesses, since these extra MPT nodes are totally okay to be there (they won’t be used). I want to gauge a bit what the Reth team prefers to do on this front.