chore(trie): Use Vec<Option<...>> in HashedPostStateCursors#19233
chore(trie): Use Vec<Option<...>> in HashedPostStateCursors#192330x00101010 wants to merge 5 commits intoparadigmxyz:mainfrom
Conversation
f56defd to
e69db80
Compare
mediocregopher
left a comment
There was a problem hiding this comment.
Hey @0x00101010, great work on this so far, at a high level it looks correct, just a few hopefully smaller issues to work out.
| /// Slots that have been zero valued. | ||
| pub zero_valued_slots: B256Set, | ||
| /// Sorted collection of updated storage slots, None indicates zero valued. | ||
| pub storage_slots: Vec<(B256, Option<U256>)>, |
There was a problem hiding this comment.
I believe instead of Option<U256> you can use U256::ZERO to signal a value removal, unless you discovered some reason this wouldn't be the case.
There was a problem hiding this comment.
Generally just make it easier and consistent in how to handle none values, if I discard Option here, it's gonna be tricky to implement HashedPostStateCursor to handle both Account and Storage
There was a problem hiding this comment.
I think we can do it without duplicating significant code, see comment at : https://github.com/paradigmxyz/reth/pull/19233/files#r2481365676
|
@0x00101010 just wanted to give you a heads up that I found a pretty bad bug in the InMemoryTrieCursor that you should know about if you're going to continue with this PR: #19277 Note the proptests there, if you did something similar for your PR it would be very helpful I think |
Thanks for the headsup, will do |
6fe7543 to
b9a9e65
Compare
|
@mediocregopher mind taking a look again? |
| last_key: Option<B256>, | ||
| /// Tracks whether `seek` has been called. Used to prevent re-seeking the DB cursor | ||
| /// when it has been exhausted by iteration. | ||
| seeked: bool, |
There was a problem hiding this comment.
In the InMemoryTrieCursor this was included only for cfg(debug_assertions), I would like to do that here as well.
There was a problem hiding this comment.
This seeked field is needed here for this case (not only debug purposes):
- cursor created
- all in memory state are none or zero (deleted)
- db has corresponding entries
- multiple seeks are called
Without seeked, we don't have a way to distinguish if db is exhausted (should not seek again) or there's no seek yet (should seek).
See all_storage_slots_deleted_not_wiped_exact_keys() in tests/post_state.rs for the specific test case. Without this, lots of integration tests will fail, for example cargo test --test e2e_testsuite test_local_rpc_tests_compat --profile test --manifest-path /Users/francis/src/reth/main/crates/rpc/rpc-e2e-tests/Cargo.toml 2>&1
Depending on if there should be multiple continuous seek calls, this potential bug exists on InMemoryTrieCursor as well.
Try test it with:
#[test]
fn test_all_storage_slots_deleted_not_wiped_exact_keys() {
// This test reproduces an edge case where:
// - cursor is not None (not wiped)
// - All in-memory entries are deletions (None values)
// - Database has corresponding entries
// - Expected: NO leaves should be returned (all deleted)
// Generate 42 trie node entries with keys distributed across the keyspace
let db_nodes: Vec<(Nibbles, BranchNodeCompact)> = (0..42)
.map(|i| {
let key_bytes = vec![(i * 6) as u8, i as u8]; // Spread keys across keyspace
let nibbles = Nibbles::from_nibbles_unchecked(key_bytes);
(nibbles, BranchNodeCompact::new(i as u16, i as u16, 0, vec![], None))
})
.collect();
// Create in-memory entries with same keys but all None values (deletions)
let in_memory_nodes: Vec<(Nibbles, Option<BranchNodeCompact>)> =
db_nodes.iter().map(|(key, _)| (*key, None)).collect();
let db_nodes_map: BTreeMap<Nibbles, BranchNodeCompact> = db_nodes.into_iter().collect();
let db_nodes_arc = Arc::new(db_nodes_map);
let visited_keys = Arc::new(Mutex::new(Vec::new()));
let mock_cursor = MockTrieCursor::new(db_nodes_arc, visited_keys);
let mut cursor = InMemoryTrieCursor::new(Some(mock_cursor), &in_memory_nodes);
// Seek to beginning should return None (all nodes are deleted)
let result = cursor.seek(Nibbles::default()).unwrap();
assert_eq!(
result, None,
"Expected no entries when all nodes are deleted, but got {:?}",
result
);
// Test seek operations at various positions - all should return None
let seek_keys = vec![
Nibbles::from_nibbles([0x00]),
Nibbles::from_nibbles([0x5d]),
Nibbles::from_nibbles([0x5e]),
Nibbles::from_nibbles([0x5f]),
Nibbles::from_nibbles([0xc2]),
Nibbles::from_nibbles([0xc5]),
Nibbles::from_nibbles([0xc9]),
Nibbles::from_nibbles([0xf0]),
];
for seek_key in seek_keys {
let result = cursor.seek(seek_key).unwrap();
assert_eq!(
result, None,
"Expected None when seeking to {:?} but got {:?}",
seek_key, result
);
}
// next() should also always return None
let result = cursor.next().unwrap();
assert_eq!(result, None, "Expected None from next() but got {:?}", result);
}
There was a problem hiding this comment.
Thanks for the heads up here, I've submitted a PR for the InMemoryTrieCursor here.
I'm not 100% that it's a critical bug there, as it requires calling seek on the cursor even after the previous seek returned None, which doesn't really make sense to do. But it's worth handling at any rate. It's interesting that the same case for HashedPostStateCursor produces hive errors... I would guess there's some oddity in the TrieNodeIter that's causing it.
| // Returns true if cursor is None (wiped storage or empty DB). | ||
| self.cursor.as_mut().map_or(Ok(true), |c| c.is_storage_empty()) | ||
| } | ||
| } |
There was a problem hiding this comment.
I'd like to see a proptest like was implemented for InMemoryTrieCursor 🙏
There was a problem hiding this comment.
There are relevant proptests there already:
reth/crates/trie/db/tests/post_state.rs
Lines 178 to 209 in 780161a
reth/crates/trie/db/tests/post_state.rs
Lines 446 to 490 in 780161a
| /// Slots that have been zero valued. | ||
| pub zero_valued_slots: B256Set, | ||
| /// Sorted collection of updated storage slots, None indicates zero valued. | ||
| pub storage_slots: Vec<(B256, Option<U256>)>, |
There was a problem hiding this comment.
I think we can do it without duplicating significant code, see comment at : https://github.com/paradigmxyz/reth/pull/19233/files#r2481365676
974ad1a to
6e85c93
Compare
|
Hey @0x00101010 , I've continued this work at a new PR here in order to help get it through faster without the code deviating too much from the InMemoryTrieCursor implementation. Thanks for all the effort you've put in so far; your commits are still in the history on my branch so you should get the contributor credit still. |
Fixes #18848
Summary
This PR refactors and simplifies the
HashedPostStateCursorimplementation by removing redundant state tracking and unifying the cursor types. The changes reduce code complexity while maintaining the same functional behavior.Key Changes
1. Unified Cursor Implementation
HashedPostStateAccountCursorandHashedPostStateStorageCursorinto a single genericHashedPostStateCursor<C, V>typewipedfield: Thewipedstate had 1:1 correspondence with the cursor beingNone/Some, so the field was redundantSome(NoopHashedCursor::default())to passingNonedirectly when no database cursor is needed2. Simplified
HashedPostStateSortedStructureBefore: