chore(trie): Use Vec<Option<...>> in HashedPostStateCursors by 0x00101010 · Pull Request #19233 · paradigmxyz/reth

0x00101010 · 2025-10-22T15:37:10Z

Summary

This PR refactors and simplifies the HashedPostStateCursor implementation by removing redundant state tracking and unifying the cursor types. The changes reduce code complexity while maintaining the same functional behavior.

Key Changes

1. Unified Cursor Implementation

Merged cursor types: Combined HashedPostStateAccountCursor and HashedPostStateStorageCursor into a single generic HashedPostStateCursor<C, V> type
Removed redundant wiped field: The wiped state had 1:1 correspondence with the cursor being None/Some, so the field was redundant
Simplified cursor construction: Changed from passing Some(NoopHashedCursor::default()) to passing None directly when no database cursor is needed
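The merge semantics behind the unified cursor can be sketched as one eager, generic function. This is a minimal sketch under stated assumptions, not reth's code. `merge_overlay` is a hypothetical name, and plain sorted slices stand in for the DB cursor and the post-state vector; a real cursor would perform the same merge lazily, one seek/next at a time. The sketch shows two of the points above. First, when the DB side is modeled as an Option, a missing DB cursor (None) already means "wiped", so no separate wiped flag is needed. Second, the same generic code serves account-like and storage-like values.

```rust
/// Eagerly merges a sorted DB view with a sorted post-state overlay.
///
/// Hypothetical helper for illustration only. `db == None` models a missing DB
/// cursor (wiped storage), so no separate `wiped` flag is needed. An overlay
/// entry shadows a DB entry with the same key; a `None` overlay value is a
/// deletion and produces no output entry.
pub fn merge_overlay<K: Ord + Clone, V: Clone>(
    db: Option<&[(K, V)]>,
    overlay: &[(K, Option<V>)],
) -> Vec<(K, V)> {
    let db = db.unwrap_or(&[]);
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(db.len() + overlay.len());
    while i < db.len() || j < overlay.len() {
        // Take from the overlay when its key is smaller or equal (overlay wins ties).
        let take_overlay = match (db.get(i), overlay.get(j)) {
            (Some((db_key, _)), Some((ov_key, _))) => ov_key <= db_key,
            (None, Some(_)) => true,
            _ => false,
        };
        if take_overlay {
            let (key, value) = &overlay[j];
            // Skip the DB entry this overlay entry shadows, if any.
            if db.get(i).is_some_and(|(db_key, _)| db_key == key) {
                i += 1;
            }
            // `Some` is an update; `None` is a deletion and is dropped.
            if let Some(v) = value {
                out.push((key.clone(), v.clone()));
            }
            j += 1;
        } else {
            out.push(db[i].clone());
            i += 1;
        }
    }
    out
}
```

For example, with DB entries `[(1, 10), (2, 20), (4, 40)]` and overlay `[(2, None), (3, Some(30)), (4, Some(41))]`, the merged view is `[(1, 10), (3, 30), (4, 41)]`. With `db = None` (wiped), the result is only `[(3, 30), (4, 41)]`.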

2. Simplified `HashedPostStateSorted` Structure

Before:

pub struct HashedAccountsSorted {
    pub accounts: Vec<(B256, Account)>,
    pub destroyed_accounts: B256Set,
}

pub struct HashedStorageSorted {
    pub non_zero_valued_slots: Vec<(B256, U256)>,
    pub zero_valued_slots: B256Set,
    pub wiped: bool,
}

After:
pub struct HashedPostStateSorted {
    pub accounts: Vec<(B256, Option<Account>)>,  // None = destroyed
    pub storages: B256Map<HashedStorageSorted>,
}

pub struct HashedStorageSorted {
    pub storage_slots: Vec<(B256, Option<U256>)>,  // None = zero-valued
    pub wiped: bool,
}

This change:
- Uses Option<T> consistently to represent deletions/zero-values
- Eliminates separate collections for tracking destroyed/zero-valued entries
- Maintains sorted order in a single vector per type
- Simplifies iteration logic by removing the need to merge and re-sort
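The move from "sorted updates plus a separate set of deleted keys" to a single Option-valued vector can be illustrated with a hypothetical conversion helper. `to_option_layout` is not part of reth. It assumes the old update list and deleted-key set share no keys, as would be the case if a destroyed account never also appeared among the updated accounts. Generic K and V stand in for B256 and Account/U256.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: folds the old layout (updates + set of deleted keys)
/// into one sorted `Vec<(K, Option<V>)>`, where `None` marks a deletion.
/// Assumes `updates` and `deleted` share no keys.
pub fn to_option_layout<K: Ord + Clone, V>(
    updates: Vec<(K, V)>,
    deleted: &BTreeSet<K>,
) -> Vec<(K, Option<V>)> {
    let mut merged: Vec<(K, Option<V>)> = updates
        .into_iter()
        .map(|(key, value)| (key, Some(value)))
        .chain(deleted.iter().cloned().map(|key| (key, None)))
        .collect();
    // Sort once here so consumers can walk a single ordered vector
    // instead of merging two collections on every iteration.
    merged.sort_by(|a, b| a.0.cmp(&b.0));
    merged
}
```

With updates `[(4, "b"), (1, "a")]` and deleted keys `{2, 5}`, the result is `[(1, Some("a")), (2, None), (4, Some("b")), (5, None)]`.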

3. Improved ForwardInMemoryCursor

- Changed from iterator-based to index-based traversal
- Added has_any() method for checking if any entry satisfies a predicate
- More efficient current() implementation that doesn't clone the iterator
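A minimal sketch of an index-based forward cursor over a sorted slice, assuming the semantics described above. `ForwardCursor` and its methods are illustrative stand-ins for ForwardInMemoryCursor, not its actual code. In particular, `has_any` here scans the whole slice regardless of the current position, which is an assumption.

```rust
/// Hypothetical index-based, forward-only cursor over a sorted slice.
pub struct ForwardCursor<'a, K, V> {
    entries: &'a [(K, V)],
    idx: usize,
}

impl<'a, K: Ord, V> ForwardCursor<'a, K, V> {
    pub fn new(entries: &'a [(K, V)]) -> Self {
        Self { entries, idx: 0 }
    }

    /// Current entry without advancing: a plain index lookup, no iterator clone.
    pub fn current(&self) -> Option<&'a (K, V)> {
        self.entries.get(self.idx)
    }

    /// Returns the current entry and moves past it.
    pub fn next(&mut self) -> Option<&'a (K, V)> {
        let entry = self.entries.get(self.idx)?;
        self.idx += 1;
        Some(entry)
    }

    /// Advances (never rewinds) to the first entry with key >= `key`.
    pub fn seek(&mut self, key: &K) -> Option<&'a (K, V)> {
        while self.entries.get(self.idx).is_some_and(|(k, _)| k < key) {
            self.idx += 1;
        }
        self.current()
    }

    /// True if any entry in the whole slice satisfies `pred` (assumed semantics).
    pub fn has_any(&self, pred: impl FnMut(&(K, V)) -> bool) -> bool {
        self.entries.iter().any(pred)
    }
}
```

Because `current()` is only `entries.get(idx)`, peeking at the cursor position costs an index lookup rather than cloning an iterator.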

4. Simplified is_storage_empty() Logic

Before:
fn is_storage_empty(&mut self) -> Result<bool, DatabaseError> {
    let post_state_is_empty = /* complex logic checking if all are zero */;
    Ok(post_state_is_empty && self.cursor.as_mut().map_or(self.wiped == Some(true), |c| c.is_storage_empty()?))
}

After:
fn is_storage_empty(&mut self) -> Result<bool, DatabaseError> {
    // Storage is not empty if it has non-zero slots
    if self.post_state_cursor.has_any(|(_, value)| value.is_some()) {
        return Ok(false);
    }

    // If no non-zero slots in post state, check the database
    // Returns true if cursor is None (wiped storage or empty DB)
    self.cursor.as_mut().map_or(Ok(true), |c| c.is_storage_empty())
}
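The new is_storage_empty() decision can be exercised in isolation with toy types. `StorageView`, `StorageCursor`, `VecStorageCursor` and `DbError` are hypothetical stand-ins for the post-state cursor, the DB cursor and DatabaseError. u64 keys and values stand in for B256/U256, and `cursor: None` models wiped storage.

```rust
/// Hypothetical stand-in for `DatabaseError`.
#[derive(Debug, PartialEq)]
pub struct DbError;

/// Hypothetical minimal trait standing in for the DB-backed storage cursor.
pub trait StorageCursor {
    fn is_storage_empty(&mut self) -> Result<bool, DbError>;
}

/// Toy DB cursor backed by a vector of (slot, value) pairs.
pub struct VecStorageCursor(pub Vec<(u64, u64)>);

impl StorageCursor for VecStorageCursor {
    fn is_storage_empty(&mut self) -> Result<bool, DbError> {
        Ok(self.0.is_empty())
    }
}

/// Post-state slots (`None` = zero-valued) plus an optional DB cursor.
pub struct StorageView<C> {
    pub post_state: Vec<(u64, Option<u64>)>,
    pub cursor: Option<C>,
}

impl<C: StorageCursor> StorageView<C> {
    pub fn is_storage_empty(&mut self) -> Result<bool, DbError> {
        // Any non-zero slot in the post state means the storage is not empty.
        if self.post_state.iter().any(|(_, value)| value.is_some()) {
            return Ok(false);
        }
        // Otherwise defer to the DB; a missing cursor (wiped) counts as empty.
        self.cursor.as_mut().map_or(Ok(true), |c| c.is_storage_empty())
    }
}
```

In this sketch, if the storage is not wiped and every DB slot is deleted in the post state, the answer falls through to the DB cursor, which still reports the storage as non-empty.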

5. Documentation Improvements

- Added comprehensive documentation to HashedPostStateSorted explaining the Option<T> semantics
- Clarified that Some(value) means update and None means deletion
- Improved comments in critical sections

6. Test Coverage

- Added regression test all_storage_slots_deleted_not_wiped_exact_keys to cover edge case where all storage slots are deleted (zero-valued) but storage is not wiped
- Test ensures repeated seek() operations correctly return None for all deleted slots
- Added test case for non-wiped storage with all zero-value entries

Testing

All 177 trie-related tests pass:
- reth-trie: 36 tests
- reth-trie-db: 70 tests
- reth-trie-sparse: 36 tests
- reth-trie-sparse-parallel: 35 tests

Benefits

1. Less code: Reduced by ~30 net lines while adding comprehensive tests
2. Simpler logic: Single generic cursor type instead of multiple specialized types
3. Better performance: Index-based cursor traversal instead of iterator cloning
4. Clearer semantics: Option<T> explicitly communicates update vs deletion
5. Easier to maintain: Less duplicate code and clearer abstractions

mediocregopher

Hey @0x00101010, great work on this so far! At a high level it looks correct; there are just a few (hopefully small) issues to work out.

mediocregopher · 2025-10-23T09:08:19Z

crates/trie/common/src/hashed_state.rs

-    /// Slots that have been zero valued.
-    pub zero_valued_slots: B256Set,
+    /// Sorted collection of updated storage slots, None indicates zero valued.
+    pub storage_slots: Vec<(B256, Option<U256>)>,


I believe instead of Option<U256> you can use U256::ZERO to signal a value removal, unless you discovered some reason this wouldn't be the case.

Generally, it makes handling None values easier and more consistent. If I drop the Option here, it's going to be tricky to implement HashedPostStateCursor so that it handles both Account and Storage.

I think we can do it without duplicating significant code; see the comment at: https://github.com/paradigmxyz/reth/pull/19233/files#r2481365676

mediocregopher · 2025-10-24T14:10:14Z

@0x00101010 just wanted to give you a heads up that I found a pretty bad bug in the InMemoryTrieCursor that you should know about if you're going to continue with this PR: #19277

Note the proptests there; if you did something similar for your PR, I think it would be very helpful.

0x00101010 · 2025-10-24T14:11:32Z

Thanks for the heads-up, will do.

0x00101010 · 2025-10-30T17:40:00Z

@mediocregopher mind taking a look again?

mediocregopher · 2025-10-31T10:52:45Z

crates/trie/trie/src/hashed_cursor/post_state.rs

+    last_key: Option<B256>,
+    /// Tracks whether `seek` has been called. Used to prevent re-seeking the DB cursor
+    /// when it has been exhausted by iteration.
+    seeked: bool,


In the InMemoryTrieCursor this was included only under cfg(debug_assertions); I would like to do the same here.

This seeked field is needed here for the following case (not only for debugging purposes):

- the cursor is created
- all in-memory states are None or zero (deleted)
- the DB has corresponding entries
- multiple seeks are called

Without seeked, we have no way to distinguish whether the DB is exhausted (should not seek again) or has not been seeked yet (should seek).

See all_storage_slots_deleted_not_wiped_exact_keys() in tests/post_state.rs for the specific test case. Without this, lots of integration tests will fail, for example:

cargo test --test e2e_testsuite test_local_rpc_tests_compat --profile test --manifest-path /Users/francis/src/reth/main/crates/rpc/rpc-e2e-tests/Cargo.toml 2>&1
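The reason a bare Option is not enough can be shown with a toy forward-only wrapper around a DB cursor. `CountingDbCursor` and `CachedDbCursor` are hypothetical types, and u64 keys stand in for B256. On its own, `current == None` could mean either "never seeked" or "exhausted", so a `seeked` flag is kept alongside it. The wrapper assumes seek keys are non-decreasing, as for a forward-only cursor.

```rust
use std::collections::BTreeMap;

/// Toy "database" cursor over a BTreeMap that counts how often it is seeked.
pub struct CountingDbCursor {
    data: BTreeMap<u64, u64>,
    pub seeks: usize,
}

impl CountingDbCursor {
    pub fn new(data: BTreeMap<u64, u64>) -> Self {
        Self { data, seeks: 0 }
    }

    /// Returns the first entry with key >= `key`.
    pub fn seek(&mut self, key: u64) -> Option<(u64, u64)> {
        self.seeks += 1;
        self.data.range(key..).next().map(|(k, v)| (*k, *v))
    }
}

/// Caches the DB cursor position; `seeked` disambiguates `current == None`.
pub struct CachedDbCursor {
    db: CountingDbCursor,
    current: Option<(u64, u64)>,
    seeked: bool,
}

impl CachedDbCursor {
    pub fn new(db: CountingDbCursor) -> Self {
        Self { db, current: None, seeked: false }
    }

    /// Forward-only seek: keys passed in must be non-decreasing.
    pub fn seek(&mut self, key: u64) -> Option<(u64, u64)> {
        match (self.seeked, self.current) {
            // Exhausted: a forward-only cursor cannot find anything later, so skip the DB.
            (true, None) => None,
            // Already positioned at or past `key`: reuse the cached entry.
            (true, Some((k, v))) if k >= key => Some((k, v)),
            // Never seeked, or positioned before `key`: ask the DB.
            _ => {
                self.seeked = true;
                self.current = self.db.seek(key);
                self.current
            }
        }
    }

    pub fn db_seeks(&self) -> usize {
        self.db.seeks
    }
}
```

In this sketch, once the DB side is exhausted, later seeks return None without touching the DB again. Without the `seeked` flag, the wrapper would have to treat every None as "not yet seeked" and re-seek.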

Depending on whether multiple consecutive seek calls are expected, this potential bug exists in InMemoryTrieCursor as well.

Try testing it with:

#[test]
fn test_all_storage_slots_deleted_not_wiped_exact_keys() {
    // This test reproduces an edge case where:
    // - cursor is not None (not wiped)
    // - All in-memory entries are deletions (None values)
    // - Database has corresponding entries
    // - Expected: NO leaves should be returned (all deleted)

    // Generate 42 trie node entries with keys distributed across the keyspace
    let db_nodes: Vec<(Nibbles, BranchNodeCompact)> = (0..42)
        .map(|i| {
            let key_bytes = vec![(i * 6) as u8, i as u8]; // Spread keys across keyspace
            let nibbles = Nibbles::from_nibbles_unchecked(key_bytes);
            (nibbles, BranchNodeCompact::new(i as u16, i as u16, 0, vec![], None))
        })
        .collect();

    // Create in-memory entries with same keys but all None values (deletions)
    let in_memory_nodes: Vec<(Nibbles, Option<BranchNodeCompact>)> =
        db_nodes.iter().map(|(key, _)| (*key, None)).collect();

    let db_nodes_map: BTreeMap<Nibbles, BranchNodeCompact> = db_nodes.into_iter().collect();
    let db_nodes_arc = Arc::new(db_nodes_map);
    let visited_keys = Arc::new(Mutex::new(Vec::new()));
    let mock_cursor = MockTrieCursor::new(db_nodes_arc, visited_keys);
    let mut cursor = InMemoryTrieCursor::new(Some(mock_cursor), &in_memory_nodes);

    // Seek to beginning should return None (all nodes are deleted)
    let result = cursor.seek(Nibbles::default()).unwrap();
    assert_eq!(
        result, None,
        "Expected no entries when all nodes are deleted, but got {:?}",
        result
    );

    // Test seek operations at various positions - all should return None
    let seek_keys = vec![
        Nibbles::from_nibbles([0x00]),
        Nibbles::from_nibbles([0x5d]),
        Nibbles::from_nibbles([0x5e]),
        Nibbles::from_nibbles([0x5f]),
        Nibbles::from_nibbles([0xc2]),
        Nibbles::from_nibbles([0xc5]),
        Nibbles::from_nibbles([0xc9]),
        Nibbles::from_nibbles([0xf0]),
    ];
    for seek_key in seek_keys {
        let result = cursor.seek(seek_key).unwrap();
        assert_eq!(
            result, None,
            "Expected None when seeking to {:?} but got {:?}",
            seek_key, result
        );
    }

    // next() should also always return None
    let result = cursor.next().unwrap();
    assert_eq!(result, None, "Expected None from next() but got {:?}", result);
}

Thanks for the heads-up; I've submitted a PR for the InMemoryTrieCursor here.

I'm not 100% sure it's a critical bug there, as it requires calling seek on the cursor even after the previous seek returned None, which doesn't really make sense to do. But it's worth handling at any rate. It's interesting that the same case for HashedPostStateCursor produces hive errors... I would guess there's some oddity in the TrieNodeIter that's causing it.

mediocregopher · 2025-10-31T12:55:59Z

crates/trie/trie/src/hashed_cursor/post_state.rs

+        // Returns true if cursor is None (wiped storage or empty DB).
+        self.cursor.as_mut().map_or(Ok(true), |c| c.is_storage_empty())
    }
 }


I'd like to see a proptest like the one implemented for InMemoryTrieCursor 🙏

There are relevant proptests there already:

reth/crates/trie/db/tests/post_state.rs

Lines 178 to 209 in 780161a

fn fuzz_hashed_account_cursor() {
    proptest!(ProptestConfig::with_cases(10), |(db_accounts in arb::<BTreeMap<B256, Account>>(), post_state_accounts in arb::<BTreeMap<B256, Option<Account>>>())| {
        let db = create_test_rw_db();
        db.update(|tx| {
            for (key, account) in &db_accounts {
                tx.put::<tables::HashedAccounts>(*key, *account).unwrap();
            }
        })
        .unwrap();

        let mut hashed_post_state = HashedPostState::default();
        for (hashed_address, account) in &post_state_accounts {
            hashed_post_state.accounts.insert(*hashed_address, *account);
        }

        let mut expected = db_accounts;
        // overwrite or remove accounts from the expected result
        for (key, account) in &post_state_accounts {
            if let Some(account) = account {
                expected.insert(*key, *account);
            } else {
                expected.remove(key);
            }
        }

        let sorted = hashed_post_state.into_sorted();
        let tx = db.tx().unwrap();
        let factory = HashedPostStateCursorFactory::new(DatabaseHashedCursorFactory::new(&tx), &sorted);
        assert_account_cursor_order(&factory, expected.into_iter());
    }
    );
}

reth/crates/trie/db/tests/post_state.rs

Lines 446 to 490 in 780161a

fn fuzz_hashed_storage_cursor() {
    proptest!(ProptestConfig::with_cases(10),
        |(
            db_storages: BTreeMap<B256, BTreeMap<B256, U256>>,
            post_state_storages: BTreeMap<B256, (bool, BTreeMap<B256, U256>)>
        )|
    {
        let db = create_test_rw_db();
        db.update(|tx| {
            for (address, storage) in &db_storages {
                for (slot, value) in storage {
                    let entry = StorageEntry { key: *slot, value: *value };
                    tx.put::<tables::HashedStorages>(*address, entry).unwrap();
                }
            }
        })
        .unwrap();

        let mut hashed_post_state = HashedPostState::default();

        for (address, (wiped, storage)) in &post_state_storages {
            let mut hashed_storage = HashedStorage::new(*wiped);
            for (slot, value) in storage {
                hashed_storage.storage.insert(*slot, *value);
            }
            hashed_post_state.storages.insert(*address, hashed_storage);
        }

        let mut expected = db_storages;
        // overwrite or remove accounts from the expected result
        for (key, (wiped, storage)) in post_state_storages {
            let entry = expected.entry(key).or_default();
            if wiped {
                entry.clear();
            }
            entry.extend(storage);
        }

        let sorted = hashed_post_state.into_sorted();
        let tx = db.tx().unwrap();
        let factory = HashedPostStateCursorFactory::new(DatabaseHashedCursorFactory::new(&tx), &sorted);
        assert_storage_cursor_order(&factory, expected.into_iter());
    });
}

mediocregopher · 2025-10-31T13:11:59Z

crates/trie/common/src/hashed_state.rs

I think we can do it without duplicating significant code; see the comment at: https://github.com/paradigmxyz/reth/pull/19233/files#r2481365676

mediocregopher · 2025-11-04T15:16:06Z

Hey @0x00101010, I've continued this work in a new PR here in order to help get it through faster without the code deviating too much from the InMemoryTrieCursor implementation. Thanks for all the effort you've put in so far; your commits are still in the history on my branch, so you should still get contributor credit.

github-project-automation bot added this to Reth Tracker Oct 22, 2025

github-project-automation bot moved this to Backlog in Reth Tracker Oct 22, 2025

0x00101010 force-pushed the rework-cursors branch 3 times, most recently from f56defd to e69db80, October 22, 2025 15:47

mediocregopher reviewed Oct 23, 2025

0x00101010 force-pushed the rework-cursors branch 5 times, most recently from 6fe7543 to b9a9e65, October 28, 2025 19:02

0x00101010 marked this pull request as ready for review October 28, 2025 19:30

0x00101010 requested review from Rjected, joshieDo, rakita and shekhirin as code owners October 28, 2025 19:30

0x00101010 requested a review from mediocregopher October 28, 2025 19:30

mediocregopher reviewed Oct 31, 2025

Francis Li added 4 commits November 1, 2025 11:16

- Rework HashedPostStateCursors (ac1018c)
- updates based on comment (6a03173)
- Finish rework (fbaf21c)
- linting (6e85c93)

0x00101010 force-pushed the rework-cursors branch from 974ad1a to 6e85c93, November 1, 2025 18:16

Fix issue (c363afb)

0x00101010 requested a review from mediocregopher November 1, 2025 20:35

This was referenced Nov 3, 2025:

- fix(trie): InMemoryTrieCursor case where all DB nodes are deleted #19464 (Merged)
- chore(trie): Use Vec<Option<...>> in HashedPostStateCursors #19487 (Merged)

mediocregopher closed this Nov 4, 2025

github-project-automation bot moved this from Backlog to Done in Reth Tracker Nov 4, 2025
