chore(trie): Use Vec<Option<...>> in HashedPostStateCursors #19233

Closed
0x00101010 wants to merge 5 commits into paradigmxyz:main from 0x00101010:rework-cursors

Conversation

@0x00101010
Contributor

@0x00101010 0x00101010 commented Oct 22, 2025

Fixes #18848

Summary

This PR refactors and simplifies the HashedPostStateCursor implementation by removing redundant state tracking and unifying the cursor types. The changes reduce code complexity while maintaining the same functional behavior.

Key Changes

1. Unified Cursor Implementation

  • Merged cursor types: Combined HashedPostStateAccountCursor and HashedPostStateStorageCursor into a single generic HashedPostStateCursor<C, V> type
  • Removed redundant wiped field: The wiped state had 1:1 correspondence with the cursor being None/Some, so the field was redundant
  • Simplified cursor construction: Changed from passing Some(NoopHashedCursor::default()) to passing None directly when no database cursor is needed

2. Simplified HashedPostStateSorted Structure

Before:

pub struct HashedAccountsSorted {
    pub accounts: Vec<(B256, Account)>,
    pub destroyed_accounts: B256Set,
}

pub struct HashedStorageSorted {
    pub non_zero_valued_slots: Vec<(B256, U256)>,
    pub zero_valued_slots: B256Set,
    pub wiped: bool,
}

After:
pub struct HashedPostStateSorted {
    pub accounts: Vec<(B256, Option<Account>)>,  // None = destroyed
    pub storages: B256Map<HashedStorageSorted>,
}

pub struct HashedStorageSorted {
    pub storage_slots: Vec<(B256, Option<U256>)>,  // None = zero-valued
    pub wiped: bool,
}

This change:
- Uses Option<T> consistently to represent deletions/zero-values
- Eliminates separate collections for tracking destroyed/zero-valued entries
- Maintains sorted order in a single vector per type
- Simplifies iteration logic by removing the need to merge and re-sort
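
To illustrate the single-vector layout, here is a minimal sketch of folding the old two-collection form into one key-sorted vector. The helper name `to_sorted_slots` and the `u64`/`u128` stand-ins for `B256`/`U256` are assumptions made for the example, not code from this PR.

```rust
use std::collections::BTreeSet;

/// Stand-in for a 32-byte hashed key (`B256` in the PR).
type Key = u64;
/// Stand-in for a storage value (`U256` in the PR).
type Value = u128;

/// Hypothetical helper: folds the old two-collection layout into one
/// key-sorted vector where `None` marks a zero-valued (deleted) slot.
fn to_sorted_slots(
    non_zero: Vec<(Key, Value)>,
    zero_valued: BTreeSet<Key>,
) -> Vec<(Key, Option<Value>)> {
    let mut slots: Vec<(Key, Option<Value>)> = non_zero
        .into_iter()
        .map(|(k, v)| (k, Some(v)))
        .chain(zero_valued.into_iter().map(|k| (k, None)))
        .collect();
    // Sort once at construction time; consumers then iterate in key order
    // without merging two collections on every traversal.
    slots.sort_unstable_by_key(|(k, _)| *k);
    slots
}
```

A cursor built over the result only has to walk one slice and treat `None` as "skip the matching database entry".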

3. Improved ForwardInMemoryCursor

- Changed from iterator-based to index-based traversal
- Added has_any() method for checking if any entry satisfies a predicate
- More efficient current() implementation that doesn't clone the iterator
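
As a rough sketch of what index-based traversal can look like, the following simplified cursor keeps a position into a sorted slice. The type name `ForwardCursor`, its method signatures, and the choice to have `has_any` scan all entries are assumptions for illustration; the actual `ForwardInMemoryCursor` in reth may differ.

```rust
/// Simplified stand-in for the PR's `ForwardInMemoryCursor`; names and
/// signatures here are assumptions, not the actual reth API.
struct ForwardCursor<'a, K, V> {
    entries: &'a [(K, V)],
    /// Index of the current entry; `entries.len()` means exhausted.
    idx: usize,
}

impl<'a, K: Ord, V> ForwardCursor<'a, K, V> {
    fn new(entries: &'a [(K, V)]) -> Self {
        Self { entries, idx: 0 }
    }

    /// Current entry, read by index without cloning an iterator.
    fn current(&self) -> Option<&'a (K, V)> {
        self.entries.get(self.idx)
    }

    /// Advances to the first entry with key >= `key`; never moves backwards.
    fn seek(&mut self, key: &K) -> Option<&'a (K, V)> {
        while self.current().is_some_and(|(k, _)| k < key) {
            self.idx += 1;
        }
        self.current()
    }

    /// True if any entry (regardless of position) satisfies `predicate`.
    fn has_any(&self, predicate: impl Fn(&(K, V)) -> bool) -> bool {
        self.entries.iter().any(predicate)
    }
}
```

Because the position is a plain `usize`, `current()` is a single bounds-checked slice lookup, and seeking to an earlier key leaves the cursor where it is.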

4. Simplified is_storage_empty() Logic

Before:
fn is_storage_empty(&mut self) -> Result<bool, DatabaseError> {
    let post_state_is_empty = /* complex logic checking if all are zero */;
    Ok(post_state_is_empty && self.cursor.as_mut().map_or(self.wiped == Some(true), |c| c.is_storage_empty()?))
}

After:
fn is_storage_empty(&mut self) -> Result<bool, DatabaseError> {
    // Storage is not empty if it has non-zero slots
    if self.post_state_cursor.has_any(|(_, value)| value.is_some()) {
        return Ok(false);
    }

    // If no non-zero slots in post state, check the database
    // Returns true if cursor is None (wiped storage or empty DB)
    self.cursor.as_mut().map_or(Ok(true), |c| c.is_storage_empty())
}
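
The "After" logic can be exercised in isolation with a small mock. The trait `DbStorageCursor`, the struct `OverlayCursor`, the test double `FixedDb`, and the `String` error type below are all hypothetical stand-ins, assumed only so the sketch compiles on its own.

```rust
/// Hypothetical stand-in for a database storage cursor.
trait DbStorageCursor {
    fn is_storage_empty(&mut self) -> Result<bool, String>;
}

/// Simplified overlay: sorted in-memory slots plus an optional DB cursor.
/// `db` is `None` when the storage was wiped, so the database is ignored.
struct OverlayCursor<'a, C> {
    slots: &'a [(u64, Option<u128>)],
    db: Option<C>,
}

impl<C: DbStorageCursor> OverlayCursor<'_, C> {
    fn is_storage_empty(&mut self) -> Result<bool, String> {
        // Any `Some` slot in the overlay means the storage is non-empty.
        if self.slots.iter().any(|(_, value)| value.is_some()) {
            return Ok(false);
        }
        // Otherwise defer to the database; no cursor counts as empty.
        self.db.as_mut().map_or(Ok(true), |c| c.is_storage_empty())
    }
}

/// Test double that reports a fixed answer.
struct FixedDb(bool);

impl DbStorageCursor for FixedDb {
    fn is_storage_empty(&mut self) -> Result<bool, String> {
        Ok(self.0)
    }
}
```

Note that in this sketch, as in the snippet above, overlay deletions are not subtracted from the database contents: if every overlay slot is `None`, the answer is whatever the database cursor reports.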

5. Documentation Improvements

- Added comprehensive documentation to HashedPostStateSorted explaining the Option<T> semantics
- Clarified that Some(value) means update and None means deletion
- Improved comments in critical sections

6. Test Coverage

- Added regression test all_storage_slots_deleted_not_wiped_exact_keys to cover edge case where all storage slots are deleted (zero-valued) but storage is not wiped
- Test ensures repeated seek() operations correctly return None for all deleted slots
- Added test case for non-wiped storage with all zero-value entries

Testing

All 177 trie-related tests pass:
- reth-trie: 36 tests
- reth-trie-db: 70 tests
- reth-trie-sparse: 36 tests
- reth-trie-sparse-parallel: 35 tests

Benefits

1. Less code: Reduced by ~30 net lines while adding comprehensive tests
2. Simpler logic: Single generic cursor type instead of multiple specialized types
3. Better performance: Index-based cursor traversal instead of iterator cloning
4. Clearer semantics: Option<T> explicitly communicates update vs deletion
5. Easier to maintain: Less duplicate code and clearer abstractions

@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Oct 22, 2025
@0x00101010 force-pushed the rework-cursors branch 3 times, most recently from f56defd to e69db80 (October 22, 2025 15:47)
Member

@mediocregopher mediocregopher left a comment

Hey @0x00101010, great work on this so far. At a high level it looks correct; there are just a few hopefully smaller issues to work out.

/// Slots that have been zero valued.
pub zero_valued_slots: B256Set,
/// Sorted collection of updated storage slots, None indicates zero valued.
pub storage_slots: Vec<(B256, Option<U256>)>,
Member

I believe instead of Option<U256> you can use U256::ZERO to signal a value removal, unless you discovered some reason this wouldn't be the case.

Contributor Author

Mostly it keeps None handling simple and consistent. If I drop the Option here, it's going to be tricky to implement HashedPostStateCursor so that it handles both Account and Storage.

Member

I think we can do it without duplicating significant code; see the comment at https://github.com/paradigmxyz/reth/pull/19233/files#r2481365676
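
For readers following the thread, here is one possible way a generic cursor could accept both encodings (Option for accounts, zero for storage). This is a sketch under assumptions and not necessarily what the linked comment proposes: the trait `OverlayValue`, the function `live_entries`, the toy `Account` struct, and `u128` standing in for `U256` are all hypothetical.

```rust
/// Hypothetical trait: lets generic code ask whether an overlay value marks
/// a deletion, without requiring every value type to be an `Option`.
trait OverlayValue {
    type Live;
    /// Returns the live value, or `None` if this entry marks a deletion.
    fn into_live(self) -> Option<Self::Live>;
}

/// Toy account type, assumed for the example only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Account {
    nonce: u64,
}

/// Accounts keep `Option`: `None` means destroyed.
impl OverlayValue for Option<Account> {
    type Live = Account;
    fn into_live(self) -> Option<Account> {
        self
    }
}

/// Storage values (u128 stands in for U256) use zero as the deletion marker.
impl OverlayValue for u128 {
    type Live = u128;
    fn into_live(self) -> Option<u128> {
        (self != 0).then_some(self)
    }
}

/// Generic over both encodings: yields only the live entries.
fn live_entries<K: Copy, V: OverlayValue + Copy>(entries: &[(K, V)]) -> Vec<(K, V::Live)> {
    entries
        .iter()
        .filter_map(|&(k, v)| v.into_live().map(|live| (k, live)))
        .collect()
}
```

With something along these lines, the storage vector could stay `Vec<(B256, U256)>` while the cursor logic remains shared between accounts and storage.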

@mediocregopher
Member

@0x00101010 just wanted to give you a heads up that I found a pretty bad bug in the InMemoryTrieCursor that you should know about if you're going to continue with this PR: #19277

Note the proptests there, if you did something similar for your PR it would be very helpful I think

@0x00101010
Contributor Author

> @0x00101010 just wanted to give you a heads up that I found a pretty bad bug in the InMemoryTrieCursor that you should know about if you're going to continue with this PR: #19277
>
> Note the proptests there, if you did something similar for your PR it would be very helpful I think

Thanks for the heads-up, will do.

@0x00101010 force-pushed the rework-cursors branch 5 times, most recently from 6fe7543 to b9a9e65 (October 28, 2025 19:02)
@0x00101010 0x00101010 marked this pull request as ready for review October 28, 2025 19:30
@0x00101010
Contributor Author

@mediocregopher mind taking a look again?

last_key: Option<B256>,
/// Tracks whether `seek` has been called. Used to prevent re-seeking the DB cursor
/// when it has been exhausted by iteration.
seeked: bool,
Member

In the InMemoryTrieCursor this was included only for cfg(debug_assertions), I would like to do that here as well.

Contributor Author

This seeked field is needed here for this case (not only debug purposes):

  1. cursor created
  2. all in-memory entries are None or zero (deleted)
  3. db has corresponding entries
  4. multiple seeks are called

Without seeked, we have no way to distinguish whether the DB cursor is exhausted (should not seek again) or has not been seeked yet (should seek).
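
The ambiguity can be sketched with a toy overlay. Everything here is an assumption made for illustration (the `DbCursor` and `Overlay` types, `u64` keys and values, forward-only semantics), and the sketch only shows how a flag disambiguates the two states; it does not reproduce the failing integration tests.

```rust
/// Hypothetical database cursor over key-sorted entries.
struct DbCursor {
    entries: Vec<(u64, u64)>,
    pos: usize,
    /// Counts `seek` calls so the effect of the flag can be observed.
    seeks: usize,
}

impl DbCursor {
    fn seek(&mut self, key: u64) -> Option<(u64, u64)> {
        self.seeks += 1;
        self.pos = self.entries.partition_point(|(k, _)| *k < key);
        self.entries.get(self.pos).copied()
    }

    fn next(&mut self) -> Option<(u64, u64)> {
        self.pos += 1;
        self.entries.get(self.pos).copied()
    }
}

struct Overlay {
    db: DbCursor,
    /// Sorted keys deleted by the in-memory state.
    deleted: Vec<u64>,
    /// Last entry read from the DB. `None` alone is ambiguous.
    last_db: Option<(u64, u64)>,
    /// Disambiguates "never seeked" from "DB exhausted".
    seeked: bool,
}

impl Overlay {
    fn seek(&mut self, key: u64) -> Option<(u64, u64)> {
        let mut entry = match (self.seeked, self.last_db) {
            // Never seeked: the DB cursor must be positioned first.
            (false, _) => {
                self.seeked = true;
                self.db.seek(key)
            }
            // Seeked before and nothing left: exhausted, do not seek again.
            (true, None) => None,
            // Cached entry is already at or past the requested key.
            (true, Some(e)) if e.0 >= key => Some(e),
            (true, Some(_)) => self.db.seek(key),
        };
        // Skip DB entries that the in-memory state deleted.
        while let Some((k, _)) = entry {
            if self.deleted.binary_search(&k).is_err() {
                break;
            }
            entry = self.db.next();
        }
        self.last_db = entry;
        entry
    }
}
```

With all DB keys deleted, the first `seek` drains the DB cursor and every later `seek` returns `None` without touching the database again.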

See all_storage_slots_deleted_not_wiped_exact_keys() in tests/post_state.rs for the specific test case. Without this, lots of integration tests will fail, for example cargo test --test e2e_testsuite test_local_rpc_tests_compat --profile test --manifest-path /Users/francis/src/reth/main/crates/rpc/rpc-e2e-tests/Cargo.toml 2>&1

Depending on whether multiple consecutive seek calls are expected, this potential bug exists in InMemoryTrieCursor as well.

Try testing it with:

#[test]
fn test_all_storage_slots_deleted_not_wiped_exact_keys() {
    // This test reproduces an edge case where:
    // - cursor is not None (not wiped)
    // - All in-memory entries are deletions (None values)
    // - Database has corresponding entries
    // - Expected: NO leaves should be returned (all deleted)

    // Generate 42 trie node entries with keys distributed across the keyspace
    let db_nodes: Vec<(Nibbles, BranchNodeCompact)> = (0..42)
        .map(|i| {
            let key_bytes = vec![(i * 6) as u8, i as u8]; // Spread keys across keyspace
            let nibbles = Nibbles::from_nibbles_unchecked(key_bytes);
            (nibbles, BranchNodeCompact::new(i as u16, i as u16, 0, vec![], None))
        })
        .collect();

    // Create in-memory entries with same keys but all None values (deletions)
    let in_memory_nodes: Vec<(Nibbles, Option<BranchNodeCompact>)> =
        db_nodes.iter().map(|(key, _)| (*key, None)).collect();

    let db_nodes_map: BTreeMap<Nibbles, BranchNodeCompact> = db_nodes.into_iter().collect();
    let db_nodes_arc = Arc::new(db_nodes_map);
    let visited_keys = Arc::new(Mutex::new(Vec::new()));
    let mock_cursor = MockTrieCursor::new(db_nodes_arc, visited_keys);

    let mut cursor = InMemoryTrieCursor::new(Some(mock_cursor), &in_memory_nodes);

    // Seek to beginning should return None (all nodes are deleted)
    let result = cursor.seek(Nibbles::default()).unwrap();
    assert_eq!(
        result, None,
        "Expected no entries when all nodes are deleted, but got {:?}",
        result
    );

    // Test seek operations at various positions - all should return None
    let seek_keys = vec![
        Nibbles::from_nibbles([0x00]),
        Nibbles::from_nibbles([0x5d]),
        Nibbles::from_nibbles([0x5e]),
        Nibbles::from_nibbles([0x5f]),
        Nibbles::from_nibbles([0xc2]),
        Nibbles::from_nibbles([0xc5]),
        Nibbles::from_nibbles([0xc9]),
        Nibbles::from_nibbles([0xf0]),
    ];

    for seek_key in seek_keys {
        let result = cursor.seek(seek_key).unwrap();
        assert_eq!(
            result, None,
            "Expected None when seeking to {:?} but got {:?}",
            seek_key, result
        );
    }

    // next() should also always return None
    let result = cursor.next().unwrap();
    assert_eq!(result, None, "Expected None from next() but got {:?}", result);
}

Member

Thanks for the heads up here, I've submitted a PR for the InMemoryTrieCursor here.

I'm not 100% sure that it's a critical bug there, as it requires calling seek on the cursor even after the previous seek returned None, which doesn't really make sense to do. But it's worth handling at any rate. It's interesting that the same case for HashedPostStateCursor produces hive errors... I would guess there's some oddity in the TrieNodeIter that's causing it.

// Returns true if cursor is None (wiped storage or empty DB).
self.cursor.as_mut().map_or(Ok(true), |c| c.is_storage_empty())
}
}
Member

I'd like to see a proptest like was implemented for InMemoryTrieCursor 🙏

Contributor Author

There are relevant proptests there already:

fn fuzz_hashed_account_cursor() {
    proptest!(ProptestConfig::with_cases(10), |(db_accounts in arb::<BTreeMap<B256, Account>>(), post_state_accounts in arb::<BTreeMap<B256, Option<Account>>>())| {
            let db = create_test_rw_db();
            db.update(|tx| {
                for (key, account) in &db_accounts {
                    tx.put::<tables::HashedAccounts>(*key, *account).unwrap();
                }
            })
            .unwrap();

            let mut hashed_post_state = HashedPostState::default();
            for (hashed_address, account) in &post_state_accounts {
                hashed_post_state.accounts.insert(*hashed_address, *account);
            }

            let mut expected = db_accounts;
            // overwrite or remove accounts from the expected result
            for (key, account) in &post_state_accounts {
                if let Some(account) = account {
                    expected.insert(*key, *account);
                } else {
                    expected.remove(key);
                }
            }

            let sorted = hashed_post_state.into_sorted();
            let tx = db.tx().unwrap();
            let factory = HashedPostStateCursorFactory::new(DatabaseHashedCursorFactory::new(&tx), &sorted);
            assert_account_cursor_order(&factory, expected.into_iter());
        }
    );
}

fn fuzz_hashed_storage_cursor() {
    proptest!(ProptestConfig::with_cases(10),
        |(
            db_storages: BTreeMap<B256, BTreeMap<B256, U256>>,
            post_state_storages: BTreeMap<B256, (bool, BTreeMap<B256, U256>)>
        )|
    {
        let db = create_test_rw_db();
        db.update(|tx| {
            for (address, storage) in &db_storages {
                for (slot, value) in storage {
                    let entry = StorageEntry { key: *slot, value: *value };
                    tx.put::<tables::HashedStorages>(*address, entry).unwrap();
                }
            }
        })
        .unwrap();

        let mut hashed_post_state = HashedPostState::default();
        for (address, (wiped, storage)) in &post_state_storages {
            let mut hashed_storage = HashedStorage::new(*wiped);
            for (slot, value) in storage {
                hashed_storage.storage.insert(*slot, *value);
            }
            hashed_post_state.storages.insert(*address, hashed_storage);
        }

        let mut expected = db_storages;
        // apply wipes, then overwrite storage slots in the expected result
        for (key, (wiped, storage)) in post_state_storages {
            let entry = expected.entry(key).or_default();
            if wiped {
                entry.clear();
            }
            entry.extend(storage);
        }

        let sorted = hashed_post_state.into_sorted();
        let tx = db.tx().unwrap();
        let factory = HashedPostStateCursorFactory::new(DatabaseHashedCursorFactory::new(&tx), &sorted);
        assert_storage_cursor_order(&factory, expected.into_iter());
    });
}

@mediocregopher
Member

Hey @0x00101010, I've continued this work in a new PR here to help get it through faster without the code deviating too much from the InMemoryTrieCursor implementation. Thanks for all the effort you've put in so far; your commits are still in the history on my branch, so you should still get the contributor credit.

@github-project-automation github-project-automation bot moved this from Backlog to Done in Reth Tracker Nov 4, 2025

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Use Vec<(B256, Option<...>)> in HashedPostStateCursors

2 participants