feat(persistence): batch write hashed_state #19990

Closed
duyquang6 wants to merge 3 commits into paradigmxyz:main from duyquang6:push-rsnqslrrpszs

Conversation

@duyquang6
Contributor

@duyquang6 duyquang6 commented Nov 26, 2025

As discussed in #19739 (comment), batch writing hashed_state is safe, so I created this PR to cherry-pick the old reverted commit.

Changes

  • batch write hashed_state

*Note: the "after" results here were measured with the improved extend_sorted_vec, which will come in an upcoming PR.

before

erc20 transfers spam: ~100ms

image

native transfers spam: ~50ms

image

after

erc20 transfers spam: ~80ms

image

native transfers spam: ~30ms

image

@duyquang6 duyquang6 requested a review from joshieDo as a code owner November 26, 2025 13:52
@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Nov 26, 2025
@duyquang6 duyquang6 changed the title from "perf: improve extend_sorted_vec & write batch for hashed_state" to "perf: improve extend_sorted_vec & batch write hashed_state" Nov 26, 2025
@duyquang6 duyquang6 force-pushed the push-rsnqslrrpszs branch 3 times, most recently from f6cfcb2 to 029724d Compare November 27, 2025 02:20
@mattsse mattsse added the C-perf (a change motivated by improving speed, memory usage or disk footprint) and A-db (related to the database) labels Nov 27, 2025
Collaborator

@mattsse mattsse left a comment

pedantic doc nit

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Nov 27, 2025
@duyquang6 duyquang6 force-pushed the push-rsnqslrrpszs branch 4 times, most recently from 40f7bef to ec2786b Compare November 27, 2025 11:39
target.sort_unstable_by(|a, b| a.0.cmp(&b.0));
}
})
.collect();
Member

The previous implementation was specifically designed to avoid having to do a big collect like this; the resulting memory allocation from this collect dwarfs any ostensible speedup you get from not having to sort. I just did a bench comparing your implementation to the previous and this new one is about 2x slower for synthetic datasets:

Image

You can see the bench here if you're curious

Contributor Author

@duyquang6 duyquang6 Nov 28, 2025

Very useful bench, @mediocregopher.
When I first benchmarked the old version with the aggregated hashed state, the results were not good. That is why I thought something was wrong with extend_ref or extend_sorted_vec.

When I used your bench to compare against my own custom merge version (which does not use merge_join_by), the custom version only shines when the target size is smaller than the other size; in the overall case the old version still wins. That gives me a hint for using this function better: keep the target size similar to or larger than the other size, so the old version can benefit.

Best case (new version is better):
image

image

But overall (similar sizes, or target size > other size), the old version is still better:
image

image

Contributor Author

@duyquang6 duyquang6 Nov 28, 2025

I dumped the raw HashedPostStateSorted data while benchmarking with native transfers.

Here is the bench result for extend_ref; the new version of both is better in this test case:

image

You can double-check the bench [here](https://github.com/duyquang6/reth/blob/bench-sorted-extend/crates/trie/common/benches/extend_ref.rs#L286); the raw hashed state data is already attached.
Could it be that the raw data has properties the benchmark doesn't fully cover 🤔?

Member

@duyquang6 these are interesting results, your benches for extend_sorted_vec_comparison/t10_o1000 conflict with what I originally saw in mine, but now I'm able to replicate, so there's some inconsistency there that I still need to figure out.

If yours is faster for t10_o1000 I expect it's because it's doing the full allocation up-front, whereas mine is likely doing two larger allocations at the end with the extend calls.

What do you think about trying out something like:

    // Where "50" is a made up number that needs to be tuned
    if other.len() > target.len() * 50 {
        return extend_sorted_vec_custom(target, other);
    }

    extend_sorted_vec(target, other)

This way maybe we better cover all cases.
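
The size-based dispatch suggested above could be sketched roughly as follows. `Strategy`, `choose_strategy`, and the `ratio` parameter are hypothetical names used only for illustration and are not part of reth; the value 50 is the untuned placeholder from the comment, not a measured threshold.

```rust
/// Which merge strategy to use. Illustrative names only (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    /// Overwrite duplicates in place, append new entries, sort at the end.
    InPlace,
    /// Single-pass merge into a freshly allocated vector.
    MergeIntoNew,
}

/// Hypothetical helper: pick a strategy from the two input lengths.
/// `ratio` plays the role of the made-up "50" and would need tuning.
fn choose_strategy(target_len: usize, other_len: usize, ratio: usize) -> Strategy {
    // `saturating_mul` keeps the comparison well-defined for huge targets.
    if other_len > target_len.saturating_mul(ratio) {
        Strategy::MergeIntoNew
    } else {
        Strategy::InPlace
    }
}
```

With these assumptions, the `t10_o1000` case (`choose_strategy(10, 1000, 50)`) would take the merge path, while inputs of similar size would stay on the in-place path.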

@duyquang6
Contributor Author

I dumped the raw HashedPostStateSorted data while benchmarking with native transfers.

Here is the bench result for extend_ref; the new version of both is better in this test case:

image

You can double-check the bench [here](https://github.com/duyquang6/reth/blob/bench-sorted-extend/crates/trie/common/benches/extend_ref.rs#L286); the raw hashed state data is already attached.
Could it be that the raw data has properties the benchmark doesn't fully cover 🤔?

Converting to draft temporarily. I will invest more time in finding the root cause of the difference and will update later.

@mediocregopher
Member

Closes #20609

@mediocregopher mediocregopher linked an issue Dec 23, 2025 that may be closed by this pull request
@duyquang6
Contributor Author

duyquang6 commented Dec 25, 2025

Hi, I'm currently busy with other work and won't have time for this PR for a few weeks. If anyone wants to take this over to unblock #20609, feel free - otherwise I'll get back to it when I have bandwidth
cc sir @mediocregopher

nvm, I got some bandwidth today, will work on this

@duyquang6
Contributor Author

duyquang6 commented Dec 26, 2025

I benchmarked with Vec<B256> instead of Vec<u64> since that matches the real use case. Here are the results from an M1 Pro; they differ significantly from the u64 benchmarks:

image

The custom merge version is ~30% faster than the current extend_sorted_vec for B256 data.

Results for Vec<u64>:
image


Summary:

Tested three approaches for merging sorted vectors:

  1. default: In-place overwrites for duplicates, collects new items, sorts at end
  2. merge: Classic single-pass merge into new vector, O(n+m)
  3. itertool_merge: Uses itertools::merge_join_by
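
As a rough illustration of approaches 1 and 2, here is a minimal sketch. The function names `extend_in_place` and `merge_into_new` are hypothetical, and this is not the code from the PR or from reth. It assumes both inputs are sorted by key with unique keys, and that values from `other` override values in `target` on duplicate keys.

```rust
use std::cmp::Ordering;

/// "default"-style sketch: overwrite duplicates in place, collect new
/// entries, then append them and sort once at the end.
fn extend_in_place<K: Ord + Clone, V: Clone>(target: &mut Vec<(K, V)>, other: &[(K, V)]) {
    let mut other_iter = other.iter().peekable();
    let mut to_insert = Vec::new();
    for target_item in target.iter_mut() {
        while let Some(other_item) = other_iter.peek() {
            match other_item.0.cmp(&target_item.0) {
                // Key missing from `target`: remember it for later.
                Ordering::Less => to_insert.push(other_iter.next().unwrap().clone()),
                // Same key: overwrite the value in place.
                Ordering::Equal => {
                    target_item.1 = other_iter.next().unwrap().1.clone();
                    break;
                }
                // `other` is ahead: move on to the next target entry.
                Ordering::Greater => break,
            }
        }
    }
    // Everything left in `other` is larger than every key in `target`.
    to_insert.extend(other_iter.cloned());
    if !to_insert.is_empty() {
        target.extend(to_insert);
        target.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    }
}

/// "merge"-style sketch: classic two-pointer merge into a new vector,
/// with one up-front allocation and O(n + m) key comparisons.
fn merge_into_new<K: Ord + Clone, V: Clone>(target: &[(K, V)], other: &[(K, V)]) -> Vec<(K, V)> {
    let mut out = Vec::with_capacity(target.len() + other.len());
    let (mut i, mut j) = (0, 0);
    while i < target.len() && j < other.len() {
        match target[i].0.cmp(&other[j].0) {
            Ordering::Less => {
                out.push(target[i].clone());
                i += 1;
            }
            Ordering::Greater => {
                out.push(other[j].clone());
                j += 1;
            }
            // Duplicate key: the entry from `other` wins.
            Ordering::Equal => {
                out.push(other[j].clone());
                i += 1;
                j += 1;
            }
        }
    }
    out.extend_from_slice(&target[i..]);
    out.extend_from_slice(&other[j..]);
    out
}
```

Under the stated assumptions both functions should produce the same result; they differ in allocation pattern and in how many key comparisons they perform, which is what the benchmarks compare.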

Key findings:

| Scenario | Winner | Speedup |
| --- | --- | --- |
| Large B256 keys (100k entries) | custom_merge | ~30-40% faster |
| Small u64 keys with high overlap | default | ~20% faster |
| Small datasets (<100 entries) | All similar | negligible |

Why merge wins for B256 (use case on HashedPostState):

  • B256 comparison is more expensive than u64 comparison, and that makes the difference
  • At 100k accounts: merge ~0.8ms vs default ~1.2ms

Benchmark code: https://github.com/duyquang6/reth/blob/bench-sorted-extend/crates/trie/common/benches/extend_sorted_vec.rs

Implementation: https://github.com/duyquang6/reth/blob/bench-sorted-extend/crates/trie/common/src/utils.rs

Should I split this into 2 PRs, since the batch write alone lets us resolve #20609 first?

  • Batch write hashed state (this PR)
  • Improve extend_sorted_vec (if needed)

sir @mediocregopher @mattsse

@duyquang6 duyquang6 marked this pull request as ready for review December 26, 2025 06:53
@duyquang6 duyquang6 changed the title from "perf: improve extend_sorted_vec & batch write hashed_state" to "feat: batch write hashed_state" Dec 26, 2025
@duyquang6 duyquang6 changed the title from "feat: batch write hashed_state" to "feat(persistence): batch write hashed_state" Dec 26, 2025
@cliff0412

may i know what tool u used to do erc20/native transfers spam?

@duyquang6
Contributor Author

may i know what tool u used to do erc20/native transfers spam?

Hi, we wrote a custom Rust script for benchmarking transaction throughput

@mediocregopher
Member

We've gone with a different approach in #21422 and confirmed a small perf improvement from it. Further improvements can build on that work, so I'm going to close this for now.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Reth Tracker Jan 28, 2026

Labels

A-db Related to the database C-perf A change motivated by improving speed, memory usage or disk footprint

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Flatten HashedPostState before persisting

4 participants