feat(persistence): batch write hashed_state#19990
duyquang6 wants to merge 3 commits into paradigmxyz:main from
Conversation
f6cfcb2 to 029724d
40f7bef to ec2786b
crates/trie/common/src/utils.rs
Outdated
```rust
            target.sort_unstable_by(|a, b| a.0.cmp(&b.0));
        }
    })
    .collect();
```
The previous implementation was specifically designed to avoid having to do a big collect like this; the resulting memory allocation from this collect dwarfs any ostensible speedup you get from not having to sort. I just did a bench comparing your implementation to the previous and this new one is about 2x slower for synthetic datasets:
You can see the bench here if you're curious
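To make the tradeoff concrete, here is a minimal sketch of the two strategies under discussion (the function names and the `(u64, u64)` element type are illustrative, not the reth code): strategy A appends then re-sorts, while strategy B allocates a full merged buffer up front, which is the large allocation the comment is about.

```rust
/// Strategy A: append `other`, then re-sort by key. Cheap incremental
/// allocation growth, but pays an O(n log n) sort over the combined data.
fn extend_then_sort(target: &mut Vec<(u64, u64)>, other: Vec<(u64, u64)>) {
    target.extend(other);
    target.sort_unstable_by(|a, b| a.0.cmp(&b.0));
}

/// Strategy B: single-pass merge of two already-sorted inputs into a
/// fresh buffer. No sort, but allocates target.len() + other.len()
/// up front — the "big collect" in question.
fn merge_extend(target: &mut Vec<(u64, u64)>, other: Vec<(u64, u64)>) {
    let mut out = Vec::with_capacity(target.len() + other.len());
    let mut a = std::mem::take(target).into_iter().peekable();
    let mut b = other.into_iter().peekable();
    loop {
        // Pick the smaller head key; drain the remaining side at the end.
        let take_a = match (a.peek(), b.peek()) {
            (Some(x), Some(y)) => x.0 <= y.0,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break,
        };
        let item = if take_a { a.next().unwrap() } else { b.next().unwrap() };
        out.push(item);
    }
    *target = out;
}
```

Both produce the same sorted result; which wins depends on input sizes and element width, which is exactly what the benches below probe.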
Very useful bench @mediocregopher.
When I first benchmarked the old version with the aggregated hashed state, the results were not good. That's why I suspected something was wrong with extend_ref or extend_sorted_vec.
When I ran your bench against my own custom merge version (not using merge_join_by), it only shines when the target size is smaller than the other size; across the overall cases the old version still wins. That gave me a hint for using this function well: keep the target size similar to or larger than the other size, so the old version benefits.
But overall (sizes similar, or target size > other size), the old version is still better.

I dumped the raw data of HashedPostStateSorted while benchmarking with native-transfer.
Here are the bench results for extend_ref; the new version is better for both in this test case.
You can double-check the bench here; the hashed state raw data is already attached.
Could the raw data have properties that the benchmark doesn't fully cover 🤔?
@duyquang6 these are interesting results, your benches for extend_sorted_vec_comparison/t10_o1000 conflict with what I originally saw in mine, but now I'm able to replicate, so there's some inconsistency there that I still need to figure out.
If yours is faster for t10_o1000, I expect it's because it does the full allocation up front, whereas mine likely does two larger allocations at the end with the extend calls.
What do you think about trying out something like:

```rust
// Where "50" is a made up number that needs to be tuned
if other.len() > target.len() * 50 {
    return extend_sorted_vec_custom(target, other);
}
extend_sorted_vec(target, other)
```

That way we might cover all cases better.
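A self-contained version of that dispatch idea might look like the following sketch (the ratio constant and both branch implementations are placeholders to be tuned, not the actual reth helpers):

```rust
/// Placeholder ratio: past this imbalance, a one-pass merge into a
/// fresh buffer beats appending and re-sorting. Needs real tuning.
const MERGE_RATio: usize = 50;

fn extend_sorted(target: &mut Vec<u64>, other: Vec<u64>) {
    if other.len() > target.len() * MERGE_RATio {
        // `other` dwarfs `target`: allocate once and merge both sorted inputs.
        let mut out = Vec::with_capacity(target.len() + other.len());
        let mut a = std::mem::take(target).into_iter().peekable();
        let mut b = other.into_iter().peekable();
        loop {
            let take_a = match (a.peek(), b.peek()) {
                (Some(x), Some(y)) => x <= y,
                (Some(_), None) => true,
                (None, Some(_)) => false,
                (None, None) => break,
            };
            out.push(if take_a { a.next().unwrap() } else { b.next().unwrap() });
        }
        *target = out;
    } else {
        // Comparable sizes: appending then re-sorting stays competitive.
        target.extend(other);
        target.sort_unstable();
    }
}
```

Either branch leaves `target` sorted; the threshold only decides which allocation pattern pays off for the given size ratio.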
Closes #20609

nvm, I got some bandwidth today, will work on this
I benchmarked with Vec<B256> instead of Vec<u64>, since that matches the real use case. Here are the results from an M1 Pro; they differ significantly from the u64 benchmarks:
The custom merge version is ~30% faster than the current extend_sorted_vec for B256 data. Summary: I tested three approaches for merging sorted vectors:
Key findings:
Why merge wins for B256 (the use case in HashedPostState):
Benchmark code: https://github.com/duyquang6/reth/blob/bench-sorted-extend/crates/trie/common/benches/extend_sorted_vec.rs
Implementation: https://github.com/duyquang6/reth/blob/bench-sorted-extend/crates/trie/common/src/utils.rs
Should I split this into 2 PRs? Since with batch write we can resolve #20609 first.
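The element-width effect can be sketched with a stand-in for B256 (a 32-byte big-endian key; this is an illustration, not the PR's benchmark code): a sort compares and moves 32-byte elements O(n log n) times, while one merge pass touches each element exactly once, so the wider the element, the more the merge saves.

```rust
/// Stand-in for alloy's B256: a 32-byte big-endian key, so byte-wise
/// lexicographic order matches numeric order.
type Key = [u8; 32];

/// Build a 32-byte key from a u64 (big-endian in the last 8 bytes).
fn key(n: u64) -> Key {
    let mut k = [0u8; 32];
    k[24..].copy_from_slice(&n.to_be_bytes());
    k
}

/// One linear pass: each 32-byte key is compared and moved once,
/// versus O(n log n) 32-byte compares/moves for extend-then-sort.
fn merge_keys(target: &mut Vec<Key>, other: Vec<Key>) {
    let mut out = Vec::with_capacity(target.len() + other.len());
    let mut a = std::mem::take(target).into_iter().peekable();
    let mut b = other.into_iter().peekable();
    loop {
        let take_a = match (a.peek(), b.peek()) {
            (Some(x), Some(y)) => x <= y,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break,
        };
        out.push(if take_a { a.next().unwrap() } else { b.next().unwrap() });
    }
    *target = out;
}
```

With 4x-wider keys than u64, both the comparator and every element move cost more, which is consistent with the merge pulling ahead in the B256 benchmarks even where it lost on u64 data.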
ec2786b to 4208a5c
4208a5c to 95c86c1
May I know what tool you used to do the erc20/native transfer spam?
Hi, we wrote a custom Rust script for benchmarking transaction throughput.
We've gone with a different approach in #21422 and confirmed a small perf improvement based on it. Further improvements can build on that work; going to close this for now.




As discussed in #19739 (comment):
Batch write of `hashed_state` is safe, so I created this PR to cherry-pick the old reverted commit.

Changes
*Note: the "after" results below were measured after also improving extend_sorted_vec, which will be an upcoming PR.
Before:
erc20 transfers spam: ~100ms
native transfers spam: ~50ms

After:
erc20 transfers spam: ~80ms
native transfers spam: ~30ms