perf(persistence): improve write batch for HashedPostState & TrieUpdatesSorted#19739
Conversation
f089de4 to
6b419eb
Compare
6b419eb to
a260533
Compare
mediocregopher
left a comment
There was a problem hiding this comment.
One nitpick on a comment, and a suggestion for another easy win you could do here but it's optional
| // * state | ||
| // * hashed state | ||
| // * hashed state (already done) | ||
| // * trie updates (cannot naively extend, need helper) |
There was a problem hiding this comment.
This comment isn't accurate, we can collect a TrieUpdatesSorted and extend it with the trie updates from each block, and then only do a single write_trie_updates_sorted at the end, just like you're doing here with the hashed state. Might be worth trying out in this PR?
There was a problem hiding this comment.
hi @mediocregopher, about comment, that is obsolete I think
Let me try benchmark and update to you again
There was a problem hiding this comment.
There was a problem hiding this comment.
Very nice, thanks for re-benching it 🙏
|
@duyquang6 this had to be reverted as it caused state root mismatches. The issue is that the writing of changesets (
|
Hi @mediocregopher, I think the root cause is indeed that Regarding the potential conflict between write_hashed_state and write_state, I believe it’s not relevant in this context.
whereas
Given that they operate on separate data, batching |
That makes sense, I guess it was only the trie changesets which caused the state root mismatches we saw then. |
Hi sir, I'm planning to cherry-pick the
About this, I also had sometimes to play with this aggregate, but the bench result is not good, go one by one block still better in this step I'll reference this discussion in the new PR |
Accumulates TriesUpdateSorted and batch write at at the end. Prior benchmark done in paradigmxyz#19739 indicates this should improve batch processing performance. Both paradigmxyz#19739 and its succeesor paradigmxyz#19991, contains benchmarks that measures the performance improvements. I am not really familiar on how to do that, so if that's something that's required, would appreciate some help. Closes paradigmxyz#20611
Accumulates TriesUpdateSorted and batch write at at the end. Prior benchmark done in paradigmxyz#19739 indicates this should improve batch processing performance. Both paradigmxyz#19739 and its succeesor paradigmxyz#19991, contains benchmarks that measures the performance improvements. I am not really familiar on how to do that, so if that's something that's required, would appreciate some help. Closes paradigmxyz#20611
Accumulates TriesUpdateSorted and batch write at at the end. Prior benchmark done in paradigmxyz#19739 indicates this should improve batch processing performance. Both paradigmxyz#19739 and its succeesor paradigmxyz#19991, contains benchmarks that measures the performance improvements. I am not really familiar on how to do that, so if that's something that's required, would appreciate some help. Closes paradigmxyz#20611
Accumulates TriesUpdateSorted and batch write at at the end. Prior benchmark done in paradigmxyz#19739 indicates this should improve batch processing performance. Both paradigmxyz#19739 and its succeesor paradigmxyz#19991, contains benchmarks that measures the performance improvements. I am not really familiar on how to do that, so if that's something that's required, would appreciate some help. Closes paradigmxyz#20611
Accumulates TrieUpdatesSorted from all blocks and batch writes at the end. Prior benchmark done in #19739 indicates this should improve batch processing performance. Key changes: - Accumulate trie updates into a single TrieUpdatesSorted during save_blocks loop - Pass accumulated overlay to write_trie_changesets so each block sees correct state - Call write_trie_updates_sorted once after the loop with accumulated updates Closes #20611


HashedPostState&TrieUpdatesSortedinsave_blocksto improve batch processing performance.Performance Results:
Note: This assumes the number of accounts remains relatively stable across blocks, which aligns with typical production workloads.
Before:

*Note: p0.9 is sum of total time for all persist blocks, so it will be multiple by number of block need to persist
After:
Updated:
With --engine.persistence-threshold 6, so it need persist 7 blocks per interval:
For TrieUpdatesSorted:

after:
Note: If we use default config: flush 3 blocks interval, the diff is not much different
But this still a win if in case our persistence very slow, and it need persist tons of block to catch-up with the tip