perf: dynamically batch tx sender recovery #1834
Merged
Codecov Report
@@            Coverage Diff            @@
##             main    #1834    +/-   ##
=======================================
  Coverage   73.50%   73.51%
=======================================
  Files         410      410
  Lines       50515    50527    +12
=======================================
+ Hits        37131    37143    +12
  Misses      13384    13384
... and 6 files with indirect coverage changes
mattsse
approved these changes
Mar 18, 2023
Comment on lines +113 to +115
    .for_each(|result: Result<_, StageError>| {
        let _ = tx.send(result);
    });
Collaborator
sending them one by one is totally fine
Comment on lines +93 to +94
    for chunk in
        &tx_walker.chunks(self.commit_threshold as usize / rayon::current_num_threads())
Collaborator
👍
in hindsight this is kinda obvious
The performance regression in the sender recovery stage was caused by us effectively queuing 5000 very short jobs. Each job finishes so quickly, relative to the cost of scheduling it, that Rayon's worker threads lose a lot of time trying to steal more jobs.
The solution is to reintroduce batching. For now, the batch size is derived from the number of worker threads in the Rayon thread pool (the commit threshold divided by the thread count). This works because we are limited by memory and can't crank the commit threshold up much anyway, so a separate config option for batch sizes doesn't make much sense here.
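To illustrate the idea, here is a minimal sketch of the batching, not the code from this PR. It uses only the standard library so it runs without dependencies: scoped threads stand in for the Rayon pool, and `recover_sender`, `batch_size` and `recover_batched` are hypothetical names made up for the example. The clamp to a minimum batch size of 1 is also an assumption of the sketch, added so a small threshold can't produce a zero-sized chunk.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the per-transaction work (signature recovery).
/// It returns a fake "sender" derived from the id so the example is deterministic.
fn recover_sender(tx_id: u64) -> Result<(u64, u64), String> {
    Ok((tx_id, tx_id.wrapping_mul(31) % 97))
}

/// Batch size as in the diff: commit threshold divided by the worker count.
/// Clamped to at least 1 (an assumption of this sketch) because
/// `slice::chunks` panics on a chunk size of 0.
fn batch_size(commit_threshold: usize, workers: usize) -> usize {
    (commit_threshold / workers.max(1)).max(1)
}

/// Recovers all senders, scheduling one job per batch instead of one per transaction.
fn recover_batched(tx_ids: &[u64], commit_threshold: usize, workers: usize) -> Vec<(u64, u64)> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for chunk in tx_ids.chunks(batch_size(commit_threshold, workers)) {
            let tx = tx.clone();
            s.spawn(move || {
                for &id in chunk {
                    // Results are still sent one by one, as in the reviewed code.
                    let _ = tx.send(recover_sender(id));
                }
            });
        }
    });
    // Drop the last sender so the receiver iterator terminates.
    drop(tx);
    let mut out: Vec<_> = rx.into_iter().filter_map(Result::ok).collect();
    // Batches finish in arbitrary order, so sort by transaction id.
    out.sort_unstable();
    out
}

fn main() {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let ids: Vec<u64> = (0..5000).collect();
    let recovered = recover_batched(&ids, 5000, workers);
    println!(
        "{} senders recovered in batches of {}",
        recovered.len(),
        batch_size(5000, workers)
    );
}
```

With a threshold of 5000 and 8 workers this schedules 8 jobs of 625 transactions each rather than 5000 tiny ones, which is what removes most of the work-stealing overhead.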
This is a perf grab of the current sender recovery stage:
As we can see here (on the top right), almost 50%(!) of the time is spent trying to get more work.
Compare this with this PR:
We spend almost no time trying to get more work.
The speedup for me is that sender recovery now feels snappy again: before, it felt like it took 20-30s per 5k blocks, and now it takes about 3-4s.