Skip to content

perf: dynamically batch tx sender recovery#1834

Merged
onbjerg merged 1 commit intomainfrom
onbjerg/sender-recovery-perf
Mar 18, 2023
Merged

perf: dynamically batch tx sender recovery#1834
onbjerg merged 1 commit intomainfrom
onbjerg/sender-recovery-perf

Conversation

@onbjerg
Copy link
Collaborator

@onbjerg onbjerg commented Mar 18, 2023

The performance regression in the sender recovery stage was caused by us effectively queuing 5000 really "fast" (relatively) jobs, leading to a lot of time lost on Rayon's worker threads trying to steal more jobs.

The solution is to reintroduce batching. For now, we create batches based on the number of worker threads in the Rayon threadpool. This works since we are limited by memory, and can't crank the commit threshold too much, and separate config for batch sizes in this case doesn't make much sense.

This is a perf grab of the current sender recovery stage:

Screenshot from 2023-03-18 13-19-39

As we can see here (on the top right), almost 50%(!) of the time is spent trying to get more work.

Compare this with this PR:

Screenshot from 2023-03-18 14-01-03

We almost spend no time trying to get more work.

The speedup for me is that sender recovery now feels snappy again - before, it felt like it took 20-30s per 5k blocks, now it takes about 3-4.

@onbjerg onbjerg added A-staged-sync Related to staged sync (pipelines and stages) C-perf A change motivated by improving speed, memory usage or disk footprint labels Mar 18, 2023
@onbjerg onbjerg requested a review from rkrasiuk as a code owner March 18, 2023 13:05
@onbjerg onbjerg force-pushed the onbjerg/sender-recovery-perf branch from 82b89a9 to ffe7c3b Compare March 18, 2023 13:08
@codecov-commenter
Copy link

codecov-commenter commented Mar 18, 2023

Codecov Report

Merging #1834 (ffe7c3b) into main (995c5ad) will increase coverage by 0.00%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@           Coverage Diff           @@
##             main    #1834   +/-   ##
=======================================
  Coverage   73.50%   73.51%           
=======================================
  Files         410      410           
  Lines       50515    50527   +12     
=======================================
+ Hits        37131    37143   +12     
  Misses      13384    13384           
Flag Coverage Δ
integration-tests 19.70% <0.00%> (-0.01%) ⬇️
unit-tests 67.86% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
crates/stages/src/stages/sender_recovery.rs 94.14% <100.00%> (+2.95%) ⬆️

... and 6 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gg

Comment on lines +113 to +115
.for_each(|result: Result<_, StageError>| {
let _ = tx.send(result);
});
Copy link
Collaborator

@mattsse mattsse Mar 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sending them one by one is totally fine

Comment on lines +93 to +94
for chunk in
&tx_walker.chunks(self.commit_threshold as usize / rayon::current_num_threads())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

in hindsight this is kinda obvious

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah...

@onbjerg onbjerg merged commit a05b3ff into main Mar 18, 2023
@onbjerg onbjerg deleted the onbjerg/sender-recovery-perf branch March 18, 2023 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-staged-sync Related to staged sync (pipelines and stages) C-perf A change motivated by improving speed, memory usage or disk footprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants