greedy_scheduler: cache Batches #7193
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##            master    #7193    +/-  ##
=========================================
  Coverage     82.8%    82.8%
=========================================
  Files          801      801
  Lines       363284   363292       +8
=========================================
+ Hits        300802   300830      +28
+ Misses       62482    62462      -20
```
I'd avoid doing anything like this, because these are just freed by the same thread that allocated them, i.e. the scheduler. jemalloc should (I think) just push them back into the thread-local cache.
```diff
  self.working_account_set.clear();
  // Use zero here to avoid allocating since we are done with `Batches`.
- num_sent += self.common.send_batches(&mut batches, 0)?;
+ num_sent += self.common.send_batches(&mut self.batches, 0)?;
```
note to self: except in case of early exit (there shouldn't be any) this will guarantee the batches are empty by the end of each schedule call.
```rust
    common: SchedulingCommon<Tx>,
    working_account_set: ReadWriteAccountSet,
    unschedulables: Vec<TransactionPriorityId>,
    batches: Batches<Tx>,
```
If allocation/deallocation of batches is an issue, we could put them in `common` so that the prio_graph variant also gets the benefit.
yeah happy to move it there
I've done this now. There are a couple of early returns in schedule(), but SchedulingError is unrecoverable (the banking thread dies), so it's ok
I can't read, it doesn't die
Code-wise this looks bad, but in practice we only error if the workers get disconnected, which never happens unless the validator gets into some kind of hosed state anyway.
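The caching pattern under discussion can be sketched as follows. This is a hypothetical, simplified version (field names and the trivial `send_batches` body are assumptions, not the actual agave code): a reusable `Batches` lives inside the shared `SchedulingCommon`, so each schedule call drains the same pre-allocated vectors instead of dropping and reallocating them every pass.

```rust
// Hypothetical simplified types; not the real agave definitions.
struct Batches<Tx> {
    ids: Vec<Vec<u64>>,
    transactions: Vec<Vec<Tx>>,
    total_cus: Vec<u64>,
}

impl<Tx> Batches<Tx> {
    fn new(num_threads: usize, target_num_transactions_per_batch: usize) -> Self {
        Self {
            ids: (0..num_threads)
                .map(|_| Vec::with_capacity(target_num_transactions_per_batch))
                .collect(),
            transactions: (0..num_threads)
                .map(|_| Vec::with_capacity(target_num_transactions_per_batch))
                .collect(),
            total_cus: vec![0; num_threads],
        }
    }
}

struct SchedulingCommon<Tx> {
    batches: Batches<Tx>,
}

impl<Tx> SchedulingCommon<Tx> {
    fn new(num_threads: usize, target_num_transactions_per_batch: usize) -> Self {
        Self {
            batches: Batches::new(num_threads, target_num_transactions_per_batch),
        }
    }

    // Stand-in for the real send: count and drain the batches, keeping
    // each vector's capacity so the next schedule call reuses it.
    fn send_batches(&mut self) -> usize {
        let mut num_sent = 0;
        for (ids, transactions) in self
            .batches
            .ids
            .iter_mut()
            .zip(self.batches.transactions.iter_mut())
        {
            num_sent += ids.len();
            ids.clear();
            transactions.clear();
        }
        for cus in &mut self.batches.total_cus {
            *cus = 0;
        }
        num_sent
    }
}
```

Because `Vec::clear` keeps the backing allocation, the per-thread vectors stay at their target capacity across calls, which is exactly the churn the profile showed.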
```rust
        }
    }

    pub fn clear(&mut self) {
```
I'm hesitant to have a clear on this.
We should never have batches that last longer than a schedule call. If we do, then it's a bug.
Because at the end of each schedule call we send_batches which sends out all non-empty batches, right?
There was a problem hiding this comment.
I don't know; this was a mechanical change. I saw it in profiles: before this change it was dropping/reallocating, and this is functionally equivalent to doing that, minus the churn.
I see your point, in which case we should assert that it's cleared by the time we return.
```rust
        for ids in &mut self.ids {
            ids.clear();
        }
        for transactions in &mut self.transactions {
            transactions.clear();
        }
        for max_ages in &mut self.max_ages {
            max_ages.clear();
        }
        for total_cus in &mut self.total_cus {
```
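A self-contained sketch (assumed names and fields, not the exact agave code) of the pattern the reviewers converge on: `clear()` empties each per-thread vector while keeping its capacity, and `is_empty()` backs a debug assertion that batches start empty at the top of every schedule call.

```rust
// Hypothetical simplified struct for illustration.
struct Batches {
    ids: Vec<Vec<u64>>,
    transactions: Vec<Vec<String>>,
    max_ages: Vec<Vec<u64>>,
    total_cus: Vec<u64>,
}

impl Batches {
    fn clear(&mut self) {
        for ids in &mut self.ids {
            ids.clear();
        }
        for transactions in &mut self.transactions {
            transactions.clear();
        }
        for max_ages in &mut self.max_ages {
            max_ages.clear();
        }
        for total_cus in &mut self.total_cus {
            *total_cus = 0;
        }
    }

    /// Returns true if all batches are empty and no compute units are
    /// recorded; intended for debug assertions verifying the batches
    /// were fully drained by the previous schedule call.
    fn is_empty(&self) -> bool {
        self.ids.iter().all(|v| v.is_empty())
            && self.transactions.iter().all(|v| v.is_empty())
            && self.max_ages.iter().all(|v| v.is_empty())
            && self.total_cus.iter().all(|&c| c == 0)
    }
}

fn schedule(batches: &mut Batches) {
    // Checked before scheduling: early returns below could otherwise
    // leave the post-condition unverified.
    debug_assert!(batches.is_empty(), "batches must start empty for scheduling");
    // ... fill and send batches, clearing them before returning ...
    batches.clear();
}
```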
Force-pushed from eab774e to 76100be.
Pull Request Overview
This PR caches Batches instances within the SchedulingCommon struct to avoid repeated allocations and deallocations in the hot path of transaction scheduling. The change moves batch creation from being instantiated per scheduling pass to being reused across scheduling operations.
- Moves `Batches` instance from local variable to `SchedulingCommon` field
- Updates constructor to accept `target_num_transactions_per_batch` parameter
- Removes `batches` parameter from batch sending methods since it's now internal state
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scheduler_common.rs | Adds batches field to SchedulingCommon, updates constructor and batch methods |
| prio_graph_scheduler.rs | Removes local Batches creation, uses cached instance from SchedulingCommon |
| greedy_scheduler.rs | Removes local Batches creation, uses cached instance from SchedulingCommon |
```rust
            total_cus: vec![0; num_threads],
        }
    }
```
The `is_empty` method lacks documentation explaining its purpose and when it should be called. Consider adding a docstring explaining that this method is used in debug assertions to verify batches are properly cleared after scheduling.
```rust
/// Returns true if all batches are empty and no compute units are allocated.
///
/// This method is intended for use in debug assertions to verify that
/// batches are properly cleared after scheduling. It should be called
/// after batch processing to ensure no residual state remains.
```
apfitzge left a comment:
lgtm. Left a small nit and a potential for more clean-up.
```rust
debug_assert!(
    self.common.batches.is_empty(),
    "batches must be empty after scheduling"
);
```
nit: "batches must start empty for scheduling". It's weird to say "after scheduling" when this check runs before it.
heh, I initially put the check after scheduling, but then saw the early returns and moved it. Fixed!
```diff
@@ -27,17 +27,31 @@ pub struct Batches<Tx> {

 impl<Tx> Batches<Tx> {
     pub fn new(num_threads: usize, target_num_transactions_per_batch: usize) -> Self {
```
Since we now store `Batches` in `common`, we don't need the 0 target size to avoid allocation on the final send, so `target_num_transactions_per_batch` is constant. We could store it in `Batches` itself now, making all the `send_*` calls a bit cleaner.
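The suggested clean-up could look roughly like this. It is a hypothetical sketch (the `take_batch` helper and field names are assumptions): since the cached `Batches` always uses the same batch size, storing `target_num_transactions_per_batch` in the struct means callers no longer pass it around, and no longer pass 0 to dodge a final reallocation.

```rust
// Hypothetical simplified struct; not the real agave definition.
struct Batches {
    ids: Vec<Vec<u64>>,
    target_num_transactions_per_batch: usize,
}

impl Batches {
    fn new(num_threads: usize, target_num_transactions_per_batch: usize) -> Self {
        Self {
            ids: (0..num_threads)
                .map(|_| Vec::with_capacity(target_num_transactions_per_batch))
                .collect(),
            target_num_transactions_per_batch,
        }
    }

    // Take one thread's filled batch, leaving behind a fresh vector sized
    // from the stored target; callers no longer thread the constant through.
    fn take_batch(&mut self, thread_index: usize) -> Vec<u64> {
        std::mem::replace(
            &mut self.ids[thread_index],
            Vec::with_capacity(self.target_num_transactions_per_batch),
        )
    }
}
```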
Force-pushed from 76100be to e9d3efa.

This avoids a bunch of allocations/deallocations in the hot path.
This must be the 4th time I've made this change. Finally committing and PRing it so I don't have to do it again next month.