src/gateway/sharding/shard_manager.rs (23 changes: 18 additions & 5 deletions)
@@ -142,16 +142,29 @@ impl ShardManager {
     ) -> Result<(), GatewayError> {
         self.initialize(shard_index, shard_init, shard_total);
         loop {
-            if let Ok(Some(msg)) =
-                timeout(self.wait_time_between_shard_start, self.manager_rx.next()).await
-            {
+            let batch = self.queue.pop_batch();
+            let msg = if batch.is_empty() {
+                // This function is the only code that can add new shards
+                // to start to the queue directly (enforced by `&mut`), so
+                // if the batch is empty, it will always be empty until a
+                // `ShardManagerMessage::Boot` is received here.
+                self.manager_rx.next().await
+            } else {
+                self.checked_start(batch).await;
+
+                // Include a timeout so we can start the next batch of
+                // shards even if no more messages are received.
+                timeout(self.wait_time_between_shard_start, self.manager_rx.next())
+                    .await
+                    .unwrap_or_default()
+            };

Collaborator:
What's the reason for having two separate branches? Yeah, adding a timeout only makes sense if there are shards in the queue waiting to be started (since we don't want to block forever), but I think just swapping the order serves well enough here.

Empty batches are handled fine by `checked_start`, so if multiple timeouts happen while waiting for a shard to get queued, all that will happen is that a few extra empty `Vec`s get created.

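For reference, a minimal sketch of the single-branch shape being suggested here, reusing only names from the diff above (`queue`, `pop_batch`, `checked_start`, `manager_rx`); this is an illustration, not the code that was merged:

```rust
loop {
    // Start whatever is queued first; an empty batch is a cheap no-op
    // inside `checked_start`.
    let batch = self.queue.pop_batch();
    self.checked_start(batch).await;

    // Then wait for the next message, but never longer than the
    // configured delay, so queued shards cannot be starved.
    if let Ok(Some(msg)) =
        timeout(self.wait_time_between_shard_start, self.manager_rx.next()).await
    {
        match msg {
            ShardManagerMessage::Boot(shard_id) => self.queue_for_start(shard_id),
            ShardManagerMessage::Quit(res) => return res,
        }
    }
}
```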

Contributor (author):
Timers, and running this loop at all, add at least some overhead that's not too hard to avoid.

There's also the potential that someone reduces the wait time to zero (for example, in a scenario with a proxy), in which case this loop might run constantly with no delay. That wouldn't be a massive problem, since tokio manages coop budgets when the `timeout` call is hit, but it still seems like unnecessary overhead.

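To make the zero-wait concern concrete, a standalone sketch (the `pending` future stands in for a `manager_rx` that never yields a message; the `1_000_000` cap is arbitrary, and nothing here is code from this PR):

```rust
use std::time::Duration;
use tokio::time::timeout;

#[tokio::main]
async fn main() {
    let mut iterations = 0u32;
    while iterations < 1_000_000 {
        // With a zero wait time the deadline has already elapsed on the
        // first poll, so `timeout` returns `Err(Elapsed)` immediately and
        // the loop runs back-to-back, yielding only when tokio's coop
        // budget forces it to.
        let _ = timeout(Duration::ZERO, std::future::pending::<()>()).await;
        iterations += 1;
    }
    println!("spun {iterations} times without ever receiving a message");
}
```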

Collaborator:
I just don't like calling `self.manager_rx.next()` in two different places. Adding a timer to the future really introduces very little overhead (what with zero-cost futures and all that, plus tokio relies on the system timer and doesn't just spinlock).

If you'd like to clean up the extra `Vec` usage, then you could either inline `checked_start` into the loop, or change `ShardQueue::pop_batch` to return `Option<Vec<ShardId>>` (`None` for empty batches), or both.

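A sketch of what that `pop_batch` signature change could look like; the internals of `ShardQueue` (a `VecDeque` plus a batch size) are assumed here purely for illustration:

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for the real types; only the signature matters.
#[derive(Clone, Copy, Debug)]
pub struct ShardId(pub u16);

pub struct ShardQueue {
    inner: VecDeque<ShardId>,
    batch_size: usize,
}

impl ShardQueue {
    /// Pops up to `batch_size` queued shards, returning `None` when the
    /// queue is empty so callers can skip the start path without ever
    /// allocating an empty `Vec`.
    pub fn pop_batch(&mut self) -> Option<Vec<ShardId>> {
        if self.inner.is_empty() {
            return None;
        }
        let take = self.batch_size.min(self.inner.len());
        Some(self.inner.drain(..take).collect())
    }
}
```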

Contributor (author):
My problem isn't the `Vec` or the timer future itself; it's the CPU time spent even running the loop when it's known that it won't do anything, especially if the user overrides the wait time. The loop could end up pretty much just spinning if the wait time is set to zero (minus tokio yielding due to its coop budget).

Keeping `checked_start` as its own function seems cleaner to me, but I can change `pop_batch` to return `Option`. That seems like it expresses the intent better.


Contributor (author):
I tried to verify whether my concerns about this are actually measurable, and the answer is: not really. I'll just revert this snippet to what it was before and avoid any further changes.


+            if let Some(msg) = msg {
                 match msg {
                     ShardManagerMessage::Boot(shard_id) => self.queue_for_start(shard_id),
                     ShardManagerMessage::Quit(res) => return res,
                 }
             }
-            let batch = self.queue.pop_batch();
-            self.checked_start(batch).await;
         }
     }
