
Conversation

Collaborator

@mkrasnitski mkrasnitski commented Feb 8, 2025

Untangles the spaghetti of shard management code by removing the middleman ShardQueuer struct, which used a disorienting mess of mpsc channels to facilitate message passing between the ShardManager and each of its ShardRunners.

The new ShardManager::run function takes care of creating/spawning new runners as needed. This is fine to call without spawning a separate task (which was being done for ShardQueuer) because Client::start_connection previously already waited for the ShardQueuer loop to (essentially) finish before returning. In other words, the Client would start the ShardManager and then block, but the waiting was just being facilitated through channels, which wasn't obvious.

The only messages passed across channels now are ShardRunnerMessage (sent from the manager to a given runner) and a new ShardManagerMessage (sent from a runner back to the manager). For example, the manager can send a message telling a runner to restart; the runner will then shut itself down and ask the manager to reboot the shard by sending a message back.

The shard queue now keeps track of whether it should dispatch in concurrent batches. Since concurrent startup is only used when first initializing all shards, and is then switched off and never re-enabled, this should be fine.

I've done some cursory testing of this using the repo examples, but I'd like some real apps to test it out before it's merged so that I know nothing is broken. In particular, I'd like to make sure that concurrent shard startup works correctly.
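
To make the flow above concrete, here is a minimal sketch of the manager loop, assuming tokio mpsc channels; the enum variants, fields, and the shape of `ShardQueue` are illustrative, not the actual implementation:

```rust
use std::collections::HashMap;

use tokio::sync::mpsc::{unbounded_channel, UnboundedReceiver, UnboundedSender};

// Sent from the manager to a single runner.
enum ShardRunnerMessage {
    Restart,
    Shutdown,
}

// Sent from a runner back to the manager.
enum ShardManagerMessage {
    // The runner has shut itself down and asks the manager to reboot the shard.
    Restart(u32),
    Quit(u32),
}

// The queue remembers whether shards should be dispatched in concurrent batches;
// this is only true during the initial startup of all shards.
struct ShardQueue {
    pending: Vec<u32>,
    concurrent: bool,
}

async fn run(
    mut queue: ShardQueue,
    mut from_runners: UnboundedReceiver<ShardManagerMessage>,
    mut runners: HashMap<u32, UnboundedSender<ShardRunnerMessage>>,
) {
    loop {
        // Drain the queue and create a runner for each pending shard; the real
        // code dispatches these in batches according to `queue.concurrent`.
        while let Some(shard_id) = queue.pending.pop() {
            let (tx, _rx) = unbounded_channel::<ShardRunnerMessage>();
            // ... spawn a ShardRunner task here that drives `_rx` ...
            runners.insert(shard_id, tx);
        }
        // Concurrent startup is only used once, for the initial batch.
        queue.concurrent = false;

        // Block here waiting on runner feedback; this is why `run` can be awaited
        // directly instead of being spawned on a separate task, mirroring how
        // Client::start_connection used to block on the ShardQueuer loop.
        match from_runners.recv().await {
            Some(ShardManagerMessage::Restart(shard_id)) => queue.pending.push(shard_id),
            Some(ShardManagerMessage::Quit(_)) | None => break,
        }
    }
}
```

The real loop does more bookkeeping (for instance the runner info map), which the sketch omits.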

@mkrasnitski mkrasnitski requested a review from GnomedDev February 8, 2025 04:08
@github-actions github-actions bot added the `gateway`, `examples`, and `collector` labels Feb 8, 2025
@mkrasnitski mkrasnitski force-pushed the shard-manager branch 2 times, most recently from 114a03c to b4e564a on February 8, 2025 04:44
@github-actions github-actions bot removed the `collector` label Feb 8, 2025
@arqunis arqunis added the `enhancement` and `breaking change` labels Feb 8, 2025
@mkrasnitski mkrasnitski force-pushed the shard-manager branch 4 times, most recently from c10edff to 57280c2 on February 12, 2025 22:20
Rename `Context::shard_messenger` back to `shard`
Member

@GnomedDev GnomedDev left a comment


Looks very good, a lot better; just some concerns I spotted.

* Add docs
* Rename `ShardManagerMessage::Err` to `Quit`
* Move `ShardQueue` into its own file
* Change runner_info `Mutex` to a sync implementation and don't block
  when locking it
* Outline the shutdown logic in the `Drop` impl for `ShardManager` (see the rough sketch below)
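
For the last point, a rough sketch of the kind of `Drop`-based shutdown outline meant here, with hypothetical types standing in for the real ones:

```rust
use std::collections::HashMap;

use tokio::sync::mpsc::UnboundedSender;

enum ShardRunnerMessage {
    Shutdown,
}

struct ShardManager {
    runners: HashMap<u32, UnboundedSender<ShardRunnerMessage>>,
}

impl Drop for ShardManager {
    fn drop(&mut self) {
        // Tell every runner to shut down; a runner that has already exited has
        // dropped its receiver, so a failed send can simply be ignored.
        for sender in self.runners.values() {
            let _ = sender.send(ShardRunnerMessage::Shutdown);
        }
    }
}
```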
Member

@GnomedDev GnomedDev left a comment


Okay, I am comfortable with this being merged now and getting some testing on `next`, especially with the follow-up changes on the way.

People like me with big bots reliant on `next` may want to be ready to roll back after this PR, as it is likely to cause issues that code review and small-scale testing cannot find.

@arqunis arqunis merged commit 9b904ba into serenity-rs:next Feb 19, 2025
24 checks passed
@mkrasnitski mkrasnitski deleted the shard-manager branch February 20, 2025 03:14
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Mar 5, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Mar 5, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Mar 7, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Mar 10, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Mar 11, 2025
arqunis pushed a commit that referenced this pull request Mar 14, 2025
Follow up to #3108.

Inlines the fields and methods of `ShardMessenger` into `Context` and changes
collectors to take `&Context`. Also, removes the `ShardRunnerMessage::{Close,
Message}` variants, adds an `UpdateVoiceState` variant (Songbird will want to
use this) and combines the 3 presence-related variants into one. Plus some
miscellaneous cleanup of the `ShardRunner` code.
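
As a rough illustration of the last part, combining the presence-related variants into one might look something like this (names and fields are assumptions, not necessarily the final API):

```rust
// Illustrative sketch of folding the separate presence-related runner messages
// into a single variant; the surrounding types stand in for serenity's own.
struct ActivityData {
    name: String,
}

enum OnlineStatus {
    Online,
    Idle,
    DoNotDisturb,
    Invisible,
}

enum ShardRunnerMessage {
    // One variant carrying the full presence, replacing the previously separate
    // set-activity / set-status / set-presence messages.
    SetPresence(Option<ActivityData>, OnlineStatus),
    // New variant intended for voice libraries such as Songbird.
    UpdateVoiceState {
        guild_id: u64,
        channel_id: Option<u64>,
    },
}
```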
GnomedDev pushed a commit to GnomedDev/serenity that referenced this pull request Mar 26, 2025
GnomedDev pushed a commit to GnomedDev/serenity that referenced this pull request Mar 26, 2025
GnomedDev pushed a commit to GnomedDev/serenity that referenced this pull request Mar 26, 2025
GnomedDev pushed a commit to GnomedDev/serenity that referenced this pull request Mar 26, 2025
GnomedDev pushed a commit that referenced this pull request Apr 28, 2025
GnomedDev pushed a commit that referenced this pull request Apr 28, 2025
GnomedDev pushed a commit that referenced this pull request May 19, 2025
GnomedDev pushed a commit that referenced this pull request May 19, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jun 30, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jun 30, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jun 30, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jun 30, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jun 30, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jun 30, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jul 28, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jul 28, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jul 28, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Jul 28, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 7, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 7, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 7, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 7, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 7, 2025
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 7, 2025
