Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Arc<Mutex<ShardManager>> to Arc<ShardManager> #2523

Merged
merged 6 commits into from
Aug 30, 2023

Conversation

tazz4843
Copy link
Contributor

@tazz4843 tazz4843 commented Aug 27, 2023

Has the side effect of (theoretically)
fixing shutdown by removing the major deadlock

Required converting all &mut self methods on ShardManager to &self

Closes #2507

Has the side effect of (theoretically)
fixing shutdown by removing the major deadlock

Required converting all `&mut self` methods on ShardManager to &self
@tazz4843 tazz4843 marked this pull request as draft August 27, 2023 04:51
@github-actions github-actions bot added client Related to the `client` module. gateway Related to the `gateway` module. examples Related to Serenity's examples. labels Aug 27, 2023
@tazz4843
Copy link
Contributor Author

Accidentally opened as ready for merge, converted to draft as I don't want a round 2 of last time we tried fixing shutdown. I'm going to fire up a local bot as a test and leave it running for a few days.

Copy link
Collaborator

@mkrasnitski mkrasnitski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - let's hope this works. Your doctests are failing, though.

@tazz4843
Copy link
Contributor Author

Turns out this breaks poise as well, I'll make a PR to that as a companion.

Co-authored-by: Michael Krasnitski <[email protected]>
@arqunis arqunis added fix A solution to an existing bug. breaking change The public API is changed, resulting in miscompilations or unexpected new behaviour for users labels Aug 27, 2023
@GnomedDev
Copy link
Member

Is this a good idea? Have you tested running shutdown concurrently to see if that breaks anything? It worries me if this was written with the assumption that it's &mut self that a new class of bugs will be introduced (that cannot be caught by rustc, as we are mutating the state of the program via IO to discord).

@mkrasnitski
Copy link
Collaborator

mkrasnitski commented Aug 27, 2023

The only fields of ShardManager that end up being mutated are shard_shutdown as well as the integers shard_index, shard_init, and shard_total. The former is now behind a Mutex that only gets locked in one place, so no opportunity for deadlocks on the same thread.

The other three fields all get written to only in .set_shards. There could feasibly be a race condition where a context switch happens between two of the calls to .store, and then an invariant of the manager might get broken? I don't know. Maybe we should synchronize the three atomic writes to make sure they can't get interrupted.

Allows `Client::start` to finally return once all shards have shut down
@tazz4843
Copy link
Contributor Author

Have you tested running shutdown concurrently to see if that breaks anything?

As far as I can tell, shutdown is a completely idempotent function, even with these new changes. The worst that could happen is an extra irrelevant "timed out" warning after a successful shutdown, which would have happened previously anyways. shutdown simply fires a message and then waits for a response, and if it doesn't get it, times out with an error.

The other three fields all get written to only in .set_shards. There could feasibly be a race condition where a context switch happens between two of the calls to .store, and then an invariant of the manager might get broken? I don't know. Maybe we should synchronize the three atomic writes to make sure they can't get interrupted.

.set_shards is only ever called in one place, Client::start_connection, which takes &mut self on the Client, preventing concurrent calls. Regardless, we could synchronize the calls, however I'm not sure how we could go about that.

@tazz4843
Copy link
Contributor Author

As with the tokio-tungstenite upgrade, failing test seems to be unrelated: dtolnay/proc-macro2#398

@tazz4843 tazz4843 marked this pull request as ready for review August 30, 2023 18:15
@tazz4843
Copy link
Contributor Author

Test bot has been online for three days straight now without any heartbeat failures (through network loss etc etc). I think this has finally solved the issue.

@arqunis arqunis merged commit 4f0f649 into serenity-rs:next Aug 30, 2023
kangalio added a commit to serenity-rs/poise that referenced this pull request Aug 30, 2023
* Companion to serenity-rs/serenity#2523

* Update serenity to include serenity-rs/serenity#2523

* `cargo fmt`

* `cargo update`

Fixes `proc-macro2` compilation failure

* Fix failing tests

* Bump MSRV to 1.72.0

* Update and fix broken things

* Convert u64 to u16

---------

Co-authored-by: kangalioo <[email protected]>
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Sep 21, 2023
Has the side effect of fixing shutdown by removing the major deadlock.

Required converting all `&mut self` methods on `ShardManager` to `&self`.

Allows `Client::start` to finally return once all shards have shut down.

---------

Co-authored-by: Michael Krasnitski <[email protected]>
@tazz4843 tazz4843 deleted the fix-shutdown branch October 7, 2023 01:40
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 17, 2023
Has the side effect of fixing shutdown by removing the major deadlock.

Required converting all `&mut self` methods on `ShardManager` to `&self`.

Allows `Client::start` to finally return once all shards have shut down.

---------

Co-authored-by: Michael Krasnitski <[email protected]>
mkrasnitski added a commit to mkrasnitski/serenity that referenced this pull request Oct 24, 2023
Has the side effect of fixing shutdown by removing the major deadlock.

Required converting all `&mut self` methods on `ShardManager` to `&self`.

Allows `Client::start` to finally return once all shards have shut down.

---------

Co-authored-by: Michael Krasnitski <[email protected]>
arqunis pushed a commit to arqunis/serenity that referenced this pull request Oct 24, 2023
Has the side effect of fixing shutdown by removing the major deadlock.

Required converting all `&mut self` methods on `ShardManager` to `&self`.

Allows `Client::start` to finally return once all shards have shut down.

---------

Co-authored-by: Michael Krasnitski <[email protected]>
arqunis pushed a commit to arqunis/serenity that referenced this pull request Oct 24, 2023
Has the side effect of fixing shutdown by removing the major deadlock.

Required converting all `&mut self` methods on `ShardManager` to `&self`.

Allows `Client::start` to finally return once all shards have shut down.

---------

Co-authored-by: Michael Krasnitski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change The public API is changed, resulting in miscompilations or unexpected new behaviour for users client Related to the `client` module. examples Related to Serenity's examples. fix A solution to an existing bug. gateway Related to the `gateway` module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants