[FIXED] NRG: Fix server shutdown race condition#7045

Merged
neilalexander merged 1 commit into main from maurice/nrg-shutdown-race
Jul 8, 2025

@MauriceVanVeen
Member

The for s.isRunning() {..} loop keeps the Raft run loop going while the server is running and stops it once the server is done. Although this sounds sane, it actually results in a race condition. If n.State() changes during shutdown, for example when a leader does a leader transfer as part of shutting down, s.isRunning() can break us out of the loop, and then monitorCluster/Stream/Consumer can't install a snapshot on shutdown.

The run loop can simply be a for {..}, because the last step of shutting down the server is s.shutdownRaftNodes(), which stops any nodes that weren't stopped already and thereby properly breaks the loop.
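
To make the ordering concrete, here is a minimal toy sketch of the two loop shapes. It is not the nats-server code: the `node` type, the `stepdown`, `quit`, and `leading` channels, and the `simulate` helper are all hypothetical names made up for illustration, and snapshots are not modelled at all. It only shows that a loop guarded by the server's running flag can exit while the node is still a follower, whereas an unconditional loop exits only once the node has actually been stopped.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// State is a toy stand-in for a Raft node state (hypothetical type).
type State int32

const (
	Follower State = iota
	Leader
	Closed
)

// node is a hypothetical, heavily simplified Raft node.
type node struct {
	state    atomic.Int32
	leading  chan struct{} // signalled once the node is running as leader
	stepdown chan struct{} // signals a leader transfer
	quit     chan struct{} // closed when the node is stopped
}

func (n *node) State() State { return State(n.state.Load()) }

// runAsLeader blocks until the node steps down or is stopped.
func (n *node) runAsLeader() {
	select { // non-blocking notification that we are inside the leader loop
	case n.leading <- struct{}{}:
	default:
	}
	select {
	case <-n.stepdown:
		n.state.Store(int32(Follower))
	case <-n.quit:
		n.state.Store(int32(Closed))
	}
}

// runAsFollower blocks until the node is stopped.
func (n *node) runAsFollower() {
	<-n.quit
	n.state.Store(int32(Closed))
}

// run is the toy run loop. keepGoing models the loop condition.
// It returns the state the node was in when the loop exited.
func (n *node) run(keepGoing func() bool) State {
	for keepGoing() {
		switch n.State() {
		case Leader:
			n.runAsLeader()
		case Follower:
			n.runAsFollower()
		case Closed:
			return Closed
		}
	}
	return n.State()
}

// simulate replays a shutdown in which a leader transfers leadership before
// the nodes are stopped, and reports the state at which the run loop exited.
func simulate(guardOnServer bool) State {
	var running atomic.Bool
	running.Store(true)
	n := &node{
		leading:  make(chan struct{}, 1),
		stepdown: make(chan struct{}),
		quit:     make(chan struct{}),
	}
	n.state.Store(int32(Leader))

	keepGoing := func() bool { return true } // models: for {..}
	if guardOnServer {
		keepGoing = running.Load // models: for s.isRunning() {..}
	}
	done := make(chan State, 1)
	go func() { done <- n.run(keepGoing) }()

	<-n.leading              // wait until the loop is running as leader
	running.Store(false)     // server shutdown begins
	n.stepdown <- struct{}{} // leader transfer as part of shutting down
	close(n.quit)            // last step: stop nodes that weren't stopped already
	return <-done
}

func main() {
	fmt.Println("guarded loop exited with node closed:", simulate(true) == Closed)
	fmt.Println("plain loop exited with node closed:", simulate(false) == Closed)
}
```

In this toy, the guarded variant leaves the loop right after the state change, with the node still a follower, while the unconditional variant keeps going until the final stop closes the node. That early exit is the window the description above refers to.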

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner July 8, 2025 10:58
Member

@neilalexander neilalexander left a comment

LGTM

@neilalexander neilalexander merged commit a39322a into main Jul 8, 2025
89 of 92 checks passed
@neilalexander neilalexander deleted the maurice/nrg-shutdown-race branch July 8, 2025 12:09
@wallyqs wallyqs changed the title NRG: Fix server shutdown race condition [FIXED] NRG: Fix server shutdown race condition Jul 24, 2025
neilalexander added a commit that referenced this pull request Jul 25, 2025
Includes the following:

- #7031
- #7033
- #7034
- #7035
- #7036
- #7040
- #7043
- #7045
- #7047
- #7046
- #7050
- #7051
- #7052
- #7053
- #7061
- #7063
- #7064
- #7065
- #7066
- #7070
- #7072
- #7080
- #7026
- #6728
- #7074
- #7089
- #7095
- #7087
- #7094
- #7096
- #7099

Signed-off-by: Neil Twigg <neil@nats.io>