[FIXED] Cluster node skew race #6984

Merged
neilalexander merged 1 commit into main from maurice/node-skew-race on Jun 17, 2025

@MauriceVanVeen commented Jun 17, 2025

There was a race condition where the stream/consumer health check would delete the group's node if the stream/consumer's node was not yet initialized.

The following would be observed in the logs:

Detected stream cluster node skew
Resource not found: open /nats/store/jetstream/$SYS/_js_/S-R3F-tuN0AWqr/tav.idx: no such file or directory
Error writing term and vote file for "S-R3F-tuN0AWqr": open /nats/store/jetstream/$SYS/_js_/S-R3F-tuN0AWqr/tav.idx: no such file or directory
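
The race described above can be sketched in isolation. This is a simplified model, not the nats-server code: `raftNode`, `raftGroup`, `stream`, `checkSkew`, and the `guard` flag are all hypothetical names chosen for illustration, and the guard shown is only one plausible way to avoid the problem.

```go
package main

import (
	"fmt"
	"sync"
)

// raftNode is a hypothetical stand-in for a Raft node handle.
type raftNode struct {
	name    string
	deleted bool
}

// Delete marks the node as removed (standing in for deleting its on-disk state).
func (n *raftNode) Delete() { n.deleted = true }

// raftGroup is a hypothetical stand-in for the group held by an assignment.
type raftGroup struct {
	node *raftNode
}

// stream is a hypothetical stand-in for a running stream.
type stream struct {
	mu   sync.RWMutex
	node *raftNode
}

// raftNode returns the stream's node under the stream lock.
func (s *stream) raftNode() *raftNode {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.node
}

// checkSkew deletes the group's node when it differs from the stream's node
// and reports whether it did so. With guard set, a stream whose node is not
// yet initialized (nil) is skipped instead of being treated as skewed.
func checkSkew(s *stream, g *raftGroup, guard bool) bool {
	sn := s.raftNode()
	if guard && sn == nil {
		return false // still starting up, nothing to compare against yet
	}
	if g.node != nil && sn != g.node {
		g.node.Delete()
		g.node = nil
		return true
	}
	return false
}

func main() {
	// Unguarded: the stream's node is still nil, so the check sees a
	// mismatch and deletes the group's node.
	g := &raftGroup{node: &raftNode{name: "example"}}
	n := g.node
	fmt.Println(checkSkew(&stream{}, g, false), n.deleted) // true true

	// Guarded: the uninitialized stream is skipped and the node survives.
	g = &raftGroup{node: &raftNode{name: "example"}}
	n = g.node
	fmt.Println(checkSkew(&stream{}, g, true), n.deleted) // false false
}
```

In the unguarded case the node is deleted while something else may still be using it, which is consistent with the later "no such file or directory" errors in the log lines above.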

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

@MauriceVanVeen requested a review from a team as a code owner June 17, 2025 08:40
mset.mu.Lock()
mset.node = nil
mset.mu.Unlock()
require_True(t, sa.Group != nil)

For checking .Group and .Group.node to not be racy, you may need to take the JS lock.

Done
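
A minimal sketch of the review suggestion, reading `.Group` and `.Group.node` only while holding a lock that writers also take. The types here (`jetStream`, `streamAssignment`, `raftGroup`, `raftNode`) and the helpers `groupState` and `clearNode` are simplified stand-ins invented for this example and should not be read as the server's actual definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins for the server's types.
type raftNode struct{ name string }
type raftGroup struct{ node *raftNode }
type streamAssignment struct{ Group *raftGroup }

// jetStream holds the lock that guards the assignment and its group.
type jetStream struct {
	mu sync.RWMutex
	sa *streamAssignment
}

// groupState reads Group and Group.node under the read lock, so the two
// values form a consistent snapshot even while writers are active.
func (js *jetStream) groupState() (hasGroup, hasNode bool) {
	js.mu.RLock()
	defer js.mu.RUnlock()
	if js.sa == nil || js.sa.Group == nil {
		return false, false
	}
	return true, js.sa.Group.node != nil
}

// clearNode models a writer that drops the group's node under the write lock.
func (js *jetStream) clearNode() {
	js.mu.Lock()
	defer js.mu.Unlock()
	if js.sa != nil && js.sa.Group != nil {
		js.sa.Group.node = nil
	}
}

func main() {
	js := &jetStream{sa: &streamAssignment{
		Group: &raftGroup{node: &raftNode{name: "example"}},
	}}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		js.clearNode()
	}()

	// Safe to call concurrently with clearNode because both take js.mu.
	hasGroup, _ := js.groupState()
	wg.Wait()

	_, hasNode := js.groupState()
	fmt.Println(hasGroup, hasNode) // true false
}
```

Without the lock, a test assertion on these fields could be flagged by `go test -race` whenever another goroutine updates the assignment at the same time.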

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen force-pushed the maurice/node-skew-race branch from b0737ab to e7021db June 17, 2025 08:48

@neilalexander left a comment

LGTM

@neilalexander merged commit e86b5a2 into main Jun 17, 2025
90 of 92 checks passed
@neilalexander deleted the maurice/node-skew-race branch June 17, 2025 09:22
neilalexander added a commit that referenced this pull request Jun 17, 2025
Includes the following:

- #6922
- #6931
- #6933
- #6934
- #6939
- #6938
- #6940
- #6941
- #6942
- #6943
- #6945
- #6944
- #6947
- #6948
- #6949
- #6956
- #6960
- #6961
- #6951
- #6965
- #6968
- #6981
- #6983
- #6984

Signed-off-by: Neil Twigg <neil@nats.io>