NRG: Install leader snapshot on scaleup #7509

Merged
neilalexander merged 1 commit into main from maurice/r1-scaleup
Nov 5, 2025

Conversation

@MauriceVanVeen
Member

When scaling up a stream from R1 to R3, a snapshot is made of the R1 stream and SendSnapshot is called to share the initial state with the new peers. However, this snapshot would exist only in the log and would not be installed. If the upper-layer JetStream catchup were to fail halfway, the two incomplete peers could try to become leader, which could leave the stream desynced.

We can ensure these peers never become leader before they're fully synced by installing the snapshot, since that lets the upper layer process it during recovery. If the previous R1 leader is not online to perform the catchup, the follower can now successfully call n.DrainAndReplaySnapshot() without needing to reset clustered state. This allows it to reuse the installed snapshot and not become leader until after the snapshot has been successfully processed.
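To make the difference concrete, here is a minimal toy model in Go. It is not the nats-server implementation: the `peer` type, its fields, and the methods `receiveSnapshot`, `failCatchup`, `replayInstalled` and `mayCampaign` are all hypothetical names invented for illustration. The rule that a peer with no installed snapshot is free to campaign is likewise a simplifying assumption. The sketch only shows why a snapshot that lives solely in the log gives no protection after a failed catchup, while an installed one can be replayed first.

```go
package main

import "fmt"

// peer is a toy model of a follower added during an R1 -> R3 scale-up.
// All names are hypothetical and do not mirror the nats-server code.
type peer struct {
	name              string
	installedSnapshot []byte // durable snapshot, kept across a failed catchup
	logOnlySnapshot   []byte // snapshot that exists only as a log entry
	applied           bool   // upper layer has fully processed the snapshot
}

// receiveSnapshot stores the leader's snapshot. With install=true it is kept
// as the peer's own installed snapshot; otherwise it only sits in the log.
func (p *peer) receiveSnapshot(snap []byte, install bool) {
	if install {
		p.installedSnapshot = snap
	} else {
		p.logOnlySnapshot = snap
	}
}

// failCatchup models an upper-layer catchup that aborts halfway.
// Assumption: the log-only copy is lost, the installed copy survives.
func (p *peer) failCatchup() {
	p.logOnlySnapshot = nil
	p.applied = false
}

// replayInstalled re-processes the installed snapshot, if there is one,
// and reports whether a replay happened.
func (p *peer) replayInstalled() bool {
	if p.installedSnapshot == nil {
		return false
	}
	p.applied = true
	return true
}

// mayCampaign reports whether the peer may stand for election. In this toy
// model it is held back only by an installed snapshot that is not yet applied.
func (p *peer) mayCampaign() bool {
	return p.installedSnapshot == nil || p.applied
}

func main() {
	snap := []byte("R1 stream state")
	for _, install := range []bool{false, true} {
		p := &peer{name: "new-peer"}
		p.receiveSnapshot(snap, install)
		p.failCatchup()
		fmt.Printf("%s install=%v mayCampaign after failed catchup: %v\n",
			p.name, install, p.mayCampaign())
		if p.replayInstalled() {
			fmt.Printf("%s install=%v mayCampaign after replay: %v\n",
				p.name, install, p.mayCampaign())
		}
	}
}
```

In the log-only case the model lets an unsynced peer campaign right after the failed catchup. In the installed case the peer stays ineligible until the replay completes.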

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner November 4, 2025 18:04
Member

@neilalexander neilalexander left a comment

LGTM

@neilalexander
Member

FYI:

=== RUN   TestJetStreamClusterSnapshotAndRestoreWithHealthz
    jetstream_cluster_3_test.go:4811: S-1 - JetStream stream '$G > TEST' is not current: group node unhealthy
--- FAIL: TestJetStreamClusterSnapshotAndRestoreWithHealthz (2.56s)

Member

@neilalexander neilalexander left a comment

LGTM

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@neilalexander neilalexander merged commit b6d9254 into main Nov 5, 2025
130 of 136 checks passed
@neilalexander neilalexander deleted the maurice/r1-scaleup branch November 5, 2025 10:14
neilalexander added a commit that referenced this pull request Nov 5, 2025
Includes the following:

- #7499
- #7503
- #7508
- #7510
- #7509
- #7512
- #7516
- #7515

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 5, 2025
Includes the following:

- #7416
- #7425
- #7486
- #7495
- #7482
- #7496
- #7499
- #7503
- #7508 (excluding weak pointer/cache-related changes that apply only to 2.12.x)
- #7510
- #7509
- #7512
- #7516
- #7515

Signed-off-by: Neil Twigg <neil@nats.io>