NRG: Install leader snapshot on scaleup #7509
neilalexander merged 1 commit into main on Nov 5, 2025
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
When scaling up a stream from R1 to R3, a snapshot is made of the R1 stream and `SendSnapshot` is called to share the initial state with the new peers. However, this snapshot would exist solely in the log and would not be installed. If the upper-layer JetStream catchup were to fail halfway, the two incomplete peers could try to become leader, which could result in the stream becoming desynced.

We can ensure these peers never become leader before they're fully synced by installing the snapshot, as that ensures the upper layer can process it during recovery. If the previous R1 leader is not online to perform the catchup, the follower can now successfully call `n.DrainAndReplaySnapshot()` without needing to reset clustered state. This allows it to reuse the installed snapshot and not become leader until after the snapshot has been successfully processed.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
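
For readers unfamiliar with the invariant, here is a minimal toy sketch of the idea, not the actual NRG code. Every name in it (`peer`, `snapshot`, `installSnapshot`, `replaySnapshot`, `canCampaign`) is hypothetical and invented for illustration. It assumes that an installed snapshot is something the peer can compare its applied state against, so it can refuse to stand for leader until the upper layer has processed it.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshot is a hypothetical stand-in for the stream state shared on scale-up.
type snapshot struct {
	lastSeq uint64
}

// peer is a toy model of a newly added replica; it is not the NRG implementation.
type peer struct {
	installed *snapshot // snapshot persisted on the peer, nil if it only ever lived in the log
	applied   uint64    // last sequence the upper layer has processed
}

var (
	errNoSnapshot  = errors.New("no installed snapshot")
	errNotCaughtUp = errors.New("snapshot not yet processed")
)

// installSnapshot records the snapshot on the peer instead of leaving it only in the log.
func (p *peer) installSnapshot(s snapshot) {
	p.installed = &s
}

// replaySnapshot hands the installed snapshot to the (modelled) upper layer.
func (p *peer) replaySnapshot() error {
	if p.installed == nil {
		return errNoSnapshot
	}
	p.applied = p.installed.lastSeq
	return nil
}

// canCampaign reports whether the peer may stand for leader. A peer with an
// installed but unprocessed snapshot must wait. A peer with no installed
// snapshot has nothing to compare against, which models the risk described above.
func (p *peer) canCampaign() error {
	if p.installed != nil && p.applied < p.installed.lastSeq {
		return errNotCaughtUp
	}
	return nil
}

func main() {
	p := &peer{}
	p.installSnapshot(snapshot{lastSeq: 100})
	fmt.Println(p.canCampaign()) // snapshot not yet processed

	if err := p.replaySnapshot(); err != nil {
		panic(err)
	}
	fmt.Println(p.canCampaign()) // <nil>
}
```

In this model, a peer that never installed the snapshot passes `canCampaign` even though it has applied nothing, which is the failure mode the change is meant to close.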