NRG (2.11): Don't revert term to pterm on AE mismatch #5684

Merged
derekcollison merged 1 commit into main from neil/nrgaemismatch on Jul 23, 2024

@neilalexander
Member

@neilalexander neilalexander commented Jul 22, 2024

Previously, when an AppendEntry mismatch led us to run a catchup, we reverted the term back to pterm. The term can never safely move backwards, and the catchup does not rely on this behaviour to work (catchup entries are matched only on pterm/pindex), so don't revert it.

We saw this behaviour in Antithesis where a catchup could take us back a term.
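
To illustrate the intent, here is a minimal sketch of the mismatch path, not the actual nats-server code. The `follower`, `appendEntry`, and `catchupRequest` types and the `handleAppendEntry` method are hypothetical names made up for this example; only the `term`, `pterm`, and `pindex` fields mirror the values shown in the RAFT log prefix. The sample values are taken from the "did not match 1 22 with 1 19" log line in this PR.

```go
package main

import "fmt"

// follower is a hypothetical, heavily simplified stand-in for a Raft node.
// It is not the nats-server struct.
type follower struct {
	term   uint64 // current term; must never decrease
	pterm  uint64 // term of the last entry in our log
	pindex uint64 // index of the last entry in our log
}

// appendEntry is a hypothetical message carrying the leader's view.
type appendEntry struct {
	term   uint64 // leader's current term
	pterm  uint64 // term of the entry the leader expects us to have
	pindex uint64 // index of the entry the leader expects us to have
}

// catchupRequest tells the leader where our log ends.
type catchupRequest struct {
	pterm, pindex uint64
}

// handleAppendEntry returns a catchup request when the entry does not line
// up with the end of our log, and nil when it does.
func (f *follower) handleAppendEntry(ae appendEntry) *catchupRequest {
	// The term only ever moves forwards.
	if ae.term > f.term {
		f.term = ae.term
	}
	if ae.pterm != f.pterm || ae.pindex != f.pindex {
		// Mismatch: request a catchup keyed on pterm/pindex only.
		// The behaviour this PR removes would have been, roughly:
		//     f.term = f.pterm
		// which can move the term backwards.
		return &catchupRequest{pterm: f.pterm, pindex: f.pindex}
	}
	// Matched: storing the entry is elided in this sketch.
	return nil
}

func main() {
	f := &follower{term: 4, pterm: 1, pindex: 19}
	req := f.handleAppendEntry(appendEntry{term: 4, pterm: 1, pindex: 22})
	fmt.Printf("term=%d catchup from %d/%d\n", f.term, req.pterm, req.pindex)
	// prints "term=4 catchup from 1/19"
}
```

The catchup request carries only `pterm` and `pindex`, so leaving `term` untouched does not change what the leader is asked to send.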

Signed-off-by: Neil Twigg <neil@nats.io>

@neilalexander neilalexander requested a review from a team as a code owner July 22, 2024 17:14
Beforehand when we were trying to run a catchup, we were reverting the
`term` back to `pterm`. We can't ever move the term backwards safely and
the catchup itself does not rely on this behaviour in order to work (as
the catchup entries are matched only on `pindex`), so don't revert it.

Signed-off-by: Neil Twigg <neil@nats.io>
@neilalexander
Member Author

Would like @ReubenMathew's approval before we proceed.

@ReubenMathew
Contributor

Example log of this happening:

[        46.773] [      service_nats-0] [inf] [1] 2024/07/18 18:43:37.702207 [DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:4 p:1/19 sm:18/18] AppendEntry updating leader to "cnrtt3eg"
[        46.773] [      service_nats-0] [inf] [1] 2024/07/18 18:43:37.702213 [DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:4 p:1/19 sm:18/18] AppendEntry did not match 1 22 with 1 19
[        47.457] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386003 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Being asked to catch up follower: "S1Nunr6R"
[        47.457] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386011 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Need to send snapshot to follower
[        47.457] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386165 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Snapshot sent, reset first catchup entry to 20
[        47.458] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.386752 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Our first entry [1:20] does not match request from follower [1:19]
[        47.458] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.387053 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Running catchup for "S1Nunr6R"
[        47.459] [      service_nats-2] [inf] [1] 2024/07/18 18:43:38.387644 [DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Finished catching up
[        49.926] [      service_nats-0] [inf] [1] 2024/07/18 18:43:40.854758 [DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:1 p:1/19 sm:18/18] Catchup may be stalled, will request again
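
The regression is visible in the log above: S1Nunr6R logs `term:4` before the catchup and `term:1` in the final line. A rough sketch of how such a regression could be spotted mechanically in debug output follows. The `findTermRegressions` helper and the `regression` type are hypothetical names for this example, and the regular expression assumes the `RAFT [<node> - <group> - term:<n> ...]` prefix seen in these lines; the sample input is three of the lines above with the leading timestamps trimmed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// raftPrefix matches the debug prefix seen in the log above:
//   RAFT [<node> - <group> - term:<n> ...
var raftPrefix = regexp.MustCompile(`RAFT \[(\S+) - (\S+) - term:(\d+) `)

// regression records one observed backwards move of the term.
type regression struct {
	node, group string
	from, to    uint64
}

// findTermRegressions scans lines in order and reports every case where a
// node logs a lower term than it logged previously for the same group.
func findTermRegressions(lines []string) []regression {
	last := map[string]uint64{} // last term seen per node/group
	var out []regression
	for _, line := range lines {
		m := raftPrefix.FindStringSubmatch(line)
		if m == nil {
			continue // not a RAFT debug line
		}
		term, err := strconv.ParseUint(m[3], 10, 64)
		if err != nil {
			continue
		}
		key := m[1] + "/" + m[2]
		if prev, ok := last[key]; ok && term < prev {
			out = append(out, regression{m[1], m[2], prev, term})
		}
		last[key] = term
	}
	return out
}

func main() {
	lines := []string{
		`[DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:4 p:1/19 sm:18/18] AppendEntry did not match 1 22 with 1 19`,
		`[DBG] RAFT [cnrtt3eg - S-R3F-41d81bAI - term:4 p:4/23 sm:20/20] Finished catching up`,
		`[DBG] RAFT [S1Nunr6R - S-R3F-41d81bAI - term:1 p:1/19 sm:18/18] Catchup may be stalled, will request again`,
	}
	for _, r := range findTermRegressions(lines) {
		fmt.Printf("%s (%s): term went from %d to %d\n", r.node, r.group, r.from, r.to)
	}
	// prints "S1Nunr6R (S-R3F-41d81bAI): term went from 4 to 1"
}
```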

Member

@derekcollison derekcollison left a comment

LGTM

@derekcollison derekcollison merged commit c316803 into main Jul 23, 2024
@derekcollison derekcollison deleted the neil/nrgaemismatch branch July 23, 2024 03:14
neilalexander added a commit that referenced this pull request Nov 25, 2024
Includes the following:

- #5661
- #5666
- #5671
- #5344
- #5684
- #5689
- #5691
- #5714
- #5717
- #5707
- #5792
- #5912
- #5957
- #5700
- #5975
- #5991
- #5987
- #6027
- #6038
- #6053
- #5848
- #6055
- #6056
- #6060
- #6061
- #6072
- #5832
- #6073
- #6107

Signed-off-by: Neil Twigg <neil@nats.io>
