NRG (2.11): Start catchup from n.commit & fix AppendEntry is stored at seq=ae.pindex+1#5987
Conversation
848e1d2 to
5bd3d7e
Compare
|
LMK when ready for review. |
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
…e.pindex+1 Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
4fff9ec to
b62c058
Compare
neilalexander
left a comment
There was a problem hiding this comment.
I think this looks good, let's mark for review.
mprimi
left a comment
There was a problem hiding this comment.
Minor style comments on tests
server/jetstream_cluster_4_test.go
Outdated
| followerServer2.WaitForShutdown() | ||
|
|
||
| // Although this request will time out, it will be added to the stream leader's WAL. | ||
| _, err = js.Publish("foo", []byte("first")) |
There was a problem hiding this comment.
Could set a shorter timeout to make the test faster? (not sure what the default is)
There was a problem hiding this comment.
Done, default seemed to be 5 seconds, lowered to 1s.
| streamLeaderServer.WaitForShutdown() | ||
|
|
||
| // Only restart the (previous) followers. | ||
| followerServer1 = c.restartServer(followerServer1) |
There was a problem hiding this comment.
Why is one server variable reassigned and not the other?
There was a problem hiding this comment.
It's used below to have a connection to that server:
nc, js = jsClientConnect(t, followerServer1)The connection could be to either server, as long as it's not the (previous) leader. So only this one variable is used to setup the connection.
| rs := c.randomNonStreamLeader(globalAccountName, "TEST") | ||
| ts := time.Now().UnixNano() | ||
|
|
||
| var scratch [1024]byte |
There was a problem hiding this comment.
This bit may use an explanation... maybe
Manually add 3 append entries to each node's WAL, except for one node who is one behind
I actually am not sure. Your inner loop goes to 3, but then you have a break at 1.
There was a problem hiding this comment.
That description is correct. Two servers will have 3 uncommitted entries, and one server will have 2 uncommitted entries so it needs to catchup for that third one.
Have moved that condition for that one server up, so it's a bit clearer it gets 2 iterations of that loop.
| } | ||
|
|
||
| // Check that the first two published messages came from our WAL, and | ||
| // the last came from a catchup by another leader. |
There was a problem hiding this comment.
It seems to me you are doing the same check for all 3 entries you look at, this comment is maybe outdated?
There was a problem hiding this comment.
There are 3 checks, 2x require_Equal and 1x require_NotEqual. Have changed it to use require_True with == and != instead, that seems a bit more clear.
(And it doesn't matter what the values being compared are, just that they either match or not)
| if len(expected) > 0 && int(state.LastSeq-state.FirstSeq+1) != len(expected) { | ||
| return fmt.Errorf("WAL is different: too many entries") | ||
| } | ||
| for index := state.FirstSeq; index <= state.LastSeq; index++ { |
There was a problem hiding this comment.
This looped check deserves a comment
| n, err := s.initRaftNode(globalAccountName, cfg, pprofLabels{}) | ||
| require_NoError(t, err) | ||
|
|
||
| encode := func(ae *appendEntry) *appendEntry { |
There was a problem hiding this comment.
Comment this bit of arcane magic
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
…6027) Reverts the changes made in #5987, but the tests are kept. Instead opting for a simpler approach: - removing the `isNew` condition when `pterm` or `pindex` don't match, to ensure consistency even during catchup - move the `ae.pindex == n.pindex` condition up so `pterm` can be corrected (otherwise it would not be executed) Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Neil Twigg <neil@nats.io>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <github@mauricevanveen.com> --------- Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <github@mauricevanveen.com> --------- Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <github@mauricevanveen.com> --------- Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
… at `seq=ae.pindex+1` (#5987) This PR makes three complementary fixes to the way how catchup and truncating is handled. Specifically: - when doing `n.loadEntry(index)` we need to pass where the AppendEntry is in terms of stream sequence, this is equal to `ae.pindex+1` since the `ae.pindex` is the value before it's stored in the stream. - start catchup from `n.commit`, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leaders - because we catchup from `n.commit`, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't match Signed-off-by: Maurice van Veen <github@mauricevanveen.com> --------- Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
This PR makes three complementary fixes to the way how catchup and truncating is handled.
Specifically:
n.loadEntry(index)we need to pass where the AppendEntry is in terms of stream sequence, this is equal toae.pindex+1since theae.pindexis the value before it's stored in the stream.n.commit, we could have messages past our commit that have been invalidated and need to be truncated since there was a switch between leadersn.commit, we check if our local AppendEntry matches terms with the incoming AppendEntry, we only need to truncate if the terms don't matchSigned-off-by: Maurice van Veen github@mauricevanveen.com