Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft/raft: fix the liveness bug of apply-time activation confchange #12435

Closed
wants to merge 1 commit into from

Conversation

NingLin-P
Copy link

@NingLin-P NingLin-P commented Oct 29, 2020

Signed-off-by: linning [email protected]

This PR fixes the liveness bug mentioned at #12359 by passing the commit index during elections.

@NingLin-P NingLin-P marked this pull request as draft October 29, 2020 09:34
@NingLin-P NingLin-P marked this pull request as ready for review November 3, 2020 08:34
@NingLin-P
Copy link
Author

PTAL @bdarnell @xiang90 @gengliqi

@NingLin-P NingLin-P force-pushed the fix-liveness branch 2 times, most recently from 87a50d9 to 63a2ec4 Compare November 4, 2020 03:51
if m.Reject && m.Commit > r.raftLog.committed && r.raftLog.matchTerm(m.Commit, m.CommitTerm) {
r.logger.Infof("%x [commit: %d, lastindex: %d, lastterm: %d] fast-forwarded commit to vote response [index: %d, term: %d]",
r.id, r.raftLog.committed, r.raftLog.lastIndex(), r.raftLog.lastTerm(), m.Commit, m.CommitTerm)
r.raftLog.commitTo(m.Commit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks an assumption that progress tracker will not change during election. Now we use map to count votes and quorum, so it doesn't seem to affect the library itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committing across membership changes during an election is scary. Perhaps whenever the commit index advances this way we revert to being a follower.

BusyJay added a commit to BusyJay/raft-rs that referenced this pull request Dec 22, 2020
This is a port of etcd-io/etcd#12435. I made some adaption, notable
changes are:
- Candidate steps down as follower when new commit logs contains conf
change.
- Allow log commit only when a node has no leader.

Those changes are not necessary in theory, but I add them for best
safety.

Signed-off-by: Jay Lee <[email protected]>
BusyJay added a commit to tikv/raft-rs that referenced this pull request Dec 24, 2020
This is a port of etcd-io/etcd#12435. I made some adaption, notable
change is candidate steps down as follower when new commit logs
contains conf change.

Those changes are not necessary in theory, but I add them for best
safety.

Signed-off-by: Jay Lee <[email protected]>
@@ -67,7 +67,8 @@ message Message {
optional Snapshot snapshot = 9 [(gogoproto.nullable) = false];
optional bool reject = 10 [(gogoproto.nullable) = false];
optional uint64 rejectHint = 11 [(gogoproto.nullable) = false];
optional bytes context = 12;
optional bytes context = 12;
optional uint64 commitTerm = 13 [(gogoproto.nullable) = false];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's surprising that commitTerm is set only on vote messages, and not on all messages for which commit is set. Should we be setting it everywhere for consistency? Or maybe it's better to omit it when we can since we have to handle that case for backwards compatibility.

If we do limit commitTerm to vote messages alone, we should add a comment about it. (I know we're not good about comments in this file so far, but we should start moving in the right direction)

@@ -29,15 +29,15 @@ stabilize
Lead:0 State:StateCandidate
HardState Term:1 Vote:1 Commit:2
Messages:
1->2 MsgVote Term:1 Log:1/2
1->3 MsgVote Term:1 Log:1/2
1->2 MsgVote Term:1 Log:1/2 Commit:2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably extend the String() method to include the commit term and not just the commit.

@@ -1298,6 +1318,11 @@ func stepCandidate(r *raft, m pb.Message) error {
case myVoteRespType:
gr, rj, res := r.poll(m.From, m.Type, !m.Reject)
r.logger.Infof("%x has received %d %s votes and %d vote rejections", r.id, gr, m.Type, rj)
if m.Reject && m.Commit > r.raftLog.committed && r.raftLog.matchTerm(m.Commit, m.CommitTerm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combined conditions here feel subtle: If someone transmits a commit index/term pair with Reject=false it will be ignored. Why? I know that right now we don't send such an update, but if we started to in the future the versioning gets more complicated. It might be simpler (and remove some duplication) to hoist this check up to a higher level in Step: Whenever we receive a message with a commit and commit term, we advance to it if we can.

if m.Reject && m.Commit > r.raftLog.committed && r.raftLog.matchTerm(m.Commit, m.CommitTerm) {
r.logger.Infof("%x [commit: %d, lastindex: %d, lastterm: %d] fast-forwarded commit to vote response [index: %d, term: %d]",
r.id, r.raftLog.committed, r.raftLog.lastIndex(), r.raftLog.lastTerm(), m.Commit, m.CommitTerm)
r.raftLog.commitTo(m.Commit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committing across membership changes during an election is scary. Perhaps whenever the commit index advances this way we revert to being a follower.

@stale
Copy link

stale bot commented Aug 11, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 11, 2021
@stale stale bot closed this Sep 2, 2021
shbieng pushed a commit to shbieng/raft-rs that referenced this pull request Sep 19, 2022
This is a port of etcd-io/etcd#12435. I made some adaption, notable
change is candidate steps down as follower when new commit logs
contains conf change.

Those changes are not necessary in theory, but I add them for best
safety.

Signed-off-by: Jay Lee <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants