Joint consensus may get stuck if either C<old> or C<new> cannot reach a quorum after entering the begin-conf-change stage #192

Closed
Fullstop000 opened this issue Feb 26, 2019 · 7 comments
@Fullstop000 (Member) commented Feb 26, 2019

In joint consensus, after entering the begin-conf-change stage, if we can't get a majority of responses from C<new> (perhaps due to network isolation), log replication will always fail and the C<old> cluster appears to hang forever, because both the C<old> and C<new> quorums must be satisfied.
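
To make the failure mode concrete, here is a minimal sketch of the joint commit rule (the `JointConfig` and `has_majority` names are illustrative, not raft-rs types): an entry can only commit once a majority of both C<old> and C<new> have acknowledged it, so losing the C<new> quorum blocks every new commit.

```rust
use std::collections::HashSet;

/// Illustrative joint configuration; the real raft-rs types differ.
struct JointConfig {
    old: HashSet<u64>, // voters of C<old>
    new: HashSet<u64>, // voters of C<new>
}

impl JointConfig {
    /// Under joint consensus an entry is committed only if the acknowledging
    /// nodes contain a majority of C<old> *and* a majority of C<new>.
    fn can_commit(&self, acked: &HashSet<u64>) -> bool {
        has_majority(&self.old, acked) && has_majority(&self.new, acked)
    }
}

fn has_majority(voters: &HashSet<u64>, acked: &HashSet<u64>) -> bool {
    let acks = voters.intersection(acked).count();
    acks * 2 > voters.len()
}

fn main() {
    let cfg = JointConfig {
        old: [1, 2, 3].into_iter().collect(),
        new: [4, 5].into_iter().collect(),
    };
    // Even full acknowledgement from C<old> is not enough while the whole
    // of C<new> (nodes 4 and 5) is isolated, so the log never advances.
    let acked: HashSet<u64> = [1, 2, 3].into_iter().collect();
    assert!(!cfg.can_commit(&acked));
}
```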

Below are some rough personal thoughts about this scenario:

  • First, if we can reach a quorum in neither C<old> nor C<new>, there is clearly nothing Raft itself can do about it.
  • However, if we can still reach a quorum in one of C<old> or C<new>, I think the cluster has a chance to move forward.

Basically, maybe we should let C<old> keep working via some rollback mechanism, because in real-world situations, such as a migration involving two data centers, cluster stability is always a major concern.

More generally, we can classify the relationship between C<old> and C<new>:

  1. C<new> is completely disjoint from C<old>, e.g. [A, B, C] to [D, E]
    This means that if we can still reach a quorum in C<old> but not in C<new>, the rollback could work like dismissing the pending begin-conf-change stage after a timeout (similar to Leader Transfer), because C<old> can still be trusted to work correctly and the problems in C<new> have to be fixed outside the raft lib (see the sketch after this list).
  2. C<new> overlaps C<old>
    Things get more complicated if we still want to roll back to C<old>. Let M<old> be a majority of C<old>, and define M<new> similarly for C<new>:
    - M<old> is contained in C<new> and is also an M<new>, e.g. [A, B, C] to [A, B, D] --- S1
    - M<old> is contained in C<new> but is not an M<new>, e.g. [A, B, C] to [A, B, D, E, F] --- S2
    - M<old> is not contained in C<new>, e.g. [A, B, C] to [A, D, E, F] --- S3
    I'm still trying to figure out how to handle these situations :(
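
As mentioned in case 1, here is a rough sketch of what a leader-side timeout could look like, assuming the leader keeps some bookkeeping for the pending joint configuration (`PendingJoint`, `acked_recently`, and the abort entry are all hypothetical, not part of raft-rs):

```rust
use std::time::{Duration, Instant};

/// Hypothetical leader-side bookkeeping for a pending joint configuration.
struct PendingJoint {
    started_at: Instant,
    new_voters: Vec<u64>,
}

/// Decide whether to abandon a begin-conf-change that cannot make progress,
/// similar to how a leader transfer is abandoned after an election timeout.
/// Returns true when the caller should append a hypothetical "abort" entry
/// that only needs a C<old> quorum to commit -- the part that goes beyond
/// what the Raft thesis describes.
fn should_abort_joint(
    pending: &PendingJoint,
    acked_recently: impl Fn(u64) -> bool,
    timeout: Duration,
) -> bool {
    let live = pending
        .new_voters
        .iter()
        .filter(|id| acked_recently(**id))
        .count();
    let have_new_quorum = live * 2 > pending.new_voters.len();
    !have_new_quorum && pending.started_at.elapsed() >= timeout
}
```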

But since the Raft paper and thesis don't say anything about the above, maybe we can just stick with the current implementation and add some comments.

@Hoverbear (Contributor) commented:

Hi @Fullstop000,

It's so cool you're testing out joint consensus!

Yes, the situation you describe is a bad one to be in. I think a rollback mechanism does make sense. I'm curious whether a formal definition of such a thing exists. (cc @ongardie?)

In TiKV we're focusing on a "Replace Node" use case for this right now, where you'd have, say, set ABC and want set ABD. That way you won't encounter the situation you describe.

However, in order to use joint consensus fully we do need to consider this.

I think allowing the leader to roll back is a valid idea... What do you think @BusyJay / @hicqu / @overvenus ? Maybe a CancelMembershipChange entry?
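
Purely as a sketch of the idea (raft-rs has no such entry type today; everything below is made up for illustration), a cancel marker could sit alongside the begin/finalize pair used for joint consensus:

```rust
/// Hypothetical membership-change markers in the leader's log.
enum MembershipChangeEntry {
    /// Enter the joint configuration C<old, new>.
    BeginMembershipChange { new_voters: Vec<u64> },
    /// Leave the joint configuration and switch to C<new>.
    FinalizeMembershipChange,
    /// Abandon the joint configuration and fall back to C<old>.
    CancelMembershipChange,
}
```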

@BusyJay (Member) commented Mar 1, 2019

Rolling back is a nice-to-have feature, but it can't be implemented unless a conf change takes effect immediately on receipt.

@Fullstop000 (Member, Author) commented:

Do we have any plan to refactor the current implementation so that a ConfChange takes effect immediately? @BusyJay

@Hoverbear (Contributor) commented:

@Fullstop000 Our current implementation's docs do suggest you call the apply step after receiving the message, not after applying it.

@ongardie commented Mar 8, 2019

I'm not aware of any formal spec for joint consensus, @Hoverbear. You can search this file for the word "configuration" to see what I implemented in LogCabin: https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc

There is a rollback mechanism in LogCabin. It's probably not explicitly described in the original paper because rolling back log entries is something that Raft already does. In LogCabin and the Raft paper, the configuration a server uses is the last one in its log (which may not be committed). So, when extraneous entries are removed from its log, its configuration rolls back.
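
A minimal sketch of the rule described above (the `Entry` and `Config` types are illustrative, not LogCabin's actual C++ types): the configuration in effect is simply the last configuration entry in the log, committed or not, so truncating conflicting entries rolls the configuration back for free.

```rust
#[derive(Clone)]
struct Config {
    voters: Vec<u64>,
}

enum Payload {
    Normal(Vec<u8>),
    Configuration(Config),
}

struct Entry {
    index: u64,
    payload: Payload,
}

/// The configuration a server uses: the most recent configuration entry
/// in its log, whether or not it is committed.
fn effective_config(log: &[Entry]) -> Option<Config> {
    log.iter().rev().find_map(|e| match &e.payload {
        Payload::Configuration(c) => Some(c.clone()),
        _ => None,
    })
}

/// When the leader makes a follower discard conflicting entries, the
/// follower truncates its log; re-evaluating `effective_config` afterwards
/// is what "rolls back" the configuration.
fn truncate_from(log: &mut Vec<Entry>, index: u64) {
    log.retain(|e| e.index < index);
}
```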

@Hoverbear (Contributor) commented:

@ongardie Thanks for this really useful insight. :)

@Hoverbear (Contributor) commented:

It seems we may be able to solve this with the work from @hicqu! :)
