Joint consensus may get stuck if either C<old> or C<new> cannot reach a quorum after entering the begin-conf-change stage #192

Closed
Fullstop000 opened this issue Feb 26, 2019 · 7 comments
@Fullstop000 (Member) commented Feb 26, 2019

In joint consensus, after entering the begin-conf-change stage, if we can't get a majority of responses from C<new> (perhaps due to network isolation), log replication will always fail and the C<old> cluster appears to hang forever, because both the C<old> and C<new> quorums must be satisfied.
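
To make the failure mode concrete, here is a minimal sketch of the joint commit rule (the `JointConfig` and `has_majority` names are illustrative, not raft-rs types): an entry can only commit once a majority of both C<old> and C<new> have acknowledged it, so losing the C<new> quorum blocks every new commit.

```rust
use std::collections::HashSet;

/// Illustrative joint configuration; the real raft-rs types differ.
struct JointConfig {
    old: HashSet<u64>, // voters of C<old>
    new: HashSet<u64>, // voters of C<new>
}

impl JointConfig {
    /// Under joint consensus an entry is committed only if the acknowledging
    /// nodes contain a majority of C<old> *and* a majority of C<new>.
    fn can_commit(&self, acked: &HashSet<u64>) -> bool {
        has_majority(&self.old, acked) && has_majority(&self.new, acked)
    }
}

fn has_majority(voters: &HashSet<u64>, acked: &HashSet<u64>) -> bool {
    let acks = voters.intersection(acked).count();
    acks * 2 > voters.len()
}

fn main() {
    let cfg = JointConfig {
        old: [1, 2, 3].into_iter().collect(),
        new: [4, 5].into_iter().collect(),
    };
    // Even full acknowledgement from C<old> is not enough while the whole
    // of C<new> (nodes 4 and 5) is isolated, so the log never advances.
    let acked: HashSet<u64> = [1, 2, 3].into_iter().collect();
    assert!(!cfg.can_commit(&acked));
}
```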

Below are some rough personal thoughts about this scenario:

  • First, if we can reach a quorum in neither C<old> nor C<new>, there is clearly nothing Raft itself can do about it.
  • However, if we can still reach a quorum in one of C<old> or C<new>, I think the cluster has a chance to move forward.

Basically, maybe we should let C<old> keep working via some rollback mechanism, because in real-world situations, such as a migration involving two data centers, cluster stability is always a major concern.

More generally, we can classify the relationship between C<old> and C<new>:

  1. C<new> is completely disjoint from C<old>, e.g. [A, B, C] to [D, E]
    This means that if we can still reach a quorum in C<old> but not in C<new>, the rollback could work like dismissing the pending begin-conf-change stage after a timeout (similar to Leader Transfer), because C<old> can still be trusted to work correctly and the problems in C<new> have to be fixed outside the raft lib (see the sketch after this list).
  2. C<new> overlaps C<old>
    Things get more complicated if we still want to roll back to C<old>. Let M<old> be a majority of C<old>, and define M<new> similarly for C<new>:
    - M<old> is contained in C<new> and is also an M<new>, e.g. [A, B, C] to [A, B, D] --- S1
    - M<old> is contained in C<new> but is not an M<new>, e.g. [A, B, C] to [A, B, D, E, F] --- S2
    - M<old> is not contained in C<new>, e.g. [A, B, C] to [A, D, E, F] --- S3
    I'm still trying to figure out how to handle these situations :(
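
As mentioned in case 1, here is a rough sketch of what a leader-side timeout could look like, assuming the leader keeps some bookkeeping for the pending joint configuration (`PendingJoint`, `acked_recently`, and the abort entry are all hypothetical, not part of raft-rs):

```rust
use std::time::{Duration, Instant};

/// Hypothetical leader-side bookkeeping for a pending joint configuration.
struct PendingJoint {
    started_at: Instant,
    new_voters: Vec<u64>,
}

/// Decide whether to abandon a begin-conf-change that cannot make progress,
/// similar to how a leader transfer is abandoned after an election timeout.
/// Returns true when the caller should append a hypothetical "abort" entry
/// that only needs a C<old> quorum to commit -- the part that goes beyond
/// what the Raft thesis describes.
fn should_abort_joint(
    pending: &PendingJoint,
    acked_recently: impl Fn(u64) -> bool,
    timeout: Duration,
) -> bool {
    let live = pending
        .new_voters
        .iter()
        .filter(|id| acked_recently(**id))
        .count();
    let have_new_quorum = live * 2 > pending.new_voters.len();
    !have_new_quorum && pending.started_at.elapsed() >= timeout
}
```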

But since the Raft paper and thesis don't say anything about the above, maybe we can just stick with the current implementation and add some comments.

@Hoverbear (Contributor) commented:

Hi @Fullstop000,

It's so cool you're testing out joint consensus!

Yes, the situation you describe is a bad one to be in. I think a rollback mechanism does make sense. I'm curious whether a formal definition of such a thing exists. (cc @ongardie?)

In TiKV we're focusing on a "Replace Node" use case for this right now, where you'd have, say, set ABC and want set ABD. That way you won't encounter the situation you describe.

However, in order to use joint consensus fully we do need to consider this.

I think allowing the leader to roll back is a valid idea... What do you think @BusyJay / @hicqu / @overvenus ? Maybe a CancelMembershipChange entry?
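
Purely as a sketch of the idea (raft-rs has no such entry type today; everything below is made up for illustration), a cancel marker could sit alongside the begin/finalize pair used for joint consensus:

```rust
/// Hypothetical membership-change markers in the leader's log.
enum MembershipChangeEntry {
    /// Enter the joint configuration C<old, new>.
    BeginMembershipChange { new_voters: Vec<u64> },
    /// Leave the joint configuration and switch to C<new>.
    FinalizeMembershipChange,
    /// Abandon the joint configuration and fall back to C<old>.
    CancelMembershipChange,
}
```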

@BusyJay (Member) commented Mar 1, 2019

Rolling back is a nice-to-have feature, but it can't be implemented unless a conf change takes effect immediately on receipt.

@Fullstop000 (Member, Author) commented:

Do we have any plan to refactor the current implementation so that a ConfChange takes effect immediately? @BusyJay

@Hoverbear (Contributor) commented:

@Fullstop000 Our current implementation's docs do suggest you call the apply step after receiving the message, not after applying it.

@ongardie commented Mar 8, 2019

I'm not aware of any formal spec for joint consensus, @Hoverbear. You can search this file for the word "configuration" to see what I implemented in LogCabin: https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc

There is a rollback mechanism in LogCabin. It's probably not explicitly described in the original paper because rolling back log entries is something that Raft already does. In LogCabin and the Raft paper, the configuration a server uses is the last one in its log (which may not be committed). So, when extraneous entries are removed from its log, its configuration rolls back.
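
A minimal sketch of the rule described above (the `Entry` and `Config` types are illustrative, not LogCabin's actual C++ types): the configuration in effect is simply the last configuration entry in the log, committed or not, so truncating conflicting entries rolls the configuration back for free.

```rust
#[derive(Clone)]
struct Config {
    voters: Vec<u64>,
}

enum Payload {
    Normal(Vec<u8>),
    Configuration(Config),
}

struct Entry {
    index: u64,
    payload: Payload,
}

/// The configuration a server uses: the most recent configuration entry
/// in its log, whether or not it is committed.
fn effective_config(log: &[Entry]) -> Option<Config> {
    log.iter().rev().find_map(|e| match &e.payload {
        Payload::Configuration(c) => Some(c.clone()),
        _ => None,
    })
}

/// When the leader makes a follower discard conflicting entries, the
/// follower truncates its log; re-evaluating `effective_config` afterwards
/// is what "rolls back" the configuration.
fn truncate_from(log: &mut Vec<Entry>, index: u64) {
    log.retain(|e| e.index < index);
}
```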

@Hoverbear (Contributor) commented:

@ongardie Thanks for this really useful insight. :)

@Hoverbear (Contributor) commented:

It seems we may be able to solve this with the work from @hicqu! :)
