Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: allow voter to become learner through snapshot #10864

Merged
merged 1 commit into from
Jul 11, 2019

Conversation

tbg
Copy link
Contributor

@tbg tbg commented Jun 28, 2019

At the time of writing, we don't allow configuration changes to change
voters to learners directly, but note that a snapshot may compress
multiple changes to the configuration into one: the voter could have
been removed, then readded as a learner and the snapshot reflects both
changes. In that case, a voter receives a snapshot telling it that it is
now a learner. In fact, the node has to accept that snapshot, or it is
permanently cut off from the Raft log.

I think this just wasn't realized in the original work, but this is just
my guess since there generally is very little rationale on the various
decisions made. I also generally haven't been able to figure out whether
the decision to prevent voters from becoming learners without first
having been removed was motivated by some particular concern, or if it
just wasn't deemed necessary. I suspect it is the latter because
demoting a voter seems perfectly safe.

See #8751 (comment).

cc @xiang90

@tbg tbg requested a review from bdarnell June 28, 2019 10:08
@codecov-io
Copy link

Codecov Report

Merging #10864 into master will increase coverage by 0.23%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10864      +/-   ##
==========================================
+ Coverage   62.82%   63.06%   +0.23%     
==========================================
  Files         398      398              
  Lines       37446    37441       -5     
==========================================
+ Hits        23526    23611      +85     
+ Misses      12339    12252      -87     
+ Partials     1581     1578       -3
Impacted Files Coverage Δ
raft/raft.go 91.93% <ø> (+0.63%) ⬆️
clientv3/balancer/grpc1.7-health.go 22.38% <0%> (-38.09%) ⬇️
proxy/grpcproxy/register.go 69.44% <0%> (-11.12%) ⬇️
clientv3/naming/grpc.go 68.42% <0%> (-7.02%) ⬇️
clientv3/namespace/watch.go 87.87% <0%> (-6.07%) ⬇️
pkg/transport/listener.go 51.18% <0%> (-3.8%) ⬇️
etcdserver/util.go 86.25% <0%> (-3.75%) ⬇️
pkg/testutil/recorder.go 77.77% <0%> (-3.71%) ⬇️
etcdserver/api/v3election/election.go 61.11% <0%> (-2.78%) ⬇️
pkg/adt/interval_tree.go 81.38% <0%> (-1.81%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 948e276...08aa02b. Read the comment docs.

@tbg
Copy link
Contributor Author

tbg commented Jun 28, 2019

I also generally haven't been able to figure out whether
the decision to prevent voters from becoming learners without first
having been removed was motivated by some particular concern

One concern that I could see is that turning the leader from a voter into a learner has to be handled appropriately. This is similar to removing the leader completely, except it's worse if we get it wrong.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, but I'd like to hear from @xiang90 or @siddontang why they recommended testing for this in the original PR.

@xiang90
Copy link
Contributor

xiang90 commented Jul 1, 2019

@tbg

We just do not have this use case (demote) at that time... I think it is a safe operation otherwise.

/cc @siddontang any thoughts?

@tbg
Copy link
Contributor Author

tbg commented Jul 3, 2019

We just do not have this use case (demote) at that time... I think it is a safe operation otherwise.

In case this wasn't clear, I believe that this change fixes a real bug when using learners -- some peers may end up never being able to accept snapshots again.

I'm going to go ahead and merge this (pretty confident it's correct at this point) but I'd be happy to hear from @siddontang just on the off chance that I'm missing something.

At the time of writing, we don't allow configuration changes to change
voters to learners directly, but note that a snapshot may compress
multiple changes to the configuration into one: the voter could have
been removed, then readded as a learner and the snapshot reflects both
changes. In that case, a voter receives a snapshot telling it that it is
now a learner. In fact, the node has to accept that snapshot, or it is
permanently cut off from the Raft log.

I think this just wasn't realized in the original work, but this is just
my guess since there generally is very little rationale on the various
decisions made. I also generally haven't been able to figure out whether
the decision to prevent voters from becoming learners without first
having been removed was motivated by some particular concern, or if it
just wasn't deemed necessary. I suspect it is the latter because
demoting a voter seems perfectly safe.

See etcd-io#8751 (comment).
@tbg tbg merged commit b2274ef into etcd-io:master Jul 11, 2019
@tbg tbg deleted the learner-snap branch July 11, 2019 13:48
@BusyJay
Copy link
Contributor

BusyJay commented Jul 12, 2019

We have an assumption that when a node is removed from the group, it will never be added to the group again. If a peer is readded on the same node, it will be assigned a different peer ID.

I have not thought through it yet whether a removed peer can be readded without a new ID but it certainly needs to pay attention to campaign because a removed peer can vote now.

@tbg
Copy link
Contributor Author

tbg commented Jul 12, 2019

No, votes from removed peers are not taken into account. You can see this here, we iterate over the config to fish the votes. If there are any votes from nodes not in the active config, they are ignored. Generally all of the new quorum code is taking the configuration into account properly.
In reworking that, I did fix a bug in ReadIndex which was susceptible to the problem you're describing (it was counting votes from nodes that could then be removed), but I wasn't able to work out any counter-example from that; ReadIndex generally needs only very few guarantees because it only wants to make sure it'd discover any overlapping but newer leader. Either way, ReadIndex is solid now, too.

We have an assumption that when a node is removed from the group, it will never be added to the group again. If a peer is readded on the same node, it will be assigned a different peer ID.

That's an assumption that's reasonable to make in an app (CockroachDB works the same way, we use tombstones to avoid accidentally recreating a replica that has had its state deleted already) but it's not one intrinsic to raft. For example, it can make a lot of sense to promote/demote voters repeatedly and it could lead to these snapshots.

danhhz added a commit to danhhz/cockroach that referenced this pull request Jul 16, 2019
To pick up etcd-io/etcd#10864.

Release note: None
tbg pushed a commit to danhhz/cockroach that referenced this pull request Jul 17, 2019
To pick up etcd-io/etcd#10864.

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this pull request Jul 18, 2019
To pick up etcd-io/etcd#10864.

This commit also resolves the following, which have happened since the
last version of etcd/raft:
- The raft.ProgressState enum and its variants now live in raft/tracker.
- The Paused field is no longer present on raft.Progress.
- RawNode.Status no longer returns a pointer.
- Some of the fields on raft.Status are now on an embedded
  raft.BasicStatus struct.

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Jul 18, 2019
38904: parser: Add user defined column families to CTAS. r=adityamaru27 a=adityamaru27

This change introduces grammar to support user defined
column families in a CREATE TABLE ... AS query.

38913: deps: bump go.etcd.io/etcd r=tbg a=danhhz

To pick up etcd-io/etcd#10864.

Release note: None

38972: exec: fix calculation of selectivity stat when num batches is zero r=yuzefovich a=yuzefovich

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants