Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: auto-promoting nodes #10913

Closed

Conversation

maxenglander
Copy link

@maxenglander maxenglander commented Jul 20, 2019

Propose v3.4 v3.5 support for having a cluster automatically promote
a learner node to a voting member when added to a cluster with
member add --learner --auto.

The changes proposed for v3.4 v3.5 in this PR are implemented in #10887.

@codecov-io
Copy link

codecov-io commented Jul 20, 2019

Codecov Report

Merging #10913 into master will increase coverage by 0.34%.
The diff coverage is 16.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10913      +/-   ##
==========================================
+ Coverage   65.91%   66.26%   +0.34%     
==========================================
  Files         403      403              
  Lines       37021    37067      +46     
==========================================
+ Hits        24403    24563     +160     
+ Misses      11112    11007     -105     
+ Partials     1506     1497       -9     
Impacted Files Coverage Δ
etcdctl/ctlv3/command/printer.go 0.00% <0.00%> (ø)
etcdctl/ctlv3/command/printer_json.go 0.00% <0.00%> (ø)
etcdserver/server.go 77.03% <0.00%> (+0.30%) ⬆️
pkg/tlsutil/cipher_suites.go 100.00% <ø> (ø)
mvcc/watchable_store.go 84.75% <75.00%> (-0.20%) ⬇️
clientv3/watch.go 90.40% <100.00%> (+0.40%) ⬆️
raft/raft.go 90.37% <100.00%> (-0.62%) ⬇️
proxy/grpcproxy/watcher.go 85.71% <0.00%> (-8.17%) ⬇️
clientv3/client.go 71.42% <0.00%> (-2.63%) ⬇️
integration/bridge.go 90.07% <0.00%> (-2.30%) ⬇️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7cc2f8a...2b87fc8. Read the comment docs.

@jingyih
Copy link
Contributor

jingyih commented Jul 20, 2019

cc @gyuho @jpbetz @xiang90 @WIZARD-CXY

@jingyih
Copy link
Contributor

jingyih commented Jul 20, 2019

Thanks for making the PR. I think this is a good starting point. Let's use this PR to gather comments and suggestions on API changes as well as implementation details regarding learner promotion in upcoming and future releases.

@@ -9,7 +9,7 @@ Learner
Gyuho Lee (github.com/gyuho, *Amazon.com*),
Joe Betz (github.com/jpbetz, *Google Inc.*)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add your name here:)

@@ -140,6 +142,25 @@ etcd server must not transfer leadership to learner, since it may still lag behi

Internally in Raft, second ``MemberAdd`` call to learner node promotes it to a voting member. Leader maintains the progress of each follower and learner. If learner has not completed its snapshot message, reject promote request. Only accept promote request if and only if: The learner node is in a healthy state. The learner is in sync with leader or the delta is within the threshold (e.g. the number of entries to replicate to learner is less than 1/10 of snapshot count, which means it is less likely that even after promotion leader would not need send snapshot to the learner). All these logic are hard-coded in ``etcdserver`` package and not configurable.

*Expose "Auto-promoting" node type to "MemberAdd" API.*

etcd client adds an “AutoPromote” flag to ``MemberAdd`` API for auto-promoting learner nodes. This flag is only effective when used in conjunction with the “IsLearner” flag, and is ignored otherwise. An etcd server handler applies membership change entry with ``pb.ConfChangeAddAutoPromotingNode`` type. Once the command has been applied, a server joins the cluster with ``etcd --initial-cluster-state=existing`` flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "AutoPromote" flag only needed in short term and will eventually be deprecated?

Copy link
Author

@maxenglander maxenglander Jul 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jingyih, I think you could definitely make a case for that.

On the other hand, I think you could also make a case for removing the --learner (and IsLearner) flag, because in v3.5 the default and only behavior of of newly added members is to be a learner (until manually or automatically promoted). It might be more expressive to keep --auto-promote and allow operators to specify --auto-promote=false to add a standby learner.

@maxenglander
Copy link
Author

Hi @jingyih @gyuho @jpbetz @xiang90 @WIZARD-CXY

Pardon the ping, wanted see if anyone has additional feedback on this proposal and associated PR?

@jingyih
Copy link
Contributor

jingyih commented Jul 25, 2019

I added an item for this in tomorrow's etcd community meeting to ask for reviews and comments.

https://docs.google.com/document/d/16XEGyPBisZvmmoIHSZzv__LoyOeluC5a4x353CX0SIM

@xiang90
Copy link
Contributor

xiang90 commented Aug 12, 2019

There are three cases for auto-promote:

  1. promote a learner to voter regardless of the cluster size.
  2. promote a learner to voter if the cluster size is smaller than the expected size (for example, a voter is removed from the cluster)
  3. promote a learner to voter if some other voter is unhealthy for some user defined duration (remove the unhealthy node and replace it with a learner)

@xiang90 xiang90 added this to the etcd-v3.5 milestone Aug 12, 2019
@maxenglander
Copy link
Author

@xiang90 thanks for the comment. Does that mean you do not think there is a 4th case (the case I have proposed in this PR) to have new voters be added first as learners, which are then auto-promoted upon catching up with leader? Or is 4th case == 1st case?

@xiang90
Copy link
Contributor

xiang90 commented Aug 12, 2019

@maxenglander

Sorry, I mean there are additional 3 cases we should be aware of. I am fine with addressing the one you proposed here.

From UX perspective, I hope them can be consistent. So I mentioned other cases for you to consider.

@maxenglander
Copy link
Author

Thanks for the clarification @xiang90, what you're saying about consistency makes sense to me. I can imagine, for example, having additional possible values for --auto-promote such as --auto-promote=when-caught-up-with-leader, --auto-promote=when-a-voter-is-unhealty-for=30s, etc.

@stale
Copy link

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot removed the stale label Apr 26, 2020
@maxenglander
Copy link
Author

maxenglander commented Apr 26, 2020

👋
Am revisiting this. Has been a while!

Based on the comments above and the current state of learner doc, I am thinking of something like this for the CLI and PB APIs.

CLI

Current learner behavior

Do not automatically promote to reader or voter. Operator must manually trigger promotion.

member add --learner

The above command is equivalent to:

member add --learner --auto=false --min-progress=90 --delay=0

Promote to voter

Automatically promotes learner to voter when is has achieved at least 90% progress towards being in-sync with leader.

member add --learner --auto --min-progress=90

Automatically promote learner to voter when the cluster has less than three voters for 5 seconds.

member add --learner --auto --min-voters=3 --delay=5000

Automatically promote learner to voter when the cluster has less than current number of members for 10 seconds.

member add --learner --auto --min-members=existing --delay=10000

Promote to reader

Promote to read-only node when progress is 50% of leader's. Allow reading possibly stale data.

member add --learner --auto --promote-to=reader \
           --min-progress=50 --read-consistency=weak

Promote to read-only node when any member (voter or otherwise) becomes unhealthy. Disallow reading possibly stale data.

member add --learner --auto --promote-to=reader \
           --max-unhealthy-members=0 --read-consistency=strong

Combine promotion rules

Combines promotion rules so that the learner is promoted either...

  • To a voter when all of the conditions below hold:
    • Progress is at least 95%
    • Any voter becomes unhealthy
    • Above conditions hold for 15 seconds or more
  • To a reader when its progress is 100% of leader's for at least 3 seconds
member add --learner \
           --promote-rule=auto:true,min-progress:95,max-unhealthy-voters=0,delay=15000 \
           --promote-rule=auto:true,to:reader,min-progress:100,delay=3000

In theory, the above rules could be interpreted so that the learner is first promoted to a reader when it has caught up with the leader. Later, if any voter becomes unhealthy, the learner (now a reader) could be promoted to a voter. We wouldn't allow the learner to be promoted to a voter, and later a reader (presumably that would be a demotion).

PB

For the protobuf API, was thinking of something like this.

message MemberMonitor {
  enum Op {
    GREATER_EQUAL = 0;
    LESS = 1;
  }

  enum Type {
    PROGRESS = 0;
    UNHEALTHY_VOTERS = 1;
    MEMBER_COUNT = 2;
    VOTER_COUNT = 3;
  }

  Type type = 1;
  Op op = 2;
  uint64 threshold = 3;
  uint64 delay = 4;
}

message ReaderOptions {
  enum ReadConsistency {
    STRONG = 0;
    WEAK = 1;
  }
}

message MemberPromoteRule {
  enum TargetRole {
    READER = 0;
    VOTER = 1;
  }

  TargetRole promoteTo = 1;
  repeated MemberMonitor monitors = 2;
  optional ReaderOptions readerOptions = 3;
}

message MemberAddRequest {
  // peerURLs is the list of URLs the added member will use to communicate with the cluster.
  repeated string peerURLs = 1;
  // isLearner indicates if the added member is raft learner.
  bool isLearner = 2;

  // Zero or more rules which govern the automatic promotion of learners.
  // If no rules are specified, the learner is not automatically promoted, but
  // the operator may manually trigger a promotion..
  repeated PromoteRule promoteRules = 3;
}

If all that is too much, can go with something a little simpler, e.g. with something that does not support combinations of rules, and which flattens all possible monitors into the MemberAddRequest.

Looking forward to feedback. In the meantime, am going to start making adjustments to this PR to support --auto, --min-progress, and --delay, as well the subset of the PB changes above needed to support those options.

@maxenglander
Copy link
Author

Pardon the ping @jingyih and @xiang90. I've tried to make adjustments to address some of the comments you've provided me here.

There are three cases for auto-promote:

  • promote a learner to voter regardless of the cluster size.
  • promote a learner to voter if the cluster size is smaller than the expected size (for example, a voter is removed from the cluster)
  • promote a learner to voter if some other voter is unhealthy for some user defined duration (remove the unhealthy node and replace it with a learner)

My last comment above describes how I am currently imagining the MemberAdd API would evolve to support these different cases. I've updated this PR to describe one step of the evolution, and updated this PR to demonstrate a possible implementation.

Looking forward to your feedback.

@maxenglander maxenglander requested a review from jingyih May 14, 2020 04:36
@maxenglander
Copy link
Author

maxenglander commented Jun 13, 2020

Hey @jingyih @xiang90 hope you won't mind if I bump this again and ask for feedback on the latest design proposal and PR. Thanks!

@jingyih jingyih self-assigned this Jun 17, 2020
@jingyih
Copy link
Contributor

jingyih commented Jun 20, 2020

Thanks for the detailed propose! I do not quite follow your PB section, maybe it makes more sense to me when I see it in the PR. Currently there is no "reader" member in etcd. Is this something you plan on adding?

@maxenglander
Copy link
Author

maxenglander commented Jun 20, 2020

Hi @jingyih thanks for following up.

In my comment above I included the concept of a "reader" member because it is mentioned in the learner docs as planned for 3.5. With this comment I wanted to show how, even though the changes in this PR do not include any reference to a "reader", they could easily be extended to support that type of member.

I wasn't planning on adding support for a "reader" member in this PR (or in this one either), but I would happy to help with that effort in another PR.

@jingyih
Copy link
Contributor

jingyih commented Jun 22, 2020

Thanks! Could you mention this topic in this week's etcd community meeting and ask for people's comment and review? The behavior you defined in CLI section looks good to me.

@maxenglander
Copy link
Author

Hi @jingyih sorry I couldn't make it today due to a conflict. Will try to make the next one. In the meantime, let me know if there's anything else I can do to move this along.

@jingyih
Copy link
Contributor

jingyih commented Jun 26, 2020

My suggestion of the next step is to start implement member add --learner --auto. I am not sure if people are going to agree with other flags you are proposing. But adding an --auto flag and the mechanism to auto promote leaner should be fine - I think they are required for auto promoting feature.

@maxenglander
Copy link
Author

@jingyih Sounds good, it's implemented here.

It currently has a --min-progress flag, but it defaults to 90%, so users can just specify member add --learner --auto and it will automatically promote when learner is at least 90% towards leader's progress. User's can change this to have a higher or lower threshold e.g. --min-progress 95.

I'm happy to remove the flag if people don't like it, but I think it's useful for showing how other rules (e.g. --min-unhealthy-voters) can be accommodated by the PB API and rule evaluation logic.

@maxenglander
Copy link
Author

maxenglander commented Jul 30, 2020

@gyuho thanks for the chat during community meeting, here is the PR I referenced with design proposal for auto-promoting nodes. I've tried to follow @jingyih and @xiang90 recommendations above as much as possible.

The proposal here lays out long-term API design, which will eventually support automatically promoting nodes based on multiple conditions (# voters in cluster, # healthy nodes). The implementation PR in #10887 implements a subset of this proposal: auto promote based on log progress of learner.

@maxenglander
Copy link
Author

Hi @jingyih @gyuho just checking in to see if you have any thoughts here about moving forward with this. Thanks!

@stale
Copy link

stale bot commented Dec 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 8, 2020
@maxenglander
Copy link
Author

/stale remove

@stale stale bot removed the stale label Dec 10, 2020
@maxenglander
Copy link
Author

Hi @jingyih @gyuho just checking in again. Anything I can do to help move this forward?

Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxenglander thanks for the PR, and fyi that the project Doc is moved to https://github.com/etcd-io/website/ so this PR should be closed and be created in the doc repo now. Sorry it took long in the review process, but we will try to review and merge quickly in the doc repo. Thanks!

@maxenglander
Copy link
Author

Thanks you @spzala I'll get this moved over.

@spzala
Copy link
Member

spzala commented Feb 17, 2021

Thanks you @spzala I'll get this moved over.

Yrw, and thank you @maxenglander

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants