etcd: cluster-etcd-operator #56
Today in OpenShift 4 we have a pretty strong separation between "control plane" and "workers", where "control plane" means apiserver+etcd. This enhancement is proposing to separate apiserver and etcd (right?), which has a lot of impact on how we think of cluster installation and management. If, e.g., an admin wants to scale up etcd to 5 nodes, that would become supported...but would things be left in a state where e.g. the original 3 "control plane" nodes would be in the MCO's (The way both the MCO and machine API do drains today, without trying to intelligently look at PDBs, is a problem.) Further, clusters backed by machineAPI would probably want to at least have a different (e.g. more performant) machineset for etcd workers or so.
Maybe things have changed, but that wasn't the intent at the time.
Signed-off-by: Sam Batschelet <[email protected]>
We do not intend to separate etcd from the master control-plane nodes. If we choose to support a 5-member etcd cluster, all etcd instances would remain on the master nodes, thus requiring 5 masters. The pairing of etcd and apiserver on the master nodes would remain the same. In the initial design, we looked to leverage the MCO and maintain the static pod spec in machine-config. But the reality is that we need the ability to adjust our spec in the same manner that the kube-apiserver operator iterates on its static pod resources. This allows us to perform actions such as cert rotation and to make on-the-fly adjustments to our static pod, for example during an upgrade. To conclude, we are going to make some changes to the initial design to account for these issues.
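For context on what the extra two masters buy: etcd uses Raft, so every write needs a strict majority (quorum) of members. A minimal sketch of that arithmetic, with illustrative helper names that are not taken from cluster-etcd-operator:

```go
package main

import "fmt"

// quorum returns the number of members that must agree for an etcd
// (Raft) cluster of size n to make progress: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while the
// remaining members still form a quorum.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerates=%d failure(s)\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

A 3-member cluster tolerates one failure and a 5-member cluster tolerates two, while 4 members still tolerate only one. Because etcd stays co-located with the apiserver, that extra tolerance costs two more master nodes rather than two dedicated etcd machines.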
This document is showing the current state. It's merging late, but we cannot argue with reality. @hexfusion fix the location and formatting; we'll merge this as representing the current state and then manipulate it from there as we go.
@deads2k updated
the notReadyAddress state without further investigation. When the `etcd-member` Pod starts, the init containers wait for various observations before continuing. Because we are waiting in a not-Ready state, we may simply be waiting for direction from the operator. To validate this, we ensure that the Pod's containers are not currently in `CrashLoopBackOff`. If we pass the `isPodCrashLoop`[1
Missing reference for [1]. It should be:
https://github.com/openshift/cluster-etcd-operator/blob/release-4.4/pkg/operator/configobservation/etcd/observe_etcd.go#L339
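The crash-loop check described in the hunk above can be sketched as follows. This does not reproduce the linked `isPodCrashLoop`; it is a hypothetical stand-in (`podIsCrashLooping`) using simplified local types that mirror the Kubernetes `corev1` field names (`ContainerStatus.State.Waiting.Reason`) so it runs without client-go:

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the corev1 status types
// (ContainerStatus -> State -> Waiting -> Reason).
type ContainerStateWaiting struct {
	Reason string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting // nil when not waiting
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// podIsCrashLooping reports whether any init or regular container is
// waiting with reason CrashLoopBackOff. A not-Ready Pod that passes this
// check is assumed to be waiting on the operator rather than failing.
func podIsCrashLooping(status PodStatus) bool {
	all := append(append([]ContainerStatus{}, status.InitContainerStatuses...), status.ContainerStatuses...)
	for _, cs := range all {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff" {
			return true
		}
	}
	return false
}

func main() {
	// The discovery init container is running, so nothing is waiting.
	waitingOnOperator := PodStatus{
		InitContainerStatuses: []ContainerStatus{{Name: "discovery"}},
	}
	crashing := PodStatus{
		InitContainerStatuses: []ContainerStatus{{
			Name:  "discovery",
			State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"}},
		}},
	}
	fmt.Println(podIsCrashLooping(waitingOnOperator)) // false
	fmt.Println(podIsCrashLooping(crashing))          // true
}
```

Init containers are included because, in this design, they are where the Pod blocks while waiting for the operator.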
exited 0. Because the init containers, including certs, are now complete, we can begin the process of scaling etcd via the clustermembercontroller, and the status is updated to `MemberReady`.
[1] https://github.com/openshift/cluster-etcd-operator/blob/release-4.4/pkg/operator/configobservation/etcd/observe_etcd.go#L192
This should be reference [2], as it points to isPendingReady.
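The transition described in the hunk above (all init containers exited 0, so the member becomes `MemberReady` and scaling can begin) might look like the sketch below. Only `MemberReady` comes from the design text; the types, the `Pending` value, and the `certs` container name are hypothetical illustrations:

```go
package main

import "fmt"

// Hypothetical, simplified mirror of corev1.ContainerStateTerminated.
type Terminated struct {
	ExitCode int32
}

type InitContainerStatus struct {
	Name       string
	Terminated *Terminated // nil while the container is still waiting or running
}

// MemberStatus: "MemberReady" appears in the design text; "Pending" is a
// hypothetical placeholder for every other case.
type MemberStatus string

const (
	MemberReady   MemberStatus = "MemberReady"
	MemberPending MemberStatus = "Pending"
)

// initContainersSucceeded reports whether every init container terminated
// with exit code 0. An empty list is treated as "not yet observed".
func initContainersSucceeded(statuses []InitContainerStatus) bool {
	if len(statuses) == 0 {
		return false
	}
	for _, s := range statuses {
		if s.Terminated == nil || s.Terminated.ExitCode != 0 {
			return false
		}
	}
	return true
}

// nextMemberStatus moves a member to MemberReady once its init containers
// are done, which is the point where scaling can proceed.
func nextMemberStatus(statuses []InitContainerStatus) MemberStatus {
	if initContainersSucceeded(statuses) {
		return MemberReady
	}
	return MemberPending
}

func main() {
	done := []InitContainerStatus{
		{Name: "discovery", Terminated: &Terminated{ExitCode: 0}},
		{Name: "certs", Terminated: &Terminated{ExitCode: 0}}, // hypothetical name
	}
	fmt.Println(nextMemberStatus(done)) // MemberReady
}
```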
`Cluster` API. The `staticpod` controller will also respond by stopping the static Pod, removing the existing data-dir, and then starting the static Pod. The discovery init container will then wait for the status, now observed as `Unknown`, to change to `Add` before it continues. `configobservation` mcontroller observes the Pod and confirms the etcd-member container is no
s/mcontroller/controller/
Hmm. But today masters are provisioned via the installer; we don't have machineAPI support for scaling them, etc. Is this proposal envisioning a future that supports that? And in the short term, could anyone who wants to do it do so manually by booting new machines using the master Ignition, with CEO just ensuring that etcd is taken care of rather than having that be manual too?
That's the goal. It hasn't yet been tested.
Gotta start somewhere. This reflects the actual state. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: deads2k, hexfusion.
This design may gain additional detail, but the core is stable. I will hold open the option to make changes until the review process is complete.