Skip to content

Conversation

@hexfusion
Copy link
Contributor

@hexfusion hexfusion commented Oct 1, 2019

This design may add additional information but the core is stable. I will hold open the option to make changes until the review process is complete.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 1, 2019
@hexfusion hexfusion force-pushed the add_ceo branch 2 times, most recently from 26db67a to 7dac572 Compare October 3, 2019 01:25
@hexfusion hexfusion force-pushed the add_ceo branch 3 times, most recently from 1f91764 to ee0bf6c Compare October 3, 2019 13:18
@hexfusion hexfusion changed the title WIP etcd: cluster-etcd-operator etcd: cluster-etcd-operator Oct 3, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2019
@hexfusion hexfusion force-pushed the add_ceo branch 3 times, most recently from aaeb78f to 279f188 Compare October 3, 2019 16:11
@hexfusion hexfusion changed the title etcd: cluster-etcd-operator WIP etcd: cluster-etcd-operator Oct 3, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2019
@hexfusion hexfusion changed the title WIP etcd: cluster-etcd-operator etcd: cluster-etcd-operator Feb 4, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2020
@cgwalters
Copy link
Member

Today in OpenShift 4 we have a pretty strong separation between "control plane" and "workers", where "control plane" means apiserver+etcd.

This enhancement is proposing to separate apiserver and etcd (right?) which has a lot of impact to how we think of cluster installation and management. If e.g. an admin wants to scale up etcd to 5 nodes, that would become supported...but would things be left in a state where e.g. the original 3 "control plane" nodes would be in the MCO's master pool, and 2 other etcd instances would be in worker? So 2/5 would be scheduable by default - and in fact, the MCO would then feel free to try and drain/reboot potentially 1 master and 1 "etcd worker" at the same time. This wouldn't necessarily work if the CEO's PDB prevented it...but it's a bit ugly because the MCO will sit there trying to drain and not succeed.

(The way both the MCO and machine API both do drains today in a way that doesn't try to intelligently look at PDBs is a problem)

Further, clusters backed by machineAPI would probably want to at least have a different (e.g. more performant) machineset for etcd workers or so.

@beekhof
Copy link
Contributor

beekhof commented Feb 4, 2020

Today in OpenShift 4 we have a pretty strong separation between "control plane" and "workers", where "control plane" means apiserver+etcd.

This enhancement is proposing to separate apiserver and etcd (right?)

Maybe things have changed but that wasn't the intent at the time.
Any etcd scaling events were still going to be limited to occurring on master nodes (the creation of which would be made significantly easier by this work).

@hexfusion hexfusion force-pushed the add_ceo branch 2 times, most recently from dbcdbff to ab2e906 Compare February 5, 2020 22:29
@hexfusion
Copy link
Contributor Author

hexfusion commented Feb 5, 2020

This enhancement is proposing to separate apiserver and etcd (right?) which has a lot of impact to how we think of cluster installation and management. I

We do not intend to separate etcd from master control-plane nodes. In the case where we would choose to support a 5 etcd cluster. All etcd instances would remain on the master nodes, thus requiring 5 masters. The pairing of etcd to apiserver would remain the same on the master nodes.

In the initial design, we looked to leverage MCO and maintain the static pod spec in machine-config. But the reality is we need to have the ability to adjust our spec in the same manner that the kube-apiserver operator iterates with its static pod resources. This allows us to perform actions such as cert rotation and make on the fly adjustments to our staticpod, for example during an upgrade.

To conclude we are going to be making some changes to the initial design to account for these issues.

@deads2k
Copy link
Contributor

deads2k commented Feb 6, 2020

This document is showing the current state. It's merging late, but we cannot argue with reality.

@hexfusion fix location and formatting, we'll merge this as representing the current state and then manipulate from there as we go.

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 6, 2020
@hexfusion
Copy link
Contributor Author

@deads2k updated

the notReadyAddress state without further investigation. When `etcd-member` Pod starts the init
containers will wait for various observations to continue. Because we are waiting in a not Ready
state we can be simply waiting for direction from the operator. To validate this we ensure that
the Pods containers are not currently in `CrashLoopBackoff`. If we pass the `isPodCrashLoop`[1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exited 0, because inits containers including certs are now complete we can begin the process of
scaling etcd via clustermembercontroller and the status is updated to `MemberReady`.

[1] https://github.com/openshift/cluster-etcd-operator/blob/release-4.4/pkg/operator/configobservation/etcd/observe_etcd.go#L192
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reference [2], as it points to isPendingReady.

`Cluster` API. The `staticpod` controller will also respond by stopping the static Pod, removing
the existing data-dir and then starting the static Pod. The discovery init container then will
wait for status now observed as `Unknown` to change to Add before it continues
. `configobservation` mcontroller observes the Pod and confirms etcd-member container is no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/mcontroller/controller/

@cgwalters
Copy link
Member

All etcd instances would remain on the master nodes, thus requiring 5 masters.

Hum. But today masters are provisioned via the installer, we don't have machineAPI support for scaling them etc. Is this proposal envisioning a future that supports that, and short term anyone who wants to do it can do it manually by booting new machines using the master Ignition, and what CEO is doing here is just ensuring that etcd is taken care of rather than having that be manual too?

@deads2k
Copy link
Contributor

deads2k commented Feb 6, 2020

Hum. But today masters are provisioned via the installer, we don't have machineAPI support for scaling them etc. Is this proposal envisioning a future that supports that, and short term anyone who wants to do it can do it manually by booting new machines using the master Ignition, and what CEO is doing here is just ensuring that etcd is taken care of rather than having that be manual too?

That's the goal. It hasn't yet been tested.

@deads2k
Copy link
Contributor

deads2k commented Feb 6, 2020

gotta start somewhere. This reflects actual state.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, hexfusion

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2020
@openshift-merge-robot openshift-merge-robot merged commit 40bf2bf into openshift:master Feb 6, 2020
@hexfusion hexfusion deleted the add_ceo branch March 22, 2020 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants