@hexfusion
Contributor

@hexfusion hexfusion commented Oct 25, 2019

This PR brings in the controllers necessary to facilitate scaling from bootstrap and membership.

notable changes

  • clustermembercontroller: The clusterMember controller manages etcd membership reconciliation. It loops over the Pods and draws conclusions from observations of etcds via isMemberReady() and isMemberRemove(). From those observations it determines whether each Pod is currently part of the cluster, pending join, or pending removal. It is the only controller that talks directly to the etcd Cluster API, using MemberAdd and MemberRemove.

  • staticpodcontroller: The staticPod controller is deployed as a DaemonSet and acts on the physical static Pod itself. It gets the local static Pod for the controller and checks isMemberRemove(). A removal stops the static Pod and deletes the data-dir and the TLS peer, server and metric certs from disk; after that, the Pod is started again. Stopping the Pod means removing the static Pod spec from the manifests directory. Starting the Pod means extracting the etcd-member Pod spec from MCO and persisting it back into the manifests directory.

  • staticsynccontroller: The staticSync controller provides static Pods with the assets they need to use the default service account as normal Pods do. These assets are four files: namespace, ca.crt, service-ca.crt and token.

  • hostetcdendpointcontroller: This controller manages the previously static list of endpoints on the host network, from which the kube-apiserver generates its storage backend.

  • etcdcertsigner: this controller is tasked with generating TLS certs required for etcd membership.

  • configobservationcontroller: The configObservation controller has the primary job of converting observations from etcd endpoints into appropriate member/pending keys for etcds.
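
As an illustration of the membership decision described for the clustermembercontroller above, here is a minimal sketch in Go. It is not the code from this PR: the `Observation` struct, the `MemberState` values and the `Classify` function are hypothetical names, and the `Ready` and `Remove` fields only stand in for the results of isMemberReady() and isMemberRemove().

```go
package main

import "fmt"

// MemberState is a hypothetical label for the outcome of one reconciliation pass.
type MemberState string

const (
	StateMember         MemberState = "member"
	StatePendingJoin    MemberState = "pending-join"
	StatePendingRemoval MemberState = "pending-removal"
	StateNone           MemberState = "none"
)

// Observation is a hypothetical summary of what was observed for one Pod.
type Observation struct {
	PodName   string
	InCluster bool // assumption: the Pod is already listed as an etcd member
	Ready     bool // stand-in for the result of isMemberReady()
	Remove    bool // stand-in for the result of isMemberRemove()
}

// Classify maps an observation to one of the three states named in the
// description (member, pending join, pending removal), plus "none" when
// no action applies.
func Classify(o Observation) MemberState {
	switch {
	case o.Remove && o.InCluster:
		return StatePendingRemoval // would lead to MemberRemove
	case o.Remove:
		return StateNone // flagged for removal but not a member: nothing to do
	case o.InCluster:
		return StateMember
	case o.Ready:
		return StatePendingJoin // would lead to MemberAdd
	default:
		return StateNone
	}
}

func main() {
	pods := []Observation{
		{PodName: "etcd-member-0", InCluster: true},
		{PodName: "etcd-member-1", Ready: true},
		{PodName: "etcd-member-2", InCluster: true, Remove: true},
	}
	for _, p := range pods {
		fmt.Printf("%s: %s\n", p.PodName, Classify(p))
	}
}
```

Checking the removal flag first is a design choice of this sketch: a Pod flagged for removal is never treated as a join candidate, even if it also reports ready.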

design currently WIP

For more details please refer to openshift/enhancements#56

Long live the CEO!

NOTE: we have commented out the Dockerfile so that we are not deployed through CVO until all the pieces for the installer and MCO are merged.

Signed-off-by: Sam Batschelet <[email protected]>
@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 25, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2019
@hexfusion hexfusion mentioned this pull request Oct 25, 2019
@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch from ded9d8d to f1776f6 Compare October 25, 2019 15:11
hexfusion and others added 16 commits October 25, 2019 11:17
Add bootstrapteardown/teardown.go which watches on clusterversions.
The installer watches on clusterversions to wait for the installation
to complete. We use the same mechanism so removing the bootstrap from
ceo and removing the resources in installer are both triggered on
clusterversions reporting install complete and are executed at about
the same time preventing excessive etcd logging.
1) util.go was failing on empty string
2) cluster member controller adds only if MemberReady
3) Observer etcd isPendingReady returns true if certs container exists
and the pod is not crashlooping
isPendingReady returns true only when the etcd-member container is waiting
and 2 init containers have exited successfully. This will take care of
1) Bootstrap: Containers will be waiting in the PodInitializing state
2) Restart: Restart controller will remove the pod, putting it
in the PodInitializing state
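
The isPendingReady rule in the commit message above can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `ContainerStatus` and `PodStatus` are simplified, hypothetical stand-ins for the Kubernetes status types, and the threshold of two init containers is taken directly from the commit text.

```go
package main

import "fmt"

// ContainerStatus is a simplified, hypothetical stand-in for a Kubernetes
// container status.
type ContainerStatus struct {
	Name       string
	Waiting    bool
	Terminated bool
	ExitCode   int
}

// PodStatus is a simplified, hypothetical stand-in for a Pod's status.
type PodStatus struct {
	InitContainers []ContainerStatus
	Containers     []ContainerStatus
}

// requiredInitContainers mirrors the "2 init containers" in the commit message.
const requiredInitContainers = 2

// isPendingReady reports true only when at least two init containers have
// exited with code 0 and the etcd-member container is waiting.
func isPendingReady(p PodStatus) bool {
	succeeded := 0
	for _, c := range p.InitContainers {
		if c.Terminated && c.ExitCode == 0 {
			succeeded++
		}
	}
	if succeeded < requiredInitContainers {
		return false
	}
	for _, c := range p.Containers {
		if c.Name == "etcd-member" {
			return c.Waiting
		}
	}
	return false // no etcd-member container found
}

func main() {
	pod := PodStatus{
		InitContainers: []ContainerStatus{
			{Name: "init-a", Terminated: true, ExitCode: 0}, // hypothetical names
			{Name: "init-b", Terminated: true, ExitCode: 0},
		},
		Containers: []ContainerStatus{{Name: "etcd-member", Waiting: true}},
	}
	fmt.Println("pending ready:", isPendingReady(pod))
}
```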
The controller watches on cluster etcd CRD, endpoints and pods in
etcd namespace. It reconciles by running a diff on actual number of
members observed by etcd api with members in the endpoint list.

Querying the etcd membership is necessary here because all the
controllers in this project depend on the endpoints being the source of
truth. If the endpoint list is based on other things like
cluster.members data in etcd, it creates a cyclic dependency and is
not necessarily accurate.

hostendpointcontroller: fix typo
Ignore removing etcd-bootstrap. This has a side effect: any time the
bootstrap member is added, it has to be removed outside of this
controller. It is handled during bootstrap and teardown.
@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch from f1776f6 to 4ecd1f6 Compare October 25, 2019 15:20
@alaypatel07
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alaypatel07, hexfusion

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hexfusion hexfusion changed the title from "[WIP] *: welcome cluster-etcd-operator" to "*: welcome cluster-etcd-operator" Oct 25, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 25, 2019
@openshift-merge-robot openshift-merge-robot merged commit b206e95 into openshift:master Oct 25, 2019
@hexfusion hexfusion deleted the 4.3-cluster-etcd-operator-v1 branch October 25, 2019 17:13
