-
Notifications
You must be signed in to change notification settings - Fork 1.5k
*: add support for cluster-etcd-operator #2730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why this can't be done in the if section above ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be able to manage retry. There is a circumstance where we can be etcd-bootstrap.done but then we still need to perform the checks below. I felt this was more explicit vs elif. I can add a comment about retry or create a single block with elif if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also once CEO is stable this will be removed as we will only deploy in a managed state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to revisit and craft in a manner where commit can be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed PTAL
data/data/manifests/bootkube/etcd-host-service-endpoints.yaml.template
Outdated
Show resolved
Hide resolved
|
/hold We want to coordinate this going in. |
Looks like a flake /test e2e-libvirt |
|
/test e2e-gcp |
|
/test e2e-azure |
trying libvirt again as suggested by CI /test e2e-libvirt |
|
Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/1338/ |
|
/lgtm |
Signed-off-by: Sam Batschelet <[email protected]>
|
/test e2e-aws-upi |
|
/test e2e-gcp-upi |
|
/test e2e-vpshere |
|
Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1342/ |
|
LGTM just want to see if any cluster variants are affected. |
|
/retest |
|
LGTM. baremetal IPI looks good with this change, thanks for checking. This probably simplifies a bunch of things for us, I filed #2740 to look at that. |
|
@abhinavdahiya last passing |
|
/test e2e-aws-upi |
|
Although this appears to be etcd related we have made no changes to the way etcd bootstrap should be working, /test e2e-azure |
|
GCP bootstrapped fine I see some performance issues with etcd unrelated to this PR.
/test e2e-gcp-upi |
|
/test e2e-gcp-upi |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abhinavdahiya, alaypatel07, crawford, hexfusion The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
flakey has passed before using the same code /test e2e-azure |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@hexfusion: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
@hexfusion @abhinavdahiya @jhixson74 I built I tried to create a cluster on AWS, and the bootstrap failed. I unpacked the logs-bundle tar.gz file, and found the following in all the log files in the |
|
@rodrigc master installer(4.4) is still using 4.3 release payload your best bet is to use explicit pinning of release payload and installer. For example[1] otherwise you can override installer release with |
|
@hexfusion ah ok. I periodically build openshift-install from master just to do a quick sanity check to make sure that my stuff still works on the latest pre-release Openshift, so I don't get any surprises when the full release comes out. |
|
@hexfusion I now built I tried to create a cluster in AWS. It looks like the VM's are provisioned, but In bootstrap/journals/bootkube.log I see: Can you recommend a value of |
In 4.4 cluster-etcd-operator will take over the process of bootstrapping the etcd cluster. To provide a clear path to disable/revert these changes we have setup the following conditional logic.
MCO: The MCO render command invoked in bootkube has a new optional flag to pass the value of the
cluster-etcd-operatorimage[1]. The availability of this flags value[2] is used to conditionally adjust theetcd-memberstatic pod spec allowing it to use the new bootstrapping method via the operator or fall back to the 4.3 SRV method.Installer: The installer in 4.4 has a few notable changes introduced by this PR. First of all the
rendercommand populates a static pod manifest which creates a single member etcd cluster. After we have the single node cluster we can progress and cluster-operator can be deployed. This speeds up the time it takes for the operators to begin to reconcile as we are no longer waiting for all 3 etcds to bootstrap before we progress the operators.cluster-etcd-operator: CEO is currently set as Unmanaged[3]. This allows us to include the CEO in CVO operator payload while setting the controllers to perform noop. This short term phase allows us to merge this PR proving that we can at the same time have CEO included in CVO but still use the old SRV bootstrap.
Revert Plan: If a case existed where we had a design error and the operator needed to be pulled from 4.4.
:
[1] https://github.com/openshift/installer/pull/2608/files#diff-ce82c1d8a44f7dfc41dfc024085ccfeeR298
[2] https://github.com/openshift/machine-config-operator/blob/bd846958bc95d049547164046a962054fca093df/templates/master/00-master/_base/files/etc-kubernetes-manifests-etcd-member.yaml#L22
[3] https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_12_etcd-operator_01_operator.cr.yaml#L8
Depends on: