OCPBUGS-42083: Don't roll out revision until three etcd endpoints are listed #1743
Conversation
```go
		return fmt.Errorf("%v empty in config", strings.Join(requiredPath, "."))
	}

	if requiredPath[0] == "etcd-servers" {
```
I'd be curious to see in the tests if this also gated rev 1, though I doubt it will be the case. Worst case scenario we can always add that check as a precondition to the revision controller.
Right, requiredPath[0] is apiServerArguments; updated the code and added unit tests.
My comment was about checking in the test run if the config for revision 1 got the required list of etcd-servers or if it bypassed this check. We need to make sure that rev 1 gets the minimum number of endpoints, otherwise we might run into the same problem as before with the first kube-apiserver pod.
Ah, got it. Well, we can't issue a revision if the config is not considered correct, so I think we have this covered.
Not sure we have revision controller unit tests yet.
```go
	if !ok {
		return fmt.Errorf("%v is not a slice", strings.Join(requiredPath, "."))
	}
	if len(configValSlice) < 3 {
```
This needs to be gated for SNO as I assume it would only get 2 endpoints. Also, shouldn't we aim for 4 endpoints since we include localhost?
yes, adding an SNO gate too.
> shouldn't we aim for 4 endpoints since we include localhost?

Three endpoints (localhost, the local IP, and the etcd IP on any other master) should be sufficient to prevent a 10-minute hangup.
We may never have all 4 endpoints on HA with the assisted installer: one of the masters is used as a bootstrap node, so we shouldn't rely on having all 3 masters available during bootstrap.
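The check being discussed can be sketched as follows. This is an illustrative sketch, not the exact PR code: the function and parameter names are hypothetical, but it shows the shape of the gate (at least three etcd-servers endpoints required, skipped for single-node topologies).

```go
// Hypothetical sketch of the endpoint-count gate discussed above.
// Names are illustrative; the real validation lives in the operator's
// config observation code.
package main

import (
	"fmt"
	"strings"
)

func validateEtcdServers(requiredPath []string, endpoints []string, singleReplica bool) error {
	// Only gate the apiServerArguments["etcd-servers"] path.
	if requiredPath[len(requiredPath)-1] != "etcd-servers" {
		return nil
	}
	// SNO clusters only ever list localhost plus the local IP,
	// so the three-endpoint requirement must be skipped there.
	if singleReplica {
		return nil
	}
	if len(endpoints) < 3 {
		return fmt.Errorf("%v has fewer than three endpoints: %v",
			strings.Join(requiredPath, "."), endpoints)
	}
	return nil
}

func main() {
	path := []string{"apiServerArguments", "etcd-servers"}
	// Two endpoints on HA: the config is rejected, so no new revision is cut.
	fmt.Println(validateEtcdServers(path,
		[]string{"https://localhost:2379", "https://10.0.0.4:2379"}, false))
	// Three endpoints: localhost, local IP, and another master's etcd.
	fmt.Println(validateEtcdServers(path,
		[]string{"https://localhost:2379", "https://10.0.0.4:2379", "https://10.0.0.5:2379"}, false))
}
```

With this shape, revision 1 is covered for free: the observed config never passes validation (and therefore never becomes a revision) until three endpoints are listed.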
I haven’t read the code, but I’m wondering if we only want to stop new revisions during the installation time. If not, what happens when a customer takes a master node out of the pool for service? Does that mean the kas-o will stop installing new revisions? If so, that’s bad, right?
Also, recently there was a change to the RevisionController (library-go) which permits passing a precondition function. New revisions will be allowed by the controller once the precondition is fulfilled. Maybe you could use it.
> If not, what happens when a customer takes a master node out of the pool for service?

Multiple scenarios here:

- The node may be physically down for maintenance, but it would still be listed as an endpoint.
- Also, while we require three endpoints, that's in fact two distinct etcd servers (the local IP and localhost are considered a single etcd server but two distinct endpoints). So this code won't prevent one master from being torn down during CPMS scale up / down.
- However, there is an etcd restore procedure where the cluster is reduced to a single node to restore from backup and later scaled back up to 3. We specifically don't want to roll out new revisions until all three masters are available, to ensure all three etcd servers are in sync.
TODO: run e2e with CPMS scale up / etcd disruption to ensure that it works as expected
> New revisions will be allowed by the controller once the precondition is fulfilled

WithRevisionControllerPrecondition might be an alternative to this solution, yes.
> WithRevisionControllerPrecondition might be an alternative to this solution, yes

Not a blocking comment, I just wanted to let you know the mechanism exists so that you may consider using it.
So, when an etcd endpoint is removed from the list?
Also, should we add a similar mechanism for the aggregated API servers ?
> just wanted to let you know the mechanism exist so that you may consider using it

Right, though I think the current approach is better suited for this situation: we already have a config validation function, so it's only natural to extend it. This way we prevent the config from being generated, not merely rolled out.
However, if this method won't work in some scenarios, WithRevisionControllerPrecondition would be a good candidate to try.
> should we add a similar mechanism for the aggregated API servers

It's not necessary imo; the issue happens early in bootstrap, when openshift-apiserver is not even created yet.
If kube-apiserver can reliably write to etcd, we'll notice a slow etcd rollout via the "static pod didn't roll out in 3 mins" test, and a similar solution can be applied to the openshift-apiservers.
```go
		return fmt.Errorf("%v empty in config", strings.Join(requiredPath, "."))
	}

	if requiredPath[0] == "etcd-servers" {
```
remember to only do this for HA clusters, not single node.
a6baafd to b82c96d
/test e2e-azure-ovn
@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f3e3cdf0-7c9c-11ef-9614-4fb038d9905b-0
b82c96d to 7067c16
/test e2e-azure-ovn
@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8898f0e0-7ca4-11ef-9d68-6af2d98e5388-0
@vrutkovs: This pull request references Jira Issue OCPBUGS-42083, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest
Hold revision rollouts until we collect three etcd endpoints (localhost, the local IP, and any other master). This ensures that kube-apiserver has another etcd instance to connect to when the local instance is being reconfigured. This functionality is skipped when the control plane uses single-replica topology.
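The single-replica skip mentioned in the description would normally be derived from the cluster's Infrastructure resource (`status.controlPlaneTopology` in openshift/api). The sketch below uses local stand-in types so it stays self-contained; the constant names mirror, but are not, the real openshift/api ones.

```go
// Hedged illustration of the single-replica gate. In the operator this
// decision comes from the Infrastructure resource's controlPlaneTopology;
// the types here are local stand-ins for the openshift/api definitions.
package main

import "fmt"

type ControlPlaneTopology string

const (
	HighlyAvailableTopology ControlPlaneTopology = "HighlyAvailable"
	SingleReplicaTopology   ControlPlaneTopology = "SingleReplica"
)

// shouldGateRevisions reports whether the three-endpoint check applies.
func shouldGateRevisions(topology ControlPlaneTopology) bool {
	// SNO can never list three endpoints, so the gate must be skipped there.
	return topology != SingleReplicaTopology
}

func main() {
	fmt.Println(shouldGateRevisions(HighlyAvailableTopology)) // true
	fmt.Println(shouldGateRevisions(SingleReplicaTopology))   // false
}
```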
7067c16 to 992114a
/payload-aggregate periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-kube-apiserver-rollout 10
@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8c0e55c0-800b-11ef-8f16-62d28d7d5e0a-0
/test e2e-azure-ovn
/retest-required
/test e2e-aws-ovn
/test e2e-aws-ovn
/test e2e-gcp-operator
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dgrisonnet, vrutkovs. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@vrutkovs: The following test failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The problem this is targeting just failed another payload. I think the operator job is failing in general. I'm overriding and will notify the team in Slack.
/override ci/prow/e2e-gcp-operator
@deads2k: Overrode contexts on behalf of deads2k: ci/prow/e2e-gcp-operator
@vrutkovs: Jira Issue OCPBUGS-42083: Some pull requests linked via external trackers have merged. The following pull requests linked via external trackers have not merged; these pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with
Jira Issue OCPBUGS-42083 has not been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-kube-apiserver-operator
Prevent the apiserver from pointing to a single etcd endpoint (IP and localhost), which would lose its connection when that single etcd instance is being updated.