Bug 1957991: install/0000_90_cluster-version-operator_02_servicemonitor: Update ClusterOperatorDegraded message to 30m #556
…usterOperatorDegraded message to 30m

Catching up with fb5257d (install/0000_90_cluster-version-operator_02_servicemonitor: Soften ClusterOperatorDegraded, 2021-05-06, openshift#554).
@wking: This pull request references Bugzilla bug 1957991, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 1957991 has been moved to the MODIFIED state.
…usterOperatorDegraded

During install, the CVO has pushed manifests into the cluster as fast as possible without blocking on "has the in-cluster resource leveled?" since way back in b0b4902 (clusteroperator: Don't block on failing during initialization, 2019-03-11, openshift#136). That can lead to ClusterOperatorDown and ClusterOperatorDegraded firing during install, as we see in [1], where:

* ClusterOperatorDegraded started pending at 5:00:15Z [2].
* Install completed at 5:09:58Z [3].
* ClusterOperatorDegraded started firing at 5:10:04Z [2].
* ClusterOperatorDegraded stopped firing at 5:10:23Z [2].
* The e2e suite complained about [1]:

    alert ClusterOperatorDegraded fired for 15 seconds with labels: {... name="authentication"...} (open bug: https://bugzilla.redhat.com/show_bug.cgi?id=1939580)

ClusterOperatorDown is similar, but I'll leave addressing it to a separate commit. For ClusterOperatorDegraded, the degraded condition should not be particularly urgent [4], so we should be fine bumping it to 'warning' and using 'for: 30m' or something more relaxed than the current 10m.

This commit brings back

* fb5257d (install/0000_90_cluster-version-operator_02_servicemonitor: Soften ClusterOperatorDegraded, 2021-05-06, openshift#554) and
* 92ed7f1 (install/0000_90_cluster-version-operator_02_servicemonitor: Update ClusterOperatorDegraded message to 30m, 2021-05-08, openshift#556).

There are some conflicts, because I am not bringing back 90539f9 (pkg/cvo/metrics: Ignore Degraded for cluster_operator_up, 2021-04-26, openshift#550). But that one had its own conflicts in metrics.go [5], and the conflicts with this commit were orthogonal context issues, so moving this back to 4.7 first won't make it much harder to bring back openshift#550 and such later on, if we decide to do that.
[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.8/1389436726862155776
[2]: https://promecieus.dptools.openshift.org/?search=https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.8/1389436726862155776
     group by (alertstate) (ALERTS{alertname="ClusterOperatorDegraded"})
[3]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.8/1389436726862155776/artifacts/e2e-aws-upi/clusterversion.json
[4]: openshift/api#916
[5]: openshift#550 (comment)
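
To make the effect of the longer hold time concrete, here is a rough Python sketch (not part of the commit) that replays the timestamps from the CI run above. It assumes a simplified model in which an alert fires once its condition has held continuously for at least the rule's 'for' duration. The helper names `parse_utc_time` and `would_fire` are hypothetical, and the model ignores rule evaluation intervals, which is probably why the observed firing time is a few seconds off from pending start plus 10m.

```python
from datetime import datetime, timedelta


def parse_utc_time(text: str) -> datetime:
    """Parse an HH:MM:SS time of day; the date part is irrelevant here."""
    return datetime.strptime(text, "%H:%M:%S")


def would_fire(pending_start: datetime,
               condition_cleared: datetime,
               for_duration: timedelta) -> bool:
    """Simplified model (an assumption): the alert fires if the condition
    held for at least for_duration before it cleared."""
    return condition_cleared - pending_start >= for_duration


# Timestamps taken from the commit message above.
pending_start = parse_utc_time("05:00:15")      # started pending
condition_cleared = parse_utc_time("05:10:23")  # stopped firing

# How long the Degraded condition was continuously active.
active = condition_cleared - pending_start

fires_with_10m = would_fire(pending_start, condition_cleared, timedelta(minutes=10))
fires_with_30m = would_fire(pending_start, condition_cleared, timedelta(minutes=30))

print(f"condition active for {active}")
print(f"fires with for: 10m -> {fires_with_10m}")
print(f"fires with for: 30m -> {fires_with_30m}")
```

Under this model the condition was active for just over ten minutes, so it crosses the 10m threshold by seconds but stays well short of 30m, and the install-time blip would have remained pending.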
Catching up with fb5257d (#554).