Bug 2010365: OpenShift Alerting Rules Style-Guide Compliance #800
|
@Davoska: This pull request references Bugzilla bug 2010365, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@Davoska: This pull request references Bugzilla bug 2010365, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: |
|
/retest-required |
install/0000_90_cluster-version-operator_02_servicemonitor.yaml
cluster_operator_up{job="cluster-version-operator"} == 0
for: 10m
labels:
  namespace: openshift-cluster-version
ClusterOperatorDown already has namespace in Telemetry (this time via cluster_operator_up), so I'd rather drop this line from the commit as redundant.
We could consider a max by (namespace, name, ...) (...) aggregation, if we wanted to exclude pod to avoid alert churn whenever the CVO pod cycled. But we're not all that consistent on this front today, so if we do decide to do this sort of thing, we should probably put in enough thought to make the change consistently (e.g. I'm not sure why we are including endpoint but not service in ClusterNotUpgradeable today).
Thank you, fixed in d525eb9.
I have also added the aggregation operator max by to CannotRetrieveUpdates, ClusterOperatorDown, ClusterOperatorDegraded, and ClusterOperatorFlapping to avoid the alert churn whenever the CVO pod cycles in the new commit 2b8ef2a.
I am not sure of the purpose of the endpoint or potentially service in ClusterNotUpgradeable, so I have not made this modification.
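For illustration, the max by aggregation discussed in this thread could look roughly like the sketch below. It is not the rule as merged: the label list in by (...) and the severity value are assumptions, and annotations are omitted.

```yaml
# Hypothetical sketch, not the merged rule.
# max by (...) keeps only the listed labels on the result, so per-pod
# labels such as pod are dropped and a CVO pod restart does not produce
# a new alert series. The namespace label comes from the metric itself,
# so no static namespace label is needed under "labels:".
- alert: ClusterOperatorDown
  expr: |
    max by (namespace, name) (cluster_operator_up{job="cluster-version-operator"} == 0)
  for: 10m
  labels:
    severity: critical  # assumed value
```

Any label that should stay on the alert has to be listed explicitly in by (...), which is why applying this consistently across rules takes some thought.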
|
/hold addressing comments |
…pace labels to alerts
Add missing namespace labels to alerting rules to comply with the style guidance [1]. Alerts should include a namespace label indicating the alert's source. Either add the static label or modify the PromQL expression to include the namespace label.
[1] https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide
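As a rough sketch of the two options named in the commit message, the fragment below shows a static label next to an expression that preserves the label. The alert names and metric names are hypothetical placeholders, not rules from this repository.

```yaml
# Option 1 (hypothetical rule): the metric carries no namespace label,
# so the rule attaches a static one.
- alert: ExampleAlertWithStaticNamespace
  expr: |
    example_metric_without_namespace == 0
  for: 10m
  labels:
    namespace: openshift-cluster-version
    severity: warning

# Option 2 (hypothetical rule): the metric already has a namespace label,
# and the expression keeps it by naming it in the aggregation.
- alert: ExampleAlertWithNamespaceFromExpr
  expr: |
    max by (namespace) (example_metric_with_namespace == 0)
  for: 10m
  labels:
    severity: warning
```

With option 2, adding a static namespace label as well would be redundant, which is the point raised in the review.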
900a462 to 2b8ef2a
|
Thank you, Trevor, for the explanations. It seems that I forgot about the purpose of aggregation operators and that alerts receive labels implicitly from metrics 🤦. I wanted to be explicit about the alerts having the namespace. However, I now understand that it's redundant. In the new commit 2b8ef2a, I removed the redundant labels and added aggregation operators to some alerts to exclude unnecessary labels (mainly to avoid multiple alerts when the CVO pod cycles). /unhold |
|
/hold |
2b8ef2a to ffb2602
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Davoska, LalatenduMohanty, wking |
|
/retest |
|
This run passed, except for timing out during teardown. /override ci/prow/e2e-agnostic-upgrade |
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade |
|
@wking, I guess we need to run the |
|
/hold Revision 66fc016 was retested 9 times: holding |
Unholding the PR and retesting, so we can potentially override the ci/prow/e2e-agnostic-upgrade again, because the head of the master branch has changed. |
|
/unhold |
|
/retest |
|
Still hitting orthogonal disruption failures because the origin suite doesn't scale disruption for our A->B->A rollback presubmit. /override ci/prow/e2e-agnostic-upgrade |
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade |
|
@Davoska: All pull requests linked via external trackers have merged: Bugzilla bug 2010365 has been moved to the MODIFIED state. |
|
@Davoska: all tests passed! |
Add missing namespace labels openshift-cluster-version to alerting rules to comply with the style guidance. Alerts should include a namespace label indicating the alert's source.