test/extended/prometheus: Validate alerting rules #26504
Conversation
|
I've run this on an OVN cluster and it passed. I haven't been able to get a vSphere cluster to spin up successfully, but I added exceptions for the problematic rules I saw in CI and that are listed in the bug that was created. |
|
/lgtm |
|
/retest |
The [OpenShift Alerting Consistency][1] enhancement defines a style guide for the alerts shipped as part of OpenShift. This adds a test validating some of the guidelines considered required.

This was originally added in #26476, but was reverted in #26499 due to failures with OVN and vSphere clusters. This adds the tests back, but adds exceptions for the non-compliant alerts as well as marking the failing tests as flakes for now. We'll gather data and make the tests required once we're reasonably sure things are passing with all the existing alerts.

[1]: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
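To illustrate the exception mechanism described above, here is a minimal sketch in Go. It is not the code from this PR: the `AlertRule` type, the `missingAnnotations` helper, and the alert names are all hypothetical, and it assumes that summary and description are carried as annotations on the rule.

```go
package main

import (
	"fmt"
	"sort"
)

// AlertRule is a hypothetical, simplified stand-in for a Prometheus
// alerting rule. The real test works against rules fetched from the cluster.
type AlertRule struct {
	Name        string
	Annotations map[string]string
}

// missingAnnotations returns the sorted names of alerts that lack a
// non-empty "summary" or "description" annotation. Alerts listed in
// exceptions are skipped, which models the temporary allowances for
// known non-compliant alerts.
func missingAnnotations(rules []AlertRule, exceptions map[string]bool) []string {
	var violations []string
	for _, r := range rules {
		if exceptions[r.Name] {
			continue // known offender, tolerated for now
		}
		// Reading from a nil map is safe in Go and yields "".
		if r.Annotations["summary"] == "" || r.Annotations["description"] == "" {
			violations = append(violations, r.Name)
		}
	}
	sort.Strings(violations)
	return violations
}

func main() {
	// Alert names below are made up for the example.
	rules := []AlertRule{
		{Name: "ExampleCompliant", Annotations: map[string]string{
			"summary":     "Something is wrong.",
			"description": "Longer explanation of what is wrong.",
		}},
		{Name: "ExampleMissingDescription", Annotations: map[string]string{
			"summary": "Only a summary.",
		}},
		{Name: "ExampleKnownOffender"},
	}
	exceptions := map[string]bool{"ExampleKnownOffender": true}

	fmt.Println(missingAnnotations(rules, exceptions))
}
```

Keeping the exceptions in an explicit set makes it easy to delete entries one by one as the owning components fix their alerts.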
|
@simonpasquier: I found a new alert that needed an exception, so lost the LGTM. Can you have another look? That's the only change. |
|
/test e2e-gcp-upgrade |
simonpasquier
left a comment
/lgtm
|
/test e2e-gcp-upgrade |
|
@deads2k, this adds the tests for alerting rules again after they were reverted. The problematic tests are now marked as flakes, and temporary exceptions have been added for the OVN and vSphere alerts. Can you have a look? The |
|
/test e2e-gcp-upgrade |
|
/lgtm |
|
/retest |
|
We need an owner to approve this. @deads2k, @smarterclayton, or @bparees, could one of you have a look if you have a minute? |
    })

    if err != nil {
        e2e.Failf(err.Error())
You indicated it would flake on failure, but this one fails on failure. Deliberate?
Yep, sorry, only the tests for runbooks and the description + summary labels are flakes for now, because lots of existing alerts don't conform. AFAICT, no existing alerts fail the test for the severity label, so we can just make it enforcing immediately. I am happy to make it flake also if we want to be extra sure though.
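The split between an enforcing check and a flaking one could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `AlertRule`, `severityViolations`, and `report` are hypothetical names, the `fail` and `flake` callbacks stand in for whatever the test framework provides, and the set of accepted severities (`critical`, `warning`, `info`) is assumed from the style guide.

```go
package main

import (
	"fmt"
	"strings"
)

// AlertRule is a hypothetical, simplified stand-in for an alerting rule.
type AlertRule struct {
	Name   string
	Labels map[string]string
}

// validSeverities is an assumption about which severity values the
// style guide accepts.
var validSeverities = map[string]bool{
	"critical": true,
	"warning":  true,
	"info":     true,
}

// severityViolations returns, in input order, the names of alerts whose
// "severity" label is missing or not one of the accepted values.
func severityViolations(rules []AlertRule) []string {
	var violations []string
	for _, r := range rules {
		if !validSeverities[r.Labels["severity"]] {
			violations = append(violations, r.Name)
		}
	}
	return violations
}

// report sends a non-empty violation list to fail when the check is
// enforcing, and to flake otherwise. It does nothing when there are
// no violations.
func report(check string, violations []string, enforcing bool, fail, flake func(msg string)) {
	if len(violations) == 0 {
		return
	}
	msg := fmt.Sprintf("%s: non-compliant alerts: %s", check, strings.Join(violations, ", "))
	if enforcing {
		fail(msg)
		return
	}
	flake(msg)
}

func main() {
	// Alert names below are made up for the example.
	rules := []AlertRule{
		{Name: "ExampleGood", Labels: map[string]string{"severity": "warning"}},
		{Name: "ExampleBad", Labels: map[string]string{"severity": "page"}},
	}
	fail := func(msg string) { fmt.Println("FAIL:", msg) }
	flake := func(msg string) { fmt.Println("FLAKE:", msg) }

	// The severity check is enforcing; the runbook check only flakes for now.
	report("severity label", severityViolations(rules), true, fail, flake)
	report("runbook", []string{"ExampleNoRunbook"}, false, fail, flake)
}
```

With this shape, promoting a check from flake to required is a one-argument change once the data shows no remaining offenders.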
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bison, bparees, jan--f, simonpasquier

The full list of commands accepted by this bot can be found here.
The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |