Skip to content

Conversation

@bison
Copy link

@bison bison commented Oct 5, 2021

The OpenShift Alerting Consistency enhancement defines a style
guide for the alerts shipped as part of OpenShift. This adds a test
validating some of the guidelines considered required.

This was originally added in #26476, but was reverted in #26499 due to
failures with OVN and vSphere clusters. This adds the tests back, but
adds exceptions for the non-compliant alerts as well as marking the
failing tests as flakes for now. We'll gather data and make the tests
required once we're reasonably sure things are passing with all the
existing alerts.

@bison
Copy link
Author

bison commented Oct 5, 2021

I've run this on an OVN cluster and it passed. I haven't been able to get a vSphere cluster to spin up successfully, but I added exceptions for the problematic rules I saw in CI and that are listed in the bug that was created.

@simonpasquier
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2021
@bison
Copy link
Author

bison commented Oct 5, 2021

/retest

The [OpenShift Alerting Consistency][1] enhancement defines a style
guide for the alerts shipped as part of OpenShift.  This adds a test
validating some of the guidelines considered required.

This was originally added in #26476, but was reverted in #26499 due to
failures with OVN and vSphere clusters.  This adds the tests back, but
adds exceptions for the non-compliant alerts as well as marking the
failing tests as flakes for now.  We'll gather data and make the tests
required once we're reasonably sure things are passing with all the
existing alerts.

[1]: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2021
@bison
Copy link
Author

bison commented Oct 5, 2021

@simonpasquier: I found a new alert that needed an exception, so lost the LGTM. Can you have another look? That's the only change.

@bison
Copy link
Author

bison commented Oct 5, 2021

/test e2e-gcp-upgrade

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 6, 2021
@bison
Copy link
Author

bison commented Oct 7, 2021

/test e2e-gcp-upgrade

@bison
Copy link
Author

bison commented Oct 7, 2021

@deads2k, this adds the tests for alerting rules again after they were reverted. The problematic tests are marked as flakes now, in addition to having added temporary exceptions for the OVN and vSphere alerts. Can you have a look?

The e2e-gcp-upgrade tests are failing, but it looks unrelated.

@bison
Copy link
Author

bison commented Oct 7, 2021

/test e2e-gcp-upgrade

1 similar comment
@simonpasquier
Copy link
Contributor

/test e2e-gcp-upgrade

@jan--f
Copy link
Contributor

jan--f commented Oct 8, 2021

/lgtm

@bison
Copy link
Author

bison commented Oct 11, 2021

/retest

@bison
Copy link
Author

bison commented Oct 11, 2021

We need an owner to approve this. Could maybe @deads2k, @smarterclayton, or @bparees have a look if you have a minute?

})

if err != nil {
e2e.Failf(err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you indicated it would flake on failure, but this one is failing on failure....deliberate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sorry, only the tests for runbooks and the description + summary labels are flakes for now, because lots of existing alerts don't conform. AFAICT, no existing alerts fail the test for the severity label, so we can just make it enforcing immediately. I am happy to make it flake also if we want to be extra sure though.

@bparees
Copy link
Contributor

bparees commented Oct 11, 2021

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bison, bparees, jan--f, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2021
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 16ba463 into openshift:master Oct 12, 2021
@bison bison deleted the alerting-rules branch October 12, 2021 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants