@DavidHurta
Contributor

@DavidHurta DavidHurta commented Jul 21, 2022

Add the missing namespace label (openshift-cluster-version) to alerting rules to comply with
the style guidance. Alerts should include a namespace label indicating the alert's source.

@openshift-ci openshift-ci bot added the bugzilla/severity-low and bugzilla/invalid-bug labels Jul 21, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jul 21, 2022

@Davoska: This pull request references Bugzilla bug 2010365, which is invalid:

  • expected the bug to target the "4.12.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 2010365: OpenShift Alerting Rules Style-Guide Compliance

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from jottofar and wking July 21, 2022 14:32
@DavidHurta
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug label and removed the bugzilla/invalid-bug label Jul 21, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jul 21, 2022

@Davoska: This pull request references Bugzilla bug 2010365, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianlinliu

Details

In response to this:

/bugzilla refresh

@openshift-ci openshift-ci bot requested a review from jianlinliu July 21, 2022 14:37
@DavidHurta
Contributor Author

/retest-required

    cluster_operator_up{job="cluster-version-operator"} == 0
  for: 10m
  labels:
    namespace: openshift-cluster-version
Member

ClusterOperatorDown already has namespace in Telemetry (this time via cluster_operator_up), so I'd rather drop this line from the commit as redundant.

We could consider a max by (namespace, name, ...) (...) aggregation, if we wanted to exclude pod to avoid alert churn whenever the CVO pod cycled. But we're not all that consistent on this front today, so if we do decide to do this sort of thing, we should probably put in enough thought to make the change consistently (e.g. I'm not sure why we are including endpoint but not service in ClusterNotUpgradeable today).

Contributor Author

Thank you, fixed in d525eb9.

In the new commit 2b8ef2a, I have also added the max by aggregation operator to CannotRetrieveUpdates, ClusterOperatorDown, ClusterOperatorDegraded, and ClusterOperatorFlapping to avoid alert churn whenever the CVO pod cycles.

I am not sure about the purpose of the endpoint label, or potentially the service label, in ClusterNotUpgradeable, so I have not changed that alert.
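
For reference, a rough sketch of the kind of max by aggregation discussed above. This is not the content of 2b8ef2a: the label list in the by clause, the group name, and the severity value are assumptions for illustration, and the layout is that of a plain Prometheus rule file.

```yaml
# Hypothetical sketch of an alert that aggregates away per-pod labels.
groups:
  - name: cluster-version-example        # hypothetical group name
    rules:
      - alert: ClusterOperatorDown
        # max by (...) keeps only the listed labels on the result, so
        # target labels such as pod are dropped from the alert.
        # The namespace label is inherited from cluster_operator_up, so
        # no static namespace label is needed under "labels".
        expr: |
          max by (namespace, name) (cluster_operator_up{job="cluster-version-operator"} == 0)
        for: 10m
        labels:
          severity: critical             # assumed; not stated in this thread
```

Because only namespace and name survive the aggregation, a CVO pod restart changes no label on the result, so it would not be expected to resolve one alert and fire another.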

@DavidHurta
Contributor Author

/hold addressing comments

@openshift-ci openshift-ci bot added the do-not-merge/hold label Jul 28, 2022
…pace labels to alerts

Add missing namespace labels to alerting rules to comply with the style
guidance [1]. Alerts should include a namespace label indicating the
alert's source. Either add the static label or modify the PromQL
expression to include the namespace label.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide
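
The two options named in the commit message can be sketched as follows. This is an illustration only: the alert names, metric names, group name, and severities are hypothetical and do not come from this pull request, and example_condition is assumed to carry a namespace label from its scrape target.

```yaml
groups:
  - name: namespace-label-example        # hypothetical group name
    rules:
      # Option 1: the expression result carries no namespace label
      # (absent() only keeps labels from equality matchers, here job),
      # so the label is set statically on the alert.
      - alert: ExampleStaticNamespace    # hypothetical alert
        expr: |
          absent(example_metric{job="example"})
        labels:
          namespace: openshift-cluster-version
          severity: warning              # assumed
      # Option 2: the underlying series already has a namespace label,
      # so the expression only needs to keep it through aggregation.
      - alert: ExamplePropagatedNamespace   # hypothetical alert
        expr: |
          max by (namespace, name) (example_condition{job="example"} == 1)
        labels:
          severity: warning              # assumed
```

In the second case, adding a static namespace label as well would be redundant, since the label already arrives with the expression result.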
@DavidHurta DavidHurta force-pushed the bug-2010365-openshift-alerting-rules-compliance branch from 900a462 to 2b8ef2a Compare August 1, 2022 11:32
@DavidHurta
Contributor Author

Thank you, Trevor, for the explanations. It seems that I forgot about the purpose of aggregation operators and that alerts receive labels implicitly from metrics 🤦. I wanted to be explicit about the alerts having the namespace label. However, I now understand that it's redundant.

Removed the redundant namespace labels in the new commit d525eb9.

Added aggregation operators to some alerts to exclude unnecessary labels (mainly to avoid multiple alerts when the CVO pod cycles) in the new commit 2b8ef2a.

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Aug 1, 2022
@DavidHurta DavidHurta requested a review from wking August 1, 2022 12:07
@DavidHurta
Contributor Author

DavidHurta commented Aug 1, 2022

/hold
I'll double-check the use of aggregation operators in 2b8ef2a.

@openshift-ci openshift-ci bot added the do-not-merge/hold label Aug 1, 2022
@DavidHurta DavidHurta force-pushed the bug-2010365-openshift-alerting-rules-compliance branch from 2b8ef2a to ffb2602 Compare August 1, 2022 17:59
@DavidHurta
Copy link
Contributor Author

2b8ef2a -> ffb2602: adjusted the wording of the commit message.

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Aug 1, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 5, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Davoska, LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 2 against base HEAD 0f8c533 and 8 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 1 against base HEAD 0f8c533 and 7 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 0f8c533 and 6 for PR HEAD 66fc016 in total

@DavidHurta
Contributor Author

/retest

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 2 against base HEAD 02838c7 and 5 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 1 against base HEAD 02838c7 and 4 for PR HEAD 66fc016 in total

@wking
Member

wking commented Aug 8, 2022

This run passed, except for timing out during teardown.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Aug 8, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

Details

In response to this:

This run passed, except for timing out during teardown.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 02838c7 and 3 for PR HEAD 66fc016 in total

@DavidHurta
Contributor Author

@wking, I guess we need to run the /override again? 🤔 I am not sure why it's still not passing, because there haven't been any merges to the master branch since then.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 2 against base HEAD c699d55 and 2 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 1 against base HEAD c699d55 and 1 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD c699d55 and 0 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/hold

Revision 66fc016 was retested 9 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold label Aug 12, 2022
@DavidHurta
Contributor Author

DavidHurta commented Aug 16, 2022

#800 (comment):

This run passed, except for timing out during teardown.

/override ci/prow/e2e-agnostic-upgrade

Unholding the PR and retesting so that we can potentially override ci/prow/e2e-agnostic-upgrade again, because the head of the master branch has changed.

@DavidHurta
Contributor Author

DavidHurta commented Aug 16, 2022

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Aug 16, 2022
@DavidHurta
Contributor Author

/retest

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 2 against base HEAD 348add6 and 8 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 2 against base HEAD bfa3017 and 7 for PR HEAD 66fc016 in total

@wking
Member

wking commented Aug 16, 2022

Still hitting orthogonal disruption failures because the origin suite doesn't scale disruption for our A->B->A rollback presubmit.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Aug 16, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

Details

In response to this:

Still hitting orthogonal disruption failures because the origin suite doesn't scale disruption for our A->B->A rollback presubmit.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 1 against base HEAD bfa3017 and 6 for PR HEAD 66fc016 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD bfa3017 and 5 for PR HEAD 66fc016 in total

@openshift-merge-robot openshift-merge-robot merged commit 2d7057c into openshift:master Aug 16, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 16, 2022

@Davoska: All pull requests linked via external trackers have merged:

Bugzilla bug 2010365 has been moved to the MODIFIED state.

Details

In response to this:

Bug 2010365: OpenShift Alerting Rules Style-Guide Compliance

@openshift-ci
Contributor

openshift-ci bot commented Aug 16, 2022

@Davoska: all tests passed!

Full PR test history. Your PR dashboard.

Labels

approved, bugzilla/severity-low, bugzilla/valid-bug, lgtm

5 participants