Bug 1997596: install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable #643

wking · 2021-08-25T15:26:10Z

`channel` and `upstream` are the only two labels we set on the metric, but the Prometheus scraper adds more, like `job`, `namespace`, and `pod`, to describe who was scraping what. Reducing to the labels we care about avoids annoying re-triggers, e.g. when the CVO pod changes. With this change, folks will be able to use a single silence per channel/upstream tuple.

I could even see stripping all the labels, but folks who care enough to bump their channel or upstream are presumably interested in hearing about available updates, at least for a while, so having them re-silence if they go back to not caring doesn't sound that tedious.
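The following is a simplified Python model of why a changing `pod` label defeats a silence. It assumes the silence was created by copying every label off the firing alert and uses only equality matchers. Real Alertmanager silences also support regex and negative matchers, which this sketch ignores. The `alertname` label and the replacement pod name are illustrative assumptions; the pod name is hypothetical.

```python
from typing import Dict

# Alert labels and silence matchers are both modeled as plain name -> value maps.
Labels = Dict[str, str]


def silence_matches(matchers: Labels, alert_labels: Labels) -> bool:
    """Return True if every equality matcher in the silence matches the alert.

    Simplified model: regex and negative matchers are not handled.
    """
    return all(alert_labels.get(name) == value for name, value in matchers.items())


UPSTREAM = "https://api.openshift.com/api/upgrades_info/v1/graph"

# Untrimmed: a scrape-target label such as pod leaks through to the alert.
untrimmed = {
    "alertname": "UpdateAvailable",
    "channel": "stable-4.8",
    "upstream": UPSTREAM,
    "pod": "cluster-version-operator-7dd68fd686-p7ckm",
}
# The same alert after the CVO pod is replaced (hypothetical new pod name).
untrimmed_new_pod = dict(untrimmed, pod="cluster-version-operator-new-pod")

# Trimmed: only the labels set on the metric survive, so a pod change is invisible.
trimmed = {"alertname": "UpdateAvailable", "channel": "stable-4.8", "upstream": UPSTREAM}

# Silences created by copying every label off the firing alert.
untrimmed_silence = dict(untrimmed)
trimmed_silence = dict(trimmed)

print(silence_matches(untrimmed_silence, untrimmed))          # True
print(silence_matches(untrimmed_silence, untrimmed_new_pod))  # False: pod changed
print(silence_matches(trimmed_silence, trimmed))              # True
```

With the trimmed label set, the alert's identity depends only on the channel/upstream tuple, so one silence per tuple is enough.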

openshift-ci · 2021-08-25T15:28:22Z

@wking: This pull request references Bugzilla bug 1997596, which is invalid:

expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

Bug 1997596: install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

…ls for UpdateAvailable

These are the only two labels (channel and upstream) we set on the metric, but the Prometheus scraper adds some more, like job, namespace, pod, etc., to describe who was scraping what. Reducing to the labels we care about avoids annoying re-triggers, e.g. when the CVO pod changes [1].

I'm using [2]:

`sum by (channel,upstream) (cluster_version_available_updates)`

to aggregate over all cluster_version_available_updates series by collapsing the other labels and keeping only the listed two. For example, an input series like:

`cluster_version_available_updates{channel="stable-4.8", endpoint="metrics", instance="192.168.1.164:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7dd68fd686-p7ckm", prometheus="openshift-monitoring/k8s", receive="true", service="cluster-version-operator", upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 3`

will be collapsed to:

`{channel="stable-4.8",upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 3`

If there happened to be a second cluster_version_available_updates series in the cluster (which is unlikely, because the CVO only serves metrics after acquiring the leader lock), its value would get added in too.

With the label collapse, folks will be able to use a single silence per channel/upstream tuple.

I could even see stripping all the labels, but folks who care enough to bump their channel or upstream are presumably interested in hearing about available updates, at least for a while, so having them re-silence if they go back to not caring doesn't sound that tedious.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1997596
[2]: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
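To make the collapse concrete, here is a small Python model of what `sum by (channel,upstream)` does to one instant vector. It is a sketch of the aggregation semantics, not Prometheus code: the `sum_by` helper is hypothetical, and the second series from "another pod" is invented to show how values are added together.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
# Output key: the kept (name, value) label pairs, sorted by label name.
GroupKey = Tuple[Tuple[str, str], ...]


def sum_by(series: List[Tuple[Labels, float]], by: List[str]) -> Dict[GroupKey, float]:
    """Mimic PromQL `sum by (...)` over one instant vector.

    Labels not listed in `by` are dropped, and samples whose remaining
    label sets are equal have their values added together.
    """
    out: Dict[GroupKey, float] = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((name, labels[name]) for name in by if name in labels))
        out[key] += value
    return dict(out)


UPSTREAM = "https://api.openshift.com/api/upgrades_info/v1/graph"
# Labels from the example series above.
scraped = {
    "channel": "stable-4.8",
    "endpoint": "metrics",
    "instance": "192.168.1.164:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-7dd68fd686-p7ckm",
    "prometheus": "openshift-monitoring/k8s",
    "receive": "true",
    "service": "cluster-version-operator",
    "upstream": UPSTREAM,
}

# One series: only channel and upstream remain, value unchanged.
result = sum_by([(scraped, 3.0)], ["channel", "upstream"])
print(result)

# A hypothetical second series from another pod collapses onto the same key,
# so its value is summed into the first.
other_pod = dict(scraped, pod="cluster-version-operator-other")
print(sum_by([(scraped, 3.0), (other_pod, 3.0)], ["channel", "upstream"]))
```

Summing (rather than, say, `max by`) means a duplicate series would inflate the count, which is acceptable here because the CVO only serves metrics while holding the leader lock.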

LalatenduMohanty

/lgtm

openshift-ci · 2021-08-25T16:32:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [LalatenduMohanty,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2021-08-25T17:31:31Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-08-25T17:55:31Z

/retest-required


openshift-bot · 2021-08-25T19:08:30Z

/retest-required


wking · 2021-08-25T19:50:56Z

/bugzilla refresh

openshift-ci · 2021-08-25T19:52:04Z

@wking: This pull request references Bugzilla bug 1997596, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.9.0) matches configured target release for branch (4.9.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianlinliu


In response to this:

/bugzilla refresh


wking · 2021-08-25T19:52:19Z

Neither failure was relevant to this alert touch.

/override ci/prow/e2e-agnostic
/override ci/prow/e2e-agnostic-upgrade

openshift-ci · 2021-08-25T19:53:43Z

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic, ci/prow/e2e-agnostic-upgrade


In response to this:

Neither failure was relevant to this alert touch.

/override ci/prow/e2e-agnostic
/override ci/prow/e2e-agnostic-upgrade


openshift-ci · 2021-08-25T19:54:13Z

@wking: All pull requests linked via external trackers have merged:

openshift/cluster-version-operator#643

Bugzilla bug 1997596 has been moved to the MODIFIED state.


In response to this:

Bug 1997596: install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable


openshift-ci bot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 25, 2021

wking force-pushed the UpdateAvailable-labels branch from 2f184b0 to 0721e7b August 25, 2021 15:36

openshift-ci bot requested review from LalatenduMohanty and jottofar August 25, 2021 15:40

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2021

LalatenduMohanty approved these changes Aug 25, 2021


openshift-ci bot assigned LalatenduMohanty Aug 25, 2021

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2021

openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 25, 2021

openshift-ci bot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Aug 25, 2021

openshift-ci bot requested a review from jianlinliu August 25, 2021 19:52

openshift-merge-robot merged commit 17d9690 into openshift:master Aug 25, 2021

wking deleted the UpdateAvailable-labels branch August 25, 2021 20:01

wking mentioned this pull request Oct 19, 2021

Send alert when MCO can't safely apply updated Kubelet CA on nodes in paused pool openshift/machine-config-operator#2802

Merged

wking mentioned this pull request Mar 3, 2022

Insights Operator Prometheus Alerts for Insights Recommendations openshift/enhancements#1036

Merged

wking mentioned this pull request Jul 25, 2022

Bug 2010365: OpenShift Alerting Rules Style-Guide Compliance #800

Merged
