From 0721e7bfab28e0170efcecb94b3d2762c4cd230d Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Wed, 25 Aug 2021 08:19:23 -0700
Subject: [PATCH] install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable

These are the only two labels we set on the metric, but the Prometheus
scraper adds some more, like job, namespace, pod, etc., to describe
who was scraping what.  Reducing to the labels we care about avoids
annoying re-triggers, e.g. when the CVO pod changes [1].

I'm using [2]:

  sum by (channel,upstream) (cluster_version_available_updates)

to aggregate over all cluster_version_available_updates series by
collapsing the other labels and keeping only the two listed.  For
example, an input series like:

  cluster_version_available_updates{channel="stable-4.8", endpoint="metrics", instance="192.168.1.164:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7dd68fd686-p7ckm", prometheus="openshift-monitoring/k8s", receive="true", service="cluster-version-operator", upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 3

will be collapsed to:

  {channel="stable-4.8",upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 3

If there happened to be a second cluster_version_available_updates
series in the cluster (which is unlikely, because the CVO only serves
metrics after acquiring the leader lock), its value would get added in
too.

With the label collapse, folks will be able to use a single silence
per channel/upstream tuple.  I could even see stripping all the
labels, but folks who care enough to bump their channel or upstream
are presumably interested in hearing about available updates, at least
for a while, so having them re-silence if they go back to not caring
doesn't sound that tedious.

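The label collapse described above can be sketched in a few lines of
Python.  This is only an illustration of the grouping-and-adding idea,
not how Prometheus implements it; sum_by, samples, and the pod names
cvo-pod-a and cvo-pod-b are hypothetical and exist only for this
example.

```python
from collections import defaultdict


def sum_by(series, keep):
    """Group samples on the labels in `keep` and add up their values."""
    totals = defaultdict(float)
    for labels, value in series:
        # Labels outside `keep` are dropped; a missing label counts as "".
        key = tuple((name, labels.get(name, "")) for name in keep)
        totals[key] += value
    return dict(totals)


UPSTREAM = "https://api.openshift.com/api/upgrades_info/v1/graph"

# Hypothetical samples: same channel and upstream, but different
# scraper-added pod labels, as if two CVO pods were serving the metric.
samples = [
    ({"channel": "stable-4.8", "upstream": UPSTREAM, "pod": "cvo-pod-a"}, 3),
    ({"channel": "stable-4.8", "upstream": UPSTREAM, "pod": "cvo-pod-b"}, 3),
]

collapsed = sum_by(samples, ("channel", "upstream"))
for key, total in collapsed.items():
    print(dict(key), total)
```

Both samples share one channel/upstream tuple, so they end up in a
single output series whose value is the sum of the two inputs.  That
matches the "would get added in too" case above.
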
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1997596
[2]: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
---
 install/0000_90_cluster-version-operator_02_servicemonitor.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/install/0000_90_cluster-version-operator_02_servicemonitor.yaml b/install/0000_90_cluster-version-operator_02_servicemonitor.yaml
index 6890a702f7..622b8a66ff 100644
--- a/install/0000_90_cluster-version-operator_02_servicemonitor.yaml
+++ b/install/0000_90_cluster-version-operator_02_servicemonitor.yaml
@@ -60,7 +60,7 @@ spec:
         summary: Your upstream update recommendation service recommends you update your cluster.
         description: For more information refer to 'oc adm upgrade'{{ "{{ with $console_url := \"console_url\" | query }}{{ if ne (len (label \"url\" (first $console_url ) ) ) 0}} or {{ label \"url\" (first $console_url ) }}/settings/cluster/{{ end }}{{ end }}" }}.
       expr: |
-        cluster_version_available_updates > 0
+        sum by (channel,upstream) (cluster_version_available_updates) > 0
       labels:
         severity: info
   - name: cluster-operators