Convert PromQL risks to Always risks in releases older than 8 weeks #2968

sdodson · 2023-01-06T17:33:03Z

This works around a CVO (cluster-version operator) bug/design decision where we evaluate only one newly enumerated risk every 10 minutes, to avoid overwhelming the monitoring stack with requests. However, this creates a bad UX: if there are N risks to evaluate across the available update paths, it could take (N-1) * 10 minutes before the set of recommended updates is computed.

This preserves notification of the issue while being largely a no-op, because these clusters currently have had better update paths available for at least 12 weeks.

We intend to fix the CVO bug, but that won't fix the issue in the deployed fleet.

See: https://issues.redhat.com/browse/OCPBUGS-5469
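
For reference, the change in each affected file amounts to swapping the rule type. The sketch below is a hypothetical illustration, assuming the `matchingRules` schema used by blocked-edges files in openshift/cincinnati-graph-data (a `type: PromQL` rule carries its query under `promql.promql`; a `type: Always` rule carries no query). Every field value shown (`to`, `from`, `url`, `name`, `message`, and the query itself) is a placeholder, not the contents of any file changed in this PR.

```yaml
# Before (hypothetical): the risk applies only to clusters where the PromQL
# query matches, so the CVO has to evaluate the query first.
to: 4.10.99                # placeholder target release
from: 4[.]9[.].*           # placeholder regex of source releases
url: https://example.com/hypothetical-bug
name: ExampleRisk          # placeholder risk name
message: Placeholder text describing the risk.
matchingRules:
- type: PromQL
  promql:
    promql: |
      group(cluster_version)
---
# After (hypothetical): the same risk is declared for every cluster on an
# affected path, with no PromQL query left to evaluate.
to: 4.10.99
from: 4[.]9[.].*
url: https://example.com/hypothetical-bug
name: ExampleRisk
message: Placeholder text describing the risk.
matchingRules:
- type: Always
```

Because an Always rule needs no PromQL evaluation, it sidesteps the throttled, one-risk-per-10-minutes evaluation described above, at the cost of flagging the risk even on clusters that are not actually exposed.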

wking · 2023-01-06T18:07:56Z

blocked-edges/4.10.11-parallel-ceph_fsync.yaml

-      topk(1,
-        label_replace(group(ceph_health_status), "ceph", "yes", "", "")
-        or
-        label_replace(0 * group(cluster_version), "ceph", "no", "", "")


Possibly shift this PromQL into the message? Or into the linked bug (although this one links https://bugzilla.redhat.com/show_bug.cgi?id=2076312#c9, which seems to be private)? The current message is phrased as if we know (or suspect) the cluster is exposed, whereas with Always it will also trip for clusters we know are not exposed. Or 🤷, maybe that's more polish than we care about for 4.10.z targets as old as these.

sdodson

I'd prefer to just leave it as is. Only 17% of the 4.10 fleet has updates available to releases < 4.10.17, and those clusters all have paths to better edges listed more prominently.

openshift-ci · 2023-01-06T19:20:21Z

@sdodson: all tests passed!

wking

/lgtm

openshift-ci · 2023-01-06T19:56:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson, wking

Needs approval from an approver in each of these files:

~~OWNERS~~ [wking]

openshift-ci bot requested review from LalatenduMohanty and wking January 6, 2023 17:33

wking reviewed Jan 6, 2023

blocked-edges/4.10.14-release-data-with-hyphen-prefix.yaml (outdated)

wking reviewed Jan 6, 2023

blocked-edges/4.10.0-fc.2-modified-aws-load-balancer-service.yaml (outdated)

sdodson force-pushed the convert-old-promql-to-always branch from abdd39b to 370f8a7 January 6, 2023 18:03

wking reviewed Jan 6, 2023

Remove some old conditional updates which were added for demo purposes

95c1702

wking approved these changes Jan 6, 2023

openshift-ci bot assigned wking Jan 6, 2023

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Jan 6, 2023

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 6, 2023

openshift-merge-robot merged commit 65bca5d into openshift:master Jan 6, 2023

sdodson mentioned this pull request May 9, 2023

Retire some older conditional edges which required PromQL eval #3590

Merged