@sdodson sdodson commented Jan 6, 2023

This works around a CVO bug/design decision where we only evaluate one newly enumerated risk every 10 minutes in an effort to avoid overwhelming the monitoring stack with requests. However, this creates a bad UX: if there are many risks to evaluate in the set of available update paths, it could be (N-1) * 10 minutes before the set of recommended updates is computed.

This preserves the notification of the issue while largely being a no-op, because clusters have now had better update paths available for at least 12 weeks.

We intend to fix the CVO bug, but that won't fix the issue in the deployed fleet.

See: https://issues.redhat.com/browse/OCPBUGS-5469
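
To make the delay concrete, here is a minimal sketch of the arithmetic described above. It assumes the first risk is evaluated immediately and each remaining risk waits one further 10-minute interval, which is what the (N-1) factor implies. The function name and the sample values of N are hypothetical and are not taken from the CVO source.

```python
def worst_case_delay_minutes(num_risks: int, interval_minutes: int = 10) -> int:
    """Minutes until the last of `num_risks` newly enumerated risks is evaluated.

    Assumption: the first risk is evaluated right away, and one more risk
    is evaluated per interval after that.
    """
    if num_risks <= 0:
        return 0
    return (num_risks - 1) * interval_minutes


# Illustrative values of N only; real clusters will differ.
for n in (1, 3, 7):
    print(f"{n} risks -> {worst_case_delay_minutes(n)} minutes")
```

Under these assumptions, seven distinct risks would mean an hour before the recommended set is complete.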

@sdodson sdodson force-pushed the convert-old-promql-to-always branch from abdd39b to 370f8a7 Compare January 6, 2023 18:03
topk(1,
label_replace(group(ceph_health_status), "ceph", "yes", "", "")
or
label_replace(0 * group(cluster_version), "ceph", "no", "", "")

Possibly shift this PromQL into the message? Or the linked bug (although this one links https://bugzilla.redhat.com/show_bug.cgi?id=2076312#c9, which seems to be private)? The current message is phrased as if we know (or suspect) the cluster is exposed, while with Always this will also trip for clusters we know are not exposed. Or 🤷, maybe that's more polish than we care about for 4.10.z targets as old as these.

I'd prefer to just leave it as is. Only 17% of the 4.10 fleet has upgrades to < 4.10.17 and those all have paths to better edges listed more prominently.

openshift-ci bot commented Jan 6, 2023

@sdodson: all tests passed!

@wking wking left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 6, 2023

openshift-ci bot commented Jan 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson, wking

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2023
@openshift-merge-robot openshift-merge-robot merged commit 65bca5d into openshift:master Jan 6, 2023