OTA-1272: pkg/cli/admin/upgrade/recommend: Show most-recent update options #1889

wking · 2024-09-30T17:27:12Z

Building on the CI that merged in #1885.

We had been showing all the recommended updates in decreasing order with --include-not-recommended to see all the updates-with-assessed-risks in decreasing order. But sometimes users want to update to the longest-hop, even if there are known risks. Or they want to read about the assessed risks, in case there's something they can do to their cluster to mitigate a currently-assessed risk before kicking off the update. This commit adjusts the output to order roughly by release freshness For example, for a 4.y cluster in a 4.(y+1) channel:

4.(y+1).tip
4.(y+1).(tip-1)
4.y.tip
4.y.(tip-1)
...

Because users are more likely to care about 4.(y+1).tip, even if it has assessed risks, than they are to care about 4.y.reallyOld, even if it doesn't have assessed risks.

I'm showing two releases per z-stream at the moment, with --show-outdated-releases to see all the results ("releases" to leave the door open to having things like a single-arch and a multi-arch target with the same version string). And I'm dropping --include-not-recommended, now that we are only filtering by age.

I'm also dropping "where are we now?" details, like Progressing=False, because those are less relevant for folks who are trying to pick which release they want to target next. I still call out Failing=True and Progressing=True, because oc adm upgrade ... has client-side checks for those. Folks who are concerned with how their cluster is currently doing can run oc adm upgrade ... or the tech-preview oc adm upgrade status ...).

I'm also dropping ReleaseAccepted, which is in a weird space after the admin requests an update but before the cluster-version operator accepts the retarget request. Because oc adm upgrade is the subcommand used to request an update (--to and --to-image), I think that's the appropriate place to surface that condition (and it already does), so we don't need to surface that condition in the recommend subcommand (which is only relevant in helping the admin decide if and how to request an update).

Renaming 4.14.1-both-available-and-conditional to 4.14.1-all-recommended, because it has no Recommended!=True conditionalUpdates. And all-recommended seems like a more clear way to express that, while "both-available-and-conditional" could sound like "some are conditional and Recommended!=True". Also bringing in some new examples from a 'launch 4.12.16 azure' Cluster Bot install, followed by: $ oc adm upgrade channel stable-4.13 I collected the examples with: $ oc get clusterversion version -o yaml >pkg/cli/admin/upgrade/recommend/examples/4.12.16-...-cv.yaml one early after the install, when the cluster-version operator was slowly populating its cache of PromQL evaluations (later fixed in 4.12.43 [1]). I've named the early-after-install longest-not-recommended, because it has long hops that are not recommended because risks have not been evaluated yet. I've named the while-after-install longest-recommended, because risks have been evaluated. [1]: https://issues.redhat.com/browse/OCPBUGS-22408?focusedId=23491059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23491059

openshift-ci-robot · 2024-09-30T17:27:16Z

@wking: This pull request references OTA-1272 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

Details

In response to this:

Building on the CI that merged in #1885.

We had been showing all the recommended updates in decreasing order with --include-not-recommended to see all the updates-with-assessed-risks in decreasing order. But sometimes users want to update to the longest-hop, even if there are known risks. Or they want to read about the assessed risks, in case there's something they can do to their cluster to mitigate a currently-assessed risk before kicking off the update. This commit adjusts the output to order roughly by release freshness For example, for a 4.y cluster in a 4.(y+1) channel:

4.(y+1).tip

4.(y+1).(tip-1)

4.y.tip

4.y.(tip-1)

...

Because users are more likely to care about 4.(y+1).tip, even if it has assessed risks, than they are to care about 4.y.reallyOld, even if it doesn't have assessed risks.

I'm showing two releases per z-stream at the moment, with --show-outdated-releases to see all the results ("releases" to leave the door open to having things like a single-arch and a multi-arch target with the same version string). And I'm dropping --include-not-recommended, now that we are only filtering by age.

I'm also dropping "where are we now?" details, like Progressing=False, because those are less relevant for folks who are trying to pick which release they want to target next. I still call out Failing=True and Progressing=True, because oc adm upgrade ... has client-side checks for those. Folks who are concerned with how their cluster is currently doing can run oc adm upgrade ... or the tech-preview oc adm upgrade status ...).

I'm also dropping ReleaseAccepted, which is in a weird space after the admin requests an update but before the cluster-version operator accepts the retarget request. Because oc adm upgrade is the subcommand used to request an update (--to and --to-image), I think that's the appropriate place to surface that condition (and it already does), so we don't need to surface that condition in the recommend subcommand (which is only relevant in helping the admin decide if and how to request an update).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

DavidHurta · 2024-10-01T21:57:05Z

/cc

openshift-ci · 2024-10-01T21:57:07Z

@DavidHurta: GitHub didn't allow me to request PR reviews from the following users: DavidHurta.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

petr-muller · 2024-10-02T12:06:18Z

/cc

DavidHurta · 2024-10-04T11:03:01Z

/cc

petr-muller

Several minor code suggestions inline, most are fine to ignore.

Thinking out loud about the proposed output:

Feels like the section that contains available paths should be more prominently separated. Right now there's only the Updates: "header" but it is small and drowns in clutter.
I think we should help the users in understanding that the full list now has version "subsections". There's a little hint about that in the "... and X older 4.13 updates" line that suggests that the above lines are 4.13 updates but I don't think it's enough.
I think interleaving the non-recommended updates as 4-line - minimum - paragraphs with recommended updates rendered as two-item lines formatted as a table is not a great UX, especially when we start interleaving many of them like in the 4.12.16-longest-recommended.show-outdated-releases-output example. I wonder if we could come with some unified presentation for both types (maybe a table with just the version/recommended status/known risk reason in the short form, and a paragraph-per-item for both types of them in long form?

I don't think these need to block this specific PR because we're building the new command in stealth mode and can improve UX as we go, but I wonder if we have a UX target eventually.

pkg/cli/admin/upgrade/recommend/examples/4.12.16-longest-recommended.output

petr-muller · 2024-10-04T11:53:12Z

...min/upgrade/recommend/examples/4.12.16-longest-not-recommended.show-outdated-releases-output

+  Version: 4.13.0
+  Image: quay.io/openshift-release-dev/ocp-release@sha256:74b23ed4bbb593195a721373ed6693687a9b444c97065ce8ac653ba464375711
+  Reason: MultipleReasons
+  Message: Exposure to AcceleratedNetworkingRace is unknown due to an evaluation failure: client-side throttling: only 8m30.898043529s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution
+  Adding a new worker node may fail for clusters running on Azure with Accelerated Networking. https://issues.redhat.com/browse/OPNET-479
+
+  Exposure to MultiNetworkAttachmentsWhereaboutsVersion is unknown due to an evaluation failure: client-side throttling: only 8m30.898047129s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution
+  Upgrade can get stuck on clusters that have multiple network attachments. https://access.redhat.com/solutions/7024726
+
+  Exposure to PerformanceProfilesCPUQuota is unknown due to an evaluation failure: client-side throttling: only 8m30.898050029s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution
+  Pods may not execute properly because of CPU quota issues on clusters using PerformanceProfiles. https://issues.redhat.com/browse/OCPNODE-1705
+
+  New PersistentVolumes are created bound to an invalid symlink instead of the correct one and thus fail once updated to 4.13.4 or later as the invalid symlink dispears. https://issues.redhat.com/browse/COS-2349
+
+  Pods scheduled to nodes running a RHCOS/kernel version with a known defect may fail to start with a CreateContainerError. https://issues.redhat.com/browse/COS-2437
+
+  VERSION     IMAGE
+  4.12.64     quay.io/openshift-release-dev/ocp-release@sha256:669170342e3ae3456b2eb00dd0c45cd817d62af310e0aa2bf214f1da65fae9d0
+  4.12.63     quay.io/openshift-release-dev/ocp-release@sha256:5db6f8dd1db6b9d07dacfa74574a38a6e518145a3c0ab5d895e9c89e029a39e4
+  4.12.61     quay.io/openshift-release-dev/ocp-release@sha256:faa201ea7790bb101de7b9b6f01543136e8c589463b8437a10678da0cdd0197a
+  4.12.60     quay.io/openshift-release-dev/ocp-release@sha256:22174072f8aaef46f80bded25d5a2bdfa6fd8bb01fc5242a07e550f132cafc5a
+  4.12.59     quay.io/openshift-release-dev/ocp-release@sha256:8908e5a46f106e7520faa5f5ee679033c4c2f0026f1b8a339e6b3ea4becdd969
+  4.12.58     quay.io/openshift-release-dev/ocp-release@sha256:b622d947ade4a13754e9d9f6803107f1b321eecd925c897f3c12de1aafccbdc5
+  4.12.57     quay.io/openshift-release-dev/ocp-release@sha256:ebd20cd66e0bbb5bcd7d16559ff4856918b7172e29d6a7bd34d8cc49a556b8f7


Interleaving a multi-paragraph item for a single 4.13.0 release with a table is confusing to me - I know this is a full output but this could happen even in normal output - one item would be heavy like this and then we would have a single-item VERSION/IMAGE table with a version without a risk? I think we should try finding a more consistent presentation.

I agree with your volume concerns (also these). Follow-up work planned in OTA-1273 will add --to and give us the option to avoid rendering messages for some releases. For example, we could go down to one-line-per-target tables, and have a new column for matching risk slugs. But for now, without --to, I think the volume is acceptable while we step towards a more complete solution. Because part of the push here is to put at least the longest-hop risk message in front of the admin, without them having to opt-in to see that text.

and now #1897 is up, and part of what it does is drop the multi-line entries from --show-outdated-releases, which hopefully addresses this concern :).

pkg/cli/admin/upgrade/recommend/examples/4.12.16-longest-not-recommended.output

pkg/cli/admin/upgrade/recommend/examples/4.14.1-all-recommended.output

pkg/cli/admin/upgrade/recommend/recommend.go

We had been showing all the recommended updates in decreasing order with --include-not-recommended to see all the updates-with-assessed-risks in decreasing order. But sometimes users want to update to the longest-hop, even if there are known risks. Or they want to read about the assessed risks, in case there's something they can do to their cluster to mitigate a currently-assessed risk before kicking off the update. This commit adjusts the output to order roughly by release freshness For example, for a 4.y cluster in a 4.(y+1) channel: 4.(y+1).tip 4.(y+1).(tip-1) 4.y.tip 4.y.(tip-1) ... Because users are more likely to care about 4.(y+1).tip, even if it has assessed risks, than they are to care about 4.y.reallyOld, even if it doesn't have assessed risks. I'm showing two releases per z-stream at the moment, with --show-outdated-releases to see all the results ("releases" to leave the door open to having things like a single-arch and a multi-arch target with the same version string). And I'm dropping --include-not-recommended, now that we are only filtering by age. I'm also dropping "where are we now?" details, like Progressing=False, because those are less relevant for folks who are trying to pick which release they want to target next. I still call out Failing=True and Progressing=True, because 'oc adm upgrade ...' has client-side checks for those ("the cluster is experiencing an error reconciling" and "the cluster is already upgrading", with --allow-upgrade-with-warnings). Folks who are concerned with how their cluster is currently doing can run 'oc adm upgrade ...' or the tech-preview 'oc adm upgrade status ...'). I'm also dropping ReleaseAccepted, which is in a weird space after the admin requests an update but before the cluster-version operator accepts the retarget request. Because 'oc adm upgrade' is the subcommand used to request an update (--to and --to-image), I think that's the appropriate place to surface that condition (and it already does), so we don't need to surface that condition in the 'recommend' subcommand (which is only relevant in helping the admin decide if and how to request an update).

petr-muller

/meow

openshift-ci · 2024-10-04T17:14:20Z

@petr-muller:

Details

In response to this:

/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

petr-muller · 2024-10-04T17:14:34Z

/lgtm

openshift-ci · 2024-10-04T17:15:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~pkg/cli/admin/upgrade/OWNERS~~ [petr-muller,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wking · 2024-10-04T19:33:00Z

e2e-agnostic-ovn-cmd:

: [bz-etcd][invariant] alert/etcdHighCommitDurations should not be at or above info	0s
{  etcdHighCommitDurations was at or above info for at least 12s on platformidentification.JobType{Release:"4.18", FromRelease:"", Platform:"azure", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 5m56s, firing for 12s:

Oct 04 18:59:12.584 - 12s   W namespace/openshift-etcd node/10.0.0.4:9979 pod/etcd-ci-op-lz2hjldb-67d47-2h66q-master-0 alert/etcdHighCommitDurations alertstate/firing severity/warning ALERTS{alertname="etcdHighCommitDurations", alertstate="firing", endpoint="etcd-metrics", instance="10.0.0.4:9979", job="etcd", namespace="openshift-etcd", pod="etcd-ci-op-lz2hjldb-67d47-2h66q-master-0", prometheus="openshift-monitoring/k8s", service="etcd", severity="warning"}}

But my tech-preview command isn't overheating etcd, that's between etcd and Azure. Try again:

/test e2e-agnostic-ovn-cmd

openshift-ci · 2024-10-04T21:19:26Z

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-bot · 2024-10-05T00:00:48Z

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-cli
This PR has been included in build openshift-enterprise-cli-container-v4.18.0-202410042310.p0.g026e481.assembly.stream.el9.
All builds following this will include this PR.

openshift-bot · 2024-10-05T00:03:47Z

[ART PR BUILD NOTIFIER]

Distgit: ose-tools
This PR has been included in build ose-tools-container-v4.18.0-202410042310.p0.g026e481.assembly.stream.el9.
All builds following this will include this PR.

openshift-bot · 2024-10-05T00:09:56Z

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-deployer
This PR has been included in build openshift-enterprise-deployer-container-v4.18.0-202410042310.p0.g026e481.assembly.stream.el9.
All builds following this will include this PR.

openshift-bot · 2024-10-05T01:01:04Z

[ART PR BUILD NOTIFIER]

Distgit: ose-cli-artifacts
This PR has been included in build ose-cli-artifacts-container-v4.18.0-202410042310.p0.g026e481.assembly.stream.el9.
All builds following this will include this PR.

This allows us to drop the detailed, multi-line entries from the long --show-outdated-releases output. Now that form just gives us an overview of releases with 'reason' slugs for any updates with assessed issues. Cluster administrators who see something in that overview that they are interested in can make a subsequent call with '--to VERSION' to get the long-form details on that particular target. I'm keeping the long-form output for release-exposed risks in the default rendering though, because part of the motivation behind d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889) is to surface those longest-hop risk messages for cluster administrators, in case the issue is something like EvaluationFailed, which says more about the current cluster state than it does about the health of the update. Because the '--to 4.12.51' runs fail for some of our ClusterVersion example fixtures, I've adjusted examples_test.go to catch and compare output when opts.Run fails.

…leases This allows us to drop the detailed, multi-line entries from the long --show-outdated-releases output. Now that form just gives us an overview of releases with 'reason' slugs for any updates with assessed issues. Cluster administrators who see something in that overview that they are interested in can make a subsequent call with '--version VERSION' to get the long-form details on that particular target. I'm keeping the long-form output for release-exposed risks in the default rendering though, because part of the motivation behind d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889) is to surface those longest-hop risk messages for cluster administrators, in case the issue is something like EvaluationFailed, which says more about the current cluster state than it does about the health of the update. Because the '--version 4.12.51' runs fail for some of our ClusterVersion example fixtures, I've adjusted examples_test.go to catch and compare output when opts.Run fails.

JianLi-RH · 2024-10-22T08:24:47Z

hi @wking could you let me know how to enable the new option --show-outdated-releases? I tried set OC_ENABLE_CMD_UPGRADE_STATUS=true and set OC_ENABLE_CMD_UPGRADE_RECOMMEND=true, but still get error:

 [jianl@jianl-thinkpadt14gen4 418]$ oc adm upgrade --show-outdated-releases
error: unknown flag: --show-outdated-releases
See 'oc adm upgrade --help' for usage.
[jianl@jianl-thinkpadt14gen4 418]$

And how subcommand recommend works, what's the expected output of oc adm upgrade recommend?

Thanks!

…leases This allows us to drop the detailed, multi-line entries from the long --show-outdated-releases output, and the long IMAGE pullspecs from the tabular output. Now those forms just give us an overview of releases with 'reason' slugs for any updates with assessed issues. Cluster administrators who see something in that overview that they are interested in can make a subsequent call with '--version VERSION' to get the long-form details on that particular target. I'm keeping the long-form output for release-exposed risks in the default rendering though, because part of the motivation behind d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889) is to surface those longest-hop risk messages for cluster administrators, in case the issue is something like EvaluationFailed, which says more about the current cluster state than it does about the health of the update. Because the '--version 4.12.51' runs fail for some of our ClusterVersion example fixtures, I've adjusted examples_test.go to catch and compare output when opts.Run fails.

…-outdated-releases Catching up with: * 0cd2e2c (pkg/cli/admin/upgrade/recommend/examples: Update examples, 2024-09-26, openshift#1889)'s rename from 4.14.1-both-available-and-conditional to 4.14.1-all-recommended. * d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889)'s pivot from --include-not-recommended to --show-outdated-releases. * c1ef328 (pkg/cli/admin/upgrade/recommend: Add --version option for specific releases, 2024-10-14, openshift#1897) addition of --version.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 30, 2024

openshift-ci bot requested review from ardaguclu and ingvagabund September 30, 2024 17:27

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2024

wking force-pushed the oc-update-recommend-by-freshness branch from 3a0601a to 46d8a1c Compare September 30, 2024 17:55

openshift-ci bot requested a review from petr-muller October 2, 2024 12:06

openshift-ci bot requested a review from DavidHurta October 4, 2024 11:03

petr-muller reviewed Oct 4, 2024

View reviewed changes

wking force-pushed the oc-update-recommend-by-freshness branch from 46d8a1c to 07d7b03 Compare October 4, 2024 16:33

wking force-pushed the oc-update-recommend-by-freshness branch from 07d7b03 to d8cac25 Compare October 4, 2024 17:06

petr-muller approved these changes Oct 4, 2024

View reviewed changes

openshift-ci bot assigned petr-muller Oct 4, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2024

openshift-merge-bot bot merged commit 026e481 into openshift:master Oct 4, 2024

wking deleted the oc-update-recommend-by-freshness branch October 4, 2024 22:14

wking mentioned this pull request Oct 14, 2024

OTA-1273: pkg/cli/admin/upgrade/recommend: Add --version option for specific releases #1897

Merged

OTA-1272: pkg/cli/admin/upgrade/recommend: Show most-recent update options #1889

OTA-1272: pkg/cli/admin/upgrade/recommend: Show most-recent update options #1889

Uh oh!

Conversation

wking commented Sep 30, 2024

Uh oh!

openshift-ci-robot commented Sep 30, 2024 • edited by openshift-ci bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DavidHurta commented Oct 1, 2024

Uh oh!

openshift-ci bot commented Oct 1, 2024

Uh oh!

petr-muller commented Oct 2, 2024

Uh oh!

DavidHurta commented Oct 4, 2024

Uh oh!

petr-muller left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

petr-muller Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

wking Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

wking Oct 14, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

petr-muller left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-ci bot commented Oct 4, 2024

Uh oh!

petr-muller commented Oct 4, 2024

Uh oh!

openshift-ci bot commented Oct 4, 2024

Uh oh!

wking commented Oct 4, 2024

Uh oh!

openshift-ci bot commented Oct 4, 2024

Uh oh!

openshift-bot commented Oct 5, 2024

Uh oh!

openshift-bot commented Oct 5, 2024

Uh oh!

openshift-bot commented Oct 5, 2024

Uh oh!

openshift-bot commented Oct 5, 2024

Uh oh!

JianLi-RH commented Oct 22, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

openshift-ci-robot commented Sep 30, 2024 •

edited by openshift-ci bot

Loading