-
Notifications
You must be signed in to change notification settings - Fork 427
OTA-1272: pkg/cli/admin/upgrade/recommend: Show most-recent update options #1889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTA-1272: pkg/cli/admin/upgrade/recommend: Show most-recent update options #1889
Conversation
Renaming 4.14.1-both-available-and-conditional to 4.14.1-all-recommended, because it has no Recommended!=True conditionalUpdates. And all-recommended seems like a more clear way to express that, while "both-available-and-conditional" could sound like "some are conditional and Recommended!=True". Also bringing in some new examples from a 'launch 4.12.16 azure' Cluster Bot install, followed by: $ oc adm upgrade channel stable-4.13 I collected the examples with: $ oc get clusterversion version -o yaml >pkg/cli/admin/upgrade/recommend/examples/4.12.16-...-cv.yaml one early after the install, when the cluster-version operator was slowly populating its cache of PromQL evaluations (later fixed in 4.12.43 [1]). I've named the early-after-install longest-not-recommended, because it has long hops that are not recommended because risks have not been evaluated yet. I've named the while-after-install longest-recommended, because risks have been evaluated. [1]: https://issues.redhat.com/browse/OCPBUGS-22408?focusedId=23491059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23491059
|
@wking: This pull request references OTA-1272 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
3a0601a to
46d8a1c
Compare
|
/cc |
|
@DavidHurta: GitHub didn't allow me to request PR reviews from the following users: DavidHurta. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cc |
|
/cc |
petr-muller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several minor code suggestions inline, most are fine to ignore.
Thinking out loud about the proposed output:
- Feels like the section that contains available paths should be more prominently separated. Right now there's only the
Updates:"header" but it is small and drowns in clutter. - I think we should help the users in understanding that the full list now has version "subsections". There's a little hint about that in the "... and X older 4.13 updates" line that suggests that the above lines are 4.13 updates but I don't think it's enough.
- I think interleaving the non-recommended updates as 4-line - minimum - paragraphs with recommended updates rendered as two-item lines formatted as a table is not a great UX, especially when we start interleaving many of them like in the
4.12.16-longest-recommended.show-outdated-releases-outputexample. I wonder if we could come with some unified presentation for both types (maybe a table with just the version/recommended status/known risk reason in the short form, and a paragraph-per-item for both types of them in long form?
I don't think these need to block this specific PR because we're building the new command in stealth mode and can improve UX as we go, but I wonder if we have a UX target eventually.
pkg/cli/admin/upgrade/recommend/examples/4.12.16-longest-recommended.output
Show resolved
Hide resolved
| Version: 4.13.0 | ||
| Image: quay.io/openshift-release-dev/ocp-release@sha256:74b23ed4bbb593195a721373ed6693687a9b444c97065ce8ac653ba464375711 | ||
| Reason: MultipleReasons | ||
| Message: Exposure to AcceleratedNetworkingRace is unknown due to an evaluation failure: client-side throttling: only 8m30.898043529s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution | ||
| Adding a new worker node may fail for clusters running on Azure with Accelerated Networking. https://issues.redhat.com/browse/OPNET-479 | ||
|
|
||
| Exposure to MultiNetworkAttachmentsWhereaboutsVersion is unknown due to an evaluation failure: client-side throttling: only 8m30.898047129s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution | ||
| Upgrade can get stuck on clusters that have multiple network attachments. https://access.redhat.com/solutions/7024726 | ||
|
|
||
| Exposure to PerformanceProfilesCPUQuota is unknown due to an evaluation failure: client-side throttling: only 8m30.898050029s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution | ||
| Pods may not execute properly because of CPU quota issues on clusters using PerformanceProfiles. https://issues.redhat.com/browse/OCPNODE-1705 | ||
|
|
||
| New PersistentVolumes are created bound to an invalid symlink instead of the correct one and thus fail once updated to 4.13.4 or later as the invalid symlink dispears. https://issues.redhat.com/browse/COS-2349 | ||
|
|
||
| Pods scheduled to nodes running a RHCOS/kernel version with a known defect may fail to start with a CreateContainerError. https://issues.redhat.com/browse/COS-2437 | ||
|
|
||
| VERSION IMAGE | ||
| 4.12.64 quay.io/openshift-release-dev/ocp-release@sha256:669170342e3ae3456b2eb00dd0c45cd817d62af310e0aa2bf214f1da65fae9d0 | ||
| 4.12.63 quay.io/openshift-release-dev/ocp-release@sha256:5db6f8dd1db6b9d07dacfa74574a38a6e518145a3c0ab5d895e9c89e029a39e4 | ||
| 4.12.61 quay.io/openshift-release-dev/ocp-release@sha256:faa201ea7790bb101de7b9b6f01543136e8c589463b8437a10678da0cdd0197a | ||
| 4.12.60 quay.io/openshift-release-dev/ocp-release@sha256:22174072f8aaef46f80bded25d5a2bdfa6fd8bb01fc5242a07e550f132cafc5a | ||
| 4.12.59 quay.io/openshift-release-dev/ocp-release@sha256:8908e5a46f106e7520faa5f5ee679033c4c2f0026f1b8a339e6b3ea4becdd969 | ||
| 4.12.58 quay.io/openshift-release-dev/ocp-release@sha256:b622d947ade4a13754e9d9f6803107f1b321eecd925c897f3c12de1aafccbdc5 | ||
| 4.12.57 quay.io/openshift-release-dev/ocp-release@sha256:ebd20cd66e0bbb5bcd7d16559ff4856918b7172e29d6a7bd34d8cc49a556b8f7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interleaving a multi-paragraph item for a single 4.13.0 release with a table is confusing to me - I know this is a full output but this could happen even in normal output - one item would be heavy like this and then we would have a single-item VERSION/IMAGE table with a version without a risk? I think we should try finding a more consistent presentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your volume concerns (also these). Follow-up work planned in OTA-1273 will add --to and give us the option to avoid rendering messages for some releases. For example, we could go down to one-line-per-target tables, and have a new column for matching risk slugs. But for now, without --to, I think the volume is acceptable while we step towards a more complete solution. Because part of the push here is to put at least the longest-hop risk message in front of the admin, without them having to opt-in to see that text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and now #1897 is up, and part of what it does is drop the multi-line entries from --show-outdated-releases, which hopefully addresses this concern :).
pkg/cli/admin/upgrade/recommend/examples/4.12.16-longest-not-recommended.output
Outdated
Show resolved
Hide resolved
pkg/cli/admin/upgrade/recommend/examples/4.14.1-all-recommended.output
Outdated
Show resolved
Hide resolved
46d8a1c to
07d7b03
Compare
We had been showing all the recommended updates in decreasing order
with --include-not-recommended to see all the
updates-with-assessed-risks in decreasing order. But sometimes users
want to update to the longest-hop, even if there are known risks. Or
they want to read about the assessed risks, in case there's something
they can do to their cluster to mitigate a currently-assessed risk
before kicking off the update. This commit adjusts the output to
order roughly by release freshness For example, for a 4.y cluster in a
4.(y+1) channel:
4.(y+1).tip
4.(y+1).(tip-1)
4.y.tip
4.y.(tip-1)
...
Because users are more likely to care about 4.(y+1).tip, even if it
has assessed risks, than they are to care about 4.y.reallyOld, even if
it doesn't have assessed risks.
I'm showing two releases per z-stream at the moment, with
--show-outdated-releases to see all the results ("releases" to leave
the door open to having things like a single-arch and a multi-arch
target with the same version string). And I'm dropping
--include-not-recommended, now that we are only filtering by age.
I'm also dropping "where are we now?" details, like Progressing=False,
because those are less relevant for folks who are trying to pick which
release they want to target next. I still call out Failing=True and
Progressing=True, because 'oc adm upgrade ...' has client-side checks
for those ("the cluster is experiencing an error reconciling" and "the
cluster is already upgrading", with --allow-upgrade-with-warnings).
Folks who are concerned with how their cluster is currently doing can
run 'oc adm upgrade ...' or the tech-preview 'oc adm upgrade status ...').
I'm also dropping ReleaseAccepted, which is in a weird space after the
admin requests an update but before the cluster-version operator
accepts the retarget request. Because 'oc adm upgrade' is the
subcommand used to request an update (--to and --to-image), I think
that's the appropriate place to surface that condition (and it already
does), so we don't need to surface that condition in the 'recommend'
subcommand (which is only relevant in helping the admin decide if and
how to request an update).
07d7b03 to
d8cac25
Compare
petr-muller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/meow
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: petr-muller, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
But my tech-preview command isn't overheating etcd, that's between etcd and Azure. Try again: /test e2e-agnostic-ovn-cmd |
|
@wking: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-cli |
|
[ART PR BUILD NOTIFIER] Distgit: ose-tools |
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-deployer |
|
[ART PR BUILD NOTIFIER] Distgit: ose-cli-artifacts |
This allows us to drop the detailed, multi-line entries from the long --show-outdated-releases output. Now that form just gives us an overview of releases with 'reason' slugs for any updates with assessed issues. Cluster administrators who see something in that overview that they are interested in can make a subsequent call with '--to VERSION' to get the long-form details on that particular target. I'm keeping the long-form output for release-exposed risks in the default rendering though, because part of the motivation behind d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889) is to surface those longest-hop risk messages for cluster administrators, in case the issue is something like EvaluationFailed, which says more about the current cluster state than it does about the health of the update. Because the '--to 4.12.51' runs fail for some of our ClusterVersion example fixtures, I've adjusted examples_test.go to catch and compare output when opts.Run fails.
…leases This allows us to drop the detailed, multi-line entries from the long --show-outdated-releases output. Now that form just gives us an overview of releases with 'reason' slugs for any updates with assessed issues. Cluster administrators who see something in that overview that they are interested in can make a subsequent call with '--version VERSION' to get the long-form details on that particular target. I'm keeping the long-form output for release-exposed risks in the default rendering though, because part of the motivation behind d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889) is to surface those longest-hop risk messages for cluster administrators, in case the issue is something like EvaluationFailed, which says more about the current cluster state than it does about the health of the update. Because the '--version 4.12.51' runs fail for some of our ClusterVersion example fixtures, I've adjusted examples_test.go to catch and compare output when opts.Run fails.
|
hi @wking could you let me know how to enable the new option And how subcommand Thanks! |
…leases This allows us to drop the detailed, multi-line entries from the long --show-outdated-releases output, and the long IMAGE pullspecs from the tabular output. Now those forms just give us an overview of releases with 'reason' slugs for any updates with assessed issues. Cluster administrators who see something in that overview that they are interested in can make a subsequent call with '--version VERSION' to get the long-form details on that particular target. I'm keeping the long-form output for release-exposed risks in the default rendering though, because part of the motivation behind d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889) is to surface those longest-hop risk messages for cluster administrators, in case the issue is something like EvaluationFailed, which says more about the current cluster state than it does about the health of the update. Because the '--version 4.12.51' runs fail for some of our ClusterVersion example fixtures, I've adjusted examples_test.go to catch and compare output when opts.Run fails.
…-outdated-releases Catching up with: * 0cd2e2c (pkg/cli/admin/upgrade/recommend/examples: Update examples, 2024-09-26, openshift#1889)'s rename from 4.14.1-both-available-and-conditional to 4.14.1-all-recommended. * d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889)'s pivot from --include-not-recommended to --show-outdated-releases. * c1ef328 (pkg/cli/admin/upgrade/recommend: Add --version option for specific releases, 2024-10-14, openshift#1897) addition of --version.
…-outdated-releases Catching up with: * 0cd2e2c (pkg/cli/admin/upgrade/recommend/examples: Update examples, 2024-09-26, openshift#1889)'s rename from 4.14.1-both-available-and-conditional to 4.14.1-all-recommended. * d8cac25 (pkg/cli/admin/upgrade/recommend: Show most-recent update options, 2024-09-26, openshift#1889)'s pivot from --include-not-recommended to --show-outdated-releases. * c1ef328 (pkg/cli/admin/upgrade/recommend: Add --version option for specific releases, 2024-10-14, openshift#1897) addition of --version.

Building on the CI that merged in #1885.
We had been showing all the recommended updates in decreasing order with
--include-not-recommendedto see all the updates-with-assessed-risks in decreasing order. But sometimes users want to update to the longest-hop, even if there are known risks. Or they want to read about the assessed risks, in case there's something they can do to their cluster to mitigate a currently-assessed risk before kicking off the update. This commit adjusts the output to order roughly by release freshness For example, for a 4.y cluster in a 4.(y+1) channel:Because users are more likely to care about 4.(y+1).tip, even if it has assessed risks, than they are to care about 4.y.reallyOld, even if it doesn't have assessed risks.
I'm showing two releases per z-stream at the moment, with
--show-outdated-releasesto see all the results ("releases" to leave the door open to having things like a single-arch and a multi-arch target with the same version string). And I'm dropping--include-not-recommended, now that we are only filtering by age.I'm also dropping "where are we now?" details, like
Progressing=False, because those are less relevant for folks who are trying to pick which release they want to target next. I still call outFailing=TrueandProgressing=True, becauseoc adm upgrade ...has client-side checks for those. Folks who are concerned with how their cluster is currently doing can runoc adm upgrade ...or the tech-previewoc adm upgrade status ...).I'm also dropping
ReleaseAccepted, which is in a weird space after the admin requests an update but before the cluster-version operator accepts the retarget request. Becauseoc adm upgradeis the subcommand used to request an update (--toand--to-image), I think that's the appropriate place to surface that condition (and it already does), so we don't need to surface that condition in therecommendsubcommand (which is only relevant in helping the admin decide if and how to request an update).