-
Notifications
You must be signed in to change notification settings - Fork 427
OTA-1560: pkg/cli/admin/upgrade: Promote 'recommend' to general availability #2076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTA-1560: pkg/cli/admin/upgrade: Promote 'recommend' to general availability #2076
Conversation
When I added this option in db01005 (pkg/cli/admin/upgrade/recommend: Add end-to-end output test coverage, 2024-09-24, openshift#1885), I'd been planning on dropping it at GA. But given the current focus on testing, I'm going to keep it through GA. This commit hides the test-focused option, to avoid distracting end users with internal knobs they are unlikely to need access to.
Most OpenShift feature gates are tracked in openshift/api. This one
is just the local environment variable, but we still want to meet the
usual tech-preview-to-GA promotion criteria [1]:
* Tests must contain either [OCPFeatureGate:<FeatureGateName>] or the
standard upstream [FeatureGate:<FeatureGateName>].
This one isn't relevant to this feature, because the test-cases are
ungated to run the tech-preview functionality against production
clusters [2]. As docs say [3]:
Your cluster does not need to be a Technology Preview-enabled
cluster in order for you to use the 'oc adm upgrade recommend'
command.
* There must be at least five tests for each FeatureGate.
Sippy returns exactly five [4]:
[Serial][sig-cli] oc adm upgrade recommend runs successfully, even without upstream OpenShift Update Service customization [Suite:openshift/conformance/serial]
[Serial][sig-cli] oc adm upgrade recommend runs successfully with an empty channel [Suite:openshift/conformance/serial]
[Serial][sig-cli] oc adm upgrade recommend When the update service has conditional recommendations runs successfully when listing all updates [Suite:openshift/conformance/serial]
[Serial][sig-cli] oc adm upgrade recommend When the update service has conditional recommendations runs successfully with conditional recommendations to the --version target [Suite:openshift/conformance/serial]
[Serial][sig-cli] oc adm upgrade recommend When the update service has no recommendations runs successfully [Suite:openshift/conformance/serial]
* Every test must be run on every TechPreview platform we have jobs
for. (Ask for an exception if your feature doesn't support a
variant.)
* Every test must run at least 14 times on every platform/variant.
* Every test must pass at least 95% of the time on every
platform/variant.
Checking on one of the least-run test-cases ("... with conditional
recommendations to the --version target"), with 124 runs [5]:
$ curl -s 'https://sippy.dptools.openshift.org/api/tests?release=4.20&filter=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5BSerial%5D%5Bsig-cli%5D%20oc%20adm%20upgrade%20recommend%20When%20the%20update%20service%20has%20conditional%20recommendations%20runs%20successfully%20with%20conditional%20recommendations%20to%20the%20--version%20target%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&period=default&sortField=delta_from_passing_average&sort=asc&collapse=false' | jq -r '[.[] | select(.variants) | ([.variants[] | select(startswith("Platform:") or startswith("Architecture:") or startswith("Topology:") or startswith("NetworkStack:"))] | tostring) as $platform | {current_runs, current_successes, platform: $platform}] | group_by(.platform)[] | (map(.current_runs) | add | tostring) + " " + (map(.current_successes) | add | tostring) + " " + .[0].platform' | sort -n
1 1 ["Platform:none","Architecture:ppc64le","NetworkStack:ipv4","Topology:ha"]
1 1 ["Platform:none","Architecture:s390x","NetworkStack:ipv4","Topology:ha"]
5 5 ["Platform:openstack","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]
6 6 ["Platform:aws","Architecture:multi","NetworkStack:ipv4","Topology:ha"]
6 6 ["Platform:metal","Architecture:amd64","NetworkStack:ipv6","Topology:ha"]
7 7 ["Platform:metal","Architecture:amd64","NetworkStack:dual","Topology:ha"]
10 10 ["Platform:aws","Architecture:arm64","NetworkStack:ipv4","Topology:ha"]
11 11 ["Platform:gcp","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]
12 12 ["Platform:azure","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]
13 13 ["Platform:metal","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]
14 14 ["Platform:aws","Architecture:amd64","NetworkStack:ipv4","Topology:single"]
17 17 ["Platform:aws","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]
21 21 ["Platform:vsphere","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]
As a platform-agnostic feature, it seems like we need these variants [6]:
{"Cloud":"aws","Architecture":"amd64","Topology":"ha"} (and we have 17)
{"Cloud":"azure","Architecture":"amd64","Topology":"ha"} (and we have 12, a bit short of 14)
{"Cloud":"gcp","Architecture":"amd64","Topology":"ha"} (and we have 11, a bit short of 14)
{"Cloud":"vsphere","Architecture":"amd64","Topology":"ha"} (and we have 21)
{"Cloud":"metal","Architecture":"amd64","Topology":"ha","NetworkStack":"ipv4"} (and we have 13, one short of 14)
{"Cloud":"metal","Architecture":"amd64","Topology":"ha","NetworkStack":"ipv6"} (and we have 6, well short of 14)
{"Cloud":"metal","Architecture":"amd64","Topology":"ha","NetworkStack":"dual"} (and we have 7, well short of 14)
{"Cloud":"aws","Architecture":"amd64","Topology":"single"} (and we have 14)
* Test results are taken from the last 7 days if the test was run at
least 14 times during that period. Otherwise, data from the last 14
days is used.
The tests are still less than a week old, so this isn't relevant.
* Test flakes (even if the test eventually passes on a retry) are
considered failures and negatively impact the pass rate.
Every test-case has a 100% pass rate [4], so this isn't relevant.
So Azure, GCP, and metal are all short. But given this is
platform-agnostic code, and the success rate so far is 100% across all
variants, I'm confident enough to drop the guard now. Waiting another
week would not be wrong either.
[1]: https://github.com/openshift/api/blob/88b2b21555f3c12755740b197cbd5b9b4ca11e19/README.md#defining-featuregate-e2e-tests
[2]: openshift/origin#29831
[3]: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/updating_clusters/performing-a-cluster-update#update-upgrading-oc-adm-upgrade-recommend_updating-cluster-cli
[4]: https://sippy.dptools.openshift.org/sippy-ng/tests/4.20?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522current_runs%2522%252C%2522operatorValue%2522%253A%2522%253E%253D%2522%252C%2522value%2522%253A%25227%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522never-stable%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522aggregated%2522%257D%252C%257B%2522columnField%2522%253A%2522current_flake_percentage%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522%253D%2522%252C%2522value%2522%253A%2522100%2522%257D%252C%257B%2522id%2522%253A99%252C%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522oc%2520adm%2520upgrade%2520recommend%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&sort=asc&sortField=net_improvement
[5]: https://sippy.dptools.openshift.org/sippy-ng/tests/4.20/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5BSerial%5D%5Bsig-cli%5D%20oc%20adm%20upgrade%20recommend%20When%20the%20update%20service%20has%20conditional%20recommendations%20runs%20successfully%20with%20conditional%20recommendations%20to%20the%20--version%20target%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&pageSize=50&test=%5BSerial%5D%5Bsig-cli%5D%20oc%20adm%20upgrade%20recommend%20When%20the%20update%20service%20has%20conditional%20recommendations%20runs%20successfully%20with%20conditional%20recommendations%20to%20the%20--version%20target%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D
[6]: https://github.com/openshift/api/blob/88b2b21555f3c12755740b197cbd5b9b4ca11e19/tools/codegen/cmd/featuregate-test-analyzer.go#L332-L375
|
@wking: This pull request references OTA-1560 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Test passed.
To a version which has risks To a version which do not have risk To an unknown version |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JianLi-RH, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
c5c6334
into
openshift:main
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-cli |
|
[ART PR BUILD NOTIFIER] Distgit: ose-tools |
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-deployer |
|
[ART PR BUILD NOTIFIER] Distgit: ose-cli-artifacts |
| // TODO: We can remove this flag once the idea about `oc adm upgrade recommend` stabilizes and the command | ||
| // is promoted out of the OC_ENABLE_CMD_UPGRADE_RECOMMEND feature gate | ||
| flags.StringVar(&o.mockData.cvPath, "mock-clusterversion", "", "Path to a YAML ClusterVersion object to use for testing (will be removed later).") | ||
| flags.MarkHidden("mock-clusterversion") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! TIL
Most OpenShift feature gates are tracked in openshift/api. This one is just the local environment variable, but we still want to meet the usual tech-preview-to-GA promotion criteria:
This one isn't relevant to this feature, because the test-cases are ungated to run the tech-preview functionality against production clusters (openshift/origin#29831). As docs say:
Sippy returns exactly five:
Checking on one of the least-run test-cases (
... with conditional recommendations to the --version target), with 124 runs:$ curl -s 'https://sippy.dptools.openshift.org/api/tests?release=4.20&filter=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5BSerial%5D%5Bsig-cli%5D%20oc%20adm%20upgrade%20recommend%20When%20the%20update%20service%20has%20conditional%20recommendations%20runs%20successfully%20with%20conditional%20recommendations%20to%20the%20--version%20target%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&period=default&sortField=delta_from_passing_average&sort=asc&collapse=false' | jq -r '[.[] | select(.variants) | ([.variants[] | select(startswith("Platform:") or startswith("Architecture:") or startswith("Topology:") or startswith("NetworkStack:"))] | tostring) as $platform | {current_runs, current_successes, platform: $platform}] | group_by(.platform)[] | (map(.current_runs) | add | tostring) + " " + (map(.current_successes) | add | tostring) + " " + .[0].platform' | sort -n 1 1 ["Platform:none","Architecture:ppc64le","NetworkStack:ipv4","Topology:ha"] 1 1 ["Platform:none","Architecture:s390x","NetworkStack:ipv4","Topology:ha"] 5 5 ["Platform:openstack","Architecture:amd64","NetworkStack:ipv4","Topology:ha"] 6 6 ["Platform:aws","Architecture:multi","NetworkStack:ipv4","Topology:ha"] 6 6 ["Platform:metal","Architecture:amd64","NetworkStack:ipv6","Topology:ha"] 7 7 ["Platform:metal","Architecture:amd64","NetworkStack:dual","Topology:ha"] 10 10 ["Platform:aws","Architecture:arm64","NetworkStack:ipv4","Topology:ha"] 11 11 ["Platform:gcp","Architecture:amd64","NetworkStack:ipv4","Topology:ha"] 12 12 ["Platform:azure","Architecture:amd64","NetworkStack:ipv4","Topology:ha"] 13 13 ["Platform:metal","Architecture:amd64","NetworkStack:ipv4","Topology:ha"] 14 14 ["Platform:aws","Architecture:amd64","NetworkStack:ipv4","Topology:single"] 17 17 ["Platform:aws","Architecture:amd64","NetworkStack:ipv4","Topology:ha"] 21 21 ["Platform:vsphere","Architecture:amd64","NetworkStack:ipv4","Topology:ha"]As a platform-agnostic feature, it seems like we need these variants:
The tests are still less than a week old, so this isn't relevant.
Every test-case has a 100% pass rate, so this isn't relevant.
So Azure, GCP, and metal are all short. But given this is platform-agnostic code, and the success rate so far is 100% across all variants, I'm confident enough to drop the guard now. Waiting another week would not be wrong either.