@jianlinliu jianlinliu commented Nov 29, 2024

Since #59280

# extract_potential_new_caps "${4.15-manifest-dir}" "${4.18-manifest-dir}"
Build Console Insights MachineAPI marketplace OperatorLifecycleManagerV1 Storage

The detected new cap list is not accurate. Since 4.15, only OperatorLifecycleManagerV1 is a genuinely new operator capability, one that does not exist in the source version and exists only in the target version.

Fortunately, the new caps calculated in the first pass between 4.15 and 4.18 are CloudControllerManager Ingress OperatorLifecycleManagerV1, so stripping the list with the output of extract_potential_new_caps removes only OperatorLifecycleManagerV1. That happened to give us a correct implicit and enabled cap list.

Now consider an upgrade between other versions, e.g. 4.13 -> 4.18. The new caps list calculated in the first pass is MachineAPI Build DeploymentConfig ImageRegistry OperatorLifecycleManager CloudCredential CloudControllerManager Ingress OperatorLifecycleManagerV1. Stripping that list with the output of extract_potential_new_caps removes Build together with OperatorLifecycleManagerV1, so Build would not be in the implicit and enabled cap list. But the source version always installs the "Build" operator. The test would therefore exit with a failure, which would be a false failure.

The root cause is that extract_potential_new_caps does not return exactly the caps of operators that were newly introduced after the source version.

To improve it, I introduced the get_transfered_new_caps function, which runs two checks for each new cap:

  1. Check whether the source payload contains the same manifest files for the new cap. If yes, the cap is already known to the source version and is not treated as new.
  2. If check 1 finds nothing, grep the source manifests for resources related to the new cap. If any are found, the cap is likewise already known to the source version.
  3. Otherwise, the cap is identified as a completely new capability that the source version does not know.

The above logic is a bit complicated and not reliable enough, especially if some new special operator is introduced in the future.

So in the end, we decided to introduce a dedicated array that records these newly introduced capabilities.
E.g.:
new_caps_version["OperatorLifecycleManagerV1"]="4.18"
new_caps_version["xxx"]="4.19"
new_caps_version["yyy"]="4.20"
Then the recorded version is compared with the source version to decide which capabilities should be removed from the transfered_new_caps list.
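
The version-keyed lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the merged script: version_lt and filter_new_caps are hypothetical helper names, and only the new_caps_version entry for OperatorLifecycleManagerV1 is taken from this description.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; helper names are illustrative, not from the real script.

# Map each genuinely new capability to the release that first shipped it.
declare -A new_caps_version=(
    ["OperatorLifecycleManagerV1"]="4.18"
)

# Return 0 when the X.Y version in $1 is strictly lower than the one in $2.
version_lt() {
    local a_major a_minor b_major b_minor
    IFS=. read -r a_major a_minor _ <<< "$1"
    IFS=. read -r b_major b_minor _ <<< "$2"
    (( a_major < b_major || (a_major == b_major && a_minor < b_minor) ))
}

# Print the caps from the space-separated list $2 that remain after dropping
# every cap introduced later than the source version $1.
filter_new_caps() {
    local source_ver="$1" cap kept=""
    for cap in $2; do
        if [[ -n "${new_caps_version[$cap]:-}" ]] \
            && version_lt "${source_ver}" "${new_caps_version[$cap]}"; then
            continue  # unknown to the source version, so drop it
        fi
        kept="${kept:+${kept} }${cap}"
    done
    echo "${kept}"
}

filter_new_caps "4.15" "CloudControllerManager Ingress OperatorLifecycleManagerV1"
# prints: CloudControllerManager Ingress
```

With this shape, a cap such as Build is never dropped because it has no entry in the array, which is the property the 4.13 -> 4.18 example needs.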

@openshift-ci openshift-ci bot requested review from JianLi-RH and evakhoni November 29, 2024 09:21
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2024
@jianlinliu jianlinliu force-pushed the upgrade-capa-check-enhance branch from 8030ea8 to 95ec009 Compare November 29, 2024 15:07
@jianlinliu jianlinliu changed the title from "fix" to "enhance the way of getting new capacities list in upgrade cases" Nov 29, 2024
@jianlinliu
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.17-aws-ipi-sno-etcd-encryption-basecap-none-arm-f28 periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-gcp-ipi-basecap-none-additionalcaps-arm-f28

@openshift-ci-robot
Contributor

@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@jianlinliu
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-gcp-ipi-basecap-none-additionalcaps-arm-f28

@jianlinliu
Contributor Author

cc @jiajliu to review.

@jianlinliu
Contributor Author

/pj-rehearse ack

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 2, 2024
@evakhoni
Contributor

evakhoni commented Dec 2, 2024

/uncc

@openshift-ci openshift-ci bot removed the request for review from evakhoni December 2, 2024 10:01
for target_file in ${target_cap_files}; do
    filename=$(basename "${target_file}")
    source_file="$source_dir/$filename"
    if [[ -f "$source_file" ]] && [[ -z "$(grep 'release.openshift.io/feature-set:' $source_file | grep -E 'CustomNoUpgrade|TechPreviewNoUpgrade')" ]]; then
Contributor

Should DevPreviewNoUpgrade be excluded too?

# grep -rh 'release.openshift.io/feature-set:' manifest/|sort -u
    release.openshift.io/feature-set: CustomNoUpgrade
        release.openshift.io/feature-set: CustomNoUpgrade,TechPreviewNoUpgrade
    release.openshift.io/feature-set: CustomNoUpgrade,TechPreviewNoUpgrade
    release.openshift.io/feature-set: Default
    release.openshift.io/feature-set: DevPreviewNoUpgrade
            release.openshift.io/feature-set: TechPreviewNoUpgrade
        release.openshift.io/feature-set: TechPreviewNoUpgrade
    release.openshift.io/feature-set: TechPreviewNoUpgrade

Contributor Author

Done.
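
A minimal sketch of the exclusion with DevPreviewNoUpgrade added to the pattern. The helper name is_default_manifest and the sample manifest are hypothetical; only the annotation key and the three feature-set names come from this thread, and the real script may structure the check differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when the manifest is NOT restricted to one of
# the *NoUpgrade feature sets, i.e. it also applies to a default cluster.
is_default_manifest() {
    ! grep -qE 'release\.openshift\.io/feature-set:.*(CustomNoUpgrade|TechPreviewNoUpgrade|DevPreviewNoUpgrade)' "$1"
}

# Illustrative manifest fragment written to a temporary file.
sample=$(mktemp)
printf '    release.openshift.io/feature-set: DevPreviewNoUpgrade\n' > "${sample}"

if is_default_manifest "${sample}"; then
    echo "default manifest"
else
    echo "feature-gated manifest"
fi
# prints: feature-gated manifest
rm -f "${sample}"
```

Using a single grep -E with an alternation keeps all excluded feature sets in one place, so a further feature set can be added without another pipeline stage.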

for cap in ${candidate_new_caps}; do
    target_cap_files=$(grep -rl "capability.openshift.io/name: ${cap}$" ${target_dir} || true)
    if [[ -z "$target_cap_files" ]]; then
        echo "Did not find out $cap capacity manifest files in target payload"
Contributor

If a new cap is not found in the target payload, that is unexpected, right? Should the test exit?

Contributor Author

@jianlinliu jianlinliu Dec 3, 2024

Take DeploymentConfig as an example: no capability manifest files can be found for it, so this if block was introduced to continue with the next cap.
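
A minimal sketch of why the loop continues instead of exiting. The function name caps_with_manifests and the temporary payload directory are hypothetical; only the grep pattern and the DeploymentConfig example come from this thread.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the caps that have at least one manifest in a
# payload directory, skipping caps for which grep finds nothing.
caps_with_manifests() {
    local target_dir="$1" cap files found=""
    shift
    for cap in "$@"; do
        # "|| true" keeps a no-match grep (exit status 1) from aborting the
        # script when it runs under "set -e".
        files=$(grep -rl "capability.openshift.io/name: ${cap}$" "${target_dir}" || true)
        if [[ -z "${files}" ]]; then
            echo "No manifest files for ${cap} in target payload, skipping" >&2
            continue
        fi
        found="${found:+${found} }${cap}"
    done
    echo "${found}"
}

# Illustrative payload containing a single Build manifest.
target_dir=$(mktemp -d)
printf 'metadata:\n  annotations:\n    capability.openshift.io/name: Build\n' \
    > "${target_dir}/build.yaml"
caps_with_manifests "${target_dir}" Build DeploymentConfig
# prints on stdout: Build
rm -rf "${target_dir}"
```

Treating a missing manifest as "skip" rather than "fail" is the design choice defended here; a stricter variant could exit only for caps that are expected to ship manifests.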

@jianlinliu jianlinliu force-pushed the upgrade-capa-check-enhance branch 2 times, most recently from 4d85627 to c986a9a Compare December 3, 2024 05:33
@openshift-ci-robot openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 3, 2024
@jianlinliu jianlinliu force-pushed the upgrade-capa-check-enhance branch 2 times, most recently from 4b126ea to 1f0fe98 Compare December 3, 2024 05:50
@jianlinliu
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.17-aws-ipi-sno-etcd-encryption-basecap-none-arm-f28

source_ver=$(echo "$source_ver" | cut -d. -f1,2)
for cap in ${candidate_new_caps}; do
    #shellcheck disable=SC2076
    if [[ " ${!new_caps_version[*]} " =~ " ${cap} " ]] && lowerVersion "${source_ver}" "${new_caps_version[$cap]}"; then
Contributor

When will ${source_ver} be greater than ${new_caps_version[$cap]}?

Contributor Author

@jianlinliu jianlinliu Dec 3, 2024

Good question. Currently there is no such case in practice.

But I can foresee one in the future.
Assume that in version 4.N a new operator is introduced and GAed. Let us call it "A". In version 4.N+1, the operator becomes an optional operator.

At first we probably would not notice the new operator, so new_caps_version["A"]="4.N" would not be defined yet.

Assume the condition lowerVersion "${source_ver}" "${new_caps_version[$cap]}" is not part of this check.

Case-1: In a 4.N -> 4.N+1 upgrade, candidate_new_caps is "A". After the function runs, transfered_new_caps is set to "A". The test passes.
Case-2: In a 4.N-1 -> 4.N+1 upgrade, candidate_new_caps is "A". After the function runs, transfered_new_caps is set to "A". The test fails ("A" is not installed in the upgraded cluster, but the test expects it to be enabled).

Once we notice the issue, we define new_caps_version["A"]="4.N".

Case-1: In a 4.N -> 4.N+1 upgrade, candidate_new_caps is "A". After the function runs, transfered_new_caps is empty. The test fails ("A" is installed in the upgraded cluster, but the test expects it to be disabled).
Case-2: In a 4.N-1 -> 4.N+1 upgrade, candidate_new_caps is "A". After the function runs, transfered_new_caps is empty. The test passes.

So we have to use the lowerVersion condition.
Case-1: In a 4.N -> 4.N+1 upgrade, candidate_new_caps is "A". After the function runs, transfered_new_caps is set to "A". The test passes.
Case-2: In a 4.N-1 -> 4.N+1 upgrade, candidate_new_caps is "A". After the function runs, transfered_new_caps is empty. The test passes.

Contributor

Assume that in version 4.N a new operator is introduced and GAed. Let us call it "A". In version 4.N+1, the operator becomes an optional operator.

In this scenario, "A" does not qualify as a cap that was optional when first introduced, right? So "A" should not be defined in new_caps_version[], but only added to caps_string in get_caps_for_version_capset().
Then, in a 4.N -> 4.N+1 upgrade, candidate_new_caps is "A", and after the function runs, transfered_new_caps is still set to "A". The test passes.

In a 4.N-1 -> 4.N+1 upgrade, candidate_new_caps is "A", and after the function runs, transfered_new_caps is still set to "A". The test passes. "A" will be enabled.

Contributor Author

@jianlinliu jianlinliu Dec 4, 2024

In a 4.N-1 -> 4.N+1 upgrade, candidate_new_caps is "A", and after the function runs, transfered_new_caps is still set to "A". The test passes. "A" will be enabled.

For 4.N-1, "A" is totally unknown to the source cluster, so the resources belonging to "A" would not be installed in the fresh install. After the upgrade, no resources belonging to "A" would be installed even when the cluster is on version 4.N+1. The test would fail, because the script expects "A" to be enabled.

Contributor

4.N-1 -> 4.N+1 will go through 4.N first, right? "A" is a new operator but not an optional cap in 4.N, so during 4.N-1 -> 4.N, "A" will be installed by default after the upgrade to 4.N. Then during 4.N -> 4.N+1, "A" stays installed.

Contributor Author

Even when going through 4.N first, "A" would not be installed automatically, although it is not an optional cap in 4.N.

Contributor

Hmm, if "A" is a new (non-optional) operator in 4.N but not in 4.N-1, will "A" be installed when BaselineCapabilitySet=None is set during installation? I just checked the API definition in the docs, and BaselineCapabilitySet=None seems to control only optional caps. So I guess that, since "A" is not among the optional caps, it should be installed, right?

Contributor Author

Maybe you are right. In https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-stable-4.12-upgrade-from-stable-4.11-aws-c2s-ipi-disc-priv-fips-f60/1863851873665028096/artifacts/aws-c2s-ipi-disc-priv-fips-f60/, control-plane-machine-set is introduced in 4.12 for the first time, and after the upgrade from 4.11 to 4.12, the co is installed in the upgraded cluster.

Let me update the script.

Contributor Author

Done. I kept the lowerVersion function in case it is needed in the future.

@jianlinliu jianlinliu force-pushed the upgrade-capa-check-enhance branch from 1f0fe98 to dd88ddf Compare December 3, 2024 12:24
@jianlinliu
Contributor Author

/pj-rehearse ack

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 3, 2024
@jianlinliu jianlinliu force-pushed the upgrade-capa-check-enhance branch from dd88ddf to 76b432c Compare December 4, 2024 04:04
@openshift-ci-robot openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 4, 2024
@jianlinliu
Contributor Author

I just removed one condition in the last commit to loosen the check. The risk of introducing a regression is low, so I will skip pj-rehearse testing.

/pj-rehearse skip

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 4, 2024
@jianlinliu jianlinliu force-pushed the upgrade-capa-check-enhance branch from 76b432c to 2fc03d4 Compare December 4, 2024 04:20
@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@jianlinliu: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-baremetalds-ipi-ovn-lvms-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-4.16-upgrade-from-stable-4.15-aws-ipi-usertags-custom-sg-fips-amd-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-vsphere-ipi-zones-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-vsphere-ipi-ovn-ipsec-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.18-amd64-nightly-4.18-upgrade-from-stable-4.17-baremetal-compact-agent-ipv4-static-connected-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.18-amd64-nightly-4.18-upgrade-from-stable-4.17-vsphere-ipi-proxy-workers-rhel8-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-4.17-upgrade-from-stable-4.17-azure-ipi-ingress-controller-arm-mixarch-f60 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-ibmcloud-ipi-private-fips-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.16-ibmcloud-ipi-private-byo-kms-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-4.12-upgrade-from-stable-4.12-vsphere-agent-disc-sno-f60 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-mixed-apiserver-internal-arm-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-vsphere-agent-dualstack-ha-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-multi-nightly-4.13-upgrade-from-stable-4.12-aws-ipi-usertags-fips-amd-f60 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.14-aws-ipi-disc-priv-f60 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-azure-ipi-ovn-ipsec-arm-mixarch-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-4.17-upgrade-from-stable-4.16-aws-upi-basecap-none-arm-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.16-gcp-ipi-confidential-computing-fips-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-4.17-upgrade-from-stable-4.16-gcp-upi-arm-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-4.17-upgrade-from-stable-4.16-azure-ipi-fullyprivate-internal-registry-arm-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-4.17-upgrade-from-stable-4.16-gcp-ipi-disk-encryption-arm-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.16-vsphere-ipi-zones-multisubnets-external-lb-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.18-amd64-nightly-4.18-upgrade-from-stable-4.17-baremetalds-ipi-ovn-dualstack-primaryv6-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-4.13-upgrade-from-stable-4.13-aws-ipi-disc-priv-sts-ep-fips-f60 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-aro-4.14-aro-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.18-baremetal-ipi-ovn-ipv4-fips-vmedia-amd-f60 N/A periodic Registry content changed

A total of 1160 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci-robot openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 4, 2024
@jiajliu
Contributor

jiajliu commented Dec 4, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 4, 2024
@openshift-ci
Contributor

openshift-ci bot commented Dec 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiajliu, jianlinliu

@openshift-ci
Contributor

openshift-ci bot commented Dec 4, 2024

@jianlinliu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-gcp-ipi-basecap-none-additionalcaps-arm-f28 95ec009 link unknown /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-gcp-ipi-basecap-none-additionalcaps-arm-f28
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.17-aws-ipi-sno-etcd-encryption-basecap-none-arm-f28 1f0fe98 link unknown /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.17-aws-ipi-sno-etcd-encryption-basecap-none-arm-f28

@jianlinliu
Contributor Author

/pj-rehearse skip

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 4, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 59b5671 into openshift:master Dec 4, 2024
pezhang pushed a commit to pezhang/release that referenced this pull request Dec 5, 2024
wangke19 pushed a commit to wangke19/release that referenced this pull request Dec 5, 2024
krishvoor pushed a commit to krishvoor/release that referenced this pull request Jan 29, 2025