-
Notifications
You must be signed in to change notification settings - Fork 2.1k
enhance the way of getting new capacities list in upgrade cases #59355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance the way of getting new capacities list in upgrade cases #59355
Conversation
8030ea8 to
95ec009
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.17-aws-ipi-sno-etcd-encryption-basecap-none-arm-f28 periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-gcp-ipi-basecap-none-additionalcaps-arm-f28 |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.15-gcp-ipi-basecap-none-additionalcaps-arm-f28 |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
cc @jiajliu to review. |
|
/pj-rehearse ack |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/uncc |
| for target_file in ${target_cap_files}; do | ||
| filename=$(basename "${target_file}") | ||
| source_file="$source_dir/$filename" | ||
| if [[ -f "$source_file" ]] && [[ -z "$(grep 'release.openshift.io/feature-set:' $source_file | grep -E 'CustomNoUpgrade|TechPreviewNoUpgrade')" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall DevPreviewNoUpgrade be excluded too?
# grep -rh 'release.openshift.io/feature-set:' manifest/|sort -u
release.openshift.io/feature-set: CustomNoUpgrade
release.openshift.io/feature-set: CustomNoUpgrade,TechPreviewNoUpgrade
release.openshift.io/feature-set: CustomNoUpgrade,TechPreviewNoUpgrade
release.openshift.io/feature-set: Default
release.openshift.io/feature-set: DevPreviewNoUpgrade
release.openshift.io/feature-set: TechPreviewNoUpgrade
release.openshift.io/feature-set: TechPreviewNoUpgrade
release.openshift.io/feature-set: TechPreviewNoUpgrade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| for cap in ${candidate_new_caps}; do | ||
| target_cap_files=$(grep -rl "capability.openshift.io/name: ${cap}$" ${target_dir} || true) | ||
| if [[ -z "$target_cap_files" ]]; then | ||
| echo "Did not find out $cap capacity manifest files in target payload" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the new_cap is not found in the target payload, it should not be expected, right? Should the test exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take DeploymentConfig as example, there is no any capacity manifest files can be found for it, so introduce this if block to continue.
4d85627 to
c986a9a
Compare
4b126ea to
1f0fe98
Compare
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.18-multi-nightly-4.18-upgrade-from-stable-4.17-aws-ipi-sno-etcd-encryption-basecap-none-arm-f28 |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
| source_ver=$(echo "$source_ver" | cut -d. -f1,2) | ||
| for cap in ${candidate_new_caps}; do | ||
| #shellcheck disable=SC2076 | ||
| if [[ " ${!new_caps_version[*]} " =~ " ${cap} " ]] && lowerVersion "${source_ver}" "${new_caps_version[$cap]}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will ${source_ver} > ${new_caps_version[$cap]}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. In the current real world, there is no such case.
But I can foresee a case in the future.
Assumed on 4.N version, a new operator is introduced and GAed. Let us call it "A". On the 4.N+1 version, the operator become an optional operator.
In the beginning, we probably do not notice the new operator, so new_caps_version["A"]="4.N" is not defined yet.
Assumed there is no the condition lowerVersion "${source_ver}" "${new_caps_version[$cap]}" in this check.
Case-1: In 4.N -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is set to "A". It will lead to a passed result.
Case-2: In 4.N-1 -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is set to "A". It will lead to a failed result. (A is not installed in the upgraded cluster, but the testing is expecting it is enabled).
Now we notice the issue, then define new_caps_version["A"]="4.N".
Case-1: In 4.N -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is set to null. It will lead to a failed result (A is installed in the upgraded cluster, but the testing is expecting it is disabled).
Case-2: In 4.N-1 -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is set to null. It will lead to a passed result.
So now we have to use the lowerVersion condition.
Case-1: In 4.N -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is set to "A". It will lead to a passed result.
Case-2: In 4.N-1 -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is set to null. It will lead to a passed result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumed on 4.N version, a new operator is introduced and GAed. Let us call it "A". On the 4.N+1 version, the operator become an optional operator.
In this scenario, "A" does not meet first introduced operational cap, right? So "A" should not be defined in new_caps_version[], but only added into caps_string of get_caps_for_version_capset().
Then, in 4.N -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is still set to "A". It will lead to a passed result.
In 4.N-1 -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is still set to "A". It will lead to a passed result. "A" will be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 4.N-1 -> 4.N+1 upgrade case, candidate_new_caps is "A", after the function is executed, transfered_new_caps is still set to "A". It will lead to a passed result. "A" will be enabled.
For 4.N-1, "A" is totally unknown for source cluster, the resources belonging to "A" would not be installed in the fresh install, after the upgrade, there would be no any resources belonging to "A" installed even when the cluster is on 4.N+1 version. The testing would be failed, because the script is expecting "A" was enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4.N-1 -> 4.N+1 will go to 4.N first, right? "A" is a new operator but not an optional cap in 4.N, so during 4.N-1-> 4.N, "A" will be installed after upgrade to 4.N by default. And then during 4.N->4.N+1, "A" is still installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even go to 4.N first, "A" would not be installed automatically, though it is not an optional cap in 4.N.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if "A" is a new operator(non optional) in 4.N but not 4.N-1, will "A" be installed or not if setting BaselineCapabilitySet=None during installation? I just checked the api definition in doc, BaselineCapabilitySet=None seems only control optional caps. So I guess "A" is not in the optional caps, then it should be installed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you are right, in https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-stable-4.12-upgrade-from-stable-4.11-aws-c2s-ipi-disc-priv-fips-f60/1863851873665028096/artifacts/aws-c2s-ipi-disc-priv-fips-f60/, control-plane-machine-set is introduced in 4.12 for the 1st time, after upgrade to 4.12 from 4.11, the co is installed in the upgraded cluster.
Let me update the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I keep lowerVersion function in case of possible demand in the future.
1f0fe98 to
dd88ddf
Compare
|
/pj-rehearse ack |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
dd88ddf to
76b432c
Compare
|
Just removed one condition in the last commit to loosen the check, low risk to introduce regression issue, would skip pj-rehearse testing. /pj-rehearse skip |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
76b432c to
2fc03d4
Compare
|
[REHEARSALNOTIFIER]
A total of 1160 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jiajliu, jianlinliu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jianlinliu: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/pj-rehearse skip |
|
@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Since #59280
The detected new cap list is not accurate, actually since 4.15, only
OperatorLifecycleManagerV1is a newly introduced fresh operator capacity, which is not existing on the source version, only existing on the target version.Fortunately the new caps calculated for the 1st time between 4.15 and 4.18 is
CloudControllerManager Ingress OperatorLifecycleManagerV1, so when strip the list using the output ofextract_potential_new_caps, only stripOperatorLifecycleManagerV1, that happened to help us get a correct implicit and enabled cap list.Think about that, if the upgrade happened between other versions, e.g: 4.13 -> 4.18, the new caps list calculated for the 1st time is
MachineAPI Build DeploymentConfig ImageRegistry OperatorLifecycleManager CloudCredential CloudControllerManager Ingress OperatorLifecycleManagerV1, when strip the list using the output ofextract_potential_new_caps,Buildwould be removed together withOperatorLifecycleManagerV1, that means,Buildwould not in the implicit and enabled cap list. But actually the source version always install "Build" operator. That would lead the testing exit as failure, but that is wrong.The root cause is
extract_potential_new_capsdid not get the exact caps that newly introduced operator later than source version.To improve it, introduced
get_transfered_new_capsfunction, in the function, it is going to do 2 checks:The above logic is a bit complicated, and not reliable enough, especially some new special operator get introduced in the future.
So as final, we decide to introduce a dedicated array to record these newly introduced capacity.
E.g:
new_caps_version["OperatorLifecycleManagerV1"]="4.18"
new_caps_version["xxx"]="4.19"
new_caps_version["yyy"]="4.20"
Then compare the version number with the source version to decide which capacity should be removed from transfered_new_caps list.