-
Notifications
You must be signed in to change notification settings - Fork 213
[CNF-5645] bump api version to add new capability #801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CNF-5645] bump api version to add new capability #801
Conversation
|
Skipping CI for Draft Pull Request. |
|
/uncc @LalatenduMohanty @vrutkovs |
5f23f41 to
b4c3e04
Compare
|
/test all |
b4c3e04 to
55c8df2
Compare
|
/test all |
55c8df2 to
ec209ec
Compare
|
/test-all |
ec209ec to
6c10a68
Compare
|
/test all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you separate the commit for the changes around Migrate to ConfigMapsLeasesResourceLock . Also the version bump commit should have steps you performed e.g.
$ go get github.com/openshift/api@dc2694e95008443da2f0fb2a3d2f7a552824c79d
$ go mod tidy
$ go mod vendor
$ git add -A go.* vendor
6c10a68 to
df83c81
Compare
df83c81 to
e9d2bbc
Compare
|
Lets wait for merge of #827 and then rebase this PR |
…and B->A jobs We've been using abort-at since way back in 05da8a4 (ci-operator: CVO should perform a rollback test, 2020-01-20, openshift#6780), because it's important to cover both: * Can the proposed CVO successfully roll out the cluster on update? (where we want the new CVO in the target release). * Can the proposed CVO successfully validate an update request and launch the requested replacement CVO? (where we want the new CVO in the source release). However, being different has costs, because not all cluster components are tuned to gracefully handle back-to-back updates, and we have had to make temporary changes like b0b6533 (ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master, 2021-06-17, openshift#19396), before putting our toes back in the water with ad76b3e (Revert "ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master", 2021-08-03, openshift#20875). While discovering chained-update UX issues is useful, CVO presubmits are probably not the best place to do it (it would be better to have a periodic, so you have coverage even when the CVO is not seeing pull request activity). And a more fundamental issue is that recent origin suites use job classification to look up historical disruption levels to limit regression risk [1]. Those lookups are based on inspections of the cluster-under-test [2], and not on the job configuration. And the classification structure only has Release and FromRelease strings [3]. So today, when the suite checks for disruption at the end of an A->B->A abort-at=100 run, it sets {FromRelease: B, Release: A}, and complains when the disruption from the full A->B->A transition overshoots the allowed disruption for a single B->A leg. We could hypothetically address this by plumbing knowledge of the job configuration through the origin suite, so we could compare with expected disruption from other A->B->A runs. But David's not interested in making that sort of change in origin until we can demonstrate it mattering in a periodic that some team has committed to monitor and maintain, and we don't have that today. Another issue with A->B->A tests is that sometimes we make changes to the CVO that are one way, like extending the enumerated list of ClusterVersion capabilities [4]. On the B->A leg, the CVO restores the original ClusterVersion CRD with a restricted capability enum, and subsequent attempts to update the ClusterVersion resource fail like: I0818 17:41:40.580147 1 cvo.go:544] Error handling openshift-cluster-version/version: ClusterVersion.config.openshift.io "version" is invalid: [status.capabilities.enabledCapabilities[0]: Unsupported value: "Console": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[1]: Unsupported value: "Insights": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[2]: Unsupported value: "Storage": supported values: "openshift-samples", "baremetal", "marketplace"] By separating into two presubmits, we should have consistently passing A->B updates, with the rest of the organization helping to keep that job style healthy. And we'll also have B->A updates which look the same to the job classifier, and should be similarly healthy as long as we don't make breaking CVO changes. When we do make breaking CVO changes, we can inspect the results, and: /override e2e-agnostic-upgrade-out-of-change when we feel that the proposed CVO cleanly launched the target, and ignore everything that happened once the target CVO started trying to reconcile the cluster. [1]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/test/extended/util/disruption/backend_sampler_tester.go#L91-L98 [2]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L43-L117 [3]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L16-L23 [4]: openshift/cluster-version-operator#801
go get github.com/openshift/api@427c7cc572807fd1093cc97098b0c78dcfe618c4 go get k8s.io/[email protected] go mod tidy go mod vendor git add -A go.* vendor
With new client-go version 0.24 ConfigMapLock was depricated. This commit migrates to new ConfigMapsLeasesResourceLock mechanism.
update tests with new capabilities brought by new version of openshift/api package
54b69fc to
d0b6729
Compare
|
waiting until openshift/release#31518 get merged to fix the upgrade test |
LalatenduMohanty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LalatenduMohanty, qJkee The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-agnostic-upgrade |
|
openshift/release#31518 should have replaced our A->B->A presubmit with A->B and B->A presubmits. /retest |
|
/test does-not-exist |
|
@wking: The specified target(s) for
Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
heh, hasn't actually landed yet 😂 |
…and B->A jobs (#31518) We've been using abort-at since way back in 05da8a4 (ci-operator: CVO should perform a rollback test, 2020-01-20, #6780), because it's important to cover both: * Can the proposed CVO successfully roll out the cluster on update? (where we want the new CVO in the target release). * Can the proposed CVO successfully validate an update request and launch the requested replacement CVO? (where we want the new CVO in the source release). However, being different has costs, because not all cluster components are tuned to gracefully handle back-to-back updates, and we have had to make temporary changes like b0b6533 (ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master, 2021-06-17, #19396), before putting our toes back in the water with ad76b3e (Revert "ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master", 2021-08-03, #20875). While discovering chained-update UX issues is useful, CVO presubmits are probably not the best place to do it (it would be better to have a periodic, so you have coverage even when the CVO is not seeing pull request activity). And a more fundamental issue is that recent origin suites use job classification to look up historical disruption levels to limit regression risk [1]. Those lookups are based on inspections of the cluster-under-test [2], and not on the job configuration. And the classification structure only has Release and FromRelease strings [3]. So today, when the suite checks for disruption at the end of an A->B->A abort-at=100 run, it sets {FromRelease: B, Release: A}, and complains when the disruption from the full A->B->A transition overshoots the allowed disruption for a single B->A leg. We could hypothetically address this by plumbing knowledge of the job configuration through the origin suite, so we could compare with expected disruption from other A->B->A runs. But David's not interested in making that sort of change in origin until we can demonstrate it mattering in a periodic that some team has committed to monitor and maintain, and we don't have that today. Another issue with A->B->A tests is that sometimes we make changes to the CVO that are one way, like extending the enumerated list of ClusterVersion capabilities [4]. On the B->A leg, the CVO restores the original ClusterVersion CRD with a restricted capability enum, and subsequent attempts to update the ClusterVersion resource fail like: I0818 17:41:40.580147 1 cvo.go:544] Error handling openshift-cluster-version/version: ClusterVersion.config.openshift.io "version" is invalid: [status.capabilities.enabledCapabilities[0]: Unsupported value: "Console": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[1]: Unsupported value: "Insights": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[2]: Unsupported value: "Storage": supported values: "openshift-samples", "baremetal", "marketplace"] By separating into two presubmits, we should have consistently passing A->B updates, with the rest of the organization helping to keep that job style healthy. And we'll also have B->A updates which look the same to the job classifier, and should be similarly healthy as long as we don't make breaking CVO changes. When we do make breaking CVO changes, we can inspect the results, and: /override e2e-agnostic-upgrade-out-of-change when we feel that the proposed CVO cleanly launched the target, and ignore everything that happened once the target CVO started trying to reconcile the cluster. [1]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/test/extended/util/disruption/backend_sampler_tester.go#L91-L98 [2]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L43-L117 [3]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L16-L23 [4]: openshift/cluster-version-operator#801
Not sure what that's about. But digging in to see what's going on with versions: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-into-change/1561830992749006848/artifacts/e2e-agnostic-upgrade-into-change/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .state + " " + .version'
2022-08-22T22:06:47Z 2022-08-22T22:42:12Z Completed 4.12.0-0.ci.test-2022-08-22-214209-ci-op-3mwr1wss-initial
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-into-change/1561830992749006848/artifacts/e2e-agnostic-upgrade-into-change/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-74b4d84766-pkscn_cluster-version-operator.log | head -n1
I0822 22:27:38.568033 1 start.go:23] ClusterVersionOperator v1.0.0-878-g869df9e1-dirty
$ git --no-pager log --oneline -1 origin/master
869df9e1 (HEAD -> master, origin/release-4.13, origin/release-4.12, origin/master, origin/HEAD) Merge pull request #827 from wking/more-cap-sortingSo hooray, it was trying to install the branch-tip A, not the release B with this change mixed in. Maybe the test-platform folks had some kind of node issue that caused them to evict running tests? Anyway, digging in to versions: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-out-of-change/1561830992786755584/artifacts/e2e-agnostic-upgrade-out-of-change/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .state + " " + .version'
2022-08-22T22:30:54Z Partial 4.12.0-0.ci.test-2022-08-22-215104-ci-op-3mwr1wss-latest
2022-08-22T21:59:50Z 2022-08-22T22:23:58Z Completed 4.12.0-0.ci.test-2022-08-22-214209-ci-op-3mwr1wss-initial
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-out-of-change/1561830992786755584/artifacts/e2e-agnostic-upgrade-out-of-change/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-66d4c75496-fhdqg_cluster-version-operator.log | head -n1
I0822 22:30:56.387655 1 start.go:23] ClusterVersionOperator v1.0.0-882-gb44878b1-dirtyHmm, looks like that might be doing an A->B update too. I'll go back and look at openshift/release#31518 to see if I need to change something. In the mean time: /retest |
|
@wking: The specified target(s) for
Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/override ci/prow/e2e-agnostic-upgrade |
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@qJkee: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |

bump openshift/api with new capability