[CNF-5645] bump api version to add new capability #801

qJkee · 2022-07-21T14:56:34Z

bump openshift/api with new capability

openshift-ci · 2022-07-21T14:58:11Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

qJkee · 2022-07-21T15:06:47Z

/uncc @LalatenduMohanty @vrutkovs

qJkee · 2022-07-26T16:25:22Z

/test all

qJkee · 2022-07-27T20:49:46Z

/test all

qJkee · 2022-08-11T16:36:48Z

/test-all

qJkee · 2022-08-15T18:21:18Z

/test all

qJkee · 2022-08-16T13:44:06Z

Hi @rluders. I wanted to mention you because this PR is almost the same as #817 but with newer version of openshift/api and one more capability

LalatenduMohanty

Can you separate the commit for the changes around Migrate to ConfigMapsLeasesResourceLock . Also the version bump commit should have steps you performed e.g.

$ go get github.com/openshift/api@dc2694e95008443da2f0fb2a3d2f7a552824c79d
$ go mod tidy
$ go mod vendor
$ git add -A go.* vendor

LalatenduMohanty · 2022-08-19T17:39:57Z

Lets wait for merge of #827 and then rebase this PR

…and B->A jobs We've been using abort-at since way back in 05da8a4 (ci-operator: CVO should perform a rollback test, 2020-01-20, openshift#6780), because it's important to cover both: * Can the proposed CVO successfully roll out the cluster on update? (where we want the new CVO in the target release). * Can the proposed CVO successfully validate an update request and launch the requested replacement CVO? (where we want the new CVO in the source release). However, being different has costs, because not all cluster components are tuned to gracefully handle back-to-back updates, and we have had to make temporary changes like b0b6533 (ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master, 2021-06-17, openshift#19396), before putting our toes back in the water with ad76b3e (Revert "ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master", 2021-08-03, openshift#20875). While discovering chained-update UX issues is useful, CVO presubmits are probably not the best place to do it (it would be better to have a periodic, so you have coverage even when the CVO is not seeing pull request activity). And a more fundamental issue is that recent origin suites use job classification to look up historical disruption levels to limit regression risk [1]. Those lookups are based on inspections of the cluster-under-test [2], and not on the job configuration. And the classification structure only has Release and FromRelease strings [3]. So today, when the suite checks for disruption at the end of an A->B->A abort-at=100 run, it sets {FromRelease: B, Release: A}, and complains when the disruption from the full A->B->A transition overshoots the allowed disruption for a single B->A leg. We could hypothetically address this by plumbing knowledge of the job configuration through the origin suite, so we could compare with expected disruption from other A->B->A runs. But David's not interested in making that sort of change in origin until we can demonstrate it mattering in a periodic that some team has committed to monitor and maintain, and we don't have that today. Another issue with A->B->A tests is that sometimes we make changes to the CVO that are one way, like extending the enumerated list of ClusterVersion capabilities [4]. On the B->A leg, the CVO restores the original ClusterVersion CRD with a restricted capability enum, and subsequent attempts to update the ClusterVersion resource fail like: I0818 17:41:40.580147 1 cvo.go:544] Error handling openshift-cluster-version/version: ClusterVersion.config.openshift.io "version" is invalid: [status.capabilities.enabledCapabilities[0]: Unsupported value: "Console": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[1]: Unsupported value: "Insights": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[2]: Unsupported value: "Storage": supported values: "openshift-samples", "baremetal", "marketplace"] By separating into two presubmits, we should have consistently passing A->B updates, with the rest of the organization helping to keep that job style healthy. And we'll also have B->A updates which look the same to the job classifier, and should be similarly healthy as long as we don't make breaking CVO changes. When we do make breaking CVO changes, we can inspect the results, and: /override e2e-agnostic-upgrade-out-of-change when we feel that the proposed CVO cleanly launched the target, and ignore everything that happened once the target CVO started trying to reconcile the cluster. [1]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/test/extended/util/disruption/backend_sampler_tester.go#L91-L98 [2]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L43-L117 [3]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L16-L23 [4]: openshift/cluster-version-operator#801

go get github.com/openshift/api@427c7cc572807fd1093cc97098b0c78dcfe618c4 go get k8s.io/[email protected] go mod tidy go mod vendor git add -A go.* vendor

With new client-go version 0.24 ConfigMapLock was depricated. This commit migrates to new ConfigMapsLeasesResourceLock mechanism.

update tests with new capabilities brought by new version of openshift/api package

qJkee · 2022-08-22T00:14:03Z

waiting until openshift/release#31518 get merged to fix the upgrade test

LalatenduMohanty

/lgtm

openshift-ci · 2022-08-22T14:51:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, qJkee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [LalatenduMohanty]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

LalatenduMohanty · 2022-08-22T14:58:56Z

/test e2e-agnostic-upgrade

openshift-ci-robot · 2022-08-22T19:50:36Z

/retest-required

Remaining retests: 2 against base HEAD 869df9e and 8 for PR HEAD d0b6729 in total

wking · 2022-08-22T21:11:29Z

openshift/release#31518 should have replaced our A->B->A presubmit with A->B and B->A presubmits.

/retest

wking · 2022-08-22T21:11:55Z

/test does-not-exist

openshift-ci · 2022-08-22T21:12:04Z

@wking: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-agnostic
/test e2e-agnostic-operator
/test e2e-agnostic-upgrade
/test gofmt
/test images
/test lint
/test unit

Use /test all to run all jobs.

Details

In response to this:

/test does-not-exist

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking · 2022-08-22T21:12:48Z

heh, hasn't actually landed yet 😂

…and B->A jobs (#31518) We've been using abort-at since way back in 05da8a4 (ci-operator: CVO should perform a rollback test, 2020-01-20, #6780), because it's important to cover both: * Can the proposed CVO successfully roll out the cluster on update? (where we want the new CVO in the target release). * Can the proposed CVO successfully validate an update request and launch the requested replacement CVO? (where we want the new CVO in the source release). However, being different has costs, because not all cluster components are tuned to gracefully handle back-to-back updates, and we have had to make temporary changes like b0b6533 (ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master, 2021-06-17, #19396), before putting our toes back in the water with ad76b3e (Revert "ci-operator/config/openshift/cluster-version-operator: Temporarily drop abort-at from master", 2021-08-03, #20875). While discovering chained-update UX issues is useful, CVO presubmits are probably not the best place to do it (it would be better to have a periodic, so you have coverage even when the CVO is not seeing pull request activity). And a more fundamental issue is that recent origin suites use job classification to look up historical disruption levels to limit regression risk [1]. Those lookups are based on inspections of the cluster-under-test [2], and not on the job configuration. And the classification structure only has Release and FromRelease strings [3]. So today, when the suite checks for disruption at the end of an A->B->A abort-at=100 run, it sets {FromRelease: B, Release: A}, and complains when the disruption from the full A->B->A transition overshoots the allowed disruption for a single B->A leg. We could hypothetically address this by plumbing knowledge of the job configuration through the origin suite, so we could compare with expected disruption from other A->B->A runs. But David's not interested in making that sort of change in origin until we can demonstrate it mattering in a periodic that some team has committed to monitor and maintain, and we don't have that today. Another issue with A->B->A tests is that sometimes we make changes to the CVO that are one way, like extending the enumerated list of ClusterVersion capabilities [4]. On the B->A leg, the CVO restores the original ClusterVersion CRD with a restricted capability enum, and subsequent attempts to update the ClusterVersion resource fail like: I0818 17:41:40.580147 1 cvo.go:544] Error handling openshift-cluster-version/version: ClusterVersion.config.openshift.io "version" is invalid: [status.capabilities.enabledCapabilities[0]: Unsupported value: "Console": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[1]: Unsupported value: "Insights": supported values: "openshift-samples", "baremetal", "marketplace", status.capabilities.enabledCapabilities[2]: Unsupported value: "Storage": supported values: "openshift-samples", "baremetal", "marketplace"] By separating into two presubmits, we should have consistently passing A->B updates, with the rest of the organization helping to keep that job style healthy. And we'll also have B->A updates which look the same to the job classifier, and should be similarly healthy as long as we don't make breaking CVO changes. When we do make breaking CVO changes, we can inspect the results, and: /override e2e-agnostic-upgrade-out-of-change when we feel that the proposed CVO cleanly launched the target, and ignore everything that happened once the target CVO started trying to reconcile the cluster. [1]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/test/extended/util/disruption/backend_sampler_tester.go#L91-L98 [2]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L43-L117 [3]: https://github.com/openshift/origin/blob/c33bf438a00bbd66227186f01c7e6a5c36741492/pkg/synthetictests/platformidentification/types.go#L16-L23 [4]: openshift/cluster-version-operator#801

openshift-ci-robot · 2022-08-22T23:32:44Z

/retest-required

Remaining retests: 1 against base HEAD 869df9e and 7 for PR HEAD d0b6729 in total

wking · 2022-08-22T23:35:25Z

into change:

: Run multi-stage test pre phase
..."e2e-agnostic-upgrade-into-change" pod "e2e-agnostic-upgrade-into-change-ipi-install-install-stableinitial" failed: context canceled...

Not sure what that's about. But digging in to see what's going on with versions:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-into-change/1561830992749006848/artifacts/e2e-agnostic-upgrade-into-change/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .state + " " + .version'
2022-08-22T22:06:47Z 2022-08-22T22:42:12Z Completed 4.12.0-0.ci.test-2022-08-22-214209-ci-op-3mwr1wss-initial
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-into-change/1561830992749006848/artifacts/e2e-agnostic-upgrade-into-change/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-74b4d84766-pkscn_cluster-version-operator.log | head -n1
I0822 22:27:38.568033       1 start.go:23] ClusterVersionOperator v1.0.0-878-g869df9e1-dirty
$ git --no-pager log --oneline -1 origin/master
869df9e1 (HEAD -> master, origin/release-4.13, origin/release-4.12, origin/master, origin/HEAD) Merge pull request #827 from wking/more-cap-sorting

So hooray, it was trying to install the branch-tip A, not the release B with this change mixed in.

out of change:

: Run multi-stage test test phase expand_less | 33m55s
-- | --
{  ["e2e-agnostic-upgrade-out-of-change" pod "e2e-agnostic-upgrade-out-of-change-openshift-e2e-test" failed: context canceled
...

Maybe the test-platform folks had some kind of node issue that caused them to evict running tests? Anyway, digging in to versions:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-out-of-change/1561830992786755584/artifacts/e2e-agnostic-upgrade-out-of-change/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .state + " " + .version'
2022-08-22T22:30:54Z  Partial 4.12.0-0.ci.test-2022-08-22-215104-ci-op-3mwr1wss-latest
2022-08-22T21:59:50Z 2022-08-22T22:23:58Z Completed 4.12.0-0.ci.test-2022-08-22-214209-ci-op-3mwr1wss-initial
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/801/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade-out-of-change/1561830992786755584/artifacts/e2e-agnostic-upgrade-out-of-change/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-66d4c75496-fhdqg_cluster-version-operator.log | head -n1
I0822 22:30:56.387655       1 start.go:23] ClusterVersionOperator v1.0.0-882-gb44878b1-dirty

Hmm, looks like that might be doing an A->B update too. I'll go back and look at openshift/release#31518 to see if I need to change something. In the mean time:

/retest

wking · 2022-08-23T02:50:58Z

C'mon Prow, it doesn't exist anymore...

Just in case:

/test ci/prow/e2e-agnostic-upgrade

openshift-ci · 2022-08-23T02:51:04Z

@wking: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-agnostic
/test e2e-agnostic-operator
/test e2e-agnostic-upgrade-into-change
/test e2e-agnostic-upgrade-out-of-change
/test gofmt
/test images
/test lint
/test unit

Use /test all to run all jobs.

Details

In response to this:

C'mon Prow, it doesn't exist anymore...

Just in case:

/test ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking · 2022-08-23T02:51:23Z

/override ci/prow/e2e-agnostic-upgrade

openshift-ci · 2022-08-23T02:51:57Z

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

Details

In response to this:

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2022-08-23T02:51:59Z

@qJkee: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-agnostic-upgrade	`d0b6729`	link	true	`/test e2e-agnostic-upgrade`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2022

openshift-ci bot requested review from LalatenduMohanty and vrutkovs July 21, 2022 15:06

openshift-ci bot removed request for LalatenduMohanty and vrutkovs July 21, 2022 15:07

qJkee force-pushed the CNF-5645-optional-operator branch 3 times, most recently from 5f23f41 to b4c3e04 Compare July 26, 2022 16:23

qJkee force-pushed the CNF-5645-optional-operator branch from b4c3e04 to 55c8df2 Compare July 27, 2022 20:49

qJkee changed the title ~~[WIP] bump api version to add new capability~~ [WIP][CNF-5645] bump api version to add new capability Aug 10, 2022

qJkee force-pushed the CNF-5645-optional-operator branch from 55c8df2 to ec209ec Compare August 11, 2022 16:35

qJkee changed the title ~~[WIP][CNF-5645] bump api version to add new capability~~ [CNF-5645] bump api version to add new capability Aug 11, 2022

qJkee marked this pull request as ready for review August 11, 2022 16:36

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2022

openshift-ci bot requested review from LalatenduMohanty and wking August 11, 2022 16:41

qJkee force-pushed the CNF-5645-optional-operator branch from ec209ec to 6c10a68 Compare August 15, 2022 18:19

LalatenduMohanty suggested changes Aug 16, 2022

View reviewed changes

LalatenduMohanty mentioned this pull request Aug 16, 2022

Bump openshift/api #817

Closed

qJkee force-pushed the CNF-5645-optional-operator branch from 6c10a68 to df83c81 Compare August 16, 2022 16:17

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 16, 2022

qJkee force-pushed the CNF-5645-optional-operator branch from df83c81 to e9d2bbc Compare August 16, 2022 16:23

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 16, 2022

wking mentioned this pull request Aug 19, 2022

ci-operator/config/openshift/cluster-version-operator: Separate A->B and B->A jobs openshift/release#31518

Merged

qJkee added 3 commits August 19, 2022 18:00

vendor: upgrade openshift/api

6a438ff

go get github.com/openshift/api@427c7cc572807fd1093cc97098b0c78dcfe618c4 go get k8s.io/[email protected] go mod tidy go mod vendor git add -A go.* vendor

migrate to ConfigMapsLeasesResourceLock

c153b75

With new client-go version 0.24 ConfigMapLock was depricated. This commit migrates to new ConfigMapsLeasesResourceLock mechanism.

update tests with new capabilities

d0b6729

update tests with new capabilities brought by new version of openshift/api package

qJkee force-pushed the CNF-5645-optional-operator branch from 54b69fc to d0b6729 Compare August 19, 2022 22:00

LalatenduMohanty approved these changes Aug 22, 2022

View reviewed changes

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 22, 2022

openshift-merge-robot merged commit f7f9b8d into openshift:master Aug 23, 2022

rluders mentioned this pull request Aug 24, 2022

Adds Insights, Console, and Storage capabilities openshift/installer#6258

Closed

wking mentioned this pull request Aug 25, 2022

CONSOLE-3220: Add console capability to all manifests openshift/console-operator#665

Merged

[CNF-5645] bump api version to add new capability #801

[CNF-5645] bump api version to add new capability #801

Uh oh!

Conversation

qJkee commented Jul 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci bot commented Jul 21, 2022

Uh oh!

qJkee commented Jul 21, 2022

Uh oh!

qJkee commented Jul 26, 2022

Uh oh!

qJkee commented Jul 27, 2022

Uh oh!

qJkee commented Aug 11, 2022

Uh oh!

qJkee commented Aug 15, 2022

Uh oh!

qJkee commented Aug 16, 2022

Uh oh!

LalatenduMohanty left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LalatenduMohanty commented Aug 19, 2022

Uh oh!

qJkee commented Aug 22, 2022

Uh oh!

LalatenduMohanty left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-ci bot commented Aug 22, 2022

Uh oh!

LalatenduMohanty commented Aug 22, 2022

Uh oh!

openshift-ci-robot commented Aug 22, 2022

Uh oh!

wking commented Aug 22, 2022

Uh oh!

wking commented Aug 22, 2022

Uh oh!

openshift-ci bot commented Aug 22, 2022

Uh oh!

wking commented Aug 22, 2022

Uh oh!

openshift-ci-robot commented Aug 22, 2022

Uh oh!

wking commented Aug 22, 2022

Uh oh!

wking commented Aug 23, 2022

Uh oh!

openshift-ci bot commented Aug 23, 2022

Uh oh!

wking commented Aug 23, 2022

Uh oh!

openshift-ci bot commented Aug 23, 2022

Uh oh!

openshift-ci bot commented Aug 23, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

qJkee commented Jul 21, 2022 •

edited

Loading

LalatenduMohanty left a comment •

edited

Loading