ci-operator/config/openshift/cincinnati-operator: Move e2e-operator to multi-step #12486
@wking: The following tests failed, say
|
Hmm, we need |
Sounds like we need |
|
Not sure the e2e test should be using deploy.sh. That's just a convenience script for deploying manually; it's also used when running the unit tests. I would think we want our e2e test to exercise the way this will be deployed in the field, which is per step 11 of https://github.com/openshift/cincinnati-operator/blob/master/docs/disconnected-cincinnati-operator.md, or at least along those lines, and/or per the OLM-operator CI support doc you referenced. |
|
Also, I'm in the process of fixing deploy.sh because the files it references have changed locations and names. |
32e11cd to
7abfc1a
|
With 32e11cd909 -> 7abfc1a8c4, I've rebased onto master and added |
7abfc1a to
4eb6956
|
Still failing to build |
|
openshift/cincinnati-operator#69 landed. /retest |
|
/test pj-rehearse |
4eb6956 to
f547d44
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/12486/rehearse-12486-pull-ci-openshift-cincinnati-operator-master-operator-e2e/1318745953997426688/build-log.txt | grep -1 'into stable\|panic'
2020/10/21 02:51:30 Build cincinnati-graph-data-container succeeded after 1m28s
2020/10/21 02:51:30 Tagging cincinnati-graph-data-container into stable
2020/10/21 02:51:49 Build cincinnati-operator succeeded after 1m47s
2020/10/21 02:51:49 Tagging cincinnati-operator into stable
2020/10/21 02:51:50 Create release image registry.build01.ci.openshift.org/ci-op-qwfkx85i/release:latest
--
I1021 03:29:37.177795 4880 utils.go:121] Waiting for full availability of cincinnati-operator deployment (0/1)
panic: test timed out after 10m0s
because of:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/12486/rehearse-12486-pull-ci-openshift-cincinnati-operator-master-operator-e2e/1318745953997426688/artifacts/operator-e2e/gather-extra/pods.json | jq -r '.items[] | select(.metadata.name | startswith("cincinnati-operator-")).status.containerStatuses[].state.waiting.message'
Back-off pulling image "registry.svc.ci.openshift.org/ci-op-qwfkx85i/stable:cincinnati-operator"
Like here. The failure is because the operator-repo-hard-coded |
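The jq filter above pulls `state.waiting.message` for every container in pods whose name starts with `cincinnati-operator-`. A minimal Go sketch of the same extraction, for use in a test helper, follows. The `podList` type and `waitingMessages` function are hypothetical and model only the fields the jq filter reads, not the full Kubernetes Pod schema. One deliberate difference: `jq -r` prints `null` for a container that is not waiting, while this sketch skips it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// podList models only the fields of a pods.json dump that the jq filter reads.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			ContainerStatuses []struct {
				State struct {
					Waiting *struct {
						Message string `json:"message"`
					} `json:"waiting"`
				} `json:"state"`
			} `json:"containerStatuses"`
		} `json:"status"`
	} `json:"items"`
}

// waitingMessages returns state.waiting.message for each waiting container in
// pods whose name starts with prefix; non-waiting containers are skipped.
func waitingMessages(podsJSON []byte, prefix string) ([]string, error) {
	var pods podList
	if err := json.Unmarshal(podsJSON, &pods); err != nil {
		return nil, err
	}
	var msgs []string
	for _, pod := range pods.Items {
		if !strings.HasPrefix(pod.Metadata.Name, prefix) {
			continue
		}
		for _, cs := range pod.Status.ContainerStatuses {
			if cs.State.Waiting != nil {
				msgs = append(msgs, cs.State.Waiting.Message)
			}
		}
	}
	return msgs, nil
}

func main() {
	// Illustrative input only; the pod names and messages are made up.
	sample := []byte(`{"items": [
	  {"metadata": {"name": "cincinnati-operator-abc"},
	   "status": {"containerStatuses": [{"state": {"waiting": {"message": "Back-off pulling image \"example\""}}}]}},
	  {"metadata": {"name": "other-pod"},
	   "status": {"containerStatuses": [{"state": {"waiting": {"message": "ignored"}}}]}}
	]}`)
	msgs, err := waitingMessages(sample, "cincinnati-operator-")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```

Surfacing these messages from the test itself would report the pull or exec failure directly instead of only the generic 10m test timeout.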
f547d44 to
268a47a
|
Hrm:
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/12486/rehearse-12486-pull-ci-openshift-cincinnati-operator-master-operator-e2e/1319430438657200128/build-log.txt | grep -1 'into stable\|panic'
2020/10/23 00:15:02 Build cincinnati-operator succeeded after 3m51s
2020/10/23 00:15:02 Tagging cincinnati-operator into stable
2020/10/23 00:26:03 Build cincinnati-graph-data-container succeeded after 14m52s
2020/10/23 00:26:03 Tagging cincinnati-graph-data-container into stable
2020/10/23 00:26:03 Create release image registry.build01.ci.openshift.org/ci-op-q1invh8m/release:latest
--
I1023 01:21:33.184314 4893 utils.go:121] Waiting for full availability of cincinnati-operator deployment (0/1)
panic: test timed out after 10m0s
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/12486/rehearse-12486-pull-ci-openshift-cincinnati-operator-master-operator-e2e/1319430438657200128/artifacts/operator-e2e/gather-extra/pods.json | jq -r '.items[] | select(.metadata.name | startswith("cincinnati-operator-")).status.containerStatuses[].state.waiting.message'
container create failed: time="2020-10-23T01:26:41Z" level=error msg="container_linux.go:366: starting container process caused: exec: \"cincinnati-operator\": executable file not found in $PATH"
Progress 😆 |
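Both build-log checks use `grep -1 'into stable\|panic'`, that is, GNU grep with one line of context around each match and `--` between non-adjacent groups. A small Go sketch reproducing that filter follows, for anyone scripting the same triage without grep. The function names are hypothetical; the match is a plain substring test, which is equivalent to that basic-regex alternation because neither branch uses regex metacharacters.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// matches mirrors the GNU grep basic-regex alternation 'into stable\|panic'.
func matches(line string) bool {
	return strings.Contains(line, "into stable") || strings.Contains(line, "panic")
}

// grepContext returns what `grep -C n` would print: every matching line plus
// n lines of context on each side, with "--" between non-adjacent groups.
func grepContext(lines []string, n int) []string {
	keep := make([]bool, len(lines))
	for i, l := range lines {
		if !matches(l) {
			continue
		}
		for j := i - n; j <= i+n; j++ {
			if j >= 0 && j < len(lines) {
				keep[j] = true
			}
		}
	}
	var out []string
	last := -1
	for i, k := range keep {
		if !k {
			continue
		}
		if last >= 0 && i > last+1 {
			out = append(out, "--") // gap between context groups
		}
		out = append(out, lines[i])
		last = i
	}
	return out
}

func main() {
	// Usage: go run . < build-log.txt
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range grepContext(lines, 1) {
		fmt.Println(l)
	}
}
```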
|
I think that's good enough, and that we should land this as it stands, go clean some stuff up in the operator repo, and then come back and polish off the remaining hacks on this side. |
…o multi-step

Using the openshift-e2e-gcp workflow and overriding the test step per [1] to run our operator tests instead of the usual e2e suite.

I've dropped "cincinnati" from the job name, because this presubmit only runs in the cincinnati-operator repository. The fact that it is operator-e2e is sufficient to distinguish from other presubmits in that repository.

I've dropped "aws" from the job name, because we are platform-agnostic (see ci-operator/platform-balance).

The 'cli: initial' property injects 'oc' into the step container [2,3], because we need 'oc', a Go toolchain, and our source checkout to run the CI suite.

The dependencies avoid [4]:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/12486/rehearse-12486-pull-ci-openshift-cincinnati-operator-master-operator-e2e/1318745953997426688/build-log.txt | grep -1 'into stable\|panic'
2020/10/21 02:51:30 Build cincinnati-graph-data-container succeeded after 1m28s
2020/10/21 02:51:30 Tagging cincinnati-graph-data-container into stable
2020/10/21 02:51:49 Build cincinnati-operator succeeded after 1m47s
2020/10/21 02:51:49 Tagging cincinnati-operator into stable
2020/10/21 02:51:50 Create release image registry.build01.ci.openshift.org/ci-op-qwfkx85i/release:latest
--
I1021 03:29:37.177795 4880 utils.go:121] Waiting for full availability of cincinnati-operator deployment (0/1)
panic: test timed out after 10m0s

because of:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/12486/rehearse-12486-pull-ci-openshift-cincinnati-operator-master-operator-e2e/1318745953997426688/artifacts/operator-e2e/gather-extra/pods.json | jq -r '.items[] | select(.metadata.name | startswith("cincinnati-operator-")).status.containerStatuses[].state.waiting.message'
Back-off pulling image "registry.svc.ci.openshift.org/ci-op-qwfkx85i/stable:cincinnati-operator"

The failure is because the operator-repo-hard-coded registry.svc.ci.openshift.org default does not match the registry.build01.ci.openshift.org where the CI operator was injecting the images. By using explicit dependency images, we drop our reliance on the unreliable operator-repo-hard-coded values.

I'm also setting OPERAND_IMAGE to the most recent published image:

$ skopeo inspect docker://quay.io/app-sre/cincinnati@sha256:d1d2f881bce1a1375ec8470133ee0a912164b8a7ecce19aac24d24e623aef59b | jq -r .Created
2020-10-12T17:08:41.179845937Z

In a future pivot we'll pull the operand image out of CI too, instead of hard-coding. But with this change we at least move the hard-coding into the CI repository.

And I'm clearing OPENSHIFT_BUILD_NAMESPACE, because hack/deploy.sh uses it to clobber both OPERATOR_IMAGE and GRAPH_DATA_IMAGE [4], and we don't want those clobbered anymore. Once we have green CI, we can update the operator repo to simplify the logic.

Generated by editing ci-operator/config and then running:

$ make update

[1]: https://steps.ci.openshift.org/help#config
[2]: openshift/ci-tools#1296
[3]: https://docs.ci.openshift.org/docs/architecture/step-registry/#injecting-the-oc-cli
[4]: https://github.com/openshift/cincinnati-operator/blob/8fce9de9dfe004249b9b19a83d1cbec3c4095965/hack/deploy.sh#L11
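Why clearing OPENSHIFT_BUILD_NAMESPACE matters can be sketched as a small precedence model. This is a hypothetical Go model, not a port of hack/deploy.sh: it assumes the script rewrites both image variables to `registry.svc.ci.openshift.org/${OPENSHIFT_BUILD_NAMESPACE}/stable:<component>` whenever the namespace is set. That shape is inferred from the Back-off pull spec in the log, and the graph-data tag name is borrowed from the build log; neither was checked against the script. `effectiveImages` and the `example.com` references are illustrative.

```go
package main

import "fmt"

// assumedRegistry is the registry seen in the failing pull spec. Treating it
// as the value hack/deploy.sh hard-codes is an assumption of this sketch.
const assumedRegistry = "registry.svc.ci.openshift.org"

// effectiveImages is a hypothetical model of the clobbering described above:
// explicit OPERATOR_IMAGE and GRAPH_DATA_IMAGE values survive only when
// OPENSHIFT_BUILD_NAMESPACE is empty or unset.
func effectiveImages(env map[string]string) map[string]string {
	images := map[string]string{
		"OPERATOR_IMAGE":   env["OPERATOR_IMAGE"],
		"GRAPH_DATA_IMAGE": env["GRAPH_DATA_IMAGE"],
	}
	if ns := env["OPENSHIFT_BUILD_NAMESPACE"]; ns != "" {
		// Assumed pull-spec shape, matching the Back-off message in the log.
		images["OPERATOR_IMAGE"] = fmt.Sprintf("%s/%s/stable:cincinnati-operator", assumedRegistry, ns)
		images["GRAPH_DATA_IMAGE"] = fmt.Sprintf("%s/%s/stable:cincinnati-graph-data-container", assumedRegistry, ns)
	}
	return images
}

func main() {
	explicit := map[string]string{
		"OPERATOR_IMAGE":   "example.com/ci/cincinnati-operator:test",   // hypothetical
		"GRAPH_DATA_IMAGE": "example.com/ci/cincinnati-graph-data:test", // hypothetical
	}

	// Namespace set: the explicit references are discarded.
	withNS := map[string]string{"OPENSHIFT_BUILD_NAMESPACE": "ci-op-qwfkx85i"}
	for k, v := range explicit {
		withNS[k] = v
	}
	fmt.Println(effectiveImages(withNS)["OPERATOR_IMAGE"])

	// Namespace cleared, as this commit does: the explicit references are kept.
	fmt.Println(effectiveImages(explicit)["OPERATOR_IMAGE"])
}
```

The first line printed is the unreachable svc.ci pull spec; the second is the injected reference, which is the behavior the commit wants.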
268a47a to
efcafb6
|
/assign @jottofar |
|
@wking: The following tests failed, say
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jottofar, wking |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@wking: Updated the following 15 configmaps:
…incy

The Cincinnati image is the operand, not the operator. Fixes a typo from efcafb6 (ci-operator/config/openshift/cincinnati-operator: Move e2e-operator to multi-step, 2020-10-06, openshift#12486).
…p-published-graph-data, etc.

Moving to a recent Go builder, based on [1] and:

$ oc -n ocp get -o json imagestream builder | jq -r '.status.tags[] | select(.items | length > 0) | .items[0].created + " " + .tag' | sort | grep golang
...
2023-11-02T19:53:15Z rhel-8-golang-1.18-openshift-4.11
2023-11-02T19:53:23Z rhel-8-golang-1.17-openshift-4.11
2023-11-02T20:49:19Z rhel-8-golang-1.19-openshift-4.13
2023-11-02T20:49:25Z rhel-9-golang-1.19-openshift-4.13
2023-11-02T21:54:25Z rhel-9-golang-1.20-openshift-4.14
2023-11-02T21:54:46Z rhel-8-golang-1.20-openshift-4.14
2023-11-02T21:55:24Z rhel-8-golang-1.19-openshift-4.14
2023-11-02T21:55:29Z rhel-9-golang-1.19-openshift-4.14

I'd tried dropping the build_root stanza, because we didn't seem to need the functionality it delivers [2]. But that removal caused failures like [3]:

Failed to load CI Operator configuration" error="invalid ci-operator config: invalid configuration: when 'images' are specified 'build_root' is required and must have image_stream_tag, project_image or from_repository set" source-file=ci-operator/config/openshift/cincinnati-operator/openshift-cincinnati-operator-master.yaml

And [2] documents a need for Git, which apparently the UBI images don't have. So I'm still using a Go image here, even though we don't need Go, and even though that means some tedious bumping to keep up with RHEL and Go versions instead of floating.

The operators stanza documented in [4] remains largely unchanged, although I did rename 'cincinnati_operand_latest' to 'cincinnati-operand', because these tests use a single operand image, and there is no need to distinguish between multiple operand images with "latest".

The image used for operator-sdk (which I bump to an OpenShift 4.14 base) and its use are documented in [5]. The 4.14 cluster-claim pool I'm transitioning to is listed as healthy in [6]. For the end-to-end tests, we install the operator via the test suite, so we do not need the SDK bits.

I've dropped OPERATOR_IMAGE, because we are well past the transition initiated by eae9d38 (ci-operator/config/openshift/cincinnati-operator: Set RELATED_IMAGE_*, 2021-04-05, openshift#17435) and openshift/cincinnati-operator@799d18525b (Changing the name to make OSBS auto repo/registry replacements to work, 2021-04-06, openshift/cincinnati-operator#104).

I'm consistently using the current Cincinnati operand instead of the pinned one, because we ship the OpenShift Update Service Operator as a bundle with the operator and operand, and while it might be useful to grow update-between-OSUS-releases test coverage, we do not expect long durations of new operators coexisting with old-image operand pods. And we never expect new operators to touch Deployments with old operand images, except to bump them to new operand images.

We'd been using digest-pinned operand images here since efcafb6 (ci-operator/config/openshift/cincinnati-operator: Move e2e-operator to multi-step, 2020-10-06, openshift#12486), where I said:

  In a future pivot we'll pull the operand image out of CI too, instead of hard-coding. But with this change we at least move the hard-coding into the CI repository.

4f46d7e (cincinnati-operator: test operator against released OSUS version and latest master, 2022-01-11, openshift#25152) brought in that floating operand image but, for reasons that I am not clear on, did not drop the digest-pinned operand. I'm dropping it now.

With "which operand image" removed as a differentiator, the remaining differentiators for the end-to-end tests are:

* Which host OpenShift?
  * To protect from "new operators require new platform capabilities not present in older OpenShift releases", we have an old-ocp job. It's currently 4.11, the oldest supported release [7].
  * To protect from "new operators still use platform capabilities that have been removed from development branches of OpenShift", we have a new-ocp job. It's currently 4.14, as the most modern openshift-ci pool in [6], but if there were a 4.15 openshift-ci pool I'd use that to ensure we work on dev-branch engineering candidates like 4.15.0-ec.1.
  * To protect against "HyperShift does something the operator does not expect", we have a hypershift job. I'd prefer to defer "which version?" to the workflow, because we do not expect HyperShift-specific differences to evolve much between 4.y releases, while the APIs used by the operator (Deployments, Services, Routes, etc.) might. But perhaps I'm wrong, and we will see more API evolution during HyperShift minor versions. And in any case, today 4.14 fails with [8]:

      Unable to apply 4.14.1: some cluster operators are not available

    so in the short term I'm going with 4.13, but with a generic name so we only have to bump one place as HyperShift support improves.
  * I'm not worrying about enumerating all the current 4.y options like we had done before. That is more work to maintain, and renaming required jobs confuses Prow and requires an /override of the removed job. It seems unlikely that we work on 4.old, break on some 4.middle, and work again on 4.dev. Again, we can always revisit this if we change our minds about the exposure.
* Which graph-data?
  * To protect against "I updated my OSUS without changing the graph-data image, and it broke", we have published-graph-data jobs. These consume images that were built by previous postsubmits in the cincinnati-graph-data repository.
  * We could theoretically also add coverage for older forms of graph-data images we suspect customers might be using. I'm punting this kind of thing to possible future work, if we decide the exposure is significant enough to warrant ongoing CI coverage.
  * To allow testing new features like serving signatures, we have a local-graph-data job. This consumes a graph-data image built from steps in the operator repository, allowing convenient testing of changes that simultaneously tweak the operator and how the graph-data image is built. For example, [9] injects an image signature into graph-data, and updates graph-data to serve it. I'm setting a GRAPH_DATA environment variable to 'local' to allow the test suite to easily distinguish this case.

[1]: https://docs.ci.openshift.org/docs/architecture/images/#ci-images
[2]: https://docs.ci.openshift.org/docs/architecture/ci-operator/#build-root-image
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/45245/pull-ci-openshift-release-master-generated-config/1720218786344210432
[4]: https://docs.ci.openshift.org/docs/how-tos/testing-operator-sdk-operators/#building-operator-bundles
[5]: https://docs.ci.openshift.org/docs/how-tos/testing-operator-sdk-operators/#simple-operator-installation
[6]: https://docs.ci.openshift.org/docs/how-tos/cluster-claim/#existing-cluster-pools
[7]: https://access.redhat.com/support/policy/updates/openshift/#dates
[8]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/45245/rehearse-45245-pull-ci-openshift-cincinnati-operator-master-operator-e2e-hypershift-local-graph-data/1720287506777247744
[9]: openshift/cincinnati-operator#176
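The builder-tag survey in that commit message is an `oc ... | jq ... | sort | grep golang` pipeline. A Go sketch of the same filter is below, assuming the JSON has already been fetched with `oc -n ocp get -o json imagestream builder`. The `imageStream` type models only the fields the jq expression reads, and `golangBuilderTags` is a hypothetical name. `sort.Strings` is a byte-order sort rather than the locale-aware `sort(1)`. For these same-format UTC timestamps, that still yields chronological order.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// imageStream models only the fields the jq filter reads from the builder
// ImageStream JSON.
type imageStream struct {
	Status struct {
		Tags []struct {
			Tag   string `json:"tag"`
			Items []struct {
				Created string `json:"created"`
			} `json:"items"`
		} `json:"tags"`
	} `json:"status"`
}

// golangBuilderTags mirrors the pipeline: skip tags with no items, emit
// "<items[0].created> <tag>", sort the lines, and keep those containing "golang".
func golangBuilderTags(data []byte) ([]string, error) {
	var is imageStream
	if err := json.Unmarshal(data, &is); err != nil {
		return nil, err
	}
	var lines []string
	for _, t := range is.Status.Tags {
		if len(t.Items) == 0 {
			continue // select(.items | length > 0)
		}
		line := t.Items[0].Created + " " + t.Tag
		if strings.Contains(line, "golang") { // grep golang
			lines = append(lines, line)
		}
	}
	sort.Strings(lines)
	return lines, nil
}

func main() {
	// Illustrative input using two of the tags listed above, plus filtered cases.
	sample := []byte(`{"status": {"tags": [
	  {"tag": "rhel-9-golang-1.20-openshift-4.14", "items": [{"created": "2023-11-02T21:54:25Z"}]},
	  {"tag": "rhel-8-golang-1.18-openshift-4.11", "items": [{"created": "2023-11-02T19:53:15Z"}]},
	  {"tag": "no-items-golang", "items": []},
	  {"tag": "not-a-go-builder", "items": [{"created": "2023-01-01T00:00:00Z"}]}
	]}}`)
	lines, err := golangBuilderTags(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```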
Using the openshift-e2e-gcp workflow and overriding the test step per these docs to run our operator tests instead of the usual e2e suite.

I've dropped "cincinnati" from the job name, because this presubmit only runs in the cincinnati-operator repository. The fact that it is operator-e2e is sufficient to distinguish from other presubmits in that repository.

I've dropped "aws" from the job name, because we are platform-agnostic.

Generated by editing ci-operator/config and then running:

$ make update

WIP because once we get a green rehearsal I'll extend this to cover 4.6+.