OTA-1014: controllers: Add metadata container and Route #176
Conversation
@wking: This pull request references OTA-1014 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
32735af to f463912
 	@echo "Running functional test suite"
 	go clean -testcache
-	go test -timeout 20m -v ./functests/...
+	go test -timeout 20m -v ./functests/... || (oc -n openshift-updateservice adm inspect --dest-dir="$(ARTIFACT_DIR)/inspect" namespace/openshift-updateservice customresourcedefinition/updateservices.updateservice.operator.openshift.io updateservice/example; false)
Could you add a comment on why the clause is here?
It's one of two lines touched by f376675 which explains the addition. I'm fine inlining that commit message in a Makefile comment for folks who prefer to not use blame, if folks want. I'm also fine dropping the commit from the pull once I get CI greened up.
…p-published-graph-data, etc.

Moving to a recent Go builder, based on [1] and:

    $ oc -n ocp get -o json imagestream builder | jq -r '.status.tags[] | select(.items | length > 0) | .items[0].created + " " + .tag' | sort | grep golang
    ...
    2023-11-02T19:53:15Z rhel-8-golang-1.18-openshift-4.11
    2023-11-02T19:53:23Z rhel-8-golang-1.17-openshift-4.11
    2023-11-02T20:49:19Z rhel-8-golang-1.19-openshift-4.13
    2023-11-02T20:49:25Z rhel-9-golang-1.19-openshift-4.13
    2023-11-02T21:54:25Z rhel-9-golang-1.20-openshift-4.14
    2023-11-02T21:54:46Z rhel-8-golang-1.20-openshift-4.14
    2023-11-02T21:55:24Z rhel-8-golang-1.19-openshift-4.14
    2023-11-02T21:55:29Z rhel-9-golang-1.19-openshift-4.14

I'd tried dropping the build_root stanza, because we didn't seem to need the functionality it delivers [2]. But that removal caused failures like [3]:

    Failed to load CI Operator configuration" error="invalid ci-operator config: invalid configuration: when 'images' are specified 'build_root' is required and must have image_stream_tag, project_image or from_repository set" source-file=ci-operator/config/openshift/cincinnati-operator/openshift-cincinnati-operator-master.yaml

And [2] docs a need for Git, which apparently the UBI images don't have. So I'm using a Go image here still, even though we don't need Go, and although that means some tedious bumping to keep up with RHEL and Go versions instead of floating.

The operators stanza doc'ed in [4] remains largely unchanged, although I did rename 'cincinnati_operand_latest' to 'cincinnati-operand', because these tests use a single operand image, and there is no need to distinguish between multiple operand images with "latest".

The image used for operator-sdk (which I bump to an OpenShift 4.14 base) and its use are doc'ed in [5]. The 4.14 cluster-claim pool I'm transitioning to is listed as healthy in [6]. For the end-to-end tests, we install the operator via the test suite, so we do not need the SDK bits. I've dropped OPERATOR_IMAGE, because we are well past the transition initiated by eae9d38 (ci-operator/config/openshift/cincinnati-operator: Set RELATED_IMAGE_*, 2021-04-05, openshift#17435) and openshift/cincinnati-operator@799d18525b (Changing the name to make OSBS auto repo/registry replacements to work, 2021-04-06, openshift/cincinnati-operator#104).

I'm consistently using the current Cincinnati operand instead of the pinned one, because we ship the OpenShift Update Service Operator as a bundle with the operator and operand, and while it might be useful to grow update-between-OSUS-releases test coverage, we do not expect long durations of new operators coexisting with old-image operand pods. And we never expect new operators to touch Deployments with old operand images, except to bump them to new operand images. We'd been using digest-pinned operand images here since efcafb6 (ci-operator/config/openshift/cincinnati-operator: Move e2e-operator to multi-step, 2020-10-06, openshift#12486), where I said:

    In a future pivot we'll pull the operand image out of CI too, instead of hard-coding. But with this change we at least move the hard-coding into the CI repository.

4f46d7e (cincinnati-operator: test operator against released OSUS version and latest master, 2022-01-11, openshift#25152) brought in that floating operand image but, for reasons that I am not clear on, did not drop the digest-pinned operand. I'm dropping it now.

With "which operand image" removed as a differentiator, the remaining differentiators for the end-to-end tests are:

* Which host OpenShift?
  * To protect from "new operators require new platform capabilities not present in older OpenShift releases", we have an old-ocp job. It's currently 4.11, for the oldest supported release [7].
  * To protect from "new operators still use platform capabilities that have been removed from development branches of OpenShift", we have a new-ocp job. It's currently 4.14, as the most modern openshift-ci pool in [6], but if there were a 4.15 openshift-ci pool I'd use that, to ensure we work on dev-branch engineering candidates like 4.15.0-ec.1.
  * To protect against "HyperShift does something the operator does not expect", we have a hypershift job. I'd prefer to defer "which version?" to the workflow, because we do not expect HyperShift-specific differences to evolve much between 4.y releases, while the APIs used by the operator (Deployments, Services, Routes, etc.) might. But perhaps I'm wrong, and we will see more API evolution during HyperShift minor versions. And in any case, today 4.14 fails with [8]:

        Unable to apply 4.14.1: some cluster operators are not available

    so in the short term I'm going with 4.13, but with a generic name so we only have to bump one place as HyperShift support improves.
  * I'm not worrying about enumerating all the current 4.y options like we had done before. That is more work to maintain, and renaming required jobs confuses Prow and requires an /override of the removed job. It seems unlikely that we work on 4.old, break on some 4.middle, and work again on 4.dev. Again, we can always revisit this if we change our minds about the exposure.
* Which graph-data?
  * To protect against "I updated my OSUS without changing the graph-data image, and it broke", we have published-graph-data jobs. These consume images that were built by previous postsubmits in the cincinnati-graph-data repository.
  * We could theoretically also add coverage for older forms of graph-data images we suspect customers might be using. I'm punting this kind of thing to possible future work, if we decide the exposure is significant enough to warrant ongoing CI coverage.
  * To allow testing new features like serving signatures, we have a local-graph-data job. This consumes a graph-data image built from steps in the operator repository, allowing convenient testing of changes that simultaneously tweak the operator and how the graph-data image is built. For example, [9] injects an image signature into graph-data, and updates graph-data to serve it. I'm setting a GRAPH_DATA environment variable to 'local' to allow the test suite to easily distinguish this case.

[1]: https://docs.ci.openshift.org/docs/architecture/images/#ci-images
[2]: https://docs.ci.openshift.org/docs/architecture/ci-operator/#build-root-image
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/45245/pull-ci-openshift-release-master-generated-config/1720218786344210432
[4]: https://docs.ci.openshift.org/docs/how-tos/testing-operator-sdk-operators/#building-operator-bundles
[5]: https://docs.ci.openshift.org/docs/how-tos/testing-operator-sdk-operators/#simple-operator-installation
[6]: https://docs.ci.openshift.org/docs/how-tos/cluster-claim/#existing-cluster-pools
[7]: https://access.redhat.com/support/policy/updates/openshift/#dates
[8]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/45245/rehearse-45245-pull-ci-openshift-cincinnati-operator-master-operator-e2e-hypershift-local-graph-data/1720287506777247744
[9]: openshift/cincinnati-operator#176
/retest-required
@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
e99c80d to f605e05
Local graph-data (where this pull injects a signature) shows this working 🎉:
PratikMahajan
left a comment
/lgtm
/hold for other reviewers
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: PratikMahajan, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Name:      name,
Namespace: instance.Namespace,
Labels: map[string]string{
	"app": name,
nit: I would expect the app label to be the same as the route that exposes the service? But I haven't looked too deeply into how we organize OSUS resources 🤷
newMetadataService is using nameMetadataService for app and its own name. This pattern-matches the existing newPolicyEngineService.
newMetadataRoute is using nameMetadataRoute for its own name, and nameDeployment for app. This pattern-matches the existing newPolicyEngineRoute.
There are some existing patterns that don't make sense to me, like why the Services expose the status ports that I'd expect only the kubelet to need access to (and the kubelet gets at the containers without passing through the Service). But I've left that kind of refactoring to follow-up work and just matched existing patterns for this new feature.
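For readers who have not opened the controller code, here is a minimal Go sketch of the naming pattern described above. The helper functions (nameMetadataService, nameMetadataRoute, nameDeployment) and their string results are assumptions standing in for whatever the repository really uses; only the "app"-label choices mirror what this thread describes for newPolicyEngineService and newPolicyEngineRoute.

```go
package controllers

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	routev1 "github.com/openshift/api/route/v1"
)

// Hypothetical naming helpers; the real implementations live elsewhere.
func nameMetadataService(instance string) string { return instance + "-metadata" }
func nameMetadataRoute(instance string) string   { return instance + "-metadata-route" }
func nameDeployment(instance string) string      { return instance }

// newMetadataService labels the Service with its own name,
// pattern-matching newPolicyEngineService.
func newMetadataService(instance, namespace string) *corev1.Service {
	name := nameMetadataService(instance)
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
			Labels:    map[string]string{"app": name},
		},
	}
}

// newMetadataRoute uses its own name for Name, but the Deployment's name for
// the "app" label, pattern-matching newPolicyEngineRoute.
func newMetadataRoute(instance, namespace string) *routev1.Route {
	return &routev1.Route{
		ObjectMeta: metav1.ObjectMeta{
			Name:      nameMetadataRoute(instance),
			Namespace: namespace,
			Labels:    map[string]string{"app": nameDeployment(instance)},
		},
	}
}
```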
updated.Spec.TLS = route.Spec.TLS

// found existing resource; let's compare and update if needed
if !reflect.DeepEqual(updated.Spec, route.Spec) {
I'd expect controllers to use semantic.DeepEqual from apimachinery? https://github.com/kubernetes/apimachinery/blob/master/pkg/api/equality/semantic.go
But probably not a big deal in this case.
I'm copy/pasting existing patterns, so I'll punt this pivot to follow-up work.
I would prefer semantic.DeepEqual from apimachinery, for the same reason semantic.DeepEqual exists in apimachinery in the first place. I would guess that it is more future-proof.
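For context, a minimal sketch of the pivot being discussed: swapping reflect.DeepEqual for apimachinery's equality.Semantic in the compare-and-update step. The function name and Route types here are illustrative, not the change itself.

```go
package controllers

import (
	"k8s.io/apimachinery/pkg/api/equality"

	routev1 "github.com/openshift/api/route/v1"
)

// routeSpecNeedsUpdate sketches the comparison step using apimachinery's
// Semantic equality, which compares API types such as resource.Quantity and
// metav1.Time by value rather than by representation, and is the comparison
// most controllers reach for instead of plain reflect.DeepEqual.
func routeSpecNeedsUpdate(existing, desired *routev1.Route) bool {
	return !equality.Semantic.DeepEqual(existing.Spec, desired.Spec)
}
```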
Catching up with openshift/cincinnati@efe98dcbbbc6 (add metadata-helper deployments, 2023-07-18, openshift/cincinnati#816), allowing users to retrieve signatures from the metadata Route. For signatures provided via the graph-data image, this will provide more convenient access than pushing signature ConfigMaps to individual clusters. [1] is in flight with a proposed mechanism to configure clusters to consume this signature-metadata endpoint.

I'm using the multi-arch 4.13.0 as the example release for signatures:

    $ curl -s 'https://api.openshift.com/api/upgrades_info/graph?channel=stable-4.13&arch=multi' | jq -r '.nodes[] | select(.version == "4.13.0").payload'
    quay.io/openshift-release-dev/ocp-release@sha256:beda83fb057e328d6f94f8415382350ca3ddf99bb9094e262184e0f127810ce0

The signature location in the graph-data image is defined in openshift/cincinnati-graph-data@9e9e97cf2a (README: Define a 1.2.0 filesystem schema for release signatures, 2023-04-19, openshift/cincinnati-graph-data#3509).

The GRAPH_DATA local check consumes openshift/release@23d93465e8 (ci-operator/config/openshift/cincinnati-operator: operator-e2e-old-ocp-published-graph-data, etc., 2023-11-02, openshift/release#45245), which sets that variable for the operator-e2e-hypershift-local-graph-data presubmit, which consumes the graph-data image built from dev/Dockerfile (where we inject the signature we're testing for). The other end-to-end tests will consume external graph-data images (built by cincinnati-graph-data postsubmits), have GRAPH_DATA unset, and expect '404 Not Found' for requests for that signature.

[1]: openshift/enhancements#1485
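A minimal sketch of how a functional test might gate its expectations on that GRAPH_DATA variable. The signature path layout, the METADATA_URI environment variable, and the test name are illustrative assumptions, not the actual functests code; only the digest and the local-vs-404 behavior come from the commit message above.

```go
package functests

import (
	"fmt"
	"net/http"
	"os"
	"testing"
)

// Assumed signature path for the injected 4.13.0 release; the digest is the
// multi-arch 4.13.0 payload digest quoted in the commit message.
const signaturePath = "signatures/sha256=beda83fb057e328d6f94f8415382350ca3ddf99bb9094e262184e0f127810ce0/signature-1"

func TestMetadataSignature(t *testing.T) {
	metadataURI := os.Getenv("METADATA_URI") // assumption: exported by the test setup
	resp, err := http.Get(fmt.Sprintf("%s/%s", metadataURI, signaturePath))
	if err != nil {
		t.Fatalf("requesting signature: %v", err)
	}
	defer resp.Body.Close()

	// Only the local-graph-data job injects the signature; published
	// graph-data images are expected to return 404 for it.
	want := http.StatusNotFound
	if os.Getenv("GRAPH_DATA") == "local" {
		want = http.StatusOK
	}
	if resp.StatusCode != want {
		t.Fatalf("got HTTP %d, want %d", resp.StatusCode, want)
	}
}
```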
Using:

    $ go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.13.0
    $ controller-gen rbac:roleName=updateservice-operator crd paths=./... output:crd:dir=config/crd/bases
    $ git add -p  # avoid dropping additionalPrinterColumns

with:

    $ go version
    go version go1.21.1 linux/amd64
    $ controller-gen --version  # not surprising since I installed 0.13.0 with 'go install ...' above
    Version: v0.13.0

where additionalPrinterColumns is from bf6d5e5 (Add additionalPrinterColumns to the UpdateService crd, 2021-12-13, openshift#138).
To gather the namespace and UpdateService resources after test-failures, because these aren't getting collected by HyperShift's gathering tools today.
f605e05 to 68f74fc
New changes are detected. LGTM label has been removed.
/hold cancel
Pivot to
Generated with:

    $ cp config/crd/bases/updateservice.operator.openshift.io_updateservices.yaml bundle/manifests/updateservice.operator.openshift.io_updateservices.yaml

to catch up with 6b266d2 (config: Regenerate UpdateService CRD, 2023-10-03, openshift#176). 24c7382 (Include bundle manifests and Dockerfile, 2023-08-08, openshift#173) suggests I could have done this with 'make bundle VERSION=5.0.3-dev' or some such, but I don't have operator-sdk installed at the moment.
…sureDeployment
Avoiding "encountered unexpected container in pod" issues [1]:
# ./oc -n install-osus-here logs updateservice-operator-5bdfcff5c5-wqjdt|grep metadata|tail -n2
1.701074599173428e+09 INFO controller_updateservice Updating Service {"Request.Namespace": "install-osus-here", "Request.Name": "sample", "Namespace": "install-osus-here", "Name": "sample-metadata"}
1.7010745992028923e+09 INFO controller_updateservice encountered unexpected container in pod {"Request.Namespace": "install-osus-here", "Request.Name": "sample", "Container.Name": "metadata"}
which is also seen in [2] in this CI run [3]. Fixing up 425be2f
(controllers: Add metadata container and Route, 2023-09-12, openshift#176).
[1]: https://issues.redhat.com/browse/OTA-958?focusedId=23535225&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23535225
[2]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cincinnati-operator/176/pull-ci-openshift-cincinnati-operator-master-operator-e2e-hypershift-local-graph-data/1722333693256667136/artifacts/operator-e2e-hypershift-local-graph-data/e2e-test/artifacts/inspect/namespaces/openshift-updateservice/pods/updateservice-operator-55d95555b5-8cft9/updateservice-operator/updateservice-operator/logs/current.log
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cincinnati-operator/176/pull-ci-openshift-cincinnati-operator-master-operator-e2e-hypershift-local-graph-data/1722333693256667136
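To make that log line easier to place, here is a rough, assumed sketch of the kind of container-matching loop in a deployment reconciler that would emit "encountered unexpected container in pod". The actual ensureDeployment code in this repository may be structured differently; the point is only that such a loop recognizes containers by name, so the new metadata container has to be added to the set it knows about.

```go
package controllers

import (
	"github.com/go-logr/logr"
	corev1 "k8s.io/api/core/v1"
)

// reconcileContainers is a guess at the shape of the loop behind the warning:
// "desired" maps the container names the reconciler knows how to manage onto
// their desired specs; any container it does not recognize is logged and
// skipped. Before the fix, the new "metadata" container was not in that set,
// so every reconcile of an existing Deployment logged the warning.
func reconcileContainers(existing []corev1.Container, desired map[string]corev1.Container, log logr.Logger) []corev1.Container {
	out := make([]corev1.Container, 0, len(existing))
	for _, c := range existing {
		want, ok := desired[c.Name]
		if !ok {
			log.Info("encountered unexpected container in pod", "Container.Name", c.Name)
			continue
		}
		out = append(out, want)
	}
	return out
}
```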
The Go-side 'optional' marker was not enough to make these properties
optional, so talk to kubebuilder directly. We need metadataURI to be
optional to avoid [1]:
$ oc -n openshift-update-service get -o json installplan install-wpgw2 | jq '.status.conditions[]'
{
"lastTransitionTime": "2024-05-20T07:05:46Z",
"lastUpdateTime": "2024-05-20T07:05:46Z",
"message": "error validating existing CRs against new CRD's schema for \"updateservices.updateservice.operator.openshift.io\": error validating updateservice.operator.openshift.io/v1, Kind=UpdateService \"openshift-update-service/sample\": updated validation is too restrictive: [].status.metadataURI: Required value",
"reason": "InstallComponentFailed",
"status": "False",
"type": "Installed"
}
We need those existing 5.0.2 UpdateService to be compatible with the
incoming 5.0.3 CustomResourceDefinition, so we can install the new CRD
and updated operator, so the incoming operator can populate the new
metadataURI property.
While I was fixing that property, I'm adding the same comment to
policyEngineURI for consistency. This will allow the operator to do
things like setting conditions complaining about difficulty
provisioning the Route needed to figure out the policyEngineURI.
After updating the Go, I updated config/crd/bases/... like 6b266d2
(config: Regenerate UpdateService CRD, 2023-10-03, openshift#176):
$ controller-gen rbac:roleName=updateservice-operator crd paths=./... output:crd:dir=config/crd/bases
$ git add -p # preserve additionalPrinterColumns
using the same:
$ controller-gen --version
Version: v0.13.0
I'd built in 6b266d2 (there may have been subsequent releases; I
haven't checked). Then dropping the additionalPrinterColumns changes:
$ git restore config/crd/bases/updateservice.operator.openshift.io_updateservices.yaml
And copying into the bundle as I'd done in e5716f8
(bundle/manifests: Update UpdateService CRD to pick up metadataURI,
2023-11-22, openshift#179):
$ cp config/crd/bases/updateservice.operator.openshift.io_updateservices.yaml bundle/manifests/updateservice.operator.openshift.io_updateservices.yaml
[1]: https://issues.redhat.com/browse/OCPBUGS-33939
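For reference, a small sketch of the kind of status-field markers the commit message describes. The package name, Go field names, and doc comments are placeholders, but +optional and +kubebuilder:validation:Optional are the standard controller-gen way to make a property optional in the generated CRD.

```go
package v1

// UpdateServiceStatus sketch: only the two URI properties discussed above are shown.
type UpdateServiceStatus struct {
	// policyEngineURI exposes the policy-engine Route, once it exists.
	// +optional
	// +kubebuilder:validation:Optional
	PolicyEngineURI string `json:"policyEngineURI,omitempty"`

	// metadataURI exposes the metadata Route, once it exists. Marked optional
	// so existing 5.0.2 UpdateService resources still validate against the
	// 5.0.3 CRD before the updated operator populates the new property.
	// +optional
	// +kubebuilder:validation:Optional
	MetadataURI string `json:"metadataURI,omitempty"`
}
```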