NE-2022: Bump to OSSM 3.0.1 and Istio 1.24.4 #1227
@Miciah: This pull request references NE-2022 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.
e2e-hypershift failed because […]
e2e-aws-gatewayapi failed because […]. Once the httproute is accepted, the […]. I will push a commit to increase the timeout to 10 minutes.
It is less clear why […]. I do see that the Istiod pod starts failing readiness probes at T18:31:55 and gets stopped at T18:34:10, with a new pod created at T18:34:11: […] Maybe […].
Finally, during Istiod's shutdown, it logged some permissions errors: […] I'll check with the Service Mesh team about those errors.
e2e-aws-operator failed on […]. This might reflect a real regression in OSSM 3.0.1. However, it timed out waiting for the DNS record to resolve: […] It is possible that DNS caching or propagation delay caused this failure.
/test e2e-aws-operator
Actually, in the e2e-aws-operator job, it looks like the DNS name did eventually resolve (though it took a while), but then the connection timed out.
/retest
e2e-aws-operator failed in […]
Branch updated from 0478030 to d325731.
https://github.com/openshift/cluster-ingress-operator/compare/0478030d2564a309f2514e80eb97c0dd65804fdb..d3257317c7c0c8e14c8c0704b2a40bd81f95b527 sets […]
We are now blocked on #1236 for the Go 1.24 bump, which the sail-operator bump requires.
#1236 has merged.
/retest
Branch updated from d325731 to ece76fd.
https://github.com/openshift/cluster-ingress-operator/compare/d3257317c7c0c8e14c8c0704b2a40bd81f95b527..ece76fddf9ce97b985fddc36aafcc73197c28f17 removes […]
e2e-aws-operator failed because […], e2e-gcp-operator failed because not all the nodes came up, and e2e-azure-operator failed only on […]. I did some manual testing with an AWS cluster and could not reproduce the HTTP errors when running […].
/test e2e-aws-operator
Also, explicitly set ENABLE_GATEWAY_API_MANUAL_DEPLOYMENT to "false" on
the Istio CR. For Istio 1.24.3, OSSM has a vendor override that sets this
option[1]. However, for Istio 1.24.4, the option must be explicitly set.
1. https://github.com/openshift-service-mesh/sail-operator/blob/3bf27ee3c4fb4494ffe6028c7f72034c5a7a1e60/pkg/istiovalues/vendor_defaults.yaml#L11-L14
This commit resolves NE-2022.
https://issues.redhat.com/browse/NE-2022
* cmd/ingress-operator/start.go (defaultGatewayAPIOperatorVersion):
* manifests/02-deployment-ibm-cloud-managed.yaml (GATEWAY_API_OPERATOR_VERSION):
* manifests/02-deployment.yaml (GATEWAY_API_OPERATOR_VERSION): Bump from
OSSM v3.0.0 to v3.0.1.
* pkg/operator/controller/gatewayclass/istio.go (desiredIstio): Bump from
Istio v1.24.3 to v1.24.4. Set ENABLE_GATEWAY_API_MANUAL_DEPLOYMENT to
"false".
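Why the explicit setting matters can be illustrated with a merge in which explicitly configured values override vendor defaults. This is a minimal Go sketch under that assumption, not the sail-operator's actual merge code; mergeEnv and the map contents are hypothetical.

```go
package main

import "fmt"

// mergeEnv is a hypothetical helper: it returns the vendor defaults
// overlaid with the explicitly configured values, so explicit values win.
func mergeEnv(vendorDefaults, explicit map[string]string) map[string]string {
	merged := make(map[string]string, len(vendorDefaults)+len(explicit))
	for k, v := range vendorDefaults {
		merged[k] = v
	}
	for k, v := range explicit {
		merged[k] = v
	}
	return merged
}

func main() {
	// Assumption: for Istio 1.24.4 the vendor defaults no longer carry the
	// override, so the key is absent from this map.
	vendorDefaults := map[string]string{}

	// The value the operator now sets explicitly on the Istio CR.
	explicit := map[string]string{
		"ENABLE_GATEWAY_API_MANUAL_DEPLOYMENT": "false",
	}

	env := mergeEnv(vendorDefaults, explicit)
	fmt.Println(env["ENABLE_GATEWAY_API_MANUAL_DEPLOYMENT"]) // false
}
```

With the explicit entry, the effective value no longer depends on whether a particular Istio version's vendor defaults happen to include the override.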
To avoid conflicts with user-managed control planes, set a custom name
for the CA bundle configmaps for the Istio control plane that the
operator manages. Also, configure Istio to inject the configmaps only
into namespaces where gateways exist in order to avoid polluting the
whole cluster.
Set one new environment variable in the Istio CR:
PILOT_ENABLE_GATEWAY_API_CA_CERT_ONLY
Set the Istio CR's trustBundleName global value to match the custom
configmap name. This change requires bumping the sail-operator API:
go get github.com/istio-ecosystem/sail-operator@30be83268d6b6bfaf6fb0562a6c3e505a17422ea
This commit is related to OSSM-9076.
* go.mod: Bump github.com/istio-ecosystem/sail-operator.
* go.sum:
* vendor/*/: Regenerate.
* pkg/operator/controller/gatewayclass/istio.go (desiredIstio): Set the
new environment variable and trustBundleName field.
* pkg/operator/controller/names.go (OpenShiftGatewayCARootCertName): New
const.
Modified-by: Miciah Masters <miciah.masters@gmail.com>
Configure Istiod not to copy annotations or labels from gateways onto
associated resources, such as the proxy deployment and load-balancer
service for a gateway.
This copying behavior is Istio-specific, not part of the Gateway API
spec, and could be used to inject unsupported configuration. For
example, an end user could set a service annotation on the gateway in
order to configure a load balancer. Setting annotations on the gateway
to configure the load balancer would not be portable to other Gateway
API implementations and would complicate product support.
This commit is related to OSSM-8989.
https://issues.redhat.com/browse/OSSM-8989
* pkg/operator/controller/gatewayclass/istio.go (desiredIstio): Set the
PILOT_ENABLE_GATEWAY_API_COPY_LABELS_ANNOTATIONS environment variable to
"false".
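The effect of disabling the copy can be modeled as a toggle on how the generated service's annotations are built. This is a hypothetical sketch, not Istiod's code; serviceAnnotations is an illustrative helper, and the AWS annotation is just one example of load-balancer configuration an end user might try to pass through a gateway.

```go
package main

import "fmt"

// serviceAnnotations is a hypothetical model of how a gateway's annotations
// could reach its generated load-balancer service. When copying is enabled,
// every gateway annotation is propagated; when it is disabled, only the
// controller's own annotations are set. Controller annotations win on conflict.
func serviceAnnotations(gatewayAnnotations, controllerAnnotations map[string]string, copyFromGateway bool) map[string]string {
	out := make(map[string]string)
	if copyFromGateway {
		for k, v := range gatewayAnnotations {
			out[k] = v
		}
	}
	for k, v := range controllerAnnotations {
		out[k] = v
	}
	return out
}

func main() {
	// An end-user annotation that would configure an AWS load balancer if
	// it were copied onto the service.
	gw := map[string]string{
		"service.beta.kubernetes.io/aws-load-balancer-internal": "true",
	}
	fmt.Println(len(serviceAnnotations(gw, nil, true)))  // 1: propagated
	fmt.Println(len(serviceAnnotations(gw, nil, false))) // 0: not propagated
}
```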
Delete the obsolete PILOT_ENABLE_GATEWAY_CONTROLLER_MODE environment
variable from the Istiod configuration. This environment variable is no
longer recognized in OSSM 3, and the variable has been superseded by
EnhancedResourceScoping.
* pkg/operator/controller/gatewayclass/istio.go (desiredIstio): Delete
PILOT_ENABLE_GATEWAY_CONTROLLER_MODE.
Increase the timeout in assertDNSRecord for polling for the DNSRecord CR
from 1 minute to 10 minutes.
The cloud provider can easily take over a minute to provision the load
balancer, and the operator cannot create the DNSRecord CR before the load
balancer has been provisioned and assigned a host name or address.
Consequently, the polling loop could easily reach the 1-minute timeout
just on account of the time that it takes to provision the load balancer.
* test/e2e/util_gatewayapi_test.go (assertDNSRecord): Increase timeout
for the DNSRecord CR polling loop from 1m to 10m.
Increase the timeout for polling the gateway, and dump the gateway if the
test fails.
* test/e2e/gateway_api_test.go (testGatewayAPIManualDeployment): Increase
the timeout for polling the gateway from 1m to 5m. Dump the gateway if
the test fails.
e2e-azure-operator and e2e-gcp-operator failed only on […]
Branch updated from ece76fd to f1e445d.
https://github.com/openshift/cluster-ingress-operator/compare/ece76fddf9ce97b985fddc36aafcc73197c28f17..f1e445da0c864fc0c15ed3f90b7f3f2f6483a014 sets […]
/skip
e2e-aws-operator failed because the "e2e-aws-operator-ipi-deprovision-deprovision" step failed. The tests all passed.
e2e-aws-operator-techpreview failed because the […]
e2e-aws-ovn failed on the "e2e-aws-ovn-ipi-deprovision-deprovision" step, but otherwise tests were passing.
e2e-aws-ovn-serial also failed on the "e2e-aws-ovn-serial-ipi-deprovision-deprovision" step.
e2e-aws-ovn-single-node, e2e-aws-ovn-techpreview, and e2e-aws-ovn-upgrade all failed on the "e2e-aws-ovn-serial-ipi-deprovision-deprovision" step. e2e-aws-ovn-techpreview also failed on […]. Using search.ci, I found a few similar "deleting CustomResourceDefinition: timed out waiting for the condition" errors; I filed OCPBUGS-59257 to track the issue.
/test e2e-aws-operator
e2e-aws-operator failed because no worker nodes came up.
/test e2e-aws-operator
e2e-aws-operator failed because the job timed out: […] The error message is misleading. From the timestamps, it is clear that the tests did not run for anywhere near 4 hours. First, building the images took ~40 minutes: […] Then, getting a lease for the infrastructure took ~85 minutes: […] And then, installing the cluster took ~49 minutes: […] So the entire CI job appears to be constrained to 4 hours; the setup took almost 3 hours, and the tests themselves had just over 1 hour to run before the 4 hours elapsed and the job was terminated. Getting the lease should not have taken so long. I hope that the issue that caused the delay has been resolved.
e2e-aws-operator looks good. Let's try the other AWS jobs now.
/test e2e-aws-ovn
/test e2e-aws-gatewayapi-conformance
No issue found in pre-merge test |
Branch updated from 186e633 to f1e445d.
https://github.com/openshift/cluster-ingress-operator/compare/186e633124c8ad819ef20c429df3a93be7d4987e..f1e445da0c864fc0c15ed3f90b7f3f2f6483a014 drops the OSSM 3.0.3 bump so that we can backport a single-version bump, which should enable automatic updates.
/assign
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alebedev87
Redid the pre-merge test with the PR: the CSV and Istio look good, but I can see two installplans, and both are […]
Yes, OLM creates the "next in upgrade graph" installplan after the current installplan has been applied. This allows subscriptions with […]
No. It means that if the OSSM operator bump is next in the upgrade graph and no specific upgrade logic is needed, the current install plan approval logic (of the cluster ingress operator) would work. For example, in this PR, we have […]
@Miciah: The following tests failed, say […]
Merged commit 4f1ed7f into openshift:master.
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-ingress-operator
Bump to OSSM 3.0.1 and Istio 1.24.4
Bump from OSSM v3.0.0 to v3.0.1 and from Istio v1.24.3 to v1.24.4.
Also, explicitly set ENABLE_GATEWAY_API_MANUAL_DEPLOYMENT to "false" on the Istio CR. For Istio 1.24.3, OSSM has a vendor override that sets this option. However, for Istio 1.24.4, the option must be explicitly set.
Enable Gateway-only CA bundles and custom CA CM name
To avoid conflicts with user-managed control planes, set a custom name for the CA bundle configmaps for the Istio control plane that the operator manages. Also, configure Istio to inject the configmaps only into namespaces where gateways exist in order to avoid polluting the whole cluster.
Set one new environment variable in the Istio CR:
Set the Istio CR's trustBundleName global value to match the custom configmap name. This change requires bumping the sail-operator API: […]
This change is related to OSSM-9076.
This change incorporates #1209.
Don't copy labels or annotations
Configure Istiod not to copy annotations or labels from gateways onto associated resources, such as the proxy deployment and load-balancer service for a gateway.
This copying behavior is Istio-specific, not part of the Gateway API spec, and could be used to inject unsupported configuration. For example, an end-user could set a service annotation on the gateway in order to configure a load-balancer. Setting annotations on the gateway to configure the load-balancer would not be portable to other Gateway API implementations and would complicate product support.
One new environment variable is set:
This change is related to OSSM-8989.
Delete old controller-mode setting
Delete the obsolete PILOT_ENABLE_GATEWAY_CONTROLLER_MODE environment variable from the Istiod configuration. This environment variable is no longer recognized in OSSM 3, and the variable has been superseded by EnhancedResourceScoping.
assertDNSRecord: Increase timeout to 10m
Increase the timeout in assertDNSRecord for polling for the DNSRecord CR from 1 minute to 10 minutes.
The cloud provider can easily take over a minute to provision the load balancer, and the operator cannot create the DNSRecord CR before the load balancer has been provisioned and assigned a host name or address. Consequently, the polling loop could easily reach the 1-minute timeout just on account of the time that it takes to provision the load balancer.
testGatewayAPIManualDeployment: Increase timeout
Increase the timeout for polling the gateway, and dump the gateway if the test fails.