OCPBUGS-9037: Change Canary to use passthrough route #978
bc30df7 to 5af9bc9
3aec956 to 591e372

Looking through the logs from the latest

On further investigation, there is a canary-related failure in
Despite getting what should be a valid response, the test fails.

@rfredette: This pull request references Jira Issue OCPBUGS-9037, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/jira refresh

@rfredette: This pull request references Jira Issue OCPBUGS-9037, which is valid. 3 validation(s) were run on this bug.
Requesting review from QA contact:

The OVN failures require the fix in openshift/origin#28301 or something similar, but

The origin test fix has merged.

/retest
591e372 to c350a7e

Given the change from re-encrypt to passthrough, why do we need to do the things you highlight in commit c350a7e? What breaks, given the switch to passthrough, if you don't do the additional reconciliation logic?

/assign @frobware
20e8af7 to 991a3a7

Test failures look unrelated.

/retest

frobware left a comment
I tested by watching for port rotations with canaryCheckFrequency=8s and canaryCheckCycleCount=1. Based on the conditional check, you have to wait 16s for the port to rotate. Is that intentional, or, based on my parameters, should it rotate every 8s?

The TestAllowedSourceRangesStatus failure could be fixed by #1087.
991a3a7 to d29478a

/lgtm

My LGTM didn't seem to stick; trying again... /lgtm

```go
case intstr.Int:
	if newRoute.Spec.Port.TargetPort.IntVal == portNum {
		isValidPort = true
		break
```
This break doesn't do what you probably think it does: it breaks out of the switch, not out of the for loop.
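In Go, a bare `break` inside a `switch` terminates only the innermost `switch`, so the surrounding `for` loop keeps iterating. The minimal sketch below contrasts that with a labeled `break`, which exits the loop as soon as a matching port is found. `scanPorts` and the plain `[]int32` port list are hypothetical stand-ins for the canary service ports and the `intstr.Int` case in the diff; returning early from a small helper function would work equally well.

```go
package main

// scanPorts walks ports looking for target and reports whether it was found
// and how many ports were examined. With labeled == false it uses a bare break
// inside the switch, which exits only the switch, so the loop keeps going.
// With labeled == true it breaks out of the labeled for loop instead.
func scanPorts(ports []int32, target int32, labeled bool) (found bool, examined int) {
loop:
	for _, p := range ports {
		examined++
		switch {
		case p == target:
			found = true
			if labeled {
				break loop // leaves the for loop
			}
			break // leaves only the switch; the loop continues
		}
	}
	return found, examined
}
```

For ports `{8080, 8443, 8888}` and target `8443`, both variants report the port as found, but the bare `break` examines all three ports while the labeled `break` stops after the second.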

```go
// If newRoute's port is one of the canary service ports, the change may just be the ingress operator
// rotating canary ports. If the *only* change between oldRoute and newRoute is the port change, skip
// reconciling.
oldRoute.Spec.Port = newRoute.Spec.Port
```
Ah, the assignment is so that the cmp.Equal call effectively ignores a difference in Spec.Port, right? Using something like cmp.Equal(oldRoute.Spec, newRoute.Spec, cmpopts.IgnoreFields(routev1.RouteSpec{}, "Port")) would be a little more obvious.
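The reviewer's suggestion relies on go-cmp's `cmpopts.IgnoreFields`. To stay dependency-free, the sketch below makes the same "equal except for Port" comparison with the standard library's `reflect.DeepEqual`. It compares value copies, so neither input is modified, which is the main difference from assigning `oldRoute.Spec.Port = newRoute.Spec.Port` before comparing. `RouteSpec`, `RoutePort`, and `equalIgnoringPort` are hypothetical, simplified stand-ins for the real `routev1` types.

```go
package main

import "reflect"

// Simplified, hypothetical stand-ins for routev1.RouteSpec and its port field.
type RoutePort struct {
	TargetPort int32
}

type RouteSpec struct {
	Host string
	Port *RoutePort
}

// equalIgnoringPort reports whether two specs are equal apart from Port.
// Structs are passed by value, so clearing Port here affects only the local
// copies; the callers' specs are left untouched.
func equalIgnoringPort(a, b RouteSpec) bool {
	a.Port, b.Port = nil, nil
	return reflect.DeepEqual(a, b)
}
```

A port-only change (for example, target port 8443 versus 8888 with the same host) compares as equal, while a change to `Host` does not.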

#1087 has merged.
TestClientTLS, TestMTLSWithCRLs, and TestRouterCompressionOperation queried the canary route to test various router config options. However, with the canary being switched to use a passthrough route, many router config options no longer apply, because the router does not terminate TLS for passthrough routes and therefore cannot inspect or modify the HTTP traffic. This commit changes each test to deploy its own backend and route.

TestClientTLS and TestMTLSWithCRLs were both changed to use the echo pod backend, which echoes the contents of the request back to the client.

TestRouterCompressionOperation requires that the response have a specific content-type header, which the echo pod doesn't provide. It now deploys an httpd backend, which does include the content-type header, and has been updated to enable compression for the appropriate content type.

This is part of the fix for OCPBUGS-9037
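The commit describes the echo pod only as echoing the contents of the request back to the client. A minimal in-process stand-in, assuming the echo simply returns the request body (the real echo image may also include the request line or headers), can be written with `net/http` and `net/http/httptest`. `echoHandler` and `postAndReadEcho` are hypothetical names; `postAndReadEcho` mirrors, in miniature, the idea of a test standing up its own backend instead of sharing the canary.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// echoHandler writes the request body back to the client unchanged.
// Hypothetical stand-in for the echo pod backend used by the e2e tests.
func echoHandler(w http.ResponseWriter, r *http.Request) {
	// A copy error means the client went away; there is nothing useful to report.
	_, _ = io.Copy(w, r.Body)
}

// postAndReadEcho starts a throwaway echo server, POSTs body to it, and
// returns what came back.
func postAndReadEcho(body string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(echoHandler))
	defer srv.Close()

	resp, err := http.Post(srv.URL, "text/plain", strings.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	b, err := io.ReadAll(resp.Body)
	return string(b), err
}
```

For example, `postAndReadEcho("hello canary")` returns `"hello canary"` and a nil error.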
Edge-terminated TLS is subject to the ingress controller's client TLS (mTLS) requirements. When mTLS is required, the client must also provide a TLS certificate and key, but there is currently no way to provide a client certificate or key to the ingress operator, so canary health checks fail when mTLS is required. To allow the canary health checks to work when mTLS is required, this commit changes the canary to serve TLS, changes the canary route to use passthrough termination, and adds a test to verify that enabling mTLS doesn't cause canary checks to fail.

The canary service now has the service.beta.openshift.io/serving-cert-secret-name annotation, which prompts the auto-generation of a secret with a certificate and key for the canary to use for TLS. It also now serves on port 8443 instead of 8080. Previously, reconciliation of the canary service was limited to making sure it existed; now the operator checks for changes to the service's annotations and ports and updates them to the desired values if necessary.

To make the serving certificate available to the canary, the canary daemonset has been updated to mount the relevant secret and to set the environment variables TLS_CERT and TLS_KEY to the full paths of the serving certificate and key, respectively. To make sure this secret is available to the canary process, the reconciliation logic has also been updated to keep the environment variables, volumes, and volume mounts at their desired values. The operator now watches the canary daemonset and service, and reconciles canary resources when either changes.

TestCanaryWithMTLS enables mTLS on the default ingress controller, then verifies that the CanaryChecksSucceeding condition remains true.

This is part of the fix for OCPBUGS-9037

Co-authored-by: Andrew McDermott <frobware@users.noreply.github.com>
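On the canary side, the commit message implies a server that reads the certificate and key paths from TLS_CERT and TLS_KEY and terminates TLS itself on port 8443. The sketch below shows one way to do that with the standard library; it is not the canary's actual code, and `certPathsFromEnv`, `serveCanaryTLS`, and `canaryTLSAddr` are hypothetical names. The environment lookup is injected so it can be exercised without real certificate files.

```go
package main

import (
	"errors"
	"net/http"
	"os"
)

// canaryTLSAddr is the listen address; the commit moves the canary to port 8443.
const canaryTLSAddr = ":8443"

// certPathsFromEnv reads the serving certificate and key paths from the
// TLS_CERT and TLS_KEY environment variables set on the canary daemonset.
// getenv is injected (normally os.Getenv) so the lookup can be tested.
func certPathsFromEnv(getenv func(string) string) (certFile, keyFile string, err error) {
	certFile, keyFile = getenv("TLS_CERT"), getenv("TLS_KEY")
	if certFile == "" || keyFile == "" {
		return "", "", errors.New("TLS_CERT and TLS_KEY must both be set")
	}
	return certFile, keyFile, nil
}

// serveCanaryTLS terminates TLS in the canary process itself, which is what a
// passthrough route requires. It blocks until the server stops or fails.
func serveCanaryTLS(handler http.Handler) error {
	certFile, keyFile, err := certPathsFromEnv(os.Getenv)
	if err != nil {
		return err
	}
	return http.ListenAndServeTLS(canaryTLSAddr, certFile, keyFile, handler)
}
```

Failing fast when either variable is missing surfaces a misconfigured daemonset immediately, which is also why the operator keeps the environment variables, volumes, and volume mounts reconciled.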
d29478a to 478b9bc

Thanks!

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: Miciah

e2e-aws-operator failed during startup: /test e2e-aws-operator

@rfredette: all tests passed!

@rfredette: Jira Issue OCPBUGS-9037: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-9037 has been moved to the MODIFIED state.

[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-ingress-operator-container-v4.17.0-202406132111.p0.g4d654d3.assembly.stream.el9 for distgit ose-cluster-ingress-operator.

Fix included in accepted release 4.22.0-0.nightly-2026-01-28-225830

When the default ingress controller is configured to use mTLS, connecting to edge or reencrypt routes requires the client to have a valid certificate/key pair. There is no way for a user to provide a client certificate or key to the ingress operator, so canary checks against the edge-terminated canary route fail. This PR changes the canary route to use passthrough termination and has the canary itself handle the TLS handshake, allowing the checks to succeed even when mTLS is otherwise required.