OCPBUGS-9037, OCPBUGS-64565: Use cluster wildcard certificate for ingress canary#1155
OCPBUGS-9037, OCPBUGS-64565: Use cluster wildcard certificate for ingress canary#1155rfredette wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
@rfredette: This pull request references Jira Issue OCPBUGS-9037, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
4f7f29d to
a7792fa
Compare
|
test failures appear unrelated. |
|
/assign @Miciah |
|
/retest-required |
There was a problem hiding this comment.
@rfredette I've had these comments pending for some time. Please let me know if I've misunderstood the assignment here, but in some places it looks like we are just reusing the default cert. Was that the plan?
| UID: daemonset.UID, | ||
| Controller: &trueVar, | ||
| } | ||
| if _, err := r.ensureDefaultCertificateForIngress(ca, "openshift-ingress-canary", canaryRef, ingress); err != nil { |
There was a problem hiding this comment.
Doesn't there need to be a different function to ensure a canary cert rather than ensure a default cert? Does this ensure the correct cert?
There was a problem hiding this comment.
The intent is that the canary application should use the default IngressController's default certificate. This way, as long as the default IngressController has a correctly configured default certificate, so too will the canary application. Because #978 changed the canary application to use a TLS passthrough route, the only way to have the canary application use the default IngressController's default certificate is to copy that certificate to the canary application's namespace and configure the application to use that copy of the certificate.
If I understand correctly, ensureDefaultCertificateForIngress actually generates a new server certificate, using the existing CA certificate, so this logic doesn't quite implement the intent. We could use r.ensureDefaultCertificateForIngress(ca, ingress.Namespace, ref, ingress) (note the namespace) to get the existing certificate (or create it if it's missing), but it seems simpler just to do a Get from ingress.Namespace and then Create in "openshift-ingress-canary".
|
/restest-required |
|
Failure in infra: /test e2e-hypershift |
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
|
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
|
@openshift-bot: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@rfredette: This pull request references Jira Issue OCPBUGS-9037. The bug has been updated to no longer refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/reopen |
|
@rfredette: This pull request references Jira Issue OCPBUGS-9037, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-64565, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@Miciah: This pull request references Jira Issue OCPBUGS-9037, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-64565, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest e2e-aws-operator |
|
@candita: The The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
The bug OCPBUGS-9037 mentioned console health check as well, seems this fix is just for ingress canary, I'm wondering how to fix the console route ? |
768f6b8 to
595b17c
Compare
| return true | ||
| } | ||
|
|
||
| func (r *reconciler) canarySecretName(Namespace string) (types.NamespacedName, error) { |
There was a problem hiding this comment.
nit: it's a little weird that namespace is capitalized here.
| } | ||
|
|
||
| volumes := daemonset.Spec.Template.Spec.Volumes | ||
| secretMode := int32(0420) |
There was a problem hiding this comment.
Do you know why the defaultMode has to be set for the test but not in the function desiredCanaryDaemonSet?
There was a problem hiding this comment.
I'm setting the default mode on the expected result here. Default mode is set in the daemonset manifest here for the desired daemonset
| Namespace: canaryCertName.Namespace, | ||
| OwnerReferences: []metav1.OwnerReference{canaryRef}, | ||
| } | ||
| if err := r.client.Create(ctx, canaryCert); err != nil { |
There was a problem hiding this comment.
I might be way off, but I'm not sure why you can't just store the name of the default cert instead of creating a copy of it? I defer to @Miciah on this.
There was a problem hiding this comment.
We need to copy the secret's data so that the canary application uses the exact certificate that is used as the default IngressController's default certificate. That is, we specifically need canaryCert.Data to match defaultCert.Data (and it makes sense to set canaryCert.Type to defaultCert.Type as well). The name doesn't matter, other than that the copy needs to be in the same namespace as the canary daemonset and needs to match whatever name the canary daemonset specifies in its volume.
595b17c to
3ecac04
Compare
|
@rfredette: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Miciah
left a comment
There was a problem hiding this comment.
The commit message is missing a body. Please copy and paste the PR description into the commit message and add a link to https://issues.redhat.com/browse/OCPBUGS-9037 (feel free to do more, but I think that that would be good enough).
There was a problem hiding this comment.
The comment at the top of this file should be updated to document that the controller copies the certificate for the canary... though the more I think about it, the more I think we should have a separate controller for this purpose.
| // The ingress canary verifies that the default ingress controller is functioning. Since it uses a | ||
| // passthrough route, we mirror the default ingress controller's certificate in the canary to detect any | ||
| // issues with that certificate, so if the default controller's cert is updated, update the canary's cert as | ||
| // well. | ||
| if ingress.Name == manifests.DefaultIngressControllerName { | ||
| log.Info("Ensuring canary certificate") | ||
| daemonset := &appsv1.DaemonSet{} | ||
| err = r.client.Get(ctx, controller.CanaryDaemonSetName(), daemonset) |
There was a problem hiding this comment.
Do we need a watch on secrets so that we update the canary's copy when the cluster-admin updates the original secret? Maybe re-using this controller isn't the best approach. I suppose we can refactor later on as a follow-up.
| // All ingresses should have a deployment, so this one may not have been | ||
| // created yet. Retry after a reasonable amount of time. |
There was a problem hiding this comment.
This comment looks like copypasta.
| canaryCertName := controller.RouterEffectiveDefaultCertificateSecretName(ingress, daemonset.Namespace) | ||
| canaryRef := metav1.OwnerReference{ | ||
| APIVersion: "apps/v1", | ||
| Kind: "Daemonset", |
There was a problem hiding this comment.
The kind is miscapitalized here:
| Kind: "Daemonset", | |
| Kind: "DaemonSet", |
Does using "Daemonset" actually work? I don't know how forgiving the API server [or garbage collector] is.
| if err := r.client.Get(ctx, defaultCertName, defaultCert); err != nil { | ||
| errs = append(errs, fmt.Errorf("failed to get certificate for canary: %w", err)) | ||
| } | ||
| canaryCert := defaultCert.DeepCopy() |
There was a problem hiding this comment.
We really only care about defaultCert.Data and defaultCert.SecretType, right?
I suppose using DeepCopy will work as you stomp ObjectMeta and the API server doesn't send StringData; using DeepCopy just does some extra (unnecessary) work.
| errs = append(errs, fmt.Errorf("failed to get certificate for canary: %w", err)) | ||
| } | ||
| canaryCert := defaultCert.DeepCopy() | ||
| canaryCertName := controller.RouterEffectiveDefaultCertificateSecretName(ingress, daemonset.Namespace) |
There was a problem hiding this comment.
This should work, but is there a reason to copy the name from the effective certificate? If you used a static name, you could get rid of canarySecretName in pkg/operator/controller/canary/daemonset.go and simplify the logic a bit.
There was a problem hiding this comment.
One consequence of copying the name is that you can end up with multiple secrets in the canary namespace if IngressController.spec.defaultCertificate is updated.
For example, the TestUpdateDefaultIngressControllerSecret test updates the default certificate from the default "router-certs-default" secret to a "test-xyz" secret (where "xyz" is a randomly generated suffix) and then reverts the default certificate back to "router-certs-default". As a consequence the CI artifacts have both a "router-certs-default" secret and a "test-xyz" secret in the canary namespace.
If the canary daemonset were ever deleted, the owner reference would cause these secrets to be cleaned up, but otherwise you can accumulate these secrets. This isn't a major problem, but it does create some unnecessary cruft.
There was a problem hiding this comment.
Well... I wonder whether the issue referenced at #1155 (comment) will prevent garbage collection from working properly?
|
Looking at the ingress-operator logs in the e2e-aws-operator artifacts, I see some repeated errors: |
|
Closing in favor of #1334 |
|
@rfredette: This pull request references Jira Issue OCPBUGS-9037. The bug has been updated to no longer refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-64565. The bug has been updated to no longer refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Utilize the existing ingress controller certificate management controller to also manage the certificate for the ingress canary, and use that certificate when serving the canary endpoint.