-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Helm HA install w/cert-manager webhook CA fails w/webhook linkerd-proxy-injector error #7699
Comments
Is the issue basically that there's a race between the The namespaceSelector:
matchExpressions:
- key: config.linkerd.io/admission-webhooks
operator: NotIn
values:
- disabled When Linkerd is installed via the CLI, we get a namespace like: apiVersion: v1
kind: Namespace
metadata:
annotations:
linkerd.io/inject: disabled
labels:
config.linkerd.io/admission-webhooks: disabled
kubernetes.io/metadata.name: linkerd
linkerd.io/control-plane-ns: linkerd
linkerd.io/is-control-plane: "true"
name: linkerd Does your linkerd namespace have these labels? |
How is the namespace intended to acquire these annotations and labels in the helm install? Must the namespace be pre-created and pre-configured to support an HA install? Since the crds chart is a prereq for the control-plane chart, the control-plane chart should be able to assume the namespace exists. Therefore, could the control plane chart use a pre-hook to annotate and label the linkerd namespace if needed? In this BYO cert-manager webhook CA scenario, it seems that the cert-manager namespace also needs the disallow admission-webhooks annotation to avoid a similar MutatingWebhookConfiguration race. Would it make sense for the control-plane chart ha-values.yaml to include an optional property containing a list of namespace names that require the disallow admission-webhooks annotation with default value I guess I could have worked-around this issue by also using my custom job above to annotate the linkerd namespace as well. However, I just assumed the namespace annotations and labels were already being taken care of for me since the cli already does it. Finally, how is linkerd-cni namespace supposed to acquire its annotations and labels using the Helm install? |
Thanks for the feedback @ron1 ! This is indeed a problem with the helm charts in edge under HA. As per helm best-practices we stopped having the chart generate the namespace so we can't rely on those labels. But if I'm not mistaking, we can address this by simply adding an objectSelector to the webhook config that would discard any pod with the label
When the webhook race for bringing linkerd-proxy-injector up gets resolved, I think this won't be a problem. Cert-manager might fail starting first, but should quickly converge to a running state as soon as the injector comes up. OTOH kube-system does require to be ignored, otherwise the cluster will easily get deadlocked. We recommend filtering it out somewhere in the docs, but I think it's a good idea to have it done by default in HA. I'll see if there's a standard label identifying kube-system that we can use in the objectSelector to avoid relying on a helm hook.
Sounds like there's an issue there too. We'll need to include it in the objectSelector as well. |
Due to Rancher Fleet bug rancher/fleet#486, I seem to be limited in my ability to control the order in which the cert-manager and linkerd workloads are deployed. And just to confirm, you think the potential race between the linkerd-proxy-injector MutatingWebhookConfiguration resource, the linkerd-proxy-injector webhook cert, the linkerd-proxy-injector deployment, and the cert-manager-cainjector should be insignificant. |
At least on Kubernetes 1.21+, could you exclude |
Ok I didn't have in mind cert-manager-cainjector. Seems like we'll have the injector bypass the entire cert-manager namespace then.
That seems to be the case in all the clusters I've checked 👍 |
FYI, I was able to successfully deploy linkerd workload to linkerd-cni, linkerd, and linkerd-viz namespaces with the following linkerd-control-plane helm values and kustomize post-renderer patch. Thanks much for your assistance. helm values.yaml
kustomization.yaml
|
Sounds good, I'll push a fix taking this into account 👍 |
Fixes #7699 The problem didn't affect 2.11, only latest edges since the Helm charts got split into `linkerd-crds` and `linkerd-control-plane` and we stopped creating the linkerd namespace. With the surrendering of the creation of the namespace, we can no longer guarantee the existence of the `config.linkerd.io/admission-webhooks` label, so this PR creates an `objectSelector` for the injector that filters-out control-plane components, based on the existence of the `linkerd.io/control-plane-component` label. Given we still want the multicluster components to be injected, we had to be rename its `linkerd.io/control-plane-component` label to `component`, following the same convention used by the other extensions. The corresponding Prometheus rule for scraping the service mirrors was updated accordingly. A similar filter was added for the linkerd-cni DaemonSet. Also, now that the `kubernetes.io/metadata.name` is prevalent, we're also using it to filter out the kube-system and cert-manager namespaces. The former namespace was already mentioned in the docs; the latter is also included to avoid having races with cert-manager-cainjector which can be used to provision the injector's cert.
* Fix HA race when installing through Helm Fixes #7699 The problem didn't affect 2.11, only latest edges since the Helm charts got split into `linkerd-crds` and `linkerd-control-plane` and we stopped creating the linkerd namespace. With the surrendering of the creation of the namespace, we can no longer guarantee the existence of the `config.linkerd.io/admission-webhooks` label, so this PR creates an `objectSelector` for the injector that filters-out control-plane components, based on the existence of the `linkerd.io/control-plane-component` label. Given we still want the multicluster components to be injected, we had to be rename its `linkerd.io/control-plane-component` label to `component`, following the same convention used by the other extensions. The corresponding Prometheus rule for scraping the service mirrors was updated accordingly. A similar filter was added for the linkerd-cni DaemonSet. Also, now that the `kubernetes.io/metadata.name` is prevalent, we're also using it to filter out the kube-system and cert-manager namespaces. The former namespace was already mentioned in the docs; the latter is also included to avoid having races with cert-manager-cainjector which can be used to provision the injector's cert.
* Fix HA race when installing through Helm Fixes #7699 The problem didn't affect 2.11, only latest edges since the Helm charts got split into `linkerd-crds` and `linkerd-control-plane` and we stopped creating the linkerd namespace. With the surrendering of the creation of the namespace, we can no longer guarantee the existence of the `config.linkerd.io/admission-webhooks` label, so this PR creates an `objectSelector` for the injector that filters-out control-plane components, based on the existence of the `linkerd.io/control-plane-component` label. Given we still want the multicluster components to be injected, we had to be rename its `linkerd.io/control-plane-component` label to `component`, following the same convention used by the other extensions. The corresponding Prometheus rule for scraping the service mirrors was updated accordingly. A similar filter was added for the linkerd-cni DaemonSet. Also, now that the `kubernetes.io/metadata.name` is prevalent, we're also using it to filter out the kube-system and cert-manager namespaces. The former namespace was already mentioned in the docs; the latter is also included to avoid having races with cert-manager-cainjector which can be used to provision the injector's cert. (cherry picked from commit 539bcce) Signed-off-by: Oliver Gould <[email protected]>
* Fix HA race when installing through Helm Fixes #7699 The problem didn't affect 2.11, only latest edges since the Helm charts got split into `linkerd-crds` and `linkerd-control-plane` and we stopped creating the linkerd namespace. With the surrendering of the creation of the namespace, we can no longer guarantee the existence of the `config.linkerd.io/admission-webhooks` label, so this PR creates an `objectSelector` for the injector that filters-out control-plane components, based on the existence of the `linkerd.io/control-plane-component` label. Given we still want the multicluster components to be injected, we had to be rename its `linkerd.io/control-plane-component` label to `component`, following the same convention used by the other extensions. The corresponding Prometheus rule for scraping the service mirrors was updated accordingly. A similar filter was added for the linkerd-cni DaemonSet. Also, now that the `kubernetes.io/metadata.name` is prevalent, we're also using it to filter out the kube-system and cert-manager namespaces. The former namespace was already mentioned in the docs; the latter is also included to avoid having races with cert-manager-cainjector which can be used to provision the injector's cert.
What is the issue?
Helm HA install w/cert-manager webhook CA fails with all 3 linkerd control plane Deployments reporting
How can it be reproduced?
Perform Helm HA install w/cert-manager managed webhook CA
Logs, error output, etc
output of
linkerd check -o short
Environment
Possible solution
A work-around is to install with MutatingWebhookConfiguration failurePolicy of Ignore and change the policy to Fail after install.
Additional context
No response
Would you like to work on fixing this bug?
no
The text was updated successfully, but these errors were encountered: