Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading sometimes requires restarting linkerd-destination #6940

Closed
alpeb opened this issue Sep 21, 2021 · 0 comments · Fixed by #6954 or #11440
Closed

Upgrading sometimes requires restarting linkerd-destination #6940

alpeb opened this issue Sep 21, 2021 · 0 comments · Fixed by #6954 or #11440

Comments

@alpeb
Copy link
Member

alpeb commented Sep 21, 2021

The policy controller currently doesn't detect changes in the certificate used for serving the validation webhook (see seanmonstar/warp#659).
This means that when linkerd was installed with webhookFailurePolicy=Fail, an upgrade that doesn't cause changes in the linkerd-destination manifest (and so doesn't cause it to be restarted) will change that cert but the policy controller will still have loaded the old one, making any attempt to persist a Server to fail.

Repro (tested with edge-21.9.3):

$ linkerd install --set webhookFailurePolicy=Fail | k apply -f -
$ linkerd upgrade | k apply -f -
$ k create ns emojivoto
$ cat << EOF | k apply -f -
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  namespace: emojivoto
  name: emoji-grpc
spec:
  podSelector:
    matchLabels:
      app: emoji-svc
  port: grpc
  proxyProtocol: gRPC
EOF

Error from server (InternalError): error when creating "emoji-policy.yml": Internal error occurred: failed calling webhook "linkerd-policy-validator.linkerd.io": Post "https://linkerd-policy-validator.linkerd.svc:443/?timeout=10s": x509: certificate signed by unknown authority (possibly because of "x509: invalid signature: parent certificate cannot sign this kind of certificate" while trying to verify candidate authority certificate "linkerd-policy-validator.linkerd.svc")

Possible solutions:

  • We could add a checksum/config annotation that ties restarts to the destination pod to changes to the destination-rbac.yaml file (after variables replacement) just like the injector does
  • Have the controller watch the cert directory and trigger a restart when changed
alpeb added a commit that referenced this issue Sep 22, 2021
Fixes #6940

Upon every upgrade the certs used by the policy controller validator
change, but the controller doesn't detect that, which breaks the webhook
requests. As a temporary solution, we force the pod to restart by adding
a `checksum/config` annotation to the manifest, like the injector
currently has.
alpeb added a commit that referenced this issue Sep 23, 2021
Fixes #6940

Upon every upgrade the certs used by the policy controller validator
change, but the controller doesn't detect that, which breaks the webhook
requests. As a temporary solution, we force the pod to restart by adding
a `checksum/config` annotation to the manifest, like the injector
currently has.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2021
adleong pushed a commit that referenced this issue Oct 5, 2023
…1440)

Fixes #6940

Added a `checksum/config` annotation into the destination, proxy-injector and tap-injector workloads, whose value is calculated as the SHA256 of the template file containing the TLS cert they depend on. This is necessary so that every time those other files change (they get re-generated on every upgrade or config update via `linkerd upgrade`), the workloads change as well. 
We had this in place before, but with the 2.12 helm charts migrations we dropped it by mistake.

Signed-off-by: Cameron Boulton <[email protected]>
hawkw pushed a commit that referenced this issue Nov 22, 2023
…1440)

Fixes #6940

Added a `checksum/config` annotation into the destination, proxy-injector and tap-injector workloads, whose value is calculated as the SHA256 of the template file containing the TLS cert they depend on. This is necessary so that every time those other files change (they get re-generated on every upgrade or config update via `linkerd upgrade`), the workloads change as well. 
We had this in place before, but with the 2.12 helm charts migrations we dropped it by mistake.

Signed-off-by: Cameron Boulton <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
1 participant