This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Helm operator on AKS randomly deletes releases #1447

Closed
brantb opened this issue Oct 15, 2018 · 3 comments
Labels
FAQ (Issues that come up a lot), helm

Comments

@brantb
Contributor

brantb commented Oct 15, 2018

When Tiller runs in AKS (Azure's hosted Kubernetes service), it sometimes gets into an inconsistent state. This is caused by a combination of a known networking issue in AKS and client-go not handling intermittent network failures gracefully.

This causes the helm operator to occasionally reinstall a release (instead of upgrading it as it should), which fails because the release already exists. The helm operator then purges the "failed" release.

A workaround for this (until the AKS team can apply it globally) is to apply the environment variables mentioned in Azure/AKS#676 to the helm operator pod.

@brantb
Contributor Author

brantb commented Oct 15, 2018

#1446 adds an extraEnvs value to the helm chart, so you can apply the workaround like so:

helmOperator:
  extraEnvs:
  - name: KUBERNETES_PORT_443_TCP_ADDR
    value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
  - name: KUBERNETES_PORT
    value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
  - name: KUBERNETES_PORT_443_TCP
    value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
  - name: KUBERNETES_SERVICE_HOST
    value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io

@stefanprodan added the FAQ (Issues that come up a lot) and helm labels Oct 16, 2018
@squaremo
Member

Are there any fixes we can apply within helm-op itself?

stefanprodan added a commit that referenced this issue Nov 18, 2018
- purge a Helm release only if there is a single revision and that one failed
- prevent Helm release deletion if Kubernetes API connectivity is flaky
- fix #1524 #1447
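
The two rules in the commit message above can be read as a single guard around the purge step. The following is a minimal sketch of that reading in Go, not the code from the referenced commit: the `Revision` type, the `errAPIUnavailable` value, and the `shouldPurge` function are hypothetical names introduced only for illustration, and the real operator works with Helm's own release types.

```go
package main

import "errors"

// Revision is a hypothetical, simplified view of one revision in a
// Helm release history. It is not a type from Helm or from Flux.
type Revision struct {
	Number int
	Failed bool
}

// errAPIUnavailable is a hypothetical stand-in for any error returned
// while fetching the release history over a flaky API connection.
var errAPIUnavailable = errors.New("kubernetes API unavailable")

// shouldPurge reports whether a release may be purged.
//
// Assumed reading of the commit message:
//   - if the history lookup failed, the state is unknown, so never purge;
//   - otherwise purge only when there is exactly one revision and it failed.
func shouldPurge(history []Revision, historyErr error) bool {
	if historyErr != nil {
		// Connectivity problem: the release may well exist and be healthy.
		return false
	}
	return len(history) == 1 && history[0].Failed
}
```

Under this sketch, a release with several revisions is never purged even if the latest one failed, and a failed history lookup never leads to a purge, which covers the scenario described at the top of this issue.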
@stefanprodan
Member

Resolved in #1530

hiddeco pushed a commit that referenced this issue Nov 20, 2018
- purge a Helm release only if there is a single revision and that one failed
- prevent Helm release deletion if Kubernetes API connectivity is flaky
- fix #1524 #1447

(cherry picked from commit 9571c6a)
hiddeco pushed a commit to hiddeco/flux that referenced this issue Nov 20, 2018
- purge a Helm release only if there is a single revision and that one failed
- prevent Helm release deletion if Kubernetes API connectivity is flaky
- fix fluxcd#1524 fluxcd#1447

(cherry picked from commit 9571c6a)
squaremo pushed a commit that referenced this issue Dec 11, 2018
- purge a Helm release only if there is a single revision and that one failed
- prevent Helm release deletion if Kubernetes API connectivity is flaky
- fix #1524 #1447