Helm operator on AKS randomly deletes releases #1447

brantb · 2018-10-15T17:46:10Z

When Tiller runs in AKS (Azure's hosted Kubernetes service), it sometimes gets into an inconsistent state because of a combination of a known networking issue in AKS and client-go not handling intermittent network failures gracefully.

This causes the helm operator to occasionally reinstall a release (instead of upgrading it as it should), which fails because the release already exists. The helm operator then purges the "failed" release.

A workaround for this (until the AKS team can roll it out globally) is to set the environment variables mentioned in Azure/AKS#676 on the helm operator pod.

brantb · 2018-10-15T19:04:16Z

#1446 adds an extraEnvs value to the helm chart, so you can apply the workaround like so:

helmOperator:
  extraEnvs:
  - name: KUBERNETES_PORT_443_TCP_ADDR
    value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
  - name: KUBERNETES_PORT
    value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
  - name: KUBERNETES_PORT_443_TCP
    value: tcp://<your-fqdn-prefix>.hcp.<region>.azmk8s.io:443
  - name: KUBERNETES_SERVICE_HOST
    value: <your-fqdn-prefix>.hcp.<region>.azmk8s.io
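
All four values above derive from the same API server FQDN, so they can be generated rather than typed by hand. A minimal Go sketch, assuming nothing beyond the variable names shown in the snippet; the `EnvVar` type, the `aksAPIServerEnv` helper, and the FQDN in `main` are hypothetical placeholders for illustration:

```go
package main

import "fmt"

// EnvVar mirrors the name/value shape of an extraEnvs entry (hypothetical type).
type EnvVar struct {
	Name  string
	Value string
}

// aksAPIServerEnv builds the four overrides from the cluster's API server FQDN.
// Two variables take the bare host name, two take the tcp://host:443 form.
func aksAPIServerEnv(fqdn string) []EnvVar {
	tcp := fmt.Sprintf("tcp://%s:443", fqdn)
	return []EnvVar{
		{Name: "KUBERNETES_PORT_443_TCP_ADDR", Value: fqdn},
		{Name: "KUBERNETES_PORT", Value: tcp},
		{Name: "KUBERNETES_PORT_443_TCP", Value: tcp},
		{Name: "KUBERNETES_SERVICE_HOST", Value: fqdn},
	}
}

func main() {
	// Placeholder FQDN; substitute your own <your-fqdn-prefix>.hcp.<region>.azmk8s.io.
	for _, e := range aksAPIServerEnv("example-prefix.hcp.example-region.azmk8s.io") {
		fmt.Printf("- name: %s\n  value: %s\n", e.Name, e.Value)
	}
}
```

The output is a YAML list that can be pasted under `helmOperator.extraEnvs`.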

squaremo · 2018-10-18T15:42:50Z

Are there any fixes we can apply within helm-op itself?

stefanprodan · 2018-11-19T15:47:26Z

Resolved in #1530

stefanprodan added the FAQ and helm labels Oct 16, 2018

sfrique mentioned this issue Nov 16, 2018

Helm operator deletes (reinstall) releases on kubernetes API errors #1524

Closed

stefanprodan added a commit that referenced this issue Nov 18, 2018

Prevent Helm release deletion of installed charts

9571c6a

- purge a Helm release only if there is a single revision and that one failed
- prevent Helm release deletion if Kubernetes API connectivity is flaky
- fix #1524 #1447

stefanprodan mentioned this issue Nov 18, 2018

Prevent Helm release deletion of installed charts #1530

Merged

stefanprodan closed this as completed Nov 19, 2018

squaremo pushed a commit that referenced this issue Dec 11, 2018

Prevent Helm release deletion of installed charts

9858555

- purge a Helm release only if there is a single revision and that one failed
- prevent Helm release deletion if Kubernetes API connectivity is flaky
- fix #1524 #1447

ellieayla mentioned this issue Jan 11, 2019

Flux can get stuck, producing no output/work, with no liveness check on AKS #1648

Closed
