Helm deploy fails on pre-install hooks (AKS only) #455
Same issue (referenced above: istio/istio#6301) while
This is related to #408. We had our clusters patched yesterday, and now Helm pre-install hooks are working again.
Just tried on a new centralus cluster, and it still fails.
All new cluster creates (in all regions) should now be patched with the fix. Thanks for your patience; please report if you still see issues. Existing clusters will eventually get patched.
I just tried again with Istio on a new cluster in eastus with RBAC enabled, and I'm still seeing this issue. Here are the relevant Tiller logs:
Edit* Just tried installing the tensorflow-notebook chart, which succeeded. Strange that Istio still fails with this.
I happen to have a similar issue on AKS with post-install hooks, but while installing
This is caused by the idle timeout on watches, which is currently expected to be ~60s. We are aware of this issue; a partial fix is rolling out next week, and a full fix is being worked on at high priority. Thanks for reporting the issue.
Works for me now after upgrading the cluster to 1.10.3.
The 60-second idle timeout on watches is also preventing our post-install hooks from succeeding. Our cluster is running 1.9.6.
We tested with 1.10.3 and it still fails. I think an api-server configuration setting is responsible for this.
There's another Istio problem that might be related to this watch timeout. Looking at the logs for the mixer with
The container never crashes; it just logs this error repeatedly. This does not happen on Minikube.
We have rolled out fixes. Please provide feedback if there are issues with new cluster creates.
We had problems with 1.10.3 and the prometheus-operator chart. Will try again today on a new cluster.
We still have the watch timeout issue for a default installation of Istio (RBAC enabled in AKS). Here are the logs:
The watch still closes after 1 minute. Edit* Even though the 1-minute timeout is still not fixed, the mixer issue above (about watching *unstructured.Unstructured) seems to be resolved now. I did find a hacky workaround for now: if I make sure the image for the post-install job exists on every node before the job runs (by creating a DaemonSet with that image and overriding the command to an infinite sleep loop), the job takes less than a minute, so we don't hit the watch timeout failure. Not ideal, but I wanted to report my findings.
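For anyone who wants to try the same workaround, here is a minimal sketch of such a pre-pull DaemonSet. The names and image are placeholders (I used hyperkube; substitute whatever image your chart's hook job actually runs):

```sh
# Sketch of the pre-pull workaround; name and image are placeholders.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepull
spec:
  selector:
    matchLabels:
      app: image-prepull
  template:
    metadata:
      labels:
        app: image-prepull
    spec:
      containers:
      - name: prepull
        # Placeholder: use the image your post-install hook job runs.
        image: k8s.gcr.io/hyperkube:v1.10.5
        # Sleep forever so the pod stays Running and the image
        # stays cached on every node.
        command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
EOF
```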
Created a new AKS cluster just now and tried to install prometheus-operator:
Can confirm @StianOvrevage's comment. A cluster created yesterday (1.10.3) fails in the same way.
Looks like this might be working on 1.10.5:
Istio now seems to get past the "watch closed before timeout" issue but still fails with "timed out waiting for condition". Tried with Helm 2.8.2 (kubectl client 1.9.1) and Helm 2.9.1 (kubectl client 1.10.5). I gave it a 16-minute timeout window. Tiller logs:
Works on Minikube. Here are the Minikube logs (Minikube 0.25.2, Kubernetes 1.9.4, Helm 2.9.1; I can't get a 1.10.0 Minikube cluster started on Windows for some reason):
Edit* The workaround of deploying a DaemonSet with the ~700 MB hyperkube image before helm-installing Istio still seems to work, though (for client/server 1.9.1/1.9.6 and 1.10.5/1.10.5). Tiller logs of this success:
I can confirm that (as of today), in my case, upgrading to 1.10.5 via the docs at https://docs.microsoft.com/en-us/azure/aks/upgrade-cluster fixed my issue, and I was able to install Prometheus via Helm. My guess is this one can be closed, but it would be great to hear if anyone else has feedback.
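For anyone else trying this, the upgrade in those docs boils down to roughly the following (the resource group and cluster names are placeholders for your own):

```sh
# List the versions the cluster can upgrade to.
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

# Upgrade the cluster (placeholder names; nodes are cycled during the upgrade).
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version 1.10.5
```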
This is now failing again on 1.10.5:
:( |
The fix has not rolled out yet, as it did not make it into last week's release. Apologies for the delay; I expect rollout to start today and reach all production regions by the end of the week. To expedite a patch: mailto:[email protected]
@rite2nikhil - The issue is still occurring in a newly created AKS cluster.
This happens to me as well with a newly created cluster.
This still happens for me with the post-install stage of prometheus-operator on a newly created cluster. I've tested with both 1.9.6 and 1.10.6. The region is westeurope. This worked fine on a newly created cluster (1.9.6) last week, though.
The fix for Helm/Tiller will get rolled out by the end of next week, so if this is urgent, please send your cluster info to [email protected].
Is there a way to see that the fix is in when we deploy a new cluster? Is it somehow tied to the acsengineVersion tag?
This still fails on 1.11.1 on a newly created cluster today, @rite2nikhil. Does this mean the fix doesn't fix it, that these are several different problems, or that the fix doesn't apply to new clusters yet? I also agree with @twendt: being able to see which version gets deployed, plus a changelog and/or status of known bugs, would be nice.
@StianOvrevage I had the same problem with prometheus-operator. I was able to deploy it successfully with the --debug parameter: without the parameter it failed after 60s; with the parameter it completed after ~10-15s. The cluster was created yesterday with K8s version 1.11.1.
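For comparison, the two invocations were essentially the following (the chart reference is an assumption; point it at wherever your prometheus-operator chart actually comes from):

```sh
# Failed after ~60s on a fresh cluster (chart path assumed):
helm install stable/prometheus-operator --name prom-op --namespace monitoring

# Clean up the failed release, then retry with --debug;
# this completed in ~10-15s:
helm delete --purge prom-op
helm install stable/prometheus-operator --name prom-op --namespace monitoring --debug
```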
@tkaepp Is it possible that it succeeded the second time because the image was already cached on the nodes? Kubernetes will keep an image on a node for 5(?) minutes after nothing is using it anymore. If Kubernetes does not have to pull the images, some helm installs work, because they're able to complete before the 1-minute timeout. This is actually the basis of the workaround I used to helm install Istio, where I deploy a DaemonSet with the hyperkube image before installing; that way the post-install jobs complete much more quickly (within the 1-minute timeout period). I suspect that if you tried the --debug parameter on a freshly created AKS cluster, it would fail just the same.
Totally agree with you, @StianOvrevage.
@tkaepp and @plc402: This is a known issue that in most cases can be quickly remedied. Please file a support request via portal.azure.com and link to this issue; that way we will be able to provide a fix. Thank you!
@jskulavik
Hi @plc402, please reply to support requesting that they assign the case to me, and we will look at your cluster. Thank you.
Tested with a new cluster today on Kubernetes 1.11.2, Helm 2.10.0, and Istio 1.0.1, and the helm install worked (without the DaemonSet workaround)! I'm still having issues with watches and listing resources (interacting with the apiserver in general), but this particular issue (helm installs with post-install jobs) seems fixed for me. Edit* The region was centralus.
Never mind, it's happening again:
Not fixed here on 1.11.2 on a cluster created 16 minutes ago.
Just created a cluster with 1.11.2; it worked this time. 🤷‍♂️
Fails:
Please see issue #676, which we are actively working to address.
Thanks @jskulavik. Any idea how to implement the workaround of setting the KUBERNETES_* env vars on Helm? Add them to the
Hi @StianOvrevage, yes, that would be the best place to start in this case, given that you're running into Helm issues.
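I haven't verified this end to end, but mechanically the env vars can be injected into the Tiller deployment like this (the API server FQDN below is a placeholder; `kubectl cluster-info` shows your cluster's actual address):

```sh
# Sketch only: override Tiller's in-cluster API endpoint with the
# API server FQDN directly. Replace the placeholder host with your own.
kubectl -n kube-system set env deployment/tiller-deploy \
  KUBERNETES_SERVICE_HOST=mycluster-abc123.hcp.westeurope.azmk8s.io \
  KUBERNETES_SERVICE_PORT=443
```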
I was hitting this
This seems to be fixed by #676 (opt-in preview atm).
Closing this issue as old/stale/resolved. Note: if you are only seeing this behavior on clusters with a unique configuration (such as custom DNS/VNet/etc.), please open an Azure technical support ticket.
Our helm deployments fail to install on the AKS cluster. The same charts work fine on other clusters, including the ACS cluster.
Reproduction:
(I chose the tensorflow-notebook chart for the reproduction because it's not huge and is easily available; see the sketch below. The same thing also happens with other charts.)
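The install itself was nothing special; roughly the following (the chart reference assumes the chart lives in the stable repo, and the release name is a placeholder):

```sh
# Assumes Helm 2 with Tiller already initialized on the cluster.
helm install stable/tensorflow-notebook --name tf-repro
```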
Tiller log:
The secret did get created during this:
We reproduced the issue on three separate AKS clusters, all on Kubernetes 1.9.6, in West and North Europe. We tested with Helm 2.7.0, 2.9.0, and 2.9.1.
As said above, the same charts work without issues in our ACS cluster and on multiple Terraform-based ones (Kubernetes 1.7.7 (ACS), 1.8.5, and 1.8.4).
Charts without a pre-install hook (or maybe other hooks; we didn't isolate that) deploy without any issue on the AKS cluster. The cluster otherwise appears to be fine: calls via kubectl work, kubectl port-forward works, and helm list works.
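For context, this is the generic shape of a pre-install hook inside a chart (a sketch, not taken from the failing charts). Tiller creates the hook Job before applying the chart's main manifests and watches it to completion, and that watch is what hits the ~60s idle timeout on AKS:

```yaml
# templates/preinstall-job.yaml -- generic sketch of a pre-install hook
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-preinstall
  annotations:
    # Run this Job before the chart's main resources are created.
    "helm.sh/hook": pre-install
    # Delete the Job once it has succeeded.
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: preinstall
        image: busybox
        command: ["sh", "-c", "echo preparing && sleep 5"]
```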