kube-svc-redirect CrashLoopBackOff #56
Comments
I'm experiencing the same issue. Notably, the kubernetes-dashboard pod is also failing, meaning it is impossible for me to use the dashboard, either through 'kubectl proxy' or 'az aks browse'.
@smithc @Guillaume-Mayer can you provide more details, such as the name of your resource group and resource? This will let us look up logs and aid the investigation.
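For anyone unsure where to find those values, they can be listed with a quick az CLI call (a generic sketch, not part of the original reply):

```sh
# Shows name and resource group for every AKS cluster in the subscription
az aks list -o table
```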
Hi @amanohar, my resource group is 'cs-kube' and my AKS container service name is 'cs-cluster'. Let me know if you need any further information; I'm happy to help. Actually, I just checked my kube-system pods once again (having left the cluster up and running for a few days), and it appears that the kubernetes-dashboard and kube-svc-redirect containers have finally started working. Here's the output of
Notice that the number of restarts is quite high on both the kube-svc-redirect and kubernetes-dashboard pods. I'd be interested to know whether anything was done on Microsoft's side to help stabilize those services, or whether we should be on the lookout for them failing again in the future. In any case, I just want to say thanks for taking the time to help out with troubleshooting.
Likewise, I am experiencing a similar issue. Is there any progress on fixing this? I have destroyed and created three new clusters in the last two days, with the same results. Prior to this I was able to create clusters; upgrading the Kubernetes version or adding nodes to a cluster didn't work, but at least I could create a new cluster. Now even that is not working.
I'm having the same issue. I just tried AKS for the first time last night. I stood up a cluster in east last night and encountered the same thing. I tore down the cluster and stood up another one this morning, only to get the same result. When I ran the create command this morning, I used the flag
Ultimately I pulled the image and tried playing with it locally to understand what it's doing. In the container I could see the script. I set the two environment variables APISERVER_FQDN and KUBERNETES_SVC_IP to match the values listed by running
That's when I realized the API server FQDN doesn't match what I see in the Azure portal, as it has a prefix. I tried running
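For anyone following along, here's a rough sketch of how those two values can be looked up (these commands are my illustration, not from the comment above; the resource group and cluster name are placeholders):

```sh
# API server URL as the cluster itself reports it
kubectl cluster-info

# FQDN recorded on the AKS resource
az aks show -g <resource-group> -n <cluster-name> --query fqdn -o tsv

# ClusterIP of the in-cluster "kubernetes" service
kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'
```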
So, to further test my above theory, I ran
I could see my new daemon set and the pod it created. The pod ran without crashing. Unfortunately, kubernetes-dashboard was still in a crash loop. I left to go to a meeting, and when I came back my daemon set 'kube-svc-redirect-fix' was gone. I'm guessing whatever controller was replacing my changes to the actual 'kube-svc-redirect' was also watching kube-system in general and deleting any additional resources created. The pods in question, however, were no longer stuck in a crash loop:
I could now access the dashboard.
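For reference, the kind of DaemonSet clone described above would look roughly like this (a sketch under my own assumptions about which fields to change, not the author's exact steps):

```sh
# Export the existing DaemonSet as YAML
kubectl get ds kube-svc-redirect -n kube-system -o yaml > kube-svc-redirect-fix.yaml

# Edit the copy: rename metadata.name to kube-svc-redirect-fix, remove status/uid/resourceVersion,
# and set the APISERVER_FQDN / KUBERNETES_SVC_IP container env values to the corrected ones.

# Create the modified copy alongside the original
kubectl apply -f kube-svc-redirect-fix.yaml
```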
Want to shed a little light on the underlying issue. As part of your AKS cluster we provision a dedicated IP address that is used by the infrastructure behind these pods. When we provision an AKS cluster, that IP address allocation is asynchronous. We've had a few service bugs and regional rate limits that have extended the allocation beyond 15 minutes; until the allocation completes, these pods sit in CrashLoopBackOff. Once these pods do connect up successfully, they will remain connected and shouldn't go back into CrashLoopBackOff. There have been a few cases where that allocation permanently fails. Longer-term, we are working on making this part of the service a lot more robust. I'm going to close out this issue, since we don't have any active incidents at the moment!
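A simple way to watch for that recovery (a generic kubectl sketch, not from the comment above):

```sh
# Watch kube-system pods until kube-svc-redirect and kubernetes-dashboard settle into Running
kubectl get pods -n kube-system -w

# Recent kube-system events, newest last, to see why the pods were restarting
kubectl get events -n kube-system --sort-by=.lastTimestamp
```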
I had this problem, and it turned out that another resource was using the subnet dedicated to AKS. It's worth checking for that and, if so, removing the conflicting resource.
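One way to check what is attached to a subnet (my own illustration; the vnet and subnet names are placeholders):

```sh
# List the IP configurations (NICs and the like) currently bound to the subnet AKS uses
az network vnet subnet show \
  -g <resource-group> --vnet-name <vnet-name> -n <subnet-name> \
  --query ipConfigurations
```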
It happened to me after adding more nodes to the cluster. Luckily, kubectl still worked; the cluster returned to normal after deleting the failing pods (repeatedly).
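Deleting the crashing pods by name looks roughly like this (pod names taken from the listing further down; yours will differ):

```sh
# Delete a crashing pod; its DaemonSet/Deployment recreates it immediately
kubectl delete pod kube-svc-redirect-j8v6c -n kube-system
kubectl delete pod kubernetes-dashboard-6fc8cf9586-fhscp -n kube-system
```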
We encountered this issue today as well, after adding more nodes to a pretty vanilla 1.10.6 cluster (no advanced networking).
Today I created a new cluster in eastus.
When I look at the system pods with 'kubectl get pods -n kube-system', I get:
heapster-58f795c4cf-bmsjn 2/2 Running 0 46m
kube-dns-v20-6c8f7f988b-75xlk 3/3 Running 0 46m
kube-dns-v20-6c8f7f988b-rvv2x 3/3 Running 0 46m
kube-proxy-2plc9 1/1 Running 0 46m
kube-proxy-pb6tr 1/1 Running 0 46m
kube-proxy-xhvm8 1/1 Running 0 46m
kube-svc-redirect-j8v6c 0/1 CrashLoopBackOff 13 46m
kube-svc-redirect-jz5bl 0/1 CrashLoopBackOff 13 46m
kube-svc-redirect-p5hbb 0/1 Error 14 46m
kubernetes-dashboard-6fc8cf9586-fhscp 0/1 CrashLoopBackOff 13 46m
tunnelfront-7446f49869-mj2n5 1/1 Running 0 46m
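Not part of the original report, but the typical follow-up diagnostics against these pods would look something like this (pod name taken from the listing above):

```sh
# Events and state transitions for one of the crashing pods
kubectl describe pod kube-svc-redirect-p5hbb -n kube-system

# Logs from the previous (crashed) container instance
kubectl logs kube-svc-redirect-p5hbb -n kube-system --previous
```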