Advanced networking pods getting wrong IPs #533

mbrancato · 2018-07-16T20:20:26Z

We've identified an issue with advanced networking and a custom VNet: pods are being assigned the wrong IPs. This causes a number of problems in the cluster when deploying apps.

This is likely deployment-specific. Clusters we deployed through the Portal work correctly, but with Terraform we're seeing these issues. We're currently working through a support request to identify the cause.

This seems related to other issues, or at least presents the same symptoms (apart from the IP issue). For example, several pods in kube-system are in crash loops, and we can't see logs from pods. Having seen others suggest scaling down to one node (in a comment on #232), I tried something similar, and it works as a workaround: I drained all but one node, and things work. I then uncordoned the other nodes to deploy apps. We can see logs now; however, I'm confident that if azureproxy, tunnelfront, and others move to other nodes, things will start failing again.
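The drain-and-uncordon workaround can be sketched in Python. This is a minimal sketch, assuming `kubectl` is already configured for the cluster: it only builds and prints the commands (pass each list to `subprocess.run` to actually execute them). The helper names are hypothetical, and the node names are the ones from the `kubectl` output posted in this issue. Depending on the workloads running, `kubectl drain` may need additional flags.

```python
from typing import List


def drain_all_but_one(nodes: List[str], keep: str) -> List[List[str]]:
    """Return kubectl commands that drain every node except `keep`."""
    if keep not in nodes:
        raise ValueError(f"{keep!r} is not in the node list")
    # --ignore-daemonsets lets the drain proceed even though DaemonSet pods
    # (kube-proxy, kube-svc-redirect) cannot be evicted.
    return [["kubectl", "drain", node, "--ignore-daemonsets"]
            for node in nodes if node != keep]


def uncordon(nodes: List[str]) -> List[List[str]]:
    """Return kubectl commands that make the given nodes schedulable again."""
    return [["kubectl", "uncordon", node] for node in nodes]


if __name__ == "__main__":
    # Node names as they appear in this issue; replace with your own.
    nodes = ["aks-default-16208061-0",
             "aks-default-16208061-1",
             "aks-default-16208061-2"]
    keep = "aks-default-16208061-0"
    # Step 1: drain everything except one node.
    for cmd in drain_all_but_one(nodes, keep):
        print(" ".join(cmd))
    # Step 2: later, uncordon the other nodes so apps can be scheduled there.
    for cmd in uncordon([n for n in nodes if n != keep]):
        print(" ".join(cmd))
```

Uncordoning only re-enables scheduling; pods that were already placed on the kept node (such as azureproxy and tunnelfront) stay there until something reschedules them.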

This definitely appears to be a deployment issue when using Terraform. I wanted to at least document it here, but I'm unsure whether the issue is in the Terraform provider or in something that has changed in the AKS APIs. Thoughts?

To show what's happening: with advanced networking and a custom VNet, we see pods on the 10.244.x.x network:

NAMESPACE     NAME                                  READY     STATUS             RESTARTS   AGE       IP           NODE
kube-system   azureproxy-7c677567f6-6mnbt           0/1       CrashLoopBackOff   8          23m       10.244.1.3   aks-default-16208061-0
kube-system   heapster-56c6f9566f-d7hwc             2/2       Running            0          23m       10.244.2.2   aks-default-16208061-1
kube-system   kube-dns-v20-7c556f89c5-j5xsx         3/3       Running            0          23m       10.244.1.2   aks-default-16208061-0
kube-system   kube-dns-v20-7c556f89c5-mzbzm         3/3       Running            0          23m       10.244.2.3   aks-default-16208061-1
kube-system   kube-proxy-9nt2g                      1/1       Running            0          23m       10.4.2.4     aks-default-16208061-1
kube-system   kube-proxy-fsbhg                      1/1       Running            0          23m       10.4.2.6     aks-default-16208061-2
kube-system   kube-proxy-hhdzk                      1/1       Running            0          23m       10.4.2.5     aks-default-16208061-0
kube-system   kube-svc-redirect-hskld               0/1       CrashLoopBackOff   5          23m       10.4.2.5     aks-default-16208061-0
kube-system   kube-svc-redirect-lnqfb               1/1       Running            5          23m       10.4.2.4     aks-default-16208061-1
kube-system   kube-svc-redirect-w5kd8               0/1       CrashLoopBackOff   5          23m       10.4.2.6     aks-default-16208061-2
kube-system   kubernetes-dashboard-b85c46fc-h2rcx   0/1       CrashLoopBackOff   9          23m       10.244.1.5   aks-default-16208061-0
kube-system   tunnelfront-5899cd69c6-qlkxc          1/1       Running            0          23m       10.244.1.4   aks-default-16208061-0
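A small Python check can flag this symptom in `kubectl get pods --all-namespaces -o wide` output. It is a sketch under two assumptions: that 10.244.0.0/16 (the default kubenet pod CIDR on AKS) is the range to watch for, and that the node subnet is 10.4.2.0/24, a hypothetical prefix chosen because the nodes above sit at 10.4.2.4 to 10.4.2.6. Note that hostNetwork pods such as kube-proxy and kube-svc-redirect use the node IP in either networking mode, so the telling sign is ordinary pods landing in 10.244.x.x.

```python
import ipaddress
from typing import Dict

# Default kubenet pod CIDR on AKS. With Azure CNI ("advanced networking"),
# ordinary pods should get IPs from the VNet subnet instead.
KUBENET_DEFAULT_POD_CIDR = ipaddress.ip_network("10.244.0.0/16")
# Hypothetical node subnet: the real prefix is not given in the issue.
NODE_SUBNET = ipaddress.ip_network("10.4.2.0/24")


def classify_pod_ips(kubectl_wide_output: str) -> Dict[str, str]:
    """Map 'namespace/name' to 'kubenet-cidr', 'node-subnet' or 'other'.

    Expects whitespace-separated columns as in
    `kubectl get pods --all-namespaces -o wide` (NAMESPACE, NAME, ..., IP, NODE).
    """
    lines = kubectl_wide_output.strip().splitlines()
    header = lines[0].split()
    ns_col = header.index("NAMESPACE")
    name_col = header.index("NAME")
    ip_col = header.index("IP")
    result: Dict[str, str] = {}
    for line in lines[1:]:
        fields = line.split()
        ip = ipaddress.ip_address(fields[ip_col])
        if ip in KUBENET_DEFAULT_POD_CIDR:
            kind = "kubenet-cidr"
        elif ip in NODE_SUBNET:
            kind = "node-subnet"
        else:
            kind = "other"
        result[f"{fields[ns_col]}/{fields[name_col]}"] = kind
    return result


if __name__ == "__main__":
    # Two rows taken from the output above.
    sample = """\
NAMESPACE     NAME                          READY  STATUS            RESTARTS  AGE  IP          NODE
kube-system   azureproxy-7c677567f6-6mnbt   0/1    CrashLoopBackOff  8         23m  10.244.1.3  aks-default-16208061-0
kube-system   kube-proxy-9nt2g              1/1    Running           0         23m  10.4.2.4    aks-default-16208061-1
"""
    for pod, kind in classify_pod_ips(sample).items():
        print(pod, kind)
```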

References:
#2
#232
#56


mbrancato · 2018-07-16T22:56:33Z

I've confirmed this same issue when creating an advanced networking cluster using the Azure CLI 2.0.41.

sukrit007 · 2018-07-23T07:24:57Z

I just ran into this issue, with similar symptoms, when using the CLI to create the cluster. @mbrancato, was support able to triage the root cause?

mbrancato · 2018-07-24T03:49:15Z

@sukrit007 This issue is related to the pod-cidr and network-plugin values. Make sure you set --network-plugin azure when using the CLI; that should get you going.
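A minimal Python sketch of that CLI fix, assuming an existing resource group and a custom VNet subnet: it builds an `az aks create` command that passes `--network-plugin azure` explicitly alongside `--vnet-subnet-id`. The resource group, cluster name, subnet ID, and function name are hypothetical placeholders. The sketch deliberately leaves out `--pod-cidr`, which the az CLI help describes as a kubenet setting.

```python
import shlex
from typing import List


def aks_create_args(resource_group: str, name: str, vnet_subnet_id: str) -> List[str]:
    """Build an `az aks create` command that uses Azure CNI on an existing subnet."""
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", name,
        # The fix from this thread: request the Azure CNI plugin explicitly.
        "--network-plugin", "azure",
        # Place the cluster in the custom VNet subnet.
        "--vnet-subnet-id", vnet_subnet_id,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    subnet_id = ("/subscriptions/<sub-id>/resourceGroups/my-rg/providers/"
                 "Microsoft.Network/virtualNetworks/my-vnet/subnets/my-subnet")
    args = aks_create_args("my-rg", "my-aks", subnet_id)
    # Print a shell-safe command line; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```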

The Terraform issue here is being handled by hashicorp/terraform-provider-azurerm#1434

sukrit007 · 2018-07-24T08:19:42Z

Thanks @mbrancato that worked like a charm!

rite2nikhil · 2018-08-07T20:31:27Z

Closing the issue since it appears resolved. @mbrancato, please reopen if this is not the case.

mbrancato mentioned this issue Jul 16, 2018 in AKS - kubenet issues with Kubernetes 1.10.x #531 (closed)

rite2nikhil closed this as completed Aug 7, 2018

derekperkins mentioned this issue Aug 29, 2018 in Cluster unavailable after upgrade from 1.10.5 to 1.11.2 #626 (closed)

ghost locked as resolved and limited conversation to collaborators Aug 4, 2020
