
Advanced networking pods getting wrong IPs #533

Closed
mbrancato opened this issue Jul 16, 2018 · 5 comments

Comments

@mbrancato

We've identified an issue where, when using advanced networking with a custom VNet, pods are assigned the wrong IPs. This causes a number of problems in the cluster when deploying apps.

This is likely deployment-specific: we have deployed with the Portal and seen it work correctly, but with Terraform we're seeing these issues. We're currently working through a support request to identify the cause.

This seems related to other issues, or at least presents the same symptoms (apart from the IP issue). For example, we see several pods in kube-system in crash loops, and we can't see logs from pods. Having seen others reference scaling down to one node (#232 (comment)), I tried something similar, and it works as a workaround: I drained all but one node, and things worked. I then uncordoned the other nodes to deploy apps. We can see logs now; however, I'm confident that if azureproxy, tunnelfront and others move to other nodes, things will start failing again.

This definitely appears to be a deployment issue when using Terraform. I wanted to at least document it here, but I'm unsure whether the issue is in the Terraform provider or in something that has changed in the AKS APIs. Thoughts?

To show what's happening: with advanced networking and a custom VNet, we see pods on the 10.244.x.x network:

NAMESPACE     NAME                                  READY     STATUS             RESTARTS   AGE       IP           NODE
kube-system   azureproxy-7c677567f6-6mnbt           0/1       CrashLoopBackOff   8          23m       10.244.1.3   aks-default-16208061-0
kube-system   heapster-56c6f9566f-d7hwc             2/2       Running            0          23m       10.244.2.2   aks-default-16208061-1
kube-system   kube-dns-v20-7c556f89c5-j5xsx         3/3       Running            0          23m       10.244.1.2   aks-default-16208061-0
kube-system   kube-dns-v20-7c556f89c5-mzbzm         3/3       Running            0          23m       10.244.2.3   aks-default-16208061-1
kube-system   kube-proxy-9nt2g                      1/1       Running            0          23m       10.4.2.4     aks-default-16208061-1
kube-system   kube-proxy-fsbhg                      1/1       Running            0          23m       10.4.2.6     aks-default-16208061-2
kube-system   kube-proxy-hhdzk                      1/1       Running            0          23m       10.4.2.5     aks-default-16208061-0
kube-system   kube-svc-redirect-hskld               0/1       CrashLoopBackOff   5          23m       10.4.2.5     aks-default-16208061-0
kube-system   kube-svc-redirect-lnqfb               1/1       Running            5          23m       10.4.2.4     aks-default-16208061-1
kube-system   kube-svc-redirect-w5kd8               0/1       CrashLoopBackOff   5          23m       10.4.2.6     aks-default-16208061-2
kube-system   kubernetes-dashboard-b85c46fc-h2rcx   0/1       CrashLoopBackOff   9          23m       10.244.1.5   aks-default-16208061-0
kube-system   tunnelfront-5899cd69c6-qlkxc          1/1       Running            0          23m       10.244.1.4   aks-default-16208061-0
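In this listing, the pods that appear to use host networking (kube-proxy, kube-svc-redirect) show node addresses in 10.4.2.x, while the rest show 10.244.x.x addresses, which is the default kubenet pod range rather than the custom VNet subnet. One quick way to spot affected pods is to check each pod IP against the subnet. The Python sketch below does that on a few rows copied from the listing above. The subnet 10.4.2.0/24 is an assumption inferred from the node IPs, and `pods_outside_subnet` is a hypothetical helper name, not part of any tool.

```python
import ipaddress

# A few rows copied from the `kubectl get pods --all-namespaces -o wide` listing above.
LISTING = """\
NAMESPACE     NAME                                  READY     STATUS             RESTARTS   AGE       IP           NODE
kube-system   azureproxy-7c677567f6-6mnbt           0/1       CrashLoopBackOff   8          23m       10.244.1.3   aks-default-16208061-0
kube-system   kube-proxy-9nt2g                      1/1       Running            0          23m       10.4.2.4     aks-default-16208061-1
kube-system   tunnelfront-5899cd69c6-qlkxc          1/1       Running            0          23m       10.244.1.4   aks-default-16208061-0
"""

# Assumption: the custom VNet subnet is 10.4.2.0/24 (guessed from the node IPs;
# substitute the real subnet of your cluster).
SUBNET = ipaddress.ip_network("10.4.2.0/24")


def pods_outside_subnet(listing, subnet):
    """Hypothetical helper: return (pod name, IP) for every pod whose IP is not in `subnet`."""
    lines = listing.strip().splitlines()
    header = lines[0].split()
    name_col = header.index("NAME")
    ip_col = header.index("IP")
    outside = []
    for line in lines[1:]:
        # Columns are whitespace-separated; none of these fields contain spaces.
        fields = line.split()
        ip = ipaddress.ip_address(fields[ip_col])
        if ip not in subnet:
            outside.append((fields[name_col], str(ip)))
    return outside


for name, ip in pods_outside_subnet(LISTING, SUBNET):
    print(f"{name}: {ip} is outside {SUBNET}")
```

On these rows it flags azureproxy and tunnelfront (10.244.x.x) but not kube-proxy, whose address is a node IP inside the assumed subnet. With Azure CNI working as intended, every pod IP should fall inside the VNet subnet.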

References:
#2
#232
#56

@mbrancato
Author

I've confirmed this same issue when creating an advanced networking cluster using the Azure CLI 2.0.41.

@sukrit007

Just ran into this issue, with similar symptoms, when using the CLI to create the cluster. Was support able to triage the root cause of this issue, @mbrancato?

@mbrancato
Author

@sukrit007 This issue is related to the pod-cidr and network-plugin values. Make sure you set --network-plugin azure when using the CLI; that should get you going.

The Terraform issue here is being handled by hashicorp/terraform-provider-azurerm#1434
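To make the CLI fix concrete: `az aks create` defaults to the kubenet plugin, whose pods get IPs from a separate pod CIDR (10.244.0.0/16 by default), which matches the addresses in the original listing. The Python sketch below assembles an `az aks create` argument list for advanced networking and rejects the two flag mistakes discussed here. `aks_create_args` is a hypothetical helper, not part of the Azure CLI; the flags `--resource-group`, `--name`, `--network-plugin`, `--vnet-subnet-id` and `--pod-cidr` are real `az aks create` options, and the resource names in the usage line are placeholders.

```python
def aks_create_args(resource_group, name, vnet_subnet_id, network_plugin="azure", pod_cidr=None):
    """Hypothetical helper: build `az aks create` arguments for advanced networking (Azure CNI)."""
    if network_plugin != "azure":
        # Omitting --network-plugin azure falls back to kubenet, so pods get
        # 10.244.x.x addresses instead of IPs from the custom VNet subnet.
        raise ValueError("advanced networking requires --network-plugin azure")
    if pod_cidr is not None:
        # --pod-cidr is only used by kubenet; with Azure CNI, pods take IPs from the subnet.
        raise ValueError("--pod-cidr does not apply to --network-plugin azure")
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--network-plugin", network_plugin,
        "--vnet-subnet-id", vnet_subnet_id,
    ]


# Placeholder names; the subnet ID must be the full Azure resource ID of your subnet.
args = aks_create_args("my-resource-group", "my-aks-cluster", "<subnet-resource-id>")
print(" ".join(args))
```

The list could be passed to `subprocess.run(args, check=True)` to actually create the cluster; keeping it as a plain list makes the flags easy to inspect before anything runs.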

@sukrit007

Thanks @mbrancato that worked like a charm!

@rite2nikhil

Closing the issue, as I see a resolution. @mbrancato, please re-open if this is not the case.

@ghost locked as resolved and limited conversation to collaborators Aug 4, 2020

3 participants