Virtualbox DNS relay fails under load: getaddrinfo EAI_AGAIN, i/o timeout #3606

Closed
kiboliu opened this issue Jan 30, 2019 · 7 comments
Labels: area/dns, co/virtualbox, priority/backlog

Comments

@kiboliu

kiboliu commented Jan 30, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Please provide the following details:

Environment:

Minikube version (use minikube version): v0.33.1

  • OS (e.g. from /etc/os-release): 16.04
  • VM Driver (e.g. cat ~/.minikube/machines/minikube/config.json | grep DriverName): virtualbox
  • ISO version (e.g. cat ~/.minikube/machines/minikube/config.json | grep -i ISO or minikube ssh cat /etc/VERSION): v0.33.1
  • Install tools:
  • Others:

What happened:
My application, which runs on the cluster, needs to upload local files to Azure Blob Storage. Small files upload fine, but large files (about 70 MB) always fail with this error:

Error: getaddrinfo EAI_AGAIN xxx.blob.core.windows.net

EAI_AGAIN means a temporary failure in name resolution (here, effectively a DNS lookup timeout). I checked some settings:

# minikube ssh, then sudo vi /etc/resolv.conf
nameserver 10.0.2.3

# Two CoreDNS pods are running, with many timeout errors in their logs
A: unreachable backend: read udp 172.17.0.2:59109->10.0.2.3:53: i/o timeout

# A service named kube-dns exists, cluster-ip is 10.96.0.10
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       prometheus.io/port=9153
                   prometheus.io/scrape=true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         172.17.0.2:53,172.17.0.3:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         172.17.0.2:53,172.17.0.3:53
Session Affinity:  None
Events:            <none>

# An Endpoints object named kube-dns exists
Name:         kube-dns
Namespace:    kube-system
Labels:       k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=KubeDNS
Annotations:  <none>
Subsets:
  Addresses:          172.17.0.2,172.17.0.3
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns      53    UDP
    dns-tcp  53    TCP

Events:  <none>

I'm not sure how a pod running in the cluster performs DNS lookups, so I could not spot anything wrong in the information above.

I also ran minikube ssh to get into the VM and tried nslookup [dnsname]; it sometimes fails and sometimes succeeds.
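
To put a number on "sometimes fails, sometimes succeeds", the lookup can be repeated and the failures counted. This is a minimal sketch: `probe_dns` is a hypothetical helper, and it assumes the lookup command exits non-zero on failure, which can vary between nslookup implementations.

```shell
# probe_dns COUNT CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times and reports how many runs
# exited with a non-zero status.
probe_dns() {
  count=$1
  shift
  fail=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # Discard the command's output; only its exit status is counted.
    "$@" >/dev/null 2>&1 || fail=$((fail + 1))
    i=$((i + 1))
  done
  echo "$fail/$count lookups failed"
}
```

Inside the VM, something like `probe_dns 50 nslookup api.github.com 10.0.2.3` would exercise the resolver listed in /etc/resolv.conf directly, which may help separate failures of that resolver from failures inside the cluster.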

What you expected to happen:
DNS lookups succeed.

How to reproduce it (as minimally and precisely as possible):
Build a cluster and upload a large file to Azure Blob Storage.

Output of minikube logs (if applicable):
This error occurs repeatedly; I'm not sure whether it's related:

Jan 30 00:20:27 minikube kubelet[2862]: E0130 00:20:27.380930    2862 kubelet_volumes.go:154] Orphaned pod "08c3b471-2416-11e9-8c7a-08002706d1dd" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.

Anything else we need to know:
VirtualBox network settings (unchanged defaults):

Adapter 1: Paravirtualized Network (NAT)
Adapter 2: Paravirtualized Network (Host-only Adapter, 'vboxnet0')
@kiboliu
Author

kiboliu commented Jan 30, 2019

As additional information, I also get this error for other DNS names, such as 'api.github.com'.

tstromberg added the area/dns, co/virtualbox, and priority/awaiting-more-evidence labels Jan 30, 2019
tstromberg changed the title from "DNS lookup error in application running in cluster" to "DNS issues: getaddrinfo EAI_AGAIN xxx.blob.core.windows.net / unreachable backend: read ->10.0.2.3:53: i/o timeout" Feb 5, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label May 7, 2019
@tstromberg
Contributor

I suspect there may have been firewall interference here. Is it possible that there was a VPN or firewall configured on the host?

Also: Do you mind trying with minikube v1.1.0? Thanks!

tstromberg added the r/2019q2 and triage/needs-information labels and removed the lifecycle/stale label May 22, 2019
@cvila84

cvila84 commented Jun 24, 2019

Hello, I can confirm the issue, even with minikube v1.1.1.

As I understand it, the cause is the VirtualBox NAT DNS proxy, which sometimes crashes silently, resulting in i/o timeouts in kube-dns (as it relies on the NAT DNS proxy to resolve addresses outside the K8s cluster).

I tested with the latest VirtualBox releases, 5.2.30 and 6.0.8, but got the same error after several minutes or hours.

I tried replacing NAT with NAT Network on the minikube VM and, for the moment, everything is working well.

tstromberg added the priority/backlog label and removed the priority/awaiting-more-evidence and triage/needs-information labels Jul 17, 2019
sharifelgamal added the help wanted label Jul 18, 2019
@tstromberg
Contributor

Upstream: https://www.virtualbox.org/ticket/14736

tstromberg changed the title from "DNS issues: getaddrinfo EAI_AGAIN xxx.blob.core.windows.net / unreachable backend: read ->10.0.2.3:53: i/o timeout" to "Virtualbox DNS relay fails periodically: getaddrinfo EAI_AGAIN" Sep 19, 2019
tstromberg added the kind/support and lifecycle/frozen labels and removed the help wanted, priority/backlog, and r/2019q2 labels Sep 19, 2019
@tstromberg
Contributor

Has anyone seen this with more recent releases of Virtualbox?

I don't see anything from their changelog that suggests the issue has been fixed.

tstromberg removed the lifecycle/frozen label Sep 19, 2019
tstromberg changed the title from "Virtualbox DNS relay fails periodically: getaddrinfo EAI_AGAIN" to "Virtualbox DNS relay fails under load: getaddrinfo EAI_AGAIN, i/o timeout" Sep 20, 2019
tstromberg removed the kind/support label Oct 23, 2019
tstromberg added the kind/bug and priority/backlog labels and removed the kind/bug label Oct 23, 2019
@tstromberg
Contributor

Closing this as I haven't heard much here, and there isn't much we can do about the VirtualBox DNS relay. If you run into this, try upgrading to VirtualBox 6.0.14+, and if that fails, try another hypervisor.
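
To check whether an installed VirtualBox already meets the 6.0.14 suggestion, a small version comparison can help. This is a sketch under assumptions: `version_at_least` is a hypothetical helper, and it assumes `VBoxManage --version` prints a string such as `6.0.14r133895`, with a build suffix after the three-part version.

```shell
# version_at_least HAVE WANT
# Hypothetical helper: succeeds if version HAVE is greater than or equal to
# WANT, comparing up to three dot-separated numeric components.
version_at_least() {
  # Drop everything from the first character that is not a digit or a dot,
  # e.g. "6.0.14r133895" becomes "6.0.14".
  printf '%s %s\n' "${1%%[!0-9.]*}" "$2" | awk '{
    split($1, h, "."); split($2, w, ".")
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}
```

For example, `version_at_least "$(VBoxManage --version)" 6.0.14 || echo "consider upgrading VirtualBox"` prints the hint only when the installed version is older.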


6 participants