
kube-dns issue with Node.js: "there is error 1 Error: getaddrinfo EAI_AGAIN" #68321

Closed
leeadh opened this issue Sep 6, 2018 · 12 comments

leeadh commented Sep 6, 2018

I would like to ask how to resolve this issue. I am running a Node.js app, and one of my route handlers is below:

// Setup implied by the snippet but not shown in the report:
// const express = require('express');
// const request = require('request');
// const app = express();
// const apiKey = process.env.OPENWEATHERMAP_API_KEY;

app.post('/', function (req, res) {
  let city = req.body.city;
  let url = `http://api.openweathermap.org/data/2.5/weather?q=${city}&units=imperial&appid=${apiKey}`;

  request(url, function (err, response, body) {
    if (err) {
      // Transport/DNS-level failure; this is where EAI_AGAIN surfaces
      res.render('index', {weather: null, error: 'Error, please try again'});
      console.log("there is error 1 " + err);
    } else {
      let weather = JSON.parse(body);
      if (weather.main == undefined) {
        // The API answered, but without the expected payload (e.g. bad city or API key)
        res.render('index', {weather: null, error: 'Error, please try again'});
        console.log("there is error 2 " + err);
      } else {
        let weatherText = `It's ${weather.main.temp} degrees in ${weather.name}!`;
        res.render('index', {weather: weatherText, error: null});
      }
    }
  });
});

Locally it works fine. However, when I deploy to Kubernetes, I get this error: there is error 1 Error: getaddrinfo EAI_AGAIN api.openweathermap.org api.openweathermap.org:80

So I assume it is something to do with kube-dns. I did a bit of digging and found that my DNS service does have endpoints, but at this point I am not sure what to do next. Any help?

C:\Users\adrlee\Desktop\oracle\wercker>kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.244.2.2:53,10.244.2.3:53,10.244.2.5:53 + 5 more...   20d
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 6, 2018
@chrisohaver
Contributor

The output of kubectl get endpoints suggests this is probably a CoreDNS deployment (CoreDNS uses the same service name "kube-dns").

Nevertheless, I think the error means a timeout, which can mean:

  • that the pod cannot route to the DNS service,
  • or that CoreDNS is crashing/not responding,
  • or that the upstream servers are timing out.

Try ...

  • kubectl -n kube-system get pods ... to show the status of the DNS pods
  • kubectl -n kube-system logs coredns-XXX-XXX ... (replacing XXX-XXX with the real pod name) to see if there are any errors in the CoreDNS logs
  • you can try spinning up a DNS client Pod to test DNS interactively, as outlined in this troubleshooting guide (a rough sketch follows below): open PR for CoreDNS version... Add CoreDNS details to DNS Debug docs website#10201
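
For reference, the interactive check from that guide boils down to launching a throwaway client pod and resolving both a cluster-internal name and an external name; something along these lines should work (busybox:1.28 is just one image that ships a usable nslookup, and dns-test is an arbitrary pod name):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup api.openweathermap.org

If the internal lookup succeeds but the external one fails, the problem is more likely the upstream resolvers than the cluster DNS service itself.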

@chrisohaver
Contributor

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 6, 2018
@abrenneke

For what it's worth, I've been fighting this for a long time: first with kube-dns, then, in an attempt to fix it, I switched to CoreDNS, which seemed to help for a while, but now it's back. The CoreDNS logs really don't show much of anything:

$ kubectl logs -n=kube-system coredns-5f48c784c6-wrw9d
.:53
CoreDNS-1.1.0
linux/amd64, go1.10, c8d91500
2018/10/08 04:13:22 [INFO] CoreDNS-1.1.0
2018/10/08 04:13:22 [INFO] linux/amd64, go1.10, c8d91500
09/Oct/2018:04:11:17 +0000 [ERROR 2 [a url]. A] unreachable backend: read udp 192.168.137.56:57579->172.31.0.2:53: i/o timeout

$ kubectl logs -n=kube-system coredns-5f48c784c6-ss2mz
2018/10/08 04:13:22 [INFO] CoreDNS-1.1.0
2018/10/08 04:13:22 [INFO] linux/amd64, go1.10, c8d91500
.:53
CoreDNS-1.1.0
linux/amd64, go1.10, c8d91500

Though there are probably newer versions of coredns at this point.

It's very inconsistent, maybe happens a few times a day.

@charandas

Like @SneakyMax, I tried moving to CoreDNS, but even with the current release, 1.2.6, I still see this issue sometimes.

For me, it arises in clusters whose Calico routing is connected only over a layer-2 switch. The L3 network is there, but we prefer not to bind Kubernetes services to it, so that we can migrate IP ranges in the datacenter without having to worry about etcd, cert issuance, and so on.

@Nilubkal

Hey guys, I am facing the same problem. I am on Azure AKS with the following specs:
kube-dns version 20
k8s v1.11.5

and the following error appears at arbitrary times:
[HPM] Error occurred while trying to proxy request < rq > from to http://
(EAI_AGAIN) (https://nodejs.org/api/errors.html#errors_common_system_errors)
"errno": "EAI_AGAIN",
"code": "EAI_AGAIN",

@deepanvermag3

Any update on this? @Nilubkal @charandas @SneakyMax I have this issue:
I have a running AWS EKS cluster with two worker nodes, of type T2 and M3. I have Mongo deployed on M3 and my application code deployed on T2. If the kube-dns pod is running on M3, the microservices can't talk to Mongo, but if it runs on T2, they can. How should I go about debugging this?
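
(For what it's worth, checking which node the DNS pods have landed on is a quick first step; assuming the standard k8s-app=kube-dns label, something like this shows the node for each DNS pod:

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

which makes it easier to correlate the DNS failures with where the failing services are scheduled.)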

@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@leogr

leogr commented Apr 5, 2019

Same issue with minikube (server version v1.14.0) and a CPU-intensive Node.js app.
Running kubectl -n kube-system logs coredns-XXX-XXX I get a lot of timeouts like the following:

...
[ERROR] plugin/errors: 2 www.tachospionitalia.com. A: unreachable backend: read udp 172.17.0.2:52286->10.0.2.3:53: i/o timeout
[ERROR] plugin/errors: 2 www.tachospionitalia.com. AAAA: unreachable backend: read udp 172.17.0.2:57048->10.0.2.3:53: i/o timeout
...
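
Those errors are CoreDNS timing out against its upstream resolver (10.0.2.3 here, typically the VM's NAT DNS in a minikube setup), rather than pods failing to reach CoreDNS. A quick way to see which upstream is configured is to read the Corefile, assuming the default configmap name coredns:

kubectl -n kube-system get configmap coredns -o yaml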

@saidmasoud

Just adding my anecdotal experience with this issue in case anyone else has the same problem. We experienced a large number of EAI_AGAIN errors (k8s version 1.11.7) running kube-dns. It turned out that a few bad deployments, whose pods kept looping in CrashLoopBackOff (literally hundreds of restarts) due to memory limits that were too low, were causing cascading DNS errors across all of our apps. After deleting the bad deployments, the DNS errors went away. So this definitely seems like an issue that can have multiple causes, and I'm not sure how to properly mitigate it if it happens again.
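
In case it helps anyone hunting for the same pattern, sorting pods by restart count makes that kind of churn easy to spot; a rough one-liner (it only looks at the first container in each pod, so treat it as a coarse filter):

kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'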

@prameshj
Contributor

prameshj commented May 2, 2019

@saidmasoud Were the nodes overcommitted on memory, such that the kube-dns pods were being OOMKilled? Maybe the number of concurrent connections dnsmasq supports needs to be reduced?
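
(If it does come down to the dnsmasq concurrency limit, the relevant knob should be the --dns-forward-max argument on the dnsmasq container of the kube-dns deployment; assuming the stock manifest, something like this shows the current value:

kubectl -n kube-system get deployment kube-dns -o yaml | grep -- --dns-forward-max

and it can be adjusted with kubectl -n kube-system edit deployment kube-dns.)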

@saidmasoud

Sorry @prameshj, let me clarify: it was not the kube-dns pods that were OOMing, it was application pods whose memory limits were not high enough even to run idle. The kube-scheduler kept trying to provision those pods literally hundreds of times (and I think a thousand times in one instance!). Once we deleted those deployments, the EAI_AGAIN errors went away.

@prameshj
Contributor

prameshj commented May 7, 2019

Thanks @saidmasoud, looks like this was a client-side error then. Closing this one, please reopen if this is something on the clusterDNS side.
/close

@k8s-ci-robot
Contributor

@prameshj: Closing this issue.

In response to this:

Thanks @saidmasoud, looks like this was a client-side error then. Closing this one, please reopen if this is something on the clusterDNS side.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
