dns resolving not working after adding new agent #2241

Closed
SagiMedinaCheckmarx opened this issue Sep 13, 2020 · 6 comments

Comments

SagiMedinaCheckmarx commented Sep 13, 2020

Environmental Info:
K3s Version:
k3s version v1.18.8+k3s1 (6b59531)

Node(s) CPU architecture, OS, and Version:

Linux ip-aws #36-Ubuntu SMP Tue Aug 18 08:58:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux ip-aws #36-Ubuntu SMP Tue Aug 18 08:58:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 master
1 worker

Describe the bug:

After installing a new agent and joining it to the cluster, I can't resolve any DNS record, including google.com and kubernetes.default.

Steps To Reproduce:

On master:

  • curl -sfL https://get.k3s.io | sh
  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default --> working!
  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com --> working!
  • cat /var/lib/rancher/k3s/server/node-token -> copy

On worker:
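
(The exact worker command was not included in the report. A typical k3s agent join, with <master-ip> and <token> as placeholders for the server address and the token copied above, would be:)

  • curl -sfL https://get.k3s.io | K3S_URL=https://<master-ip>:6443 K3S_TOKEN=<token> sh -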

Then back on the master:

  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default --> not working!
  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com --> not working!

Expected behavior:

DNS resolving should work.

Actual behavior:

DNS resolving is not working.

Additional context / logs:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com
If you don't see a command prompt, try pressing enter.
Address 1: 10.43.0.10

nslookup: can't resolve 'google.com'
pod "busybox" deleted
pod default/busybox terminated (Error)

An additional thing that may be related - I only expose the following ports on both nodes:
8472 (UDP)
6443 (TCP)

memelet commented Oct 2, 2020

This seems to be exactly what I am seeing. In my case I'm using Vagrant VMs, but also Ubuntu (ubuntu/bionic64).

But I find that DNS always resolves in pods on the master, even after agents are added. Only pods on the agents fail to resolve.

brandond (Member) commented Oct 2, 2020

See the required ports here:
https://rancher.com/docs/k3s/latest/en/installation/installation-requirements/#networking

@memelet If you continue to have problems after confirming that these ports are open, please open a new issue and fill out the issue template so that we have sufficient information on your environment.
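
The page above lists, among others, 6443/tcp (API server), 8472/udp (flannel VXLAN), and 10250/tcp (kubelet). A minimal sketch of opening them, assuming ufw is the firewall in use:

ufw allow 6443/tcp
ufw allow 8472/udp
ufw allow 10250/tcp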

pnavais commented Oct 18, 2020

Same issue here (1 master / 3 workers on Manjaro ARM 64 on Raspberry Pi 4). Interestingly enough, after restarting k3s on all nodes (systemctl restart k3s.service on the master, systemctl restart k3s-agent.service on the workers), DNS starts working properly. This happens systematically: even after the master/workers boot, DNS only starts working once k3s has been restarted on the nodes where pods were deployed.
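
For clarity, that restart workaround corresponds to:

systemctl restart k3s.service        # on the master
systemctl restart k3s-agent.service  # on each worker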

I've been using the following deployment manifest to test the issue:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels:
      role: hello
  template:
    metadata:
      labels:
        role: hello
    spec:
      containers:
      - name: hello
        image: pnavais/hello-app-arm:1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
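
The manifest can be applied and pod placement checked like this (the file name is just an example):

kubectl apply -f hello-deployment.yaml
kubectl get pods -o wide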

DNS resolution from any pod fails unless k3s is restarted on the node where that particular pod is running:

  • kubectl exec -ti hello-..... -- nslookup google.com <--- timeout
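
One way to narrow this down (a suggestion, not from the thread) is to query the cluster DNS service IP, 10.43.0.10, directly from a pod scheduled on an agent; busybox nslookup accepts the server as a second argument:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default 10.43.0.10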

brandond (Member) commented:

It sounds like traffic to the DNS service is being dropped if the coredns pod isn't running on the node that's trying to do the lookup. Do you have the default firewall (firewalld/ufw) enabled? Are you using nftables, or iptables-legacy?
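
A quick way to check which backend is active (the second command assumes a Debian/Ubuntu-style alternatives setup):

iptables --version                      # prints "(legacy)" or "(nf_tables)"
update-alternatives --display iptables  # shows which iptables variant is selected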

pnavais commented Oct 19, 2020

No iptables frontend (firewalld/ufw) is installed, and I'm using iptables-legacy (v1.8.5).

Some screenshots are attached showing the running pods (the same pods after restarting the k3s agent on the worker node).


stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

The stale bot added the status/stale label on Jul 31, 2021 and closed the issue as completed on Aug 14, 2021.