dns resolving not working after adding new agent #2241

Closed
SagiMedinaCheckmarx opened this issue Sep 13, 2020 · 6 comments

Comments

SagiMedinaCheckmarx commented Sep 13, 2020

Environmental Info:
K3s Version:
k3s version v1.18.8+k3s1 (6b59531)

Node(s) CPU architecture, OS, and Version:

Linux ip-aws #36-Ubuntu SMP Tue Aug 18 08:58:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux ip-aws #36-Ubuntu SMP Tue Aug 18 08:58:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 master
1 worker

Describe the bug:

After installing a new agent and joining it to the cluster, I can't resolve any DNS record, including google.com and kubernetes.default.

Steps To Reproduce:

On master:

  • curl -sfL https://get.k3s.io | sh
  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default --> working!
  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com --> working!
  • cat /var/lib/rancher/k3s/server/node-token -> copy

On worker:
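
(The exact worker command was not included in the report. A typical k3s agent join, with <master-ip> and <token> as placeholders for the server address and the token copied above, would be:)

  • curl -sfL https://get.k3s.io | K3S_URL=https://<master-ip>:6443 K3S_TOKEN=<token> sh -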

Then back on the master:

  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default --> not working!
  • kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com --> not working!

Expected behavior:

DNS resolving should work.

Actual behavior:

DNS resolving is not working.

Additional context / logs:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup google.com
If you don't see a command prompt, try pressing enter.
Address 1: 10.43.0.10

nslookup: can't resolve 'google.com'
pod "busybox" deleted
pod default/busybox terminated (Error)

An additional thing that may be related - I only expose the following ports on both nodes:
8472 (UDP)
6443 (TCP)

memelet commented Oct 2, 2020

This seems to be exactly what I am seeing. In my case I'm using Vagrant VMs, but also Ubuntu (ubuntu/bionic64).

But I find that DNS always resolves in pods on the master, even after agents are added. Only pods on the agents fail to resolve.

brandond (Member) commented Oct 2, 2020

See the required ports here:
https://rancher.com/docs/k3s/latest/en/installation/installation-requirements/#networking

@memelet If you continue to have problems after confirming that these ports are open, please open a new issue and fill out the issue template so that we have sufficient information on your environment.
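
The page above lists, among others, 6443/tcp (API server), 8472/udp (flannel VXLAN), and 10250/tcp (kubelet). A minimal sketch of opening them, assuming ufw is the firewall in use:

ufw allow 6443/tcp
ufw allow 8472/udp
ufw allow 10250/tcp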

pnavais commented Oct 18, 2020

Same issue here (1 master / 3 workers on Manjaro ARM 64 on Raspberry Pi 4). Interestingly enough, after restarting k3s on all nodes (systemctl restart k3s.service on the master, systemctl restart k3s-agent.service on the workers), DNS starts working properly. This happens systematically: even after the master/workers boot, DNS only starts working once k3s has been restarted on the nodes where pods were deployed.
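
For clarity, that restart workaround corresponds to:

systemctl restart k3s.service        # on the master
systemctl restart k3s-agent.service  # on each worker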

I've been using the following deployment manifest to test the issue:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels:
      role: hello
  template:
    metadata:
      labels:
        role: hello
    spec:
      containers:
      - name: hello
        image: pnavais/hello-app-arm:1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
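
The manifest can be applied and pod placement checked like this (the file name is just an example):

kubectl apply -f hello-deployment.yaml
kubectl get pods -o wide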

DNS resolution from any pod fails unless k3s is restarted on the node where that particular pod is running:

  • kubectl exec -ti hello-..... -- nslookup google.com <--- timeout
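
One way to narrow this down (a suggestion, not from the thread) is to query the cluster DNS service IP, 10.43.0.10, directly from a pod scheduled on an agent; busybox nslookup accepts the server as a second argument:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default 10.43.0.10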

brandond (Member) commented:

It sounds like traffic to the DNS service is being dropped if the coredns pod isn't running on the node that's trying to do the lookup. Do you have the default firewall (firewalld/ufw) enabled? Are you using nftables, or iptables-legacy?
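
A quick way to check which backend is active (the second command assumes a Debian/Ubuntu-style alternatives setup):

iptables --version                      # prints "(legacy)" or "(nf_tables)"
update-alternatives --display iptables  # shows which iptables variant is selected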

pnavais commented Oct 19, 2020

No iptables frontend (firewalld/ufw) is installed, and I'm using iptables-legacy (v1.8.5).

Some screenshots are attached showing the running pods (the same pods after restarting the k3s agent on the worker node).


stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

The stale bot added the status/stale label on Jul 31, 2021 and closed the issue as completed on Aug 14, 2021.