Problematic Multi Node Networking with docker driver and kindnetd CNI #9838

Closed
sadlil opened this issue Dec 3, 2020 · 5 comments · Fixed by #9875
Labels
kind/support Categorizes issue or PR as a support question.


sadlil commented Dec 3, 2020

Creating a multi-node minikube cluster with the docker driver and kindnetd CNI seems to result in broken networking inside the pods running on the worker nodes. This networking problem is not present with the calico CNI plugin.

$ minikube version
minikube version: v1.15.1
commit: 23f40a012abb52eff365ff99a709501a61ac5876
$ system_profiler SPSoftwareDataType
Software:
      System Version: macOS 10.15.7 (19H15)
      Kernel Version: Darwin 19.6.0
$ docker version
Client: Docker Engine - Community
 Cloud integration: 1.0.2
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 16:58:31 2020
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:07:04 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Steps to reproduce the issue:

  1. minikube start -p multi-node -n 3
  2. Create nginx deployment and svc, and another network test deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 4
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: net-test
spec:
  selector:
    matchLabels:
      run: net-test
  replicas: 4
  template:
    metadata:
      labels:
        run: net-test
    spec:
      containers:
      - name: net-test
        image: praqma/network-multitool
        ports:
        - containerPort: 80
  3. Wait for the pods to be running. Pods running on different nodes are getting the same IP addresses. Then do a kubectl get pods -o wide.
$ kubectl get pods -o wide
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
default       my-nginx-5b56ccd65f-dgjfv            1/1     Running   0          6m22s   172.17.0.2     multi-node-m02   <none>           <none>
default       my-nginx-5b56ccd65f-g2w6z            1/1     Running   0          6m22s   172.17.0.4     multi-node-m03   <none>           <none>
default       my-nginx-5b56ccd65f-h4n82            1/1     Running   0          6m22s   172.17.0.3     multi-node       <none>           <none>
default       my-nginx-5b56ccd65f-pkftf            1/1     Running   0          6m21s   172.17.0.3     multi-node-m03   <none>           <none>
default       net-test-c4f9cfdd4-jjsbj             1/1     Running   0          6m22s   172.17.0.4     multi-node-m02   <none>           <none>
default       net-test-c4f9cfdd4-sqdzg             1/1     Running   0          6m22s   172.17.0.2     multi-node-m03   <none>           <none>
default       net-test-c4f9cfdd4-wxm7c             1/1     Running   0          6m22s   172.17.0.4     multi-node       <none>           <none>
default       net-test-c4f9cfdd4-zdl64             1/1     Running   0          6m22s   172.17.0.3     multi-node-m02   <none>           <none>

In the above output the IP 172.17.0.3 is assigned to 3 pods running on 3 different nodes.
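
As an optional extra check (assuming the default container names created by minikube's docker driver, and that ip from iproute2 is available in the node image), the duplicated 172.17.0.x addresses line up with each node having its own independent Docker default bridge:

# Each minikube node is a docker container named after the node; every node has
# its own docker0 on 172.17.0.0/16, so the same pod IPs repeat across nodes.
for node in multi-node multi-node-m02 multi-node-m03; do
  echo "== $node =="
  docker exec "$node" ip -4 addr show docker0
done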

  4. Exec into the net-test pod running on the first node: kubectl exec -it net-test-c4f9cfdd4-wxm7c -- /bin/bash.

  5. Try to curl the in-cluster service and google.com.

$ kubectl exec -it net-test-c4f9cfdd4-wxm7c -- /bin/bash
bash-5.0# curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>


bash-5.0# curl my-nginx.default
curl: (7) Failed to connect to my-nginx.default port 80: Connection refused


bash-5.0# curl my-nginx.default
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>


bash-5.0# curl my-nginx.default
curl: (7) Failed to connect to my-nginx.default port 80: Connection refused

One request to my-nginx.default failed but the next one succeeded, then the next one failed again.

  6. Exec into a pod running on a worker node.
$ kubectl exec -it net-test-c4f9cfdd4-zdl64 -- /bin/bash
bash-5.0# curl google.com
curl: (6) Could not resolve host: google.com

bash-5.0# curl my-nginx.default
curl: (6) Could not resolve host: my-nginx.default

bash-5.0# curl my-nginx.default
curl: (6) Could not resolve host: my-nginx.default

bash-5.0# dig google.com
; <<>> DiG 9.16.6 <<>> google.com
;; global options: +cmd
;; connection timed out; no servers could be reached

No connectivity inside the pod.

  7. Checking the logs of kindnetd shows:
$ kubectl logs -f -n kube-system kindnet-kkcrx
I1203 17:36:06.659437       1 main.go:64] hostIP = 192.168.59.2
podIP = 192.168.59.2
I1203 17:36:07.955860       1 main.go:143] Node multi-node has no CIDR, ignoring
I1203 17:36:07.955927       1 main.go:143] Node multi-node-m02 has no CIDR, ignoring
I1203 17:36:17.964974       1 main.go:143] Node multi-node has no CIDR, ignoring
I1203 17:36:17.965059       1 main.go:143] Node multi-node-m02 has no CIDR, ignoring
I1203 17:36:27.992500       1 main.go:143] Node multi-node has no CIDR, ignoring
I1203 17:36:27.992598       1 main.go:143] Node multi-node-m02 has no CIDR, ignoring
I1203 17:36:38.840804       1 main.go:143] Node multi-node has no CIDR, ignoring
I1203 17:36:38.841956       1 main.go:143] Node multi-node-m02 has no CIDR, ignoring
  8. Inspect the node spec using:
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,SPEC:.spec
NAME             SPEC
multi-node       map[]
multi-node-m02   map[]
multi-node-m03   map[]

This shows that no podCIDR is set, which seems to be a requirement for kindnetd: https://github.com/kubernetes-sigs/kind/blob/master/images/kindnetd/cmd/kindnetd/main.go#L148.
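
A more direct way to check the same field, assuming nothing beyond a standard kubectl:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
# PODCIDR comes back empty/<none> for every node here, matching the empty spec above.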

Not sure if this is related, but starting with calico via minikube start -n 3 --enable-default-cni=false --network-plugin=cni --cni='calico' works.

Full output of minikube start command used, if not already included:

$ minikube start -p multi-node -n 3
😄 [multi-node] minikube v1.15.1 on Darwin 10.15.7
✨ Automatically selected the docker driver
👍 Starting control plane node multi-node in cluster multi-node
🔥 Creating docker container (CPUs=2, Memory=1987MB) ...
🐳 Preparing Kubernetes v1.19.4 on Docker 19.03.13 ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, default-storageclass

❗ Multi-node clusters are currently experimental and might exhibit unintended behavior.
📘 To track progress on multi-node clusters, see #7538.

👍 Starting node multi-node-m02 in cluster multi-node
🔥 Creating docker container (CPUs=2, Memory=1987MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.59.2
🐳 Preparing Kubernetes v1.19.4 on Docker 19.03.13 ...
▪ env NO_PROXY=192.168.59.2
🔎 Verifying Kubernetes components...

👍 Starting node multi-node-m03 in cluster multi-node
🔥 Creating docker container (CPUs=2, Memory=1987MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.59.2,192.168.59.3
🐳 Preparing Kubernetes v1.19.4 on Docker 19.03.13 ...
▪ env NO_PROXY=192.168.59.2
▪ env NO_PROXY=192.168.59.2,192.168.59.3
🔎 Verifying Kubernetes components...
🏄 Done! kubectl is now configured to use "multi-node" cluster and "default" namespace by default

Optional: Full output of minikube logs command:


sadlil commented Dec 6, 2020

/assign


sadlil commented Dec 6, 2020

Here is what I have found so far:

  1. To add podCIDR to the node spec, we should ask kubeadm to pass a flag to the kube-controller-manager: allocate-node-cidrs: "true". But with this we must also specify podSubnet in the ClusterConfiguration; an empty podSubnet might cause kube-controller-manager to crash (see the sketch after this list).

  2. The networking is broken because the control-plane node did not configure a CNI even though we asked to create a multi-node cluster. When minikube starts the control-plane node, cc.Nodes in config.ClusterConfig only contains one node's information, and https://github.com/kubernetes/minikube/blob/master/pkg/minikube/cni/cni.go#L112 causes KindNet not to be chosen; a disabled CNI is returned instead. Doing a cni.Apply on the control-plane node then does nothing.

As the CNI is broken and podCIDR is not set, kindnetd returns the first available IP on the host, which causes multiple pods on different nodes to have the same IP.
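
For illustration, a minimal sketch of the kubeadm settings involved, not the actual minikube patch; the 10.244.0.0/16 value is just an assumed example subnet:

# Sketch only: podSubnet value is an assumed example; API version matches Kubernetes v1.19 (kubeadm v1beta2).
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: "10.244.0.0/16"       # must be non-empty, otherwise kube-controller-manager may crash
controllerManager:
  extraArgs:
    allocate-node-cidrs: "true"    # lets the controller manager assign a podCIDR to every node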

I am preparing a fix, will create a PR soon.


azhao155 commented Dec 7, 2020

To add the CIDR, could we use --extra-config=kubeadm.pod-network-cidr=10.244.0.0/16?


azhao155 commented Dec 7, 2020

You are right, CIDR and CNI selection are the two problems. Running
./out/minikube start -n 2 -p p1 --cni=kindnet --extra-config=kubeadm.pod-network-cidr=10.244.0.0/16
fixed the issue.
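
For completeness, a couple of quick checks after that start command (the same kubectl queries used earlier in this issue) should confirm the fix:

# Each node should now have its own podCIDR, and pod IPs should no longer collide across nodes.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
kubectl get pods -o wide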


sadlil commented Dec 7, 2020

You are right, setting both --cni=kindnet --extra-config=kubeadm.pod-network-cidr=10.244.0.0/16 works, but setting only --extra-config=kubeadm.pod-network-cidr=10.244.0.0/16 doesn't. This is because of the node count check I mentioned: when we do provide a --cni, we do not check the node count before enabling it.
