
[BUG] cni_compatible test failing due to outdated Cilium #2015

Closed
svteb opened this issue May 6, 2024 · 0 comments
Labels
bug Something isn't working

Comments


svteb commented May 6, 2024

Describe the bug
For some reason my machine cannot create the Cilium cluster as described in the setup_cilium_cluster function, i.e. with --version 1.10.5. Below are some outputs that describe the problem. Regardless, I have found that using a newer Cilium version (such as 1.15.4) makes the cluster deploy successfully. I cannot guarantee that changing the version won't break something else, but since I have already tested it, it likely shouldn't.
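
As a rough sketch of the version bump (the Helm chart name and kubeconfig path below are assumptions based on the outputs in this report, not the exact code in setup_cilium_cluster), installing the newer release manually looks roughly like this:

helm repo add cilium https://helm.cilium.io/
helm repo update
# Install the newer Cilium release into the kind test cluster
helm install cilium cilium/cilium --version 1.15.4 \
  --namespace kube-system \
  --kubeconfig ~/.cnf-testsuite/tools/kind/cilium-test_admin.conf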

Pods are forever stuck in ContainerCreating:

kubectl get pods -A --kubeconfig /home/ubuntu/.cnf-testsuite/tools/kind/cilium-test_admin.conf
NAMESPACE            NAME                                                READY   STATUS              RESTARTS   AGE
cnfspace             coredns-coredns-6fc69fdfd7-gp72g                    0/1     ContainerCreating   0          5m51s
kube-system          cilium-g8nn5                                        1/1     Running             0          10m
kube-system          cilium-operator-5cd47845bf-h6g5d                    1/1     Running             1          10m
kube-system          coredns-558bd4d5db-hk6g8                            0/1     ContainerCreating   0          10m
kube-system          coredns-558bd4d5db-xz4bd                            0/1     ContainerCreating   0          10m
kube-system          etcd-cilium-test-control-plane                      1/1     Running             0          10m
kube-system          kube-apiserver-cilium-test-control-plane            1/1     Running             0          10m
kube-system          kube-controller-manager-cilium-test-control-plane   1/1     Running             0          10m
kube-system          kube-proxy-t8s5g                                    1/1     Running             0          10m
kube-system          kube-scheduler-cilium-test-control-plane            1/1     Running             1          10m
local-path-storage   local-path-provisioner-85494db59d-bjtms             0/1     ContainerCreating   0          10m

Failing events:

kubectl describe pod coredns-558bd4d5db-hk6g8 -n kube-system --kubeconfig /home/ubuntu/.cnf-testsuite/tools/kind/cilium-test_admin.conf
Name:                 coredns-558bd4d5db-hk6g8
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 cilium-test-control-plane/172.18.0.3
Start Time:           Mon, 06 May 2024 06:12:20 +0000
Labels:               k8s-app=kube-dns
                      pod-template-hash=558bd4d5db
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-558bd4d5db
Containers:
  coredns:
    Container ID:  
    Image:         k8s.gcr.io/coredns/coredns:v1.8.0
    Image ID:      
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7x6f2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-7x6f2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               10m                default-scheduler  Successfully assigned kube-system/coredns-558bd4d5db-hk6g8 to cilium-test-control-plane
  Warning  FailedScheduling        12m (x2 over 12m)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Warning  FailedCreatePodSandBox  8m59s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "fd8413af31a7da4b9be76534009b5a2227a992b08f6f1ff865d633375a45e7fd": Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  7m15s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a243887b52479c5c420449424ba67ace45b040c879576ea206dbcaea72455f3d": Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  5m34s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cb56ad22c261ac9254a6dce3bc271379b4ace1d23633079e3d3c122d23513bdc": Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  3m52s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9ece139b4a71c544f69fecfd1e1194e53ea3c1b300d5f9917bbb8cc07f9cf5eb": Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  2m9s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "963f1076eb73eed13ed9f4db65f0a45872f0928ef4598891bb3565bb738b0d7e": Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  23s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8d0905c8cc9b19c57f22a5eba492db149285fe9910d270f3d0ae5e02e976b832": Unable to create endpoint: Cilium API client timeout exceeded
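
One way to dig further into the repeated "Cilium API client timeout exceeded" errors is to check the Cilium agent logs (the pod name below is taken from the pod listing above):

kubectl logs cilium-g8nn5 -n kube-system \
  --kubeconfig ~/.cnf-testsuite/tools/kind/cilium-test_admin.conf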

Strangely high CPU usage:

docker stats
CONTAINER ID   NAME                        CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
f1f8a01d94ec   cilium-test-control-plane   526.35%   868.4MiB / 31.34GiB   2.71%     172MB / 35.4MB    950kB / 1.32GB    284
a7dff7d9c8ae   calico-test-control-plane   24.21%    985MiB / 31.34GiB     3.07%     198MB / 6.86MB    25.8MB / 1.71GB   473
top - 06:27:32 up 6 days, 22:46,  0 users,  load average: 9.13, 7.90, 5.87
Tasks:  37 total,   6 running,  31 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.4 us, 66.1 sy,  0.0 ni, 22.5 id,  5.9 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  32092.5 total,   2628.5 free,   3065.2 used,  26398.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  28533.5 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                       
   6950 root      20   0    7672   3596   2508 R  99.3   0.0   0:11.03 tc                                                                                            
   6957 root      20   0    7416   3520   2560 R  99.3   0.0   0:07.21 tc                                                                                            
   6953 root      20   0    7416   3576   2616 R  98.7   0.0   0:09.26 tc                                                                                            
   6973 root      20   0    7288   3264   2464 R  98.7   0.0   0:04.91 tc                                                                                            
   6959 root      20   0    7288   3424   2628 R  98.3   0.0   0:06.68 tc
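
The runaway tc processes appear to live inside the kind node container; if needed, they can be listed via docker top (container name taken from the docker stats output above):

docker top cilium-test-control-plane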

To Reproduce
Steps to reproduce the behavior:

  1. ./cnf-testsuite cnf_setup cnf-config=sample-cnfs/sample-coredns-cnf/cnf-testsuite.yml
  2. ./cnf-testsuite cni_compatible
  3. kubectl get pods -A --kubeconfig ~/.cnf-testsuite/tools/kind/cilium-test_admin.conf
  4. Observe that Calico passes but Cilium fails after the 180-attempt timeout (a quick way to confirm which Cilium version the test cluster is running is sketched below).
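
A minimal sketch for that check, assuming the standard cilium DaemonSet name and the kubeconfig path from step 3:

kubectl get daemonset cilium -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}' \
  --kubeconfig ~/.cnf-testsuite/tools/kind/cilium-test_admin.conf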

Expected behavior
The Cilium cluster should deploy without issues.

Device:

Linux, Ubuntu server 22.04, x86
kind version: v0.22.0
minikube version: v1.32.0
kubectl version: v1.23.13

Once this issue is addressed, how will the fix be verified?
Hopefully it will not break the GitHub Actions :).

@svteb svteb added the bug Something isn't working label May 6, 2024
svteb added a commit to svteb/testsuite that referenced this issue May 17, 2024
@svteb svteb closed this as completed May 22, 2024