Docker_Linux: TestStartStop/group/containerd: listen tcp 0.0.0.0:8444: bind: address already in use #7521

Closed
tstromberg opened this issue Apr 8, 2020 · 4 comments · Fixed by #7570
Labels: kind/flake, priority/important-soon
Milestone: v1.10.0

tstromberg (Contributor) commented Apr 8, 2020

https://storage.googleapis.com/minikube-builds/logs/7516/5110da9/Docker_Linux.html#fail_TestStartStop%2fgroup%2fcontainerd

helpers.go:279: TestStartStop/group/containerd: WARNING: pod list for "kubernetes-dashboard" "k8s-app=kubernetes-dashboard" returned: Get https://172.17.0.2:8444/api/v1/namespaces/kubernetes-dashboard/pods?labelSelector=k8s-app%3Dkubernetes-dashboard: dial tcp 172.17.0.2:8444: connect: connection refused

start_stop_delete_test.go:141: ***** TestStartStop/group/containerd: pod "k8s-app=kubernetes-dashboard" failed to start within 4m0s: timed out waiting for the condition ****

start_stop_delete_test.go:141: TestStartStop/group/containerd: showing logs for failed pods as of 2020-04-08 09:55:54.348541364 -0700 PDT m=+1755.844986383

start_stop_delete_test.go:142: failed waiting for 'addon dashboard' pod post-stop-start: k8s-app=kubernetes-dashboard within 4m0s: timed out waiting for the condition

stdout shows some initial trouble with restarting:

	  - Apr 08 16:52:55 containerd-20200408T094544.677419249-5291 kubelet[949]: E0408 16:52:55.918044     949 pod_workers.go:191] Error syncing pod 39a425d2906ed7aa1f9aa5ea712491eb ("kube-apiserver-containerd-20200408t094544.677419249-5291_kube-system(39a425d2906ed7aa1f9aa5ea712491eb)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "back-off 40s restarting failed container=kube-apiserver pod=kube-apiserver-containerd-20200408t094544.677419249-5291_kube-system(39a425d2906ed7aa1f9aa5ea712491eb)"

	  - Apr 08 16:53:04 containerd-20200408T094544.677419249-5291 kubelet[2137]: E0408 16:53:04.812825    2137 pod_workers.go:191] Error syncing pod 39a425d2906ed7aa1f9aa5ea712491eb ("kube-apiserver-containerd-20200408t094544.677419249-5291_kube-system(39a425d2906ed7aa1f9aa5ea712491eb)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-containerd-20200408t094544.677419249-5291_kube-system(39a425d2906ed7aa1f9aa5ea712491eb)"

	  - Apr 08 16:53:17 containerd-20200408T094544.677419249-5291 kubelet[2137]: E0408 16:53:17.555017    2137 reflector.go:178] object-"kube-system"/"coredns": Failed to list *v1.ConfigMap: configmaps "coredns" is forbidden: User "system:node:containerd-20200408t094544.677419249-5291" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200408t094544.677419249-5291" and this object

	  - Apr 08 16:53:17 containerd-20200408T094544.677419249-5291 kubelet[2137]: E0408 16:53:17.572281    2137 reflector.go:178] object-"kube-system"/"storage-provisioner-token-kjwqw": Failed to list *v1.Secret: secrets "storage-provisioner-token-kjwqw" is forbidden: User "system:node:containerd-20200408t094544.677419249-5291" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200408t094544.677419249-5291" and this object

	  - Apr 08 16:53:17 containerd-20200408T094544.677419249-5291 kubelet[2137]: E0408 16:53:17.573550    2137 reflector.go:178] object-"kube-system"/"kindnet-token-jvf7f": Failed to list *v1.Secret: secrets "kindnet-token-jvf7f" is forbidden: User "system:node:containerd-20200408t094544.677419249-5291" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200408t094544.677419249-5291" and this object

	  - Apr 08 16:53:32 containerd-20200408T094544.677419249-5291 kubelet[2137]: E0408 16:53:32.710210    2137 pod_workers.go:191] Error syncing pod d48cee69-6e8a-4616-92fd-84b3520f13de ("storage-provisioner_kube-system(d48cee69-6e8a-4616-92fd-84b3520f13de)"), skipping: failed to "StartContainer" for "storage-provisioner" with CrashLoopBackOff: "back-off 10s restarting failed container=storage-provisioner pod=storage-provisioner_kube-system(d48cee69-6e8a-4616-92fd-84b3520f13de)"

	  - Error: failed to create listener: failed to listen on 0.0.0.0:8444: listen tcp 0.0.0.0:8444: bind: address already in use

The pod list shows that the apiserver restarted, but the dashboard never came up. It appears we lost state when restarting the apiserver?

	NAMESPACE     NAME                                                                READY   STATUS    RESTARTS   AGE     LABELS

	default       busybox                                                             1/1     Running   1          5m35s   integration-test=busybox

	kube-system   coredns-66bff467f8-kr8sv                                            1/1     Running   1          6m35s   k8s-app=kube-dns,pod-template-hash=66bff467f8
	kube-system   coredns-66bff467f8-vdwj2                                            1/1     Running   1          6m35s   k8s-app=kube-dns,pod-template-hash=66bff467f8
	kube-system   etcd-containerd-20200408t094544.677419249-5291                      1/1     Running   0          2m      component=etcd,tier=control-plane
	kube-system   kindnet-2ghpg                                                       1/1     Running   1          6m35s   app=kindnet,controller-revision-hash=5c5d549dd7,k8s-app=kindnet,pod-template-generation=1,tier=node
	kube-system   kube-apiserver-containerd-20200408t094544.677419249-5291            1/1     Running   5          2m47s   component=kube-apiserver,tier=control-plane
	kube-system   kube-controller-manager-containerd-20200408t094544.677419249-5291   1/1     Running   1          7m3s    component=kube-controller-manager,tier=control-plane
	kube-system   kube-proxy-68nxc                                                    1/1     Running   1          6m35s   controller-revision-hash=c8bb659c5,k8s-app=kube-proxy,pod-template-generation=1
	kube-system   kube-scheduler-containerd-20200408t094544.677419249-5291            1/1     Running   1          7m3s    component=kube-scheduler,tier=control-plane
	kube-system   storage-provisioner                                                 1/1     Running   3          6m36s   addonmanager.kubernetes.io/mode=Reconcile,integration-test=storage-provisioner

This sounds like a race condition: we claim health against the previous apiserver PID, but the apiserver restarts in the middle of us enabling the dashboard. We guard against this for Kubernetes upgrades by asserting that the apiserver is running the right version, but we don't yet guard against this for other cases.
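If we wanted to guard against this more generally, one possible shape (a sketch only, not minikube's actual code; runInNode and the function names here are hypothetical) is to record the apiserver PID before enabling the addon and confirm it has not changed afterwards:

	package sanity

	import (
		"fmt"
		"strings"
	)

	// runInNode is a hypothetical helper type: run a command inside the node
	// container (e.g. via "docker exec <node> /bin/bash -c ...") and return stdout.
	type runInNode func(cmd string) (string, error)

	// apiserverPID returns the PID of kube-apiserver inside the node.
	func apiserverPID(run runInNode) (string, error) {
		out, err := run("pgrep -f kube-apiserver")
		if err != nil {
			return "", fmt.Errorf("apiserver not running: %v", err)
		}
		return strings.TrimSpace(out), nil
	}

	// withStableAPIServer runs fn, then fails if the apiserver PID changed
	// underneath it, so the caller knows to re-verify health before trusting
	// whatever fn set up (e.g. the dashboard addon).
	func withStableAPIServer(run runInNode, fn func() error) error {
		before, err := apiserverPID(run)
		if err != nil {
			return err
		}
		if err := fn(); err != nil {
			return err
		}
		after, err := apiserverPID(run)
		if err != nil {
			return err
		}
		if before != after {
			return fmt.Errorf("apiserver restarted during operation (pid %s -> %s)", before, after)
		}
		return nil
	}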

@tstromberg tstromberg added kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 8, 2020
@tstromberg tstromberg added this to the v1.10.0 milestone Apr 8, 2020
@tstromberg tstromberg added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. and removed kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Apr 8, 2020
medyagh (Member) commented Apr 8, 2020

This is a dupe of #7505, and I believe it happens because Kic Stop does not stop kubelet. I have not seen this with other drivers, have you?
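If that hypothesis is right, the fix shape would be to shut kubelet down inside the node before stopping the container itself. A minimal Go sketch under that assumption (execInNode is a hypothetical helper, not the kic driver's actual API):

	package kicstop

	import (
		"fmt"
		"os/exec"
	)

	// stopNode sketches a Stop that shuts kubelet down inside the node before
	// stopping the container, so static pods such as kube-apiserver are not
	// left running into the next start. execInNode is a hypothetical helper
	// that runs a command inside the node container.
	func stopNode(name string, execInNode func(name, cmd string) error) error {
		if err := execInNode(name, "sudo systemctl stop kubelet"); err != nil {
			return fmt.Errorf("stop kubelet in %s: %v", name, err)
		}
		return exec.Command("docker", "stop", name).Run()
	}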

medyagh (Member) commented Apr 9, 2020

Here is the root cause: we are unable to stop kubelet:

Initiating transfer...
	I0408 09:29:12.140160    6997 kic_runner.go:91] Run: sudo mkdir -p /var/lib/minikube/binaries/v1.18.0
	I0408 09:29:12.291179    6997 kic_runner.go:91] Run: /bin/bash -c "pgrep kubelet && sudo systemctl stop kubelet"
	W0408 09:29:12.451968    6997 binaries.go:55] unable to stop kubelet: /bin/bash -c "pgrep kubelet && sudo systemctl stop kubelet": exit status 1
	stdout:
	
	stderr:
	I0408 09:29:12.452085    6997 binary.go:57] Not caching binary, using https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubectl?checksum=file:https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubectl.sha256
	I0408 09:29:12.452119    6997 binary.go:57] Not caching binary, using https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubeadm?checksum=file:https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubeadm.sha256
	I0408 09:29:12.452114    6997 binary.go:57] Not caching binary, using https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubelet?checksum=file:https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubelet.sha256
	I0408 09:29:13.724256    6997 kic_runner.go:91] Run: sudo mkdir -p /var/tmp/minikube /etc/systemd/system/kubelet.service.d /lib/systemd/system /etc/cni/net.d
	I0408 09:29:15.908380    6997 kic_runner.go:91] Run: /bin/bash -c "pgrep kubelet && diff -u /lib/systemd/system/kubelet.service /lib/systemd/system/kubelet.service.new && diff -u /etc/systemd/system/kubelet.service.d/10-kubeadm.conf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.new"
	I0408 09:29:16.058527    6997 kic_runner.go:91] Run: /bin/bash -c "sudo cp /lib/systemd/system/kubelet.service.new /lib/systemd/system/kubelet.service && sudo cp /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.new /etc/systemd/system/kubelet.service.d/10-kubeadm.conf && sudo systemctl daemon-reload && sudo systemctl restart kubelet"
	I0408 09:29:16.368955    6997 kubeadm.go:279] StartCluster: {Name:offline-crio-20200408T092706.125460795-5291 KeepContext:false EmbedCerts:false MinikubeISO: Memory:2000 CPUs:2 DiskSize:20000 Driver:docker HyperkitVpnKitSock: HyperkitVSockPorts:[] DockerEnv:[] InsecureRegistry:[] RegistryMirror:[] HostOnlyCIDR:192.168.99.1/24 HypervVirtualSwitch: HypervUseExternalSwitch:false HypervExternalAdapter: KVMNetwork:default KVMQemuURI:qemu:///system KVMGPU:false KVMHidden:false DockerOpt:[] DisableDriverMounts:false NFSShare:[] NFSSharesRoot:/nfsshares UUID: NoVTXCheck:false DNSProxy:false HostDNSResolver:true HostOnlyNicType:virtio NatNicType:virtio KubernetesConfig:{KubernetesVersion:v1.18.0 ClusterName:offline-crio-20200408T092706.125460795-5291 APIServerName:minikubeCA APIServerNames:[] APIServerIPs:[] DNSDomain:cluster.local ContainerRuntime:crio CRISocket: NetworkPlugin:cni FeatureGates: ServiceCIDR:10.96.0.0/12 ImageRepository: ExtraOptions:[{Component:kubeadm Key:pod-network-cidr Value:10.244.0.0/16}] ShouldLoadCachedImages:true EnableDefaultCNI:true NodeIP: NodePort:8443 NodeName:} Nodes:[{Name: IP:172.17.0.4 Port:8443 KubernetesVersion:v1.18.0 ControlPlane:true Worker:true}] Addons:map[] VerifyComponents:map[apiserver:true apps_running:true default_sa:true system_pods:true]}
	I0408 09:29:16.369051    6997 cri.go:41] listing CRI containers in root : {State:paused Name: Namespaces:[kube-system]}
	I0408 09:29:16.369168    6997 kic_runner.go:91] Run: sudo -s eval "crictl ps -a --quiet --label io.kubernetes.pod.namespace=kube-system"
	I0408 09:29:16.660419    6997 cri.go:76] found id: ""
	I0408 09:29:16.660532    6997 kic_runner.go:91] Run: sudo ls /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/config.yaml /var/lib/minikube/etcd
	I0408 09:29:16.893801    6997 kic_runner.go:91] Run: sudo cp /var/tmp/minikube/kubeadm.yaml.new /var/tmp/minikube/kubeadm.yaml
	I0408 09:29:17.057261    6997 kubeadm.go:214] ignoring SystemVerification for kubeadm because of either driver or kubernetes version
	I0408 09:29:17.057368    6997 kic_runner.go:91] Run: sudo /bin/bash -c "grep https://172.17.0.4:8443 /etc/kubernetes/admin.conf || sudo rm -f /etc/kubernetes/admin.conf"
	I0408 09:29:17.309913    6997 kic_runner.go:91] Run: sudo /bin/bash -c "grep https://172.17.0.4:8443 /etc/kubernetes/kubelet.conf || sudo rm -f /etc/kubernetes/kubelet.conf"
	I0408 09:29:17.485165    6997 kic_runner.go:91] Run: sudo /bin/bash -c "grep https://172.17.0.4:8443 /etc/kubernetes/controller-manager.conf || sudo rm -f /etc/kubernetes/controller-manager.conf"
	I0408 09:29:17.659312    6997 kic_runner.go:91] Run: sudo /bin/bash -c "grep https://172.17.0.4:8443 /etc/kubernetes/scheduler.conf || sudo rm -f /etc/kubernetes/scheduler.conf"
	I0408 09:29:17.902013    6997 kic_runner.go:91] Run: /bin/bash -c "sudo env PATH=/var/lib/minikube/binaries/v1.18.0:$PATH kubeadm init --config /var/tmp/minikube/kubeadm.yaml  --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap,SystemVerification,SystemVerification,FileContent--proc-sys-net-bridge-bridge-nf-call-iptables"
	I0408 09:29:33.100290    6997 kic_runner.go:118] Done: [docker exec --privileged offline-crio-20200408T092706.125460795-5291 /bin/bash -c sudo env PATH=/var/lib/minikube/binaries/v1.18.0:$PATH kubeadm init --config /var/tmp/minikube/kubeadm.yaml  --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap,SystemVerification,SystemVerification,FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: (15.198232934s)
	I0408 09:29:33.100500    6997 kic_runner.go:91] Run: sudo /var/lib/minikube/binaries/v1.18.0/kubectl create --kubeconfig=/var/lib/minikube/kubeconfig -f -
	I0408 09:29:33.913721    6997 kic_runner.go:91] Run: sudo /var/lib/minikube/binaries/v1.18.0/kubectl label nodes minikube.k8s.io/version=v1.9.2 minikube.k8s.io/commit=7765ba936d5dfc4b3e341c46987ed00cf1cdc75c minikube.k8s.io/name=offline-crio-20200408T092706.125460795-5291 minikube.k8s.io/updated_at=2020_04_08T09_29_33_0700 --all --overwrite --kubeconfig=/var/lib/minikube/kubeconfig
	I0408 09:29:34.249566    6997 kic_runner.go:91] Run: /bin/bash -c "cat /proc/$(pgrep kube-apiserver)/oom_adj"
	I0408 09:29:34.429637    6997 ops.go:35] apiserver oom_adj: -16
	I0408 09:29:34.429787    6997 kic_runner.go:91] Run: sudo /var/lib/minikube/binaries/v1.18.0/kubectl create clusterrolebinding minikube-rbac --clusterrole=cluster-admin --serviceaccount=kube-system:default --kubeconfig=/var/lib/minikube/kubeconfig
	I0408 09:29:34.706776    6997 kubeadm.go:806] duration metric: took 277.085156ms to wait for elevateKubeSystemPrivileges.
	I0408 09:29:34.706824    6997 kubeadm.go:281] StartCluster complete in 18.337878344s
	I0408 09:29:34.706858    6997 settings.go:123] acquiring lock: {Name:mkf6e2799fb0be0a5d6f568756066b61ece62ea9 Clock:{} Delay:500ms Timeout:1m0s Cancel:<nil>}
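One note on the warning near the top of that log (my reading, not stated there): 'pgrep kubelet && sudo systemctl stop kubelet' exits 1 whenever pgrep matches nothing, so the chained command cannot tell "kubelet was not running" apart from "systemctl failed to stop it". A small Go sketch of a variant that separates the two cases (runInNode is a hypothetical helper):

	package kubeletstop

	// stopKubeletIfRunning is a sketch, not minikube's implementation. It runs
	// pgrep and systemctl as separate steps so that "no kubelet process found"
	// (pgrep exiting 1 with empty output) is not reported as a failure to stop
	// kubelet. runInNode is a hypothetical helper that executes the command
	// inside the node container and returns an error on non-zero exit.
	func stopKubeletIfRunning(runInNode func(cmd string) error) error {
		if err := runInNode("pgrep kubelet"); err != nil {
			// No kubelet process matched; nothing to stop.
			return nil
		}
		return runInNode("sudo systemctl stop kubelet")
	}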

medyagh (Member) commented Apr 9, 2020

So, one step closer!

make: `out/minikube' is up to date.
✋  Stopping "minikube" in docker ...
Inside Stop
list containers before kubelet stop
------------------------
unable list containers : crictl list: sudo -s eval "crictl ps -a --quiet --label io.kubernetes.pod.namespace=kube-system; crictl ps -a --quiet --label io.kubernetes.pod.namespace=kubernetes-dashboard; crictl ps -a --quiet --label io.kubernetes.pod.namespace=storage-gluster; crictl ps -a --quiet --label io.kubernetes.pod.namespace=istio-operator": exit status 1
stdout:

medyagh (Member) commented Apr 9, 2020

OK, I found the actual root cause!

You cannot 'crictl rm' a container before stopping it; I've sent a PR!
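For reference, a minimal Go sketch of that ordering (an assumed shape only; the actual change is in the linked PR, and runInNode here is a hypothetical helper): stop each kube-system container with 'crictl stop' before removing it with 'crictl rm':

	package cleanup

	import (
		"fmt"
		"strings"
	)

	// removeKubeSystemContainers sketches the stop-then-remove ordering: a
	// running container must be stopped before 'crictl rm' can remove it.
	// runInNode is a hypothetical helper that runs the command inside the node
	// container and returns its stdout.
	func removeKubeSystemContainers(runInNode func(cmd string) (string, error)) error {
		out, err := runInNode("sudo crictl ps -a --quiet --label io.kubernetes.pod.namespace=kube-system")
		if err != nil {
			return fmt.Errorf("list containers: %v", err)
		}
		for _, id := range strings.Fields(out) {
			if _, err := runInNode("sudo crictl stop " + id); err != nil {
				return fmt.Errorf("stop %s: %v", id, err)
			}
			if _, err := runInNode("sudo crictl rm " + id); err != nil {
				return fmt.Errorf("rm %s: %v", id, err)
			}
		}
		return nil
	}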
