TestStartStop/group/containerd: busybox does not appear after restart #7704

Closed
medyagh opened this issue Apr 16, 2020 · 6 comments · Fixed by #7705 or #8022
Labels
co/docker-driver: Issues related to kubernetes in container
kind/flake: Categorizes issue or PR as related to a flaky test.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone
v1.10.0

Comments

medyagh (Member) commented Apr 16, 2020

I have a feeling that this is happening because we apply the kic overlay before the default service account is up.

https://storage.googleapis.com/minikube-builds/logs/7611/d3db795/Docker_Linux.html#failsection

-- stdout --
	  - Apr 16 03:02:00 containerd-20200415T195638-32245 kubelet[1799]: E0416 03:02:00.567033    1799 reflector.go:178] object-"kube-system"/"kindnet-token-gdctk": Failed to list *v1.Secret: secrets "kindnet-token-gdctk" is forbidden: User "system:node:containerd-20200415t195638-32245" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200415t195638-32245" and this object
	  - Apr 16 03:02:00 containerd-20200415T195638-32245 kubelet[1799]: E0416 03:02:00.568857    1799 reflector.go:178] object-"kube-system"/"kindnet-token-gdctk": Failed to list *v1.Secret: secrets "kindnet-token-gdctk" is forbidden: User "system:node:containerd-20200415t195638-32245" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200415t195638-32245" and this object
	  - Apr 16 03:02:00 containerd-20200415T195638-32245 kubelet[1799]: E0416 03:02:00.575966    1799 reflector.go:178] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: configmaps "kube-proxy" is forbidden: User "system:node:containerd-20200415t195638-32245" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200415t195638-32245" and this object
	  - Apr 16 03:02:00 containerd-20200415T195638-32245 kubelet[1799]: E0416 03:02:00.592505    1799 reflector.go:178] object-"kube-system"/"kube-proxy-token-sg9t9": Failed to list *v1.Secret: secrets "kube-proxy-token-sg9t9" is forbidden: User "system:node:containerd-20200415t195638-32245" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200415t195638-32245" and this object
-- /stdout --

To run locally:

make integration -e TEST_ARGS="-test.run TestStartStop/group/containerd --profile=minikube --cleanup=false --minikube-start-args=\"--driver=docker\""
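
If the ordering hypothesis above is right, making the overlay wait for the service account should make the flake disappear. A rough sketch of that check (the polling loop, the kube-system namespace, and the kic_overlay.yaml file name are illustrative assumptions, not taken from minikube's code):

# Sketch: wait for the default service account before applying the kic overlay.
until kubectl -n kube-system get serviceaccount default >/dev/null 2>&1; do
  sleep 1
done
kubectl apply -f kic_overlay.yaml  # hypothetical stand-in for the overlay manifest
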
@medyagh medyagh added the co/docker-driver Issues related to kubernetes in container label Apr 16, 2020
@medyagh medyagh added this to the v1.10.0 milestone Apr 16, 2020
@medyagh medyagh added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Apr 16, 2020
@medyagh medyagh changed the title TestStartStop Containerd: "cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd-20200415t195638-32245" and this object" TestStartStop Containerd: "cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd...." and this object" Apr 16, 2020
tstromberg (Contributor) commented Apr 16, 2020

Note that the actual test failure is:

start_stop_delete_test.go:139: failed waiting for pod 'busybox' post-stop-start: integration-test=busybox within 7m0s: timed out waiting for the condition

If you look at the kubectl output, you'll see that busybox does not even appear in the list, which says to me that we may have lost our etcd state:

	NAMESPACE              NAME                                                       READY   STATUS    RESTARTS   AGE     LABELS
	kube-system            coredns-66bff467f8-fw5kp                                   1/1     Running   0          6m51s   k8s-app=kube-dns,pod-template-hash=66bff467f8
	kube-system            coredns-66bff467f8-wptx8                                   1/1     Running   0          6m51s   k8s-app=kube-dns,pod-template-hash=66bff467f8
	kube-system            etcd-containerd-20200415t195638-32245                      1/1     Running   0          6m55s   component=etcd,tier=control-plane
	kube-system            kindnet-bd7mz                                              1/1     Running   0          6m51s   app=kindnet,controller-revision-hash=7968cb6854,k8s-app=kindnet,pod-template-generation=1,tier=node
	kube-system            kube-apiserver-containerd-20200415t195638-32245            1/1     Running   0          6m55s   component=kube-apiserver,tier=control-plane
	kube-system            kube-controller-manager-containerd-20200415t195638-32245   1/1     Running   2          6m55s   component=kube-controller-manager,tier=control-plane
	kube-system            kube-proxy-vltmf                                           1/1     Running   0          6m51s   controller-revision-hash=c8bb659c5,k8s-app=kube-proxy,pod-template-generation=1
	kube-system            kube-scheduler-containerd-20200415t195638-32245            1/1     Running   2          6m55s   component=kube-scheduler,tier=control-plane
	kube-system            storage-provisioner                                        1/1     Running   0          7m5s    addonmanager.kubernetes.io/mode=Reconcile,integration-test=storage-provisioner
	kubernetes-dashboard   dashboard-metrics-scraper-84bfdf55ff-jpdqc                 1/1     Running   0          6m51s   k8s-app=dashboard-metrics-scraper,pod-template-hash=84bfdf55ff
	kubernetes-dashboard   kubernetes-dashboard-bc446cc64-6fjf2                       1/1     Running   1          6m51s   k8s-app=kubernetes-dashboard,pod-template-hash=bc446cc64

What you've cited is our attempt to get a list of possible problems from the logs.
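
One way to check the lost-etcd-state theory from outside the cluster, sketched under the assumption that minikube keeps etcd data under /var/lib/minikube/etcd inside the node container (with <profile> standing in for the profile/container name):

# Sketch: does the etcd WAL survive the stop/start cycle?
docker exec <profile> ls -la /var/lib/minikube/etcd/member/wal
minikube stop -p <profile>
minikube start -p <profile> --driver=docker --container-runtime=containerd
docker exec <profile> ls -la /var/lib/minikube/etcd/member/wal
# If the WAL files come back with new names/timestamps, etcd state was recreated
# rather than restored, which would explain busybox vanishing from the pod list.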

@tstromberg tstromberg changed the title TestStartStop Containerd: "cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node "containerd...." and this object" TestStartStop/group/containerd: busybox does not appear after restart Apr 16, 2020
tstromberg (Contributor) commented:

If you look at the instances of this failure, they climbed significantly starting April 10th:

find . -type f -mtime -90 -name "*Linux.txt" | xargs egrep "/containerd.*busybox.*failed to start" | cut -d: -f1 | xargs ls -lad

-rw-r--r-- 1 jenkins jenkins  154621 Apr  4 23:50 ./builds/1387/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  168688 Apr  5 00:08 ./builds/1388/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  384105 Apr  7 11:22 ./builds/1440/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  741561 Apr 10 02:24 ./builds/1542/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  461460 Apr 10 03:04 ./builds/1543/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  538427 Apr 10 05:28 ./builds/1547/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  669098 Apr 10 11:30 ./builds/1551/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  605166 Apr 10 13:28 ./builds/1554/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  572495 Apr 10 13:41 ./builds/1555/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  523620 Apr 10 13:58 ./builds/1556/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  679678 Apr 10 15:50 ./builds/1562/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  646941 Apr 10 17:53 ./builds/1567/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  808888 Apr 11 01:26 ./builds/1571/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  726528 Apr 11 03:24 ./builds/1573/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  953732 Apr 12 13:29 ./builds/1597/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  811428 Apr 12 17:28 ./builds/1599/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  679061 Apr 13 11:24 ./builds/1610/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  803717 Apr 13 13:35 ./builds/1613/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  997585 Apr 13 17:27 ./builds/1615/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  867428 Apr 14 00:33 ./builds/1623/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  758268 Apr 14 01:09 ./builds/1624/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  703041 Apr 14 01:25 ./builds/1625/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  732245 Apr 14 07:29 ./builds/1628/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins 1022205 Apr 14 16:37 ./builds/1634/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  860233 Apr 14 17:20 ./builds/1635/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  746067 Apr 14 19:24 ./builds/1637/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  776391 Apr 14 23:20 ./builds/1641/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  876031 Apr 15 10:53 ./builds/1649/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  736166 Apr 15 13:46 ./builds/1652/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  809852 Apr 15 15:26 ./builds/1653/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  786479 Apr 15 19:21 ./builds/1657/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  798772 Apr 15 20:09 ./builds/1663/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  737707 Apr 15 21:21 ./builds/1664/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  681982 Apr 15 21:26 ./builds/1665/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  737833 Apr 15 23:26 ./builds/1666/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  754455 Apr 16 03:24 ./builds/1671/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  727163 Apr 16 05:30 ./builds/1672/Docker_Linux.txt
-rw-r--r-- 1 jenkins jenkins  740315 Apr 16 08:10 ./builds/1676/Docker_Linux.txt

Starting with this PR: #7580 (note: this may be a red herring)

tstromberg (Contributor) commented:

In the last 48 hours, this particular test has failed 17 of 47 runs (roughly 36% of the time):

tstromberg@jenkins:/var/lib/jenkins/jobs/docker_Linux_integration$ find . -type f -mtime -2 -name "*Linux.txt" | xargs egrep -h -- "---.*: TestStartStop/group/containerd" | cut -d: -f1 | sort | uniq -c
     17         --- FAIL
     30         --- PASS

@tstromberg tstromberg added kind/flake Categorizes issue or PR as related to a flaky test. and removed kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Apr 16, 2020
@medyagh medyagh reopened this Apr 16, 2020
medyagh (Member, Author) commented Apr 16, 2020

@tstromberg I wonder what the rate has been since I merged the PR that makes the cluster role binding more important.

@medyagh medyagh added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 16, 2020
medyagh (Member, Author) commented Apr 16, 2020

I have a feeling the reason this error shows up more now is that we fixed all the errors and other problems that were happening before it, so the tests never used to get this far.

medyagh (Member, Author) commented Apr 24, 2020
