Cluster API periodic test jobs are getting stuck in pending #8560
cc @kubernetes-sigs/cluster-api-release-team
I played around a bit and found:
I see it in other release branches as well (1.2, 1.3 & 1.4)
Actually, I do not think this is necessary; as long as you are authorized with Prow via your GH account, anyone can cancel and re-run the job (tried it myself).
Since we do not have direct access to the Prow cluster, I can't think of a way we could check the Prow components.
@furkatgofurov7 were you testing rerun on PR jobs or periodics? I wasn't able to rerun periodics until we added the cluster-api-maintainers group to the rerun auth config in our ProwJob config a few weeks back.
I meant looking into the source code plus the data we see under artifacts. Some components like the UI (Deck) can also be run locally without access to the cluster.
I believe periodic, and it was the one from the filter you provided: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-mink8s-release-1-2
👍🏼
That's strange, this shouldn't be possible (xref: https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-release-1-2.yaml#L213-L216). (Btw, on PRs it's different; I'm not sure exactly, but either the PR author, org members, or everyone can restart there.) Independent of all of that, we can definitely also start with the following:
/triage accepted
Two new ones are stuck in Pending. Both are marked as failed in Spyglass:
Both show the following at the end:
Note: I don't have permissions to stop or restart them.
I wonder what happened when CAPD tried to remove this container (I assume it tried)? Do we know if we see the same error in jobs which then do not get stuck in pending? Not sure if there is a way to force-remove the container beyond what the job already tries. Maybe there's something in the logs of this container which explains why it's not shutting down.
I was not able to figure anything out from the CAPD or other logs for either job. For both occurrences, CAPD logged the DockerMachine deletion as successful.
I did not find any via k8s-triage.
According to this, it is already a
We don't fetch them currently. Docker logs and containerd logs did not help in this case.
I meant the logs of the container of the workload cluster, e.g. https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-main/1653406428393639936/artifacts/clusters/quick-start-s3y53t/machines/ (that's the one which doesn't shut down, right?). Maybe we can also extend CAPD to actually check that the container is gone (?)
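For context, a minimal manual check along these lines would confirm whether the container still exists on the node (a sketch; the container name is hypothetical and just follows the usual CAPD pattern of cluster name plus node suffix):

```bash
# Hypothetical container name of the workload-cluster node; adjust to the real one.
CONTAINER="quick-start-s3y53t-control-plane-xyz"

if docker inspect --format '{{.State.Status}}' "${CONTAINER}" >/dev/null 2>&1; then
  echo "${CONTAINER} still exists:"
  docker inspect --format '{{.State.Status}} (pid {{.State.Pid}})' "${CONTAINER}"
else
  echo "${CONTAINER} is gone"
fi
```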
Nope, the logs or data of the node/machine relevant in this case are not there (where they should be). However, CAPD logs the container's stdout before removing it, and there is nothing suspicious in it 🤷♂️ (source: search for the machine name).
Yeah, that'd maybe be a good idea to get closer to the issue.
Detecting that case in CAPD would make it possible to dump more data (like all the files we usually have in artifacts).
One idea to proceed: Check if this is caused by a zombie process.
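A quick way to check for that on the node could look like this (a sketch; zombie processes show up with state Z in ps):

```bash
# List zombie (defunct) processes, if any, together with their parent PID.
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# If zombies show up, their parent is what failed to reap them:
# ps -o pid,comm -p <PPID>
```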
Two more pending. There is no https://prow.k8s.io/?repo=kubernetes-sigs%2Fcluster-api&state=pending
Restarted + cherry-picked your PR
I'm not sure what we already have. Do you think it makes sense to run docker inspect + docker logs on all leftover containers at this point? I wonder if that container is even in a "shutting down" state.
Good question. It would also be good to take a look at the PIDs of the leftover ones, e.g. via the commands sketched below.
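A rough sketch of what such a debug pass could look like (assuming the leftover containers can be identified via docker ps; the name filter and container names are guesses):

```bash
# Inspect and dump logs of all leftover workload-cluster containers.
# The name filter is hypothetical; adjust it to the actual leftover cluster.
for c in $(docker ps -a --filter "name=quick-start-" --format '{{.Names}}'); do
  echo "=== ${c} ==="
  docker inspect "${c}"            # full state, incl. .State.Status and .State.Pid
  docker logs --tail 100 "${c}"    # last log lines of the container
done

# Then look at the main process of a stuck container via /proc:
PID=$(docker inspect --format '{{.State.Pid}}' quick-start-xyz-control-plane)  # hypothetical name
grep State "/proc/${PID}/status"   # R/S/D/Z state of the process
```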
Sounds good
cc @sbueringer: a new stuck-in-pending job, ready to get deleted :-)

Some analysis from the newest pending-but-finished job: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-dualstack-ipv6-main/1663826741237387264

Continuing on the new data and the stuck runc init process:

root 113973 97202 0 09:01 ? 00:00:00 runc init

We got the following outputs for the stuck process:
So we can see that the reason the process does not finish is that it is in an uninterruptible sleep (D) state. From issues researched at runc (e.g. opencontainers/runc#3663, opencontainers/runc#2753), there have been similar issues in the past, and the source of the issue could also be in the kernel. We could try to get more output via:
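For example, something along these lines (a sketch, assuming root access on the node; the PID is the stuck runc init from the ps output above):

```bash
PID=113973   # the stuck "runc init" process from the ps output above

cat /proc/${PID}/wchan            # kernel function the task is blocked in
cat /proc/${PID}/stack            # kernel stack trace of the task (needs root)
grep State /proc/${PID}/status    # should show "D (disk sleep)"

# Dump all blocked (D-state) tasks on the node into the kernel log:
echo w > /proc/sysrq-trigger
dmesg | tail -n 100
```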
But I don't know if that information would help us further; this is just some random Google find which could help with digging into it (I would first have to learn what that output means).
Maybe a good next step is to first move some jobs to the community cluster, and then maybe we can do some debugging with @ameukam (when Christian is back from PTO). P.S. Restarted the pending job.
Another job in pending, but it has been failed for two days:
Thx. Restarted it so we have test coverage for the release today.
@sbueringer: another one has been pending for two days:
Note: @sbueringer, another three pending today.
/priority important-longterm
Haven't seen this in a while. @adilGhaffarDev Are you aware of any recent occurrences?
cc @killianmuldoon @chrischdi (in case you unblocked any recently)
Same here, I haven't seen any for a while.
Let's close for now and reopen if it occurs again.

/close
@sbueringer: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What steps did you take and what happened?
From time to time some of our periodic test jobs are getting stuck in pending. I.e. a run of the test job is shown as Pending and no new runs are scheduled.
Example:
(URL: https://prow.k8s.io/?repo=kubernetes-sigs%2Fcluster-api&job=periodic*&state=pending)
In this example, periodic-cluster-api-e2e-main is stuck in pending.
What did you expect to happen?
Jobs should just be scheduled continuously and not get stuck in pending, blocking further runs.
Cluster API version
main (probably also on other branches)
Kubernetes version
No response
Anything else you would like to add?
Notes:
Mitigation:
Impact:
Debug ideas:
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.
/area testing