
High memory utilization on 1.13.4 release #3057

Closed
yeazelm opened this issue Apr 26, 2023 · 10 comments · Fixed by #3054
Labels
area/core Issues core to the OS (variant independent) status/in-progress This issue is currently being worked on type/bug Something isn't working

Comments

@yeazelm
Contributor

yeazelm commented Apr 26, 2023

Image I'm using:
ami-035c9bfb9c905837c, but I believe this happens with any AWS K8s variant. I've reproduced this on x86_64.

What I expected to happen:
The cluster nodes to work without filling up memory and without "Timed out while waiting for systemd to remove" errors appearing in the kubelet logs.

What actually happened:

Apr 26 21:38:55 ip-192-168-81-168.us-west-2.compute.internal kubelet[1373]: I0426 21:38:55.258940    1373 pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] err="unable to destroy cgroup paths for cgroup [kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] : Timed out while waiting for systemd to remove kubepods-besteffort-pod482dc361_4c7d_4502_b54d_4ce53af7089e.slice"

How to reproduce the problem:
I launch a cluster with 2 nodes, add some load (in this case simulated by a webserver and some pods calling the webserver), and add a CronJob as follows:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; sleep $(( ( RANDOM % 10 )  + 1 ));
          restartPolicy: OnFailure
          hostNetwork: true
      parallelism: 60

After some time, the memory on the node fills up, with kubelet taking up GBs of RAM, and the logs contain lots of errors about failing to clean up the cgroups.

This seems to be a problem when lots of pods are cycled through. It may also have to do with the CronJob terminating the container, since just simulating load and deleting the running containers manually doesn't seem to trigger it.
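
To confirm where the leftovers accumulate, here is a minimal diagnostic sketch in Go (my own addition, not something from this issue) that walks the cgroup v1 hierarchies and counts stale besteffort pod slices per controller. The paths assume the kubepods.slice layout visible in the kubelet logs later in this thread.

// leftover-slices: count stale besteffort pod cgroups per v1 controller.
// A sketch assuming the systemd cgroup driver layout
// /sys/fs/cgroup/<controller>/kubepods.slice/kubepods-besteffort.slice/...
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

func main() {
    controllers, err := os.ReadDir("/sys/fs/cgroup")
    if err != nil {
        panic(err)
    }
    for _, c := range controllers {
        if !c.IsDir() {
            continue
        }
        dir := filepath.Join("/sys/fs/cgroup", c.Name(),
            "kubepods.slice", "kubepods-besteffort.slice")
        entries, err := os.ReadDir(dir)
        if err != nil {
            continue // this controller has no kubepods hierarchy
        }
        count := 0
        for _, e := range entries {
            if e.IsDir() && strings.HasPrefix(e.Name(), "kubepods-besteffort-pod") {
                count++
            }
        }
        fmt.Printf("%-12s %d pod slices\n", c.Name(), count)
    }
}

On an affected node, one hierarchy keeps accumulating slices after the CronJob pods exit (the misc controller, per the analysis later in the thread), while the others stay roughly constant.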

This does not happen on 1.13.3.

@yeazelm yeazelm added type/bug Something isn't working status/needs-triage Pending triage or re-evaluation labels Apr 26, 2023
@stmcginnis stmcginnis linked a pull request Apr 27, 2023 that will close this issue
@bcressey
Contributor

git bisect narrowed down the problem to opencontainers/runc@e4ce94e, which was meant to fix kubernetes/kubernetes#112124, but seems to trigger exactly the problem described in kubernetes/kubernetes#112124 (comment).

That issue also mentions that the "misc" controller was introduced in kernel 5.13, which explains why this affects variants with the 5.15 kernel (k8s-1.24, k8s-1.25) but not older variants with the 5.10 kernel (k8s-1.22, k8s-1.23).

The runc change also only affects cgroup v1, so k8s-1.26 variants using cgroup v2 are not affected.
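
A quick way to check which situation a node is in, sketched in Go (the code is my own assumption, not from this thread): detect whether the host is on cgroup v2 and whether the kernel exposes the misc controller at all.

// cgroup-mode check: a sketch, not part of the issue.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    // On a pure cgroup v2 host, /sys/fs/cgroup/cgroup.controllers exists.
    if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
        fmt.Println("cgroup v2 (unified hierarchy)")
    } else {
        fmt.Println("cgroup v1 (legacy hierarchy)")
    }

    // /proc/cgroups lists the v1 controllers the kernel provides;
    // "misc" only shows up on 5.13+ kernels.
    f, err := os.Open("/proc/cgroups")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    s := bufio.NewScanner(f)
    for s.Scan() {
        if strings.HasPrefix(s.Text(), "misc") {
            fmt.Println("misc controller present:", s.Text())
        }
    }
}

On the affected k8s-1.24/1.25 variants this should report cgroup v1 plus a present misc controller; on k8s-1.26 (cgroup v2) or the 5.10-kernel variants, one of those two conditions is absent.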

@kolyshkin

Which kubernetes version was used?

I suspect that the runc 1.1.6 binary creates a misc cgroup, and then kubelet uses runc's libcontainer (of an older version) to remove it. That older libcontainer version doesn't know about misc (and systemd doesn't know about it either), so it's not removed.

Thus, bumping the runc/libcontainer dependency to 1.1.6 in k8s should fix this.
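
A hypothetical illustration of that mismatch (this is not libcontainer's actual code; the controller list and slice name are made up for the sketch): if per-pod cgroup removal iterates over a fixed set of known v1 controllers, a directory created under a newer controller such as misc is simply never visited.

// Illustration only: removal that skips controllers it doesn't know about.
package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// knownV1Controllers is an assumption standing in for whatever an older
// libcontainer release was built to manage; "misc" is deliberately absent.
var knownV1Controllers = []string{
    "cpu", "cpuacct", "cpuset", "memory", "pids", "blkio",
    "devices", "freezer", "hugetlb", "net_cls", "net_prio", "perf_event",
}

func removePodCgroups(podSlice string) {
    for _, ctrl := range knownV1Controllers {
        path := filepath.Join("/sys/fs/cgroup", ctrl,
            "kubepods.slice", "kubepods-besteffort.slice", podSlice)
        // rmdir succeeds once the cgroup has no children or processes;
        // errors are ignored to keep the sketch short.
        _ = os.Remove(path)
    }
    // Anything under /sys/fs/cgroup/misc/... is never visited, which per the
    // explanation above is what leaves the pod slice behind.
    fmt.Println("removed known controllers for", podSlice)
}

func main() {
    removePodCgroups("kubepods-besteffort-podexample.slice") // placeholder name
}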

@kolyshkin

Which kubernetes version was used?

Ah I see it was k8s-1.24 and k8s-1.25. I will open PRs to backport kubernetes/kubernetes#117242 to those releases.

@hajdukda

What about the excessive logging caused by the runc 1.1.6 "Path not found" issue? It also seems to be related to this issue:
kubernetes/kubernetes#112650

@kolyshkin

@yeazelm Two questions

  1. Do you have systemd logs from the node? Any error logged by systemd when trying to remove a kubepods slice?
  2. Can you check that [1.25] vendor: bump runc to 1.1.6 kubernetes/kubernetes#117682 fixes the issue?

@yeazelm
Contributor Author

yeazelm commented May 2, 2023

@kolyshkin

  1. Is this the one you are referring to? We were more focused on other log entries but I saw entries like this:
Apr 26 23:46:11 ip-192-168-95-231.us-west-2.compute.internal kubelet[1382]: I0426 23:46:11.369822    1382 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/75acd028-7061-4134-ba28-65d7a3f54b11/volumes"
Apr 26 23:46:11 ip-192-168-95-231.us-west-2.compute.internal kubelet[1382]: I0426 23:46:11.457770    1382 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/bbad7c97-abc2-44a2-b319-8b718baef57b/volumes"

as well as:

Apr 26 23:52:22 ip-192-168-77-103.us-west-2.compute.internal kubelet[1383]: W0426 23:52:22.012444    1383 watcher.go:93] Error while processing event ("/sys/fs/cgroup/perf_event/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7b5c1fd3_5c3a_4c77_8078_a26c32e184c6.slice/cri-containerd-ea7e295879cf9e17053af5f9e1c7a9dda45ea655fd40415e331cff1abd92d981.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7b5c1fd3_5c3a_4c77_8078_a26c32e184c6.slice/cri-containerd-ea7e295879cf9e17053af5f9e1c7a9dda45ea655fd40415e331cff1abd92d981.scope: no such file or directory

Not sure if those are the ones you were looking for, but I can grab the entries again and get more of them if you think that would help.
  2. This will take a bit of work to test, but I can take a look later today and see if I can get a build with this fix included and see if it resolves the issue.

@kolyshkin

Not sure if those are the ones you were looking for, but I can grab the entries again and get more of them if you think that would help.

I was asking about systemd logs related to an error like this one:

Apr 26 21:38:55 ip-192-168-81-168.us-west-2.compute.internal kubelet[1373]: I0426 21:38:55.258940 1373 pod_container_manager_linux.go:192] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] err="unable to destroy cgroup paths for cgroup [kubepods besteffort pod482dc361-4c7d-4502-b54d-4ce53af7089e] : Timed out while waiting for systemd to remove kubepods-besteffort-pod482dc361_4c7d_4502_b54d_4ce53af7089e.slice"

What happens here is that kubelet asks systemd to remove the unit, and times out waiting for a reply from systemd. The timeout is long enough -- 30 seconds. I was wondering if there are any logs emitted by systemd related to the above-named slice.
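
For context, a minimal sketch of that interaction, assuming the github.com/coreos/go-systemd/v22/dbus bindings (kubelet actually goes through runc's libcontainer rather than calling this directly, and the slice name below is a placeholder): ask systemd to stop the slice unit and wait up to 30 seconds for the job result.

// Sketch of "ask systemd to remove a slice and wait for the result".
package main

import (
    "context"
    "fmt"
    "time"

    systemddbus "github.com/coreos/go-systemd/v22/dbus"
)

func main() {
    ctx := context.Background()
    conn, err := systemddbus.NewWithContext(ctx)
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // Placeholder slice name matching the pattern in the logs above.
    unit := "kubepods-besteffort-podexample.slice"

    result := make(chan string, 1)
    if _, err := conn.StopUnitContext(ctx, unit, "replace", result); err != nil {
        panic(err)
    }

    select {
    case r := <-result:
        fmt.Println("systemd job finished:", r) // e.g. "done" or "failed"
    case <-time.After(30 * time.Second):
        // This branch corresponds to the "Timed out while waiting for systemd
        // to remove ..." error in the kubelet logs.
        fmt.Println("timed out waiting for systemd to remove", unit)
    }
}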

@kolyshkin

  1. This will take a bit of work to test, but I can take a look later today and see if I can get a build with this fix included and see if it resolves the issue.

This would be awesome, as this is currently the proposed fix, but no one has tested it yet.

@yeazelm
Contributor Author

yeazelm commented May 2, 2023

@kolyshkin I applied kubernetes/kubernetes#117682 manually and now have "Kubelet version" kubeletVersion="v1.25.8-eks-c05fe32", which solves this problem for us. Looks like it works with runc 1.1.6 just fine!

As for the log entries when this is failing, it really just spams:

May 02 21:33:54 ip-192-168-85-106.us-west-2.compute.internal kubelet[1378]: I0502 21:33:54.838225    1378 pod_container_manager_linux.go:191] "Failed to delete cgroup paths" cgroupName=[kubepods besteffort pode84798a6-8308-4177-90a3-dec0725b3e49] err="unable to destroy cgroup paths for cgroup [kubepods besteffort pode84798a6-8308-4177-90a3-dec0725b3e49] : Timed out while waiting for systemd to remove kubepods-besteffort-pode84798a6_8308_4177_90a3_dec0725b3e49.slice"

coupled with:

May 02 21:33:54 ip-192-168-85-106.us-west-2.compute.internal kubelet[1378]: I0502 21:33:54.442010    1378 kubelet_getters.go:306] "Path does not exist" path="/var/lib/kubelet/pods/411999be-00e4-420c-a75b-8b74b56f99d4/volumes"

I don't see any other log entries (although it's a bit tough since we end up getting quite a few of these, so I could be missing other messages if they are infrequent).

@stmcginnis
Contributor

Looks like this has been addressed with #3074.
