Use runtime.slice and system.slice cgroup settings in k8s variants #1681

webern · 2021-07-30T17:46:14Z

Per the discussion, investigate and possibly implement the recommended cgroup settings "runtime.slice" and "system.slice".

Discussed in #1679

Originally posted by cyrus-mc July 29, 2021
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/node-allocatable.md#recommended-cgroups-setup

As highlighted above, there should be a system.slice/cgroup and a podruntime.slice/cgroup that correspond to the systemReserved and kubeReserved settings. Looking at the setup of Bottlerocket, it appears there is just system.slice, under which kubelet and the runtime plus all system components run.

Was there a reason for this design decision? It doesn't map nicely to the configuration settings k8s provides.
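One way to check which slice a process such as kubelet actually lands in is to read its /proc/<pid>/cgroup file. The following is a minimal sketch, assuming the standard `hierarchy-ID:controller-list:cgroup-path` line format; the helper name `top_level_slice` and the sample line are hypothetical and were not captured from a Bottlerocket host.

```python
from typing import Optional


def top_level_slice(cgroup_text: str) -> Optional[str]:
    """Return the first '*.slice' path component found in /proc/<pid>/cgroup content."""
    for line in cgroup_text.splitlines():
        # Each line has the form hierarchy-ID:controller-list:cgroup-path.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        for component in parts[2].split("/"):
            if component.endswith(".slice"):
                return component
    return None


# Hypothetical sample line, for illustration only.
sample = "0::/system.slice/kubelet.service\n"
print(top_level_slice(sample))
```

On a running node, the same function could be fed the output of `open(f"/proc/{pid}/cgroup").read()` for the kubelet and container runtime PIDs to see whether they share a slice with other system services.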

Originally posted by webern July 29, 2021

Thanks for bringing this to our attention, and for the pointer to the proposal. Your link also led us to find this https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#kube-reserved which suggests "runtime.slice" and "system.slice" and points to the design proposal. We'll review our current slice setup and make sure it's aligned with kubelet functionality, unless it's something you'd like to contribute.
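For context on why the two reservations matter, the linked Kubernetes page describes node allocatable as capacity minus kube-reserved, system-reserved, and the hard eviction threshold. A small arithmetic sketch follows; the function name and the memory figures are made up for illustration and are not Bottlerocket defaults.

```python
def allocatable_mib(capacity: int, kube_reserved: int,
                    system_reserved: int, eviction_threshold: int) -> int:
    """Allocatable = capacity - kube-reserved - system-reserved - eviction threshold."""
    return capacity - kube_reserved - system_reserved - eviction_threshold


# Hypothetical node: 8192 MiB total, 512 MiB kube-reserved,
# 256 MiB system-reserved, 100 MiB hard eviction threshold.
print(allocatable_mib(8192, 512, 256, 100))  # 7324
```

Separate slices matter only if the reservations are enforced against the kubelet and runtime on one side and other system daemons on the other; the subtraction itself applies either way.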

cyrus-mc · 2021-07-30T18:30:08Z

@webern I am actually taking a stab at implementing this. Not sure if AWS is open to public contributions but I can open a PR once complete.

webern · 2021-07-30T19:12:15Z

@cyrus-mc that's great! We are open to and enthused about public contributions! Here is our contributing guide for reference. Let us know if you need any help or input. If you have design choices to make we can discuss that here as well. 🚀

cyrus-mc · 2021-07-31T01:13:47Z

@webern Great. I am doing some experimental builds just to get the build process down. The changes are actually quite trivial, since I build a Fedora CoreOS AMI that we currently use for our worker nodes in EKS and set up the cgroups in it as outlined in the link above.

I'm running into an early issue creating my own build. I followed the instructions and created an AMI. When I boot that AMI I get "Kernel is locked down from securityfs; see man kernel_lockdown.7"

And the image never fully boots.

webern · 2021-07-31T19:17:53Z

I just tried booting an instance with this in my userdata:

[settings.kernel]
lockdown = "confidentiality"

And I saw "Kernel is locked down from securityfs; see man kernel_lockdown.7", but the instance fully booted and is fine. So I don't think that log line represents the cause of the error. Often when an instance fails to boot, it is because the userdata is incorrect in some way.

Do you see "Started Bottlerocket API server" in your logs? If so, then after that point various Bottlerocket services such as early-boot-config, sundog, and thar-be-settings start reading and using userdata. Did your boot make it this far, and do you see errors from any of these services?

webern · 2021-07-31T19:20:10Z

P.S. You are trying to boot a build that has no changes, i.e. from develop, right? Or does the build that fails to boot have your cgroup changes in it?

cyrus-mc · 2021-08-02T03:39:20Z

I just rebuilt and it worked. I must have done something stupid.

Anyway, I pretty much have the PR for this issue done and will open it tomorrow after a few additional changes.

arnaldo2792 · 2021-08-04T21:16:41Z

@cyrus-mc you did not do something wrong 👍. We merged #1668, which introduced some runtime problems if you were running a pod that loads a kernel module, most likely kube-proxy. That's why we reverted the kernel modules compression in #1689.

webern added the area/kubernetes (K8s including EKS, EKS-A, and including VMW), status/needs-triage (Pending triage or re-evaluation), and area/core (Issues core to the OS, variant independent) labels Jul 30, 2021

webern assigned jhaynes Jul 30, 2021

cyrus-mc mentioned this issue Aug 2, 2021

Use runtime.slice and system.slice cgroup settings in k8s variants #1684

Merged

jhaynes added the priority/p1 and status/in-progress (This issue is currently being worked on) labels Aug 3, 2021

jhaynes added this to the next milestone Aug 3, 2021

jhaynes removed their assignment Aug 3, 2021

jhaynes linked a pull request Aug 5, 2021 that will close this issue

Use runtime.slice and system.slice cgroup settings in k8s variants #1684

Merged

samuelkarp closed this as completed in #1684 Aug 9, 2021

bcressey removed the status/needs-triage (Pending triage or re-evaluation) and status/in-progress (This issue is currently being worked on) labels Nov 11, 2021
