Use runtime.slice and system.slice cgroup settings in k8s variants #1681

Closed
webern opened this issue Jul 30, 2021 Discussed in #1679 · 7 comments · Fixed by #1684
Labels
area/core Issues core to the OS (variant independent) area/kubernetes K8s including EKS, EKS-A, and including VMW

Comments

@webern
Contributor

webern commented Jul 30, 2021

Per the discussion, investigate and possibly implement the recommended cgroup settings "runtime.slice" and "system.slice".

Discussed in #1679

Originally posted by cyrus-mc July 29, 2021
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/node-allocatable.md#recommended-cgroups-setup

As highlighted above, there should be a system.slice cgroup and a podruntime.slice cgroup that correspond to the systemReserved and kubeReserved settings. Looking at the setup of Bottlerocket, it appears there is just system.slice, under which kubelet, the runtime, and all system components run.

Was there a reason for this design decision? It doesn't map nicely to the configuration settings k8s provides.

Originally posted by webern July 29, 2021

Thanks for bringing this to our attention, and for the pointer to the proposal. Your link also led us to https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#kube-reserved which suggests "runtime.slice" and "system.slice" and points to the design proposal. We'll review our current slice setup and make sure it's aligned with kubelet functionality, unless it's something you'd like to contribute.
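For anyone checking a node against the layout discussed above, here is a minimal sketch (not Bottlerocket code) that reports which top-level cgroup a process runs under by parsing the contents of /proc/<pid>/cgroup. The function name is hypothetical, and the unit names in the usage note (kubelet.service, containerd.service) are assumptions for illustration; real unit names may differ.

```python
from pathlib import Path


def top_level_cgroups(cgroup_text):
    """Return the set of top-level cgroup names listed in /proc/<pid>/cgroup content."""
    found = set()
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        # The first path component is the top-level cgroup, e.g. "system.slice".
        first = parts[2].strip("/").split("/", 1)[0]
        if first:
            found.add(first)
    return found


if __name__ == "__main__":
    # Inspects the current process; substitute a kubelet or runtime PID on a real node.
    proc_file = Path("/proc/self/cgroup")
    if proc_file.exists():
        print(top_level_cgroups(proc_file.read_text()))
```

With the split layout, a kubelet or container runtime process would be expected to report runtime.slice, while other system daemons would report system.slice. If everything reports system.slice, the node matches the single-slice setup described in the original post.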

@webern webern added area/kubernetes K8s including EKS, EKS-A, and including VMW status/needs-triage Pending triage or re-evaluation area/core Issues core to the OS (variant independent) labels Jul 30, 2021
@cyrus-mc

@webern I am actually taking a stab at implementing this. Not sure if AWS is open to public contributions but I can open a PR once complete.

@webern
Contributor Author

webern commented Jul 30, 2021

@cyrus-mc that's great! We are open to and enthused about public contributions! Here is our contributing guide for reference. Let us know if you need any help or input. If you have design choices to make we can discuss that here as well. 🚀

@cyrus-mc

@webern Great. I am doing some experimental builds just to get the build process down. The changes are actually quite trivial, as I already build a Fedora CoreOS AMI that we currently use as our worker node in EKS, and I set up the cgroups there as outlined in the link above.

I'm running into an early issue creating my own build. I followed the instructions and created an AMI. When I boot that AMI I get: Kernel is locked down from securityfs; see man kernel_lockdown.7

And the image never fully boots.

@webern
Contributor Author

webern commented Jul 31, 2021

I just tried booting an instance with this in my userdata

[settings.kernel]
lockdown = "confidentiality"

And I saw "Kernel is locked down from securityfs; see man kernel_lockdown.7", but the instance fully booted and is fine. So I don't think that log line represents the cause of the error. Often when an instance fails to boot, it is because the userdata is incorrect in some way.

Do you see "Started Bottlerocket API server" in your logs? If so, then after that point various Bottlerocket services such as early-boot-config, sundog, and thar-be-settings start reading and using userdata. Did your boot make it this far, and do you see errors from any of these services?
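The check described above can be sketched as a small script over a saved console log. This is only an illustration: the function name is hypothetical, the exact log line formats are assumptions, and matching on the words "error" or "failed" is a rough heuristic, not how Bottlerocket reports failures.

```python
# Marker and service names are taken from the comment above.
BOOT_MARKER = "Started Bottlerocket API server"
SERVICES = ("early-boot-config", "sundog", "thar-be-settings")


def triage_boot_log(log_text):
    """Report whether the API server started and which later service lines look like errors."""
    lines = log_text.splitlines()
    marker_index = next(
        (i for i, line in enumerate(lines) if BOOT_MARKER in line), None
    )
    if marker_index is None:
        # Boot never reached the point where userdata is read.
        return {"api_server_started": False, "suspect_lines": []}
    suspects = [
        line
        for line in lines[marker_index + 1:]
        if any(service in line for service in SERVICES)
        # Assumption: failures mention "error" or "failed" somewhere in the line.
        and ("error" in line.lower() or "failed" in line.lower())
    ]
    return {"api_server_started": True, "suspect_lines": suspects}
```

If api_server_started is false, the problem is earlier than userdata handling. If it is true and suspect_lines is non-empty, the userdata is the first thing to check.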

@webern
Contributor Author

webern commented Jul 31, 2021

P.S. You are trying to boot a build that has no changes, i.e. from develop, right? Or does the build that fails to boot have your cgroup changes in it?

@cyrus-mc

cyrus-mc commented Aug 2, 2021

I just rebuilt and it worked. I must have done something stupid.

Anyway, I pretty much have the PR for this issue done and will open it tomorrow after a few additional changes.

@jhaynes jhaynes added priority/p1 status/in-progress This issue is currently being worked on labels Aug 3, 2021
@jhaynes jhaynes added this to the next milestone Aug 3, 2021
@jhaynes jhaynes removed their assignment Aug 3, 2021
@arnaldo2792
Contributor

@cyrus-mc you did not do something wrong 👍. We merged #1668, which introduced some runtime problems if you were running a pod that loads a kernel module, most likely kube-proxy. That's why we reverted the kernel module compression in #1689.

@bcressey bcressey removed status/needs-triage Pending triage or re-evaluation status/in-progress This issue is currently being worked on labels Nov 11, 2021