Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow bootstrap containers to manage swap #2537

Merged
merged 1 commit into from
Nov 3, 2022

Conversation

stefansundin
Copy link
Contributor

@stefansundin stefansundin commented Oct 31, 2022

Issue number:

Closes Provides a workaround for #1911

I don't really want to close #1911 because I think configuring swap should be possible without a bootstrap container. It should be as simple as configuring one or more [[swap]] section with a location and size. Doing that would probably require packaging mkswap and swapon though, and that is currently not something I know how to do.

Description of changes:

Allows bootstrap containers to manage swap by allowing the swapon and swapoff syscalls.

Testing done:

I built my own ami with this change and tested that a bootstrap container is able to set up swap. I created a docker image that just runs whatever you give it as user data, to make it easy to change the script.

This is my script:

#!/bin/bash -ex
if [[ ! -f /.bottlerocket/rootfs/var/swapfile ]]; then
  dd if=/dev/zero of=/.bottlerocket/rootfs/var/swapfile bs=1M count=300
  chmod 600 /.bottlerocket/rootfs/var/swapfile
  mkswap /.bottlerocket/rootfs/var/swapfile
fi
swapon /.bottlerocket/rootfs/var/swapfile

In the OS configuration I run it with:

[settings.bootstrap-containers.swap]
source = "public.ecr.aws/stefansundin/bottlerocket-bootstrap-exec-user-data:latest"
mode = "always"
essential = false
user-data = "IyEvYmluL2Jhc2ggLWV4CmlmIFtbICEgLWYgLy5ib3R0bGVyb2NrZXQvcm9vdGZzL3Zhci9zd2FwZmlsZSBdXTsgdGhlbgogIGRkIGlmPS9kZXYvemVybyBvZj0vLmJvdHRsZXJvY2tldC9yb290ZnMvdmFyL3N3YXBmaWxlIGJzPTFNIGNvdW50PTMwMAogIGNobW9kIDYwMCAvLmJvdHRsZXJvY2tldC9yb290ZnMvdmFyL3N3YXBmaWxlCiAgbWtzd2FwIC8uYm90dGxlcm9ja2V0L3Jvb3Rmcy92YXIvc3dhcGZpbGUKZmkKc3dhcG9uIC8uYm90dGxlcm9ja2V0L3Jvb3Rmcy92YXIvc3dhcGZpbGU="

On the instance I can see that it works:

bash-5.1# free -m
               total        used        free      shared  buff/cache   available
Mem:             410         234           5           0         170         165
Swap:            299          30         269

Would be great to allow for swap. It is currently the only thing blocking me from utilizing Bottlerocket to a greater extent. Thanks!

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@etungsten etungsten self-assigned this Oct 31, 2022
@etungsten etungsten added the area/core Issues core to the OS (variant independent) label Oct 31, 2022
Copy link
Contributor

@etungsten etungsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I'll let others comment on if it's OK to enable these syscalls in the bootstrap container.

sources/host-ctr/cmd/host-ctr/main.go Outdated Show resolved Hide resolved
@bcressey
Copy link
Contributor

bcressey commented Nov 1, 2022

Thanks for implementing this!

@etungsten etungsten removed the request for review from zmrow November 1, 2022 16:36
@stefansundin
Copy link
Contributor Author

@bcressey Is there a reason why we need a function that returns the function used?

I tried this and it seems to work just as well:

$ git diff sources/host-ctr/cmd/host-ctr/main.go
diff --git a/sources/host-ctr/cmd/host-ctr/main.go b/sources/host-ctr/cmd/host-ctr/main.go
index 935aba07..977330d8 100644
--- a/sources/host-ctr/cmd/host-ctr/main.go
+++ b/sources/host-ctr/cmd/host-ctr/main.go
@@ -642,7 +642,7 @@ func withBootstrap() oci.SpecOpts {
                // container's capabilities.
                seccomp.WithDefaultProfile(),
                oci.WithAllDevicesAllowed,
-               withSwapManagement(),
+               withSwapManagement,
        )
 }
 
@@ -841,17 +841,15 @@ func withRootFsShared() oci.SpecOpts {
 }
 
 // withSwapManagement allows the swapon and swapoff syscalls
-func withSwapManagement() oci.SpecOpts {
-       return func(_ context.Context, _ oci.Client, _ *containers.Container, s *runtimespec.Spec) error {
-               if s.Linux != nil && s.Linux.Seccomp != nil && s.Linux.Seccomp.Syscalls != nil {
-                       s.Linux.Seccomp.Syscalls = append(s.Linux.Seccomp.Syscalls, runtimespec.LinuxSyscall{
-                               Names:  []string{"swapon", "swapoff"},
-                               Action: runtimespec.ActAllow,
-                               Args:   []runtimespec.LinuxSeccompArg{},
-                       })
-               }
-               return nil
+func withSwapManagement(_ context.Context, _ oci.Client, _ *containers.Container, s *runtimespec.Spec) error {
+       if s.Linux != nil && s.Linux.Seccomp != nil && s.Linux.Seccomp.Syscalls != nil {
+               s.Linux.Seccomp.Syscalls = append(s.Linux.Seccomp.Syscalls, runtimespec.LinuxSyscall{
+                       Names:  []string{"swapon", "swapoff"},
+                       Action: runtimespec.ActAllow,
+                       Args:   []runtimespec.LinuxSeccompArg{},
+               })
        }
+       return nil
 }

And I also think early-returning when checking for nil might also make it a little bit better?

// withSwapManagement allows the swapon and swapoff syscalls
func withSwapManagement(_ context.Context, _ oci.Client, _ *containers.Container, s *runtimespec.Spec) error {
	if s.Linux == nil || s.Linux.Seccomp == nil || s.Linux.Seccomp.Syscalls == nil {
		return nil
	}
	s.Linux.Seccomp.Syscalls = append(s.Linux.Seccomp.Syscalls, runtimespec.LinuxSyscall{
		Names:  []string{"swapon", "swapoff"},
		Action: runtimespec.ActAllow,
		Args:   []runtimespec.LinuxSeccompArg{},
	})
	return nil
}

Anyway, I won't change it unless you tell me to. :)

@arnaldo2792
Copy link
Contributor

I did some research to understand why the containerd's default seccomp filter doesn't include the swapon / swapoff syscalls, just to be sure we weren't doing the wrong thing. Probably they will answer later in this discussion.

Regardless of the outcome of the linked discussion, in an offline chat I had with @dims I learned that by default the kubelet will fail to start if it wasn't started with --fail-swap-on=false, which is clearly the case in Bottlerocket's implementation. For details refer to this PR.

There are ongoing efforts to better support swap memory in the kubelet, and there is some work that already landed, but said work has to be picked up by the CRI implementations (aka, containerd) to close the gap. I think once they close that gap, we should follow @stefansundin suggestion of having an API to allocate swap memory and support the new settings that allow the kubelet to use the allocated memory (@bcressey / @etungsten what do you all think?).

All that said, @stefansundin if you are using an ECS image you should be able to use the swap memory without impacting the ECS agent. But that isn't the case for the kubernetes images, since as mentioned before, the kubelet will refuse to start.

@arnaldo2792 arnaldo2792 merged commit e8a5391 into bottlerocket-os:develop Nov 3, 2022
@easy1481437320 easy1481437320 mentioned this pull request Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core Issues core to the OS (variant independent)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Swap
4 participants