From 9bc42d61bb6ce280c48a6b491357ec773b2adf45 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Mon, 4 Nov 2024 20:41:21 +1100
Subject: [PATCH 1/3] dmz: overlay: set xino=off to disable dmesg spam

If /run/runc and /usr/bin are on different filesystems, overlayfs may
enable the xino feature which results in the following log message:

    kernel: overlayfs: "xino" feature enabled using 3 upper inode bits.

Each time we have to protect /proc/self/exe. So disable xino to remove
the log message (we don't care about the inode numbers of the files
anyway).

Signed-off-by: Aleksa Sarai
---
 libcontainer/dmz/overlayfs_linux.go | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/libcontainer/dmz/overlayfs_linux.go b/libcontainer/dmz/overlayfs_linux.go
index 92cb1944e59..b81b7025895 100644
--- a/libcontainer/dmz/overlayfs_linux.go
+++ b/libcontainer/dmz/overlayfs_linux.go
@@ -84,6 +84,13 @@ func sealedOverlayfs(binPath, tmpDir string) (_ *os.File, Err error) {
 		return nil, fmt.Errorf("fsconfig set overlayfs lowerdir=%s: %w", lowerDirStr, err)
 	}
 
+	// We don't care about xino (Linux 4.17) but it will be auto-enabled on
+	// some systems (if /run/runc and /usr/bin are on different filesystems)
+	// and this produces spurious dmesg log entries. We can safely ignore
+	// errors when disabling this because we don't actually care about the
+	// setting and we're just opportunistically disabling it.
+	_ = unix.FsconfigSetString(int(overlayCtx.Fd()), "xino", "off")
+
 	// Get an actual handle to the overlayfs.
 	if err := unix.FsconfigCreate(int(overlayCtx.Fd())); err != nil {
 		return nil, os.NewSyscallError("fsconfig create overlayfs", err)

From aa505bfa89a6feaae5378a7a6e5166886f8bc0fc Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Mon, 4 Nov 2024 20:47:07 +1100
Subject: [PATCH 2/3] memfd-bind: mention that overlayfs obviates the need for it

Signed-off-by: Aleksa Sarai
---
 contrib/cmd/memfd-bind/README.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/contrib/cmd/memfd-bind/README.md b/contrib/cmd/memfd-bind/README.md
index a83cc78208c..e529eacfeaf 100644
--- a/contrib/cmd/memfd-bind/README.md
+++ b/contrib/cmd/memfd-bind/README.md
@@ -1,6 +1,15 @@
 ## memfd-bind ##
 
-`runc` normally has to make a binary copy of itself when constructing a
+> **NOTE**: Since runc 1.2.0, runc will now use a private overlayfs mount to
+> protect the runc binary. This protection is far more light-weight than
+> memfd-bind, and for most users this should obviate the need for `memfd-bind`
+> entirely. Rootless containers will still make a memfd copy (unless you are
+> using `runc` itself inside a user namespace -- a-la
+> [`rootlesskit`][rootlesskit]), but `memfd-bind` is not particularly useful
+> for rootless container users anyway (see [Caveats](#Caveats) for more
+> details).
+
+`runc` sometimes has to make a binary copy of itself when constructing a
 container process in order to defend against certain container runtime attacks
 such as CVE-2019-5736.
 
@@ -38,6 +47,8 @@ much memory usage they can use:
 container process setup takes up about 10MB per process spawned inside the
 container by runc (both pid1 and `runc exec`).

+[rootlesskit]: https://github.com/rootless-containers/rootlesskit
+
 ### Caveats ###
 
 There are several downsides with using `memfd-bind` on the `runc` binary:

From b9dfb22dbfefe0b211adfe634d5348cd1ad39266 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai
Date: Mon, 4 Nov 2024 20:49:42 +1100
Subject: [PATCH 3/3] readme: drop unused memfd-bind reference

Fixes: 871057d863e8 ("drop runc-dmz solution according to overlay solution")
Signed-off-by: Aleksa Sarai
---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 8cbe1fe6878..50fcd4e9222 100644
--- a/README.md
+++ b/README.md
@@ -113,8 +113,6 @@ The following build tags were used earlier, but are now obsoleted:
 - **apparmor** (since runc v1.0.0-rc93 the feature is always enabled)
 - **selinux** (since runc v1.0.0-rc93 the feature is always enabled)
 
- [contrib-memfd-bind]: /contrib/cmd/memfd-bind/README.md
-
 ### Running the test suite
 
 `runc` currently supports running its test suite via Docker.
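
Note on patch 1/3: the change treats `lowerdir` as a required option (an error aborts the mount) and `xino=off` as a best-effort one (an error is discarded). A minimal sketch of that pattern follows, for illustration only. The `fsContext` interface, `fakeContext` type, and `configureOverlay` function are hypothetical names that do not exist in runc; they stand in for the real `unix.FsconfigSetString` and `unix.FsconfigCreate` calls so the logic can run without mount privileges.

```go
package main

import "fmt"

// fsContext is a hypothetical stand-in for an fsopen(2) context. In the patch
// the equivalent operations are unix.FsconfigSetString and unix.FsconfigCreate.
type fsContext interface {
	SetString(key, value string) error
	Create() error
}

// fakeContext is a hypothetical in-memory context used only for this sketch.
// Keys listed in reject fail, imitating a kernel that lacks that option.
type fakeContext struct {
	reject  map[string]bool
	options map[string]string
	created bool
}

func (c *fakeContext) SetString(key, value string) error {
	if c.reject[key] {
		return fmt.Errorf("unknown parameter %q", key)
	}
	if c.options == nil {
		c.options = make(map[string]string)
	}
	c.options[key] = value
	return nil
}

func (c *fakeContext) Create() error {
	c.created = true
	return nil
}

// configureOverlay mirrors the shape of the patched code path: required
// options propagate their errors, optional ones are set opportunistically.
func configureOverlay(ctx fsContext, lowerDir string) error {
	// Required: without a lowerdir there is nothing to mount.
	if err := ctx.SetString("lowerdir", lowerDir); err != nil {
		return fmt.Errorf("fsconfig set overlayfs lowerdir=%s: %w", lowerDir, err)
	}

	// Optional: xino only affects reported inode numbers, so a failure
	// here (for example on a kernel without the option) is ignored.
	_ = ctx.SetString("xino", "off")

	if err := ctx.Create(); err != nil {
		return fmt.Errorf("fsconfig create overlayfs: %w", err)
	}
	return nil
}

func main() {
	// Imitate a kernel that does not recognise xino.
	old := &fakeContext{reject: map[string]bool{"xino": true}}
	fmt.Println("xino rejected, err:", configureOverlay(old, "/usr/bin"))

	// Imitate a kernel that rejects the required option.
	broken := &fakeContext{reject: map[string]bool{"lowerdir": true}}
	fmt.Println("lowerdir rejected, err:", configureOverlay(broken, "/usr/bin"))
}
```

With this split, a kernel that rejects `xino` still gets a working mount, while a failure on `lowerdir` surfaces to the caller.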