New tests for user namespaces and groups issue
These tests illustrate an issue that occurs when runc is used with user
namespaces in Kubernetes.

runc needs to bind mount files from /var/lib/kubelet/pods/... (such as
etc-hosts) into the container. With user namespaces, this bind mount no
longer works when runc is started from a systemd unit.

The workaround is to start the systemd unit with SupplementaryGroups=0.
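A minimal sketch of applying this workaround through a systemd drop-in,
assuming the unit that starts runc accepts drop-ins in a "<unit>.d"
directory. The function name and the drop-in file name
10-supplementary-groups.conf are hypothetical:

```shell
# Write a systemd drop-in that gives the service supplementary group 0.
# Argument: the unit's drop-in directory, e.g. /etc/systemd/system/<unit>.d
# (the file name below is an illustrative choice, not a runc convention).
write_supplementary_groups_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-supplementary-groups.conf" <<'EOF'
[Service]
SupplementaryGroups=0
EOF
}
```

After writing the drop-in into the real unit's directory, 'systemctl
daemon-reload' and a restart of the unit would be needed for the setting
to take effect.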

runc needs search ('x') permission on /var/lib/kubelet and the other
parent directories to stat() the source of the bind mount. Without user
namespaces, this is not a problem since runc runs as root, so it has
'rwx' permissions on the directory:

drwxr-x---. 8 root   root   4096 May 28 18:05 /var/lib/kubelet

Moreover, runc has CAP_DAC_OVERRIDE at this point because the mount
phase happens before runc drops its additional privileges.
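Whether a process currently holds CAP_DAC_OVERRIDE can be read from bit 1
of the CapEff mask in /proc/<pid>/status. A small sketch (the function
name is hypothetical); it only inspects the mask and does not model the
inode-mapping condition the kernel also applies:

```shell
# Succeed if CAP_DAC_OVERRIDE (capability number 1 in
# <linux/capability.h>) is set in a CapEff hex mask, as printed in
# /proc/<pid>/status (e.g. "000001ffffffffff").
has_cap_dac_override() {
    capeff=$1
    # Shell arithmetic accepts 0x-prefixed hex; test bit 1.
    [ $(( (0x$capeff >> 1) & 1 )) -eq 1 ]
}
```

For the current shell:
has_cap_dac_override "$(awk '/^CapEff:/ {print $2}' /proc/$$/status)"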

However, when using user namespaces, the runc process belongs to a
different user than root (depending on the mapping), and /var/lib/kubelet
is seen as belonging to the special unmapped user (65534, nobody). runc
no longer gets the owner's 'rwx' permissions; it falls into the 'other'
class, whose permissions are empty ('---').

CAP_DAC_OVERRIDE is not effective either, because the kernel performs the
capability check with capable_wrt_inode_uidgid(inode, CAP_DAC_OVERRIDE).
This checks that the owner (UID and GID) of /var/lib/kubelet is mapped in
the current user namespace, which is not the case.
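The "is this ID mapped" part of that check can be sketched from the
namespace's uid_map (or gid_map), whose lines are
"<inside-start> <outside-start> <length>". This is an illustrative model,
not the kernel's code: the function name is hypothetical, and 65534 is
assumed to be the overflow ID (the default of /proc/sys/kernel/overflowuid):

```shell
# Translate an ID from the parent namespace into the ID seen inside a
# user namespace. The namespace's uid_map or gid_map is read on stdin.
# IDs outside every range are unmapped and reported as 65534, the
# default overflow ID that shows up as "nobody".
map_outside_id() {
    awk -v id="$1" '
        NF == 3 && id >= $2 && id < $2 + $3 { print $1 + (id - $2); found = 1; exit }
        END { if (!found) print 65534 }
    '
}
```

With the mapping used by the tests below (0 100000 65534), host UID 0
falls outside [100000, 165534) and is reported as 65534, which matches
what runc sees for /var/lib/kubelet. Usage against a live namespace:
map_outside_id 0 < /proc/<pid>/uid_map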

Despite that, bind mounting /var/lib/kubelet/pods/...etc-hosts worked
when runc was started manually with 'sudo' but not when it was started
from a systemd unit. The difference is in how supplementary groups are
handled: sudo sets them (here, group 0), while systemd units do not get
any supplementary groups by default.

$ sudo grep -E 'Groups:|Uid:|Gid:' /proc/self/status
Uid:	0	0	0	0
Gid:	0	0	0	0
Groups:	0

$ sudo systemd-run -t grep -E 'Groups:|Uid:|Gid:' /proc/self/status
Running as unit: run-u296886.service
Press ^] three times within 1s to disconnect TTY.
Uid:	0	0	0	0
Gid:	0	0	0	0
Groups:
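The same comparison can be scripted by extracting the Groups: field from
a status file. A sketch with hypothetical function names; it takes the
status file as an argument so it can be pointed at /proc/<pid>/status of
a running runc:

```shell
# Print the supplementary groups from a /proc/<pid>/status file,
# one GID per line (nothing if the list is empty).
supplementary_groups() {
    awk '/^Groups:/ { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Succeed if GID 0 is among the supplementary groups in the status file.
has_supplementary_group_0() {
    supplementary_groups "$1" | grep -qx 0
}
```

Based on the output above, has_supplementary_group_0 /proc/self/status
would succeed under sudo and fail under a plain systemd-run.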

When runc has supplementary group 0 configured, the group is retained
during the bind-mount phase even though it is unmapped (runc temporarily
sees 'Groups: 65534' in its own /proc/self/status). runc therefore gets
the group's 'r-x' permissions on /var/lib/kubelet, which makes the bind
mount of etc-hosts work.
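The owner/group/other selection behind this can be modelled in a few
lines. This is a simplified sketch (hypothetical function name) that
ignores ACLs and capabilities. It takes host-side IDs, assuming the
kernel compares its internal IDs, which for the container root is 100000
under the test mapping:

```shell
# Print the permission triad ("rwx" form) that applies to a process for a
# file: owner bits if the process UID owns the file, else group bits if
# the file's GID is the process GID or a supplementary group, else the
# "other" bits.
# Arguments: file-uid file-gid octal-mode proc-uid proc-gid [supp-gid...]
applicable_perms() (
    f_uid=$1 f_gid=$2 mode=$3 p_uid=$4 p_gid=$5
    shift 5
    if [ "$p_uid" -eq "$f_uid" ]; then
        shift_bits=6
    else
        shift_bits=0
        if [ "$p_gid" -eq "$f_gid" ]; then
            shift_bits=3
        else
            for g in "$@"; do
                if [ "$g" -eq "$f_gid" ]; then
                    shift_bits=3
                    break
                fi
            done
        fi
    fi
    # Prefix "0" so the mode (e.g. 750) is read as octal.
    bits=$(( (0$mode >> shift_bits) & 7 ))
    r=-; w=-; x=-
    [ $((bits & 4)) -ne 0 ] && r=r
    [ $((bits & 2)) -ne 0 ] && w=w
    [ $((bits & 1)) -ne 0 ] && x=x
    printf '%s%s%s\n' "$r" "$w" "$x"
)
```

For /var/lib/kubelet (root:root, mode 750): 'applicable_perms 0 0 750 0 0'
gives rwx (runc as host root), 'applicable_perms 0 0 750 100000 100000'
gives --- (user namespace, no supplementary groups), and
'applicable_perms 0 0 750 100000 100000 0' gives r-x (user namespace with
supplementary group 0 retained).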

After the mount phase, runc sets the credentials correctly (following the
OCI config.json specification), so the container does not retain this
unmapped supplementary group.

It is difficult for runc to set supplementary groups automatically from
Golang code with syscall.Setgroups(), because "at the kernel level, user
IDs and group IDs are a per-thread attribute" (man setgroups), and the
way Golang uses threads makes it hard to predict which thread is going to
run the code. glibc's setgroups() is a wrapper that changes the
credentials of all threads, but Golang does not use the glibc
implementation.
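Because credentials are per-thread, whether a group change reached every
thread can be checked by reading each thread's status file under
/proc/<pid>/task. A sketch with a hypothetical function name that prints
one "<tid> [<groups>]" line per thread:

```shell
# List the supplementary groups reported by each thread of a process.
# Argument: the process's task directory, e.g. /proc/<pid>/task.
thread_groups() (
    task_dir=$1
    for status in "$task_dir"/*/status; do
        # Skip the unexpanded glob (no threads) and threads that exited.
        [ -r "$status" ] || continue
        tid=$(basename "$(dirname "$status")")
        # Drop the "Groups:" label and surrounding whitespace.
        groups=$(awk '/^Groups:/ { $1 = ""; sub(/^ +/, ""); print }' "$status")
        printf '%s [%s]\n' "$tid" "$groups"
    done
)
```

Running thread_groups /proc/<pid>/task against a live Golang process
would show whether all of its threads report the same Groups: value.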

Signed-off-by: Alban Crequy <[email protected]>
alban committed Sep 10, 2020
1 parent 4dbfb35 commit 7c0529f
Showing 1 changed file with 45 additions and 0 deletions:
tests/integration/userns.bats (new file)
@@ -0,0 +1,45 @@
#!/usr/bin/env bats

load helpers

function setup() {
	teardown_busybox
	setup_busybox
	mkdir -p "$BUSYBOX_BUNDLE"/source-{accessible,inaccessible}/dir
	chmod 750 "$BUSYBOX_BUNDLE"/source-inaccessible
	mkdir -p "$BUSYBOX_BUNDLE"/rootfs/{proc,sys,tmp}
	mkdir -p "$BUSYBOX_BUNDLE"/rootfs/tmp/{accessible,inaccessible}
	update_config ' .process.args += ["-c", "echo HelloWorld"] '
	update_config ' .linux.namespaces += [{"type": "user"}]
		| .linux.uidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}]
		| .linux.gidMappings += [{"hostID": 100000, "containerID": 0, "size": 65534}] '
}

function teardown() {
	teardown_busybox
}

@test "userns without mount" {
	runc run test_userns_without_mount
	[ "$status" -eq 0 ]

	[[ "${output}" == *"HelloWorld"* ]]
}

@test "userns with simple mount" {
	update_config ' .mounts += [{"source": "source-accessible/dir", "destination": "/tmp/accessible", "options": ["bind"]}] '

	runc run test_userns_with_simple_mount
	[ "$status" -eq 0 ]

	[[ "${output}" == *"HelloWorld"* ]]
}

@test "userns with inaccessible mount" {
	update_config ' .mounts += [{"source": "source-inaccessible/dir", "destination": "/tmp/inaccessible", "options": ["bind"]}] '

	runc run test_userns_with_difficult_mount
	[ "$status" -eq 0 ]

	[[ "${output}" == *"HelloWorld"* ]]
}
