protect snapshot layers via SELinux policy #935

Merged: 10 commits, Jun 4, 2020

bcressey

Issue number: #764

Description of changes:
Make host-ctr, docker, and containerd pass a context= option for overlayfs mounts, to allow writes to the upper layer that would be blocked by labels on the lower layers.

Update the SELinux policy such that layers receive a different label when unpacked.

Fix a warning from runc by avoiding an incorrect attempt to mount mqueue with a label that does not match the host.

Testing done:

Tested the aws-dev image locally and the aws-k8s-1.15 image in EC2. Verified that the conformance tests passed. No AVC denials were logged, and no SELinux warnings were logged for the mqueue mount.

Confirmed via mount that host containers, Kubernetes pods, and Docker containers have the context option specified.

# host-ctr
overlay on /run/host-containerd/io.containerd.runtime.v2.task/default/control/rootfs type overlay (rw,relatime,context=system_u:object_r:local_t:s0,...)

# k8s
overlay on /run/containerd/io.containerd.runtime.v2.task/k8s.io/a96361aa8a9104e5e67b0323af70dced907fdbac365f6d763bb2a40afc2d7a87/rootfs type overlay (rw,relatime,context=system_u:object_r:local_t:s0,...)

# docker
overlay on /local/var/lib/docker/overlay2/3f3a32a4f6a20aee6bc9c783b7332c51716d8e25104d57de4dee92194d91cef8/merged type overlay (rw,relatime,context="system_u:object_r:local_t:s0:c472,c993",...)
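
The same check can be scripted rather than read by eye. The following is a minimal sketch, not part of this change: it assumes lines shaped like the `mount` output above, and the helper names `split_options` and `overlay_context` are hypothetical. The one subtlety is the Docker case, where the quoted label contains a comma between MCS categories, so the options cannot be split on every comma.

```python
import re
from typing import List, Optional

# Matches lines shaped like the `mount` output quoted above:
#   <source> on <target> type <fstype> (<options>)
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>.+) type (?P<fstype>\S+) \((?P<options>.*)\)$"
)


def split_options(options: str) -> List[str]:
    """Split mount options on commas, ignoring commas inside double quotes."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in options:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def overlay_context(line: str) -> Optional[str]:
    """Return the context= label of an overlay mount line, or None if absent."""
    match = MOUNT_LINE.match(line.strip())
    if match is None or match.group("fstype") != "overlay":
        return None
    for option in split_options(match.group("options")):
        key, sep, value = option.partition("=")
        if key == "context" and sep:
            # Drop the surrounding quotes used when the label contains commas.
            return value.strip('"')
    return None
```

Feeding each line of `mount` output through `overlay_context` and flagging overlay mounts that return `None` would reproduce the manual confirmation above.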

Confirmed that underlying directories have the cache_t label:

# ls -latrZ /var/lib/docker/overlay2/
drwx------.  3 root root system_u:object_r:cache_t:s0 4096 May 28 22:11 4a0ae1371ad625c99d1247a32f87f65ed8055ee22d70f93992fd69239c4f938a

# ls -latrZ /var/lib/host-containerd/io.containerd.snapshotter.v1.overlayfs
drwx------. 13 root root system_u:object_r:cache_t:s0  4096 May 28 22:10 snapshots

# ls -latrZ /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
drwx------. 54 root root system_u:object_r:cache_t:s0   4096 May 28 21:41 snapshots

Confirmed that files created inside the container end up with the local_t label:

# find /var/lib/host-containerd/ -type f -context 'system_u:object_r:local_t:s0'
/var/lib/host-containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/14/fs/var/log/amazon/ssm/amazon-ssm-agent.log
/var/lib/host-containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/14/fs/var/log/amazon/ssm/errors.log
...

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Signed-off-by: Ben Cressey <[email protected]>

This gives us a more accurate view of the daemon's status.

Signed-off-by: Ben Cressey <[email protected]>

This integrates an upstream patch for containerd so the mount label
will be used when the rootfs is mounted.

Signed-off-by: Ben Cressey <[email protected]>

bcressey commented Jun 1, 2020

Added a rule to cover /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db, under the assumption that the database also warrants protection.

Verified that an attempted write triggered a denial:

# echo -n 'system_u:system_r:container_t:s0' > /proc/self/attr/exec
# touch /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db
[ 1230.428146] audit: type=1400 audit(1591052837.699:11): avc:  denied  { write } for  pid=32082 comm="touch" name="meta.db" dev="nvme1n1p1" ino=784924 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:cache_t:s0 tclass=file permissive=1
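
For checking denials like this one across a longer log, the record can be broken into fields. This is a rough sketch under stated assumptions: it handles only the simple `key=value` layout seen in the line above, it would mis-split values that contain spaces, and the names `parse_avc_denial` and `selinux_type` are hypothetical helpers rather than anything shipped in this change.

```python
import re
from typing import Dict, Optional

# Captures the permission set and the trailing key=value fields of a denial.
AVC_DENIED = re.compile(r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}\s+for\s+(?P<rest>.*)$")


def parse_avc_denial(line: str) -> Optional[Dict[str, object]]:
    """Parse one AVC denial line into a dict, or return None if it is not one."""
    match = AVC_DENIED.search(line)
    if match is None:
        return None
    fields: Dict[str, object] = {}
    for token in match.group("rest").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value.strip('"')
    # The denied permissions are space-separated inside the braces.
    fields["perms"] = match.group("perms").split()
    return fields


def selinux_type(context: str) -> str:
    """Return the type field (third component) of user:role:type:level."""
    return context.split(":")[2]
```

Applied to the denial above, the target type comes out as `cache_t` and the source type as `container_t`, which is the pairing the new rule is meant to block.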

This ensures that we always end up with a mount label for the rootfs,
which would otherwise not be writable.

Signed-off-by: Ben Cressey <[email protected]>

This integrates an upstream patch for containerd so the mount label
will be used when the rootfs is mounted.

Signed-off-by: Ben Cressey <[email protected]>

We expect the layers that make up containers to be labeled in a way
that blocks writes from most processes.

By applying a different label to the mount, we allow processes inside
a running container to modify their own root filesystem, which would
otherwise be immutable.
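
To illustrate how a mount label ends up in the overlayfs option string, here is a small sketch. It is not the code from these commits (the runtimes involved are written in Go); the function name `overlay_mount_options` and the example paths are hypothetical. Always quoting the label is an assumption made here for simplicity, mirroring the quoted form that appears in the Docker mount output, where the label contains a comma.

```python
from typing import List, Optional


def overlay_mount_options(
    lowerdirs: List[str],
    upperdir: str,
    workdir: str,
    mount_label: Optional[str] = None,
) -> str:
    """Build an overlayfs option string, adding context= when a label is given."""
    options = [
        # overlayfs takes lower layers as a colon-separated list.
        "lowerdir=" + ":".join(lowerdirs),
        "upperdir=" + upperdir,
        "workdir=" + workdir,
    ]
    if mount_label:
        # Quoted so that a comma inside the label (MCS categories) is not
        # treated as an option separator.
        options.append('context="%s"' % mount_label)
    return ",".join(options)


# Hypothetical layer paths, for illustration only.
print(
    overlay_mount_options(
        ["/layers/2/fs", "/layers/1/fs"],
        "/layers/3/fs",
        "/layers/3/work",
        "system_u:object_r:local_t:s0",
    )
)
```

Without the label, the option string carries no `context=` entry, which corresponds to the case the commit message describes: the mount would inherit the labels of the lower layers and the root filesystem would not be writable from inside the container.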

Signed-off-by: Ben Cressey <[email protected]>

This causes Docker to apply SELinux labels to processes and mounts.
Without the mount label, the root filesystem would not be writable.

Signed-off-by: Ben Cressey <[email protected]>

Otherwise, the first mount attempt fails and logs this warning:
  Same superblock, different security settings for (dev mqueue, type mqueue)

Signed-off-by: Ben Cressey <[email protected]>

Now that we have the machinery in place to provide a "context" option
for overlayfs mounts, we can use a read-only type for file objects in
the lower directories. We also record the type in `lxc_contexts` for
programs that rely on that file for discovery.

`host-ctr` now runs with the same type as the other container runtime
components, because it handles the `mount()` syscall. The kernel will
try to create a "work" subdirectory in workdir during the mount, so
the calling process needs permissions to do so.

Signed-off-by: Ben Cressey <[email protected]>
bcressey merged commit 07efc8c into bottlerocket-os:develop on Jun 4, 2020
bcressey deleted the overlay-label branch on June 4, 2020 at 23:59