Commit c9321b0

Rephrase blog
1 parent 1c92352 commit c9321b0

1 file changed (+90 -95 lines changed)

content/en/blog/_posts/kubernetes-1-27-efficient-selinux-relabeling-beta.md

Lines changed: 90 additions & 95 deletions
@@ -7,125 +7,120 @@ slug: kubernetes-1-27-efficient-selinux-relabeling-beta

 **Author:** Jan Šafránek (Red Hat)

-In Kubernetes 1.27 we graduated to beta a more efficient way, how SELinux labels
-are applied to volumes used by Pods.
+# The problem

-## Tl;Dr
+On Linux with Security-Enhanced Linux (SELinux) enabled, it's traditionally
+the container runtime that applies SELinux labels to a Pod and all its volumes.
+Kubernetes only provides the SELinux label from the Pod's Security Context fields
+to the container runtime.

-If a Pod has SELinux context assigned **and** the operating system supports
-SELinux **and** the Pod uses a PersistentVolume with
-`accessMode: ReadWriteOncePod` **and** the CSI driver
-that handles the volume announces `SELinuxMount: true` in its CSIDriver
-instance, **then** Kubernetes + the CSI driver mounts the volume with the Pod's
-SELinux label directly, and the container runtime will not relabel the files on
-the volume.
+The container runtime then recursively changes the SELinux label on all files that
+are visible to the Pod's containers. This can be time-consuming if there are
+many files on the volume, especially when the volume is on a remote filesystem.

-Nothing changes on Windows or on Linux that does not use SELinux.
+{{< note >}}
+If a container uses a `subPath` of a volume, only that `subPath` of the whole
+volume is relabeled. This allows two pods that have two different SELinux labels
+to use the same volume, as long as they use different subpaths of it.
+{{< /note >}}

-See below for more description and future direction.
+{{< note >}}
+If a Pod does not have any SELinux label assigned in the Kubernetes API, the
+container runtime assigns a unique random one, so a process that potentially
+escapes the container boundary cannot access data of any other container on the
+host. The container runtime still recursively relabels all pod volumes with this
+random SELinux label.
+{{< /note >}}

-## SELinux in containers
+# Improvement using mount options

-See excellent
-[visual SELinux guide](https://opensource.com/business/13/11/selinux-policy-guide)
-by Daniel J Walsh. Note that the guide is older than Kubernetes, it describes
-*Multi-Category Security* (MCS) mode using virtual machines as an example,
-however, similar concept is used for containers.
+If a Pod and its volume satisfy **all** of the following conditions, Kubernetes will
+_mount_ the volume directly with the right SELinux label. Such a mount happens
+in constant time, and the container runtime does not need to recursively
+relabel any files on it.

-See a series of blog posts for details how exactly SELinux is applied to
-containers by container runtimes:
+1. The operating system must support SELinux.

-* [How SELinux separates containers using Multi-Level Security](https://www.redhat.com/en/blog/how-selinux-separates-containers-using-multi-level-security)
-* [Why you should be using Multi-Category Security for your Linux containers](https://www.redhat.com/en/blog/why-you-should-be-using-multi-category-security-your-linux-containers)
+Without SELinux support detected, the kubelet and the container runtime do
+nothing with regard to SELinux.

-## SELinux in Kubernetes
+1. The [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
+`ReadWriteOncePod` and `SELinuxMountReadWriteOncePod` must be enabled.
+These feature gates are Beta in Kubernetes 1.27 and were Alpha in 1.25.

-Kubernetes allows setting the complete pod process label in `securityContext`
-field of a Pod, or in `securityContext` of each container in the Pod.
+With either of these feature gates disabled, SELinux labels are always
+applied by the container runtime through a recursive walk of the volume
+(or its subPaths).

-Kubernetes passes the SELinux label to the container runtime, together
-with pod's volumes and their subpaths. By default, Kubernetes tells the
-container runtime to recursively apply the SELinux label to all files on all
-volumes that support SELinux before running the pod containers.
+1. The Pod must have at least `seLinuxOptions.level` assigned in its [Pod Security Context](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) or all Pod containers must have it set in their [Security Contexts](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1).
+Kubernetes reads the default `user`, `role` and `type` from the operating
+system defaults (typically `system_u`, `system_r` and `container_t`).

-{{< caution >}}
-The container runtime relabels only the part of a volume that's visible to the
-running container(s). If a container uses `subPath` of a volume, only that
-`subPath` is relabeled.
+Without Kubernetes knowing at least the SELinux `level`, the container
+runtime will assign a random one _after_ the volumes are mounted. The
+container runtime will still relabel the volumes recursively in that case.

-This allows two pods that have two different SELinux labels to use the same
-volume, as long as they use different subpaths of it.
-{{< /caution >}}
+1. The volume must be a Persistent Volume with
+[Access Mode](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)
+`ReadWriteOncePod`.

-{{< caution >}}
-If a Pod does not have any SELinux label assigned in Kubernetes API, the
-container runtime assigns a unique random one, so a process that potentially
-escapes the container boundary cannot access data of any other container on the
-host. The container runtime still recursively relabels all pod volumes with this
-random SELinux label.
-{{< /caution >}}
+{{< caution >}}
+This is a limitation of the initial implementation. As described above,
+two Pods can have different SELinux labels and still use the same volume,
+as long as they use a different `subPath` of it. This use case is not
+possible when the volumes are _mounted_ with the SELinux label, because the
+whole volume is mounted and most filesystems don't support mounting a single
+volume multiple times with multiple SELinux labels.
+{{< /caution >}}

-It's up to the cluster user, or a security related admission plugin, to set the
-SELinux labels on Pods so Pods that should share volumes have the same SELinux
-label.
+{{< note >}}
+Please report in
+[the feature issue](https://github.com/kubernetes/enhancements/issues/1710)
+if running two Pods with two different SELinux contexts and using
+different `subPath`s of the same volume is necessary in your deployments.
+Such pods may not run when we extend the feature to all volume access modes.
+{{< /note >}}

-# Improvement using mount options
+1. The volume plugin or the CSI driver responsible for the volume must support
+mounting with SELinux mount options.

-Linux kernel with SELinux support allows the first mount of a volume to set
-SELinux label on the whole volume using `-o context=<SELinux label>` mount
-option. This way, all files will have assigned the given label in a constant
-time, without recursively walking through the whole volumes.
+These in-tree volume plugins support mounting with SELinux mount options:
+`fc`, `iscsi`, and `rbd`.

-`context` mount option cannot be applied to bind-mounts or re-mounts of already
-mounted volumes. Since it's a CSI driver that does the first mount of a volume,
-it must be the CSI driver who actually applies this mount option. We added a new
-field `SELinuxMount` to CSI Driver object, so CSI drivers can announce if they
-support `-o context` mount option.
+CSI drivers that support mounting with SELinux mount options must announce
+that in their
+[CSI Driver](https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/)
+instance by setting the `seLinuxMount` field to `true`.

-If Kubernetes knows SELinux label of a Pod **and** CSI driver responsible for
-a pod's volume announces `SELinuxMount: true` **and** the volume has Access Mode
-`ReadWriteOncePod`, then it will ask the CSI driver to mount the volume with
-mount option `context=<SELinux label>` **and** it will tell the container
-runtime not to relabel content of the volume - all files already have the right
-label.
+Volumes managed by other volume plugins or CSI drivers that don't
+set `seLinuxMount: true` will be recursively relabeled by the container
+runtime.

-{{< note >}}
-Not all filesystems support `-o context` mount option out of the box. For
-example, blindly passing `-o context=<SELinux label>` to mount of a share from a
-NFS server would set the SELinux context for all subsequent mounts from the same
-server. A CSI driver that uses NFS must be smart enough to add `nosharecache`
-mount option, so a subsequent mount of a different volume from the same NFS
-server can have a different `context` option. It's up to a CSI driver vendor
-to carefully weight benefits of applying SELinux label in a constant time
-and potential performance impact caused by the necessary mount options
-and to test the CSI driver in a SELinux enabled environment before setting
-`SELinuxMount` to `true`.
-{{< /note >}}
+## Mounting with SELinux context

-## Limitation of the initial implementation
+When all of the conditions above are met, the kubelet
+passes the `-o context=<SELinux label>` mount option to the volume plugin or CSI
+driver. CSI driver vendors must ensure that this mount option is supported
+by their CSI driver and, if necessary, that the CSI driver appends other mount
+options that are needed for `-o context` to work.

-{{< caution >}}
-Since the `context` mount option always applies to the whole volume, two pods
-with two different SELinux context may not access the same volume, even if
-they use different subpaths of it. It depends on the CSI driver if it supports
-mounting a single volume multiple times with different SELinux labels - it's
-often easy for shared filesystems like NFS, CIFS, GlusterFS and CephFS, but it's
-impossible to mount a single block device with ext4 filesystem on the
-same host twice with different SELinux contexts.
-{{< /caution >}}
+For example, NFS may need `-o context=<SELinux label>,nosharecache`, so each
+volume mounted from the same NFS server can have a different SELinux label
+value. Similarly, CIFS may need `-o context=<SELinux label>,nosharesock`.

-Due to this limitation, we've chosen to implement `context` mount only for
-Persistent Volumes that have Access Mode `ReadWriteOncePod` in Kubernetes 1.27.
-Such volumes can be used only by a single pod and thus only with one SELinux
-label.
+It's up to the CSI driver vendor to test their CSI driver in an SELinux-enabled
+environment before setting `seLinuxMount: true` in the CSI Driver instance.

-[The KEP describes additional metrics](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling#monitoring-requirements)
-that count how many pods would not start if we extended the implementation to
-all volume Access Modes.
+# How can I learn more?
+SELinux in containers: see the excellent
+[visual SELinux guide](https://opensource.com/business/13/11/selinux-policy-guide)
+by Daniel J Walsh. Note that the guide is older than Kubernetes; it describes
+*Multi-Category Security* (MCS) mode using virtual machines as an example,
+but a similar concept is used for containers.

-We kindly ask Kubernetes cluster admins to check the metrics and report any
-breakage that would be caused by extending the `context` mount to *all* volumes.
-Please tag `@jsafrane` in Kubernetes issues.
+See this series of blog posts for details on how exactly SELinux is applied to
+containers by container runtimes:
+* [How SELinux separates containers using Multi-Level Security](https://www.redhat.com/en/blog/how-selinux-separates-containers-using-multi-level-security)
+* [Why you should be using Multi-Category Security for your Linux containers](https://www.redhat.com/en/blog/why-you-should-be-using-multi-category-security-your-linux-containers)

-# How can I learn more?
 Read the KEP: [Speed up SELinux volume relabeling using mounts](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling)
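The five conditions listed in the rewritten post can be condensed into one predicate that decides between "mount with `-o context`" and "let the container runtime relabel recursively". The following is a minimal Python sketch of that decision, not the kubelet's actual code: the names `Pod`, `Volume`, `kubelet_mounts_with_context` and the driver name `example.csi.vendor.io` are hypothetical, and treating "`ReadWriteOncePod` is the volume's only access mode" as the test for condition 4 is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# In-tree volume plugins that the post lists as supporting SELinux mount options.
IN_TREE_SELINUX_PLUGINS = {"fc", "iscsi", "rbd"}

# Both feature gates must be enabled (Beta in Kubernetes 1.27).
REQUIRED_FEATURE_GATES = ("ReadWriteOncePod", "SELinuxMountReadWriteOncePod")


@dataclass
class Pod:
    # seLinuxOptions.level from the Pod-level securityContext, if set.
    pod_level: Optional[str] = None
    # seLinuxOptions.level from each container's securityContext (None if unset).
    container_levels: List[Optional[str]] = field(default_factory=list)


@dataclass
class Volume:
    is_persistent_volume: bool
    access_modes: List[str]
    # Hypothetical modelling choice: set one of these two fields.
    in_tree_plugin: Optional[str] = None
    csi_driver: Optional[str] = None


def pod_has_selinux_level(pod: Pod) -> bool:
    """Condition 3: level set on the Pod, or on every one of its containers."""
    if pod.pod_level:
        return True
    return bool(pod.container_levels) and all(pod.container_levels)


def kubelet_mounts_with_context(
    os_supports_selinux: bool,
    feature_gates: Dict[str, bool],
    pod: Pod,
    volume: Volume,
    csi_selinux_mount: Dict[str, bool],
) -> bool:
    """True: mounted with -o context=...; False: runtime relabels recursively."""
    # Condition 1: the OS must support SELinux.
    if not os_supports_selinux:
        return False
    # Condition 2: both feature gates must be enabled.
    if not all(feature_gates.get(g, False) for g in REQUIRED_FEATURE_GATES):
        return False
    # Condition 3: Kubernetes must know at least the SELinux level.
    if not pod_has_selinux_level(pod):
        return False
    # Condition 4 (assumption: ReadWriteOncePod must be the only access mode).
    if not volume.is_persistent_volume or volume.access_modes != ["ReadWriteOncePod"]:
        return False
    # Condition 5: the CSI driver announces seLinuxMount: true in its CSIDriver
    # object, or the volume uses one of the supporting in-tree plugins.
    if volume.csi_driver is not None:
        return csi_selinux_mount.get(volume.csi_driver, False)
    return volume.in_tree_plugin in IN_TREE_SELINUX_PLUGINS


gates = {"ReadWriteOncePod": True, "SELinuxMountReadWriteOncePod": True}
drivers = {"example.csi.vendor.io": True}  # hypothetical CSI driver name
pod = Pod(pod_level="s0:c10,c20")  # illustrative MCS level
rwop = Volume(True, ["ReadWriteOncePod"], csi_driver="example.csi.vendor.io")
print(kubelet_mounts_with_context(True, gates, pod, rwop, drivers))  # True
```

Modelling each condition as an early `return False` mirrors the post's "**all** of the following conditions" wording: any single failed check falls back to the recursive relabel.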

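The mount-option handling described under "Mounting with SELinux context" can be sketched the same way: the kubelet supplies `context=<SELinux label>`, and the CSI driver appends whatever its filesystem needs (`nosharecache` for NFS, `nosharesock` for CIFS, per the post). This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the label is an illustrative value rather than one computed by the kubelet, and only the two filesystems the post mentions get extra options. The sketch quotes the label because an MCS level such as `s0:c1,c2` contains a comma, which mount(8) would otherwise read as an option separator.

```python
from typing import Dict, List

# Extra options the post says some filesystems may need next to `context=`,
# so each volume from the same server can carry its own SELinux label.
# Assumption: only these two filesystem types are handled in this sketch.
EXTRA_OPTIONS_FOR_CONTEXT: Dict[str, List[str]] = {
    "nfs": ["nosharecache"],
    "cifs": ["nosharesock"],
}


def context_option(selinux_label: str) -> str:
    """Hypothetical kubelet-side helper: build the context= mount option.

    The label is wrapped in double quotes so that the comma inside an MCS
    level (e.g. "s0:c1,c2") is not treated as an option separator.
    """
    return f'context="{selinux_label}"'


def add_options_for_context(fs_type: str, options: List[str]) -> List[str]:
    """Hypothetical CSI-driver-side helper: if context= was requested,
    append the extra options this filesystem needs for it to work."""
    result = list(options)
    if any(opt.startswith("context=") for opt in result):
        for extra in EXTRA_OPTIONS_FOR_CONTEXT.get(fs_type, []):
            if extra not in result:
                result.append(extra)
    return result


label = "system_u:object_r:container_file_t:s0:c1,c2"  # illustrative label
requested = ["rw", context_option(label)]
print(",".join(add_options_for_context("nfs", requested)))
# rw,context="system_u:object_r:container_file_t:s0:c1,c2",nosharecache
```

Keeping the two steps in separate functions follows the division of labour the post describes: the kubelet only asks for `context=`, and the driver decides which filesystem-specific options make that request safe.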