From 84e472e95c951bcc75bd4ce9193093fcab91d68b Mon Sep 17 00:00:00 2001
From: Sascha Grunert
Date: Thu, 15 Jul 2021 09:55:58 +0200
Subject: [PATCH] Add seccomp default feature blog post

This adds the blog post about the new Kubernetes `SeccompDefault` alpha feature.

Signed-off-by: Sascha Grunert
---
 .../blog/_posts/2021-08-25-seccomp-default.md | 267 ++++++++++++++++++
 1 file changed, 267 insertions(+)
 create mode 100644 content/en/blog/_posts/2021-08-25-seccomp-default.md

diff --git a/content/en/blog/_posts/2021-08-25-seccomp-default.md b/content/en/blog/_posts/2021-08-25-seccomp-default.md
new file mode 100644
index 0000000000000..c38b7fdee168e
--- /dev/null
+++ b/content/en/blog/_posts/2021-08-25-seccomp-default.md
@@ -0,0 +1,267 @@
+---
+layout: blog
+title: "Enable seccomp for all workloads with a new v1.22 alpha feature"
+date: 2021-08-25
+slug: seccomp-default
+---
+
+**Author:** Sascha Grunert, Red Hat
+
+This blog post is about a new Kubernetes feature introduced in v1.22, which adds
+an additional security layer on top of the existing seccomp support. Seccomp is
+a security mechanism for Linux processes to filter system calls (syscalls) based
+on a set of defined rules. Applying seccomp profiles to containerized workloads
+is one of the key tasks when it comes to enhancing the security of the
+application deployment. Developers, site reliability engineers and
+infrastructure administrators have to work hand in hand to create, distribute
+and maintain the profiles over the application's life cycle.
+
+You can use the [`securityContext`][seccontext] field of Pods and their
+containers to adjust security-related configurations of the
+workload. Kubernetes introduced dedicated [seccomp-related API
+fields][seccontext] in this `SecurityContext` with the [graduation of seccomp to
+General Availability (GA)][ga] in v1.19.0.
+This enhancement made it easier to specify
+whether the whole pod or a specific container should run as:
+
+[seccontext]: /docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1
+[ga]: https://kubernetes.io/blog/2020/08/26/kubernetes-release-1.19-accentuate-the-paw-sitive/#graduated-to-stable
+
+- `Unconfined`: seccomp will not be enabled
+- `RuntimeDefault`: the container runtime's default profile will be used
+- `Localhost`: a node-local profile will be applied, which is referenced
+  by a path relative to the seccomp profile root (`/seccomp`)
+  of the kubelet
+
+With the graduation of seccomp, nothing has changed from an overall security
+perspective, because `Unconfined` is still the default. This is totally fine if
+you consider this from the upgrade path and backwards compatibility perspective of
+Kubernetes releases. But it also means that it is more likely that a workload
+runs without seccomp at all, which should be fixed in the long term.
+
+## `SeccompDefault` to the rescue
+
+Kubernetes v1.22.0 introduces a new kubelet [feature gate][gate]
+`SeccompDefault`, which has been added in `alpha` state like every other new
+feature. This means that it is disabled by default and can be enabled manually
+for every single Kubernetes node.
+
+[gate]: /docs/reference/command-line-tools-reference/feature-gates
+
+What does the feature do? Well, it just changes the default seccomp profile from
+`Unconfined` to `RuntimeDefault`. If not specified differently in the pod
+manifest, then the feature will apply a stricter set of security constraints by
+using the default profile of the container runtime. These profiles may differ
+between runtimes like [CRI-O][crio] or [containerd][ctrd]. They also differ
+between hardware architectures. But generally speaking, those default profiles
+allow a common set of syscalls while blocking the more dangerous ones, which
+are unlikely or unsafe to be used in a containerized application.
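+
+For comparison, here is a minimal sketch of a pod that requests
+`RuntimeDefault` explicitly through the GA API fields. With the feature
+enabled, a pod that leaves `seccompProfile` unset should end up with the same
+runtime default profile. The pod name and container name are hypothetical and
+only serve as an illustration:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: explicit-runtime-default # hypothetical name, for illustration only
+spec:
+  securityContext:
+    seccompProfile:
+      type: RuntimeDefault # pod-level setting, inherited by all containers
+  containers:
+    - name: app # hypothetical container name
+      image: nginx:1.21
+```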
+
+[crio]: https://github.com/cri-o/cri-o/blob/fe30d62/vendor/github.com/containers/common/pkg/seccomp/default_linux.go#L45
+[ctrd]: https://github.com/containerd/containerd/blob/e1445df/contrib/seccomp/seccomp_default.go#L51
+
+### Enabling the feature
+
+Two kubelet configuration changes have to be made to enable the feature:
+
+1. **Enable the feature gate** by setting `SeccompDefault=true` via the command
+   line (`--feature-gates`) or the [kubelet configuration][kubelet] file.
+2. **Turn on the feature** by adding the
+   `--seccomp-default` command line flag or via the [kubelet
+   configuration][kubelet] file (`seccompDefault: true`).
+
+[kubelet]: /docs/tasks/administer-cluster/kubelet-config-file
+
+The kubelet will error on startup if only one of the above steps has been done.
+
+### Trying it out
+
+If the feature is enabled on a node, then you can create a new workload like
+this:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: test-pod
+spec:
+  containers:
+    - name: test-container
+      image: nginx:1.21
+```
+
+Now it is possible to inspect the used seccomp profile by using
+[`crictl`][crictl] while investigating the container's [runtime
+specification][rspec]:
+
+[crictl]: https://github.com/kubernetes-sigs/cri-tools
+[rspec]: https://github.com/opencontainers/runtime-spec/blob/0c021c1/config-linux.md#seccomp
+
+```bash
+CONTAINER_ID=$(sudo crictl ps -q --name=test-container)
+sudo crictl inspect "$CONTAINER_ID" | jq .info.runtimeSpec.linux.seccomp
+```
+
+```yaml
+{
+  "defaultAction": "SCMP_ACT_ERRNO",
+  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
+  "syscalls": [
+    {
+      "names": ["_llseek", "_newselect", "accept", …, "write", "writev"],
+      "action": "SCMP_ACT_ALLOW"
+    },
+    …
+  ]
+}
+```
+
+You can see that the lower level container runtime ([CRI-O][crio-home] and
+[runc][runc] in our case) successfully applied the default seccomp profile.
+This profile denies all syscalls by default, while allowing commonly used ones
+like [`accept`][accept] or [`write`][write].
+
+[crio-home]: https://github.com/cri-o/cri-o
+[runc]: https://github.com/opencontainers/runc
+[accept]: https://man7.org/linux/man-pages/man2/accept.2.html
+[write]: https://man7.org/linux/man-pages/man2/write.2.html
+
+Please note that the feature will not influence any Kubernetes API for now.
+Therefore, it is not possible to retrieve the used seccomp profile via `kubectl`
+`get` or `describe` if the [`SeccompProfile`][api] field is unset within the
+`SecurityContext`.
+
+[api]: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1
+
+The feature also works when using multiple containers within a pod, for example
+if you create a pod like this:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: test-pod
+spec:
+  containers:
+    - name: test-container-nginx
+      image: nginx:1.21
+      securityContext:
+        seccompProfile:
+          type: Unconfined
+    - name: test-container-redis
+      image: redis:6.2
+```
+
+then you should see that the `test-container-nginx` runs without a seccomp profile:
+
+```bash
+sudo crictl inspect $(sudo crictl ps -q --name=test-container-nginx) |
+    jq '.info.runtimeSpec.linux.seccomp == null'
+true
+```
+
+Whereas the container `test-container-redis` runs with `RuntimeDefault`:
+
+```bash
+sudo crictl inspect $(sudo crictl ps -q --name=test-container-redis) |
+    jq '.info.runtimeSpec.linux.seccomp != null'
+true
+```
+
+The same applies to the pod itself, which also runs with the default profile:
+
+```bash
+sudo crictl inspectp $(sudo crictl pods -q --name test-pod) |
+    jq '.info.runtimeSpec.linux.seccomp != null'
+true
+```
+
+### Upgrade strategy
+
+It is recommended to enable the feature in multiple steps, because different
+risks and mitigations exist for each one.
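+
+The two kubelet settings involved in these steps (see "Enabling the feature"
+above) can be sketched as a kubelet configuration file fragment. This is only a
+rough illustration that assumes the `kubelet.config.k8s.io/v1beta1` configuration
+format, and it leaves out every other setting a real node would need:
+
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+featureGates:
+  SeccompDefault: true # feature gate: makes the feature available
+seccompDefault: true # turns the feature on, requires the gate above
+```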
+
+#### Feature gate enabling
+
+Enabling the feature gate at the kubelet level will not turn on the feature, but
+will make it possible to do so by using the `seccompDefault` kubelet configuration or the
+`--seccomp-default` CLI flag. This can be done by an administrator for the whole
+cluster or only a set of nodes.
+
+#### Testing the Application
+
+If you're trying this within a dedicated test environment, you have to ensure
+that the application code does not trigger syscalls blocked by the
+`RuntimeDefault` profile before enabling the feature on a node. This can be done
+by:
+
+- _Recommended_: Analyzing the code (manually or by running the application with
+  [strace][strace]) for any executed syscalls which may be blocked by the
+  default profiles. If that's the case, then you can override the default by
+  explicitly setting the pod or container to run as `Unconfined`. Alternatively,
+  you can create a custom seccomp profile based on the default by adding the
+  additional syscalls to the `"action": "SCMP_ACT_ALLOW"` section (see the
+  optional step below).
+
+- _Recommended_: Manually set the profile to the target workload and use a
+  rolling upgrade to deploy into production. Roll back the deployment if the
+  application does not work as intended.
+
+- _Optional_: Run the application against an end-to-end test suite to trigger
+  all relevant code paths with `RuntimeDefault` enabled. If a test fails, use
+  the same mitigation as mentioned above.
+
+- _Optional_: Create a custom seccomp profile based on the default and change
+  its default action from `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG`. This means that
+  the seccomp filter for unknown syscalls will have no effect on the application
+  at all, but the system logs will now indicate which syscalls may be blocked.
+  This requires at least Linux kernel version 4.14 as well as a recent [runc][runc]
+  release.
+  Monitor the application host's audit logs (defaults to
+  `/var/log/audit/audit.log`) or syslog entries (defaults to `/var/log/syslog`)
+  for syscalls via `type=SECCOMP` (for audit) or `type=1326` (for syslog).
+  Compare the syscall ID with those [listed in the Linux Kernel
+  sources][syscalls] and add them to the custom profile. Be aware that custom
+  audit policies may lead to missing syscalls, depending on the configuration
+  of auditd.
+
+- _Optional_: Use cluster additions like the [Security Profiles Operator][spo]
+  for profiling the application via its [log enrichment][logs] capabilities or
+  recording a profile by using its [recording feature][rec]. This makes the
+  above-mentioned manual log investigation obsolete.
+
+[syscalls]: https://github.com/torvalds/linux/blob/7bb7f2a/arch/x86/entry/syscalls/syscall_64.tbl
+[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
+[logs]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/c90ef3a/installation-usage.md#using-the-log-enricher
+[rec]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/c90ef3a/installation-usage.md#record-profiles-from-workloads-with-profilerecordings
+[strace]: https://man7.org/linux/man-pages/man1/strace.1.html
+
+#### Deploying the modified application
+
+Based on the outcome of the application tests, it may be required to change the
+application deployment by either specifying `Unconfined` or a custom seccomp
+profile. This is not the case if the application works as intended with
+`RuntimeDefault`.
+
+#### Enable the kubelet configuration
+
+If everything went well, then the feature is ready to be enabled by the kubelet
+configuration or its corresponding CLI flag. This should be done on a per-node
+basis to reduce the overall risk of missing a syscall during the investigations
+when running the application tests.
+If it's possible to monitor audit logs
+within the cluster, then it's recommended to do this to catch any seccomp
+events that may have been missed. If the application works as intended, then the
+feature can be enabled for further nodes within the cluster.
+
+## Conclusion
+
+Thank you for reading this blog post! I hope you enjoyed seeing how the usage of
+seccomp profiles has evolved in Kubernetes over the past releases as much
+as I did. On your own cluster, change the default seccomp profile to
+`RuntimeDefault` (using this new feature) and see the security benefits, and, of
+course, feel free to reach out any time for feedback or questions.
+
+---
+
+_Editor's note: If you have any questions or feedback about this blog post, feel
+free to reach out via the [Kubernetes Slack in #sig-node][slack]._
+
+[slack]: https://kubernetes.slack.com/messages/sig-node