
cgroups: select which cgroup hierarchy and subsystem state to use #369

Merged
merged 11 commits into main from pr/tixxdz/cgroup-select-hierarchies on Sep 7, 2022

Conversation

@tixxdz (Member) commented Aug 29, 2022

This was part of #225; it was cleaned up in order to improve how we operate on cgroup hierarchies.

bpf:cgroups: pass subsys index when operating on cgroup_subsys_state set

Select which cgroup controllers to use at runtime by analyzing the current machine's cgroup configuration, and adapt the bpf helpers to use the best option.

We have experienced events that had neither the proper container ID 'docker' nor the pod fields set. The reasons are:

In Cgroupv1 mode, systemd usually sets up by default only the 'cpu', 'cpuacct', 'memory', 'devices' and 'pids' controllers; the cpuset controller, which in normal cases is indexed at 0, is not installed. Since some container runtimes and environments may use systemd as a cgroup driver, this can cause problems where we won't operate on the right hierarchy. We should also note that these controllers are kernel compile-time CONFIG_* options, and Tetragon should only work on machines that have the CONFIG_CGROUP_PIDS or CONFIG_CGROUP_MEMORY controllers compiled in. Most production machines these days have, or must have, these compiled in to work properly.

Let's be consistent with systemd, Kubernetes and container runtimes, and select by default either the 'memory' or 'pids' controller to be used as the tracking cgroup hierarchy for all processes. Usually these two controllers are always present and set. We do this by selecting the right hierarchy ID and the cgroup subsystem state index.

In Cgroupv2 mode, systemd successfully sets up the related controllers that are safe to use by default. However, we have seen machines that did not have the cpuset controller. To avoid such errors we perform the same operation as for Cgroupv1: we gather the cgroup subsystem state index and pass it into the tetragon_conf struct at startup. To get the cgroup ID we first use the default Cgroupv2 BPF helpers; if they fail we fall back to the per-subsystem index. Last, to get the cgroup name we always query the subsystem index and read the kernfs node name.
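A minimal BPF-side sketch of that lookup order, with hypothetical names ('tg_get_cgroup_id' and 'subsys_idx' are illustrations, not the actual patch):

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* Hypothetical helper: try the Cgroupv2 id first, then fall back to
 * the cgroup attached at the configured subsystem state index. */
static __u64 tg_get_cgroup_id(struct task_struct *task, __u32 subsys_idx)
{
	struct cgroup_subsys_state *css;
	__u64 id;

	id = bpf_get_current_cgroup_id(); /* default Cgroupv2 helper */
	if (id)
		return id;

	/* Fallback: the css at subsys_idx ('memory' or 'pids'), then the
	 * kernfs node id of its cgroup (a plain u64 on kernels >= 5.5). */
	css = BPF_CORE_READ(task, cgroups, subsys[subsys_idx]);
	return BPF_CORE_READ(css, cgroup, kn, id);
}
```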

This allows Tetragon to work in different environments, regardless of the cgroup configuration and driver being used.

Further reference: https://github.com/systemd/systemd/blob/main/src/basic/cgroup-util.h#L20

Signed-off-by: Djalal Harouni <[email protected]>

@tixxdz tixxdz requested a review from a team as a code owner August 29, 2022 16:42
@tixxdz tixxdz requested a review from tpapagian August 29, 2022 16:42
@tixxdz tixxdz force-pushed the pr/tixxdz/cgroup-select-hierarchies branch from d8fd0f8 to 1e56861 on September 5, 2022 12:54
@willfindlay (Contributor) left a comment


This fixes a nasty cgroupsv2 bug we've been tracking on 4.19 🎉

@tixxdz tixxdz force-pushed the pr/tixxdz/cgroup-select-hierarchies branch from 175dad1 to 6d50460 on September 7, 2022 14:05
@tixxdz tixxdz requested a review from kevsecurity September 7, 2022 14:08
@tixxdz tixxdz force-pushed the pr/tixxdz/cgroup-select-hierarchies branch from 6d50460 to c69cdef on September 7, 2022 14:25
This is a preparation patch that adds:

a tetragon_conf struct to store the Tetragon runtime configuration, in order
to improve the cgroup implementation and how we look up container IDs. User space
will gather information and store it in a bpf map, where the cgroup helpers will
read it and adapt their behavior accordingly.

We have experienced events that had neither the proper container ID 'docker'
nor the pod fields set. The reasons are:

In Cgroupv1 mode, systemd usually sets up by default only the 'cpu', 'cpuacct',
'memory', 'devices' and 'pids' controllers. The cpuset controller, which in normal
cases is indexed at 0, is not installed, and it was the default controller that
our bpf helpers used to fetch cgroup information, including the name.

Since some container runtimes and environments may use systemd as a cgroup
driver, this can cause problems where we won't operate on the right
hierarchy (controller).

We should also note that these controllers are kernel compile-time CONFIG_* options,
and Tetragon should only work on machines that have the CONFIG_CGROUP_PIDS or
CONFIG_CGROUP_MEMORY controllers compiled in. Most production machines
these days have, or must have, these compiled in to work properly.

Let's be consistent with systemd, Kubernetes and container runtimes, and select
by default either the 'memory' or 'pids' controller to be used as the tracking
cgroup hierarchy for all processes in Cgroupv1. Usually these two controllers
are always present and set. We do this by selecting the right hierarchy ID and
the cgroup subsystem state index, which is initialized once during boot
and propagated to all css_sets of tasks on the machine.

In Cgroupv2 mode, systemd successfully sets up the related controllers that are
safe to use by default. However, we have seen machines where the cpuset
controller was missing, which is rather strange: the controller is not
propagated down to services and processes. This ends up in the same error as
for Cgroupv1.

To avoid such errors we perform the same operation as for Cgroupv1: we gather
the cgroup subsystem state index and pass it into the tetragon_conf struct at
startup. Then, to get the cgroup ID, we first use the default Cgroupv2 BPF
helpers; if they fail we fall back to the per-subsystem state index.

Last, to get the Cgroup name we always query the subsystem state index and
read the kernfs node name.

This should allow Tetragon to work in most environments, regardless of
the cgroup configuration and driver being used, assuming they have
CONFIG_CGROUP_MEMORY or CONFIG_CGROUP_PIDS compiled in.

Signed-off-by: Djalal Harouni <[email protected]>
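A minimal sketch of that idea, with hypothetical field and map names (the real struct in the patch carries more state):

```c
struct tetragon_conf {
	__u32 cgrp_fs_magic;      /* cgroupfs magic: Cgroupv1 (tmpfs) or v2 */
	__u32 tg_cgrp_hierarchy;  /* cgroup hierarchy ID to track */
	__u32 tg_cgrp_subsys_idx; /* css index to use ('memory' or 'pids') */
	__u32 deployment_mode;    /* kubernetes, container, systemd, ... */
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct tetragon_conf);
} tg_conf_map SEC(".maps");
```

User space fills the single entry at startup; the bpf cgroup helpers look it up and pick the hierarchy and css index accordingly.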
Update our definition of CGROUP_SUBSYS_COUNT since new cgroup controllers
were added. These values will be used as an in-bounds limit guard.

We are only interested in the 'memory' and 'pids' indexes; however, we will use the
values provided here for safety in-bounds checks. Since some controllers may not
be compiled in, we instead read /proc/cgroups at startup and update the right
subsystem indexes at runtime to accommodate the current machine where
Tetragon is running, which is more flexible.

Reference: https://elixir.bootlin.com/linux/v5.19/source/include/linux/cgroup_subsys.h

Signed-off-by: Djalal Harouni <[email protected]>
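A hedged userspace sketch of that scan ('cgroup_find_subsys' is an illustrative name): each line of /proc/cgroups is '<subsys_name> <hierarchy> <num_cgroups> <enabled>', and controllers that are not compiled in simply do not appear, so the line position tracks the kernel's subsystem enumeration. The caller would try 'memory' first and fall back to 'pids'.

```c
#include <stdio.h>
#include <string.h>

/* Find the runtime subsystem index and v1 hierarchy ID of a controller
 * by scanning /proc/cgroups; returns the index or -1 if absent. */
static int cgroup_find_subsys(const char *controller, int *hierarchy)
{
	char name[64];
	int hier, ncgroups, enabled, idx = 0;
	FILE *f = fopen("/proc/cgroups", "r");

	if (!f)
		return -1;
	fscanf(f, "%*[^\n]\n"); /* skip the '#subsys_name ...' header */
	while (fscanf(f, "%63s %d %d %d\n", name, &hier,
		      &ncgroups, &enabled) == 4) {
		if (enabled && !strcmp(name, controller)) {
			*hierarchy = hier;
			fclose(f);
			return idx;
		}
		idx++;
	}
	fclose(f);
	return -1;
}
```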
Preparation patch for later that adds a convenience macro to check
that a provided named type (struct/union/enum/typedef) exists in a
target kernel.

We need this to check that the 'union kernfs_node_id' type exists, which
was the id for kernfs nodes in kernels prior to 5.5.

Code reference from: https://github.com/libbpf/libbpf

Signed-off-by: Djalal Harouni <[email protected]>
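libbpf's bpf_core_read.h provides such a macro as bpf_core_type_exists(); a usage sketch against a locally-defined union, where the check is resolved against the target kernel's BTF at load time:

```c
#include "vmlinux.h"
#include <bpf/bpf_core_read.h>

/* Local definition matching the pre-5.5 layout; CO-RE matches it
 * against the running kernel's BTF by name. */
union kernfs_node_id {
	struct {
		__u32 ino;
		__u32 generation;
	};
	__u64 id;
};

static bool kernfs_id_is_union(void)
{
	/* True only on kernels where 'union kernfs_node_id' still
	 * exists, i.e. 5.4 and older. */
	return bpf_core_type_exists(union kernfs_node_id);
}
```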
… older

In newer kernels the kernfs_node id is of u64 type; however, on kernels
5.4 and older it was a union. So let's add the kernfs_node_id
union type so we can check it at runtime and properly operate on
the right structure layout to get the cgroup ID on these older
kernels.

This never worked, and we did not notice since we do not use the
cgroup IDs. However, this will change in the future, so let's fix
it now while we are at it.

This also helps debugging, so we get the right IDs instead of zeroes.

Signed-off-by: Djalal Harouni <[email protected]>
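A hedged sketch of picking the right layout at runtime, reusing the union definition and existence check from the previous sketch ('tg_kernfs_node_id' is an illustrative name):

```c
/* Return the cgroup ID of a kernfs node on both old and new layouts. */
static __u64 tg_kernfs_node_id(struct kernfs_node *kn)
{
	if (bpf_core_type_exists(union kernfs_node_id)) {
		/* <= 5.4: 'id' is a union; the 64-bit value is its .id */
		union kernfs_node_id old_id;

		bpf_core_read(&old_id, sizeof(old_id), &kn->id);
		return old_id.id;
	}
	/* >= 5.5: 'id' is a plain u64 */
	return BPF_CORE_READ(kn, id);
}
```

On 5.4 the 64-bit value packs ino and generation; exposing it as the union's .id keeps a stable u64 for user space either way.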
Add our bpf cgroup helpers to properly operate on the desired cgroups. The
helpers allow selecting which css, cgroup and related information to use.

Some upstream BPF helpers work only on Cgroupv2, but we want more
flexibility: work on both Cgroupv1 and v2 without distinction and allow
Tetragon to adapt to the current machine where it is running.

Signed-off-by: Djalal Harouni <[email protected]>
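A hedged sketch of the shape of such helpers, with hypothetical names; the selected css is where the cgroup, its kernfs node and its name hang off:

```c
#define CGROUP_SUBSYS_COUNT 14 /* upper bound as of v5.19, see cgroup_subsys.h */

/* Return the css of a task at the configured subsystem index,
 * clamping the index as an in-bounds guard. */
static struct cgroup_subsys_state *
tg_get_task_css(struct task_struct *task, __u32 subsys_idx)
{
	if (subsys_idx >= CGROUP_SUBSYS_COUNT)
		subsys_idx = 0;
	return BPF_CORE_READ(task, cgroups, subsys[subsys_idx]);
}

/* The cgroup name is the kernfs node name of the css's cgroup; this
 * returns a kernel pointer, to be copied out with
 * bpf_probe_read_kernel_str(). */
static const char *tg_get_css_cgroup_name(struct cgroup_subsys_state *css)
{
	return BPF_CORE_READ(css, cgroup, kn, name);
}
```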
Use the new bpf cgroups helpers to gather cgroup information during
execve() events.

This will ensure that:
- We operate on the right hierarchy, css and its cgroup, where the
  information we want is available.
- We use the passed subsys index as a selector to get the cgroup name,
  which can be transformed into a container ID in user space.
- We fix the cgroup ID logic, which never worked on older kernels with
  cgroupv1 and always returned zero; with this change we get the right
  cgroup IDs.

Signed-off-by: Djalal Harouni <[email protected]>
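A sketch of wiring the pieces above into an execve hook; the hook point and names are illustrative and build on the earlier hypothetical sketches, not the patch's actual program:

```c
SEC("tracepoint/sched/sched_process_exec")
int tg_sched_exec(void *ctx)
{
	struct task_struct *task = (struct task_struct *)bpf_get_current_task();
	struct tetragon_conf *conf;
	const char *name;
	__u32 zero = 0, idx = 0;
	__u64 cgrpid;

	/* Read the subsystem index gathered by user space at startup. */
	conf = bpf_map_lookup_elem(&tg_conf_map, &zero);
	if (conf)
		idx = conf->tg_cgrp_subsys_idx;

	cgrpid = tg_get_cgroup_id(task, idx);
	name = tg_get_css_cgroup_name(tg_get_task_css(task, idx));
	/* Both would flow into the execve event; user space turns the
	 * name into a container ID. */
	bpf_printk("execve cgroup id=%llu name=%s", cgrpid, name);
	return 0;
}
```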
This package contains helpers to operate on cgroups:

- Performs cgroup filesystem detection.
- Performs cgroup mode detection based on https://systemd.io/CGROUP_DELEGATION/,
  but should also work on non-systemd init machines.
- Validates cgroup paths obtained from /proc/self/cgroup for both
  cgroupv1 and cgroupv2.

All these will be used in follow up patches.

Signed-off-by: Djalal Harouni <[email protected]>
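The package itself is Go; here is a hedged C sketch of the same mode-detection idea, following the systemd document referenced above: statfs /sys/fs/cgroup and compare the filesystem magic.

```c
#include <sys/vfs.h>
#include <linux/magic.h>

enum cgroup_mode { CGROUP_UNDEF, CGROUP_LEGACY, CGROUP_HYBRID, CGROUP_UNIFIED };

static enum cgroup_mode detect_cgroup_mode(void)
{
	struct statfs st;

	if (statfs("/sys/fs/cgroup", &st) < 0)
		return CGROUP_UNDEF;
	if (st.f_type == CGROUP2_SUPER_MAGIC)
		return CGROUP_UNIFIED;  /* pure Cgroupv2 */
	if (st.f_type == TMPFS_MAGIC) {
		/* v1 mount; hybrid if the unified hierarchy is also there */
		if (statfs("/sys/fs/cgroup/unified", &st) == 0 &&
		    st.f_type == CGROUP2_SUPER_MAGIC)
			return CGROUP_HYBRID;
		return CGROUP_LEGACY;
	}
	return CGROUP_UNDEF;
}
```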
Add helpers to read and write the Tetragon runtime conf that is stored in a bpf map.

UpdateRuntimeConf() gathers information about the Tetragon runtime environment and
updates the BPF TetragonConfMap.

It detects the cgroupfs magic and the cgroup runtime mode, discovers the cgroup css's
that are registered during boot and propagated to all tasks inside their css_set, and
detects the deployment mode: Kubernetes, containers, standalone or systemd services.
All discovered information is also logged for debugging purposes.

Signed-off-by: Djalal Harouni <[email protected]>
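UpdateRuntimeConf() is Go; a hedged libbpf C sketch of the equivalent single-entry map update at startup, reusing the hypothetical names from the earlier sketches:

```c
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static int update_runtime_conf(struct bpf_object *obj,
			       struct tetragon_conf *conf)
{
	__u32 zero = 0;
	int fd = bpf_object__find_map_fd_by_name(obj, "tg_conf_map");

	if (fd < 0)
		return fd;
	/* Single-slot array map: write the gathered configuration at
	 * index 0 so the bpf cgroup helpers can read it. */
	return bpf_map_update_elem(fd, &zero, conf, BPF_ANY);
}
```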
Update TetragonConf bpf map at startup with the gathered cgroup and
environment information.

Signed-off-by: Djalal Harouni <[email protected]>
@tixxdz tixxdz force-pushed the pr/tixxdz/cgroup-select-hierarchies branch from c69cdef to 8efc6eb on September 7, 2022 15:35
@tixxdz tixxdz merged commit b7503f0 into main Sep 7, 2022
@tixxdz tixxdz deleted the pr/tixxdz/cgroup-select-hierarchies branch September 7, 2022 16:59