
Backports/v0.8: cgroups fixes and tests backport #629

Merged
merged 16 commits into from
Jan 18, 2023

Conversation


@tixxdz tixxdz commented Jan 16, 2023

This backport is a follow-up to the cgroups fixes:

Goes on top of #627

@tixxdz tixxdz requested a review from a team as a code owner January 16, 2023 11:48
@tixxdz tixxdz requested review from jrfastab and removed request for a team January 16, 2023 11:48
@tixxdz tixxdz requested a review from kkourt January 16, 2023 16:53
Base automatically changed from backports/v0.8/tixxdz/prs-471 to v0.8 January 17, 2023 06:55
tixxdz added 16 commits January 17, 2023 08:25
Fix a bug where only the last entry of loadedSensors was referenced
on the disable path. This patch ensures that we disable all sensors.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 930f9cb)
Add test to emulate k8s hierarchy in cgroupv2 under unified mode

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit eaa92b6)
Add test to emulate k8s hierarchy in cgroupv2 under hybrid mode

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit b0a9f0a)
This tests cgroupv1 subsys indexes from a custom file.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 41b54db)
Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 25c7466)
Export the CgroupController struct so it can be used by the cgroup
sensors tests in the next patch. We need this to assert cgroup events
when operating on multiple cgroupv1 hierarchies.

This is explicitly a separate patch, as it changes the core package.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit c942c75)
Add test to emulate k8s hierarchy in cgroupv1 under hybrid mode.

The test will create multiple cgroup hierarchies, mount the related
controllers, create cgroups and ensure that we only receive, store,
and remove the cgroups of the desired hierarchy.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit e18f6f0)
Allow selecting which controller to use when testing cgroupv1
hierarchies; this ensures all those controllers can properly be
used as a fallback mechanism to guarantee proper operation.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 633db8f)
This will ensure that if we pass invalid or unsupported controllers,
the current implementation will fall back to a properly working one.

The selected controller and hierarchy will be used for tracking,
receiving events, and holding cgroup data, while the unsupported
controllers will not receive any events.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 79f3741)
Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 7b57800)
The cgroup ID kube->cgrpid never worked; it was usually zero, which
we did not notice as it is not used. Fix this since we are improving
our cgroups management.

This patch contains:

- Add our bpf helper tg_get_current_cgroup_id() to get cgroup IDs,
  upstream helper handles only unified cgroupv2.

  For our cases we may have users still running in hybrid mode
  (cgroupv1 and cgroupv2), so let's check the cgroupfs magic that
  was passed from userspace during startup and adapt to it.
  Otherwise we fall back to operating on a cgroup of the desired
  css that was also detected during startup; this handles both
  cgroupv1 and cgroupv2 at the same time.

  Note: in pure unified cgroupv2 mode, resources are managed by
  systemd within the cgroupv2 hierarchy; in hybrid mode, resources
  are managed in cgroupv1 hierarchies while cgroupv2 is used only
  for process tracking. However, this does not hold for the
  following setups:

  - K8s could be using only cgroupv1 especially with its cgroup
    driver:
    kubernetes/kubernetes#108028
    https://kubernetes.io/docs/concepts/architecture/cgroups/

  - Nested k8s setups usually end up with a cgroupv1 filesystem even
    if the host is running in unified or hybrid mode; this also seems
    related to different k8s tools and versions.

  This new helper covers all these cases.

- Fix how we declare cgroup bpf helpers, as they are enums: use
  bpf_core_enum_value_exists() from libbpf to check function IDs
  inside 'enum bpf_func_id' and see whether the helper is available.

- Add the EVENT_ERROR_CGROUP_ID flag to note that we failed to read
  the cgroup ID of the process. This allows errors to be easily
  traced from user space.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit dc708e1)
The hierarchy ID is useful when running under multiple hierarchies,
to debug or to ensure that events belong to the corresponding
hierarchy.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 958bd15)
Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 999aa7c)
…ver()

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit cb6c420)
The EventDocker* errors are in reality related to cgroups, so rename
the flags to reflect that and improve their user space descriptions too.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit acdfc94)
Set the appropriate EVENT_ERROR_CGROUP_* flags when reading cgroup
info from the BPF side fails. This makes it easy to identify the
offending bpf code from user space.

Signed-off-by: Djalal Harouni <[email protected]>
(cherry picked from commit 82b216c)
@tixxdz tixxdz force-pushed the backports/v0.8/tixxdz/cgroups-set-1 branch from 6d6487e to 440ccf5 Compare January 17, 2023 07:25
@kkourt kkourt merged commit 4f69f1c into v0.8 Jan 18, 2023
@kkourt kkourt deleted the backports/v0.8/tixxdz/cgroups-set-1 branch January 18, 2023 14:03