Skip to content

cleanup: drop capture_enabled flag from driver#709

Merged
poiana merged 15 commits intomasterfrom
cleanup/drop_capture_enabled
Nov 22, 2022
Merged

cleanup: drop capture_enabled flag from driver#709
poiana merged 15 commits intomasterfrom
cleanup/drop_capture_enabled

Conversation

@FedeDP
Copy link
Contributor

@FedeDP FedeDP commented Nov 10, 2022

What type of PR is this?

/kind cleanup

Any specific area of the project related to this PR?

/area API-version
/area driver-kmod
/area driver-bpf
/area driver-modern-bpf
/area libscap-engine-bpf
/area libscap-engine-kmod
/area libscap-engine-modern-bpf
/area libscap
/area libpman
/area tests

Does this PR require a change in the driver versions?

/version driver-API-version-major

We changed how libscap starts the capture for the driver.

What this PR does / why we need it:

This PR drops useless capture_enabled, by leveraging new runtime tracepoint management feature.
Basically, instead of stopping the capture by setting the flag, but keeping everything running (so, every tracepoint is still called but it just checks if capture is enabled and leaves), we detach every tracepoint.

Moreover, made tracepoint dynamic management per-consumer in kmod.
Tracepoints will be attached using a ref-counted method, so that a tracepoint that is not requested by any consumer, gets detached.
Moreover, now kmod follows other drivers behavior: as soon as it is started, no tracepoint is attached. Instead, tracepoints must be manually set by userspace consumer, using the PPM_IOCTL_MANAGE_TP ioctl.
As a side effect of the above, this also means that an inserted kmod won't steal any resource until any consumer is actually using it, in a way much similar to other drivers.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@poiana
Copy link
Contributor

poiana commented Nov 10, 2022

@FedeDP: The label(s) area/api-version cannot be applied, because the repository doesn't have them.

Details

In response to this:

What type of PR is this?

/kind cleanup

Any specific area of the project related to this PR?

/area API-version
/area driver-kmod
/area driver-bpf
/area driver-modern-bpf
/area libscap-engine-bpf
/area libscap-engine-kmod
/area libscap-engine-modern-bpf
/area libscap
/area libpman
/area tests

Does this PR require a change in the driver versions?

/version driver-API-version-major

We changed how libscap start the capture for the driver.
TO BE DONE!

What this PR does / why we need it:

This PR drops useless capture_enabled, by leveraging new runtime tracepoint management feature.
Basically, instead of stopping the capture by setting the flag, but keeping everything running (so, every tracepoint is still called but it just checks if capture is enabled and leaves), we detach every tracepoint.

Moreover, made tracepoint dynamic management per-consumer in kmod.
Tracepoints will be attached using a ref-counted method, so that a tracepoint that is not requested by any consumer, gets detached.
Moreover, now kmod follows other drivers behavior: as soon as it is started, no tracepoint is attached. Instead, tracepoints must be manually set by userspace consumer, using the PPM_IOCTL_MANAGE_TP ioctl.
All of the above, also means that an inserted kmod won't steal any resource until any consumer is actually using it, in a way much similar to other drivers.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@FedeDP FedeDP changed the title cleanup: drop capture_enabled flag from driver wip: cleanup: drop capture_enabled flag from driver Nov 10, 2022
@FedeDP FedeDP changed the title wip: cleanup: drop capture_enabled flag from driver cleanup: drop capture_enabled flag from driver Nov 10, 2022
@@ -1 +1 @@
2.0.0
3.0.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed how scap tells drivers to start/stop the capture.

struct scap_bpf_settings *settings,
enum syscall_flags drop_flags)
{
struct scap_bpf_settings *settings;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the retrieval of bpf settings down to call_filler as it is now unused in attached programs.

uint16_t statsd_port;
unsigned long buffer_bytes_dim; /* Every consumer will have its per-CPU buffer dim in bytes. */
DECLARE_BITMAP(events_mask, PPM_EVENT_MAX);
u32 tracepoints_attached;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every consumer owns its own tracepoint mask.


#ifndef UDIG
struct ppm_consumer_t {
unsigned int id; // numeric id for the consumer (ie: registration index)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the consumer index (ie: the first added will have 0, the second 1...).

m_event_type = event_type;
switch(event_type)
m_event_direction = BOTH_EVENT;
switch(m_event_type)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is enable_capture job to attach required tracepoints.

event_data.event_info.context_data.sched_next = (void *)0;

record_event_consumer(consumer, PPME_SCAPEVENT_E, UF_NEVER_DROP, ppm_nsecs(), &event_data);
record_event_consumer(consumer, PPME_SCAPEVENT_E, UF_NEVER_DROP, ppm_nsecs(), &event_data, TP_VAL_INTERNAL);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal event; use canary value for tp.

struct event_data_t event_data = {0};

if (record_event_consumer(consumer, PPME_DROP_E, UF_NEVER_DROP, ns, &event_data) == 0) {
if (record_event_consumer(consumer, PPME_DROP_E, UF_NEVER_DROP, ns, &event_data, TP_VAL_INTERNAL) == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal event; use canary value for tp.

struct event_data_t event_data = {0};

if (record_event_consumer(consumer, PPME_DROP_X, UF_NEVER_DROP, ns, &event_data) == 0) {
if (record_event_consumer(consumer, PPME_DROP_X, UF_NEVER_DROP, ns, &event_data, TP_VAL_INTERNAL) == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal event; use canary value for tp.

int32_t cbres = PPM_SUCCESS;
int cpu;

if (tp_type < TP_VAL_INTERNAL && !(consumer->tracepoints_attached & (1 << tp_type)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the consumer is attached to the tracepoint, or if the tracepoint is a canary value.

event_data.event_info.context_data.sched_prev = (void *)cpu;
event_data.event_info.context_data.sched_next = (void *)sd_action;
record_event_all_consumers(PPME_CPU_HOTPLUG_E, UF_NEVER_DROP, &event_data);
record_event_all_consumers(PPME_CPU_HOTPLUG_E, UF_NEVER_DROP, &event_data, TP_VAL_INTERNAL);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal event; use canary value for tp.

@Andreagit97 Andreagit97 added this to the 0.10.0 milestone Nov 10, 2022
@FedeDP
Copy link
Contributor Author

FedeDP commented Nov 10, 2022

Most of the interesting stuff (and the locs) is in kmod; i highlighted relevant changes in the self-review.

@FedeDP
Copy link
Contributor Author

FedeDP commented Nov 14, 2022

Updated kmod code:
It now can't have desynced g_tracepoints_attached and g_tracepoints_refs.
Only thing that can happens is that when destroying a consumer, that is the only one present, thus triggering a tracepoint detach, for some reason not all tracepoints can be detached; so we end up with some refs/attached tracepoints
that are not properly cleaned, even if no consumer exists.

This just means that we won't be able to disable them anymore because we will have some zombie references to those tracepoints.
Not a big deal, and i think it should really never happen.

/cc @gnosek @Andreagit97

@poiana poiana requested review from Andreagit97 and gnosek November 14, 2022 09:51
Comment on lines 627 to +631
u32 idx;
u32 new_val;
u32 curr_val;
int ret = 0;

if (new_tp_set == g_tracepoints_attached)
{
// ok already equal
return ret;
}
int cpu;
int ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would initialize all these variables just to be future proof

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't mix declaration and code in c89/90 :) I mean, we actually can but then any added variable must be initialized.
We try to avoid doing that in kmod!

Andreagit97
Andreagit97 previously approved these changes Nov 21, 2022
Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

FedeDP and others added 12 commits November 22, 2022 09:39
…dropped `capture_enabled` from modern bpf probe.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
…er in headers that can be included by other engines.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Moreover, made tracepoint dynamic management per-consumer.
Tracepoints will be attached using a ref-counted method, so that
a tracepoint that is not requested by any consumer, gets detached.

Moreover, now kmod follows other drivers behavior: as soon as it is started,
no tracepoint is attached. Instead, tracepoints must be manually set
by userspace consumer, using the PPM_IOCTL_MANAGE_TP ioctl.

All of the above, also means that an inserted kmod won't steal any resource
until any consumer is actually using it, in a way much similar to other drivers.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
…nimum driver api version.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
…heir original position.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
It now **can't** have desynced `g_tracepoints_attached` and `g_tracepoints_refs`.
Only thing that can happens is that when destroying a consumer (that is the only one present),
for any reason, not all tracepoints can be detached; so we end up with some refs/attached tracepoints
that are not properly cleaned, even if no consumer exists.

This just means that we won't be able to disable them anymore because we will have
some zombie references to those tracepoints.
Not a big deal, and i think it should really never happen.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@FedeDP
Copy link
Contributor Author

FedeDP commented Nov 22, 2022

Rebased on top of master.

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana poiana added the lgtm label Nov 22, 2022
@poiana
Copy link
Contributor

poiana commented Nov 22, 2022

LGTM label has been added.

DetailsGit tree hash: 11e47c318daf26b0f0d946b8d0e56d33564f127b

@poiana
Copy link
Contributor

poiana commented Nov 22, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,FedeDP,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@FedeDP
Copy link
Contributor Author

FedeDP commented Dec 2, 2022

/milestone 4.0.0+driver

@poiana poiana modified the milestones: 0.10.0, 4.0.0+driver Dec 2, 2022
@FedeDP FedeDP mentioned this pull request Dec 2, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants