Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow for concurrent listing of tracing policies #2506

Merged
merged 14 commits into from
Jun 10, 2024

Conversation

kkourt
Copy link
Contributor

@kkourt kkourt commented Jun 4, 2024

Fixes: #2429

This PR modifies the sensor manager to allow listing of tracing policies when sensors are loaded/unloaded.

Some BPF programs (and their corresponding sensors) take a long time to load. In the existing implementation, listing policies is serialized with loading/unloading so listing policies has to wait for any loading/unloading operations to complete.

Because tracing policies are listed as part of the metrics, 5ce0f71 introduced a timeout for listing tracing policies, which leads to this timeout being hit: #2429.

This PR addresses this issue by introducing new states for policies (loading/unloading), and modifying the sensor manager to allow for concurrent listing of policies with loading/unloading. To do so, we add a RW lock and execute listing outside of the manager goroutine.

The PR also introduces a sensor interface that allows for better testing. The PR uses this interface to test the newly added functionality.

Please see patches for more details.

sensors: allow reporting policy status when loading/unloading sensors

Copy link

netlify bot commented Jun 4, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit e5f3585
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/6667155058325300096ef433
😎 Deploy Preview https://deploy-preview-2506--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@kkourt kkourt added the release-note/minor This PR introduces a minor user-visible change label Jun 4, 2024
@kkourt kkourt force-pushed the pr/kkorut/sensors-concurrent-listing branch from 4fb2f68 to 47690ce Compare June 4, 2024 13:20
@kkourt kkourt marked this pull request as ready for review June 4, 2024 13:36
@kkourt kkourt requested a review from a team as a code owner June 4, 2024 13:36
@kkourt kkourt requested a review from tixxdz June 4, 2024 13:36
@kkourt kkourt marked this pull request as draft June 4, 2024 13:37
@kkourt kkourt force-pushed the pr/kkorut/sensors-concurrent-listing branch 2 times, most recently from 00de621 to 56bd4bc Compare June 4, 2024 14:23
@mtardy mtardy self-requested a review June 4, 2024 14:40
@kkourt kkourt force-pushed the pr/kkorut/sensors-concurrent-listing branch from 56bd4bc to 8aa738c Compare June 4, 2024 15:45
@kkourt kkourt marked this pull request as ready for review June 4, 2024 16:36
Copy link
Member

@mtardy mtardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me, really like the testing with the fake delayed sensor.

As we discussed offline previously, this makes the code significantly more complex indeed, especially with the manual locking parts with the loading and unloading parts. Touching this code would require people to read the git history to understand, maybe we can add more immediate context by commenting a bit more. But the presence of explicit locks is already explicit comments in some ways.

pkg/sensors/handler.go Outdated Show resolved Hide resolved
pkg/sensors/manager.go Show resolved Hide resolved
pkg/sensors/handler.go Show resolved Hide resolved
kkourt added 14 commits June 10, 2024 16:55
Previously, the value type of the collection map in the handler was
"collection", i.e., a non-pointer type. This patch changes it to
*collection to avoid copies. This will allow for some code
simplifications which will come in later patches.

Signed-off-by: Kornilios Kourtis <[email protected]>
The previous patch modified the collection map to hold pointers. Hence,
there is no need to update the map, we can just update the collection.

Signed-off-by: Kornilios Kourtis <[email protected]>
This patch creates a new structure collectionMap which contains the map
for the collections and a lock. The intention is to implement
a read-only method to list tracig policies that does not have to go via
the manager goroutine.

Signed-off-by: Kornilios Kourtis <[email protected]>
Add a listPolicies method in collectionMap. This is meant to be used
outside of the manager goroutine.

Signed-off-by: Kornilios Kourtis <[email protected]>
This patch modifies the Manager ListTracingPolicies to not go via the
manager goroutine, and, instead directly accesses the collection map.
This will allow listing the policies without having to wait for BPF
program loads to finish. Note that ListTracingPolicies is used both from
the gRPC server and the metrics.

Signed-off-by: Kornilios Kourtis <[email protected]>
Add a loading and unloading state for tracing policies. It will be used
in subsequent patches for listing policies during loading/unloading.

Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
This patch modifies addTracingPolicy to release the lock to the
collections while loading a sensor. Before releasing the lock, the state
is changed to LoadingState.

Hence, while loading the sensors of a policy, it is now possible to list
the collections. Note that we assume that collections cannot be
modified, i.e., the state of a collection cannot change while loading.
Above is achieved by only alllowing read access outside of the
goroutine.

Signed-off-by: Kornilios Kourtis <[email protected]>
The unload (load) operation only makes sense for enabled (disabled) collections.
Add the proper checks.

Signed-off-by: Kornilios Kourtis <[email protected]>
Instead of destroying the sensors of a policy and then removing them
from the collection, remove them first and then destroy them. Once we
remove them, we can release the collection lock that allows listing of
the policies.

Signed-off-by: Kornilios Kourtis <[email protected]>
This patch modifies disableTracingPolicy to release the lock to the
collections while unloading a sensor. Before releasing the lock, the state
is changed to UnloadingState.

Signed-off-by: Kornilios Kourtis <[email protected]>
This patch modifies disableTracingPolicy to release the lock to the
collections while enabling a sensor. Before releasing the lock, the state
is changed to LoadingState.

Signed-off-by: Kornilios Kourtis <[email protected]>
current code use a struct Sensor, which makes testing the sensor manager
code hard. This patch introduces a sensor interface.

Subsequent patches will use the interface for testing.

Signed-off-by: Kornilios Kourtis <[email protected]>
This patch tests the intermediate states that were introduced in the
previous patches. Specifically, it tests whether ListTracingPolicies
call observes loading/unloading states for the corresponding sensors. It
does so by a TestDelayedSensor that implements the SensorIface
interface.

Signed-off-by: Kornilios Kourtis <[email protected]>
@kkourt kkourt force-pushed the pr/kkorut/sensors-concurrent-listing branch from 8aa738c to e5f3585 Compare June 10, 2024 15:01
@kkourt kkourt merged commit d744b36 into main Jun 10, 2024
46 checks passed
@kkourt kkourt deleted the pr/kkorut/sensors-concurrent-listing branch June 10, 2024 17:00
tpapagian added a commit that referenced this pull request Jun 12, 2024
After merging #2506 it seems that
there are cases where 10 * 1 sec is not enough when waiting for a
tracing policy to load (mainly in e2e tests).

This patch increases that to 20 * 1 sec to avoid timeouts.

Signed-off-by: Anastasios Papagiannis <[email protected]>
tpapagian added a commit that referenced this pull request Jun 12, 2024
After merging #2506 it seems that
there are cases where 10 * 1 sec is not enough when waiting for a
tracing policy to load (mainly in e2e tests).

This patch increases that to 20 * 1 sec to avoid timeouts.

Signed-off-by: Anastasios Papagiannis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR introduces a minor user-visible change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sensor manager: allow concurrent listing with loading.
2 participants