Dynamic add/remove of PID selection#1388
Conversation
30138ef to
bac7549
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1388 +/- ##
==========================================
+ Coverage 43.86% 44.26% +0.39%
==========================================
Files 318 320 +2
Lines 34920 35175 +255
==========================================
+ Hits 15319 15569 +250
- Misses 18595 18614 +19
+ Partials 1006 992 -14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
45adc85 to
45bd813
Compare
grcevski
left a comment
There was a problem hiding this comment.
This is pretty cool! I was wondering what you think about making the whole selection criteria reloadable? Was that difficult to achieve? Because now we have the pids in the selection criteria, I was thinking maybe it's doable to just pass new criteria to the matcher...
How does the dynamic PID communicate to the attacher that it should remove the process? Right now the only mechanism is from the watcher to send the process terminated, I think we need to send an event like that on remove...
I actually did think about that, and it's totally another way we could do it. But the question is I think do we want to potentially have a globattribute for each add/update of PIDs, or store all the PIDs that get added/updated in a single globattribute? Or should globattribute have an identifier that we can use to edit when we want or add/delete when we want
that is a good point and something I didn't think of, I'll make that update (should be pretty easy to do) |
8ae5d2f to
0c64452
Compare
|
@grcevski I pushed a new commit that wires synthetic deletes through from the appolly instrumenter to the attach. It was a little more complicated than I thought, mostly just because of the multiple hops and passing the channel through the finder (from matcher, to exectyper, to containerstoreupdater, to finally attacher). I'd like to continue poking at refactoring some of those internals in future PRs but lmk what you think. For passing entire criteria to the matcher, it would probably just be one layer above all of this. But the question is still how do you distinguish different criteria when adding/removing? Semantically, are users expected today to use multiple different Criteria for things like port selection? It seems like the IntEnum (fka PortEnum) type is meant to keep that all in a single Criteria. But I just don't know what the intended usage was |
mariomac
left a comment
There was a problem hiding this comment.
Amazing job! I think the code can be simplified and decoupled a bit. Basically instead of adding the PIDs accounting logic to an interface that has multiple implementations and instances, which was initially planned to contain user configuration; we can move this to a concrete type that can be directly added to the instrumenter/appo11y and passed to the matcher as a reference.
This way we remove a lot of wiring logic and decouple the configuration types from the dynamic PID selection capabilities.
| // AddPIDs adds PIDs to the selector's list; no-op for selectors that do not support dynamic PIDs. | ||
| AddPIDs(pids ...uint32) | ||
| // RemovePIDs removes PIDs from the selector's list; no-op for selectors that do not support dynamic PIDs. | ||
| RemovePIDs(pids ...uint32) |
There was a problem hiding this comment.
Selector should be a read-only only interface for accepting user-provided configuration and now we are turning it mutable, and overloading it with extra methods that have to be implemented with no-ops in most implementations. Also we have to consider that OBI can manage multiple GlobAttributes implementations (one for each user-defined criteria set).
What I would suggest:
- Remove AddPIDs/RemovePIDs from the
Selectorinterface. So the finding criteria types are just user-configuration types without extra logic. - Create a
DynamicPIDSelector(or whatever type) concrete type that is unique to OBI. Probably needs to be preloaded by the aggregation of all the user-providedtarget_pidsconfigurations. - The instrumenter or appo11y types will hold a reference to the
DynamicPIDSelectortype. It can call AddPIDs/RemovePIDs directly without the interface. - Only Matcher should know about dynamic PID selection. Matcher already owns ProcessHistory, so it is the right place to: (1) treat runtime PIDs as an override or supplement, (2) emit synthetic deletes from
Removed.
This also would avoid most of the WithPIDSelectorChangeNotifier, WithPIDSelector, Set.. wiring logic. You can pas one dependency into discovery, not two. Replace WithPIDSelector + WithPIDSelectorChangeNotifier with one WithPIDController(...), or keep it entirely internal if vendored users do not need it.
There was a problem hiding this comment.
agreed, I like this suggestion. I was trying to wire these changes through without changing too much of the existing pathways, but that made the overall approach look weird with the problems you're pointing out. This can definitely be more like what you're saying
| // as well as PIDs from processes that setup a new connection | ||
| func ProcessWatcherFunc(cfg *obi.Config, ebpfContext *ebpfcommon.EBPFEventContext, output *msg.Queue[[]Event[ProcessAttrs]]) swarm.RunFunc { | ||
| // as well as PIDs from processes that setup a new connection. | ||
| func ProcessWatcherFunc(cfg *obi.Config, ebpfContext *ebpfcommon.EBPFEventContext, output *msg.Queue[[]Event[ProcessAttrs]], pidSelector services.Selector) swarm.RunFunc { |
There was a problem hiding this comment.
I think that once we implement my suggestion for criteria.go, we can revert all the changes in ProcessWatcher because the selection logic is really handled in the Matcher node.
b8096c7 to
d29322d
Compare
mariomac
left a comment
There was a problem hiding this comment.
Much better now! I'd still implement this comment: https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pull/1388/changes#r2918809548 but it can be carried out in another PR.
d29322d to
aa2a145
Compare
grcevski
left a comment
There was a problem hiding this comment.
I really like the clean separation of concerns now with the introduction of the DynamicPIDSelector and also praise for all the new tests that were added!
I added a few comments, but I think the main one I'm concerned is making the dynamic pid selector work with existing processes. This is the scenario I have in mind:
- Process PID 42 starts, nobody cares about it, we don't trigger matcher on it.
- watcher_proc polls /proc, sees PID 42, then it emits EventCreated{pid:42} and adds it to pa.pids[42]
- Event flows to the matcher. At this point DynamicPIDSelector is empty, so no criteria match, PID 42 is not considered, but we have configured the dynamic pid selector.
- Caller calls AddPIDs(42) on the dynamic selector
- Next poll, watcher_proc calls checkNewProcessNotification(42, ...), pa.pids[42] already exists , returns false and no event emitted to trigger the pipeline.
- The matcher is never told to re-evaluate PID 42, which now is part of the memory of the dynamic pid selector.
|
|
||
| // AddPIDs adds PIDs to the set (deduplicated). | ||
| func (d *DynamicPIDSelector) AddPIDs(pids ...uint32) { | ||
| if len(pids) == 0 { |
There was a problem hiding this comment.
Do we need a notify path here too? What if we dropped OBI in an existing environment and wanted to dynamically add a process that has already been running a while. I think watcher_proc will remember it as something it has seen and unless something tells the pipeline to conside this new PID we may not re-evaluate?
There was a problem hiding this comment.
added something similar to the remove channel, but it's an add notify to the watcher_proc that tells it to "forget this pid" so the next snapshot picks it up and matches it to the dynamic selector
| d.notifyRemoved(removedPIDs) | ||
| } | ||
|
|
||
| func (d *DynamicPIDSelector) notifyRemoved(removedPIDs []app.PID) { |
There was a problem hiding this comment.
Is there a timing issue here if RemovePIDs is called twice and matcher hasn't drained the first PID? Could this cause the second pid to never be uninstrumented?
There was a problem hiding this comment.
good point, what I ended up doing here was give dynamicselector an addqueue+removequeue, so it can store pending calls and process them in batches. do you have any other suggestions for an approach there?
There was a problem hiding this comment.
I think that should work and we can fix it if it causes you deadlocks.
aa2a145 to
b842295
Compare
b842295 to
218366b
Compare
#### What this PR does / why we need it: This adds `opentelemetry-ebpf-instrumentation` (OBI) as a supported workload instrumentation distro. The benefit of this is to provide additional coverage and offer users a way to deploy OBI within our platform if they'd like to. It relies on features we added upstream to OBI in open-telemetry/opentelemetry-ebpf-instrumentation#1321 and open-telemetry/opentelemetry-ebpf-instrumentation#1388 to support dynamic PID selection (as opposed to OBI's static service/exe config approach). The OBI `Instrumenter` is a single long-lived routine that handles process events, filters/matches them based on config criteria (ie, which PIDs we want), and attaches shared eBPF programs to those processes. The `DynamicSelector` we added upstream provides a hook into the `Instrumenter` to update the filter/matching criteria during runtime without needing to restart OBI. This adds OBI as an sdk/distro that can be used with any language. The basic flow of the new distro is: 1. `NewOBIInstrumentationFactory()`: Initialize the OBI config and DynamicSelector, but does not start the OBI `Instrumenter` routine yet. This is to prevent the `Instrumenter` from running and using resources if nothing is actually instrumented by OBI. 2. `factory.CreateInstrumentation(ctx, pid)`: If the OBI `Instrumenter` is not started yet, this starts it and returns an `obiInstrumentation` handle that stores that DynamicSelector and the instrumented PID. This does not attach the PID to OBI yet. 3. `obiInstrumentation.Load()`: Uses the stored PID in the `o` instrumentation object to call `DynamicSelector.AddPIDs(pid)` which updates the running OBI manager to include/attach this process. 4. `obiInstrumentation.Close()`: Removes the stored PID in `o` with `DynamicSelect.RemovePIDs(pid)` to update the OBI manager to exclude/detach this process. A caveat to this design is that it's difficult to _stop_ the Instrumenter once it's started (ie, all OBI processes get uninstrumented and we no longer need that routine) without complex async management. We could do something like this on `Close()`: ```go func Close() { f.Selector.RemovePIDs(pid) if len(f.Selector.GetPIDs()) == 0 { cancelObiCtx() } } ``` But that would require a mutex, and it's possible that another OBI app could come along in the meantime waiting to be instrumented while that mutex is held, and the Instrumenter gets cancelled before the new app gets added. Which would be very confusing to debug. I tried a lot of different approaches and they were all messy. I think some more upstream OBI changes around an actual handle on the instrumenter would help, along with other use cases for holding a reference to the instrumenter itself. Overall: * OBI Factory -- Called once at odiglet startup, runs nothing * OBI CreateInstrumentation -- Starts OBI (if necessary) as a singleton and creates the per-process `instrumentation` used by Odiglet * Instrumentation Load -- Attaches the PID to the OBI singleton Another possible upstream contribution would be a signal from OBI for when it's actually started/running/ready to accept new PIDs which Load() could wait on This whole approach is slightly different from languages like Go, where the *Odiglet* is the long lived process, so each `CreateInstrumentation()` in Go calls the go-auto framework to load, attach, and manage. In the case of OBI, the OBI Instrumenter handles loading, attaching, and managing so Odiglet is a wrapped layer on top of that for integration with our control plane. This also adds OBI ebpf generation to the odiglet Dockerfile using the upstream [obi-generator[(https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pkgs/container/obi-generator) The OBI distro is added as its own module so it can be imported into enterprise easier: odigos-io/odigos-enterprise#2577 (see that PR for details on OBI distro module) UI Changes here: odigos-io/ui-kit#748 New C++ app in simple-demo for an e2e test here: odigos-io/simple-demo#67 #### Changelog entry: Does this PR introduce a user-facing bug fix, feature, dependency update, or breaking change?? <!-- This section will go in the release notes for this version. Is this something users should be able to find easily? If no, just write "NONE" in the release-note block below. If yes, please add a release note in the block below describing this in 1-2 sentences for our changelog. --> ```release-note feat: Support opentelemetry-ebpf-instrumentation (OBI) for workload instrumentation ```
Follow up to #1321
This adds the ability to add and remove PID selections dynamically for callers that import the Instrumenter. It's used by providing an option to
instrumenter.Run()calledWithDynamicPIDSelector(). The caller can initialize adiscover.NewDynamicPIDSelector()which they can hold reference to and add/update pids dynamically.This is meant as an incremental step toward the ultimate goal of modularizing the Instrumenter and Tracers. In the future, it would be nice to support returning an instance of the Instrumenter directly. But in the interest of a smaller PR, I wanted to wire this through as much existing code as possible.
The intended usage is documented in a new test, but it is essentially:
t;dr
DynamicPIDSelectortype providingAddPIDs(),RemovePIDs(), andGetPIDs()functionsDynamicPIDSelectorimplemented as aCriteriafor Matcher when process created events are receivedRemovedNotify()channel handles synthetic delete events fromDynamicPIDSelector.RemovePIDs()which triggers the existingm.filterDeleted()flowChecklist