Skip to content

Dynamic add/remove of PID selection#1388

Merged
damemi merged 7 commits into
open-telemetry:mainfrom
damemi:dynamic-pid-selector
Mar 16, 2026
Merged

Dynamic add/remove of PID selection#1388
damemi merged 7 commits into
open-telemetry:mainfrom
damemi:dynamic-pid-selector

Conversation

@damemi
Copy link
Copy Markdown
Member

@damemi damemi commented Mar 2, 2026

Follow up to #1321

This adds the ability to add and remove PID selections dynamically for callers that import the Instrumenter. It's used by providing an option to instrumenter.Run() called WithDynamicPIDSelector(). The caller can initialize a discover.NewDynamicPIDSelector() which they can hold reference to and add/update pids dynamically.

This is meant as an incremental step toward the ultimate goal of modularizing the Instrumenter and Tracers. In the future, it would be nice to support returning an instance of the Instrumenter directly. But in the interest of a smaller PR, I wanted to wire this through as much existing code as possible.

The intended usage is documented in a new test, but it is essentially:

package main

import (
	"context"
	"go.opentelemetry.io/obi/pkg/appolly/discover"
	"go.opentelemetry.io/obi/pkg/instrumenter"
	"go.opentelemetry.io/obi/pkg/obi"
)

func main() {
	ctx := context.Background()
	cfg := &obi.Config{ /* ... */ }

	// Create the selector and pass it into Run. No callback and no reference to the instrumenter.
	sel := discover.NewDynamicPIDSelector()
	_ = instrumenter.Run(ctx, cfg, instrumenter.WithDynamicPIDSelector(sel))

	// Add/remove PIDs at runtime by calling the selector directly.
	sel.AddPIDs(42, 100)
	pids, _ := sel.GetPIDs()
	sel.RemovePIDs(42)
}

t;dr

  • New DynamicPIDSelector type providing AddPIDs(), RemovePIDs(), and GetPIDs() functions
  • DynamicPIDSelector implemented as a Criteria for Matcher when process created events are received
  • RemovedNotify() channel handles synthetic delete events from DynamicPIDSelector.RemovePIDs() which triggers the existing m.filterDeleted() flow

Checklist

@damemi damemi requested a review from a team as a code owner March 2, 2026 16:05
@damemi damemi force-pushed the dynamic-pid-selector branch 3 times, most recently from 30138ef to bac7549 Compare March 2, 2026 17:06
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 71.19342% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.26%. Comparing base (8f0335b) to head (c401fcd).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
pkg/appolly/discover/dynamic_pid_selector.go 68.54% 37 Missing and 2 partials ⚠️
pkg/appolly/discover/watcher_proc.go 50.00% 14 Missing and 1 partial ⚠️
pkg/appolly/discover/matcher.go 80.43% 7 Missing and 2 partials ⚠️
pkg/appolly/discover/finder.go 75.00% 3 Missing ⚠️
pkg/internal/appolly/appolly.go 86.36% 3 Missing ⚠️
pkg/instrumenter/opts.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1388      +/-   ##
==========================================
+ Coverage   43.86%   44.26%   +0.39%     
==========================================
  Files         318      320       +2     
  Lines       34920    35175     +255     
==========================================
+ Hits        15319    15569     +250     
- Misses      18595    18614      +19     
+ Partials     1006      992      -14     
Flag Coverage Δ
integration-test 21.04% <21.22%> (-0.12%) ⬇️
integration-test-arm 0.00% <0.00%> (ø)
integration-test-vm-x86_64-5.15.152 ?
integration-test-vm-x86_64-6.10.6 ?
k8s-integration-test 2.24% <0.00%> (-0.02%) ⬇️
oats-test 0.00% <0.00%> (ø)
unittests 47.34% <79.62%> (+2.36%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@damemi damemi force-pushed the dynamic-pid-selector branch 3 times, most recently from 45adc85 to 45bd813 Compare March 2, 2026 21:27
Copy link
Copy Markdown
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool! I was wondering what you think about making the whole selection criteria reloadable? Was that difficult to achieve? Because now we have the pids in the selection criteria, I was thinking maybe it's doable to just pass new criteria to the matcher...

How does the dynamic PID communicate to the attacher that it should remove the process? Right now the only mechanism is from the watcher to send the process terminated, I think we need to send an event like that on remove...

@damemi
Copy link
Copy Markdown
Member Author

damemi commented Mar 4, 2026

I was wondering what you think about making the whole selection criteria reloadable? Was that difficult to achieve? Because now we have the pids in the selection criteria, I was thinking maybe it's doable to just pass new criteria to the matcher...

I actually did think about that, and it's totally another way we could do it. But the question is I think do we want to potentially have a globattribute for each add/update of PIDs, or store all the PIDs that get added/updated in a single globattribute? Or should globattribute have an identifier that we can use to edit when we want or add/delete when we want

How does the dynamic PID communicate to the attacher that it should remove the process? Right now the only mechanism is from the watcher to send the process terminated, I think we need to send an event like that on remove...

that is a good point and something I didn't think of, I'll make that update (should be pretty easy to do)

@damemi damemi force-pushed the dynamic-pid-selector branch 3 times, most recently from 8ae5d2f to 0c64452 Compare March 6, 2026 20:51
@damemi
Copy link
Copy Markdown
Member Author

damemi commented Mar 6, 2026

@grcevski I pushed a new commit that wires synthetic deletes through from the appolly instrumenter to the attach. It was a little more complicated than I thought, mostly just because of the multiple hops and passing the channel through the finder (from matcher, to exectyper, to containerstoreupdater, to finally attacher). I'd like to continue poking at refactoring some of those internals in future PRs but lmk what you think.

For passing entire criteria to the matcher, it would probably just be one layer above all of this. But the question is still how do you distinguish different criteria when adding/removing?

Semantically, are users expected today to use multiple different Criteria for things like port selection? It seems like the IntEnum (fka PortEnum) type is meant to keep that all in a single Criteria. But I just don't know what the intended usage was

Copy link
Copy Markdown
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job! I think the code can be simplified and decoupled a bit. Basically instead of adding the PIDs accounting logic to an interface that has multiple implementations and instances, which was initially planned to contain user configuration; we can move this to a concrete type that can be directly added to the instrumenter/appo11y and passed to the matcher as a reference.

This way we remove a lot of wiring logic and decouple the configuration types from the dynamic PID selection capabilities.

Comment thread pkg/appolly/discover/matcher.go Outdated
Comment thread pkg/appolly/services/criteria.go Outdated
Comment on lines +174 to +177
// AddPIDs adds PIDs to the selector's list; no-op for selectors that do not support dynamic PIDs.
AddPIDs(pids ...uint32)
// RemovePIDs removes PIDs from the selector's list; no-op for selectors that do not support dynamic PIDs.
RemovePIDs(pids ...uint32)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Selector should be a read-only only interface for accepting user-provided configuration and now we are turning it mutable, and overloading it with extra methods that have to be implemented with no-ops in most implementations. Also we have to consider that OBI can manage multiple GlobAttributes implementations (one for each user-defined criteria set).

What I would suggest:

  1. Remove AddPIDs/RemovePIDs from the Selector interface. So the finding criteria types are just user-configuration types without extra logic.
  2. Create a DynamicPIDSelector (or whatever type) concrete type that is unique to OBI. Probably needs to be preloaded by the aggregation of all the user-provided target_pids configurations.
  3. The instrumenter or appo11y types will hold a reference to the DynamicPIDSelector type. It can call AddPIDs/RemovePIDs directly without the interface.
  4. Only Matcher should know about dynamic PID selection. Matcher already owns ProcessHistory, so it is the right place to: (1) treat runtime PIDs as an override or supplement, (2) emit synthetic deletes from Removed.

This also would avoid most of the WithPIDSelectorChangeNotifier, WithPIDSelector, Set.. wiring logic. You can pas one dependency into discovery, not two. Replace WithPIDSelector + WithPIDSelectorChangeNotifier with one WithPIDController(...), or keep it entirely internal if vendored users do not need it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I like this suggestion. I was trying to wire these changes through without changing too much of the existing pathways, but that made the overall approach look weird with the problems you're pointing out. This can definitely be more like what you're saying

Comment thread pkg/instrumenter/opts.go Outdated
Comment thread pkg/appolly/discover/watcher_proc.go Outdated
// as well as PIDs from processes that setup a new connection
func ProcessWatcherFunc(cfg *obi.Config, ebpfContext *ebpfcommon.EBPFEventContext, output *msg.Queue[[]Event[ProcessAttrs]]) swarm.RunFunc {
// as well as PIDs from processes that setup a new connection.
func ProcessWatcherFunc(cfg *obi.Config, ebpfContext *ebpfcommon.EBPFEventContext, output *msg.Queue[[]Event[ProcessAttrs]], pidSelector services.Selector) swarm.RunFunc {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that once we implement my suggestion for criteria.go, we can revert all the changes in ProcessWatcher because the selection logic is really handled in the Matcher node.

Comment thread pkg/instrumenter/opts.go Outdated
Copy link
Copy Markdown
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better now! I'd still implement this comment: https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pull/1388/changes#r2918809548 but it can be carried out in another PR.

@damemi damemi force-pushed the dynamic-pid-selector branch from d29322d to aa2a145 Compare March 13, 2026 15:35
Copy link
Copy Markdown
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the clean separation of concerns now with the introduction of the DynamicPIDSelector and also praise for all the new tests that were added!

I added a few comments, but I think the main one I'm concerned is making the dynamic pid selector work with existing processes. This is the scenario I have in mind:

  1. Process PID 42 starts, nobody cares about it, we don't trigger matcher on it.
  2. watcher_proc polls /proc, sees PID 42, then it emits EventCreated{pid:42} and adds it to pa.pids[42]
  3. Event flows to the matcher. At this point DynamicPIDSelector is empty, so no criteria match, PID 42 is not considered, but we have configured the dynamic pid selector.
  4. Caller calls AddPIDs(42) on the dynamic selector
  5. Next poll, watcher_proc calls checkNewProcessNotification(42, ...), pa.pids[42] already exists , returns false and no event emitted to trigger the pipeline.
  6. The matcher is never told to re-evaluate PID 42, which now is part of the memory of the dynamic pid selector.

Comment thread pkg/internal/appolly/appolly.go

// AddPIDs adds PIDs to the set (deduplicated).
func (d *DynamicPIDSelector) AddPIDs(pids ...uint32) {
if len(pids) == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a notify path here too? What if we dropped OBI in an existing environment and wanted to dynamically add a process that has already been running a while. I think watcher_proc will remember it as something it has seen and unless something tells the pipeline to conside this new PID we may not re-evaluate?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added something similar to the remove channel, but it's an add notify to the watcher_proc that tells it to "forget this pid" so the next snapshot picks it up and matches it to the dynamic selector

d.notifyRemoved(removedPIDs)
}

func (d *DynamicPIDSelector) notifyRemoved(removedPIDs []app.PID) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a timing issue here if RemovePIDs is called twice and matcher hasn't drained the first PID? Could this cause the second pid to never be uninstrumented?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, what I ended up doing here was give dynamicselector an addqueue+removequeue, so it can store pending calls and process them in batches. do you have any other suggestions for an approach there?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should work and we can fix it if it causes you deadlocks.

Comment thread pkg/instrumenter/instrumenter_test.go
Comment thread pkg/appolly/discover/matcher.go Outdated
@damemi damemi force-pushed the dynamic-pid-selector branch from aa2a145 to b842295 Compare March 13, 2026 20:13
@damemi damemi force-pushed the dynamic-pid-selector branch from b842295 to 218366b Compare March 13, 2026 20:37
Copy link
Copy Markdown
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@damemi damemi merged commit 6e25e94 into open-telemetry:main Mar 16, 2026
65 checks passed
@MrAlias MrAlias added this to the v0.7.0 milestone Apr 1, 2026
damemi added a commit to odigos-io/odigos that referenced this pull request Apr 27, 2026
#### What this PR does / why we need it:
This adds `opentelemetry-ebpf-instrumentation` (OBI) as a supported
workload instrumentation distro.

The benefit of this is to provide additional coverage and offer users a
way to deploy OBI within our platform if they'd like to.

It relies on features we added upstream to OBI in
open-telemetry/opentelemetry-ebpf-instrumentation#1321
and
open-telemetry/opentelemetry-ebpf-instrumentation#1388
to support dynamic PID selection (as opposed to OBI's static service/exe
config approach).

The OBI `Instrumenter` is a single long-lived routine that handles
process events, filters/matches them based on config criteria (ie, which
PIDs we want), and attaches shared eBPF programs to those processes. The
`DynamicSelector` we added upstream provides a hook into the
`Instrumenter` to update the filter/matching criteria during runtime
without needing to restart OBI.

This adds OBI as an sdk/distro that can be used with any language. The
basic flow of the new distro is:
1. `NewOBIInstrumentationFactory()`: Initialize the OBI config and
DynamicSelector, but does not start the OBI `Instrumenter` routine yet.
This is to prevent the `Instrumenter` from running and using resources
if nothing is actually instrumented by OBI.
2. `factory.CreateInstrumentation(ctx, pid)`: If the OBI `Instrumenter`
is not started yet, this starts it and returns an `obiInstrumentation`
handle that stores that DynamicSelector and the instrumented PID. This
does not attach the PID to OBI yet.
3. `obiInstrumentation.Load()`: Uses the stored PID in the `o`
instrumentation object to call `DynamicSelector.AddPIDs(pid)` which
updates the running OBI manager to include/attach this process.
4. `obiInstrumentation.Close()`: Removes the stored PID in `o` with
`DynamicSelect.RemovePIDs(pid)` to update the OBI manager to
exclude/detach this process.

A caveat to this design is that it's difficult to _stop_ the
Instrumenter once it's started (ie, all OBI processes get uninstrumented
and we no longer need that routine) without complex async management. We
could do something like this on `Close()`:

```go
func Close() {
  f.Selector.RemovePIDs(pid)
  if len(f.Selector.GetPIDs()) == 0 {
    cancelObiCtx()
  }
}
```

But that would require a mutex, and it's possible that another OBI app
could come along in the meantime waiting to be instrumented while that
mutex is held, and the Instrumenter gets cancelled before the new app
gets added. Which would be very confusing to debug. I tried a lot of
different approaches and they were all messy. I think some more upstream
OBI changes around an actual handle on the instrumenter would help,
along with other use cases for holding a reference to the instrumenter
itself.

Overall:

* OBI Factory -- Called once at odiglet startup, runs nothing
* OBI CreateInstrumentation -- Starts OBI (if necessary) as a singleton
and creates the per-process `instrumentation` used by Odiglet
* Instrumentation Load -- Attaches the PID to the OBI singleton

Another possible upstream contribution would be a signal from OBI for
when it's actually started/running/ready to accept new PIDs which Load()
could wait on

This whole approach is slightly different from languages like Go, where
the *Odiglet* is the long lived process, so each
`CreateInstrumentation()` in Go calls the go-auto framework to load,
attach, and manage. In the case of OBI, the OBI Instrumenter handles
loading, attaching, and managing so Odiglet is a wrapped layer on top of
that for integration with our control plane.

This also adds OBI ebpf generation to the odiglet Dockerfile using the
upstream
[obi-generator[(https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pkgs/container/obi-generator)

The OBI distro is added as its own module so it can be imported into
enterprise easier:
odigos-io/odigos-enterprise#2577 (see that PR
for details on OBI distro module)

UI Changes here: odigos-io/ui-kit#748

New C++ app in simple-demo for an e2e test here:
odigos-io/simple-demo#67

#### Changelog entry: Does this PR introduce a user-facing bug fix,
feature, dependency update, or breaking change??
<!--
This section will go in the release notes for this version. Is this
something users should be able to find easily?
If no, just write "NONE" in the release-note block below.
If yes, please add a release note in the block below describing this in
1-2 sentences for our changelog.
-->

```release-note
feat: Support opentelemetry-ebpf-instrumentation (OBI) for workload instrumentation
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants