feat: Allow selecting instrumentation by PIDs#1321
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1321 +/- ##
===========================================
+ Coverage 19.60% 43.61% +24.01%
===========================================
Files 242 307 +65
Lines 28070 32962 +4892
===========================================
+ Hits 5502 14376 +8874
+ Misses 21922 17665 -4257
- Partials 646 921 +275
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| type PidSelector struct { | ||
| Pid app.PID | ||
| Name string | ||
| Namespace string | ||
| } | ||
|
|
||
| func (p *PidSelector) GetName() string { return p.Name } | ||
| func (p *PidSelector) GetNamespace() string { return p.Namespace } | ||
| func (p *PidSelector) GetPath() StringMatcher { return nilMatcher{} } | ||
| func (p *PidSelector) GetPathRegexp() StringMatcher { return nilMatcher{} } | ||
| func (p *PidSelector) GetOpenPorts() *PortEnum { return &PortEnum{} } | ||
| func (p *PidSelector) GetPID() (app.PID, bool) { return p.Pid, true } | ||
| func (p *PidSelector) IsContainersOnly() bool { return false } | ||
| func (p *PidSelector) GetExportModes() ExportModes { return ExportModes{} } | ||
| func (p *PidSelector) GetSamplerConfig() *SamplerConfig { return nil } | ||
| func (p *PidSelector) GetRoutesConfig() *CustomRoutesConfig { return nil } | ||
| func (p *PidSelector) MetricsConfig() perapp.SvcMetricsConfig { return perapp.SvcMetricsConfig{} } | ||
|
|
||
| func (p *PidSelector) RangeMetadata() iter.Seq2[string, StringMatcher] { | ||
| return func(_ func(string, StringMatcher) bool) {} | ||
| } | ||
|
|
||
| func (p *PidSelector) RangePodLabels() iter.Seq2[string, StringMatcher] { | ||
| return func(_ func(string, StringMatcher) bool) {} | ||
| } | ||
|
|
||
| func (p *PidSelector) RangePodAnnotations() iter.Seq2[string, StringMatcher] { | ||
| return func(_ func(string, StringMatcher) bool) {} | ||
| } |
There was a problem hiding this comment.
I think introducing a whole new Selector implementation type is not needed.
Analogously to what we do with the Open Ports attribute, you can just keep the GetPID() method in the Selector interface, plus implement it in the RegexSelector and GlobAttributes implementations, so any user can add the PID selection criteria to either the discovery > instrument or the (deprecated) discovery > services.
There was a problem hiding this comment.
sgtm, updated to just keep GetPID in the Selector interface and added to RegexSelector and GlobAttributes
2967a76 to
20458cb
Compare
grcevski
left a comment
There was a problem hiding this comment.
It's looking good @damemi! I really like the level of testing done.
I had one question relating if we should make the PID rule absolute and exclusive and a suggestion about commoning the code with the ports selector. It it's effectively the same type of value, a list of ints and ranges.
| return false | ||
| } | ||
| // PID matches; if this is PID-only criteria, skip path/port checks | ||
| if !a.GetPath().IsSet() && !a.GetPathRegexp().IsSet() && a.GetOpenPorts().Len() == 0 { |
There was a problem hiding this comment.
I'm wondering, why not just return if the PID matches and ignore the rest? We can put a validation rule on the selectors that will ensure that if you choose a PID, nothing else should be set.
I guess one reason to think this would need further selectors is if OBI is namespaced, but in that case it can only see a subset of the PIDs. I think we should make it absolute, you've set a PID on this selector you match that one directly.
There was a problem hiding this comment.
true, PID is a pretty explicit/atomic selector so it is weird to then scope up to other fields.
we might want to reevaluate selector criteria at some point, I could see reasons for using pid selector alongside others but for now I think just return pidInList() pretty much covers it here
|
|
||
| // TargetPIDs is a list of process IDs to instrument. Supports YAML arrays (e.g. [1234, 5678]), | ||
| // a single YAML number, or env/comma-separated (e.g. OTEL_EBPF_TARGET_PID=1234,5678). | ||
| type TargetPIDs []uint32 |
There was a problem hiding this comment.
What do you think about reusing PortEnum and renaming it into something more appropriate? It supports something similar, e.g 80,443,8000-8999. I don't mind the new format, but it creates an inconsistency in how the end users would define this.
There was a problem hiding this comment.
Sounds good to me, renamed PortEnum to IntEnum (wdyt?)
|
🥇 great first PR @damemi ! |
#### What this PR does / why we need it: This adds `opentelemetry-ebpf-instrumentation` (OBI) as a supported workload instrumentation distro. The benefit of this is to provide additional coverage and offer users a way to deploy OBI within our platform if they'd like to. It relies on features we added upstream to OBI in open-telemetry/opentelemetry-ebpf-instrumentation#1321 and open-telemetry/opentelemetry-ebpf-instrumentation#1388 to support dynamic PID selection (as opposed to OBI's static service/exe config approach). The OBI `Instrumenter` is a single long-lived routine that handles process events, filters/matches them based on config criteria (ie, which PIDs we want), and attaches shared eBPF programs to those processes. The `DynamicSelector` we added upstream provides a hook into the `Instrumenter` to update the filter/matching criteria during runtime without needing to restart OBI. This adds OBI as an sdk/distro that can be used with any language. The basic flow of the new distro is: 1. `NewOBIInstrumentationFactory()`: Initialize the OBI config and DynamicSelector, but does not start the OBI `Instrumenter` routine yet. This is to prevent the `Instrumenter` from running and using resources if nothing is actually instrumented by OBI. 2. `factory.CreateInstrumentation(ctx, pid)`: If the OBI `Instrumenter` is not started yet, this starts it and returns an `obiInstrumentation` handle that stores that DynamicSelector and the instrumented PID. This does not attach the PID to OBI yet. 3. `obiInstrumentation.Load()`: Uses the stored PID in the `o` instrumentation object to call `DynamicSelector.AddPIDs(pid)` which updates the running OBI manager to include/attach this process. 4. `obiInstrumentation.Close()`: Removes the stored PID in `o` with `DynamicSelect.RemovePIDs(pid)` to update the OBI manager to exclude/detach this process. A caveat to this design is that it's difficult to _stop_ the Instrumenter once it's started (ie, all OBI processes get uninstrumented and we no longer need that routine) without complex async management. We could do something like this on `Close()`: ```go func Close() { f.Selector.RemovePIDs(pid) if len(f.Selector.GetPIDs()) == 0 { cancelObiCtx() } } ``` But that would require a mutex, and it's possible that another OBI app could come along in the meantime waiting to be instrumented while that mutex is held, and the Instrumenter gets cancelled before the new app gets added. Which would be very confusing to debug. I tried a lot of different approaches and they were all messy. I think some more upstream OBI changes around an actual handle on the instrumenter would help, along with other use cases for holding a reference to the instrumenter itself. Overall: * OBI Factory -- Called once at odiglet startup, runs nothing * OBI CreateInstrumentation -- Starts OBI (if necessary) as a singleton and creates the per-process `instrumentation` used by Odiglet * Instrumentation Load -- Attaches the PID to the OBI singleton Another possible upstream contribution would be a signal from OBI for when it's actually started/running/ready to accept new PIDs which Load() could wait on This whole approach is slightly different from languages like Go, where the *Odiglet* is the long lived process, so each `CreateInstrumentation()` in Go calls the go-auto framework to load, attach, and manage. In the case of OBI, the OBI Instrumenter handles loading, attaching, and managing so Odiglet is a wrapped layer on top of that for integration with our control plane. This also adds OBI ebpf generation to the odiglet Dockerfile using the upstream [obi-generator[(https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pkgs/container/obi-generator) The OBI distro is added as its own module so it can be imported into enterprise easier: odigos-io/odigos-enterprise#2577 (see that PR for details on OBI distro module) UI Changes here: odigos-io/ui-kit#748 New C++ app in simple-demo for an e2e test here: odigos-io/simple-demo#67 #### Changelog entry: Does this PR introduce a user-facing bug fix, feature, dependency update, or breaking change?? <!-- This section will go in the release notes for this version. Is this something users should be able to find easily? If no, just write "NONE" in the release-note block below. If yes, please add a release note in the block below describing this in 1-2 sentences for our changelog. --> ```release-note feat: Support opentelemetry-ebpf-instrumentation (OBI) for workload instrumentation ```
This makes it possible to instrument by passing a PID, or list of PIDs, directly through the OBI config.
As a next step, it would be nice to be able to dynamically add/remove PIDs during runtime
Checklist