feat(collector): add allow_startup_failure config option #1233
While this looks good to me, I'm not sure you should be doing profiling in the same collector as traces, metrics and logs. Since this pattern isn't really what we've been recommending, I would hide this behavior behind a feature gate.
Shouldn't a feature gate be something that downstream implementors decide on and put together, depending on the requirements of the downstream collector? Essentially, this upstream PR makes this bit of functionality available without being opinionated about how it should be used.
Yes, but this assumes it's acceptable for profiles to fail to start (and never start) with nothing more than a log entry. That's probably not the behavior you want when you're running a collector dedicated to profiling.
There is a precedent for feature gates that enable specific opinionated features. |
|
Any update on the direction here? |
Let's switch this to |
ac0804d to 3aa9282
Updated: switched to ErrorMode with ignore and propagate values, following the OTTL pattern!
Could you add tests? |
Sure, I can add tests. Should I cover just config validation (empty |
Ideally, both. |
Added config validation tests. For controller |
e444720 to a71bc14
Moved the |
Add allow_startup_failure bool to the collector config. When true, startup errors from the profiler are logged but not returned to the OTel Collector, so it keeps running without profiling. Also extracted a profilerController interface to make the package testable and added unit tests for the new behavior. Closes open-telemetry#1214
Replace the bool allow_startup_failure config field with a string ErrorMode type following the OTTL pattern. Supported values are propagate (default) and ignore.
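For illustration, a minimal Go sketch of a string-based error mode with the two values named in the commit message above. The identifiers ErrorMode, ErrorModePropagate, ErrorModeIgnore, Config, Validate and Effective are hypothetical, modeled loosely on the OTTL error_mode pattern rather than taken from this PR, and treating an empty value as the default is likewise an assumption.

```go
package main

import "fmt"

// ErrorMode selects what happens when the profiler fails to start.
// Hypothetical names, modeled on the OTTL error_mode pattern.
type ErrorMode string

const (
	// ErrorModePropagate returns the startup error to the Collector (default).
	ErrorModePropagate ErrorMode = "propagate"
	// ErrorModeIgnore logs the startup error and lets the Collector keep running.
	ErrorModeIgnore ErrorMode = "ignore"
)

// Config is a hypothetical, trimmed-down receiver config holding only the
// field discussed in this PR.
type Config struct {
	ErrorMode ErrorMode
}

// Validate accepts the two supported modes, treats an empty value as the
// default (propagate), and rejects anything else.
func (c Config) Validate() error {
	switch c.ErrorMode {
	case "", ErrorModePropagate, ErrorModeIgnore:
		return nil
	default:
		return fmt.Errorf("invalid error_mode %q: must be %q or %q",
			c.ErrorMode, ErrorModePropagate, ErrorModeIgnore)
	}
}

// Effective returns the mode that should actually be applied at startup.
func (c Config) Effective() ErrorMode {
	if c.ErrorMode == "" {
		return ErrorModePropagate
	}
	return c.ErrorMode
}

func main() {
	for _, mode := range []ErrorMode{"", ErrorModeIgnore, "skip"} {
		cfg := Config{ErrorMode: mode}
		fmt.Printf("%q: valid=%v effective=%q\n", mode, cfg.Validate() == nil, cfg.Effective())
	}
}
```

A string type, unlike a bool, can gain further modes later without a breaking config change.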
a71bc14 to e835a71
Fixes #1214
Description
Adds a new allow_startup_failure boolean config option to the profilerreceiver. When set to true, errors during Start() are logged but not returned to the OTel Collector, so it keeps running without profiling instead of failing entirely.
The Problem
eBPF profiler startup can fail for reasons outside the user's control
(old kernel, missing permissions, containerized environments). Right now
that failure takes down the entire collector, including its logs and metrics pipelines,
which are usually more critical than profiling.
Solution
Added allow_startup_failure to the config. When true, Start() catches the error, logs it, and returns nil, so the collector continues normally.
Also extracted a small profilerController interface to decouple the internal Controller from the concrete type; this made it possible to add the first unit tests for the collector/internal package.
Testing
Added three unit tests:
true → returns nil, logs error
false → propagates error