Amend -off-cpu-threshold value#316
Conversation
| // Try to fit this PID space scaled down with cfg.OffCPUThreshold into | ||
| // this map. | ||
| adaption["sched_times"] = (4194304 / support.OffCPUThresholdMax) * cfg.OffCPUThreshold | ||
| adaption["sched_times"] = (4194304 * cfg.OffCPUThreshold) / support.OffCPUThresholdMax |
There was a problem hiding this comment.
Multiply first to prevent rounding inaccuracies.
|
We should have better unit testing to avoid this kind of regressions |
We should definitely have better unit testing. The current test code coverage for this project is not impressive. But in the case mentioned in the PR description, the error came from a Cilium ebpf function, hard to trigger/catch in a unit tests. Btw, this PR adds a check for the off-cpu threshold value in |
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
|
Sorry for the late comment on this. Wouldn't it be nicer to define this chance as a rate See https://pkg.go.dev/runtime#SetMutexProfileFraction for prior art that uses this approach. |
I agree that the current accepted values limit the user. Why not directly using a probability 1/X? (For example |
Sync from upstream (2025-03-12) Florian Lehner <florianl@users.noreply.github.com> symblib: expose API for single point lookups (open-telemetry#380) Co-authored-by: GitHub <noreply@github.com> Tolya Korniltsev <korniltsev.anatoly@gmail.com> chore: remove unused controller.Config fields (open-telemetry#387) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> libpf: drop unused code (open-telemetry#386) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> tracehandler: drop metadataWarnInhib (open-telemetry#385) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> Go: update to go.opentelemetry.io/otel@v1.35.0 (open-telemetry#383) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> processmanager: Don't synchronize a process that's waiting cleanup (open-telemetry#379) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> CI: use latest LTS kernel in tests (open-telemetry#382) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> Makefile: add cargo clean to target clean (open-telemetry#381) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Switch semantics for process.executable.name (open-telemetry#306) Co-authored-by: GitHub <noreply@github.com> Tim Rühsen <tim.ruhsen@elastic.co> Stabilize CI / integration tests (open-telemetry#378) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> Docker fixup (open-telemetry#375) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> Docker: fix rust set up (open-telemetry#371) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> tracer: attach to all kprobes with prefix for off CPU profiling (open-telemetry#370) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> Go: update to Go 1.23 (open-telemetry#372) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> support: generate *ProcInfo types with cgo (open-telemetry#367) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> process: reuse and preallocate memory (open-telemetry#355) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> rust: preparations to integrate Rust (open-telemetry#360) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Switch to OTel metrics (open-telemetry#348) Co-authored-by: GitHub <noreply@github.com> Tolya Korniltsev <korniltsev.anatoly@gmail.com> cargo: remove unused workspace dependency declarations (open-telemetry#364) Co-authored-by: GitHub <noreply@github.com> Tolya Korniltsev <korniltsev.anatoly@gmail.com> reporter: add custom gRPC dial options (open-telemetry#363) Co-authored-by: GitHub <noreply@github.com> umanwizard <brennan@umanwizard.com> Various fixes to node/V8 (open-telemetry#333) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> doc: fix path of tooling (open-telemetry#361) Co-authored-by: GitHub <noreply@github.com> OpenTelemetry Bot <107717825+opentelemetrybot@users.noreply.github.com> Add FOSSA scanning workflow (open-telemetry#357) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> rust: use macro for debug output (open-telemetry#356) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> symblib/gosym: add single point lookup (open-telemetry#346) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> README: provide devfiler v0.14.0 (open-telemetry#354) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> CI: skip environment setup (open-telemetry#353) Co-authored-by: GitHub <noreply@github.com> Richard Chukwu <79311274+RichardChukwu@users.noreply.github.com> Improve contributor guide (open-telemetry#349) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Fix build (open-telemetry#350) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> processinfo: refactor process metadata (open-telemetry#344) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> reporter/pdata: do no generate profiles if there are no events (open-telemetry#347) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> README: provide devfiler v0.13.0 (open-telemetry#343) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> processmanager: Fix process exit regression (open-telemetry#337) (open-telemetry#338) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> libpf: drop Hash64 (open-telemetry#340) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> cargo: set license field (open-telemetry#336) Co-authored-by: GitHub <noreply@github.com> Damien Mathieu <42@dmathieu.com> Use dummy support for any non-arm64 and non-amd64 archs (open-telemetry#335) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> rust: drop anyhow dependency (open-telemetry#334) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> support: use cgo to generate Go constants from eBPF (open-telemetry#332) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> processmanager: Don't log inside critical areas (open-telemetry#328) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> CI: add test for Rust components (open-telemetry#326) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> processmanager: simplify API and return early (open-telemetry#325) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Add Rust native symbolization library and C API wrapper (open-telemetry#267) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Metrics for trace event perf event monitor (open-telemetry#322) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Delayed processing for ProcessManager.pidToProcessInfo (open-telemetry#321) Co-authored-by: GitHub <noreply@github.com> Christos Kalkanis <christos.kalkanis@elastic.co> Rework SymbolizationComplete (open-telemetry#307) Co-authored-by: GitHub <noreply@github.com> Tim Rühsen <tim.ruhsen@elastic.co> Amend -off-cpu-threshold value (open-telemetry#316) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> reporter/collector: fix reporting issue (open-telemetry#319) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> reporter: move pkg samples from internal to public (open-telemetry#314) Co-authored-by: GitHub <noreply@github.com> Florian Lehner <florianl@users.noreply.github.com> README: provide devfiler v0.11.0 (open-telemetry#313) Co-authored-by: GitHub <noreply@github.com>
This amends the
-off-cpu-thresholdvalues toBefore, a value of 1000 disabled off-cpu profiling, thus enabling reporting of all events wasn't possible, which isn't what users might expect. The value of 0 was "valid" but caused a division-by-zero panic.
Additionally, when introducing new configuration variables, a default of non-zero (or empty string) may cause issues in other code bases. For example, our experimental OTEL collector/receiver stopped working, because the newly introduced off-cpu threshold value wasn't initialized with
support.OffCPUThresholdMax(not easy to know when not closely following the PRs in this repo). A default of 0 for the off-cpu threshold naturally mitigates this.