fix(tracer): preserve keep/drop possibility for OTel bridge on unsampled spans#4631
Conversation
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: 45ff255 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
🚀 New features to boost your workflow:
|
BenchmarksBenchmark execution time: 2026-04-08 16:03:59 Comparing candidate commit 45ff255 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 215 metrics, 9 unstable metrics.
|
…led spans (#4631) Fixes the OTel bridge's handling of unsampled spans in `FromGenericCtx`. Previously, when an OTel parent context had `IsSampled() == false`, the bridge set `samplingDecision = decisionDrop` directly on the trace. Bypassing the atomic Compare-And-Swap (CAS) semantics that `keep()` and `drop()` rely on. This meant: - Error spans could never rescue the trace as `keep()` only CAS from `decisionNone`, not `decisionDrop` - The behavior diverged from the native DD tracer, where P0 traces are not hard-dropped client-side The fix leaves `samplingDecision` as `decisionNone` for drop decisions while still setting the P0 priority and locking the trace against resampling. This preserves the OTel sampling intent while restoring the native DD keep/drop CAS flow. The bug has been introduced in `v2.6.0` following: #4238 Fixes #4624 Discovered during investigation of [APMS-19054](https://datadoghq.atlassian.net/browse/APMS-19054) — a customer upgrading to dd-trace-go v2.6.0 + OTel observed `trace.*` metrics dropping to near-zero under low sampling rates when client-side stats were disabled. - [ ] Changed code has unit tests for its functionality at or near 100% coverage. - [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag. - [ ] There is a benchmark for any new code, or changes to existing code. - [ ] If this interacts with the agent in a new way, a system test has been added. - [ ] New code is free of linting errors. You can check this by running `make lint` locally. - [ ] New code doesn't break existing tests. You can check this by running `make test` locally. - [ ] Add an appropriate team label so this PR gets put in the right place for the release notes. - [ ] All generated files are up to date. You can check this by running `make generate` locally. - [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally. [APMS-19054]: https://datadoghq.atlassian.net/browse/APMS-19054?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: kakkoyun <kakkoyun@users.noreply.github.com> Co-authored-by: benjamin.debernardi <benjamin.debernardi@datadoghq.com>
…led spans (#4631) Fixes the OTel bridge's handling of unsampled spans in `FromGenericCtx`. Previously, when an OTel parent context had `IsSampled() == false`, the bridge set `samplingDecision = decisionDrop` directly on the trace. Bypassing the atomic Compare-And-Swap (CAS) semantics that `keep()` and `drop()` rely on. This meant: - Error spans could never rescue the trace as `keep()` only CAS from `decisionNone`, not `decisionDrop` - The behavior diverged from the native DD tracer, where P0 traces are not hard-dropped client-side The fix leaves `samplingDecision` as `decisionNone` for drop decisions while still setting the P0 priority and locking the trace against resampling. This preserves the OTel sampling intent while restoring the native DD keep/drop CAS flow. The bug has been introduced in `v2.6.0` following: #4238 Fixes #4624 Discovered during investigation of [APMS-19054](https://datadoghq.atlassian.net/browse/APMS-19054) — a customer upgrading to dd-trace-go v2.6.0 + OTel observed `trace.*` metrics dropping to near-zero under low sampling rates when client-side stats were disabled. - [ ] Changed code has unit tests for its functionality at or near 100% coverage. - [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag. - [ ] There is a benchmark for any new code, or changes to existing code. - [ ] If this interacts with the agent in a new way, a system test has been added. - [ ] New code is free of linting errors. You can check this by running `make lint` locally. - [ ] New code doesn't break existing tests. You can check this by running `make test` locally. - [ ] Add an appropriate team label so this PR gets put in the right place for the release notes. - [ ] All generated files are up to date. You can check this by running `make generate` locally. - [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally. [APMS-19054]: https://datadoghq.atlassian.net/browse/APMS-19054?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: kakkoyun <kakkoyun@users.noreply.github.com> Co-authored-by: benjamin.debernardi <benjamin.debernardi@datadoghq.com>
What does this PR do?
Fixes the OTel bridge's handling of unsampled spans in
FromGenericCtx. Previously, when an OTel parent context hadIsSampled() == false, the bridge setsamplingDecision = decisionDropdirectly on the trace. Bypassing the atomic Compare-And-Swap (CAS) semantics thatkeep()anddrop()rely on.This meant:
keep()only CAS fromdecisionNone, notdecisionDropThe fix leaves
samplingDecisionasdecisionNonefor drop decisions while still setting the P0 priority and locking the trace against resampling. This preserves the OTel sampling intent while restoring the native DD keep/drop CAS flow.The bug has been introduced in
v2.6.0following: #4238Motivation
Fixes #4624
Discovered during investigation of APMS-19054 — a customer upgrading to dd-trace-go v2.6.0 + OTel observed
trace.*metrics dropping to near-zero under low sampling rates when client-side stats were disabled.Reviewer's Checklist
make lintlocally.make testlocally.make generatelocally.make fix-moduleslocally.