
fix(ddtrace/opentelemetry): fix ignored sampling decision from otel #4238

Merged
dd-mergequeue[bot] merged 2 commits into main from ben.db/otel-sampling-decision
Dec 23, 2025

Conversation

Member

@genesor genesor commented Dec 9, 2025

What does this PR do?

This PR fixes a bug in which the tracer does not respect the parent's sampling decision when the parent span is an OTel span rather than a Datadog span.

Fixes #3639.
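
To make the intended behavior concrete, here is a minimal sketch of inheriting a sampling decision from an OTel parent. It is not the code from this PR: `parentContext`, `isSampled`, and `inheritedPriority` are hypothetical names, and the sketch assumes only that the parent carries the W3C trace-flags byte (sampled bit `0x01`) and that Datadog priorities 0 and 1 mean auto-reject and auto-keep.

```go
package main

// flagSampled is the "sampled" bit of the W3C Trace Context trace-flags byte.
const flagSampled byte = 0x01

// Datadog sampling priorities used in this sketch.
const (
	priorityAutoReject = 0 // often written P0
	priorityAutoKeep   = 1
)

// parentContext is a hypothetical stand-in for an OTel parent span context.
type parentContext struct {
	traceFlags byte
}

// isSampled reports whether the parent marked the trace as sampled.
func (p parentContext) isSampled() bool {
	return p.traceFlags&flagSampled != 0
}

// inheritedPriority maps the parent's decision to a priority instead of
// letting the local sampler decide again (assumed shape of the fix).
func inheritedPriority(p parentContext) int {
	if p.isSampled() {
		return priorityAutoKeep
	}
	return priorityAutoReject
}
```

With this shape, a child started under an unsampled OTel parent would carry priority 0 rather than being resampled locally.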

Motivation

Old ER ticket APMS-15887

takes over #3718

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • New code is free of linting errors. You can check this by running ./scripts/lint.sh locally.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!


pr-commenter Bot commented Dec 9, 2025

Benchmarks

Benchmark execution time: 2025-12-23 10:10:22

Comparing candidate commit ff7dd2a in PR branch ben.db/otel-sampling-decision with baseline commit e744f8a in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 0 unstable metrics.


codecov Bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 47.61905% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.73%. Comparing base (e744f8a) to head (ff7dd2a).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ddtrace/tracer/spancontext.go | 0.00% | 11 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| ddtrace/opentelemetry/tracer.go | 94.50% <100.00%> (+0.67%) ⬆️ |
| ddtrace/tracer/spancontext.go | 88.35% <0.00%> (-1.83%) ⬇️ |

... and 4 files with indirect coverage changes


@genesor genesor marked this pull request as ready for review December 10, 2025 11:45
@genesor genesor requested a review from a team as a code owner December 10, 2025 11:45
@genesor genesor requested a review from darccio December 10, 2025 12:52
Member

@darccio darccio left a comment


LGTM

Comment thread ddtrace/opentelemetry/tracer_test.go Outdated
Comment thread ddtrace/opentelemetry/tracer_test.go Outdated
Comment thread ddtrace/opentelemetry/tracer.go
Comment thread ddtrace/opentelemetry/tracer.go
Contributor

@mtoffl01 mtoffl01 left a comment


👏

Comment thread ddtrace/opentelemetry/tracer.go
@genesor genesor force-pushed the ben.db/otel-sampling-decision branch from 71af6bc to ff7dd2a Compare December 23, 2025 09:57
Member Author

genesor commented Dec 23, 2025

/merge


dd-devflow-routing-codex Bot commented Dec 23, 2025

View all feedbacks in Devflow UI.

2025-12-23 10:04:11 UTC ℹ️ Start processing command /merge


2025-12-23 10:04:19 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2025-12-23 10:26:10 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 37m (p90).


2025-12-23 10:54:49 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue Bot merged commit 746bea0 into main Dec 23, 2025
282 checks passed
@dd-mergequeue dd-mergequeue Bot deleted the ben.db/otel-sampling-decision branch December 23, 2025 10:54
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Apr 8, 2026
…led spans (#4631)

### What does this PR do?

Fixes the OTel bridge's handling of unsampled spans in `FromGenericCtx`. Previously, when an OTel parent context had `IsSampled() == false`, the bridge set `samplingDecision = decisionDrop` directly on the trace, bypassing the atomic compare-and-swap (CAS) semantics that `keep()` and `drop()` rely on.

This meant:
- Error spans could never rescue the trace, because `keep()` only performs its CAS from `decisionNone`, not from `decisionDrop`
- The behavior diverged from the native DD tracer, where P0 traces are not hard-dropped client-side

The fix leaves `samplingDecision` as `decisionNone` for drop decisions while still setting the P0 priority and locking the trace against resampling. This preserves the OTel sampling intent while restoring the native DD keep/drop CAS flow.
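
The difference between the old and new behavior can be sketched as follows. This is an illustrative model, not the tracer's implementation: `sketchTrace`, its fields, and the two `inheritUnsampled*` helpers are hypothetical, and only the names `samplingDecision`, `decisionNone`, `decisionDrop`, `keep()`, and `drop()` come from the description above.

```go
package main

import "sync/atomic"

// Decision values; the numeric encoding is an assumption of this sketch.
const (
	decisionNone uint32 = iota
	decisionDrop
	decisionKeep
)

// sketchTrace is a hypothetical stand-in for the tracer's trace type.
type sketchTrace struct {
	samplingDecision uint32 // accessed atomically
	priority         int    // 0 corresponds to P0
	locked           bool   // true means the trace must not be resampled
}

// keep records a keep decision only if none has been recorded yet.
func (t *sketchTrace) keep() bool {
	return atomic.CompareAndSwapUint32(&t.samplingDecision, decisionNone, decisionKeep)
}

// drop records a drop decision only if none has been recorded yet.
func (t *sketchTrace) drop() bool {
	return atomic.CompareAndSwapUint32(&t.samplingDecision, decisionNone, decisionDrop)
}

// inheritUnsampledOld models the pre-fix behavior: the decision is forced
// to drop, so a later keep() can never succeed.
func inheritUnsampledOld(t *sketchTrace) {
	t.priority = 0
	t.locked = true
	atomic.StoreUint32(&t.samplingDecision, decisionDrop)
}

// inheritUnsampledNew models the fix: priority and lock are set, but the
// decision stays at decisionNone so keep() and drop() still work via CAS.
func inheritUnsampledNew(t *sketchTrace) {
	t.priority = 0
	t.locked = true
}
```

In this model, calling `keep()` after `inheritUnsampledOld` returns false, while after `inheritUnsampledNew` it returns true, which is the "error span rescues the trace" case described in the list above.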

The bug was introduced in `v2.6.0` by #4238.

### Motivation

Fixes #4624

Discovered during investigation of [APMS-19054](https://datadoghq.atlassian.net/browse/APMS-19054) — a customer upgrading to dd-trace-go v2.6.0 + OTel observed `trace.*` metrics dropping to near-zero under low sampling rates when client-side stats were disabled.

### Reviewer's Checklist

- [ ] Changed code has unit tests for its functionality at or near 100% coverage.
- [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag.
- [ ] There is a benchmark for any new code, or changes to existing code.
- [ ] If this interacts with the agent in a new way, a system test has been added.
- [ ] New code is free of linting errors. You can check this by running `make lint` locally.
- [ ] New code doesn't break existing tests. You can check this by running `make test` locally.
- [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
- [ ] All generated files are up to date. You can check this by running `make generate` locally.
- [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally.

[APMS-19054]: https://datadoghq.atlassian.net/browse/APMS-19054?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: kakkoyun <kakkoyun@users.noreply.github.com>
Co-authored-by: benjamin.debernardi <benjamin.debernardi@datadoghq.com>
genesor added a commit that referenced this pull request Apr 14, 2026
…led spans (#4631)

genesor added a commit that referenced this pull request Apr 14, 2026
…led spans (#4631)



Development

Successfully merging this pull request may close these issues.

[BUG]: Sampling decision in OpenTelemetry span context is ignored

3 participants