Skip to content

datadog: honor extracted sampling decisions#30577

Merged
RyanTheOptimist merged 6 commits intoenvoyproxy:mainfrom
DataDog:david.goffredo/honor-extracted-sampling-decision
Nov 8, 2023
Merged

datadog: honor extracted sampling decisions#30577
RyanTheOptimist merged 6 commits intoenvoyproxy:mainfrom
DataDog:david.goffredo/honor-extracted-sampling-decision

Conversation

@dgoffredo
Copy link
Copy Markdown
Contributor

@dgoffredo dgoffredo commented Oct 27, 2023

Commit Message: honor extracted sampling decisions in the datadog tracer
Additional Description: See below.
Risk Level: low
Testing: See below.
Docs Changes: N/A
Release Notes (under bug fixes): honor extracted sampling decisions in the datadog tracer
Platform Specific Features: N/A

The Issue and Its Solution

Envoy users within Datadog noticed another bug caused by the migration of the Datadog tracer from OpenTracing to dd-trace-cpp that was released in Envoy 1.27.

Sometimes the Envoy spans are missing from the flame graphs of traces, and sometimes the entire subtree starting at the Envoy span is missing from the flame graphs of traces.

Datadog tracing in Envoy used to use the OpenTracing driver, but it has not since Envoy v1.27.

The OpenTracing driver sets the "sampling priority" to zero ("drop the trace") whenever bool Envoy::Tracing::Decision::traced is false. It does this regardless of whether a sampling decision was extracted from incoming request headers (e.g. x-b3-sampled or x-datadog-sampling-priority).

I followed suit in the non-OpenTracing Datadog tracer, but I missed something. The OpenTracing-based Datadog tracer, dd-opentracing-cpp, ignores sampling changes whenever a sampling decision has already been extracted from an incoming request.

So, in the OpenTracing-based code, when a sampling decision is extracted from an incoming request, here's what happened:

  1. The trace's sampling decision is set to whatever was extracted.
  2. If Envoy's sampling decision was "drop," then that decision is ignored even through the OpenTracing code tries to set the trace's decision accordingly.

The result is that extracted sampling decisions are always honored.

With the newer dd-trace-cpp based code, when a sampling decision is extracted from an incoming request, here's what happens:

  1. The trace's sampling decision is set to whatever was extracted.
  2. If Envoy's sampling decision was "drop," then the trace is dropped regardless of what was extracted.

So, if Envoy's configured sampling rate is less than 100%, users will see incomplete traces whenever Envoy is not the "root service." In some such traces, Envoy spans and possibly its descendants are missing.

This is a bug.

This revision restores the logic from the OpenTracing-based tracer: An Envoy decision of "drop" forces a dropped trace only if we haven't extracted a decision from an incoming request.

Testing

I added a unit test that checks a table of possiblities.

I also built Envoy and configured it to reverse proxy httpbin.org/headers with 1% sampling. Then I hit it with curl using a variety of propagation headers, such as x-datadog-sampling-priority and traceparent, and verified that the resulting sampling decision was consistent with what I sent.

When I don't send sampling information in the request headers, the resulting sampling decisions are consistent with Envoy's configured 1% rate.

Bug Fix

Please consider this PR as a bug fix so that it might be backported onto the v1.27 and v1.28 release branches.

- If a trace sampling decision is extracted from request headers,
  use it regardless of Envoy's sampling configuration.
- Refactor Tracer::startSpan to avoid the move assignment operator of
  datadog::tracing::Span, which is deleted in newer versions of
  dd-trace-cpp.
- Alter the comments above Tracer::startSpan to more accurately reflect
  changes made previously in envoyproxy#29932.

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@dgoffredo
Copy link
Copy Markdown
Contributor Author

@cgilmour, my teammate who figured out what the problem was.

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@lizan
Copy link
Copy Markdown
Member

lizan commented Oct 30, 2023

/assign-from @envoyproxy/envoy-maintainers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/envoy-maintainers assignee is @snowp

🐱

Caused by: a #30577 (comment) was created by @lizan.

see: more, trace.

@cgilmour
Copy link
Copy Markdown
Contributor

cgilmour commented Nov 2, 2023

Hi @snowp, it'd be appreciated to have some movement on this one. Also I think assigning Matt Klein for review wasn't intended exclusively and another maintainer could take that on. Thanks!

@ggreenway
Copy link
Copy Markdown
Member

/assign-from @envoyproxy/envoy-maintainers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/envoy-maintainers assignee is @RyanTheOptimist

🐱

Caused by: a #30577 (comment) was created by @ggreenway.

see: more, trace.

@envoyproxy envoyproxy deleted a comment from repokitteh-read-only bot Nov 2, 2023
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for the investigation and bug fix! The PR description is excellent and really made it easy to understand. LGTM!

Please also update the changelog to reflect this fix.

@dgoffredo
Copy link
Copy Markdown
Contributor Author

Thanks for the kind words, @RyanTheOptimist. I'll update the changelog.

Also, what is the process for having this PR considered for backporting onto v1.27 and v1.28 as a bug fix?

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

Also, what is the process for having this PR considered for backporting onto v1.27 and v1.28 as a bug fix?

Good question. @phlax, how do we usually handle bugfix backports?

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@dgoffredo dgoffredo force-pushed the david.goffredo/honor-extracted-sampling-decision branch from 3755761 to d4c4468 Compare November 3, 2023 14:12
@phlax
Copy link
Copy Markdown
Member

phlax commented Nov 3, 2023

/backport

Good question. @phlax, how do we usually handle bugfix backports?

mark it for backport and then if a pr hasnt been raised already i look at when raising backports

@dgoffredo if you are up for doing the backports the target branches are:

  • release/v1.28
  • release/v1.27
  • etc

@repokitteh-read-only repokitteh-read-only bot added the backport/review Request to backport to stable releases label Nov 3, 2023
@dgoffredo
Copy link
Copy Markdown
Contributor Author

@dgoffredo if you are up for doing the backports the target branches are [...]

Will do.

@dgoffredo
Copy link
Copy Markdown
Contributor Author

I'm working on the backport PRs for release/v1.27 and release/v1.28 now.

Should we first merge these changes into main to finalize the behavior before I propose the backports?

dgoffredo added a commit to DataDog/envoy that referenced this pull request Nov 6, 2023
…onto v1.27)

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
dgoffredo added a commit to DataDog/envoy that referenced this pull request Nov 6, 2023
…onto v1.28)

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@dgoffredo
Copy link
Copy Markdown
Contributor Author

I've created two additional pull requests, one for backporting these changes to the v1.27 release branch, and another for the v1.28 release branch:

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

@phlax are we OK to merge this PR or is there any addition process we need to follow?

mattklein123 pushed a commit that referenced this pull request Nov 8, 2023
…27) (#30750)

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
mattklein123 pushed a commit that referenced this pull request Nov 8, 2023
…28) (#30751)

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@dgoffredo
Copy link
Copy Markdown
Contributor Author

Matt merged the backports onto release/v1.27 and release/v1.28.

@phlax Are we good to merge this into main?

@RyanTheOptimist RyanTheOptimist merged commit dfac3b9 into envoyproxy:main Nov 8, 2023
@phlax phlax removed the backport/review Request to backport to stable releases label Feb 6, 2024
SeanKilleen pushed a commit to SeanKilleen/envoy that referenced this pull request Apr 3, 2024
…onto v1.28) (envoyproxy#30751)

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: Sean Killeen <SeanKilleen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants