open-telemetry · jmacd · Sep 8, 2025 · Sep 8, 2025 · Sep 8, 2025 · Oct 1, 2025
@@ -0,0 +1,236 @@
+---
+title: OpenTelemetry Sampling update
+linkTitle: OpenTelemetry Sampling update
+date: 2025-10-01
+author: >-
+  [Joshua MacDonald](https://github.com/jmacd) (Microsoft)
+sig: SIG Sampling
+# prettier-ignore
+cSpell:ignore:
+---
+
+## Introduction
+
+When OpenTelemetry first launched its Tracing specification over five
+years ago, there was a [conspicuous "TODO" involving probability
+sampling](https://github.com/open-telemetry/opentelemetry-specification/issues/1413)
+left behind. It warned users of inconsistent results unless the sampler
+was used only at the root span of a trace.
+
+This meant OpenTelemetry users could not safely configure independent
+probability sampling policies in a distributed system, as the
+specification did not cover how to achieve consistency. This feature,
+the ability to configure unequal-probability sampling policies within
+a trace and still expect complete traces, is something users expect;
+it lets service owners configure independent limits on the volume of
+tracing data collected in a system.
+
+## Consistency by example
+
+To see why consistency is important, consider a system with a Frontend
+and two backend services, Cache and Storage. The Frontend handles
+high-value user requests, so Frontend requests are sampled at
+100%. The root span is significant because errors are visible to the
+end user, so it forms the basis of an SLO measurement in this example
+and the system operator is willing to collect every span.
+
+The Cache service receives a relatively high volume of requests, so to
+save on observability costs, this service is configured to sample
+1-in-1000 traces.  Because of the high rate of requests, this 0.1%
+policy ensures the Cache service produces enough traces for many
+observability scenarios.
+
+The Storage service receives a relatively low volume of requests,
+compared with the Cache service, but still a lot of requests compared
+with the Frontend service; Storage is configured to sample 1-in-10
+traces.
+
+When we ask for consistency in distributed tracing, the goal is to
+ensure that when the smallest-probability sampler (here 0.1%) chooses
+to sample, the higher-probability samplers make the same
+decision. Here are the properties we can rely on thanks to
+consistency:
+
+- All Frontend spans are collected
+- 1-in-10 traces will include both Frontend and Storage spans
+- 1-in-1000 traces will be complete.
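The three properties above can be checked with a small simulation. This is only an illustrative sketch: it assumes each trace carries one shared random value `r` in `[0, 1)` and that every service keeps its span when `r` is below its own probability. The names `PROBABILITIES`, `sampled_services`, and `simulate` are hypothetical and do not come from any OpenTelemetry SDK.

```python
import random

# Sampling probabilities taken from the example above.
PROBABILITIES = {"Frontend": 1.0, "Storage": 0.1, "Cache": 0.001}


def sampled_services(r: float) -> set:
    """Return the services that keep their span for a trace whose shared
    random value is r, with 0 <= r < 1 (an assumption of this sketch)."""
    # Every service compares the same r against its own probability, so a
    # "yes" from a low-probability sampler implies a "yes" from every
    # higher-probability sampler.
    return {name for name, p in PROBABILITIES.items() if r < p}


def simulate(traces: int, seed: int = 0) -> dict:
    """Count how many simulated traces keep each combination of services."""
    rng = random.Random(seed)
    counts = {"frontend": 0, "frontend+storage": 0, "complete": 0}
    for _ in range(traces):
        kept = sampled_services(rng.random())
        counts["frontend"] += "Frontend" in kept
        counts["frontend+storage"] += {"Frontend", "Storage"} <= kept
        counts["complete"] += len(kept) == len(PROBABILITIES)
    return counts
```

Calling `simulate(100_000)` should report every trace under `frontend`, roughly one in ten under `frontend+storage`, and roughly one in a thousand under `complete`. Because the decisions are nested, a complete trace is always also counted in the other two buckets.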
+
+## Problems with TraceIdRatioBased
+
+OpenTelemetry's initial tracing specification featured the
+`TraceIdRatioBased` probability sampler. It was intended to be
+consistent from the start; however, the working group had a hard time
+agreeing on specific details.  The rest of the specification was
+ready to release; the leftover TODO about sampling consistency was
+mitigated by the fact that root-only sampling was the norm for
+contemporary open-source tracing systems.
+
+The "ratio-based" part of the name hints at the form of solution to
+the consistent sampling problem:
+
+1. Consider the TraceID value as an N-bit random value
+2. Compute the Nth power of two
+3. Multiply the power of two by the ratio to obtain a "threshold" value
+4. Compare the TraceID with the threshold value to reach a consistent decision.
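The four steps above can be sketched as follows. Which N bits to use and the direction of the comparison are assumptions made purely for illustration (they are exactly the details the original specification left open), and the function names are hypothetical.

```python
def ratio_threshold(ratio: float, n_bits: int) -> int:
    """Steps 2 and 3: scale 2**N by the sampling ratio."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within [0, 1]")
    return int(ratio * (1 << n_bits))


def ratio_based_decision(trace_id: int, ratio: float, n_bits: int = 64) -> bool:
    """Step 1 and step 4 of the outline.

    Assumption: the low n_bits of the TraceID are treated as the random
    value, and a span is sampled when that value is below the threshold.
    """
    value = trace_id & ((1 << n_bits) - 1)  # step 1: N-bit value
    return value < ratio_threshold(ratio, n_bits)  # step 4: compare
```

Two samplers with different ratios that see the same TraceID compare the same value, so the lower-ratio sampler can only say "yes" when the higher-ratio sampler does too. The sketch is only as good as the randomness of the bits it reads.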
+
+We had trouble agreeing on this form of solution because of a larger
+question. *Which bits of the TraceID can we trust to be random?*
+Without foundational requirements about randomness, OpenTelemetry
+could not specify a consistent sampling decision.
+
+Lacking firm randomness requirements, a common approach is to use a
+hash function instead. Using `Hash(TraceID)` to produce N bits of
+randomness works reasonably well if the hash function is good, but
+this approach is not suitable in a cross-language SDK specification.
+
+The details here are tricky. How many bits of the TraceID would be
+enough? Could every language SDK efficiently implement the required
+logic?
+
+## Introducing W3C TraceContext Level 2
+
+OpenTelemetry defines its TraceID based on the W3C TraceContext
+specification. This was a [_Candidate
+Recommendation_](https://www.w3.org/standards/types/#x4-2-candidate-recommendation)
+at the time of the initial OpenTelemetry Tracing specification; it was
+later finished as a [W3C
+Recommendation](https://www.w3.org/standards/types/#x5-1-recommendation)
+in the [W3C Trace Context Level
+1](https://www.w3.org/TR/trace-context-1/) standard.
+
+OpenTelemetry turned to the W3C Trace Context working group with this
+larger problem in mind. Could we, including both OpenTelemetry and
+non-OpenTelemetry tracing systems, agree on how many bits of the
+TraceID were random?
+
+The [W3C TraceContext Level 2](https://www.w3.org/TR/trace-context-2/)
+specification, currently a [Candidate Recommendation
+Draft](https://www.w3.org/standards/types/#x4-2-1-candidate-recommendation-draft),
+answers this question with a new [`Random` Trace Flag
+value](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag). With
+this flag, the new W3C specification requires the least-significant 56
+bits of the TraceID to be "sufficiently" random. This means, for
+example, when we [represent the TraceID as 32 hexadecimal
+digits](https://opentelemetry.io/docs/specs/otel/trace/api/#retrieving-the-traceid-and-spanid),
+the last, rightmost 14 digits are random. Represented as 16 bytes, the
+rightmost 7 are random.
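As a rough sketch of what this means in code, the 56 random bits can be read from either representation of the TraceID. The function names are hypothetical; the `0x02` bit for the `Random` flag follows the W3C Trace Context Level 2 draft.

```python
RANDOM_FLAG = 0x02  # `Random` trace flag bit in W3C Trace Context Level 2


def has_random_flag(trace_flags: int) -> bool:
    """True when the trace flags byte has the Random bit set."""
    return bool(trace_flags & RANDOM_FLAG)


def randomness_from_hex(trace_id_hex: str) -> int:
    """Read the 56 random bits from a 32-hex-digit TraceID string."""
    if len(trace_id_hex) != 32:
        raise ValueError("TraceID must be 32 hexadecimal digits")
    return int(trace_id_hex[-14:], 16)  # rightmost 14 hex digits


def randomness_from_bytes(trace_id: bytes) -> int:
    """Read the 56 random bits from a 16-byte TraceID."""
    if len(trace_id) != 16:
        raise ValueError("TraceID must be 16 bytes")
    return int.from_bytes(trace_id[-7:], "big")  # rightmost 7 bytes
```

Both helpers return the same integer for the same TraceID, so a component can use whichever representation it already has on hand.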
+
+OpenTelemetry is adopting the W3C TraceContext Level 2 draft
+recommendation as the foundation for consistent sampling. All SDKs
+will set the `Random` flag and ensure that TraceIDs they generate have
+the required 56 bits of randomness.
+
+## Consistent sampling threshold for rejection
+
+Returning to the "ratio-based" example: now that we can obtain 56 bits
+of randomness from a TraceID, the decision process outlined above
+calls for a threshold for comparison.
+
+There was one more thing we as a group wanted for the probability
+sampling specification: a way for SDKs to communicate their sampling
+decisions, both to one another via TraceContext and on the
+collection path after spans are finished.
+
+The new specification lets OpenTelemetry components communicate about
+"how much sampling" has been applied to a span. This supports many
+advanced sampling architectures:
+
+- Accurate counting of sampled spans
+- Consistent rate-limited sampling
+- Adaptive sampling
+- Consistent multi-stage sampling.
+
+The key points of our design are summarized next; [curious readers
+will want to see the full
+specification](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/).
+
+Given the number of bits, there is not much left to specify; however, we
+wanted an approach that:
+
+- Supports both lexicographical and numerical comparison
+- Minimizes TraceContext overhead
+- Is legible for advanced OpenTelemetry users.
+
+Our approach is based on what we call the _sampling threshold for
+rejection_. Given randomness value `R` and threshold for rejection
+`T`, we make a positive sampling decision when `T <= R`. Equivalently,
+we make a negative sampling decision when `T > R`.
+
+By design, the threshold value `0` corresponds with 100% sampling, so
+users can easily recognize this configuration. Abstractly, both `R`
+and `T` have a range of 56 bits and can be represented as unsigned
+integers, 7-byte slices, or 14-hex-digit strings.
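A minimal sketch of the decision rule, using the unsigned-integer representation, might look like this. The conversion from probability to threshold, `T = (1 - p) * 2^56`, follows from `T = 0` meaning 100% sampling. The rounding and clamping choices and the function names are assumptions of this sketch, not text from the specification.

```python
MAX_THRESHOLD = 1 << 56  # R and T both live in the range [0, 2**56)


def threshold_from_probability(p: float) -> int:
    """Convert a sampling probability into a rejection threshold T."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be within (0, 1]")
    # Assumption: round to the nearest integer and clamp into 56 bits so
    # that very small probabilities still yield a valid threshold.
    return min(round((1.0 - p) * MAX_THRESHOLD), MAX_THRESHOLD - 1)


def should_sample(threshold: int, randomness: int) -> bool:
    """Positive decision when T <= R; negative when T > R."""
    return threshold <= randomness
```

For instance, `threshold_from_probability(1.0)` is `0`, which every randomness value meets, and `threshold_from_probability(0.5)` is `2**55`, which rejects exactly the lower half of the range.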
+
+## OpenTelemetry TraceState
+
+The W3C TraceContext specification defines two HTTP headers for use in
+distributed tracing systems: the `traceparent` header, which contains
+version, TraceID, SpanID, and flags, and the `tracestate` header, which
+supports "vendor-specific" additions to the context. OpenTelemetry
+Tracing SDKs will soon begin adding an entry under the key "ot" in the
+`tracestate` header. Here's an example:
+
+```
+tracestate: ot=th:0
+```
+
+In a 100% sampling configuration, OpenTelemetry Tracing SDKs will
+insert `ot=th:0` in the TraceState. TraceState values, once entered in
+the context, are both propagated and recorded in the OpenTelemetry
+span data model. By design, the new OpenTelemetry TraceState value is
+only encoded and transmitted for positive sampling decisions; no
+`tracestate` header will appear as a result of negative sampling
+decisions.
+
+In this representation, sampling thresholds logically represent 14
+hexadecimal digits or 56 bits of information.
+
+However, to communicate the sampling threshold efficiently, we drop
+trailing zeros (except for `0` itself). This lets us limit threshold
+precision to fewer than 56 bits, which lowers the number of bytes per
+context. For example, a threshold can be limited to 4 hexadecimal digits
+to avoid carrying around 10 more bytes of precision. Here is an
+example tracestate indicating 1% sampling, limited to 12 bits of
+precision:
+
+```
+tracestate: ot=th:fd7
+```
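To see where `fd7` comes from, here is a hedged sketch of encoding and decoding a `th` value. It assumes precision is limited by simple truncation to `max_digits` hexadecimal digits, which is a simplification for illustration; the function names are hypothetical.

```python
MAX_THRESHOLD = 1 << 56
HEX_DIGITS = set("0123456789abcdef")


def encode_threshold(threshold: int, max_digits: int = 14) -> str:
    """Render a 56-bit threshold as a `th` value without trailing zeros."""
    if not 0 <= threshold < MAX_THRESHOLD:
        raise ValueError("threshold must fit in 56 bits")
    digits = f"{threshold:014x}"[:max_digits]  # truncate, do not round
    return digits.rstrip("0") or "0"  # keep a single "0" for 100% sampling


def decode_threshold(th: str) -> int:
    """Restore the dropped trailing zeros and parse the 56-bit threshold."""
    if not 1 <= len(th) <= 14 or not set(th) <= HEX_DIGITS:
        raise ValueError("th must be 1 to 14 lowercase hex digits")
    return int(th.ljust(14, "0"), 16)


def sampling_probability(threshold: int) -> float:
    """Fraction of randomness values R that satisfy T <= R."""
    return (MAX_THRESHOLD - threshold) / MAX_THRESHOLD
```

With these helpers, `decode_threshold("fd7")` is `0xfd700000000000`, and `sampling_probability` of that value is 41/4096, or about 1.001%. Its reciprocal, roughly 99.9, is the number of spans each sampled span stands for when extrapolating counts.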
+
+We gave a lot of consideration to backwards compatibility, but we also
+wanted to be sure we could always use the stated sampling threshold
+for extrapolation, in a reliable, statistical sense. With this in
+mind, there is one more OpenTelemetry TraceState value in our
+specification, a way to provide explicit randomness in the
+`tracestate` header.
+
+To enable consistent sampling and continue using non-random TraceIDs,
+for example, users can opt for explicit randomness:
+
+```
+tracestate: ot=rv:abcdef01234567
+```
+
+Explicit randomness values have a number of other uses in
+OpenTelemetry.
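The sketch below shows one way a component might read the `ot` entry and choose its source of randomness. It assumes the OpenTelemetry entry is a `;`-separated list of `key:value` pairs, and it prefers an explicit `rv` value over the TraceID when one is present. The function names are hypothetical, and error handling is reduced to the bare minimum.

```python
def parse_ot_tracestate(header: str) -> dict:
    """Extract the key:value fields of the `ot` entry from a tracestate header."""
    for member in header.split(","):
        key, sep, value = member.strip().partition("=")
        if sep and key == "ot":
            fields = {}
            for item in value.split(";"):
                name, colon, field_value = item.partition(":")
                if colon:
                    fields[name] = field_value
            return fields
    return {}  # no OpenTelemetry entry present


def randomness(trace_id_hex: str, fields: dict) -> int:
    """Use explicit randomness (rv) when present, else the TraceID's last 56 bits."""
    rv = fields.get("rv")
    source = rv if rv is not None else trace_id_hex[-14:]
    return int(source, 16)
```

For the header `ot=rv:abcdef01234567`, `randomness` returns `0xabcdef01234567` no matter which TraceID is supplied, which is what lets systems with non-random TraceIDs still take part in consistent sampling.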
+
+## Looking forward
+
+This post covers an essential upgrade to the OpenTelemetry Tracing
+specification, enabling a new generation of sampling components in
+both SDKs and Collector components. We couldn't cover everything here
+and plan to cover more in the future. 
+
+For now, here are some useful references including the four
+OpenTelemetry enhancement proposals that plotted our course:
+
+- [0168 Sampling Propagation](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0168-sampling-propagation.md)
+- [0170 Sampling Probability](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0170-sampling-probability.md)
+- [0235 Sampling Threshold in TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0235-sampling-threshold-in-trace-state.md)
+- [0250 Composite Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0250-Composite_Samplers.md)
+
+and our primary specification documents:
+
+- [Trace Probability Sampling](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/)
+- [Trace SDK Samplers](https://opentelemetry.io/docs/specs/otel/trace/sdk/#sampler)
+- [TraceID Randomness](https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceid-randomness)