[Draft] Sampling milestones blog post #7735
---
title: OpenTelemetry Sampling update
linkTitle: OpenTelemetry Sampling update
date: 2025-10-01
author: >-
  [Joshua MacDonald](https://github.com/jmacd) (Microsoft)
sig: SIG Sampling
# prettier-ignore
cSpell:ignore:
---

## Introduction

When OpenTelemetry first launched its Tracing specification over five
years ago, a [conspicuous "TODO" involving probability
sampling](https://github.com/open-telemetry/opentelemetry-specification/issues/1413)
was left behind: it warned users of inconsistent results except when
probability sampling was used at the root span of a trace.

This meant OpenTelemetry users could not safely configure independent
probability sampling policies in a distributed system, because the
specification did not cover how to achieve consistency. This feature,
the ability to configure unequal-probability sampling policies within
a trace and still expect complete traces, is something users expect:
it lets service owners configure independent limits on the volume of
tracing data collected in a system.

## Consistency by example

To see why consistency is important, consider a system with a Frontend
and two backend services, Cache and Storage. The Frontend handles
high-value user requests, so Frontend requests are sampled at 100%.
The root span is significant because its errors are visible to the
end user, so in this example it forms the basis of an SLO measurement
and the system operator is willing to collect every span.

The Cache service receives a relatively high volume of requests, so to
save on observability costs, this service is configured to sample
1-in-1000 traces. Because of the high rate of requests, this 0.1%
policy still produces enough Cache traces for many observability
scenarios.

The Storage service receives a relatively low volume of requests
compared with the Cache service, but still many requests compared
with the Frontend service; Storage is configured to sample 1-in-10
traces.

When we ask for consistency in distributed tracing, the goal is to
ensure that when the lowest-probability sampler (here, 0.1%) chooses
to sample, the higher-probability samplers make the same decision.
Thanks to consistency, we can rely on these properties:

- All Frontend spans are collected
- 1-in-10 traces will contain both Frontend and Storage spans
- 1-in-1000 traces will be complete.

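A quick way to convince yourself of these properties is a simulation. This is an illustrative sketch, not SDK code: it assumes every service in a trace sees one shared, uniformly random 56-bit value `R`, and that each service keeps the trace when `R` is at or above a threshold derived from its own probability (the threshold-for-rejection scheme described later in this post). The service names and rates are the hypothetical ones from this example, and `threshold`, `sampled_services`, and `simulate` are made-up helper names.

```python
import random

RANDOM_BITS = 56
MAX_R = 1 << RANDOM_BITS

# Hypothetical per-service sampling probabilities from the example above.
POLICIES = {"Frontend": 1.0, "Storage": 0.1, "Cache": 0.001}


def threshold(probability: float) -> int:
    """Threshold for rejection: a service keeps the trace when R >= T."""
    return round((1.0 - probability) * MAX_R)


THRESHOLDS = {name: threshold(p) for name, p in POLICIES.items()}


def sampled_services(r: int) -> set:
    """Each service decides independently, but all compare the same R."""
    return {name for name, t in THRESHOLDS.items() if r >= t}


def simulate(num_traces: int, seed: int = 1) -> dict:
    """Count how many traces each service keeps, checking consistency."""
    rng = random.Random(seed)
    counts = {name: 0 for name in POLICIES}
    for _ in range(num_traces):
        r = rng.getrandbits(RANDOM_BITS)  # one shared random value per trace
        kept = sampled_services(r)
        if "Cache" in kept:  # the rarest decision implies all the others
            assert kept == set(POLICIES)
        if "Storage" in kept:
            assert "Frontend" in kept
        for name in kept:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    print(simulate(100_000))
```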
## Problems with TraceIdRatioBased

OpenTelemetry's initial tracing specification featured the
`TraceIdRatioBased` probability sampler. It was intended to be
consistent from the start, but the working group had a hard time
agreeing on specific details. The rest of the specification was
ready to release, and the leftover TODO about sampling consistency was
mitigated by the fact that root-only sampling was the norm for
contemporary open-source tracing systems.

The "ratio-based" part of the name hints at the form of the solution to
the consistent sampling problem:

1. Consider the TraceID value as an N-bit random value
2. Compute the Nth power of two
3. Multiply the power of two by the ratio, yielding a "threshold" value
4. Compare the TraceID with the threshold value, yielding a consistent decision.

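Read literally, the four steps fit in a few lines. This is a sketch only: it assumes N = 56 and that the low 56 bits of the TraceID are uniformly random, which is precisely the assumption the working group could not yet rely on. `trace_id_ratio_decision` is a hypothetical helper, not an SDK API, and it compares against an acceptance threshold (sample when `r < threshold`); the final specification instead uses a threshold for rejection, described later.

```python
N = 56  # assumption: how many low-order TraceID bits are treated as random


def trace_id_ratio_decision(trace_id: int, ratio: float) -> bool:
    """Hypothetical ratio-based decision following the four steps above."""
    r = trace_id & ((1 << N) - 1)   # 1. treat N bits of the TraceID as random
    power = 1 << N                  # 2. compute the Nth power of two
    threshold = int(power * ratio)  # 3. scale by the ratio to get a threshold
    return r < threshold            # 4. compare for a consistent decision
```

Because a larger ratio only raises the threshold, any TraceID sampled at a small ratio is also sampled at every larger ratio, which is the consistency property from the example.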
We had trouble agreeing on this form of solution because of a larger
question: *which bits of the TraceID can we trust to be random?*
Without foundational requirements about randomness, OpenTelemetry
could not specify a consistent sampling decision.

Lacking firm randomness requirements, a common approach is to use a
hash function instead. Using `Hash(TraceID)` to produce N bits of
randomness works reasonably well if the hash function is good, but
this approach is not suitable for a cross-language SDK specification.

The details here are tricky. How many bits of the TraceID would be
enough? Could every language SDK efficiently implement the required
logic?

## Introducing W3C TraceContext Level 2

OpenTelemetry defines its TraceID based on the W3C TraceContext
specification. This was a [_Candidate
Recommendation_](https://www.w3.org/standards/types/#x4-2-candidate-recommendation)
at the time of the initial OpenTelemetry Tracing specification; it was
later finalized as a [W3C
Recommendation](https://www.w3.org/standards/types/#x5-1-recommendation)
in the [W3C Trace Context Level
1](https://www.w3.org/TR/trace-context-1/) standard.

OpenTelemetry turned to the W3C Trace Context working group with this
larger problem in mind. Could tracing systems, both OpenTelemetry and
non-OpenTelemetry, agree on how many bits of the TraceID were random?

The [W3C TraceContext Level 2](https://www.w3.org/TR/trace-context-2/)
specification, currently a [Candidate Recommendation
Draft](https://www.w3.org/standards/types/#x4-2-1-candidate-recommendation-draft),
answers this question with a new [`Random` Trace Flag
value](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag). When
this flag is set, the new W3C specification requires the least-significant 56
bits of the TraceID to be "sufficiently" random. This means, for
example, that when we [represent the TraceID as 32 hexadecimal
digits](https://opentelemetry.io/docs/specs/otel/trace/api/#retrieving-the-traceid-and-spanid),
the last, rightmost 14 digits are random. Represented as 16 bytes, the
rightmost 7 are random.

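Extracting those 56 bits is mechanical. The sketch below accepts a TraceID in either representation mentioned above; the function names are hypothetical, and it assumes the `Random` flag is set, so the rightmost 14 hex digits (7 bytes) can be trusted.

```python
def randomness_from_hex(trace_id_hex: str) -> int:
    """Rightmost 14 hex digits of a 32-digit TraceID, as a 56-bit integer."""
    if len(trace_id_hex) != 32:
        raise ValueError("TraceID must be 32 hexadecimal digits")
    return int(trace_id_hex[-14:], 16)


def randomness_from_bytes(trace_id: bytes) -> int:
    """Rightmost 7 bytes of a 16-byte TraceID, read big-endian."""
    if len(trace_id) != 16:
        raise ValueError("TraceID must be 16 bytes")
    return int.from_bytes(trace_id[-7:], "big")
```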
OpenTelemetry is adopting the W3C TraceContext Level 2 draft
recommendation as the foundation for consistent sampling. All SDKs
will set the `Random` flag and ensure that TraceIDs they generate have
the required 56 bits of randomness.

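For illustration, this is roughly what setting the `Random` flag looks like on the wire. A sketch under assumptions: it builds a version-00 `traceparent` value by hand, uses Python's `secrets` module as the randomness source, and relies on Level 2 defining `Random` as bit `0x02` of the trace flags, next to `sampled` (`0x01`). Real SDKs do this through their own ID generators; `new_traceparent` is a hypothetical name.

```python
import secrets

SAMPLED_FLAG = 0x01  # W3C trace-flags "sampled" bit
RANDOM_FLAG = 0x02   # W3C Trace Context Level 2 "random" bit


def new_traceparent(sampled: bool) -> str:
    """Build a version-00 traceparent whose TraceID is entirely random."""
    trace_id = secrets.token_hex(16)  # 128 random bits, so the low 56 are random too
    while trace_id == "0" * 32:       # all-zero IDs are invalid in W3C Trace Context
        trace_id = secrets.token_hex(16)
    parent_id = secrets.token_hex(8)
    while parent_id == "0" * 16:
        parent_id = secrets.token_hex(8)
    flags = RANDOM_FLAG | (SAMPLED_FLAG if sampled else 0)
    return f"00-{trace_id}-{parent_id}-{flags:02x}"
```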
## Consistent sampling threshold for rejection

Back to the "ratio-based" example: now that we're able to obtain 56 bits
of randomness from a TraceID, the decision process outlined above
calls for a threshold for comparison.

There was one more thing we as a group wanted for the probability
sampling specification: a way for SDKs to communicate their sampling
decisions, both to one another via TraceContext and on the
collection path after spans are finished.

The new specification lets OpenTelemetry components communicate
"how much sampling" has been applied to a span. This supports many
advanced sampling architectures:

- Accurate counting of sampled spans
- Consistent rate-limited sampling
- Adaptive sampling
- Consistent multi-stage sampling.

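To give a flavor of the first and last items, here is a sketch that assumes the threshold-for-rejection scheme described below (keep a span when `R >= T`, with 56-bit `R` and `T`). Under that assumption, a span kept with threshold `T` had probability `(2**56 - T) / 2**56` of being kept, so it can stand for `1 / probability` spans when counting; and because consistent stages all compare the same `R`, two stages in sequence behave like a single stage using the larger threshold. All function names are illustrative.

```python
MAX_T = 1 << 56  # thresholds are 56-bit values; T = 0 means 100% sampling


def sampling_probability(threshold: int) -> float:
    """Chance that a uniform 56-bit R satisfies R >= threshold."""
    if not 0 <= threshold < MAX_T:
        raise ValueError("threshold must be in [0, 2**56)")
    return (MAX_T - threshold) / MAX_T


def adjusted_count(threshold: int) -> float:
    """How many original spans one sampled span stands for."""
    return 1.0 / sampling_probability(threshold)


def combined_threshold(*stage_thresholds: int) -> int:
    """Consistent stages compare the same R, so the strictest threshold wins."""
    return max(stage_thresholds)
```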
The key points of our design are summarized next; [curious readers
will want to see the full
specification](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/).

Given the number of bits, there is not much left to specify; however, we
wanted an approach that:

- Supports both lexicographical and numerical comparison
- Minimizes TraceContext overhead
- Is legible to advanced OpenTelemetry users.

Our approach is based on what we call the _sampling threshold for
rejection_. Given randomness value `R` and threshold for rejection
`T`, we make a positive sampling decision when `T <= R`. Equivalently,
we make a negative sampling decision when `T > R`.

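As a sketch of the rule, assume a probability `p` maps to `T = (1 - p) * 2**56`, so the fraction of 56-bit values `R` with `T <= R` is exactly `p`. The helper names are hypothetical, and the floating-point conversion is a simplification; consult the specification for its exact rules.

```python
MAX_R = 1 << 56  # R and T are both 56-bit values


def threshold_for_probability(p: float) -> int:
    """Threshold for rejection T: the share of R values with T <= R is p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return round((1.0 - p) * MAX_R)


def should_sample(r: int, t: int) -> bool:
    """Positive decision when T <= R; negative when T > R."""
    return t <= r
```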
By design, the threshold value `0` corresponds with 100% sampling, so
users can easily recognize this configuration. Abstractly, both `R`
and `T` have a range of 56 bits and can be represented as unsigned
integers, 7-byte slices, or 14-hex-digit strings.

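The three representations are interchangeable. The sketch below (hypothetical helper names) converts a 56-bit value between them and checks the "lexicographical and numerical comparison" requirement listed earlier: because the hex strings are fixed-width, zero-padded, and lowercase, and the byte slices are big-endian, comparing strings or bytes gives the same answer as comparing integers.

```python
MAX_VALUE = (1 << 56) - 1


def to_hex(value: int) -> str:
    """Fixed-width form: 14 lowercase, zero-padded hex digits."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value must fit in 56 bits")
    return f"{value:014x}"


def to_bytes7(value: int) -> bytes:
    """7-byte big-endian slice."""
    return value.to_bytes(7, "big")


def from_hex(text: str) -> int:
    return int(text, 16)


def from_bytes7(data: bytes) -> int:
    return int.from_bytes(data, "big")


def same_order(a: int, b: int) -> bool:
    """True when string, byte, and integer comparisons all agree."""
    return (to_hex(a) < to_hex(b)) == (to_bytes7(a) < to_bytes7(b)) == (a < b)
```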
## OpenTelemetry TraceState

The W3C TraceContext specification defines two HTTP headers for use in
distributed tracing systems: `traceparent`, which contains the
version, TraceID, SpanID, and trace flags, and `tracestate`, which
supports "vendor-specific" additions to the context. OpenTelemetry
Tracing SDKs will soon begin adding an entry under the key `ot` in the
`tracestate` header. Here's an example:

```
tracestate: ot=th:0
```

In a 100% sampling configuration, OpenTelemetry Tracing SDKs will
insert `ot=th:0` in the TraceState. TraceState values, once entered in
the context, are both propagated and recorded in the OpenTelemetry
span data model. By design, the new OpenTelemetry TraceState value is
only encoded and transmitted for positive sampling decisions; no
`tracestate` header will appear as a result of negative sampling
decisions.

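Reading the value back means picking out the `ot` entry. The sketch below is a simplified parser, not a conforming W3C `tracestate` implementation (it skips key validation and the 32-entry limit, among other things). It assumes, as the OpenTelemetry specification does, that sub-keys inside the `ot` value are written `key:value` and separated by `;`; `parse_ot_entry` is a hypothetical name.

```python
def parse_ot_entry(tracestate: str) -> dict:
    """Return the sub-keys of the `ot` tracestate entry, e.g. {"th": "0"}."""
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if sep and key == "ot":
            fields = {}
            for field in value.split(";"):
                name, colon, val = field.partition(":")
                if colon:
                    fields[name] = val
            return fields
    return {}  # no OpenTelemetry entry present
```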
In this representation, sampling thresholds logically represent 14
hexadecimal digits or 56 bits of information.

However, to communicate the sampling threshold efficiently, we drop
trailing zeros (except for `0` itself). This lets us limit threshold
precision to fewer than 56 bits, which lowers the number of bytes per
context. For example, a threshold can be limited to 4 hexadecimal digits
to avoid carrying around 10 more bytes of precision. Here is an
example `tracestate` indicating 1% sampling, limited to 12 bits of
precision:

```
tracestate: ot=th:fd7
```

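The encoding can be sketched as follows, with hypothetical helper names. Two assumptions: the threshold is computed as `(1 - p) * 2**56`, and precision is limited by truncating to a given number of hex digits, which is a simplification (it lowers `T`, so the effective probability rounds up slightly; prefer the specification's own rounding rules in real code). Under these assumptions, 1% sampling encodes as `fd7` at both 3 and 4 digits of precision, because the fourth digit is a zero that gets dropped.

```python
MAX_T = 1 << 56
DIGITS = 14  # a full-precision threshold is 14 hex digits


def threshold_for_probability(p: float) -> int:
    """Assumed mapping from probability to threshold: T = (1 - p) * 2**56."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return round((1.0 - p) * MAX_T)


def encode_threshold(t: int, precision: int = DIGITS) -> str:
    """Encode T for the `th` sub-key using at most `precision` hex digits."""
    if not 0 <= t < MAX_T:
        raise ValueError("threshold must be in [0, 2**56)")
    if not 1 <= precision <= DIGITS:
        raise ValueError("precision must be between 1 and 14 hex digits")
    text = f"{t:014x}"[:precision]  # truncation lowers T, so p rounds up slightly
    text = text.rstrip("0")         # trailing zeros carry no information
    return text or "0"              # 100% sampling is written as "0"


def decode_threshold(text: str) -> int:
    """Restore the dropped trailing zeros, then parse as a 56-bit integer."""
    return int(text.ljust(DIGITS, "0"), 16)
```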
We gave a lot of consideration to backwards compatibility, but we also
wanted to be sure we could always use the stated sampling threshold
for extrapolation, in a reliable, statistical sense. With this in
mind, there is one more OpenTelemetry TraceState value in our
specification: a way to provide explicit randomness in the
`tracestate` header.

For example, to enable consistent sampling while continuing to use
non-random TraceIDs, users can opt for explicit randomness:

```
tracestate: ot=rv:abcdef01234567
```

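A consumer then needs one rule for finding `R`. The sketch below (hypothetical names, with the same simplified `tracestate` parsing as a real SDK would do more carefully) assumes an explicit `rv` value takes precedence, and otherwise falls back to the rightmost 14 hex digits of the TraceID only when the `Random` flag says that is safe; if neither source is available, it reports that no trustworthy randomness exists.

```python
from typing import Optional


def parse_ot_entry(tracestate: str) -> dict:
    """Simplified: return the sub-keys of the `ot` tracestate entry."""
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if sep and key == "ot":
            fields = {}
            for field in value.split(";"):
                name, colon, val = field.partition(":")
                if colon:
                    fields[name] = val
            return fields
    return {}


def randomness(trace_id_hex: str, tracestate: str, random_flag: bool) -> Optional[int]:
    """Prefer explicit `rv`; otherwise trust the TraceID only if flagged random."""
    rv = parse_ot_entry(tracestate).get("rv")
    if rv is not None:
        if len(rv) != 14:
            raise ValueError("rv must be 14 hex digits")
        return int(rv, 16)
    if random_flag:
        return int(trace_id_hex[-14:], 16)  # rightmost 56 bits of the TraceID
    return None  # no trustworthy randomness available
```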
Explicit randomness values have a number of other uses in
OpenTelemetry.

## Looking forward

This post covers an essential upgrade to the OpenTelemetry Tracing
specification, enabling a new generation of sampling components in
both the SDKs and the Collector. We couldn't cover everything here
and plan to cover more in the future.

For now, here are some useful references, including the four
OpenTelemetry enhancement proposals that plotted our course:

- [0168 Sampling Propagation](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0168-sampling-propagation.md)
- [0170 Sampling Probability](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0170-sampling-probability.md)
- [0235 Sampling Threshold in TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0235-sampling-threshold-in-trace-state.md)
- [0250 Composite Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0250-Composite_Samplers.md)

and our primary specification documents:

- [Trace Probability Sampling](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/)
- [Trace SDK Samplers](https://opentelemetry.io/docs/specs/otel/trace/sdk/#sampler)
- [TraceID Randomness](https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceid-randomness)

I like this topic. I followed the development of this spec somewhat closely, and I believe the blog post portrays the work that has been done. That said, I'm not sure who the audience for this is.
If we are trying to give the community of users an update about the sampling features that are coming, then I'd reframe this blog post, so that it starts with a problem statement, followed perhaps by a concrete use-case (real or not), and then what's being done to solve that. There's no need to get into the details of how things are calculated, just that the sampling threshold is propagated through regular trace context level 2, "coming soon to an SDK near you".
If we are trying to get maintainers to implement this, I'd make it very clear at the very beginning, and also start with a clear problem statement, to convince them that they should implement this in their SDKs.
I believe I still know the math behind this, and the blog post was a good refresher for me. I'm afraid readers not familiar with sampling (especially probabilistic) might get lost quickly though. Perhaps we could have a call somewhere like: "and if you are interested in knowing how this magic works or have an interest in statistics or probability, look at this doc. We'd love to have you with us!"