236 changes: 236 additions & 0 deletions content/en/blog/2025/sampling-milestones.md
@@ -0,0 +1,236 @@
---
title: OpenTelemetry Sampling update

I like this topic. I followed the development of this spec somewhat closely, and I believe the blog post portrays the work that has been done. That said, I'm not sure who the audience for this is.

If we are trying to give the community of users an update about the sampling features that are coming, then I'd reframe this blog post, so that it starts with a problem statement, followed perhaps by a concrete use-case (real or not), and then what's being done to solve that. There's no need to get into the details of how things are calculated, just that the sampling threshold is propagated through regular trace context level 2, "coming soon to an SDK near you".

If we are trying to get maintainers to implement this, I'd make it very clear at the very beginning, and also start with a clear problem statement, to convince them that they should implement this in their SDKs.

I believe I still know the math behind this, and the blog post was a good refresher for me. I'm afraid readers not familiar with sampling (especially probabilistic) might get lost quickly though. Perhaps we could have a call somewhere like: "and if you are interested in knowing how this magic works or have an interest in statistics or probability, look at this doc. We'd love to have you with us!"

linkTitle: OpenTelemetry Sampling update
date: 2025-10-01
author: >-
[Joshua MacDonald](https://github.com/jmacd) (Microsoft)
sig: SIG Sampling
# prettier-ignore
cSpell:ignore:
---

## Introduction

When OpenTelemetry first launched its Tracing specification over five
years ago, it left behind a [conspicuous "TODO" involving probability
sampling](https://github.com/open-telemetry/opentelemetry-specification/issues/1413).
The TODO warned users of inconsistent results except when the sampler
was used at the root span of a trace.

This meant OpenTelemetry users could not safely configure independent
probability sampling policies in a distributed system, as the
specification did not cover how to achieve consistency. This feature,
the ability to configure unequal-probability sampling policies within
a trace and still get complete traces, is something users expect;
it lets service owners configure independent limits on the volume of
tracing data collected in a system.

## Consistency by example

To see why consistency is important, consider a system with a Frontend
and two backend services, Cache and Storage. The Frontend handles
high-value user requests, so Frontend requests are sampled at
100%. The root span is significant because errors are visible to the
end user, so it forms the basis of an SLO measurement in this example
and the system operator is willing to collect every span.

The Cache service receives a relatively high volume of requests, so to
save on observability costs, this service is configured to sample
1-in-1000 traces. Because of the high rate of requests, this 0.1%
policy ensures the Cache service produces enough traces for many
observability scenarios.

The Storage service receives a relatively low volume of requests
compared with the Cache service, but still a lot of requests compared
with the Frontend service; Storage is configured to sample 1-in-10
traces.

When we ask for consistency in distributed tracing, the goal is to
ensure that when the smallest-probability sampler (here 0.1%) chooses
to sample, the higher-probability samplers make the same
decision. Here are the properties we can rely on thanks to
consistency:

- All Frontend spans are collected
- 1-in-10 traces will consist of Frontend and Storage spans
- 1-in-1000 traces will be complete.
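
These properties can be checked with a small simulation. This is a
minimal sketch, not OpenTelemetry SDK code: the `sampled` and
`simulate` helpers are hypothetical, a seeded random number stands in
for the TraceID randomness, and the comparison uses the 56-bit
rejection threshold described later in this post.

```python
import random

RANDOM_BITS = 56
MAX_R = 1 << RANDOM_BITS

# Sampling probability per service, taken from the example above.
PROBABILITY = {"Frontend": 1.0, "Storage": 0.1, "Cache": 0.001}


def sampled(probability: float, r: int) -> bool:
    """Consistent decision: every service compares the same per-trace value r."""
    threshold = round((1.0 - probability) * MAX_R)  # threshold for rejection
    return threshold <= r


def simulate(num_traces: int, seed: int = 1) -> dict:
    """Count how many traces each service samples (hypothetical helper)."""
    rng = random.Random(seed)
    counts = {name: 0 for name in PROBABILITY}
    for _ in range(num_traces):
        r = rng.getrandbits(RANDOM_BITS)  # stands in for TraceID randomness
        decisions = {name: sampled(p, r) for name, p in PROBABILITY.items()}
        # Consistency: a lower-probability sampler never keeps a trace
        # that a higher-probability sampler rejected.
        assert not decisions["Cache"] or decisions["Storage"]
        assert not decisions["Storage"] or decisions["Frontend"]
        for name, keep in decisions.items():
            counts[name] += int(keep)
    return counts


if __name__ == "__main__":
    print(simulate(100_000))
```

Because all three services compare the same value against their own
threshold, every trace sampled by Cache is also sampled by Storage and
Frontend, which is what makes the 1-in-1000 traces complete.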

## Problems with TraceIdRatioBased

OpenTelemetry's initial tracing specification featured the
`TraceIdRatioBased` probability sampler. It was intended to be
consistent from the start, but the working group had a hard time
agreeing on specific details. The rest of the specification was
ready to release; the leftover TODO about sampling consistency was
mitigated by the fact that root-only sampling was the norm for
contemporary open source tracing systems.

The "ratio-based" part of the name hints at the form of the solution
to the consistent sampling problem:

1. Consider the TraceID value as an N-bit random value
2. Compute the Nth power of two
3. Multiply the power of two by the ratio to yield a "threshold" value
4. Compare the TraceID with the threshold value to yield a consistent decision.
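
The four steps can be sketched in a few lines. This is an
illustration only: the function names are hypothetical, and the
direction of the comparison (sample when the value is below the
threshold) is an assumption, since the outline does not fix it.

```python
def ratio_threshold(ratio: float, n_bits: int) -> int:
    """Steps 2 and 3: scale the ratio by 2**n_bits to get a threshold."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return int(ratio * (1 << n_bits))


def ratio_decision(random_value: int, ratio: float, n_bits: int) -> bool:
    """Step 4: compare the N-bit value with the threshold.

    Assumption: a value below the threshold means "sample".
    """
    return random_value < ratio_threshold(ratio, n_bits)


if __name__ == "__main__":
    # With N = 8 and a ratio of 25%, the threshold is 64 out of 256.
    print(ratio_threshold(0.25, 8))
    print(ratio_decision(63, 0.25, 8), ratio_decision(64, 0.25, 8))
```

The sketch takes `random_value` as given, which is exactly the part
that was unresolved: it says nothing about which bits of the TraceID
supply those N bits.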

We had trouble agreeing on this form of solution because of a larger
question. *Which bits of the TraceID can we trust to be random?*
Without foundational requirements about randomness, OpenTelemetry
could not specify a consistent sampling decision.

Lacking firm randomness requirements, a common approach is to use a
hash function instead. Using `Hash(TraceID)` to produce N bits of
randomness works reasonably well if the hash function is good, but
this approach is not suitable for a cross-language SDK specification.

The details here are tricky. How many bits of the TraceID would be
enough? Could every language SDK efficiently implement the required
logic?

## Introducing W3C TraceContext Level 2

OpenTelemetry defines its TraceID based on the W3C TraceContext
specification. This was a [_Candidate
Recommendation_](https://www.w3.org/standards/types/#x4-2-candidate-recommendation)
at the time of the initial OpenTelemetry Tracing specification; it was
later finished as a [W3C
Recommendation](https://www.w3.org/standards/types/#x5-1-recommendation)
in the [W3C Trace Context Level
1](https://www.w3.org/TR/trace-context-1/) standard.

OpenTelemetry turned to the W3C Trace Context working group with this
larger problem in mind. Could we, including both OpenTelemetry and
non-OpenTelemetry tracing systems, agree on how many bits of the
TraceID are random?

The [W3C TraceContext Level 2](https://www.w3.org/TR/trace-context-2/)
specification, currently a [Candidate Recommendation
Draft](https://www.w3.org/standards/types/#x4-2-1-candidate-recommendation-draft),
answers this question with a new [`Random` Trace Flag
value](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag). With
this flag, the new W3C specification requires the least-significant 56
bits of the TraceID to be "sufficiently" random. This means, for
example, when we [represent the TraceID as 32 hexadecimal
digits](https://opentelemetry.io/docs/specs/otel/trace/api/#retrieving-the-traceid-and-spanid),
the last, rightmost 14 digits are random. Represented as 16 bytes, the
rightmost 7 are random.
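
As a rough illustration, extracting those 56 bits from a hexadecimal
TraceID takes one slice. The helper names below are hypothetical, and
the `0x02` bit position for the `Random` flag is taken from the Level
2 draft rather than from this post, so treat it as an assumption to
verify against the specification.

```python
# Assumption: the Random flag is the second-lowest bit of the
# W3C trace-flags byte, as described in the Level 2 draft.
RANDOM_FLAG = 0x02


def randomness_from_trace_id(trace_id_hex: str) -> int:
    """Return the least-significant 56 bits of a 32-hex-digit TraceID."""
    if len(trace_id_hex) != 32:
        raise ValueError("TraceID must be 32 hexadecimal digits")
    return int(trace_id_hex[-14:], 16)  # rightmost 14 hex digits


def has_random_flag(trace_flags: int) -> bool:
    """Check whether the trace-flags byte has the Random bit set."""
    return bool(trace_flags & RANDOM_FLAG)


if __name__ == "__main__":
    trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example value
    print(format(randomness_from_trace_id(trace_id), "014x"))
    # The same bits, viewed as the rightmost 7 of 16 bytes.
    print(bytes.fromhex(trace_id)[-7:].hex())
```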

OpenTelemetry is adopting the W3C TraceContext Level 2 draft
recommendation as the foundation for consistent sampling. All SDKs
will set the `Random` flag and ensure that TraceIDs they generate have
the required 56 bits of randomness.

## Consistent sampling threshold for rejection

Returning to the "ratio-based" approach: now that we can obtain 56
bits of randomness from a TraceID, the decision process outlined
above calls for a threshold to compare against.

There was one more thing we as a group wanted for the probability
sampling specification: a way for SDKs to communicate their sampling
decisions, both to one another via TraceContext and on the
collection path after spans are finished.

The new specification lets OpenTelemetry components communicate about
"how much sampling" has been applied to a span. This supports many
advanced sampling architectures:

- Accurate counting of sampled spans
- Consistent rate-limited sampling
- Adaptive sampling
- Consistent multi-stage sampling.
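
To make the first item concrete: when each sampled span carries the
probability it was sampled with, a consumer can estimate how many
spans existed before sampling. The sketch below assumes the usual
inverse-probability weighting, where a span sampled with probability
`p` stands in for `1/p` spans; the function name is hypothetical.

```python
def estimated_span_count(sampling_probabilities):
    """Estimate the original population from the spans that were sampled.

    Each span sampled with probability p is counted as 1/p spans.
    """
    total = 0.0
    for p in sampling_probabilities:
        if not 0.0 < p <= 1.0:
            raise ValueError("sampling probability must be in (0, 1]")
        total += 1.0 / p
    return total


if __name__ == "__main__":
    # Ten spans sampled at 10% plus one span sampled at 0.1%.
    print(estimated_span_count([0.1] * 10 + [0.001]))
```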

The key points of our design are summarized next; [curious readers
will want to see the full
specification](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/).

Given the number of bits, there is not much left to specify; however,
we wanted an approach that:

- Supports both lexicographical and numerical comparison
- Minimizes TraceContext overhead
- Is legible for advanced OpenTelemetry users.

Our approach is based on what we call the _sampling threshold for
rejection_. Given randomness value `R` and threshold for rejection
`T`, we make a positive sampling decision when `T <= R`. Equivalently,
we make a negative sampling decision when `T > R`.

By design, the threshold value `0` corresponds with 100% sampling, so
users can easily recognize this configuration. Abstractly, both `R`
and `T` have a range of 56 bits and can be represented as unsigned
integers, 7-byte slices, or 14-hex-digit strings.
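
A minimal sketch of the decision rule follows. The mapping from
probability to threshold, `T = (1 - p) * 2^56`, is derived here from
the rule `T <= R` with `R` uniform over 56 bits; the function names
are hypothetical, and the full specification remains the authority on
rounding and edge cases.

```python
MAX_R = 1 << 56  # R and T both range over 56 bits


def threshold_for(probability: float) -> int:
    """Rejection threshold T for a sampling probability in (0, 1]."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # Clamp so that very small probabilities still fit in 56 bits.
    return min(round((1.0 - probability) * MAX_R), MAX_R - 1)


def should_sample(t: int, r: int) -> bool:
    """Positive decision when T <= R, negative when T > R."""
    return t <= r


if __name__ == "__main__":
    t = threshold_for(0.5)
    # The same threshold as an integer, a 7-byte slice, and 14 hex digits.
    print(t, t.to_bytes(7, "big"), format(t, "014x"))
    print(should_sample(threshold_for(1.0), 0))  # 100% sampling keeps R = 0
```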

## OpenTelemetry TraceState

The W3C TraceContext specification defines two HTTP headers for use in
distributed tracing systems: the `traceparent` header, which contains
version, TraceID, SpanID, and flags, and the `tracestate` header, which
supports "vendor-specific" additions to the context. OpenTelemetry
Tracing SDKs will soon begin adding an entry under the key "ot" in the
`tracestate` header. Here's an example:

```
tracestate: ot=th:0
```

In a 100% sampling configuration, OpenTelemetry Tracing SDKs will
insert `ot=th:0` in the TraceState. TraceState values, once entered in
the context, are both propagated and recorded in the OpenTelemetry
span data model. By design, the new OpenTelemetry TraceState value is
only encoded and transmitted for positive sampling decisions; no
`tracestate` header will appear as a result of negative sampling
decisions.

In this representation, sampling thresholds logically represent 14
hexadecimal digits or 56 bits of information.

However, to communicate the sampling threshold efficiently, we drop
trailing zeros (except for `0` itself). This lets us limit threshold
precision to fewer than 56 bits, which lowers the number of bytes per
context. For example, the threshold can be limited to 4 hexadecimal
digits to avoid carrying around 10 more bytes of precision. Here is an
example tracestate indicating 1% sampling, limited to 12 bits of
precision:

```
tracestate: ot=th:fd7
```
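
The encoding can be sketched as follows. The function names and the
choice to round to the requested precision are assumptions made for
illustration; consult the specification for the exact rounding rules.

```python
def encode_threshold(probability: float, precision: int = 4) -> str:
    """Encode a rejection threshold as a `th` value.

    Keeps at most `precision` hexadecimal digits and drops trailing zeros.
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    if not 1 <= precision <= 14:
        raise ValueError("precision must be between 1 and 14 hex digits")
    scale = 16 ** precision
    t = min(round((1.0 - probability) * scale), scale - 1)
    digits = format(t, "x").zfill(precision)
    return digits.rstrip("0") or "0"  # "0" itself is kept for 100% sampling


def decode_threshold(th: str) -> int:
    """Restore the 56-bit threshold by padding zeros on the right."""
    return int(th.ljust(14, "0"), 16)


if __name__ == "__main__":
    th = encode_threshold(0.01, precision=3)
    print("tracestate: ot=th:" + th)
    print(format(decode_threshold(th), "014x"))
```

With 12 bits of precision, 1% sampling gives `(1 - 0.01) * 4096`,
which rounds to 4055, or `fd7` in hexadecimal. The effective
probability after limiting precision is 41/4096, slightly above 1%.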

We gave a lot of consideration to backwards compatibility, but we also
wanted to be sure we could always use the stated sampling threshold
for extrapolation, in a reliable, statistical sense. With this in
mind, there is one more OpenTelemetry TraceState value in our
specification: a way to provide explicit randomness in the
`tracestate` header.

To enable consistent sampling and continue using non-random TraceIDs,
for example, users can opt for explicit randomness:

```
tracestate: ot=rv:abcdef01234567
```
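
A sampler that honors explicit randomness might choose its `R` value
along these lines. This is a sketch under assumptions: the helper
names are hypothetical, and the `;` separator between sub-keys inside
the `ot` entry comes from OpenTelemetry's TraceState handling rules
rather than from the examples in this post.

```python
def parse_ot_entry(tracestate: str) -> dict:
    """Parse the `ot` member of a tracestate header value into a dict."""
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if sep and key == "ot":
            fields = {}
            # Assumption: sub-keys are separated by ";" and use "key:value".
            for field in value.split(";"):
                name, colon, field_value = field.partition(":")
                if colon:
                    fields[name] = field_value
            return fields
    return {}


def select_randomness(trace_id_hex: str, tracestate: str) -> int:
    """Prefer explicit randomness (`rv`); else use the TraceID's last 56 bits."""
    rv = parse_ot_entry(tracestate).get("rv")
    if rv is not None and len(rv) == 14:
        return int(rv, 16)
    return int(trace_id_hex[-14:], 16)


if __name__ == "__main__":
    trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example value
    print(hex(select_randomness(trace_id, "ot=rv:abcdef01234567")))
    print(hex(select_randomness(trace_id, "")))
```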

Explicit randomness values have a number of other uses in
OpenTelemetry.

## Looking forward

This post covers an essential upgrade to the OpenTelemetry Tracing
specification, enabling a new generation of sampling components in
both SDKs and the Collector. We couldn't cover everything here
and plan to cover more in the future.

For now, here are some useful references including the four
OpenTelemetry enhancement proposals that plotted our course:

- [0168 Sampling Propagation](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0168-sampling-propagation.md)
- [0170 Sampling Probability](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0170-sampling-probability.md)
- [0235 Sampling Threshold in TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0235-sampling-threshold-in-trace-state.md)
- [0250 Composite Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/trace/0250-Composite_Samplers.md)

and our primary specification documents:

- [Trace Probability Sampling](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/)
- [Trace SDK Samplers](https://opentelemetry.io/docs/specs/otel/trace/sdk/#sampler)
- [TraceID Randomness](https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceid-randomness)