[Draft] Sampling milestones blog post #7735

Closed
jmacd wants to merge 4 commits into open-telemetry:main from jmacd:jmacd/sampling_milestone_blog

@jmacd jmacd commented Sep 8, 2025

Work-in-progress to share with the Sampling SIG before asking for editorial help.

@jpkrohling jpkrohling left a comment

A blog post on this was long overdue, thank you very much, @jmacd !!


## Intro

The OpenTelemetry sampling project promotes features and
Suggested change
The OpenTelemetry sampling project promotes features and
The OpenTelemetry Sampling SIG promotes features and

cSpell:ignore:
---

## Intro
I feel like there could be an "intro intro" paragraph. As a reader, why should I care? Is it for me? Like:

The OTel Sampling SIG promotes ... . In this blog post, we'll share the progress we've made over the past ... months, as well as provide a peek into the future.

Users look to OTel to provide ...

score. Adjusted count is the mathematical reciprocal of selection
probability. Here are a few examples of the term in use:

- _25% probability sampling is communicated by `ot=th:c`, corresponding with an adjusted count of 4 per item._
I know we've introduced "ot" and "th" before, but perhaps we could spell out that "c" means 25% probability sampling?
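One way to spell it out: in the OpenTelemetry probability sampling specification, `th` is a rejection threshold of up to 14 hexadecimal digits with trailing zeros omitted. So `c` stands for `0xc0000000000000` out of 2^56, meaning 75% of trace IDs are rejected and 25% are kept. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and do not come from any SDK.

```python
from fractions import Fraction
import string

MAX_HEX_DIGITS = 14  # thresholds carry up to 56 bits of precision


def threshold_to_probability(th: str) -> Fraction:
    """Convert an encoded `th` value into a sampling probability."""
    if not 1 <= len(th) <= MAX_HEX_DIGITS:
        raise ValueError("th must have between 1 and 14 hex digits")
    if any(ch not in string.hexdigits for ch in th):
        raise ValueError("th must contain only hex digits")
    # Trailing zeros are dropped in the encoded form, so pad on the right.
    rejection_threshold = int(th.ljust(MAX_HEX_DIGITS, "0"), 16)
    return 1 - Fraction(rejection_threshold, 16 ** MAX_HEX_DIGITS)


def adjusted_count(th: str) -> Fraction:
    """Adjusted count is the reciprocal of the selection probability."""
    return 1 / threshold_to_probability(th)


print(threshold_to_probability("c"))  # probability for th:c
print(adjusted_count("c"))            # adjusted count for th:c
```

Exact fractions are used here only to keep the example free of floating-point rounding; a real implementation may well use integer or floating-point arithmetic instead.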

- _An adjusted count of N means we would expect to see N-1 similar items had we collected all of the data._

Our goal is that OpenTelemetry users can lower telemetry data
collection costs through sampling, while preserving adjusted count
Suggested change
collection costs through sampling, while preserving adjusted count
collection volume through sampling, while preserving adjusted count

It's not always about costs: the operational requirements (network bandwidth, for instance) might be a constraint in some scenarios.

- SDKs will record the tracestate field as part of the OTLP span record.
- Collectors and backends will be able to count using adjusted counts, enabling accurate metrics calculated from sampled data.
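To illustrate the second bullet, here is a rough Python sketch of how a consumer might estimate the original number of spans from sampled data by summing adjusted counts read from each span's `tracestate`. The helper names are hypothetical, and skipping spans without a `th` value is an assumption of this sketch rather than specified behavior.

```python
from typing import Iterable, Optional

TOTAL = 2 ** 56  # size of the 56-bit threshold space


def adjusted_count_from_tracestate(tracestate: str) -> Optional[float]:
    """Return the adjusted count encoded in `ot=th:...`, or None if absent."""
    for entry in tracestate.split(","):  # tracestate entries are comma-separated
        key, _, value = entry.strip().partition("=")
        if key != "ot":
            continue
        for field in value.split(";"):  # OpenTelemetry sub-keys are `name:value`
            name, _, th = field.partition(":")
            if name == "th" and th:
                # Trailing zeros are omitted, so pad to 14 hex digits.
                rejection = int(th.ljust(14, "0"), 16)
                return TOTAL / (TOTAL - rejection)
    return None


def estimated_span_count(tracestates: Iterable[str]) -> float:
    """Sum adjusted counts; spans without `th` are skipped (sketch assumption)."""
    counts = (adjusted_count_from_tracestate(ts) for ts in tracestates)
    return sum(c for c in counts if c is not None)


sampled = ["ot=th:c", "ot=th:c;rv:abcdef01234567", "vendor=x,ot=th:8"]
print(estimated_span_count(sampled))
```

Three sampled spans here stand in for an estimated ten original spans: two with an adjusted count of 4 and one with an adjusted count of 2.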

We have supplemental guidelines for OpenTelemetry collectors in case
"OpenTelemetry collectors" -- are you talking about people configuring OpenTelemetry Collector instances?

@@ -0,0 +1,258 @@
---
title: OpenTelemetry Sampling update
I like this topic, I followed the development of this spec somewhat closely, and I believe the blog post portrays the work that has been done. That said, I'm not sure who the audience for this is.

If we are trying to give the community of users an update about the sampling features that are coming, then I'd reframe this blog post, so that it starts with a problem statement, followed perhaps by a concrete use-case (real or not), and then what's being done to solve that. There's no need to get into the details of how things are calculated, just that the sampling threshold is propagated through regular trace context level 2, "coming soon to an SDK near you".

If we are trying to get maintainers to implement this, I'd make it very clear at the very beginning, and also start with a clear problem statement, to convince them that they should implement this in their SDKs.

I believe I still know the math behind this, and the blog post was a good refresher for me. I'm afraid readers not familiar with sampling (especially probabilistic) might get lost quickly though. Perhaps we could have a call somewhere like: "and if you are interested in knowing how this magic works or have an interest in statistics or probability, look at this doc. We'd love to have you with us!"

probability. Here are a few examples of the term in use:

- _25% probability sampling is communicated by `ot=th:c`, corresponding with an adjusted count of 4 per item._
- _An adjusted count of N means we would expect to see N-1 similar items had we collected all of the data._
There's one small thing that bothered me a bit: "similar items" is subjective here. By sampling, we are effectively throwing away data. If we are solely sampling based on the trace ID, then we might not be sampling enough of the rare events. Even if we are getting 1% of the rare events, the attributes within the events might not be representative (am I getting 1% of client=vip?).

I know it's a nit for the article, but I feel like users shouldn't be led to think that they will be able to sample 1% of their data and correctly extrapolate to 100% from that. They can't.
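A back-of-the-envelope check of this concern, as a sketch only: assume each item is kept independently with probability p, so the estimate (number kept divided by p) has a relative standard error of sqrt((1 - p) / (n * p)) for a true count of n. The function name and the example counts below are hypothetical.

```python
import math


def relative_standard_error(true_count: int, probability: float) -> float:
    """Relative standard error of the estimate (kept items / probability),
    assuming each item is kept independently with the given probability."""
    return math.sqrt((1 - probability) / (true_count * probability))


# A common event: one million occurrences, sampled at 1%.
print(relative_standard_error(1_000_000, 0.01))

# A rare event (say, client=vip): one hundred occurrences, sampled at 1%.
print(relative_standard_error(100, 0.01))
```

Under these assumptions the common event is estimated to within roughly 1%, while the error for the rare event is about as large as the count itself, which is consistent with the point that a 1% sample cannot be extrapolated reliably for rare attributes.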

Labels

docs:blog An issue requesting a blog post, or a PR for a new blog post

Projects

Status: Done

2 participants