Skip to content

[processor/tailsampling] Initial support for early decisions#44456

Closed
csmarchbanks wants to merge 11 commits into
open-telemetry:mainfrom
csmarchbanks:tsp-early-decisions
Closed

[processor/tailsampling] Initial support for early decisions#44456
csmarchbanks wants to merge 11 commits into
open-telemetry:mainfrom
csmarchbanks:tsp-early-decisions

Conversation

@csmarchbanks
Copy link
Copy Markdown
Contributor

Description

This change adds a new optional interface that samplers can implement, EarlyEvaluator. If a sampler implements the EarlyEvaluator interface and the EarlyDecisions configuration is turned on then for each batch of traces received TSP will try to see if any spans from that batch will cause the trace to be sampled. If so, then we can send all the trace data we have along and remove it from memory. It is also possible that all policies return NotSampled, at which point TSP will drop the data.

In this initial implementation drop policies are not supported. In addition, the deprecated top level invert_match policies are also not supported and will run into issues if still being used.

Link to tracking issue

Part of #43876

Testing

I have added some tests and also ran this in some collectors in our infrastructure.

Documentation

TODO: I need to add more documentation than currently present

Copy link
Copy Markdown
Contributor

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new interface! I think it's a good balance between giving policies the opportunity to decide fast without making it too heavy. Left a few minor comments.

Comment thread processor/tailsamplingprocessor/processor.go Outdated

decision, err := earlyEval.EarlyEvaluate(tsp.ctx, id, currentSpans, trace)
if err != nil {
tsp.telemetry.ProcessorTailSamplingSamplingPolicyEvaluationError.Add(tsp.ctx, 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: might be nice to add an attribute indicating where the error happened? Before this PR errors could only happen in samplingPolicyOnTick.

Comment thread processor/tailsamplingprocessor/processor.go Outdated
Comment thread processor/tailsamplingprocessor/internal/sampling/always_sample.go Outdated
Copy link
Copy Markdown
Contributor

@Logiraptor Logiraptor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending a partial review while sitting in a hotel 😅, don't block on me while I'm OOO!

Comment thread processor/tailsamplingprocessor/pkg/samplingpolicy/samplingpolicy.go Outdated
Comment thread processor/tailsamplingprocessor/processor.go Outdated
// time this function is called. It is included for implementations that
// wait for a trigger, such as the parent span being received, before
// looking across all the received spans.
EarlyEvaluate(ctx context.Context, traceID pcommon.TraceID, newData ptrace.ResourceSpans, allData *TraceData) (Decision, error)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that allData actually includes newData? Looking at how the incoming spans are appended before checking early decisions in processTrace. I wonder if there's a smaller interface we could require here that doesn't have this duplication or the need to warn about iterating ReceivedBatches?

Copy link
Copy Markdown
Contributor Author

@csmarchbanks csmarchbanks Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a tradeoff I intentionally made but I am open to discussion. I wanted to support some sort of trigger span being present inside of newData that then causes the early decision code to look over all trace data. A concrete example of that would be when the root span is received it may be desirable to look over the trace as a whole instead of just the batch to see how spans are connected.

It also allows an extension to mutate the data similar to adding the sampling policy attribute if desired (which we do in some of our extensions already).

As far as duplication goes, the newData is always the last resource span entry in allData right now, but that feels like too much of an implementation detail to expose to users.

Comment thread processor/tailsamplingprocessor/internal/sampling/latency.go

// We do not support early evaluation when drop policies are present yet.
if len(dropPolicies) > 0 {
earlyEvaluationPossible = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend some monitoring here (and elsewhere) so we can see clearly the "why" early decisions aren't possible across the cluster. That should help prioritize investment going forward.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, for sure, monitoring is on my TODO still. I am hoping we remove this in the future, but drop policies (and composite policies) look to involve tracking a bitset of early evaluation results that I didn't want to include in this PR.

@atoulme atoulme added the processor/tailsampling Tail sampling processor label Dec 8, 2025
Allow the processor to run early evaluations at batch ingest time and
make sampling decisions based on the result. This will drop or forward
traces before the entire batch is ready to reduce the memory required
keeping all trace data in memory until the end of the decision wait.

To begin with only implement "basic" samplers, those that do not support
invert, or nest other policies. This will still provide some gains while
keeping this change manageable.

Drop specifically will take more work to implement as we cannot make an
early Sampled decision until all drop policies have been evaluated which
will require some state to be maintained.
Combine the two interfaces previously created to simplify the code and
avoid as many casts everywhere. A sampler that does not support early
decisions can just return samplingpolicy.Unspecified.
@csmarchbanks
Copy link
Copy Markdown
Contributor Author

Closing this one for now as #44878 looks to have more impact in our environments. I may come back to it in the future.

atoulme pushed a commit that referenced this pull request Jan 8, 2026
…s received (#44878)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
When testing the work done to provide early decisions
(#44456)
the impact was very limited in some scenarios, specifically when
policies existed that look for long traces or if an error is ever
present. It is not possible to know if an error span will come along in
the future so almost all traces end up waiting until the decision wait
causing the savings to only be a couple of percent in some environments.

What I found myself wishing for was a way to run all decisions more
quickly for most cases, and what worked fairly well is to base that
decision on if a root span has been received or not. This change
implements a second decision wait to collect any straggler spans that
might be present for a trace (e.g. a second service with a different
batch timer before sending), but still allow it to be much faster than
the base decision wait. A good way of thinking about the two options is
that decision_wait is the maximum amount of time we will wait for a span
to arrive, and decision_wait_after_root_received is the minimum amount
of time we will wait for additional spans to come in.

The downside of this approach is that heavily asynchronous traces may
not be sampled as expected, however we find that those do not work very
well as it is since they commonly last longer than the decision wait
anyway. The behavior is opt in so no changes are needed for any users.

I wanted to keep the changes in this PR relatively small, but in the
future we could re-implement id batcher to support moving traces between
batches, or possibly a priority queue where we pop values until some
threshold, rather than having two batchers.

<!-- Issue number (e.g. #1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Part of #43876

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Added smoke test for the new functionality

<!--Describe the documentation added.-->
#### Documentation
Added documentation explaining the new configuration variable.

<!--Please delete paragraphs that you did not use before submitting.-->
mbdeveci pushed a commit to mbdeveci/opentelemetry-collector-contrib that referenced this pull request Jan 12, 2026
…s received (open-telemetry#44878)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
When testing the work done to provide early decisions
(open-telemetry#44456)
the impact was very limited in some scenarios, specifically when
policies existed that look for long traces or if an error is ever
present. It is not possible to know if an error span will come along in
the future so almost all traces end up waiting until the decision wait
causing the savings to only be a couple of percent in some environments.

What I found myself wishing for was a way to run all decisions more
quickly for most cases, and what worked fairly well is to base that
decision on if a root span has been received or not. This change
implements a second decision wait to collect any straggler spans that
might be present for a trace (e.g. a second service with a different
batch timer before sending), but still allow it to be much faster than
the base decision wait. A good way of thinking about the two options is
that decision_wait is the maximum amount of time we will wait for a span
to arrive, and decision_wait_after_root_received is the minimum amount
of time we will wait for additional spans to come in.

The downside of this approach is that heavily asynchronous traces may
not be sampled as expected, however we find that those do not work very
well as it is since they commonly last longer than the decision wait
anyway. The behavior is opt in so no changes are needed for any users.

I wanted to keep the changes in this PR relatively small, but in the
future we could re-implement id batcher to support moving traces between
batches, or possibly a priority queue where we pop values until some
threshold, rather than having two batchers.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Part of open-telemetry#43876

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Added smoke test for the new functionality

<!--Describe the documentation added.-->
#### Documentation
Added documentation explaining the new configuration variable.

<!--Please delete paragraphs that you did not use before submitting.-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

processor/tailsampling Tail sampling processor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants