[processor/tailsampling] Allow faster decisions after the root span is received#44878
Conversation
When testing the work done to provide early decisions the impact was very limited in some scenarios, specifically when policies existed that look for long traces or if an error is ever present. It is not possible to know if an error span will come along in the future so almost all traces end up waiting until the decision wait causing the savings to only be a couple of percent in some environments. What I found myself wishing for was a way to run all decisions more quickly for most cases, and what worked fairly well is to base that decision on if a root span has been received or not. This change implements a second decision wait to collect any straggler spans that might be present for a trace (e.g. a second service with a different batch timer before sending), but still allow it to be much faster than the base decision wait. A good way of thinking about the two options is that decision_wait is the maximum amount of time we will wait for a span to arrive, and decision_wait_after_root_received is the minimum amount of time we will wait for additional spans to come in.
8dcdb13 to
f39e550
Compare
We only want to increment the trace dropped to early metric when encounter a trace that did not already have a decision made for it. We can use the cache to determine if a decision was appropriately made or if the trace was simply dropped. If running with block on overflow enabled there is no need to check as the code blocks instead of dropping traces that have not yet been processed by one of the two batchers.
9f6a321 to
dbecfe5
Compare
|
Pinging @Logiraptor for review as code owner |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new configuration option decision_wait_after_root_received that enables faster sampling decisions after receiving a trace's root span, while still collecting straggler spans. This optimizes tail sampling by allowing most traces to be decided more quickly when their root span arrives, rather than waiting for the full decision_wait period.
Key Changes:
- Added
decision_wait_after_root_receivedconfiguration parameter for faster decisions after root span arrival - Refactored
Batchtype from slice to map for efficient batch merging - Enhanced trace processing to detect and track root spans
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| processor/tailsamplingprocessor/config.go | Adds new DecisionWaitAfterRootReceived configuration field |
| processor/tailsamplingprocessor/README.md | Documents the new configuration option |
| processor/tailsamplingprocessor/processor.go | Implements root span detection and dual-batcher logic for faster decisions |
| processor/tailsamplingprocessor/processor_test.go | Adds test coverage for root-received batcher and updates test utilities |
| processor/tailsamplingprocessor/internal/idbatcher/id_batcher.go | Changes Batch from slice to map for efficient merging |
| processor/tailsamplingprocessor/internal/idbatcher/id_batcher_test.go | Updates tests to work with map-based batches |
| .chloggen/tsp-decision-wait-after-root-received.yaml | Changelog entry for the enhancement |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if sp.ParentSpanID().IsEmpty() { | ||
| rootSpan = &sp | ||
| } |
There was a problem hiding this comment.
Taking the address of a loop variable sp is problematic. In Go, loop variables are reused across iterations, so &sp will always point to the last iteration's value. This means rootSpan may reference the wrong span or become invalid. Instead, create a copy of the span or use a different approach to store the root span reference.
There was a problem hiding this comment.
I don't think this is a valid comment as:
spis not a loop variable, but instead created in each iteration of the loop.- I believe this behavior was fixed in Go 1.22 anyway.
Leaving up for someone to double check though.
| assert.Less(t, len(allSampledTraces), len(traceIDs)*6/10) | ||
| assert.Greater(t, len(allSampledTraces), len(traceIDs)*4/10) |
There was a problem hiding this comment.
The magic numbers 6/10 and 4/10 represent the expected sampling range (40%-60%) around the 50% configured sampling percentage. Consider extracting these as named constants to make the test's intent clearer, such as expectedSamplingPercentage = 0.5, toleranceLower = 0.4, and toleranceUpper = 0.6.
| if tsp.rootReceivedBatcher != nil { | ||
| rootBatch, _ := tsp.rootReceivedBatcher.CloseCurrentAndTakeFirstBatch() | ||
| if batch == nil { | ||
| batch = rootBatch | ||
| } else { | ||
| for id := range rootBatch { | ||
| batch[id] = struct{}{} | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
If rootBatch is nil, the code at line 567-569 will panic when attempting to range over it. Add a nil check for rootBatch before attempting to merge it with batch, similar to the check for batch at line 564.
There was a problem hiding this comment.
That's not how looping over a nil map works in Go...
Logiraptor
left a comment
There was a problem hiding this comment.
Overall seems reasonable. A few nits. I would be interested in a refactor of the idBatcher so it can support re-prioritization, but I don't mind merging this first and leaving that for later.
| } | ||
| // Wait long enough that we pass the decision wait after a root is received, | ||
| // but no where near the base decision wait. | ||
| time.Sleep(2 * time.Second) |
There was a problem hiding this comment.
Is it possible to use the controller here and waitForTick instead?
There was a problem hiding this comment.
Not easily, mainly because the test controller uses a syncIDBatcher so any wait for tick is going to pull from the normal batch. I could refactor the test controller to allow multiple batches but didn't seem worthwhile as the test just takes a couple seconds. Plus it is kind of nice to have a test using the real batcher.
Let me know if you want me to look into the refactor though!
|
Agreed on wanting to refactor the id batcher to either a priority queue or an implementation that allows rebatching. I opened an issue for that work: #45054. |
|
You have approvals from codeowners but CI doesn't pass. Please check CI, push changes and mark ready to review again. |
|
Updated! Just merging main fixed the CI issues. |
|
Thank you for your contribution @csmarchbanks! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help. |
…s received (open-telemetry#44878) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description When testing the work done to provide early decisions (open-telemetry#44456) the impact was very limited in some scenarios, specifically when policies existed that look for long traces or if an error is ever present. It is not possible to know if an error span will come along in the future so almost all traces end up waiting until the decision wait causing the savings to only be a couple of percent in some environments. What I found myself wishing for was a way to run all decisions more quickly for most cases, and what worked fairly well is to base that decision on if a root span has been received or not. This change implements a second decision wait to collect any straggler spans that might be present for a trace (e.g. a second service with a different batch timer before sending), but still allow it to be much faster than the base decision wait. A good way of thinking about the two options is that decision_wait is the maximum amount of time we will wait for a span to arrive, and decision_wait_after_root_received is the minimum amount of time we will wait for additional spans to come in. The downside of this approach is that heavily asynchronous traces may not be sampled as expected, however we find that those do not work very well as it is since they commonly last longer than the decision wait anyway. The behavior is opt in so no changes are needed for any users. I wanted to keep the changes in this PR relatively small, but in the future we could re-implement id batcher to support moving traces between batches, or possibly a priority queue where we pop values until some threshold, rather than having two batchers. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Part of open-telemetry#43876 <!--Describe what testing was performed and which tests were added.--> #### Testing Added smoke test for the new functionality <!--Describe the documentation added.--> #### Documentation Added documentation explaining the new configuration variable. <!--Please delete paragraphs that you did not use before submitting.-->
…47535) #### Description Replaces @Logiraptor with @csmarchbanks as a codeowner of the tail sampling processor as I have been more active recently and @Logiraptor is working on other efforts. In addition, adds @carsonip as a new code owner. Some of my TSP efforts: * #42573 * #44878 * #45286 * #46161 TSP work that @carsonip has done: * #43561 * #46762 * #42326
…pen-telemetry#47535) #### Description Replaces @Logiraptor with @csmarchbanks as a codeowner of the tail sampling processor as I have been more active recently and @Logiraptor is working on other efforts. In addition, adds @carsonip as a new code owner. Some of my TSP efforts: * open-telemetry#42573 * open-telemetry#44878 * open-telemetry#45286 * open-telemetry#46161 TSP work that @carsonip has done: * open-telemetry#43561 * open-telemetry#46762 * open-telemetry#42326
…pen-telemetry#47535) #### Description Replaces @Logiraptor with @csmarchbanks as a codeowner of the tail sampling processor as I have been more active recently and @Logiraptor is working on other efforts. In addition, adds @carsonip as a new code owner. Some of my TSP efforts: * open-telemetry#42573 * open-telemetry#44878 * open-telemetry#45286 * open-telemetry#46161 TSP work that @carsonip has done: * open-telemetry#43561 * open-telemetry#46762 * open-telemetry#42326
Description
When testing the work done to provide early decisions (#44456) the impact was very limited in some scenarios, specifically when policies existed that look for long traces or if an error is ever present. It is not possible to know if an error span will come along in the future so almost all traces end up waiting until the decision wait causing the savings to only be a couple of percent in some environments.
What I found myself wishing for was a way to run all decisions more quickly for most cases, and what worked fairly well is to base that decision on if a root span has been received or not. This change implements a second decision wait to collect any straggler spans that might be present for a trace (e.g. a second service with a different batch timer before sending), but still allow it to be much faster than the base decision wait. A good way of thinking about the two options is that decision_wait is the maximum amount of time we will wait for a span to arrive, and decision_wait_after_root_received is the minimum amount of time we will wait for additional spans to come in.
The downside of this approach is that heavily asynchronous traces may not be sampled as expected, however we find that those do not work very well as it is since they commonly last longer than the decision wait anyway. The behavior is opt in so no changes are needed for any users.
I wanted to keep the changes in this PR relatively small, but in the future we could re-implement id batcher to support moving traces between batches, or possibly a priority queue where we pop values until some threshold, rather than having two batchers.
Link to tracking issue
Part of #43876
Testing
Added smoke test for the new functionality
Documentation
Added documentation explaining the new configuration variable.