Sampling decision is too late to gain much performance #620
Note that it may be possible to bolt this feature on in an API-compatible way (on the instrumentation side at least), but resolving this before 1.0 may lead to a cleaner design. For example, we could have completely separate interfaces for purely parent-based sampling and "root sampling". #610 already mostly does this, but behind the same interface.
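A minimal sketch of what such a split could look like, assuming Go-style interfaces and stand-in types (none of these names come from the spec or from #610):

```go
package sampling

// Stand-in types so the sketch is self-contained; the real spec types differ.
type KeyValue struct {
	Key, Value string
}

type SpanContext struct {
	TraceID [16]byte
	Sampled bool
}

type SamplingResult struct {
	Sample bool
}

// Hypothetical split: a tracer would consult ParentBasedSampler for spans
// with a parent and skip attribute collection entirely, and consult
// RootSampler only for root spans, where attributes actually matter.
type RootSampler interface {
	ShouldSampleRoot(name string, attrs []KeyValue) SamplingResult
}

type ParentBasedSampler interface {
	ShouldSampleFromParent(parent SpanContext) SamplingResult
}
```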
Where is this suggested? I couldn't find it in the spec. My assumption was that it's entirely up to the libraries to decide how many attributes (or none at all) to pass initially (available to samplers), and then to check span.IsRecording before adding more and more (expensive) attributes.
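For reference, this is roughly the pattern being described, sketched against the current OpenTelemetry Go API (go.opentelemetry.io/otel); the attribute names are only illustrative:

```go
package example

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("example/http")

func handle(ctx context.Context, req *http.Request) {
	// Cheap, already-available attributes are passed at span creation so the
	// sampler can see them.
	ctx, span := tracer.Start(ctx, "HTTP "+req.Method,
		trace.WithAttributes(attribute.String("http.method", req.Method)))
	defer span.End()

	// Expensive attributes are only computed once we know the span is recording.
	if span.IsRecording() {
		span.SetAttributes(attribute.String("http.url", req.URL.String()))
	}
}
```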
It's stated in the tracing API spec, in the span creation section: opentelemetry-specification/specification/trace/api.md, lines 265–266 (at commit 1729bc4).
The assumption here, I suppose, is that the more fundamental attributes relevant for sampling decisions are readily available right from the start and that additional, more expensive attributes would be retrieved after a positive sampling decision.
The most expensive attributes for HTTP are often the URL parts, and you definitely want to know them when sampling a root span.
Thanks for clarifying. By "already known attributes" I took it to mean "already available attributes", i.e. the cost of obtaining them is already paid anyway, so use them in SpanCreation.
Improving sampling performance appears to be a nice-to-have capability, but not a mandatory one. The proposal for
Adding a new use-case:
This implies a hint for instrumentation to know whether the tracer is operational at all, regardless of the sampler.
And another one, open-telemetry/oteps#172:
It looks like this would need a prototype and a draft spec change to move forward. In the linked issue #1916 I mentioned a preference for a lazy-value mechanism to support the intended performance gains: only a Sampler that actually tries to use attributes would incur the cost of evaluating them before the sampling decision. I would begin by asking the maintainers in each language whether there is an existing instrumentation framework with a precedent for lazy values. In OpenTracing-Go there was a lazy log field, for example, https://github.com/opentracing/opentracing-go/blob/e2cb74943cd81f680d00b6be69fc4a2557525727/log/field.go#L162, inspired, IIRC, by the
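A rough sketch of what a lazy attribute value could look like in Go, loosely modeled on the OpenTracing lazy log field linked above; `LazyValue`, `Lazy`, and `Resolve` are made-up names, not part of any OpenTelemetry API:

```go
package lazyattr

// LazyValue defers an expensive computation until (and unless) it is needed.
type LazyValue struct {
	compute  func() string
	resolved *string
}

// Lazy wraps an expensive computation in a closure; nothing runs yet.
func Lazy(compute func() string) *LazyValue {
	return &LazyValue{compute: compute}
}

// Resolve evaluates the value at most once. A sampler that never inspects
// this attribute never pays for the computation.
func (v *LazyValue) Resolve() string {
	if v.resolved == nil {
		s := v.compute()
		v.resolved = &s
	}
	return *v.resolved
}
```

An instrumentation could then pass something like `Lazy(func() string { return req.URL.String() })` at span start; only a sampler that calls `Resolve` pays for building the URL string (though, as noted below, the closure itself still allocates).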
In Go, capturing lazy-eval attributes incurs allocations, which could cause more overhead than the savings from lazy evaluation. Jaeger samplers (Go, Node.js) support a deferred sampling decision, where the span keeps recording until enough attributes are collected to definitively decide not to sample, at which point the span becomes a no-op. It's definitely more expensive than blind probabilistic sampling.
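For illustration, the deferred-decision idea could be sketched like this, with toy types and a placeholder policy rather than the actual Jaeger implementation:

```go
package deferred

// span keeps recording until the sampler can definitively decide not to
// sample; it then degrades to a no-op so later attributes cost nothing.
type span struct {
	recording bool
	attrs     map[string]string
}

func newSpan() *span {
	return &span{recording: true, attrs: map[string]string{}}
}

func (s *span) setAttribute(key, value string) {
	if !s.recording {
		return // decision already made: drop cheaply
	}
	s.attrs[key] = value
	// Placeholder rule standing in for a real deferred sampling policy.
	if key == "http.url" && value == "/healthz" {
		s.recording = false
		s.attrs = nil
	}
}
```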
Problem description
Instrumentations are encouraged to provide as many attributes as possible already when starting the span, so that the sampler can make a better decision. However, collecting these attributes can itself be expensive, and there is one prominent case where the sampler won't even look at them: when it simply respects the parent's decision.
Proposal
I think the spec should contain a mechanism for instrumentations to ask the sampler earlier, before even starting to collect expensive attributes. Two ideas:
Discussion
A difference from the current sampler calls, with both of these ideas, is that the new calls are (and should be!) made before a span ID is computed (whether the trace ID should be filled in from the parent context, in cases where there is one, is debatable). Nevertheless, I think the existing sampler interface is enough to support these use cases if samplers are written a bit defensively (the probability sampler, which does look at the trace ID, might be an exception: it would have to return true for invalid trace IDs, since it cannot know yet).
EDIT: One issue with calling Sampler.shouldSample twice is that it may lose performance in the sampled case, especially if the sampler adds or computes attributes for its decision. This could be fixed by adding a new method to the sampler, so the tracer knows that any attributes won't be used. This can also be done in a backwards-compatible way, since the method can easily be substituted with "return true" or "return shouldSample().isSampled()" for samplers that don't support it.
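A sketch of that backwards-compatible extension, using an optional interface; all names here are hypothetical, not proposed spec wording:

```go
package sampler

type SamplingParameters struct {
	TraceID [16]byte
	Name    string
}

// Sampler is a stand-in for the existing interface.
type Sampler interface {
	ShouldSample(p SamplingParameters) bool
}

// PreSampler is the hypothetical new method: called before expensive
// attributes are collected (and before a span ID exists), and its result
// never carries attributes, so the sampler knows they won't be used.
type PreSampler interface {
	MightSample(p SamplingParameters) bool
}

// mightSample is what the tracer would call before collecting attributes.
func mightSample(s Sampler, p SamplingParameters) bool {
	if ps, ok := s.(PreSampler); ok {
		return ps.MightSample(p)
	}
	// Fallback for samplers that don't implement the new method: equivalent
	// to "return shouldSample().isSampled()" (or simply "return true").
	return s.ShouldSample(p)
}
```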
Another advantage of these methods would be that they could also support explicit tracing suppression (as per #530).