Sampling decision is too late to gain much performance #620
Note that it may be possible to bolt this feature on in an API-compatible way (on the instrumentation side at least), but resolving this before 1.0 may lead to a cleaner design. For example, we could have completely separate interfaces for purely parent-based sampling and "root sampling". #610 already mostly does this, but behind the same interface.
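A minimal sketch of what such a split could look like, assuming Go-style interfaces and stand-in types (none of these names come from the spec or from #610):

```go
package sampling

// Stand-in types so the sketch is self-contained; the real spec types differ.
type KeyValue struct {
	Key, Value string
}

type SpanContext struct {
	TraceID [16]byte
	Sampled bool
}

type SamplingResult struct {
	Sample bool
}

// Hypothetical split: a tracer would consult ParentBasedSampler for spans
// with a parent and skip attribute collection entirely, and consult
// RootSampler only for root spans, where attributes actually matter.
type RootSampler interface {
	ShouldSampleRoot(name string, attrs []KeyValue) SamplingResult
}

type ParentBasedSampler interface {
	ShouldSampleFromParent(parent SpanContext) SamplingResult
}
```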
Where is this suggested? I couldn't find it in the spec. My assumption was that it's entirely up to the libraries to decide how many attributes (or none at all) to pass initially (available to samplers), and then to check span.IsRecording before adding more and more (expensive) attributes.
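For reference, this is roughly the pattern being described, sketched against the current OpenTelemetry Go API (go.opentelemetry.io/otel); the attribute names are only illustrative:

```go
package example

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("example/http")

func handle(ctx context.Context, req *http.Request) {
	// Cheap, already-available attributes are passed at span creation so the
	// sampler can see them.
	ctx, span := tracer.Start(ctx, "HTTP "+req.Method,
		trace.WithAttributes(attribute.String("http.method", req.Method)))
	defer span.End()

	// Expensive attributes are only computed once we know the span is recording.
	if span.IsRecording() {
		span.SetAttributes(attribute.String("http.url", req.URL.String()))
	}
}
```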
It's stated in the tracing API spec, in the span creation section: opentelemetry-specification/specification/trace/api.md, lines 265–266 (at commit 1729bc4).
The assumption here, I suppose, is that the more fundamental attributes relevant for sampling decisions are readily available right from the start and that additional, more expensive attributes would be retrieved after a positive sampling decision.
The most expensive attributes for HTTP are often the URL parts, and you definitely want to know them when sampling a root span.
Thanks for clarifying. By "already known attributes" I took it to mean "already available attributes", i.e. the cost of obtaining them is already paid anyway, so use them in SpanCreation.
Improving sampling performance appears to be a nice-to-have capability, but not a mandatory one. The proposal for
Adding a new use-case:
This implies a hint for instrumentation to know whether the tracer is operational at all, regardless of the sampler.
And another one, open-telemetry/oteps#172:
It looks like this would need a prototype and a draft spec change to move forward. In the linked issue #1916 I mentioned a preference for a lazy-value mechanism to support the intended performance gains: only a Sampler that actually tries to use attributes would incur the cost of evaluating them before the sampling decision. I would begin by asking the maintainers in each language whether there is an existing instrumentation framework with a precedent for lazy values. In OpenTracing-Go there was a lazy log field, for example, https://github.com/opentracing/opentracing-go/blob/e2cb74943cd81f680d00b6be69fc4a2557525727/log/field.go#L162, inspired, IIRC, by the
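A rough sketch of what a lazy attribute value could look like in Go, loosely modeled on the OpenTracing lazy log field linked above; `LazyValue`, `Lazy`, and `Resolve` are made-up names, not part of any OpenTelemetry API:

```go
package lazyattr

// LazyValue defers an expensive computation until (and unless) it is needed.
type LazyValue struct {
	compute  func() string
	resolved *string
}

// Lazy wraps an expensive computation in a closure; nothing runs yet.
func Lazy(compute func() string) *LazyValue {
	return &LazyValue{compute: compute}
}

// Resolve evaluates the value at most once. A sampler that never inspects
// this attribute never pays for the computation.
func (v *LazyValue) Resolve() string {
	if v.resolved == nil {
		s := v.compute()
		v.resolved = &s
	}
	return *v.resolved
}
```

An instrumentation could then pass something like `Lazy(func() string { return req.URL.String() })` at span start; only a sampler that calls `Resolve` pays for building the URL string (though, as noted below, the closure itself still allocates).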
In Go, capturing lazy-eval attributes incurs allocations, which could cause more overhead than the savings from lazy evaluation. Jaeger samplers (Go, Node.js) support a deferred sampling decision, where the span keeps recording until enough attributes are collected to definitively decide not to sample, at which point the span becomes a no-op. It's definitely more expensive than blind probabilistic sampling.
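For illustration, the deferred-decision idea could be sketched like this, with toy types and a placeholder policy rather than the actual Jaeger implementation:

```go
package deferred

// span keeps recording until the sampler can definitively decide not to
// sample; it then degrades to a no-op so later attributes cost nothing.
type span struct {
	recording bool
	attrs     map[string]string
}

func newSpan() *span {
	return &span{recording: true, attrs: map[string]string{}}
}

func (s *span) setAttribute(key, value string) {
	if !s.recording {
		return // decision already made: drop cheaply
	}
	s.attrs[key] = value
	// Placeholder rule standing in for a real deferred sampling policy.
	if key == "http.url" && value == "/healthz" {
		s.recording = false
		s.attrs = nil
	}
}
```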
Problem description
Instrumentations are encouraged to provide as many attributes as possible already when starting the span, so that the sampler can make a better decision. However, collecting these attributes can itself be expensive, and there is one prominent case where the sampler won't even look at them: when it simply respects the parent's decision.
Proposal
I think the spec should contain a mechanism for instrumentations to ask the sampler earlier, before even starting to collect expensive attributes. Two ideas:
Discussion
A difference from the current sampler calls, with both of these ideas, is that the new calls are (and should be!) made before a span ID is computed (whether the trace ID should be filled in from the parent context, in cases where there is one, is debatable). Nevertheless, I think the existing sampler interface is enough to support these use cases if samplers are written a bit defensively (the probability sampler, which does look at the trace ID, might be an exception: it would have to return true for invalid trace IDs, since it cannot know yet).
EDIT: One issue with calling Sampler.shouldSample twice is that it may lose performance in the sampled case, especially if the sampler adds or computes attributes for its decision. This could be fixed by adding a new method to the sampler, so the tracer knows that any attributes won't be used. This can also be done in a backwards-compatible way, since the method can easily be substituted with "return true" or "return shouldSample().isSampled()" for samplers that don't support it.
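A sketch of that backwards-compatible extension, using an optional interface; all names here are hypothetical, not proposed spec wording:

```go
package sampler

type SamplingParameters struct {
	TraceID [16]byte
	Name    string
}

// Sampler is a stand-in for the existing interface.
type Sampler interface {
	ShouldSample(p SamplingParameters) bool
}

// PreSampler is the hypothetical new method: called before expensive
// attributes are collected (and before a span ID exists), and its result
// never carries attributes, so the sampler knows they won't be used.
type PreSampler interface {
	MightSample(p SamplingParameters) bool
}

// mightSample is what the tracer would call before collecting attributes.
func mightSample(s Sampler, p SamplingParameters) bool {
	if ps, ok := s.(PreSampler); ok {
		return ps.MightSample(p)
	}
	// Fallback for samplers that don't implement the new method: equivalent
	// to "return shouldSample().isSampled()" (or simply "return true").
	return s.ShouldSample(p)
}
```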
Another advantage of these methods would be that they could also support explicit tracing suppression (as per #530).