Allow samplers to be called during different moments in the Span lifetime #307

bogdandrutu · 2019-10-11T20:19:43Z

Currently Sampler is called only onSpanCreation.

Based on the comment #296 (comment) we should consider to allow:

onStart - span creation time
onSpanNameUpdate - when updateName is called
onSetAttribute - when setAttribute is called
onEnd - when end is called.

The text was updated successfully, but these errors were encountered:

jmacd · 2019-10-15T18:23:02Z

If I were a sampler with enough sophistication to respond to each of these events in a span lifecycle, I might also like to maintain state between the calls in the span processor state itself. This feels like something for a specialized SDK, not something for a public sampler API IMO.

paivagustavo · 2019-10-22T03:05:57Z

We've hit some questions (open-telemetry/opentelemetry-go#224) that I think are related with this issue.

By the current specs, if a span that was initially defined as isRecording == false it can drop all tracing events but what happens if someone tries to update the span name or set a new attribute after adding events? The span would have incomplete data if it became sampled.

To have a span that contains all data we would need to keep every event, attribute, status and etc from every span, if we are doing this what is the point of the IsRecording at all?

jmacd · 2019-10-22T03:06:55Z

I would like to propose a different solution. Presently, there are open issues in the Go repository about removing the WithRecord option and about the UpdateName vs SetName debate, which are both connected with this issue.

Instead of "re-triggering" the sampler API or re-evaluating it at different moments during the span lifetime, I see two good options:

--
First, a specialized SDK should be able to implement an arbitrarily complex state machine to perform a complex sampling decision that might change upon the arrival of any new information. It could potentially decide to not record a Span using the initial information, or it could decide to begin recording and stop at some later time when a new attribute, a change of name, or an event occur. This is the arbitrarily complex case and SDKs are free to do this.

--
The simple case, which I believe is being advocated for OTel to support generally, is to support a relatively flexible sampling API. The one we have today has raised this confusion about UpdateName vs SetName and this question of re-evaluation, but it hasn't helped us decide whether to record a span or not. Now that the spec says the WithRecord option is not a span parameter, does this become part of the sampling API? It seems there are really two questions the sampling API needs to answer:

Given the initial arguments given to start a Span, decide whether to record this span. In this call, we get the initial name of the span, the initial attributes, and any other options passed to Start(). The sampler has enough information to decide whether to record the span or not. It still does not know the latency, the full set of attributes, or the events.
At the end of a span, assuming it was recorded, the Sampler API has an opportunity to decide whether to sample or not, given the complete and finalized SpanData structure. This is an opportunity to make a final decision.

This addresses the confusion. A sampler can make relatively simplistic decisions up front, or it can make fully-informed decisions at the end of the span, but that is all it can do. This is, after all, the simple case. This suggests that there should be TWO calls in the Sampler API. At the start, decide whether to record. At the end, decide whether to sample.

Oberon00 · 2019-10-22T11:12:24Z

Note that when the Sampler (or the exporter for that matter) decides to drop the Span only at step 2, the Span's context may already have been propagated with an Sampled=true TraceOptions flag. I.e., if the decision changes from sampled in to sampled out any time after span creation (step 1), then you will probably get broken traces.

paivagustavo · 2019-10-22T12:15:29Z

@Oberon00 what is this different than someone updating a span names and it be sampled out after context propagation may happen? If sampler happens anywhere after creation there will be always chance for broken traces.

Oberon00 · 2019-10-22T14:00:56Z

@paivagustavo: Exactly that was my point 😃

EDIT: My understanding of the current sampling approach is that sampling happens immediately at span creation only.

carlosalberto · 2019-10-23T03:57:16Z

what is this different than someone updating a span names and it be sampled out after context propagation may happen?

AFAIK this is something that tracers like Jaeger already might run into, and this is done on at their own risk ;)

I see two good options

I really like the second option. Given that a lot of things have been added lately, and more of them will, I'd stick to the simplest, correct choice whenever possible.

jmacd · 2019-10-24T16:53:44Z

Remember that there are more uses for a Span than just constructing "complete" traces. If I want to study the statistical properties of a certain operation, let's say, I might decide to record all Spans and use some kind of latency-weighted sampling logic, for example. There's no requirement to collect a full trace, and I'd like to have control over whether spans are recorded or not. Z-pages are the other important use-case.

carlosalberto · 2020-01-29T16:13:49Z

Moving this issue to the next milestone as there's no consensus here yet.

yurishkuro · 2020-06-11T22:05:54Z

Responding to @jmacd's #307 (comment) about restricting sampler to two decisions only, instead of multiple decisions proposed in the issue description. Restricting the second decision only to span.finish means that any other activity the request has performed that resulted in downstream RPCs will not be recorded if the original decision as No, since it will be propagated to other services. In contrast, allowing multiple sampling decisions on every update of the span means the information needed for a Yes decision (like a certain tag) can become available early in the span life cycle.

toumorokoshi · 2020-06-11T22:11:22Z

FYI I've created an OTEP around this: open-telemetry/oteps#115. Will attempt to get discussions moving, so review is much appreciated!

jmacd · 2020-06-11T22:29:09Z

Responding to @jmacd's #307 (comment) about restricting sampler to two decisions only, instead of multiple decisions proposed in the issue description.

Just to clarify my earlier remark, I had described two descisions, but not two sampling decisions.

One was "attach a real span object vs a no-op span", which equates with "Maybe sample" or "Definitely do not sample"--this is a decision to possibly sample the span before all the information is available. If the first decision is No, there will be no information recorded.

The second decision was a tail-sampling decision where all the information is available. I view my suggestion is pretty close to open-telemetry/oteps#115.

tigrannajaryan · 2020-09-08T23:32:06Z

Given that open-telemetry/oteps#115 has no approvals yet and there is no clear understanding of how this issue will be resolved do we still believe this is a must-have for 1.0 GA?

It appears to me this functionality can be added in backwards compatible manner and if so can be postponed and addressed after GA.

Thoughts?

toumorokoshi · 2020-09-09T04:33:37Z

I think it's fine to leave until after GA, since my main concern (discouraging calling update_name) has been removed from the spec.

That said the OTEP as it stands has either addressed all feedback or waiting on responses from others. If someone can help guide me in how to get more eyes I'm happy to get right back to working on it.

bogdandrutu mentioned this issue Oct 11, 2019

Update Sampler interface base on OTEP 0006-sampling #296

Merged

SergeyKanzhelev added this to the Alpha v0.3 milestone Oct 16, 2019

paivagustavo mentioned this issue Oct 22, 2019

UpdateName doesn't trigger sampling decision if the span is not sampled when created. open-telemetry/opentelemetry-go#224

Closed

jmacd mentioned this issue Oct 24, 2019

api(trace): remove 'WithRecord' span option. open-telemetry/opentelemetry-go#223

Closed

paivagustavo mentioned this issue Dec 6, 2019

Remove WithRecording option from trace API open-telemetry/opentelemetry-go#192

Closed

carlosalberto modified the milestones: Alpha v0.3, Alpha v0.4 Jan 29, 2020

Oberon00 mentioned this issue Mar 31, 2020

remove warnings for trace.UpdateName #506

Closed

jmacd added the area:sampling Related to trace sampling label May 29, 2020

carlosalberto modified the milestones: v0.5, v0.6 Jun 9, 2020

evantorrie mentioned this issue Jun 10, 2020

Remove trace.WithRecord() trace start option open-telemetry/opentelemetry-go#787

Closed

yurishkuro mentioned this issue Jun 11, 2020

[feat] Allow additional sampling hooks open-telemetry/oteps#115

Closed

bogdandrutu added the spec:trace Related to the specification/trace directory label Jun 12, 2020

carlosalberto added area:sdk Related to the SDK release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Jun 30, 2020

andrewhsu added the priority:p3 Lowest priority level label Jul 17, 2020

Oberon00 added release:after-ga Not required before GA release, and not going to work on before GA and removed release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 9, 2020

andrewhsu removed the priority:p3 Lowest priority level label Sep 9, 2020

evantorrie mentioned this issue Mar 5, 2021

Remove WithRecord() option from trace.SpanOption when starting a span open-telemetry/opentelemetry-go#1660

Merged

jack-berg mentioned this issue Jul 30, 2024

Support decide shouldSample on process end open-telemetry/opentelemetry-java#6609

Open

jonatan-ivanov mentioned this issue Aug 1, 2024

Sample on error - change sampling decision during the flow micrometer-metrics/tracing#786

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow samplers to be called during different moments in the Span lifetime #307

Allow samplers to be called during different moments in the Span lifetime #307

bogdandrutu commented Oct 11, 2019

jmacd commented Oct 15, 2019

paivagustavo commented Oct 22, 2019

jmacd commented Oct 22, 2019

Oberon00 commented Oct 22, 2019

paivagustavo commented Oct 22, 2019

Oberon00 commented Oct 22, 2019 •

edited

Loading

carlosalberto commented Oct 23, 2019

jmacd commented Oct 24, 2019

carlosalberto commented Jan 29, 2020

yurishkuro commented Jun 11, 2020

toumorokoshi commented Jun 11, 2020

jmacd commented Jun 11, 2020

tigrannajaryan commented Sep 8, 2020

toumorokoshi commented Sep 9, 2020

Allow samplers to be called during different moments in the Span lifetime #307

Allow samplers to be called during different moments in the Span lifetime #307

Comments

bogdandrutu commented Oct 11, 2019

jmacd commented Oct 15, 2019

paivagustavo commented Oct 22, 2019

jmacd commented Oct 22, 2019

Oberon00 commented Oct 22, 2019

paivagustavo commented Oct 22, 2019

Oberon00 commented Oct 22, 2019 • edited Loading

carlosalberto commented Oct 23, 2019

jmacd commented Oct 24, 2019

carlosalberto commented Jan 29, 2020

yurishkuro commented Jun 11, 2020

toumorokoshi commented Jun 11, 2020

jmacd commented Jun 11, 2020

tigrannajaryan commented Sep 8, 2020

toumorokoshi commented Sep 9, 2020

Oberon00 commented Oct 22, 2019 •

edited

Loading