Is your feature request related to a problem? Please describe.
The OpenTelemetry Collector supports tail sampling, which lets it make sampling decisions based on the whole trace.
Clients are unaware of these sampling decisions, because they happen downstream.
Exemplar selection, however, happens in the client. The client selects a span that participated in the measurement of a metric (that is, a span that was in scope at the moment of measurement), and that span is attached to the metric data point as an exemplar when the metric is exported.
Currently, the collector has no way of knowing that a span has been selected as an exemplar, so it cannot make sampling decisions based on that fact. As a result, exemplar traces are often dropped, which gives a poor experience when users try to navigate from a metric, via the exemplar, to a trace that is missing because it was not sampled.
Describe the solution you'd like
This problem has been addressed in the Prometheus Java Client v1.0, which supports marking spans as exemplars.
Essentially, when a span is selected, the attribute exemplar="true" is added to the span. Downstream tail sampling can then easily be configured to sample any trace in which a span has that attribute.
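Conceptually, the downstream decision is a simple predicate over all spans of a completed trace. The Java sketch below illustrates it; SpanRecord and ExemplarTailPolicy are hypothetical types invented for illustration, not part of the Collector or any OpenTelemetry SDK, and attributes are simplified to strings. In the Collector itself this would be configuration rather than code, for example a string_attribute policy in the tail sampling processor matching key exemplar and value true.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified view of a finished span as a tail sampler would see it. */
record SpanRecord(String name, Map<String, String> attributes) {}

/**
 * Hypothetical tail-sampling policy: keep the whole trace if any of its spans
 * carries the attribute exemplar="true".
 */
final class ExemplarTailPolicy {
    static final String EXEMPLAR_KEY = "exemplar";
    static final String EXEMPLAR_VALUE = "true";

    private ExemplarTailPolicy() {}

    /** Returns true if the trace (all of its spans) should be kept. */
    static boolean shouldSample(List<SpanRecord> trace) {
        return trace.stream()
                .anyMatch(span -> EXEMPLAR_VALUE.equals(span.attributes().get(EXEMPLAR_KEY)));
    }
}
```

A trace containing a single marked span is kept in full, which is exactly what is needed for the metric-to-trace navigation to work.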
Describe alternatives you've considered
For the tail sampling use case, I see no sensible alternative to marking the span.
However, rather than hard-coding this "marking" behaviour, it may be worth considering a generic extension point for exemplar selection.
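One possible shape for such an extension point is a callback invoked whenever a span is chosen as an exemplar, with marking as just one implementation. The sketch below is purely hypothetical: MarkableSpan, ExemplarSelectionListener, MarkExemplarListener and RecordingSpan are names invented here, not existing OpenTelemetry or agent APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal span handle exposing only attribute writes. */
interface MarkableSpan {
    void setAttribute(String key, String value);
}

/** Hypothetical extension point, invoked when a span is chosen as an exemplar. */
@FunctionalInterface
interface ExemplarSelectionListener {
    void onExemplarSelected(MarkableSpan span);
}

/** One possible listener: the marking behaviour requested in this issue. */
final class MarkExemplarListener implements ExemplarSelectionListener {
    @Override
    public void onExemplarSelected(MarkableSpan span) {
        span.setAttribute("exemplar", "true");
    }
}

/** In-memory span used only to demonstrate the hook. */
final class RecordingSpan implements MarkableSpan {
    final Map<String, String> attributes = new HashMap<>();

    @Override
    public void setAttribute(String key, String value) {
        attributes.put(key, value);
    }
}
```

Because the listener is a functional interface, other behaviours (for example, counting selections or marking with a different attribute) could be plugged in as a lambda without changing the selection logic itself.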
Additional context
I have the following observations, though they are based on the source code as it was when I last looked at it, a few months ago:
It appears the agent selects exemplars using a "last span seen" strategy: every span that participates in a measurement is (temporarily) selected as the exemplar until the next span is seen, at which point that span becomes the selected exemplar, replacing the previous one.
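To illustrate why this matters for marking, here is a minimal Java model of a "last span seen" exemplar slot, assuming naive marking at the moment of selection. LastSeenExemplarSlot is a hypothetical class written for illustration; it is not the agent's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical "last span seen" exemplar slot for one metric data point:
 * each new measurement's span replaces the previously selected one.
 */
final class LastSeenExemplarSlot {
    private String exemplarSpanId;                                 // currently selected exemplar
    private final List<String> markedSpanIds = new ArrayList<>(); // spans that would have been marked

    /** Called for every measurement made while a span is in scope. */
    void offer(String spanId) {
        exemplarSpanId = spanId;   // the newest span always wins
        markedSpanIds.add(spanId); // naive marking at selection time
    }

    /** Called at metric export: returns the exemplar actually exported, then resets. */
    String collect() {
        String exported = exemplarSpanId;
        exemplarSpanId = null;
        return exported;
    }

    List<String> markedSpanIds() {
        return List.copyOf(markedSpanIds);
    }
}
```

After offering spans a, b and c in one interval, all three are marked but only c is exported as the exemplar, so two of the three marked traces would be kept by tail sampling for no benefit.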
The result is that, as things stand, simply marking each selected span with the attribute exemplar="true" will not work, because effectively every participating span would get marked. Instead, the selection strategy should avoid marking a significant number of spans as exemplars that never actually end up as exemplars in an exported metric data point. A "first span seen" strategy, for example, would work; this is similar to how the Prometheus client behaves.
Spans are typically processed some time before the metric data point and its exemplars are exported. We want to export the spans (with the exemplar="true" attribute) as soon as possible, but it may be 30 seconds or so before the exemplar is exported. Therefore we cannot defer exemplar selection until the moment of metric export: by then the span could long since have been exported, and it would be too late to set any attributes on it.
The Prometheus client uses the attribute exemplar="true". Perhaps this should be defined in the semantic conventions.
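The "first span seen" strategy and the timing constraint can be sketched together: the span is marked at the moment it is selected, while it is still live, rather than at metric export. This is a hypothetical model (LiveSpan and FirstSeenExemplarSlot are invented names), assuming one exemplar per data point per export interval; it is one possible reading of the strategy, not the Prometheus client's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mutable span: attributes can still be set until it ends and is exported. */
final class LiveSpan {
    final String spanId;
    final Map<String, String> attributes = new HashMap<>();

    LiveSpan(String spanId) {
        this.spanId = spanId;
    }
}

/**
 * Hypothetical "first span seen" exemplar slot for one metric data point.
 * The first span offered in an export interval is selected and marked
 * immediately; later spans are ignored until the slot is reset.
 */
final class FirstSeenExemplarSlot {
    private LiveSpan selected;

    /** Called for every measurement made while a span is in scope. */
    synchronized void offer(LiveSpan span) {
        if (selected == null) {
            selected = span;
            // Mark now: the span may end and be exported long before the metric is.
            span.attributes.put("exemplar", "true");
        }
    }

    /** Called at metric export (e.g. every 30 s): returns the exemplar span id and resets. */
    synchronized String collect() {
        String exported = (selected == null) ? null : selected.spanId;
        selected = null;
        return exported;
    }
}
```

With this strategy each marked span is exactly the one exported as the exemplar for that interval. The methods are synchronized because measurements may be recorded from many threads concurrently.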