-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attribute to indicate whether a Span is used as an Exemplar #2922
Comments
I like the simplicity of this solution to not break exemplars when sampling traces. |
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplar-defaults |
Thanks Cijo, that's interesting. Tbh I don't understand why the SDKs implement it that way. You can only have one Exemplar per metric per collection cycle, so I don't see why Exemplars should be updated more frequently. My own background is with Prometheus metrics, I'm maintainer of the Prometheus Java client library. Updating the Exemplar only once per collection cycle should be straightforward to implement in Prometheus libraries, and I guess also in other metrics libraries out there. I'm wondering whether this could still be considered in the interest of making OpenTelemetry tracing (and sampling) work well with Prometheus metric instrumentation? It might as well be a good idea to reconsider the current implementation in OpenTelemetry metrics SDKs. If the metric SDK has a sampling algorithm for Exemplars that is not aligned with the trace sampling in the OpenTelemetry collector it is unlikely that the Exemplars will actually be available in the monitoring backend. This makes Exemplars useless unless you don't sample traces in the collector. |
Sorry I haven't looked at how OTel Sdks have implemented it yet (only Java has implemented it so far I think.) But from the spec wording, and the algorithm used in the built-in SimpleFixedSizedExemplarReservoir, the reservoir can be overwritten with subsequent measurements within the same collection cycle. (Unless I read the spec incorrectly..) |
For reference: The Prometheus Java metrics library has a retention time for Exemplars, which defaults to 7s. Exemplars are kept for min 7s before the next Exemplar is sampled. This should work fine with marking Spans as "used as an Exemplar", as these spans would only be produced once every 7s. Is there any chance to change the Spec to allow something like this in the OpenTelemetry SDK? And if that's not possible, is there any chance to introduce a "used as exemplar" attribute anyway for 3rd party metric libraries like Prometheus? |
EDIT: added a section on head sampling. In practice, it's really hard to know when a span is exporter whether or not the exemplar will be retained and reported by metrics. As already stated here. Even for prometheus, the issue is that only the "latest" exemplar is kept in the reservoir, so unless you buffer spans and delay writing them until metrics are also written to guarantee consistency, you're basically unable to do this. Exemplars (today) work best with head-based sampling. They can take into account whether a trace is expected to be sampled to be included. If you can use only head-based sampling, you should be relatively guaranteed at least the local span has been sampled when an exemplar shows up. If you're doing tail-based sampling, AND, you're able to buffer the full minute or two of data you'd need to delay sampling decisions for metrics to show up, it's possibly a good feature for a tail-sampler. However, right now, by design of exemplars (both in Prometheus client libraries + otel), it's practically impossible to achieve this. This likely needs to be fixed by the tail-based sampling features. At best we can annotate a span that it may have an exemplar, but without waiting for metric reporting, we cannot guarantee it is reported. |
If spans are sampled consistently (see https://opentelemetry.io/docs/reference/specification/trace/tracestate-probability-sampling/#consistent-probability-sampling), exemplars could be simply selected based on the |
Trying to offer another technical position. Suppose you were not interested in distributed tracing, per se, just in recording the spans that happen to be associated with interesting metric events but not propagating trace context. In this parallel discussion about Span links and Sampling, #2918 (comment), I've proposed two new Sampler decision codes that could be useful in developing standalone spans as metric exemplars. The two states are
These primitive Sampler decisions would support configuring a trace SDK to export spans either because they link to other sampled traces or because they were included as metrics exemplars, for example. |
@oertl The Exemplar specification defines a pluggable reservoir component, so if a new type of reservoir was designed to interact with @jmacd For both of those, who do you propose we have the interaction between Span/Exemplar? Today Exemplars leverage the |
Is it possible to change this and allow other implementations? I understand the issue with the reservoir is that at the point in time when you put a span in the reservoir you don't know yet whether you will actually use it as an Exemplar. I think there could be other implementations that decide directly whether or not to use a span as an Exemplar, without using a reservoir. For example, you could just select a new Exemplar every n seconds. |
@fstab WE defined the interface to allow other implementations, we just haven't exposed the configuration options to do so. That is a thing you could do, it does mean you're more likely to sample exemplars early. There's implications in the kind of data you'd be showing in exemplars. Would love a better hollistic analysis/proposal for tail-sampling. |
I can spend a couple of hours next week to write a design doc. Then we have something to start with. It would be great to find a way to make exemplars work with tail-sampling. |
I wrote a doc, let me know if this helps. https://docs.google.com/document/d/1ymxt-oF5268kBZN-oI1yphvGsJjPE1j_f3AsBAKfUTQ/edit?usp=sharing |
We ran into this situation ourselves as we started moving to tail sampling. The solution we employed was that we used a custom |
I've just had a quick go at implementing something along the lines of @fstab's solution in opentelemetry-java. The only difference is that instead of adding a new way of rate-limiting the exemplar sampling process, I'm instead selecting the first span seen by each reservoir cell as an exemplar, and keeping it until the metric is collected and the reservoir is reset (which by default happens every 60 seconds). Does that sound feasible, or am I missing something? |
@samwright Sorry for the delay here. Here is what I think we should do instead: open-telemetry/opentelemetry-java#5960 Once we've exposed the hook for you to create a custom ExemplarRegistry, you can move all the code form your POC into it. |
Hey, awesome to see progress on this. Just FYI: The new Prometheus |
Fixes #2922 Fixes #3812 Related to #3756 ## Changes - Cleans up language around exposing `ExemplarReservoir`. (Remove TODO, e.g.) - Remove `ExemplarFilter` interface but keep configuration options. (see #3812) - Clarify / simplify Spec Compliance matrix for existing state of the Exemplar Specification. Prototype in java: open-telemetry/opentelemetry-java#5960
…emetry#3820) Fixes open-telemetry#2922 Fixes open-telemetry#3812 Related to open-telemetry#3756 ## Changes - Cleans up language around exposing `ExemplarReservoir`. (Remove TODO, e.g.) - Remove `ExemplarFilter` interface but keep configuration options. (see open-telemetry#3812) - Clarify / simplify Spec Compliance matrix for existing state of the Exemplar Specification. Prototype in java: open-telemetry/opentelemetry-java#5960
Exemplars are used to navigate from metrics to example traces. For example, the screenshot shows latencies from the
http.server.duration
metric as visualized by Grafana. Each little dot represents an Exemplar. If you click on an Exemplar, you can navigate to the corresponding trace.However, this does not work well if traces are sampled in the collector: In many cases the metric library will choose a Span as an exemplar that will be discarded by the collector during sampling.
It would be good if metric libraries could add a standard Span attribute to indicate that a Span has been selected as an Exemplar. This attribute could be considered by the collector's sampling algorithm to ensure that Exemplars aren't discarded.
A simple way to implement this is adding a boolean attribute, like
exemplar="true"
.A better way to implement this would be an attribute containing the metric name, like
exemplar.metric="http.server.duration"
. If the attribute is present, it means that the Span is used as an Exemplar. As an additional benefit, the metric name could be used by vendors to allow navigation from a trace to a dashboard showing the corresponding metric.What do you think?
P.S.: If a Github issue here is the wrong place for this let me know where else to put it.
The text was updated successfully, but these errors were encountered: