How to rebuild from deltas? #1566
Comments
This is a recurring discussion from the days when OpenMetrics and OpenCensus tried to merge. Sequence numbers allow for detection, but not correction, of missing information. A rough(!) equivalent would be UDP (sequence only) vs TCP (send windows, ACKs, retransmissions, state per receiver, local deletion after last ACK, etc.). This is significant overhead compared to plain deltas, so it comes down to design goals and constraints. As far as I know, data loss is not acceptable within OTel, so some TCP-like mechanism would be needed. If data stability can be reduced, the additional complexity could be reduced accordingly. An in-between would be to rebuild the cumulative state directly at the emitter, which led the discussion in early 2020 back to square one: that overall complexity would be lowest if the internal representation was cumulatives, not deltas, by default. I don't know off-hand if different receivers may request different delta periods. If that's the case, the system above would need to carry several state sets, one per delta time range. The same is true for any node in the overall pipeline graph where deltas can rest, or are recomputed/regrouped/cached.
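To illustrate the "detection, but not correction" point, here is a minimal sketch of a receiver that tracks per-stream sequence numbers. Everything in it is hypothetical: OTLP delta points carry no sequence field, and the `GapDetector` name and its status strings are made up for this example. The receiver can tell that something is missing, but it has no way to recover the lost value without a retransmission mechanism.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GapDetector:
    """Tracks sequence numbers for one stream (hypothetical; not part of OTLP)."""

    expected: int = 0                      # next sequence number we expect to see
    missing: List[int] = field(default_factory=list)

    def observe(self, seq: int) -> str:
        if seq < self.expected:
            # Already seen, or a late arrival. Sequence numbers alone
            # cannot tell us which without extra per-receiver state.
            return "duplicate_or_late"
        status = "ok"
        if seq > self.expected:
            # We know which points were skipped, but not their values.
            self.missing.extend(range(self.expected, seq))
            status = "gap"
        self.expected = seq + 1
        return status


detector = GapDetector()
statuses = [detector.observe(s) for s in [0, 1, 3, 3, 4]]
print(statuses)          # statuses for each observed sequence number
print(detector.missing)  # sequence numbers known to be missing
```

Filling the gap would require the sender to keep unacknowledged deltas around and resend them, which is the TCP-like state described above.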
I'd like to split out a few things to identify/resolve:
Given point 2, I think point 3 is a lower priority issue. That said, I think we have some answers straight away. First, regarding sequence numbers. Assuming #1574 and the Single-Writer philosophy, we should be able to use timestamp + aggregation temporality to uniquely identify a delta within OTLP (see https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto#L279). Second, we do need to document what to do when we detect an out-of-order delta sum. In this case my proposal would be to reset the counter on detection. I'm reading @RichiH's comment as "this is an ok thing to do, but not ideal" and I agree. If we know we're outputting cumulative metrics, users should use the result of #731. In scenarios where that's not practical, this is (likely) the best we can do. If folks agree, I can write this up into the data model specification.
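A rough sketch of what "reset the counter on detection" could look like when rebuilding a cumulative sum from deltas for a single stream. This is not spec text. The `DeltaPoint`, `CumulativePoint`, and `DeltaToCumulative` names are hypothetical, and the contiguity rule used here (a delta is in order only if its start time equals the previous point's end time) is an assumption of this example. As written, gaps and overlaps trigger a reset in the same way as out-of-order points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaPoint:
    start_time: int  # start of the delta interval (e.g. unix nanos)
    time: int        # end of the delta interval
    value: float


@dataclass
class CumulativePoint:
    start_time: int  # start of the current cumulative sequence
    time: int
    value: float


class DeltaToCumulative:
    """Rebuilds a cumulative sum for ONE stream (single writer assumed)."""

    def __init__(self) -> None:
        self._start: Optional[int] = None
        self._last_time: Optional[int] = None
        self._total = 0.0

    def add(self, p: DeltaPoint) -> CumulativePoint:
        contiguous = self._last_time is not None and p.start_time == self._last_time
        if not contiguous:
            # First point, gap, overlap, or out-of-order delta:
            # start a new cumulative sequence (the "reset").
            self._start = p.start_time
            self._total = 0.0
        self._total += p.value
        self._last_time = p.time
        return CumulativePoint(self._start, p.time, self._total)


conv = DeltaToCumulative()
print(conv.add(DeltaPoint(0, 10, 5.0)))   # starts a sequence
print(conv.add(DeltaPoint(10, 20, 3.0)))  # contiguous, accumulates
print(conv.add(DeltaPoint(5, 15, 2.0)))   # out of order, resets
```

The reset is visible downstream as a changed start time, so a consumer of the cumulative stream can treat it like any other counter reset. The cost is that the contribution of earlier deltas is dropped from the running total at that point.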
An earlier discussion about rebuilding from deltas raised a question: open-telemetry/prometheus-interoperability-spec#25 (comment).
Currently, the delta data points don't have sequence numbers. There is no way to identify duplicates or missing data points. Generally speaking, we don't know how to rebuild from deltas without sequence numbers. The spec/data model should address this issue.
cc @RichiH @jmacd