-
Notifications
You must be signed in to change notification settings - Fork 911
Add synchronous gauge instrument, clarify temporality selection influence on metric point persistence #3540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan for updating the SDK specification? It would probably not be a good idea to release this change without an SDK specification. Are we planning to block the release until we have one?
Mentioned here, The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely. This raises the question of whether a Meter can ever mix temporality for synchronous instruments, and how that would be configured. Presently, we have no way to individually customize temporality of a View clause, but gauges seem to be useful both ways--possibly a hint is enough. |
I didn't see anything else in the SDK that needs to be updated to accommodate this API addition. Do you have anything in mind? |
This behavior is most aligned with existing systems, but current language in the metrics data model and proto description suggest otherwise. The metrics data model says:
I've pushed a commit to adjust this to match the timestamp language for the other point types, which simply says the time window The proto gauge description says:
I've opened a draft PR which would change the language around temporality selection. The general aim is to make it more correct to let aggregation temporality influence gauges and the StartTimeUnixNano field. I think the changes needed to the data model and proto needed to follow this interpretation should be allowed (i.e. they're NOT breaking). |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
I'm not sure what @jmacd is getting at with "delta" and "cumulative" here. It's not meaningful to calculate the delta between two gauge points, which I think is why the original language talked talked about not supporting different aggregation temporalities. It sounds like maybe you're using "delta" to refer to "only sends a point when the value has changed" and "cumulative" to refer to "send a point every interval regardless of whether it has changed". AIUI both statsd (spec) and Prometheus will continue reporting the last value of a gauge until a new value is set or the gauge is explicitly cleared. It's possible someone might want to not report gauge points where the value hasn't changed, but that's not the normal behavior of statsd or Prometheus and it is difficult to consume such a time series downstream. If you want to provide this option (and I don't think it's really worth it) it should be a separate setting from whether counters are reported as delta or cumulative points.
What does "gauge was calculated" mean here? In general gauges are sampled, not calculated, so it doesn't make sense to talk about an interval for which they were calculated. Only a small subset of gauges have a window over which they are calculated (e.g. a gauge representing the mean/max of a value over a time window). Even if you include "last" as a possible calculation, that is only meaningful in an SDK context where you know you are aggregating the samples.
I disagree; there are plenty of gauge points in the wild that (meaningfully) do not have start times, and this data model spec change would make those points invalid. |
Taking the last is what is being referred to by "calculation". The SDK behavior is big factor of the data model. I don't think this language negatively impacts the interpretation of the data model in other contexts.
Not quite. As mentioned here StartTimeUnixNano is optional for all point types. This PR just aims to carve out the ability for the aggregation temporality selection to influence the value StartTimeUnixNano (either time of last collection or SDK start time) of gauges produced by synchronous gauge instruments. |
Will merge this tomorrow morning if there are no additional comments. |
Resolve #5225 The specification has [added a synchronous gauge instrument](open-telemetry/opentelemetry-specification#3540). That instrument has now been [stabilized](open-telemetry/opentelemetry-specification#4019), and that stabilization is included in the [next release](open-telemetry/opentelemetry-specification#4034). This adds the new synchronous gauge instrument to the metric API and all implementation we publish. This change will be a breaking change for any SDK developer. The `embedded` package is updated to ensure our compatibility guarantees are meet. --------- Co-authored-by: David Ashpole <[email protected]>
…ence on metric point persistence (open-telemetry#3540) Resolves open-telemetry#2318. Extends the metrics API to include a new synchronous gauge instrument, whose value can be set as measurements occur. This is useful when the measurements are made accessible via subscription, such as JFR events. There are interesting questions around the behavior of `start_time_unix_nano` and whether or not sync gauges continue to export series without new measurements. This PR has gone through some iteration (available to see via commit history), but as it currently stands, I've made the following changes: - Add new language stating that metric reader's configurable aggregation temporality as a function of instrument type influences the persistence of metric points across successive collections, as described in this open-telemetry#3540 (comment). This influence occurs regardless of whether the resulting metric point has an aggregation temporality field. In other words, sync gauge persistence behavior is influenced by aggregation temporality selection. - Add new language stating that metric reader's configurable aggregation temporality as a function of instrument type influences the starting timestamp of metric points. When an instrument is cumulative, its starting timestamp will always be the same. When an instrument is delta, its starting timestamp advances with successive collections. This influence occurs regardless of whether the resulting metric point has an aggregation temporality field. In other words, sync gauge starting timestamp is influenced by aggregation temporality selection. --------- Co-authored-by: Reiley Yang <[email protected]>
Related to open-telemetry#3893. The table has entries for all the instrument types but I forgot to add an entry for synchronous gauge in open-telemetry#3540. Fixing.
Resolves #2318.
Extends the metrics API to include a new synchronous gauge instrument, whose value can be set as measurements occur. This is useful when the measurements are made accessible via subscription, such as JFR events.
There are interesting questions around the behavior of
start_time_unix_nano
and whether or not sync gauges continue to export series without new measurements. This PR has gone through some iteration (available to see via commit history), but as it currently stands, I've made the following changes: