Supplementary guidance for metrics additive property #2571
Conversation
Co-authored-by: Yuri Shkuro <[email protected]>
Co-authored-by: Cijo Thomas <[email protected]>
@@ -79,6 +82,139 @@ Here is one way of choosing the correct instrument:

### Additive property

In OpenTelemetry a [Measurement](./api.md#measurement) encapsulates a value and
When a user picks an Additive instrument, what agreement is being made between the instrumenter and the tools that will interpret and visualize the measurements?
1. Whenever these measurements are presented in summarized form, the sum over all attributes is recommended as the most useful default summary value.
2. Whenever these measurements are presented in summarized form, the sum over all attributes isn't necessarily the most useful or a good default, but it would make conceptual sense to provide it as an option.
3. Something else?

I'm suspicious that the text below will lead instrumenters to interpret additivity with the low bar of (2), while UI authors will mostly care about defaults, so they will only care about additivity if it gives them (1). Taking a clear stand in the spec could help everyone get on the same page.
There is no direct agreement between the instrumenter and the backend, as there are several roles involved: the instrumenter (folks who use the API), the application owner (who configures the SDK and exporter), the backend (time-series database + query engine), and the UX (query client + UI).
I think the UI uses the type of the metrics stream (e.g. sum, gauge, histogram, summary) plus additional metadata (e.g. the unit, or the name of the metrics stream if it's a well-known thing defined by the semantic conventions) to provide the experience.
The way I look at the supplementary guideline: it tries to provide additional context that might give clarity, help folks think about things they haven't thought about previously, or simply call out that something is hard/complex without providing a simple answer.
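To make the distinction between (1) and (2) concrete, here is a minimal, non-normative sketch using the opentelemetry-python API; the instrument name and attributes are hypothetical and not from this PR. Because a Counter is additive, summing its series across the status-code attribute still yields a meaningful total, which is the property a UI would rely on for a default summary view.

```python
# Minimal sketch (not from this PR) using the opentelemetry-python API.
# The instrument name and attributes below are hypothetical.
from opentelemetry import metrics

meter = metrics.get_meter("additivity.example")

# Counter is an additive instrument: each per-attribute series is a sum, and
# summing those series across the `http.status_code` attribute still yields a
# meaningful number (total requests), which a UI could show as a default view.
request_counter = meter.create_counter(
    "http.server.request_count",
    unit="{request}",
    description="Number of HTTP requests received",
)

request_counter.add(1, {"http.status_code": 200})
request_counter.add(1, {"http.status_code": 500})
# Summed across attributes: 2 requests total, a sensible default summary.
```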
* ...
* During (T<sub>0</sub>, T<sub>n+1</sub>], we reported `1,872`.
* During (T<sub>n+2</sub>, T<sub>n+3</sub>], we reported `35`.
* During (T<sub>n+2</sub>, T<sub>n+4</sub>], we reported `76`.
What does 'report' mean here? Originally I thought the context was the instrument API, so I expected 'report' to mean app code reporting a measurement to the SDK. However, explicit notions of a time range only show up in the SDK (I think), which makes me think this 'report' is SDK -> Exporter or Exporter -> Backend?
> However, explicit notions of a time range only show up in the SDK (I think), which makes me think this 'report' is SDK -> Exporter or Exporter -> Backend?

Yep, `report` here is referring to the cumulative sum from the SDK output.
On Ln132-133, "count the committed transactions in a database, reported as cumulative sum every 15 seconds": "count" is the action from the API perspective (count transactions by calling `counter.Add(increment)`), and `report` is the result from the SDK (after aggregation).
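To make the count-vs-report distinction concrete, here is a hedged sketch using the opentelemetry-python SDK (the instrument name is illustrative): the application calls `add()` on a Counter, and the SDK's periodic reader is what "reports" the aggregated cumulative sum to the exporter on a 15-second interval.

```python
# Sketch: "count" happens at the API (counter.add), "report" happens when the
# SDK's periodic reader exports the aggregated cumulative sum.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Report (export) the aggregated state every 15 seconds, matching the
# "reported as cumulative sum every 15 seconds" wording in the guideline.
reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=15_000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("db.example")
committed = meter.create_counter("db.transactions.committed")

# "count": the application records increments as transactions commit.
committed.add(1)
committed.add(1)
# "report": every 15s the reader emits the cumulative sum (2 so far) to the
# exporter; the application code never deals with time ranges directly.
```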
For Instruments which take increments and/or decrements as the input (e.g.
[Counter](./api.md#counter) and [UpDownCounter](./api.md#updowncounter)), the
underlying numeric types (e.g., signed integer, unsigned integer, double) have
direct impact on the dynamic range, precision, and how the data is interpreted.
Can we codify how the implementer's choice of datatype impacts what values can be stored for intermediate aggregates, on the wire, and in long-term persistent aggregates? We are telling them there is an impact, and the examples give suggestions of what the impacts might be, but extrapolating from these examples still left me with ambiguity. For example, which parts of these examples are guarantees about whether all OpenTelemetry-compliant implementations would or would not overflow, and which parts are implementation-specific choices about how one particular SDK and backend might behave?
There is no guarantee, as called out in the first line of this doc ("Note: this document is NOT a spec, it is provided to support the Metrics API and SDK specifications, it does NOT add any extra requirements to the existing specifications.").
However, I do think it'll be helpful to give some suggestions (not requirements) for certain situations: e.g. JavaScript might not support int64, and Python might treat integers as BigInt.
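As a non-normative illustration of why the numeric type matters, the following plain-Python sketch shows two of the behaviors discussed above: an IEEE 754 double silently loses small increments once the running sum exceeds 2^53, while Python's arbitrary-precision integers (BigInt-like) stay exact, even though a fixed-width int64 in another language would overflow past 2^63 - 1.

```python
# Non-normative sketch: how the underlying numeric type affects dynamic range
# and precision for an additive sum.

# IEEE 754 double has a 53-bit significand, so integer sums above 2**53 cannot
# all be represented exactly and small increments are silently dropped.
double_sum = float(2**53)              # 9007199254740992.0
assert double_sum + 1.0 == double_sum  # the +1 increment is lost

# Python integers are arbitrary precision (BigInt-like), so the count stays
# exact; a fixed-width int64 elsewhere would overflow past 2**63 - 1.
int_sum = 2**53
assert int_sum + 1 == 9_007_199_254_740_993

int64_max = 2**63 - 1
print(f"float(2**53) + 1 == float(2**53): {double_sum + 1.0 == double_sum}")
print(f"int64 max before overflow: {int64_max}")
```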
Changes
Explained the additive property, overflow/underflow, dynamic range, IEEE 754 double saturation, IEEE 754 subnormal, etc.
This PR doesn't change the spec; it provides additional guidance and examples to help people better understand the metrics specifications.
@jmacd @jsuereth FYI