Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supplementary guidance for metrics additive property #2571

Merged
merged 18 commits into from
May 26, 2022

Conversation

reyang
Copy link
Member

@reyang reyang commented May 21, 2022

Changes

Explained the additive property, overflow/underflow, dynamic range, IEEE 754 double saturation, IEEE 754 subnormal, etc.

This PR doesn't change the spec, it provides additional guidance and examples to help people better understand the metrics specifications.

@jmacd @jsuereth FYI

@reyang reyang requested review from a team May 21, 2022 21:31
@reyang reyang changed the title Reyang/additive Supplementary guidance for metrics additive property May 21, 2022
@reyang reyang added the spec:metrics Related to the specification/metrics directory label May 21, 2022
specification/metrics/supplementary-guidelines.md Outdated Show resolved Hide resolved
specification/metrics/supplementary-guidelines.md Outdated Show resolved Hide resolved
specification/metrics/supplementary-guidelines.md Outdated Show resolved Hide resolved
@reyang reyang merged commit 5ffcd35 into open-telemetry:main May 26, 2022
@reyang reyang deleted the reyang/additive branch May 26, 2022 00:55
@@ -79,6 +82,139 @@ Here is one way of choosing the correct instrument:

### Additive property

In OpenTelemetry a [Measurement](./api.md#measurement) encapsulates a value and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user picks an Additive instrument, what agreement is being made between the instrumenter and the tools that will interpret and visualize the measurements?

  1. Whenever these measurements are presented in summarized form, the sum over all attributes is recommended as the most useful default summary value.
  2. Whenever these measurements are presented in summarized form, the sum over all attributes isn't necessarily the most useful or a good default, but it would make conceptual sense to provide it as an option.
  3. Something else?

I'm suspicious that the text below will lead instrumenters to interpret additivity with a low bar (2) but UI authors will mostly care about defaults so they will only care about Additivity if it gives them (1). Making a clear stand in the spec could help everyone get on the same page.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no direct agreement between instrumenter and the backend, as there are several roles involved - instrumenter (folks who use the API), application owner (who configures the SDK and exporter), backend (time-series database + query engine) and UX (query client + UI).

I think the UI is using the type of the metrics stream (e.g. sum, gauge, histogram, summary, etc.) plus additional metadata (e.g. the name of the metrics stream if it's a well known thing as defined by the semantic convention, unit) to provide the experience.

The way I look at the supplementary guideline - it is trying to provide additional context which might help to give clarity, or help folks to think about things that they haven't thought about previously, or just calling out that something is hard/complex without providing a simple answer.

* ...
* During (T<sub>0</sub>, T<sub>n+1</sub>], we reported `1,872`.
* During (T<sub>n+2</sub>, T<sub>n+3</sub>], we reported `35`.
* During (T<sub>n+2</sub>, T<sub>n+4</sub>], we reported `76`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 'report' mean here? Originally I thought the context was the instrument API so I expected report to mean app code is reporting a measurement to the SDK. However explicit notions of a time range I think only show up in the SDK which makes me think this report is SDK->Exporter or Exporter->Back end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However explicit notions of a time range I think only show up in the SDK which makes me think this report is SDK->Exporter or Exporter->Back end?

Yep, report here is referring to the cumulative sum from the SDK output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Ln132-133 "count the committed transactions in a database, reported as cumulative sum every 15 seconds" - "count" is the action from API perspective (count transactions by calling counter.Add(increment)), report is the result from the SDK (after aggregation).

For Instruments which take increments and/or decrements as the input (e.g.
[Counter](./api.md#counter) and [UpDownCounter](./api.md#updowncounter)), the
underlying numeric types (e.g., signed integer, unsigned integer, double) have
direct impact on the dynamic range, precision, and how the data is interpreted.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we codify how the implementers choice of datatype impacts what values can be stored for intermediate aggregates, on the wire, and long-term persistent aggregates? We are telling them there is impact and the examples give suggestions of what the impacts might be, but extrapolating these examples still left me with ambiguity. For example what parts of these examples are guarantees that all OpenTelemetry compliant implementations would/would not overflow and what parts were implementation specific choices how one particular SDK and backend might behave?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee - as called out in the first line of this doc ("Note: this document is NOT a spec, it is provided to support the Metrics API and SDK specifications, it does NOT add any extra requirements to the existing specifications.")

However, I do think it'll be helpful to give some suggestions (not requirements) to certain situations - e.g. JavaScript might not support int64, Python might treat integers as BigInt.

beeme1mr pushed a commit to beeme1mr/opentelemetry-specification that referenced this pull request Aug 31, 2022
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants