Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add observable (async) counter to the metrics API #1590

Merged
merged 18 commits into from
Apr 9, 2021

Conversation

reyang
Copy link
Member

@reyang reyang commented Apr 1, 2021

Related to #1578.

Changes

During the 03/25/2021 Metrics API/SDK SIG Meeting, we've agreed to tackle an async instrument after Counter is done.

After this PR, the only outstanding issues for the metrics API spec would be:

  • Figure out the remaining instruments
    • How many do we need
    • What should be the names
  • Hint API (we've discussed during the SIG meeting that we will move forward to the SDK spec and come back to this topic after SDK spec is in shape)

This will be discussed during the upcoming (4/1) Metrics API/SDK SIG meeting.

Related oteps OTEP146

@reyang reyang requested review from a team April 1, 2021 06:31
@reyang reyang added spec:metrics Related to the specification/metrics directory area:api Cross language API specification issue labels Apr 1, 2021
@reyang reyang changed the title Add observable counter Add observable (async) counter to the metrics API Apr 1, 2021
@reyang reyang force-pushed the reyang/async-counter branch from 1fa59f9 to 1264dcb Compare April 1, 2021 20:34
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few non-blocking comments.

@@ -319,6 +421,14 @@ for the interaction between the API and SDK.
* A value
* [`Attributes`](../common/common.md#attributes)

## Compatibility

All the metrics components SHOULD allow new APIs to be added to existing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. We may need to provide guidance here during implementations, but glad to see this called out.


Example uses for `ObservableCounter`:

* [CPU time](https://wikipedia.org/wiki/CPU_time), which could be reported for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think It''d be good to call out use cases for where "ObservableCounter" is not good for tracking CPU time.

E.g. in a multi-tenant scenario where you want to track CPU usage / user, you'd actually want some kind of sycnhronous instrument that can report on request-level usage with an identified user.

Observable / Async metrics are really only useful for context-less metrics (those that cannot interact with other telemetry). I'd like to call that out here for folks deciding between Async/Sync as I think Otel really shines in the Sync scenario w/ Context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but this is tricky because you don't want users that report cpu_usage (not per request) to use the sync version and believe that is better :)

specification/metrics/new_api.md Outdated Show resolved Hide resolved

Example uses for `ObservableCounter`:

* [CPU time](https://wikipedia.org/wiki/CPU_time), which could be reported for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but this is tricky because you don't want users that report cpu_usage (not per request) to use the sync version and believe that is better :)

specification/metrics/new_api.md Show resolved Hide resolved
specification/metrics/new_api.md Outdated Show resolved Hide resolved
specification/metrics/new_api.md Outdated Show resolved Hide resolved
specification/metrics/new_api.md Outdated Show resolved Hide resolved
@reyang reyang force-pushed the reyang/async-counter branch from f69e4c4 to b06a5d9 Compare April 7, 2021 06:36
Copy link

@hdost hdost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of the open comments actually un-resolved at this point?

@reyang
Copy link
Member Author

reyang commented Apr 7, 2021

Are any of the open comments actually un-resolved at this point?

So far all the outstanding/blocking comments are resolved.

There are some open discussions (e.g. nice to have some high level document describing why we want to distinguish monotonic vs. non-monotonic, and providing examples/suggestions on when to use sync vs. async instrumentation), these are non-blocking and can be addressed in separate PRs.

### CounterFunc

`CounterFunc` is an asynchronous Instrument which reports
[monotonically](https://wikipedia.org/wiki/Monotonic_function) increasing
Copy link

@noahfalk noahfalk Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who do we believe has the onus to enforce that the values are monotonic? Is it undefined behavior if the callback returns a value that is smaller than the one it returned on the last invocation?

Alternatively, if the only purpose of saying the values are monotonic is to infer that the user prefers that the data is presented as a rate, I've yet to understand a problem with redefining the counter as a "uses rate presentation by default" counter. I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs.

(I'm fine accepting the PR as-is and resolving this as a follow up)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like the default SDK SHOULD enforce monotonicity, and that it is an error when a callback returns a value less than a previously returned value.

I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs

I take it you are considering whether we could drop the monotonicity idea. Why should we use separate instruments for monotonic and non-monotonic streams? There are legacy arguments that this information should be preserved, but I don't like to use those arguments. One good reason is simply to help the user. If you have a monotonic instrument, we can detect when you use it incorrectly.

Copy link

@noahfalk noahfalk Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you are considering whether we could drop the monotonicity idea

Yeah, but very open to the notion that I may not have the full picture. What I have read so far in the conversation suggests that users don't care about monotonicity, they care whether the data is shown as a rate or as an absolute value. If its true that rate vs. absolute value is all they care about then I think pros/cons of enforcing monotonicity anyways would be:
pros:

  1. some users may misuse the API and we correctly gave them error that helped them fix their code

cons:

  1. some users might want to report a negative rate and they will be frustrated if the SDK prevents that usage
  2. If we add more instruments to have separate monotonic and non-monotonic rate options this adds complexity to all users when choosing the appropriate instrument for their use case
  3. There is a (tiny) perf cost to verify every measurement, probably only relevant in the synchronous instrument case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've covered this topic during the SIG meeting that monotonic vs. non-monotonic is important for the user experience, and it is also widely adopted by many well established metrics systems/libraries.

So here goes my suggestion:

  1. Keep the current approach - have dedicated monotonic instruments (e.g. Counter/CounterFunc)
  2. Clarify in the SDK spec whether we want to the SDK to enforce monotonicity or not (I think I agree with @jmacd that the default SDK SHOULD enforce monotonicity).
  3. Use Editorial change - call out that Counter is monotonic #1593 to track the further clarification - for example, we might want to explain these concepts and put some examples in the README file.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've covered this topic during the SIG meeting that monotonic vs. non-monotonic is important for the user experience

Perhaps I am being dense but that wasn't my understanding of the conversation. To me it felt like I put forth a question and an idea. Nobody responded saying they loved it nor did they respond saying its bad. Perhaps others thought it wasn't a good idea and were being polite, or wanted time to think it over, or I simply didn't understand the feedback : )

@jmacd, I know at the end of the SIG meeting you said you felt like you didn't have adequate opportunity to respond and mentioned some concern that the conversation was calling into question too much from the original spec. I wasn't clear whether you were refering to my comments or other comments/questions that were in the vicinity. Certainly my aim is not to push you (or anyone else) into a position you disagree with and I welcome your thoughts. So far I get the impression you aren't particularly excited about pivoting the definition of Counter to mean "displays as rate by default" with no enforcement of monotonicity. I don't know if anything I have said since was convincing or you still feel that being able to report negative measurements as errors is the most compelling concern at play so we should lay the issue to rest with that as the conclusion?

So here goes my suggestion:

To me the issue appeared unresolved. I don't want to hold up the PR and I am fine accepting that Counter defined to be monotonic is the presumptive outcome at this point. I am hoping there will be a little more discussion in #1593, ideally considering the alternatives, but if not that then at least to clarify and document the rationale. Assuming that Counter is monotonic, having the SDK be the enforcement makes sense to me. Thanks all!

Copy link

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some suggestions, but I am also happy to accept the changes as-is and follow up on specific adjustments/questions in future targetted PRs.

### CounterFunc

`CounterFunc` is an asynchronous Instrument which reports
[monotonically](https://wikipedia.org/wiki/Monotonic_function) increasing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like the default SDK SHOULD enforce monotonicity, and that it is an error when a callback returns a value less than a previously returned value.

I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs

I take it you are considering whether we could drop the monotonicity idea. Why should we use separate instruments for monotonic and non-monotonic streams? There are legacy arguments that this information should be preserved, but I don't like to use those arguments. One good reason is simply to help the user. If you have a monotonic instrument, we can detect when you use it incorrectly.

specification/metrics/new_api.md Outdated Show resolved Hide resolved
specification/metrics/new_api.md Outdated Show resolved Hide resolved
@reyang reyang force-pushed the reyang/async-counter branch from b0f134b to 138281d Compare April 8, 2021 20:31
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SergeyKanzhelev SergeyKanzhelev merged commit 5e27e1a into open-telemetry:main Apr 9, 2021
@reyang reyang deleted the reyang/async-counter branch April 9, 2021 16:41
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
* add observable counter

* change the wording to allow language client to decide how to handle callback state

* update based on discussion during the SIG meeting

* add observer example

* clarify that it is the instrument not meter being observed

* clarify that duplicates are not allowed, and the sdk can decide how to handle it

* address review feedback

* fix typo

* specify that the callback ensures a single timestamp

* rename to CounterFunc

* address review comments

* clarify that the callback function should not take indefinite amount of time

* update the wording for callback based on PR comments

* add notes that CounterFunc callback returns absolute value instead of delta/rate

* update the example based on feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.