Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add observable (async) counter to the metrics API #1590

Merged
merged 18 commits into from
Apr 9, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
135 changes: 128 additions & 7 deletions specification/metrics/new_api.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,11 @@ Table of Contents
* [Meter operations](#meter-operations)
* [Instrument](#instrument)
* [Counter](#counter)
* [Counter Creation](#counter-creation)
* [Counter creation](#counter-creation)
* [Counter operations](#counter-operations)
* [CounterFunc](#counterfunc)
* [CounterFunc creation](#counterfunc-creation)
* [CounterFunc operations](#counterfunc-operations)
* [Measurement](#measurement)

</details>
Expand All @@ -44,8 +48,8 @@ The Metrics API consists of these main components:
* [Instrument](#instrument) is responsible for reporting
[Measurements](#measurement).

Here is an example of the object hierarchy inside a process instrumented with the
metrics API:
Here is an example of the object hierarchy inside a process instrumented with
the metrics API:

```text
+-- MeterProvider(default)
Expand Down Expand Up @@ -108,9 +112,9 @@ This API MUST accept the following parameters:
the specified value is invalid SHOULD be logged. A library, implementing the
OpenTelemetry API *may* also ignore this name and return a default instance
for all calls, if it does not support "named" functionality (e.g. an
implementation which is not even observability-related). A MeterProvider
could also return a no-op Meter here if application owners configure the SDK
to suppress telemetry produced by this library.
implementation which is not even observability-related). A MeterProvider could
also return a no-op Meter here if application owners configure the SDK to
suppress telemetry produced by this library.
* `version` (optional): Specifies the version of the instrumentation library
(e.g. `1.0.0`).

Expand Down Expand Up @@ -225,7 +229,7 @@ Example uses for `Counter`:
* count the number of checkpoints run
* count the number of HTTP 5xx errors

#### Counter Creation
#### Counter creation

There MUST NOT be any API for creating a `Counter` other than with a
[`Meter`](#meter). This MAY be called `CreateCounter`. If strong type is
Expand Down Expand Up @@ -304,6 +308,123 @@ counterPowerUsed.Add(13.5, new PowerConsumption { customer = "Tom" });
counterPowerUsed.Add(200, new PowerConsumption { customer = "Jerry" }, ("is_green_energy", true));
```

### CounterFunc

`CounterFunc` is an asynchronous Instrument which reports
[monotonically](https://wikipedia.org/wiki/Monotonic_function) increasing
reyang marked this conversation as resolved.
Show resolved Hide resolved
Copy link

@noahfalk noahfalk Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who do we believe has the onus to enforce that the values are monotonic? Is it undefined behavior if the callback returns a value that is smaller than the one it returned on the last invocation?

Alternatively, if the only purpose of saying the values are monotonic is to infer that the user prefers that the data is presented as a rate, I've yet to understand a problem with redefining the counter as a "uses rate presentation by default" counter. I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs.

(I'm fine accepting the PR as-is and resolving this as a follow up)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like the default SDK SHOULD enforce monotonicity, and that it is an error when a callback returns a value less than a previously returned value.

I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs

I take it you are considering whether we could drop the monotonicity idea. Why should we use separate instruments for monotonic and non-monotonic streams? There are legacy arguments that this information should be preserved, but I don't like to use those arguments. One good reason is simply to help the user. If you have a monotonic instrument, we can detect when you use it incorrectly.

Copy link

@noahfalk noahfalk Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you are considering whether we could drop the monotonicity idea

Yeah, but very open to the notion that I may not have the full picture. What I have read so far in the conversation suggests that users don't care about monotonicity, they care whether the data is shown as a rate or as an absolute value. If its true that rate vs. absolute value is all they care about then I think pros/cons of enforcing monotonicity anyways would be:
pros:

  1. some users may misuse the API and we correctly gave them error that helped them fix their code

cons:

  1. some users might want to report a negative rate and they will be frustrated if the SDK prevents that usage
  2. If we add more instruments to have separate monotonic and non-monotonic rate options this adds complexity to all users when choosing the appropriate instrument for their use case
  3. There is a (tiny) perf cost to verify every measurement, probably only relevant in the synchronous instrument case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've covered this topic during the SIG meeting that monotonic vs. non-monotonic is important for the user experience, and it is also widely adopted by many well established metrics systems/libraries.

So here goes my suggestion:

  1. Keep the current approach - have dedicated monotonic instruments (e.g. Counter/CounterFunc)
  2. Clarify in the SDK spec whether we want to the SDK to enforce monotonicity or not (I think I agree with @jmacd that the default SDK SHOULD enforce monotonicity).
  3. Use Editorial change - call out that Counter is monotonic #1593 to track the further clarification - for example, we might want to explain these concepts and put some examples in the README file.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've covered this topic during the SIG meeting that monotonic vs. non-monotonic is important for the user experience

Perhaps I am being dense but that wasn't my understanding of the conversation. To me it felt like I put forth a question and an idea. Nobody responded saying they loved it nor did they respond saying its bad. Perhaps others thought it wasn't a good idea and were being polite, or wanted time to think it over, or I simply didn't understand the feedback : )

@jmacd, I know at the end of the SIG meeting you said you felt like you didn't have adequate opportunity to respond and mentioned some concern that the conversation was calling into question too much from the original spec. I wasn't clear whether you were refering to my comments or other comments/questions that were in the vicinity. Certainly my aim is not to push you (or anyone else) into a position you disagree with and I welcome your thoughts. So far I get the impression you aren't particularly excited about pivoting the definition of Counter to mean "displays as rate by default" with no enforcement of monotonicity. I don't know if anything I have said since was convincing or you still feel that being able to report negative measurements as errors is the most compelling concern at play so we should lay the issue to rest with that as the conclusion?

So here goes my suggestion:

To me the issue appeared unresolved. I don't want to hold up the PR and I am fine accepting that Counter defined to be monotonic is the presumptive outcome at this point. I am hoping there will be a little more discussion in #1593, ideally considering the alternatives, but if not that then at least to clarify and document the rationale. Assuming that Counter is monotonic, having the SDK be the enforcement makes sense to me. Thanks all!

value(s) when the instrument is being observed.

Example uses for `CounterFunc`:

* [CPU time](https://wikipedia.org/wiki/CPU_time), which could be reported for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think It''d be good to call out use cases for where "ObservableCounter" is not good for tracking CPU time.

E.g. in a multi-tenant scenario where you want to track CPU usage / user, you'd actually want some kind of sycnhronous instrument that can report on request-level usage with an identified user.

Observable / Async metrics are really only useful for context-less metrics (those that cannot interact with other telemetry). I'd like to call that out here for folks deciding between Async/Sync as I think Otel really shines in the Sync scenario w/ Context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but this is tricky because you don't want users that report cpu_usage (not per request) to use the sync version and believe that is better :)

each thread, each process or the entire system. For example "the CPU time for
process A running in user mode, measured in seconds".
* The number of [page faults](https://wikipedia.org/wiki/Page_fault) for each
process.

#### CounterFunc creation

There MUST NOT be any API for creating a `CounterFunc` other than with a
[`Meter`](#meter). This MAY be called `CreateCounterFunc`. If strong type is
desired, the client can decide the language idomatic name(s), for example
`CreateUInt64CounterFunc`, `CreateDoubleCounterFunc`,
`CreateCounterFunc<UInt64>`, `CreateCounterFunc<double>`.

The API MUST accept the following parameters:

* The `name` of the Instrument, following the [instrument naming
rule](#instrument-naming-rule).
* An optional `unit of measure`, following the [instrument unit
reyang marked this conversation as resolved.
Show resolved Hide resolved
rule](#instrument-unit).
* An optional `description`, following the [instrument description
rule](#instrument-description).
* A `callback` function.
reyang marked this conversation as resolved.
Show resolved Hide resolved

The `callback` function is responsible for reporting the
reyang marked this conversation as resolved.
Show resolved Hide resolved
[Measurement](#measurement)s. It will only be called when the Meter is being
reyang marked this conversation as resolved.
Show resolved Hide resolved
observed. Individual language client SHOULD define whether this callback
reyang marked this conversation as resolved.
Show resolved Hide resolved
reyang marked this conversation as resolved.
Show resolved Hide resolved
function needs to be reentrant safe / thread safe or not.

Note: Unlike [Counter.Add()](#add) which takes the increment/delta value, the
callback function reports the absolute value of the counter. To determine the
reported rate the counter is changing, the difference between successive
measurements is used.

The callback function SHOULD NOT take indefinite amount of time. If multiple
reyang marked this conversation as resolved.
Show resolved Hide resolved
independent SDKs coexist in a running process, they MUST invoke the callback
function(s) independently.

Individual language client can decide what is the idomatic approach. Here are
reyang marked this conversation as resolved.
Show resolved Hide resolved
some examples:

* Return a list (or tuple, generator, enumerator, etc.) of `Measurement`s.
* Use an observer argument to allow individual `Measurement`s to be reported.

User code is recommended not to provide more than one `Measurement` with the
same `attributes` in a single callback. If it happens, the
[SDK](./README.md#sdk) can decide how to handle it. For example, during the
callback invocation if two measurements `value=1, attributes={pid:4 bitness:64}`
and `value=2, attributes={pid:4, bitness:64}` are reported, the SDK can decide
to simply let them pass through (so the downstream consumer can handle
duplication), drop the entire data, pick the last one, or something else. The
API must treat observations from a single callback as logically taking place at
a single instant, such that when recorded, observations from a single callback
MUST be reported with identical timestamps.

The API SHOULD provide some way to pass `state` to the callback. Individual
reyang marked this conversation as resolved.
Show resolved Hide resolved
language client can decide what is the idomatic approach (e.g. it could be an
additional parameter to the callback function, or captured by the lambda
closure, or something else).

Here are some examples that individual language client might consider:

```python
# Python

def pf_callback():
# Note: in the real world these would be retrieved from the operating system
return (
(8, ("pid", 0), ("bitness", 64)),
(37741921, ("pid", 4), ("bitness", 64)),
(10465, ("pid", 880), ("bitness", 32)),
)

page_faults_counter_func = meter.create_counter_func(name="PF", description="process page faults", pf_callback)
```

```python
# Python

def pf_callback(result):
reyang marked this conversation as resolved.
Show resolved Hide resolved
# Note: in the real world these would be retrieved from the operating system
result.Observe(8, ("pid", 0), ("bitness", 64))
reyang marked this conversation as resolved.
Show resolved Hide resolved
result.Observe(37741921, ("pid", 4), ("bitness", 64))
result.Observe(10465, ("pid", 880), ("bitness", 32))

page_faults_counter_func = meter.create_counter_func(name="PF", description="process page faults", pf_callback)
```

```csharp
// C#

// A simple scenario where only one value is reported

interface IAtomicClock
{
UInt64 GetCaesiumOscillates();
}

IAtomicClock clock = AtomicClock.Connect();

var obCaesiumOscillates = meter.CreateCounterFunc<UInt64>("caesium_oscillates", () => clock.GetCaesiumOscillates());
```

#### CounterFunc operations

`CounterFunc` is only intended for asynchronous scenario. The only operation is
reyang marked this conversation as resolved.
Show resolved Hide resolved
provided by the `callback`, which is registered during the [CounterFunc
creation](#counterfunc-creation).

## Measurement

A `Measurement` represents a data point reported via the metrics API to the SDK.
Expand Down