Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on MetricReader registered on MeterProvider #2072

Closed
legendecas opened this issue Oct 28, 2021 · 10 comments · Fixed by #2406
Closed

Clarification on MetricReader registered on MeterProvider #2072

legendecas opened this issue Oct 28, 2021 · 10 comments · Fixed by #2406
Assignees
Labels
area:sdk Related to the SDK release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory

Comments

@legendecas
Copy link
Member

legendecas commented Oct 28, 2021

What are you trying to achieve?

Clarify the object ownerships between metric reader and meter provider.

In the example (https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#view), it is said the metric reader is constructed in place and then it is registered to the meter provider:

meter_provider
    .add_view("*") # a wildcard view that matches everything
    .add_metric_reader(PeriodicExportingMetricReader(ConsoleExporter()))

Also, it is stated that metric reader has a Collect operation that triggers asynchronous instrument callbacks:

Metric Reader Operations:
Collect: Collects the metrics from the SDK. If there are asynchronous Instruments involved, their callback functions will be triggered.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#metricreader-operations

As far as I could tell, all the instruments are owned by the meter provider, and thus the collect should consult the meter provider to perform async instrument callbacks.

In this sense, it seems that the reader can be registered to multiple meter provider and the metric_reader.collect should perform collection on all registered meter provider:

metric_reader = PeriodicExportingMetricReader(ConsoleExporter())
meter_provider
    .add_view("*")
    .add_metric_reader(metric_reader)

meter_provider_b
    .add_view("*")
    .add_metric_reader(metric_reader)

The problem here is that the SDK API interface suggested that we are registering the reader to the meter provider, however, the metric_reader.collect reverse the control ownership and the meter provider shall be registered on the metric reader and thus it is possible to call metric_reader.collect to perform a collection on all registered async instrument callbacks.

This sounds like a doubly linked ownership:

        meter_provider 
  shutdown/    |  ↑
  force_flush  |  | collect
               ↓  |
        metric reader

What did you expect to see?

The ownership should not be bidirectional.

@legendecas legendecas added the spec:metrics Related to the specification/metrics directory label Oct 28, 2021
@legendecas legendecas changed the title Clarification on single MetricReader registered on multiple MeterProvider Clarification on MetricReader registered on MeterProvider Oct 28, 2021
@findmyway
Copy link
Contributor

I share the same feeling here. The specification seems to suggest that the meter provider is the central place to manage meters and readers. But readers seem to be a wrapper around meter providers and should be defined in a higher level.

And more specifically, I’d suggest define the shut_down and force_flush on readers independently.

@reyang reyang added the release:after-ga Not required before GA release, and not going to work on before GA label Nov 2, 2021
@reyang reyang assigned reyang and unassigned tigrannajaryan Nov 2, 2021
@reyang
Copy link
Member

reyang commented Nov 2, 2021

Disucssed during the 11/2/2021 Spec SIG Meeting:

In this sense, it seems that the reader can be registered to multiple meter provider and the metric_reader.collect should perform collection on all registered meter provider:

Currently the spec doesn't specify if the same MetricReader instance can be shared across multiple MeterProvider instances. In practice, different languages use different approach:

  1. Java takes a factory instance instance, so it is impossible to pass in the same MetricReader instance to multiple MeterProvider instances.
  2. .NET only allows a single owner, and the MeterProvider "owns" the lifecycle of MetricReader instances registered. (when a provider is disposed, it will dispose the readers)
  3. C++ started with shared_ptr (ref counting based), there is a discussion to move to unique_ptr.

This sounds like a doubly linked ownership:

        meter_provider 
  shutdown/    |  ↑
  force_flush  |  | collect
               ↓  |
        metric reader

The control flow is bi-directional, and it is intentional - to allow both Push and Pull mode.
The ownership is not bi-directional.

@jmacd
Copy link
Contributor

jmacd commented Feb 28, 2022

I'm starting to understand this problem. I see the problem not about "ownership", but the apparent bi-directionality of control flow and something missing from the text to explain it.

Comment by @jack-berg that partly gets at the issue #2304 (comment).

A "decision to collect" comes from one of three places:

  1. Shutdown, which requires a final collection
  2. Flush, which requires a synchronous collection
  3. Any one MetricReader deciding to collect based on internal logic

A "decision to read" means for an Exporter to read through the MetricReader state. This always follows one of the three reasons to begin collection.

meter_provider (start of collection, any reason)
   ↓
metric reader (write path)
   ↓
metric exporter (following collection)
   ↓
metric reader (read path)

To summarize, the SDK or MeterProvider owns a write path while the Metric exporter owns a read path. I do not believe there's a cycle here, but potentially there's some way to improve the text.

@jmacd
Copy link
Contributor

jmacd commented Mar 1, 2022

In this sense, it seems that the reader can be registered to multiple meter provider

This connects with #2232, since the underlying request there is to have a way to allow whole Meter instances and the associated metric streams to come and go (i.e., be forgotten) at runtime. Due to current uncertainty in the spec, it might seem this can be accomplished if a Reader can be registered to multiple providers. I will argue the other way, because this can also be accomplished if an Exporter can be registered to multiple providers through multiple Reader instances.

First question

The difference in these two ways of framing the problem, implicitly, is whether "some kind of aggregation happens?" when the same streams are produced by more than one MeterProvider. If this is a choice--whether we support "one-to-many" for Exporters or for Readers--then supporting one-to-many for Exporters answers "No" to this question and supporting one-to-many for Readers answers "Yes" to this question about aggregation across providers.

I personally want to answer No to this question, I would be glad to see one Exporter to many providers. The user and/or the provider configuration is responsible for avoiding single-writer rule violations in this setup. In other words, multiple providers to one exporter is fine with me, but the user is required to ensure that streams do come into conflict.

Second question

This leads to an orthogonal question, about whether OTel implementations SHOULD / SHOULD NOT support the use of multiple Resource values in a single exporter configuration. Support for multiple Resources allows easily avoiding single-writer violations, but seems to encourage a single process to break norms about Resource usage. If this is a question, I would specify that OTel implementations SHOULD NOT support the use of multiple Resource values in SDKs for a single Exporter, even where multiple MeterProviders are in use. This leads to a requirement that the the user is responsible for avoiding single-writer violations by the use of distinct attribute sets, i.e., ensure they do not create the same stream in multiple MeterProviders bound to a single Exporter, or else there will be duplicate streams (b/c no aggregation).

Rationale in support of one Exporter to many MeterProviders

Here is my reasoning for the "No" answer to the original question about whether to aggregate instead of allowing single-writer violations to pass through. We need a way to support "forgetting" of metric streams in configurations like the one posed in #2232, because this is a fairly common pattern (which Prometheus also has). I answer "No" to the question about aggregation, because then it gives us a solution for stream deletion, as follows.

In typical pull-based cumulative configuration, a Metric Reader is expected to remember streams forever. If shutting down a MeterProvider means shutting down all of the associated Readers, then we get stream deletion support "for free". If a MeterReader is expected to aggregate streams from multiple providers, then shutting down a MeterProvider does not really imply forgetting any streams. Thus, I answer "No" to the question about aggregation in one-to-many provider configurations, because aggregation across providers conflicts with the desire to support forgetting streams.

Proposed solution

I think we could resolve this issue as follows:

  1. Specify that MetricReaders MUST have a one-to-one relationship with MeterProviders.
  2. Specify that implementations SHOULD support a one-Exporter-to-many-MeterProviders model
  3. Metric Exporters SHOULD NOT support the use of multiple Resources

Neither the Reader, the Exporter, nor the MeterProvider in such a one-to-many configuration is responsible for detecting single-writer violations or duplicate instrument conflicts. I might add:

  1. Implementations MAY provide a way to perform duplicate instrument conflict detection across multiple MeterProviders

@reyang
Copy link
Member

reyang commented Mar 1, 2022

Discussed during the 3/1, 2022 Spec SIG Meeting, we consider this as a documentation issue which requires clarification regarding the MetricReader ownership.

The conclusion is to "specify that MetricReaders MUST have a one-to-one relationship with MeterProviders", which would resolve this issue. And once this is resolved, it would unblock the SDK spec to move to Mixed (with the only exception that Exemplar is still Feature-Freeze).

These two items are related to @2232 and will be marked as "Future":

  • Specify that implementations SHOULD support a one-Exporter-to-many-MeterProviders model
  • Metric Exporters SHOULD NOT support the use of multiple Resources

@legendecas
Copy link
Member Author

  1. Specify that MetricReaders MUST have a one-to-one relationship with MeterProviders.

This sounds good to me, thanks!

@legendecas
Copy link
Member Author

Specify that implementations SHOULD support a one-Exporter-to-many-MeterProviders model

This comes back to a similar problem for exporters: MeterProvider.shutdown MUST be implemented at least by invoking Shutdown on all registered MetricReader and MetricExporter.

It could be confusing if an exporter is registered to multiple MeterProvider and one of the MeterProvider shuts down while others are not.

@jmacd
Copy link
Contributor

jmacd commented Mar 1, 2022

Shutdown on all registered MetricReader and MetricExporter.

I see. I looked into the trace SDK specification.

The TracerProvider Shutdown() is specified to call Shutdown() on each Processor.
It does not say that Processor Shutdown() should call Exporter Shutdown().
They are documented as separate methods.

Maybe we should follow that precedent and change our statement for MeterProvider.Shutdown(). The owner of the Exporter is required to shutdown the exporter, I would say. The trace specification says that Exporter.Shutdown() is used when the "SDK is shutting down", not when the "MeterProvider is shutting down".

@jmacd
Copy link
Contributor

jmacd commented Mar 1, 2022

They are documented as separate methods.

It looks like the trace SDK specification is missing a statement to require the Span Processor.Shutdown() method to shut down the exporter. At least the OTel-Go trace SDK processor will shut down the exporter in this case, so it would be a breaking change not to add that requirement.

I think this means we should add to the trace SDK spec:

SpanProcessor Shutdown() MUST call Span Exporter Shutdown().

I think this means we should add to the metrics SDK spec:

Metric Exporters MUST have a one-to-one relationship with MeterProviders.

For the question posed in our SIG meeting, "How can a single Prometheus exporter export for multiple MeterProviders?" we still do not have a great answer, but here's one that's hopefully acceptable. Users in this situation will create a many-to-one wrapper around the Metric Exporter that ignores Shutdown(). When scraped, it will Collect() every sub-exporter's MetricReader. The user in this situation will arrange to call Shutdown() once on the actual exporter.

@bogdandrutu, who posed the Prometheus question, do agree with the above?

@dyladan
Copy link
Member

dyladan commented Mar 1, 2022

it would also be the responsibility of the wrapper to ensure metrics do not collide (by prefixing metric names for example, or adding a provider id as a label)?

@reyang reyang added this to the Metrics API/SDK Stable Release milestone Mar 4, 2022
@reyang reyang added release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs release:required-for-ga Must be resolved before GA release, or nice to have before GA and removed release:after-ga Not required before GA release, and not going to work on before GA release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs labels Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants