
First cut at single writer principle. #1574

Merged
merged 15 commits into from
Apr 16, 2021
40 changes: 39 additions & 1 deletion specification/metrics/datamodel.md
@@ -224,7 +224,45 @@ histograms.

## Single-Writer

Pending
All generated TimeSeries from OTLP must have one logical writer. This means,
conceptually, that any Timeseries created from the Protocol must have one
originating source of truth. In practical terms, this implies the following:

- All metric data points produced by OTel SDKs must be globally unique.
- `Resource` is expected to uniquely identify the source "service" or
Contributor

I'd give a link to the service attributes here and avoid mentioning "application" in quotes. There is no semantic convention family for applications.

Contributor Author

Right, and service isn't broad enough to denote what we mean here IMO. I'll just use service with a link as suggested, as the Resource debate belongs in a broader forum.

"application".
- `Metric` name and `Attribute`s are expected to uniquely identify a
timeseries generated by the resource.
Member

Copying my comment from the doc: I believe the InstrumentationLibrary name/version should also be part of single writer requirement. For example, you have two instrumented http libraries in the same process (resource), both producing http.client.duration metric.

Contributor Author

I actually almost added that, but I'm not 100% convinced.

TL;DR: Do we expect backends to append Resource labels + InstrumentationLibrary-as-labels to gain unique timeseries? For resource, almost certainly (and in Google Cloud we directly map this concept). It's less certain to me that's true on Instrumentation library.

Member

@jmacd any thought on this?

- Aggregations of metrics performed "in-motion" must only be written from a
single logical source.
__Note: This implies all aggregated metrics must reach one destination__.
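
As a rough illustration of the identity rules in the list above, the sketch below builds a timeseries key from resource attributes, metric name, and point attributes. The `series_key` helper and the attribute values are hypothetical and not part of OTLP or any SDK. Whether the instrumentation library name/version belongs in the key is still open in this review, so it is left out here.

```python
from typing import Mapping, Tuple

Attrs = Mapping[str, str]
Pairs = Tuple[Tuple[str, str], ...]
SeriesKey = Tuple[Pairs, str, Pairs]


def series_key(resource: Attrs, metric_name: str, attributes: Attrs) -> SeriesKey:
    """Hypothetical identity of one timeseries: resource + metric name + attributes."""
    # Sort the attribute pairs so the key does not depend on insertion order.
    return (
        tuple(sorted(resource.items())),
        metric_name,
        tuple(sorted(attributes.items())),
    )


# Same resource and attributes, different dict ordering: same timeseries.
a = series_key({"service.name": "checkout", "service.instance.id": "i-1"},
               "http.client.duration", {"http.method": "GET"})
b = series_key({"service.instance.id": "i-1", "service.name": "checkout"},
               "http.client.duration", {"http.method": "GET"})
# A different instance id identifies a different writer, hence a different timeseries.
c = series_key({"service.name": "checkout", "service.instance.id": "i-2"},
               "http.client.duration", {"http.method": "GET"})
```

Under this assumed key, two writers violate the single-writer rule exactly when they emit points that map to the same key.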

In real systems, particularly in misbehaving systems, there is in fact the
possibility of multiple writers sending data for the same metric and attribute
set. This requirement states that receivers SHOULD presume a single writer was
Contributor

This case is duplication and the elimination is deduplication, right? If I'm not wrong, would it make sense to stick to this terminology because it's already commonly known/used.

Contributor Author

Yep, added both here and @jmacd can comment on his original wording.

intended and eliminate overlap when converting OTLP into the Timeseries data
model.

### Overlap resolution

When more than one process writes the same timeseries, OTLP data points may
appear to overlap. This condition typically results from misconfiguration, but
can also result from running identical processes (indicative of operating system
or SDK bugs). When there are overlapping points, receivers SHOULD eliminate
Contributor

How does this work for Prometheus? Not deduping would be fine (and might be used for redundancy) because everything is cumulative/absolute. Do we care about this behavior for all or is it only for deltas?

Contributor Author

I'll let @jmacd comment here. From what I understand this can cause issues in Prometheus, but I wasn't able to get any specifics. I'll keep investigating for a better answer here.

points so that there are no overlaps. Which data to select in overlapping cases
is not specified.
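
A minimal sketch of one possible resolution policy, for the points of a single timeseries. The text above deliberately leaves the selection rule unspecified, so the "earliest start wins" choice here is only an assumption, and `Point` and `drop_overlaps` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Point:
    start: int    # start of the point's time window (e.g. unix nanos)
    end: int      # end of the point's time window
    value: float


def drop_overlaps(points: List[Point]) -> Tuple[List[Point], int]:
    """Keep points whose windows do not overlap an already-kept point.

    Assumed policy: earliest start wins. Returns (kept points, number dropped).
    """
    kept: List[Point] = []
    dropped = 0
    for p in sorted(points, key=lambda q: (q.start, q.end)):
        if kept and p.start < kept[-1].end:
            dropped += 1  # window begins before the previous kept window ended
        else:
            kept.append(p)
    return kept, dropped


points = [Point(0, 10, 5.0), Point(5, 15, 7.0), Point(10, 20, 3.0)]
kept, dropped = drop_overlaps(points)
```

The returned drop count is the kind of signal a receiver could surface so that overlaps do not go unnoticed.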

### Overlap observability

OpenTelemetry collectors SHOULD export telemetry on the appearance of
overlapping points, so that the user can monitor for erroneous configurations.

### Overlap interpolation

When one process starts just as another exits, the appearance of overlapping
points may be expected. In this case, OpenTelemetry collectors SHOULD modify
points at the change-over using interpolation for Sum data points, to reduce
gaps to zero width in these cases, without any overlap.
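
The text above does not define the interpolation method. The sketch below shows one reading, assuming delta Sum points and a uniform rate within each window: the first point of the new process is trimmed to start where the last point of the old process ends, and its value is scaled linearly. `SumPoint` and `trim_to_changeover` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumPoint:
    start: int    # window start
    end: int      # window end
    value: float  # delta sum accumulated over [start, end)


def trim_to_changeover(prev: SumPoint, nxt: SumPoint) -> SumPoint:
    """Return `nxt` adjusted to start exactly where `prev` ends.

    Assumption: the sum accrued at a uniform rate over the window, so the
    value is scaled by the fraction of the window that remains.
    """
    if nxt.start >= prev.end or nxt.end <= prev.end:
        return nxt  # no partial overlap to trim; leave the point unchanged
    remaining = (nxt.end - prev.end) / (nxt.end - nxt.start)
    return SumPoint(start=prev.end, end=nxt.end, value=nxt.value * remaining)


old_last = SumPoint(start=0, end=10, value=4.0)
new_first = SumPoint(start=8, end=18, value=5.0)
adjusted = trim_to_changeover(old_last, new_first)
```

After the adjustment the two windows meet at the change-over with neither a gap nor an overlap. A point that lies entirely inside the previous window is returned unchanged and would need the overlap resolution rules instead.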
Comment on lines +344 to +347
Member

As resource semantic conventions are specified with service.instance.id, I'm not sure if this is expected or a misconfiguration:

> `service.namespace,service.name,service.instance.id` triplet MUST be globally unique

So maybe this falls under the advice above – "collectors SHOULD export telemetry when they observe overlapping
points in data streams, so that the user can monitor for erroneous
configurations"?



## Temporarily
