Initial Prometheus <-> OTLP datamodel specification. #2266

Merged
merged 27 commits into from
Feb 18, 2022
27 commits
3b474f4
Initial draft of Prometheus <-> OTLP datamodel specification.
jsuereth Oct 8, 2021
676fbdf
Update specification/metrics/datamodel.md
jsuereth Oct 14, 2021
a5262be
Add blurb about labels and incompatibilities.
jsuereth Oct 22, 2021
a066d97
address minor review comments, and fix presubmit failures
dashpole Jan 14, 2022
f6e1212
update prometheus SD labels based on current state, and update format…
dashpole Jan 14, 2022
0f84de2
handle start time and delta sums
dashpole Jan 19, 2022
a05a7c6
add links and headers. Handle exemplars and otel resource attributes.
dashpole Jan 19, 2022
c0ffb1f
Apply suggestions from code review
dashpole Jan 20, 2022
163f905
fix link
dashpole Jan 20, 2022
f823306
change prom service discovery section to resource attribute sections
dashpole Jan 21, 2022
77a0a10
disallow process_start_time_seconds
dashpole Jan 27, 2022
d6c5245
add links for otel data points
dashpole Jan 28, 2022
958d624
Apply suggestions from code review
dashpole Feb 3, 2022
e49ef47
use normative language
dashpole Feb 3, 2022
fc488e0
handle negative prometheus histogram buckets
dashpole Feb 3, 2022
6c02daf
typo
dashpole Feb 3, 2022
6fa9153
specify what to do with non-string attributes
dashpole Feb 3, 2022
2e31920
clarify sum behavior
dashpole Feb 8, 2022
98b28b8
handle label collisions when converting
dashpole Feb 8, 2022
0010b0d
clarify that if deltas are not aggregated, they must be dropped
dashpole Feb 9, 2022
bd4ca10
allow process_start_time_seconds, but specify that it must be disable…
dashpole Feb 9, 2022
8842df5
Update specification/metrics/datamodel.md
anuraaga Feb 10, 2022
4303b79
Update specification/metrics/datamodel.md
dashpole Feb 10, 2022
1ea3315
Apply suggestions from code review
dashpole Feb 11, 2022
9aa19f5
describe when process_start_time_seconds isn't correct
dashpole Feb 11, 2022
7fa4164
convert values to string
dashpole Feb 14, 2022
3902652
Merge branch 'main' into current-prom-spec
bogdandrutu Feb 18, 2022
154 changes: 154 additions & 0 deletions specification/metrics/datamodel.md
@@ -48,6 +48,26 @@
* [Sums: Delta-to-Cumulative](#sums-delta-to-cumulative)
+ [Sums: detecting alignment issues](#sums-detecting-alignment-issues)
+ [Sums: Missing Timestamps](#sums-missing-timestamps)
- [Prometheus Compatibility](#prometheus-compatibility)
* [Prometheus Metric points to OTLP](#prometheus-metric-points-to-otlp)
+ [Counters](#counters)
+ [Gauges](#gauges)
+ [Unknown-typed](#unknown-typed)
+ [Histograms](#histograms)
+ [Summaries](#summaries)
+ [Dropped Types](#dropped-types)
+ [Start Time](#start-time)
+ [Exemplars](#exemplars-1)
+ [Resource Attributes](#resource-attributes)
* [OTLP Metric points to Prometheus](#otlp-metric-points-to-prometheus)
+ [Gauges](#gauges-1)
+ [Sums](#sums-1)
+ [Histograms](#histograms-1)
+ [Summaries](#summaries-1)
+ [Dropped Data Points](#dropped-data-points)
+ [Metric Attributes](#metric-attributes)
+ [Exemplars](#exemplars-2)
+ [Resource Attributes](#resource-attributes-1)
- [Footnotes](#footnotes)

<!-- tocstop -->
@@ -1090,6 +1110,140 @@ For comparison, see the simple logic used in
[statsd sums](https://github.com/statsd/statsd/blob/master/stats.js#L281)
where all points are added, and lost points are ignored.

## Prometheus Compatibility

**Status**: [Experimental](../document-status.md)

This section describes how to convert metrics scraped in the [Prometheus exposition](https://github.com/Prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#exposition-formats) or [OpenMetrics](https://openmetrics.io/) formats to the
OpenTelemetry metric data model, and how to create Prometheus metrics from
OpenTelemetry metric data. Since OpenMetrics types are a superset of Prometheus types, "Prometheus" is taken to mean "Prometheus or OpenMetrics". "OpenMetrics" refers to OpenMetrics-only concepts.

### Prometheus Metric points to OTLP

#### Counters

A [Prometheus Counter](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#counter) MUST be converted to an OTLP Sum with `is_monotonic` equal to `true`.

#### Gauges

A [Prometheus Gauge](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#gauge) MUST be converted to an OTLP Gauge.

#### Unknown-typed

A [Prometheus Unknown](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#unknown) MUST be converted to an OTLP Gauge.

#### Histograms

A [Prometheus Histogram](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#histogram) MUST be converted to an OTLP Histogram.

Multiple Prometheus histogram metrics MUST be merged together into a single OTLP Histogram:

* The `le` label on `_bucket`-suffixed metrics is used to identify and order histogram bucket boundaries. Each Prometheus bucket line produces one bucket count on the resulting histogram. Each value of the `le` label except `+Inf` produces one bucket boundary.
* Lines with `_count` and `_sum` suffixes are used to determine the histogram's count and sum.
* If `_count` is not present, the metric MUST be dropped.
* If `_sum` is not present, it MUST be computed from the buckets.
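The merge rules above can be sketched as follows. This is a minimal, non-authoritative illustration that assumes the exposition lines have already been parsed and grouped per series, with `le` values converted to floats; the function name `prom_histogram_to_otlp` is hypothetical. It shows one step the rules leave implicit: Prometheus bucket values are cumulative (each counts all observations `<= le`), while OTLP explicit-bucket counts are per bucket, so adjacent buckets must be differenced. Computing a missing `_sum` from the buckets is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple


def prom_histogram_to_otlp(
    le_counts: Dict[float, float], count: Optional[float]
) -> Optional[Tuple[List[float], List[int]]]:
    """Return (explicit_bounds, bucket_counts), or None if the metric is dropped.

    le_counts maps each parsed `le` value (math.inf for "+Inf") to the value of
    the matching `_bucket` sample; count is the `_count` sample, if any.
    """
    if count is None:
        # No `_count` line for this series: the metric MUST be dropped.
        return None
    # Every `le` value except +Inf becomes an explicit bucket boundary.
    bounds = sorted(b for b in le_counts if not math.isinf(b))
    bucket_counts: List[int] = []
    previous = 0.0
    for bound in bounds:
        # Prometheus buckets are cumulative; OTLP wants per-bucket counts.
        bucket_counts.append(int(le_counts[bound] - previous))
        previous = le_counts[bound]
    # Overflow bucket (last bound, +Inf]: use the +Inf sample, else `_count`.
    overflow_total = le_counts.get(math.inf, count)
    bucket_counts.append(int(overflow_total - previous))
    return bounds, bucket_counts


# foo_bucket{le="0.5"} 2, foo_bucket{le="1"} 5, foo_bucket{le="+Inf"} 6, foo_count 6
print(prom_histogram_to_otlp({0.5: 2, 1.0: 5, math.inf: 6}, 6))  # ([0.5, 1.0], [2, 3, 1])
```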

#### Summaries

A [Prometheus Summary](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#summary) MUST be converted to an OTLP Summary.

Multiple Prometheus summary metrics are merged together into a single OTLP Summary:

* The `quantile` label on non-suffixed metrics is used to identify quantile points in summary metrics. Each Prometheus line produces one quantile on the resulting summary.
* Lines with `_count` and `_sum` suffixes are used to determine the summary's count and sum.
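A minimal sketch of this merge, assuming lines have already been parsed into `(name, labels, value)` tuples and grouped so that all samples belong to one series. The names `Sample` and `merge_prom_summary`, and the dict used as a stand-in for an OTLP Summary point, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (metric name, labels, value) for one parsed exposition line.
Sample = Tuple[str, Dict[str, str], float]


def merge_prom_summary(family: str, samples: List[Sample]) -> dict:
    """Merge the lines of one summary series into a single OTLP-like summary."""
    count: Optional[int] = None
    total: Optional[float] = None
    quantiles: List[Tuple[float, float]] = []
    for name, labels, value in samples:
        if name == family + "_count":
            count = int(value)
        elif name == family + "_sum":
            total = value
        elif name == family and "quantile" in labels:
            # Each non-suffixed line contributes one (quantile, value) pair.
            quantiles.append((float(labels["quantile"]), value))
    return {"count": count, "sum": total, "quantile_values": sorted(quantiles)}


lines = [
    ("rpc_seconds", {"quantile": "0.99"}, 3.0),
    ("rpc_seconds", {"quantile": "0.5"}, 1.0),
    ("rpc_seconds_sum", {}, 10.0),
    ("rpc_seconds_count", {}, 4.0),
]
print(merge_prom_summary("rpc_seconds", lines))
```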

#### Dropped Types

The following Prometheus types MUST be dropped:

* [OpenMetrics GaugeHistogram](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#gaugehistogram)
* [OpenMetrics StateSet](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#stateset)
* [OpenMetrics Info](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#info)
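Taken together, the type rules in this subsection amount to a lookup table. The sketch below is illustrative only; the table and function names are hypothetical, and treating the Prometheus text format's `untyped` keyword like OpenMetrics `unknown` is an assumption not stated above.

```python
from typing import Optional

# Metric kind from a TYPE line -> OTLP data point kind (None means dropped).
PROM_TYPE_TO_OTLP = {
    "counter": "Sum (is_monotonic=true)",
    "gauge": "Gauge",
    "unknown": "Gauge",      # OpenMetrics spelling
    "untyped": "Gauge",      # Prometheus text format spelling (assumed equivalent)
    "histogram": "Histogram",
    "summary": "Summary",
    "gaugehistogram": None,  # OpenMetrics-only types that MUST be dropped
    "stateset": None,
    "info": None,
}


def otlp_kind_for(prom_type: str) -> Optional[str]:
    # Raises KeyError for a type this sketch does not know about.
    return PROM_TYPE_TO_OTLP[prom_type.lower()]
```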

#### Start Time

Prometheus cumulative metrics do not include the start time of the metric. When converting Prometheus Counters to OTLP, conversion MUST follow [Cumulative streams: handling unknown start time](#cumulative-streams-handling-unknown-start-time) by default. Conversion MAY offer configuration, disabled by default, which allows using the `process_start_time_seconds` metric to provide the start time. Using `process_start_time_seconds` is correct only when all counters on the target start after the process starts and are not reset while the process is running.

#### Exemplars

[Prometheus Exemplars](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exemplars) can be attached to Prometheus Histogram bucket metrics. These exemplars SHOULD be converted to exemplars on OpenTelemetry Histograms. The Trace ID and Span ID SHOULD be retrieved from the `trace_id` and `span_id` label keys, respectively.
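A minimal sketch of extracting the IDs from an exemplar's label set. It assumes the label values are hex-encoded (32 hex characters for a trace ID, 16 for a span ID, matching the 16-byte and 8-byte OTLP ID sizes); the function name is hypothetical, and keeping the remaining labels as exemplar attributes is an illustrative choice, not something this section prescribes.

```python
from typing import Dict, Optional, Tuple


def split_exemplar_labels(
    labels: Dict[str, str],
) -> Tuple[Optional[bytes], Optional[bytes], Dict[str, str]]:
    """Pull trace/span IDs out of a Prometheus exemplar's label set."""
    remaining = dict(labels)
    trace_hex = remaining.pop("trace_id", None)
    span_hex = remaining.pop("span_id", None)
    # Assumes hex-encoded IDs (32 hex chars for a trace, 16 for a span).
    trace_id = bytes.fromhex(trace_hex) if trace_hex else None
    span_id = bytes.fromhex(span_hex) if span_hex else None
    # Hypothetical choice: keep any other labels as exemplar attributes.
    return trace_id, span_id, remaining
```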

#### Resource Attributes

When scraping a Prometheus endpoint, resource attributes MUST be added to the scraped metrics to distinguish them from metrics scraped from other Prometheus endpoints. In particular, `job` and `instance`, [as defined by Prometheus](https://Prometheus.io/docs/concepts/jobs_instances/#jobs-and-instances), are needed to ensure Prometheus exporters can disambiguate metrics as [described below](#resource-attributes-1).

The following attributes MUST be associated with scraped metrics as resource attributes, and MUST NOT be added as metric attributes:

| OTLP Resource Attribute | Description |
| ----------------------- | ----------- |
| `service.name` | The configured name of the service that the target belongs to |
| `job` | Identical to `service.name` |
| `instance` | The `<host>:<port>` of the target's URL that was scraped |
| `host.name` | The `<host>` portion of `instance` |
| `port` | The `<port>` portion of `instance` |
| `scheme` | `http` or `https` |
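A sketch of deriving the attributes in the table from a scrape target. It assumes the configured Prometheus job name is used as the service name, and it assumes default ports of 80 and 443 when the scrape URL omits a port; both assumptions, and the function name, are hypothetical. Note that `urlsplit` lowercases the hostname and strips IPv6 brackets, which a real implementation may need to handle differently.

```python
from typing import Dict
from urllib.parse import urlsplit

# Assumed default ports when the scrape URL omits one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def scrape_resource_attributes(job: str, scrape_url: str) -> Dict[str, str]:
    """Build resource attributes for metrics scraped from scrape_url."""
    parts = urlsplit(scrape_url)
    host = parts.hostname or ""
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return {
        "service.name": job,
        "job": job,  # identical to service.name
        "instance": f"{host}:{port}",
        "host.name": host,
        "port": str(port),
        "scheme": parts.scheme,
    }


print(scrape_resource_attributes("api-server", "http://10.0.0.5:9090/metrics"))
```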

### OTLP Metric points to Prometheus

#### Gauges

An [OpenTelemetry Gauge](#gauge) MUST be converted to a Prometheus Gauge.

#### Sums

[OpenTelemetry Sums](#sums) follow this logic:

- If the aggregation temporality is cumulative and the sum is monotonic, it MUST be converted to a Prometheus Counter.
- If the aggregation temporality is cumulative and the sum is non-monotonic, it MUST be converted to a Prometheus Gauge.
- If the aggregation temporality is delta and the sum is monotonic, it SHOULD be converted to cumulative aggregation temporality and become a Prometheus Counter.
- Otherwise, it MUST be dropped.
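The decision table above can be sketched as a small function. This is illustrative only; the function name and the `can_accumulate_deltas` flag (standing in for whether the exporter actually implements delta-to-cumulative aggregation) are hypothetical. If deltas are not aggregated, the sum falls through to the drop case.

```python
from typing import Optional


def prometheus_type_for_sum(
    temporality: str, is_monotonic: bool, can_accumulate_deltas: bool = True
) -> Optional[str]:
    """Return "counter" or "gauge", or None when the sum must be dropped."""
    if temporality == "cumulative":
        return "counter" if is_monotonic else "gauge"
    if temporality == "delta" and is_monotonic and can_accumulate_deltas:
        # Deltas are first accumulated into a cumulative stream (not shown).
        return "counter"
    return None
```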

#### Histograms

An [OpenTelemetry Histogram](#histogram) with a cumulative aggregation temporality MUST be converted to a Prometheus metric family with the following metrics:

- A single `{name}_count` metric denoting the count field of the histogram. All attributes of the histogram point are converted to Prometheus labels.
- A `{name}_sum` metric denoting the sum field of the histogram, reported only if the sum is positive and monotonic. The sum is positive and monotonic when all buckets are positive. All attributes of the histogram point are converted to Prometheus labels.
- A series of `{name}` metric points that contain all attributes of the histogram point recorded as labels. Additionally, a label `le` is added denoting the bucket boundary. The label's value is the stringified floating point value of the bucket boundary, and the points are ordered from lowest to highest boundary. The value of each point is the sum of the counts of all histogram buckets up to and including the boundary reported in the `le` label. These points include a single exemplar that falls within the `le` bucket and no other `le`-labelled bucket. The final bucket metric MUST have a `+Inf` threshold.

OpenTelemetry Histograms with delta aggregation temporality SHOULD be aggregated into cumulative aggregation temporality and follow the logic above, or MUST be dropped.
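A minimal sketch of producing the metric family from a cumulative OTLP histogram point, assuming `len(bucket_counts) == len(explicit_bounds) + 1` as in OTLP explicit-bucket histograms. The function names are hypothetical, exemplars are omitted, and the caller is assumed to pass `total_sum=None` when the sum must not be reported. Using Python's `repr` for the stringified boundary is an assumption; other float formats are possible.

```python
import math
from typing import Dict, List, Optional, Tuple

Series = Tuple[str, Dict[str, str], float]


def format_le(bound: float) -> str:
    return "+Inf" if bound == math.inf else repr(bound)


def histogram_to_prometheus(
    name: str,
    labels: Dict[str, str],
    explicit_bounds: List[float],
    bucket_counts: List[int],
    total_count: int,
    total_sum: Optional[float],
) -> List[Series]:
    series: List[Series] = [(f"{name}_count", dict(labels), float(total_count))]
    if total_sum is not None:
        series.append((f"{name}_sum", dict(labels), total_sum))
    cumulative = 0
    # The last OTLP bucket is the overflow bucket, exposed as le="+Inf".
    for bound, count in zip(explicit_bounds + [math.inf], bucket_counts):
        cumulative += count  # each point counts all buckets up to this bound
        series.append((name, {**labels, "le": format_le(bound)}, float(cumulative)))
    return series
```

For example, bounds `[0.5, 1.0]` with bucket counts `[2, 3, 1]` yield bucket points with values 2, 5 and 6 for `le` values `0.5`, `1.0` and `+Inf`.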

#### Summaries

An [OpenTelemetry Summary](#summary-legacy) MUST be converted to a Prometheus metric family with the following metrics:

- A single `{name}_count` metric denoting the count field of the summary.
All attributes of the summary point are converted to Prometheus labels.
- A `{name}_sum` metric denoting the sum field of the summary, reported
only if the sum is positive and monotonic. All attributes of the summary
point are converted to Prometheus labels.
- A series of `{name}` metric points that contain all attributes of the
summary point recorded as labels. Additionally, a label `quantile` is
added denoting a reported quantile point. Its value is the stringified
floating point value of the quantile (a non-negative value between 0.0 and
1.0), and the points are ordered from lowest to highest quantile. The value
of each point is the computed value of the quantile point.
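A minimal sketch of the summary conversion, under the same assumptions as any series-building sketch here: the function name is hypothetical, the caller passes `total_sum=None` when the sum must not be reported, and `repr` is assumed as the float-to-string format for the `quantile` label.

```python
from typing import Dict, List, Optional, Tuple

Series = Tuple[str, Dict[str, str], float]


def summary_to_prometheus(
    name: str,
    labels: Dict[str, str],
    quantile_values: List[Tuple[float, float]],
    count: int,
    total_sum: Optional[float],
) -> List[Series]:
    series: List[Series] = [(f"{name}_count", dict(labels), float(count))]
    if total_sum is not None:
        series.append((f"{name}_sum", dict(labels), total_sum))
    # One point per quantile, ordered from lowest to highest quantile.
    for quantile, value in sorted(quantile_values):
        series.append((name, {**labels, "quantile": repr(float(quantile))}, value))
    return series
```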

#### Dropped Data Points

The following OTLP data points MUST be dropped:

* [ExponentialHistogram](#exponentialhistogram)

#### Metric Attributes

OpenTelemetry Metric Attributes MUST be converted to [Prometheus labels](https://Prometheus.io/docs/concepts/data_model/#metric-names-and-labels). String Attribute values are used directly as Prometheus label values, and non-string Attribute values MUST be converted to strings following the [attribute specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/common.md#attribute). Prometheus label keys are required to match the following regex: `[a-zA-Z_]([a-zA-Z0-9_])*`. Invalid characters in OpenTelemetry Attribute names MUST be replaced with the `_` character. This may cause ambiguity when multiple similarly named attributes have invalid characters at the same position. In such unlikely cases, if multiple key-value pairs are converted to the same Prometheus key, the values MUST be concatenated together, separated by `;`, and ordered by the lexicographical order of the original keys.
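The renaming and collision rules can be sketched as follows. This assumes attribute values have already been converted to strings, and the function names are hypothetical. Treating a leading digit as an invalid character (and replacing it with `_`) is one literal reading of the rule above, not a confirmed behavior of any exporter.

```python
import re
from typing import Dict, List

# Anything outside [a-zA-Z0-9_] is invalid in a Prometheus label name.
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_label_name(key: str) -> str:
    sanitized = _INVALID.sub("_", key)
    # A label name may not start with a digit; treat it as invalid too.
    if sanitized and sanitized[0].isdigit():
        sanitized = "_" + sanitized[1:]
    return sanitized


def attributes_to_labels(attrs: Dict[str, str]) -> Dict[str, str]:
    grouped: Dict[str, List[str]] = {}
    # Iterating in lexicographic order of the original keys fixes the
    # order in which colliding values are concatenated.
    for key in sorted(attrs):
        grouped.setdefault(sanitize_label_name(key), []).append(attrs[key])
    return {name: ";".join(values) for name, values in grouped.items()}


print(attributes_to_labels({"http.method": "GET", "http_method": "POST"}))
```

Here `http.method` sorts before `http_method` (`.` precedes `_` in ASCII), so the colliding label becomes `http_method="GET;POST"`.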

#### Exemplars

[Exemplars](#exemplars) on OpenTelemetry Histograms SHOULD be converted to Prometheus exemplars. Exemplars on other OpenTelemetry data points MUST be dropped. For Prometheus push exporters, multiple exemplars can be added to each bucket, so all exemplars SHOULD be converted. For Prometheus pull endpoints, only a single exemplar can be added to each bucket, so the largest exemplar from each bucket MUST be used, if attaching exemplars. If no exemplars exist on a bucket, the largest exemplar from a lower bucket MUST be used, even though it duplicates another bucket's exemplar. Prometheus Exemplars MUST use the `trace_id` and `span_id` keys for the trace and span IDs, respectively.
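For pull endpoints, the per-bucket selection rule can be sketched as below. To keep it short, each exemplar is represented only by its measured value (an assumption; real exemplars also carry trace and span IDs and a timestamp), and the function name is hypothetical. Because buckets are visited from lowest to highest, the value carried forward into an empty bucket is the largest exemplar from any lower bucket.

```python
from typing import List, Optional


def pick_pull_exemplars(buckets: List[List[float]]) -> List[Optional[float]]:
    """buckets[i] holds exemplar values recorded in bucket i, lowest bucket first."""
    chosen: List[Optional[float]] = []
    carried: Optional[float] = None
    for values in buckets:
        if values:
            # Use the largest exemplar that fell into this bucket.
            carried = max(values)
        # An empty bucket reuses the largest exemplar from a lower bucket.
        chosen.append(carried)
    return chosen


print(pick_pull_exemplars([[0.1, 0.3], [], [2.5], []]))  # [0.3, 0.3, 2.5, 2.5]
```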

#### Resource Attributes

In SDK Prometheus (pull) exporters, all resource attributes MUST be dropped, and MUST NOT be attached as labels. The scraper of the endpoint is expected to discover resource attributes of the endpoint it is scraping.

In the Collector's Prometheus pull and push (remote-write) exporters, metrics from multiple targets can be sent together, so targets must be disambiguated from one another. However, the Prometheus exposition and [remote-write](https://github.com/Prometheus/Prometheus/blob/main/prompb/remote.proto) formats do not include a notion of resource, and expect metric labels to distinguish scraped targets. By convention, [`job` and `instance`](https://Prometheus.io/docs/concepts/jobs_instances/#jobs-and-instances) labels distinguish targets and are expected to be present on metrics exposed on a Prometheus pull exporter (a ["federated"](https://Prometheus.io/docs/Prometheus/latest/federation/) Prometheus endpoint) or pushed via Prometheus remote-write. In the Collector's Prometheus exporters, the `job` and `instance` resource attributes MUST be converted to Prometheus metric labels, and other resource attributes SHOULD NOT be converted to metric labels.
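The two resource-attribute rules (SDK pull exporters versus Collector exporters) can be sketched as a single function. The name and the `collector_exporter` flag are hypothetical; merging the result with the point's own labels is not shown.

```python
from typing import Dict


def resource_to_labels(resource: Dict[str, str], collector_exporter: bool) -> Dict[str, str]:
    if not collector_exporter:
        # SDK pull exporters MUST drop all resource attributes.
        return {}
    # Collector exporters convert only `job` and `instance` to metric labels.
    return {k: resource[k] for k in ("job", "instance") if k in resource}
```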

## Footnotes

<a name="otlpdatapointfn">[1]</a>: OTLP supports data point kinds that do not