Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Prometheus compatibility] Do not require trimming suffixes by default #3580

Merged
merged 2 commits into from
Jul 10, 2023

Conversation

dashpole
Copy link
Contributor

Part of open-telemetry/prometheus-interoperability-spec#72

Changes

We currently require Prometheus -> OpenTelemetry translation to remove type and unit suffixes in metric names. For example, the Prometheus metric http_request_duration_seconds_total would become http_request_duration in the resulting OTel metric. When this change was made, it received a large amount of user pushback, as it broke many existing metric names.

This PR makes trimming suffixes from names disabled by default, and encourages implementations to provide a config knob to enable the suffix trimming.

The original motivation for trimming suffixes by default was so that metrics could be round-tripped between an OTel SDK and the collector through the prometheus endpoint. For example, the OTel metric process.network.io becomes process_network_io_bytes_total with type and unit suffixes. When converting that back to OpenTelemetry, trimming suffixes would result in process_network_io, which is close to the original OpenTelemetry metric name, but isn't the same as the original.

We would like to make the removal of suffixes disabled by default because:

  • It eliminates a breaking change for users
  • It would preserve the name for existing prometheus metrics (e.g. not from OTel instrumentation) which many users are familiar with.
  • Removing suffixes doesn't/didn't achieve the objective of round-tripping the metric name because . are substituted for _.

cc @open-telemetry/wg-prometheus @jmacd @gouthamve

@dashpole dashpole requested review from a team June 30, 2023 20:56
@dashpole dashpole force-pushed the optional_suffix_trimming branch 2 times, most recently from 5227b8b to 9fe783e Compare June 30, 2023 21:14
@pirgeo
Copy link
Member

pirgeo commented Jul 4, 2023

Maybe I am missing something, but what is the use case for round-tripping through a Prometheus endpoint when I am already using an OTel SDK and an OTel collector? Maybe I am just misunderstanding the meaning of "round-trip" here.

@dashpole
Copy link
Contributor Author

dashpole commented Jul 5, 2023

what is the use case for round-tripping through a Prometheus endpoint when I am already using an OTel SDK and an OTel collector?

Good question. Some users may prefer prometheus' pull model for ease of local development, for the up metric, or because they have already standardized on Prometheus endpoints within their team/org. They would be able to adopt OTel instrumentation and collector without changing the protocol if they wanted to. Being able to round-trip is also generally a reasonable definition of "compatibility" for us to work towards. But that use-case isn't compelling enough (IMO) to justify the breaking changes that we would have to make to get there, and trimming suffixes isn't sufficient to be able to round-trip metric names.

@pirgeo
Copy link
Member

pirgeo commented Jul 5, 2023

Gotcha, so this would allow folks to basically use the collector in an all-prometheus deployment.
This is an example of how I understood what you are saying (please correct me if this is not what you meant):

  • an app has a prom endpoint that the (local) collector scrapes
  • (optional) The collector might add some host metrics or similar
  • The "local" collector sends it to a different ("remote") collector (e.g. via OTLP)
    • that remote collector combines data from multiple apps or instances
  • The remote collector offers a Prometheus endpoint which can again be scraped by the backend.

The idea is that the data produced in the app and scraped by the local collector is the same as the data showing up in the backend, even if the data format was OTLP in between, is that right?

@dashpole
Copy link
Contributor Author

dashpole commented Jul 5, 2023

Your example demonstrates the Prometheus -> OTLP -> Prometheus round-trip. The simpler example of this is a single collector with a Prometheus receiver and a Prometheus exporter, since the internal representation in the collector is OTLP. That round-trip works well today, and is not impacted by this PR.

The other round-trip, which is impacted by this PR, is the OTLP -> Prometheus -> OTLP round trip. This would occur when someone uses an OTel SDK (which has an OTel representation of metric data) with a prometheus exporter that is scraped by an OTel collector's prometheus receiver, and then pushed via OTLP somewhere.

@pirgeo
Copy link
Member

pirgeo commented Jul 5, 2023

I see, thanks for clarifying that.

@dashpole dashpole force-pushed the optional_suffix_trimming branch from 9fe783e to 237893b Compare July 5, 2023 16:41
@dashpole dashpole force-pushed the optional_suffix_trimming branch from 237893b to 29fc3c2 Compare July 5, 2023 16:43
@reyang reyang merged commit bb1aa87 into open-telemetry:main Jul 10, 2023
@dashpole dashpole deleted the optional_suffix_trimming branch July 10, 2023 19:06
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants