Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Append unit to prometheus metric names #5400

Merged
merged 25 commits into from
Jun 8, 2023
Merged

Conversation

psx95
Copy link
Contributor

@psx95 psx95 commented Apr 25, 2023

Fixes #4390
This PR appends the unit name to the prometheus metric name generated by the exporter. The PR follows guidelines from this portion of the spec and closely follows the implementation details of the Opentelemetry Collector.

Difference between collector generated names and prometheus exporter

  1. The collector will drop the entire unit if it encounters { or }.

    • "{packets}By" → ""
    • "{packets}By/y" → "per_year"
      In contrast, this PR will drop only portions between {} and process the remaining string. This is done because the spec suggests to drop only portions in {}.
    • "{packets}By" → "bytes"
    • "{packets}By/y" → "bytes_per_year"
  2. If the metric name starts with a number, Collector simply appends it with an _, the current implementation (existing, not affected by this PR) replaces the number with a _. This is not changed by this PR.

  3. The Collector tokenizes both - the input metric name and its unit. Each unit token is individually processed, and is appended to the final name if the name does not contain that token. To clarify, for a given metric and unit, the collector will:

    • Metric Name: metric_total_hertz Metric Unit: hertz_total will result in final name metric_total_hertz
    • For the same input, this implementation will result in: metric_total_hertz_hertz_total, this is because the spec suggests to avoid appending unit if metric name already contains the unit.

Notes:

  1. Regarding leading and trailing _ in the unit are not mandated by the spec, but the implementation of unit conversion in the collector does not allow for any leading or trailing _.

psx95 added 3 commits April 24, 2023 22:25
The unit conversion function that converts OTLP units to Prometheus now takes in the rawUnitName and PrometheusType instead of OTLP metric data. This leads to more accurate conversion of OTLP units to Prometheus units since the unit type representative of types recognized by Prometheus
@codecov
Copy link

codecov bot commented Apr 25, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05 🎉

Comparison is base (e637e51) 91.31% compared to head (c302e91) 91.36%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5400      +/-   ##
============================================
+ Coverage     91.31%   91.36%   +0.05%     
- Complexity     4892     4952      +60     
============================================
  Files           547      549       +2     
  Lines         14412    14503      +91     
  Branches       1354     1359       +5     
============================================
+ Hits          13160    13251      +91     
  Misses          866      866              
  Partials        386      386              
Impacted Files Coverage Δ
...entelemetry/exporter/prometheus/NameSanitizer.java 100.00% <100.00%> (ø)
...xporter/prometheus/PrometheusMetricNameMapper.java 100.00% <100.00%> (ø)
...try/exporter/prometheus/PrometheusUnitsHelper.java 100.00% <100.00%> (ø)
.../opentelemetry/exporter/prometheus/Serializer.java 86.50% <100.00%> (-0.19%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@psx95 psx95 marked this pull request as ready for review April 26, 2023 18:00
@psx95 psx95 requested a review from a team April 26, 2023 18:00
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Down to minor commments.

Overall LGTM.

High level - we should probably optimise the string manipulation to involve less allocations over time. We can check w/ @jack-berg on whether or not we have a prometheus-exporter benchmark where we could measure allocations, etc. That can be follow on work here.

private static String metricName(String rawMetricName, PrometheusType type) {
String name = NameSanitizer.INSTANCE.apply(rawMetricName);
private static String metricName(MetricData rawMetric, PrometheusType type) {
String name = NameSanitizer.INSTANCE.apply(rawMetric.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this line below the next if you can probably get rid of the cleanUpString method in PrometheusUnitsHelper -- no need to perform sanitization twice after all. (And the performance will benefit from the cache in NameSanitizer as well)

Comment on lines 106 to 107
prometheusCompliant = prometheusCompliant.replaceAll("_+$", ""); // remove trailing underscore
prometheusCompliant = prometheusCompliant.replaceAll("^_+", ""); // remove leading underscore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for these replacements? Prometheus does not seem to forbid double or trailing underscores in its spec; and it does not seem that the previous operations on the unit are likely to produce trailing or duplicated underscores.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec linked in the PR description states -

 Multiple consecutive `_` characters MUST be replaced with a single `_` character

Based on that I removed leading _, because the serializer was already adding it.

name = name + "_" + prometheusEquivalentUnit;

But looking at this again, I think this would be confusing to do in the PrometheusUnitHelper. I will remove this, also the spec does not explicitly mention that the final unit name cannot have a trailing _, so I think leading _ is fine. I will make changes.

Also, the current name sanitizer does not seem to remove consecutive _, so based on spec this needs to be added as well.

TL;DR

  • The PrometheusUnitHelper will just return the computed unit (with illegal characters replaced with _) without any additional cleaning up.
  • The Sanitizing will be done by the serializer right before returning. The sanitizing will only take care of replacing any remaining Illegal characters with _ and then replacing consecutive _ with a single underscore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to mention here that while not explicitly stated in the spec, this is how opentelemetry-collector works, should we try to keep this in sync with how the collector works ?

@psx95 psx95 marked this pull request as draft April 29, 2023 14:03
@psx95 psx95 marked this pull request as ready for review April 29, 2023 16:48
@psx95 psx95 requested a review from jsuereth May 1, 2023 15:07
Comment on lines +99 to +104
return SANITIZE_LEADING_UNDERSCORES
.matcher(
SANITIZE_TRAILING_UNDERSCORES
.matcher(
SANITIZE_CONSECUTIVE_UNDERSCORES
.matcher(INVALID_CHARACTERS_PATTERN.matcher(string).replaceAll("_"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: as a future performance improvement we could probably replace all these regexes (and those in NameSanitizer too) with a single loop that removes all the unwanted characters.

Copy link
Member

@jack-berg jack-berg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of small comments but overall looks good. Test cases really help make it clear what's happening.

Sorry it took so long to get eyes on this!

static String metricName(MetricData rawMetric, PrometheusType type) {
String name = NameSanitizer.INSTANCE.apply(rawMetric.getName());
String prometheusEquivalentUnit =
PrometheusUnitsHelper.getEquivalentPrometheusUnit(rawMetric.getUnit());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing how this is used, we probably want to cache the unit conversion the same way NameSanitizer caches its responses. Maybe even cache the combination of the metric name, unit, and prometheus type to avoid repeatedly performing the concatenation / suffix logic below.

@psx95
Copy link
Contributor Author

psx95 commented Jun 3, 2023

Couple of small comments but overall looks good. Test cases really help make it clear what's happening.

Sorry it took so long to get eyes on this!

@jack-berg Thanks for the review !

A dedicated class to represent cache mapping keys would prevent certain
edge cases where String concatination could yield same result even when
individual Strings being concatinated are different.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Append unit to prometheus metric names
4 participants