-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Counter type metrics from statsdreceiver being dropped by prometheusexporter #7156
Conversation
Signed-off-by: Loc Mai <[email protected]>
fix: dropping non-cumulative logic
Signed-off-by: Loc Mai <[email protected]>
doc: update doc for PR#7156
// Drop metrics with non-cumulative aggregations | ||
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative { | ||
// Drop metrics with non-cumulative aggregations and is not monotonic increasing | ||
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && !metric.Sum().IsMonotonic() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be handled elsewhere. Sums with delta temporality need to be converted to gauges rather than counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for taking a look at this PR. Isn't Gauge type handled by accumulateGauge
func?
func (a *lastValueAccumulator) accumulateGauge(metric pdata.Metric, il pdata.InstrumentationLibrary, now time.Time) (n int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I saw that method and I was looking at the base data model and map:
So in Prometheus, counters are always monotonic and cumulative. But then I realized it's not a 100% mapped like that, some Sums could be converted back to Prometheus Gauge with convertSum()
case pdata.MetricDataTypeGauge:
return c.convertGauge(metric)
case pdata.MetricDataTypeSum:
return c.convertSum(metric)
So maybe we could go with this change below:
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && !metric.Sum().IsMonotonic() { | |
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityDelta { |
With:
MetricAggregationTemporalityDelta is a MetricAggregationTemporality for a metric aggregator which reports changes since last report time.
MetricAggregationTemporalityCumulative is a MetricAggregationTemporality for a metric aggregator which reports changes since a fixed start time.
Prometheus exporter is at the end of the aggregation (it would set the metric to MetricAggregationTemporalityCumulative in the same function anyway) , I don't think it would care about the time which the metric got reported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds better, since the monotonic part will be used to check later on with convertSum():
opentelemetry-collector-contrib/exporter/prometheusexporter/collector.go
Lines 120 to 126 in e52cfd0
func (c *collector) convertSum(metric pdata.Metric) (prometheus.Metric, error) { | |
ip := metric.Sum().DataPoints().At(0) | |
metricType := prometheus.GaugeValue | |
if metric.Sum().IsMonotonic() { | |
metricType = prometheus.CounterValue | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More inputs in #5714 (comment) from @jmacd , I'm reading through his suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize what I understand from this thread:
When Prometheus receives a Delta temporality counter, there are two cases: (a) monotonic, (b) non-monotonic.
For (a), the treatment can be applied inside Prometheus both easily and correctly. Since this module already keeps state on every metric, performing the delta-to-cumulative is simple. Just keep tallying the sum and it will be cumulative when Prometheus sees it.
For (b), I will speculate that no one cares and "correct" is difficult to achieve. These are non-monotonic deltas, which statsd already has a different way to treat. If the user sends gauges prefixed by +
or -
they have the same semantics as UpDownCounter, are to be interpreted as deltas. So, if the statsd user wanted delta-to-gauge, they already have a way to do that and it's the statsdreceiver
's responsibility to compute it (and it does).
Continuing case (b), if the statsd user didn't want delta-to-gauge, then they're presumably sending you positive-and-negative increments. We have no information about reset time, so there are two correct ways to interpret this: (a) keep them as deltas, (b) convert them to rate gauges over some time period. Either way, Prometheus users will still have trouble working with this data, which is why I speculate that no one cares.
The preceding paragraph lends support to the following claim: Statsdreceiver should by default assume counters are monotonic, because there's a mechanism in statsd for treating non-monotonic counters as gauges already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand what you're proposing with respect to monotonic, delta temporality counters and agree that we could likely handle the delta-to-cumulative conversion here since this exporter is already maintaining state. Since https://github.com/open-telemetry/opentelemetry-specification/pull/2266/files#diff-0efae13f08f98e62a81767d5daeff37ebb7ef8c50537c7b9013e72506e9b055aR1085 would specify a different treatment for non-cumulative, monotonic OTLP sum data points being converted to prometheus exposition, do we need to change that or create an exception for the collector, or is there some scoping of that (proposed specification that I'm missing that makes it irrelevant to the collector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I will comment in open-telemetry/opentelemetry-specification#2266
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the document https://opentelemetry.io/docs/reference/specification/metrics/datamodel/#sums-delta-to-cumulative that exactly aligned with what @jmacd said :D I believe it's 100% the right implementation. Let me see what I could do.
While OpenTelemetry (and some metric backends) allows both Delta and Cumulative sums to be reported, the timeseries model we target does not support delta counters. To this end, converting from delta to cumulative needs to be defined so that backends can use this mechanism.
Thanks again for all the great inputs here folks 🪨
@@ -2,6 +2,10 @@ | |||
|
|||
## Unreleased | |||
|
|||
## 🧰 Bug fixes 🧰 | |||
|
|||
- `statsdreceiver`: Fix counter metric being dropped by Prometheus exporter (#7156) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why statsdreceiver
? Looks like this change (#7156) is made on prometheus exporter only. Please update the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, sorry for this one, I'm waiting to finalize the solution from the above discussion with @jmacd @Aneurysm9 so will update this later as well.
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Any update on this? @locmai solution seems correct. |
@bryanpaluch hi, I have discussed the right solution Joshua and we finalized it. I got side-tracked for a while and now I'm back working on it soon this sprint (we scheduled it as a work ticket). |
Hi, I made a new PR (this one was using my own company account, so to reduce the steps needed, I made a new one): #9919 |
Signed-off-by: Loc Mai [email protected]
Description:
Fixing the bug with Counter type metrics from statsdreceiver being dropped by prometheusexporter
Previously, I fixed this on the statdreceiver side, however, it was off the specification of statsd. For this PR, I add the condition for the dropping if statement.
Assuming metric.Sum() would always be used for https://opentelemetry.io/docs/reference/specification/metrics/datamodel/#sums both Aggregation Temporality of delta or cumulative. But in Prometheus case, there would be no metric that would be non-cumulative AND also be monotonic at the same time.
Link to tracking Issue: #4153