
Intervalprocessor not dropping metrics #37905

Open
avilevy18 opened this issue Feb 13, 2025 · 10 comments · May be fixed by #37911
Labels
bug (Something isn't working), needs triage (New item requiring triage), processor/interval

Comments

@avilevy18

avilevy18 commented Feb 13, 2025

Component(s)

processor/interval

What happened?

Description

I am running the contrib collector in a Docker container and ingesting metrics generated by telemetrygen in another container. I noticed that telemetrygen sends 2 metrics immediately on container startup, and I only want a single metric every 5 seconds. After adding the interval processor to my contrib config, both metrics were still being emitted at the same time.

Steps to Reproduce

Create a docker network

docker network create -d bridge otel

Start up contrib:

docker run --net=otel --name=collector -v ./config:/etc/otelcol-contrib/ otel/opentelemetry-collector-contrib:0.119.0

Start up telegen:

docker run --net=otel ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.119.0 metrics --otlp-insecure --rate=0.1 --duration=5m --otlp-endpoint=collector:4317

Expected Result

Only one of the two data points is exported per 5-second interval.

Actual Result

The two metrics are exported less than 1 second apart from each other.

Looking at the Timestamp fields in the log, the metrics were exported at
2025-02-13 16:37:04.135426715 and 2025-02-13 16:37:04.258232968

Collector version

v0.119.0

Environment information

Environment

Docker

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  interval:
    interval: 5s
    pass_through:
      gauge: false
      summary: false
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics/otlp:
      receivers:
      - otlp
      processors:
      - interval
      exporters:
      - debug

Log output

2025-02-13T16:37:04.257Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:08.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2025-02-13T16:37:08.001Z	info	ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: gen
     -> Description: 
     -> Unit: 
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2025-02-13 16:37:04.135426715 +0000 UTC
Value: 0
	{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:13.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2025-02-13T16:37:13.001Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:14.146Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2025-02-13T16:37:14.146Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:18.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2025-02-13T16:37:18.001Z	info	ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: gen
     -> Description: 
     -> Unit: 
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2025-02-13 16:37:04.258232968 +0000 UTC
Value: 1
	{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:23.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2025-02-13T16:37:23.001Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:24.145Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2025-02-13T16:37:24.145Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}

Additional context

No response

avilevy18 added the bug and needs triage labels on Feb 13, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

I think the actual expected result is that only one of the two points will be sent?

@avilevy18
Author

I think the actual expected result is that only one of the two points will be sent?

Yes, sorry that is what I actually meant. Updated the description of the issue.

@RichieSams
Contributor

Ok, I ran this locally both with and without the interval processor, and I see what you're saying. The interval processor is correctly aggregating. Note that the "extra" metric exports you're seeing are empty:

2025-02-13T16:37:14.146Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}

No resource metrics, no metrics, and no datapoints.

func (p *Processor) exportMetrics() {
	md := func() pmetric.Metrics {
		p.stateLock.Lock()
		defer p.stateLock.Unlock()

		// ConsumeMetrics() has prepared our own pmetric.Metrics instance ready for us to use
		// Take it and clear replace it with a new empty one
		out := p.md
		p.md = pmetric.NewMetrics()

		// Clear all the lookup references
		clear(p.rmLookup)
		clear(p.smLookup)
		clear(p.mLookup)
		clear(p.numberLookup)
		clear(p.histogramLookup)
		clear(p.expHistogramLookup)
		clear(p.summaryLookup)

		return out
	}()

	if err := p.nextConsumer.ConsumeMetrics(p.ctx, md); err != nil {
		p.logger.Error("Metrics export failed", zap.Error(err))
	}
}

We can add a check to the end, where we skip calling ConsumeMetrics() if md is empty.
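For illustration, a minimal sketch of that guard at the end of exportMetrics(), using the pmetric.Metrics helper DataPointCount() to detect an empty batch (a sketch only, not necessarily the exact change in the PR):

func (p *Processor) exportMetrics() {
	md := func() pmetric.Metrics {
		// ... unchanged: swap out p.md and clear the lookup maps, as above ...
	}()

	// If nothing was aggregated during this interval, skip the export
	// entirely instead of sending an empty pmetric.Metrics downstream.
	if md.DataPointCount() == 0 {
		return
	}

	if err := p.nextConsumer.ConsumeMetrics(p.ctx, md); err != nil {
		p.logger.Error("Metrics export failed", zap.Error(err))
	}
}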

@RichieSams
Contributor

I'm making a PR

@dashpole
Contributor

@RichieSams I don't think that is the issue. Note the timestamps of the two points that are printed by the debug exporter are very close:
Timestamp: 2025-02-13 16:37:04.135426715 +0000 UTC vs Timestamp: 2025-02-13 16:37:04.258232968 +0000 UTC. Shouldn't one of those be dropped?

@RichieSams
Contributor

I don't think that is the issue. Note the timestamps of the two points that are printed by the debug exporter are very close:
Timestamp: 2025-02-13 16:37:04.135426715 +0000 UTC vs Timestamp: 2025-02-13 16:37:04.258232968 +0000 UTC. Shouldn't one of those be dropped?

The main ConsumeMetrics() has the same problem. It will aggregate and "consume" metrics correctly. But it will always forward on a root metrics instance, even if it's empty.

So the fix will need to be applied in both places.
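For completeness, a rough sketch of the equivalent guard in ConsumeMetrics(), assuming it ends by forwarding whatever data points were not consumed for aggregation (again, a sketch under that assumption, not the PR's exact code):

func (p *Processor) ConsumeMetrics(ctx context.Context, md pmetric.Metrics) error {
	// ... aggregation logic: data points that get aggregated are removed
	// from md; anything not aggregated stays and should be passed through ...

	// If every data point was consumed for aggregation, there is nothing
	// left to forward, so skip the downstream call entirely.
	if md.DataPointCount() == 0 {
		return nil
	}
	return p.nextConsumer.ConsumeMetrics(ctx, md)
}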

dashpole linked a pull request on Feb 13, 2025 that will close this issue
@dashpole
Contributor

It seems like you are still missing the core issue. Two data points that are within 0.2 seconds of each other are sent to the debug exporter with the interval set to 5s. Your fix still makes sense, but I don't think it is the core issue here. Or am I misunderstanding how the interval processor is supposed to work?

@RichieSams
Contributor

RichieSams commented Feb 14, 2025

I understand. The interval processor works as follows:

  1. It receives metrics at any time from upstream (ConsumeMetrics()).
  2. Inside that call, the intention is that the interval processor fully consumes (and does not pass on) any metrics it aggregates, while non-aggregated metrics are passed on right then and there.
  3. At the given "interval time", the processor re-exports the aggregated metrics.

There are two problems:

  1. In step 2, the processor correctly consumes all metrics to be aggregated, which leaves an "empty" metric set. The bug is that the interval processor still sends this empty metric set downstream (this is what the debug exporter shows).
  2. In step 3, a similar problem occurs: if no metrics have come in from upstream, the interval processor will have an empty metric set, and it will still forward it downstream.

Thus my PR attempts to fix both of those issues by adding a hard check before we send things downstream: if the metric set is empty, don't bother sending it.

To answer your question directly:

Two data points that are within 0.2 seconds of each other are sent to the debug exporter with the interval set to 5s.

Yes. But if you look at the data, you'll notice that the second one is completely empty.

2025-02-13T16:37:23.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2025-02-13T16:37:23.001Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:24.145Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 0, "metrics": 0, "data points": 0}
2025-02-13T16:37:24.145Z	info		{"kind": "exporter", "data_type": "metrics", "name": "debug"}

@dashpole
Contributor

dashpole commented Feb 14, 2025

I'm only looking at these two metrics (ignoring empty ones):

2025-02-13T16:37:08.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2025-02-13T16:37:08.001Z	info	ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: gen
     -> Description: 
     -> Unit: 
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2025-02-13 16:37:04.135426715 +0000 UTC
Value: 0
	{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2025-02-13T16:37:18.001Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2025-02-13T16:37:18.001Z	info	ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: gen
     -> Description: 
     -> Unit: 
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2025-02-13 16:37:04.258232968 +0000 UTC
Value: 1
	{"kind": "exporter", "data_type": "metrics", "name": "debug"}
