
[exporter/prometheusremotewrite] Consider converting from pmetrics to prometheus data model in parallel. #21106

Closed
rapphil opened this issue Apr 21, 2023 · 8 comments

Comments

rapphil (Contributor) commented Apr 21, 2023

Component(s)

exporter/prometheusremotewrite

Is your feature request related to a problem? Please describe.

While performing load tests with the prometheusremotewrite exporter, I identified a bottleneck that could be further optimized.

I used a very simple collector configuration in the load tests:

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  sigv4auth:
    region: "us-west-2"

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['127.0.0.1:8888']

processors:
  batch:
    send_batch_size: 128000

exporters:
  logging:
    loglevel: debug
  awsxray:
    region: 'us-west-2'
  awsemf:
    region: 'us-west-2'
  prometheusremotewrite:
    endpoint: http://localhost:8080

  prometheusremotewrite/telemetry:
    endpoint: "https://<endpoint>"
    auth:
      authenticator: sigv4auth



service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    metrics/2:
      receivers: [prometheus]
      exporters: [prometheusremotewrite/telemetry]


  extensions: [pprof, sigv4auth]
  telemetry:
    logs:
      level: debug
    metrics:
      level: detailed
      address: 0.0.0.0:8888

The target of the prometheusremotewrite exporter is a dummy web server that only accepts the data and returns immediately.

We then ran a distributed load test using Locust to ingest data into the OTLP endpoint over OTLP/HTTP. We increased the load to the point where the prometheusremotewrite exporter could not keep up with the amount of data being ingested, and data started to accumulate in the queue.

I decided to profile the collector while it was under load, and I got to this:

[CPU profile screenshot]

After inspecting the code, I noticed that the conversion of data from pmetric to the Prometheus data model happens sequentially.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/prometheusremotewriteexporter/exporter.go#L140
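
For reference, the hot path has roughly the following shape (a simplified sketch, not a copy of the linked code; exact signatures may differ between versions, and `export` here is just a stand-in for the exporter's real export/retry logic):

```go
package sketch

import (
	"context"

	"github.com/prometheus/prometheus/prompb"
	"go.opentelemetry.io/collector/pdata/pmetric"

	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/prometheusremotewrite"
)

// pushMetrics mirrors the overall shape of PushMetrics: the whole batch is
// converted on the calling goroutine before anything is exported.
func pushMetrics(ctx context.Context, md pmetric.Metrics) error {
	// Single-threaded conversion of the entire pmetric.Metrics batch.
	tsMap, err := prometheusremotewrite.FromMetrics(md, prometheusremotewrite.Settings{})
	if err != nil {
		return err
	}
	return export(ctx, tsMap)
}

// export is a placeholder for the exporter's actual export path.
func export(_ context.Context, _ map[string]*prompb.TimeSeries) error { return nil }
```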

Describe the solution you'd like

I would like to propose parallelizing the conversion of metrics from pmetric to the Prometheus data model. The parallelism level would be controlled by a configurable parameter, and the algorithm would partition the data into chunks that are converted in parallel and then finally merged.
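
A minimal sketch of what that could look like, assuming hypothetical helper names (`convertChunk` stands in for the existing translator call, and the merge step simply appends samples when two chunks produce the same series key):

```go
package sketch

import (
	"sync"

	"github.com/prometheus/prometheus/prompb"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// convertChunk is a hypothetical stand-in for the existing pmetric -> Prometheus
// conversion (e.g. the translator's FromMetrics) applied to a subset of the batch.
func convertChunk(chunk pmetric.Metrics) map[string]*prompb.TimeSeries {
	return map[string]*prompb.TimeSeries{}
}

// parallelConvert partitions the batch by ResourceMetrics into `parallelism`
// chunks, converts the chunks concurrently, and merges the resulting maps.
func parallelConvert(md pmetric.Metrics, parallelism int) map[string]*prompb.TimeSeries {
	// Partition: round-robin the ResourceMetrics across the chunks.
	// (A real implementation would likely partition indices instead of copying.)
	chunks := make([]pmetric.Metrics, parallelism)
	for i := range chunks {
		chunks[i] = pmetric.NewMetrics()
	}
	rms := md.ResourceMetrics()
	for i := 0; i < rms.Len(); i++ {
		rms.At(i).CopyTo(chunks[i%parallelism].ResourceMetrics().AppendEmpty())
	}

	// Convert each chunk on its own goroutine.
	results := make([]map[string]*prompb.TimeSeries, parallelism)
	var wg sync.WaitGroup
	for i := range chunks {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = convertChunk(chunks[i])
		}(i)
	}
	wg.Wait()

	// Merge: append samples when two chunks produce the same series key.
	merged := make(map[string]*prompb.TimeSeries)
	for _, m := range results {
		for key, ts := range m {
			if existing, ok := merged[key]; ok {
				existing.Samples = append(existing.Samples, ts.Samples...)
			} else {
				merged[key] = ts
			}
		}
	}
	return merged
}
```

With the parallelism parameter set to 1 this degenerates to the current sequential behaviour, so the default could stay backwards compatible.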

Describe alternatives you've considered

One natural way of mitigating this issue is simply adding more collectors. However, this comes with its own set of problems and challenges. Ideally, each collector should scale to make the most of the hardware it is running on.

Additional context

No response

rapphil added the enhancement and needs triage labels on Apr 21, 2023
github-actions bot commented:

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

rapphil (Contributor, Author) commented Apr 21, 2023

After opening this ticket, I noticed that it could be considered a duplicate of #20741.

However, the approach suggested here is different.

atoulme (Contributor) commented Apr 29, 2023

Would you like to close this ticket and work with #20741?

atoulme removed the needs triage label on Apr 29, 2023
Aneurysm9 (Member) commented:

I think the two issues are related but possibly susceptible to independent resolutions. This issue can be resolved without any changes to the handling of consumer count on the queued retry helper by increasing data conversion parallelism. I believe that @rapphil also had some ideas for how to safely increase export parallelism that would be more closely aligned with #20741.

github-actions bot commented Jul 3, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Jul 3, 2023
rapphil (Contributor, Author) commented Jul 3, 2023

This is still relevant

github-actions bot removed the Stale label on Jul 3, 2023
github-actions bot commented Sep 4, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Sep 4, 2023
github-actions bot commented Nov 3, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned on Nov 3, 2023