[exporter/prometheusremotewrite] Consider converting from pmetrics to prometheus data model in parallel. #21106
Comments
After opening the ticket, I noticed that this can be considered a duplicate of #20741. However, the approach suggested there is different.
Would you like to close this ticket and work with #20741?
I think the two issues are related but possibly susceptible to independent resolutions. This issue can be resolved without any changes to the handling of consumer count on the queued retry helper by increasing data conversion parallelism. I believe that @rapphil also had some ideas for how to safely increase export parallelism that would be more closely aligned with #20741.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This is still relevant.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Component(s)
exporter/prometheusremotewrite
Is your feature request related to a problem? Please describe.
While performing load tests with the prometheusremotewrite exporter, I identified a bottleneck that could be further optimized.
I used a very simple collector configuration in the load tests:
The target of prometheusremotewrite is a dummy web server that will only accept data and will return instantly.
We then ran a distributed load test using locust to ingest data into the otlp endpoint using otlp over http. We increased the load to the point where prometheusremotewrite was not able to keep up with the amount of data being ingested, and data started to accumulate in the queue.
I decided to profile the collector while it was under load, and I got to this:
After inspecting the code, I noticed that the conversion of data from pmetric to the prometheus data model happens sequentially:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/prometheusremotewriteexporter/exporter.go#L140
Describe the solution you'd like
I would like to propose that the conversion of metrics from pmetric to the prometheus data model be parallelized. This can be done with a configurable parameter for the parallelism level. The algorithm should partition the data into chunks that are converted in parallel and then finally merged.
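To make the proposal concrete, here is a minimal sketch of the chunk, convert and merge idea in Go. It is not the exporter's code: `Metric`, `Series`, `convert` and `ConvertParallel` are hypothetical stand-ins for the pmetric input, the prometheus output, the per-metric translation and the new entry point, and `workers` plays the role of the configurable parallelism level.

```go
package main

import "sync"

// Metric and Series are hypothetical stand-ins for a pmetric data point and
// a prometheus time series; they are not the collector's real types.
type Metric struct {
	Name  string
	Value float64
}

type Series struct {
	Labels map[string]string
	Value  float64
}

// convert is a placeholder for the per-metric translation step.
func convert(m Metric) Series {
	return Series{Labels: map[string]string{"__name__": m.Name}, Value: m.Value}
}

// ConvertParallel splits metrics into at most `workers` contiguous chunks,
// converts each chunk in its own goroutine, and returns the results in
// input order.
func ConvertParallel(metrics []Metric, workers int) []Series {
	if workers < 1 {
		workers = 1
	}
	out := make([]Series, len(metrics))
	if len(metrics) == 0 {
		return out
	}
	// Ceiling division so that no more than `workers` chunks are created.
	chunk := (len(metrics) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(metrics); start += chunk {
		end := start + chunk
		if end > len(metrics) {
			end = len(metrics)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			// Each goroutine writes only to its own index range of out,
			// so no locking is needed and the merge is implicit.
			for i := start; i < end; i++ {
				out[i] = convert(metrics[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}
```

Calling `ConvertParallel(batch, 4)` would convert `batch` using up to four goroutines. Writing into a preallocated slice keeps the merge trivial here; a real implementation would probably need an explicit merge step if series with identical label sets from different chunks have to be combined, which this sketch does not attempt.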
Describe alternatives you've considered
One natural way of mitigating this issue is to add more collectors. However, this comes with its own set of problems and challenges. Ideally, each collector should scale to use most of the hardware it is running on.
Additional context
No response