@binarin
Contributor

@binarin binarin commented Sep 14, 2021

Proposed Changes

The Prometheus plugin always exposes all possible metrics (even in aggregated mode, where the output is smaller but the same amount of data still has to be processed). This can be problematic when there is a large number of connections, channels, or queues.

For existing endpoints, this change makes it possible to disable some metric families altogether or to filter out some vhosts. This is optional: without a configuration change, everything stays 100% backwards compatible. It can greatly reduce the amount of data exposed while still providing enough detail for proper monitoring.

There is also a new endpoint where the same customizations can be applied on the fly, in case more information or a different level of detail is needed.
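
As a rough illustration of "on the fly" customization, the sketch below builds a scrape URL that asks for only a few metric families and vhosts. The endpoint path `/metrics/detailed`, the repeated `family` and `vhost` query parameters, and port 15692 are assumptions taken from the RabbitMQ Prometheus plugin documentation, not from this description; the helper name `detailed_metrics_url` is hypothetical.

```python
from urllib.parse import urlencode


def detailed_metrics_url(base_url, families, vhosts=()):
    """Build a scrape URL restricted to the given metric families and vhosts.

    Assumption: the endpoint is /metrics/detailed and takes repeated
    `family` and `vhost` query parameters.
    """
    # Each family and vhost becomes its own query parameter.
    params = [("family", f) for f in families] + [("vhost", v) for v in vhosts]
    return f"{base_url.rstrip('/')}/metrics/detailed?{urlencode(params)}"


if __name__ == "__main__":
    # The two families named in the benchmark later in this thread.
    print(
        detailed_metrics_url(
            "http://localhost:15692",
            ["queue_coarse_metrics", "queue_consumer_count"],
            vhosts=["/"],
        )
    )
```

Requesting only the families a dashboard actually uses is what keeps the scrape small; the default vhost `/` is percent-encoded as `%2F` in the query string.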

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

@vikinghawk

This looks really promising!

Our use case is probably similar to many others: we are only interested in queue message and consumer counts at per-object granularity. But scraping the current all-or-nothing per-object endpoint with 10k+ queues can take a long time (60+ seconds) and causes high memory usage on the node.

A couple of questions:

  1. Have you guys decided what releases this will be targeted for? Specifically, is it possible to be backported to 3.8.x?
  2. Would it be feasible to add "rabbitmq_queue_consumer_utilisation" to the "queue_consumer_count" group? It would be useful to see the current consumer utilization metrics alongside the count

@binarin
Contributor Author

binarin commented Sep 16, 2021

> 1. Have you guys decided what releases this will be targeted for? Specifically, is it possible to be backported to 3.8.x?

For now we are planning to backport it, unless some unexpected problem comes up.

> 2. Would it be feasible to add "rabbitmq_queue_consumer_utilisation" to the "queue_consumer_count" group? It would be useful to see the current consumer utilization metrics alongside the count

No, but I did a bit of benchmarking on a RabbitMQ node with 10k queues, 10k publishers, and 10k consumers:

  • Current per-object endpoint: around 2 minutes
  • queue_coarse_metrics and queue_consumer_count: 1.3 seconds
  • queue_metrics: 8.5 seconds

If you need utilization, you'll be able to scrape it now in a more reasonable fashion anyway.
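
To put the quoted timings in proportion, here is a small back-of-the-envelope calculation. It treats "around 2 minutes" as exactly 120 seconds, which is an assumption; the other two figures are the ones reported above, and the dictionary labels are just descriptive names.

```python
# Scrape times in seconds, as reported in the benchmark above.
# Assumption: "around 2 minutes" is taken as exactly 120 s.
FULL_PER_OBJECT_S = 120.0
SELECTIVE_S = {
    "queue_coarse_metrics + queue_consumer_count": 1.3,
    "queue_metrics": 8.5,
}


def speedup(baseline_s, selective_s):
    """Return how many times faster the selective scrape is than the baseline."""
    return baseline_s / selective_s


if __name__ == "__main__":
    for name, seconds in SELECTIVE_S.items():
        print(f"{name}: about {speedup(FULL_PER_OBJECT_S, seconds):.0f}x faster")
```

Under that assumption the two coarse families scrape roughly 90 times faster than the full per-object endpoint, and `queue_metrics` roughly 14 times faster.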

@binarin binarin merged commit e4bda83 into master Sep 20, 2021
@binarin binarin deleted the alebedeff/opp-92 branch September 20, 2021 15:54
binarin added a commit that referenced this pull request Sep 21, 2021
Make prometheus plugin output customizable (backport #3421)
@michaelklishin michaelklishin added this to the 3.9.7 milestone Sep 23, 2021
@luos
Contributor

luos commented Dec 6, 2021

Hi @binarin, it seems this did not get backported by Mergify to v3.8.x. Do you think there is a chance this could be merged into it? Thanks

@sherifkayad

sherifkayad commented Jan 12, 2022

@binarin, similar to @luos, I am missing this commit in 3.8.x. Any plans to include it there? I submitted an issue for that.
