
HTTP API: reduce the number of metrics served by GET /api/queues by default #9437

Closed
michaelklishin opened this issue Sep 14, 2023 · 6 comments

@michaelklishin
Member

michaelklishin commented Sep 14, 2023

GET /api/queues is widely used and abused without pagination, often to retrieve just a single metric. In environments with tens or hundreds of thousands of queues, that produces enormously large JSON responses:

100K queues * 80 keys per queue = 8M keys, easily tens of MiBs in size.

Rendering 80 metrics for each of 100K objects burns a lot of CPU and maxes out network links for no good reason.

This applies to third-party tools as well, monitoring tools in particular.

In order to reduce this unnecessary resource waste, we can do a few things:

  • Drop some metrics, like almost all backing_queue_status fields, the garbage_collection field, from GET /api/queues responses
  • Potentially introduce a /detailed version of the endpoint that would return more metrics, or leave them to the rabbitmq-diagnostics observer

We can also introduce pagination by default, but that's likely to be very confusing at first and will affect a lot of tools. Reducing the size of the response will be a lot less disruptive:
for most tools and in most environments, the vast majority of metrics are ignored.
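A quick back-of-envelope sketch of the payload math above. The bytes-per-key figure is an assumption for illustration, not a measurement; even a conservative estimate lands well beyond the "tens of MiBs" mentioned:

```python
# Back-of-envelope size of an unpaginated GET /api/queues response.
# bytes_per_key is an assumed average (key name + value + JSON punctuation),
# not a measured figure.
queues = 100_000
keys_per_queue = 80
bytes_per_key = 20  # assumption for illustration

total_keys = queues * keys_per_queue
total_mib = total_keys * bytes_per_key / 2**20

print(f"{total_keys:,} keys, roughly {total_mib:.0f} MiB of JSON")
```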

@michaelklishin michaelklishin self-assigned this Sep 14, 2023
@dcorbacho
Contributor

@michaelklishin I wonder if in many cases it would be enough to use GET /api/queues?disable_stats=true&enable_queue_totals=true

[{
   "arguments":{"x-queue-type":"classic"},
   "auto_delete":false,
   "durable":true,
   "exclusive":false,
   "messages":8,
   "messages_ready":8,
   "messages_unacknowledged":0,
   "name":"q1",
   "node":"rabbit@dparracorb0633Q",
   "state":"running",
   "type":"classic",
   "vhost":"/"
}]

which already greatly reduces the number of data points returned compared to a plain GET /api/queues:

[{
   "arguments":{"x-queue-type":"classic"},
   "auto_delete":false,
   "backing_queue_status":{"version":2,"mode":"default","len":8,"qi_buffer_num_up":0,"qi_buffer_size":0,"qs_buffer_size":0,"q3":1,"delta":["delta",1,7,7,8],"num_pending_acks":0,"num_unconfirmed":0,"next_deliver_seq_id":0,"next_seq_id":8,"avg_ack_egress_rate":0.0,"avg_ack_ingress_rate":0.0,"avg_egress_rate":0.0,"avg_ingress_rate":0.670065251225457,"target_ram_count":"infinity","q4":0,"q2":0,"q1":0},
   "consumer_capacity":0,
   "consumer_utilisation":0,
   "consumers":0,
   "durable":true,
   "effective_policy_definition":{},
   "exclusive":false,
   "exclusive_consumer_tag":null,
   "garbage_collection":{"fullsweep_after":65535,"max_heap_size":0,"min_bin_vheap_size":46422,"min_heap_size":233,"minor_gcs":4},
   "head_message_timestamp":null,
   "idle_since":"2023-09-27T09:41:17.010+02:00",
   "memory":8512,
   "message_bytes":64,
   "message_bytes_paged_out":56,
   "message_bytes_persistent":0,
   "message_bytes_ram":8,
   "message_bytes_ready":64,
   "message_bytes_unacknowledged":0,
   "message_stats":{"publish":8,"publish_details":{"rate":0.0}},
   "messages":8,
   "messages_details":{"rate":0.0},
   "messages_paged_out":7,
   "messages_persistent":0,
   "messages_ram":1,
   "messages_ready":8,
   "messages_ready_details":{"rate":0.0},
   "messages_ready_ram":1,
   "messages_unacknowledged":0,
   "messages_unacknowledged_details":{"rate":0.0},
   "messages_unacknowledged_ram":0,
   "name":"q1",
   "node":"rabbit@dparracorb0633Q",
   "operator_policy":null,
   "policy":null,
   "recoverable_slaves":null,
   "reductions":21134,
   "reductions_details":{"rate":0.0},
   "single_active_consumer_tag":null,
   "state":"running",
   "type":"classic",
   "vhost":"/"
}]
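Counting the top-level keys of the two sample responses above makes the reduction concrete (key sets copied from the samples; counts apply to this one classic queue):

```python
# Top-level keys of the disable_stats=true&enable_queue_totals=true response.
slim_keys = {
    "arguments", "auto_delete", "durable", "exclusive", "messages",
    "messages_ready", "messages_unacknowledged", "name", "node",
    "state", "type", "vhost",
}

# Keys that the full GET /api/queues response adds on top of the slim one.
full_extra_keys = {
    "backing_queue_status", "consumer_capacity", "consumer_utilisation",
    "consumers", "effective_policy_definition", "exclusive_consumer_tag",
    "garbage_collection", "head_message_timestamp", "idle_since", "memory",
    "message_bytes", "message_bytes_paged_out", "message_bytes_persistent",
    "message_bytes_ram", "message_bytes_ready", "message_bytes_unacknowledged",
    "message_stats", "messages_details", "messages_paged_out",
    "messages_persistent", "messages_ram", "messages_ready_details",
    "messages_ready_ram", "messages_unacknowledged_details",
    "messages_unacknowledged_ram", "operator_policy", "policy",
    "recoverable_slaves", "reductions", "reductions_details",
    "single_active_consumer_tag",
}
full_keys = slim_keys | full_extra_keys

print(len(slim_keys), "keys vs", len(full_keys), "keys per queue")
```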

@michaelklishin
Member Author

@dcorbacho that looks good and we can document it, but in any case, serving backing queue metrics (which were never really meant to be used by the general public) does not make sense.

I'd both document that combination of parameters and proceed with the rest of this issue.

@dcorbacho
Contributor

The plugin configuration is already documented in https://www.rabbitmq.com/management.html#disable-stats but there is no mention of the query parameters at all in the HTTP API docs. I'll document that part.

@dcorbacho
Contributor

@michaelklishin I would drop the fields: backing_queue_status, garbage_collection, head_message_timestamp, reductions, reductions_details and idle_since.

Also, I don't think there is any value in returning keys with a null value; maybe we can drop them without causing any major disruption?
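The proposal above, sketched as hypothetical client-side pruning (the field names come from the comment; the actual proposal is to do this server-side in the management plugin):

```python
# Hypothetical client-side version of the proposed pruning: drop the verbose
# diagnostic fields, then drop any key whose value is null (None).
VERBOSE_FIELDS = {
    "backing_queue_status", "garbage_collection", "head_message_timestamp",
    "reductions", "reductions_details", "idle_since",
}

def prune(queue: dict) -> dict:
    """Return a copy of a queue record without verbose or null-valued fields."""
    return {
        k: v
        for k, v in queue.items()
        if k not in VERBOSE_FIELDS and v is not None
    }

# Illustrative fragment of a queue record from the samples above.
sample = {
    "name": "q1",
    "messages": 8,
    "policy": None,  # null in the JSON response
    "idle_since": "2023-09-27T09:41:17.010+02:00",
    "reductions": 21134,
}
print(prune(sample))  # {'name': 'q1', 'messages': 8}
```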

@michaelklishin
Member Author

#9627 #9578 suggest this can be closed.

@D1-3105

D1-3105 commented May 31, 2024

How do I get the detailed metric values that used to be presented in backing_queue_status?
