
rabbit_quorum_queue: Shrink batches of QQs in parallel (backport #15081) (backport #15765) #15769

Merged

the-mikedavis merged 1 commit into v4.2.x from mergify/bp/v4.2.x/pr-15765 on Mar 18, 2026

Conversation


@mergify mergify bot commented Mar 18, 2026

Shrinking a member node off of a QQ can be parallelized. The operation involves:

* removing the node from the QQ's cluster membership (appending a command to the log and committing it) with `ra:remove_member/3`
* updating the metadata store to remove the member from the QQ type state with `rabbit_amqqueue:update/2`
* deleting the queue data from the node with `ra:force_delete_server/2` if the node can be reached

All of these operations are I/O bound. Updating the cluster membership and the metadata store involves appending commands to those logs and replicating them. Writing commands to Ra synchronously, in serial, is fairly slow; sending many commands in parallel is much more efficient. By parallelizing these steps we can write larger chunks of commands to the WAL(s).

`ra:force_delete_server/2` also benefits from parallelization when the node being shrunk off is no longer reachable, for example after some hardware failures. The underlying `rpc:call/4` attempts to auto-connect to the node, and this can take some time to time out. When the calls run in parallel, each `rpc:call/4` reuses the same underlying distribution entry, so all calls fail together once the connection fails to establish.
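The fan-out described above can be sketched as a simple parallel map in Erlang. This is a hypothetical illustration, not the code from the PR: each per-queue shrink operation runs in its own process, so the Ra log appends and `rpc:call/4` timeouts overlap instead of being paid one after another.

```erlang
-module(shrink_sketch).
-export([pmap/2]).

%% Hypothetical sketch (not the RabbitMQ source): run Fun(Item) for
%% every Item concurrently and wait for all results. In the context of
%% the PR, each Fun would wrap the per-queue steps described above
%% (ra:remove_member/3, rabbit_amqqueue:update/2,
%% ra:force_delete_server/2).
pmap(Fun, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, Fun(Item)} end),
                Ref
            end || Item <- Items],
    %% Collect results in spawn order. The calls themselves already
    %% overlapped, so total latency is roughly that of the slowest
    %% call rather than the sum of all calls.
    [receive {Ref, Res} -> Res end || Ref <- Refs].
```

With this shape, a batch of unreachable-node deletions fails together after one connection timeout instead of timing out once per queue.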

Discussed in #15057


This is an automatic backport of pull request #15081 done by Mergify.
This is an automatic backport of pull request #15765 done by Mergify.


(cherry picked from commit 511692a)
(cherry picked from commit 6455406)
@the-mikedavis the-mikedavis added this to the 4.2.6 milestone Mar 18, 2026
@the-mikedavis the-mikedavis merged commit 0af55d8 into v4.2.x Mar 18, 2026
583 of 585 checks passed
@the-mikedavis the-mikedavis deleted the mergify/bp/v4.2.x/pr-15765 branch March 18, 2026 19:22