rabbit_quorum_queue: Shrink batches of QQs in parallel (backport #15081) (backport #15765) #15769
Merged
the-mikedavis merged 1 commit into v4.2.x on Mar 18, 2026
Shrinking a member node off of a QQ can be parallelized. The operation involves

* removing the node from the QQ's cluster membership (appending a command to the log and committing it) with `ra:remove_member/3`
* updating the metadata store to remove the member from the QQ type state with `rabbit_amqqueue:update/2`
* deleting the queue data from the node with `ra:force_delete_server/2` if the node can be reached

All of these operations are I/O bound. Updating the cluster membership and metadata store involves appending commands to those logs and replicating them. Writing commands to Ra synchronously in serial is fairly slow - sending many commands in parallel is much more efficient. By parallelizing these steps we can write larger chunks of commands to WAL(s).

`ra:force_delete_server/2` benefits from parallelizing if the node being shrunk off is no longer reachable, for example in some hardware failures. The underlying `rpc:call/4` will attempt to auto-connect to the node and this can take some time to time out. By parallelizing this, each `rpc:call/4` reuses the same underlying distribution entry and all calls fail together once the connection fails to establish.

(cherry picked from commit 511692a)
(cherry picked from commit 6455406)
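The batching idea can be illustrated with a minimal sketch. This is not the code from this PR: the module name `shrink_sketch`, the functions `pmap_batches/3` and `chunk/2`, and the batch size are all hypothetical, and `Fun` merely stands in for the per-queue sequence of `ra:remove_member/3`, `rabbit_amqqueue:update/2` and `ra:force_delete_server/2`. It only shows one way to run I/O-bound work concurrently within a batch while processing the batches one after another.

```erlang
%% Hypothetical illustration only; not the implementation in this PR.
-module(shrink_sketch).
-export([pmap_batches/3, chunk/2]).

%% Split List into sublists of at most N elements, preserving order.
chunk([], _N) ->
    [];
chunk(List, N) when is_integer(N), N > 0 ->
    case length(List) =< N of
        true ->
            [List];
        false ->
            {Batch, Rest} = lists:split(N, List),
            [Batch | chunk(Rest, N)]
    end.

%% Apply Fun to every item. Items within a batch run concurrently,
%% batches run one after another. Returns [{Item, Result}] in input
%% order, where Result is {ok, Value} or {error, {Class, Reason}}.
pmap_batches(Fun, Items, BatchSize) ->
    lists:append([run_batch(Fun, Batch) || Batch <- chunk(Items, BatchSize)]).

run_batch(Fun, Batch) ->
    Parent = self(),
    %% Start one worker per item; each reply is tagged with a unique ref.
    Pending = [begin
                   Ref = make_ref(),
                   spawn_link(fun() ->
                                      Result = try
                                                   {ok, Fun(Item)}
                                               catch
                                                   Class:Reason ->
                                                       {error, {Class, Reason}}
                                               end,
                                      Parent ! {Ref, Result}
                              end),
                   {Item, Ref}
               end || Item <- Batch],
    %% Collect replies in input order; workers may finish in any order.
    [receive {Ref, Result} -> {Item, Result} end || {Item, Ref} <- Pending].
```

For example, `shrink_sketch:pmap_batches(fun(Q) -> timer:sleep(100), Q end, [q1, q2, q3, q4], 2)` runs two batches of two workers each, so the simulated waits overlap within a batch instead of adding up across all four items. Failures are captured per item, so one failing queue in this sketch does not abort the rest of its batch.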
Discussed in #15057
This is an automatic backport of pull request #15081 done by Mergify.
This is an automatic backport of pull request #15765 done by Mergify.