Fix quorum queue status calculation to ignore non-voters #10394

Closed
wants to merge 1 commit into from

Conversation

@illotum illotum commented Jan 23, 2024

Proposed Changes

Fix: the quorum queue status calculation now ignores non-voters when computing cluster size.
Add: list voters over the HTTP API (via rabbit_quorum_queue:format/1).
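
As an illustration of the fix, here is a minimal, self-contained sketch of a status calculation that counts only voters toward the quorum. The module name, the member map shape (node mapped to voter or non_voter), and the status atoms are assumptions made for this example; this is not the code from this PR or the actual rabbit_quorum_queue implementation.

```erlang
-module(qq_status_sketch).
-export([voters/1, status/2]).

%% Hypothetical member map: Node => voter | non_voter.
%% This is an illustrative shape, not the Ra or RabbitMQ data structure.
-type members() :: #{node() => voter | non_voter}.

%% Keep only the nodes whose membership is `voter`, sorted for stable output.
-spec voters(members()) -> [node()].
voters(Members) ->
    lists:sort([Node || {Node, voter} <- maps:to_list(Members)]).

%% Derive a status from the online voters only; non-voters are ignored
%% both in the cluster size and in the online count.
-spec status(members(), [node()]) -> running | minority | down.
status(Members, Online) ->
    Voters = voters(Members),
    OnlineVoters = [V || V <- Voters, lists:member(V, Online)],
    MinQuorum = length(Voters) div 2 + 1,
    case length(OnlineVoters) of
        0 -> down;
        Count when Count < MinQuorum -> minority;
        _ -> running
    end.
```

With voters a, b, c and non-voters d, e, and only a, d, e online, counting every member would see 3 of 5 online, while counting voters sees 1 of 3, so this sketch returns minority.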

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Further Comments

  • I have mixed feelings about adding RPC for every queue in a potential list of hundreds, but couldn't think of an alternative.

@michaelklishin michaelklishin added this to the 3.13.0 milestone Jan 23, 2024
@michaelklishin michaelklishin changed the title fix quorum queue status calculation to ignore non-voters 3.13: fix quorum queue status calculation to ignore non-voters Jan 23, 2024
@illotum
Author

illotum commented Jan 24, 2024

Opening for review. This PR

  • Pulls newly minted ra 2.8.0
  • Bumps Elixir predicate to 1.17. I keep adding and reverting it on various branches, so it's probably time to commit. But LMK if you prefer a standalone PR!
  • Exposes voters in quorum queue format. I have mixed feelings about adding RPC for every queue in a potential list of hundreds, but couldn't think of an alternative.

@illotum illotum marked this pull request as ready for review January 24, 2024 04:23
@michaelklishin
Member

@illotum can we use ra_server_proc:local_state_query/3 directly in RabbitMQ and not bump Ra? We will bump it for 3.13, but until rabbitmq/ra#411 (comment) is addressed, we likely cannot adopt a new version immediately.

@illotum illotum mentioned this pull request Jan 24, 2024
11 tasks
@illotum
Author

illotum commented Jan 25, 2024

Reverted to direct state_query (and pruned everything unrelated).

Contributor

@kjnilsson kjnilsson left a comment

I think this change will make the /queues HTTP API slower when there are many queues. It will need testing but I can't approve as is.

@@ -1783,6 +1785,14 @@ get_nodes(Q) when ?is_amqqueue(Q) ->
    #{nodes := Nodes} = amqqueue:get_type_state(Q),
    Nodes.

get_voters(Q) when ?amqqueue_is_quorum(Q) ->
    Leader = amqqueue:get_pid(Q),
    case ra:voters(Leader) of

whitespace

@@ -1672,34 +1672,36 @@ online(Q) when ?is_amqqueue(Q) ->
is_pid(Pid)].

format(Q, Ctx) when ?is_amqqueue(Q) ->

This function is called for every queue when using the /queues HTTP API; depending on the sort columns, this could mean all queues in the system. Please compare HTTP API query times with and without this change with 5k quorum queues in a 3 node cluster.

I think we need to find another way of achieving this. Perhaps we need to persist the voting set in the queue record's queue type state.
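
One possible shape for this idea, as a hedged sketch only: keep the voter list in the queue type state map next to the existing nodes key, so that formatting a queue reads local data instead of making a call to the leader. The voters key, the module name, and both functions are hypothetical; only the nodes key appears in the diff above, and the fallback assumes that every listed node is a voter when no voter list has been stored.

```erlang
-module(qq_type_state_sketch).
-export([set_voters/2, get_voters/1]).

%% Store the voter list under a hypothetical `voters` key,
%% leaving all other keys of the type state untouched.
-spec set_voters(map(), [node()]) -> map().
set_voters(TypeState, Voters) when is_map(TypeState), is_list(Voters) ->
    TypeState#{voters => Voters}.

%% Read voters from the type state without contacting the leader.
%% Falls back to `nodes` for state written before the key existed
%% (assumption: in that case all nodes are treated as voters).
-spec get_voters(map()) -> [node()].
get_voters(#{voters := Voters}) -> Voters;
get_voters(#{nodes := Nodes}) -> Nodes.
```

The trade-off is that a stored voter set has to be updated on every membership change, otherwise the value returned here can go stale.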

@michaelklishin michaelklishin removed this from the 3.13.0 milestone Jan 26, 2024
@michaelklishin michaelklishin changed the title 3.13: fix quorum queue status calculation to ignore non-voters Fix quorum queue status calculation to ignore non-voters Jan 26, 2024
@michaelklishin
Member

Closing per discussion with @illotum @kjnilsson. Unlike #10304, this PR needs to be reworked from scratch.

Since this change does not break anything and is arguably a bug fix, we can ship it in a 3.13.x patch in case a revised version does not make it in time for 3.13.0.
