Explore: classic peer discovery with randomised startup delay #687

ansd · 2021-05-11T16:25:04Z

Relates #662.

Dynamic peer discovery is not needed for RabbitMQ Clusters deployed by the Cluster Operator. All nodes are known at deploy time. The Cluster Operator knows the number of replicas and their host names.

This PR uses classic peer discovery with a static list of nodes instead of the dynamic rabbitmq_peer_discovery_k8s plugin.
When the cluster is scaled out (i.e. more RabbitMQ nodes are added to the RabbitMQ cluster), existing nodes are not restarted.
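To make the static node list concrete, here is a minimal sketch of how the classic peer discovery section of `rabbitmq.conf` could be rendered from a replica count. The function name and the host naming scheme (`<name>-server-<i>` pods behind a `<name>-nodes` headless service) are assumptions for illustration, not taken from this PR's diff.

```python
from typing import List


def classic_config_lines(cluster_name: str, namespace: str, replicas: int) -> List[str]:
    """Render rabbitmq.conf lines for classic peer discovery with a static node list."""
    lines = [
        "cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config"
    ]
    for i in range(replicas):
        # Hypothetical naming scheme: pod "<name>-server-<i>" behind a headless
        # service "<name>-nodes". The operator's real names may differ.
        host = f"{cluster_name}-server-{i}.{cluster_name}-nodes.{namespace}"
        # Node keys are numbered from 1 here, in the style of the RabbitMQ docs examples.
        lines.append(f"cluster_formation.classic_config.nodes.{i + 1} = rabbit@{host}")
    return lines


if __name__ == "__main__":
    print("\n".join(classic_config_lines("my-rabbit", "default", 3)))
```

Because every node receives the full list up front, a node can attempt to join any listed peer as soon as that peer is running.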

Pros:

Nodes can join other nodes that are running but not yet ready, which increases the likelihood of discovering peers.
No sophisticated locking mechanism is needed.

Cons:

There might still be cases where clusters do not get formed correctly since this approach still relies on randomised startup delays. @mkuratczyk and I are doing some more testing with different parameters.
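The remaining risk can be illustrated with a small simulation. This is a simplified model and not how RabbitMQ decides anything: each node draws a uniform startup delay, and a "race" is counted when the second-earliest node starts within `window` seconds of the earliest, i.e. before the first node could be joined. The function name, the uniform distribution, and the `window` parameter are all assumptions.

```python
import random


def race_rate(nodes: int, max_delay: float, window: float,
              trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how often the two earliest nodes start within `window` seconds."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    races = 0
    for _ in range(trials):
        delays = sorted(rng.uniform(0.0, max_delay) for _ in range(nodes))
        # Assumed failure condition: the second node boots before the first
        # one has been up for `window` seconds, so both might seed a cluster.
        if delays[1] - delays[0] < window:
            races += 1
    return races / trials


if __name__ == "__main__":
    for max_delay in (10.0, 30.0, 60.0):
        print(max_delay, race_rate(nodes=3, max_delay=max_delay, window=2.0))
```

In this model a wider delay range lowers the race rate but never brings it to zero, which is why the alternatives below avoid relying on randomness alone.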

Alternatives:

Enforce a dedicated node to form the cluster: see "Explore: classic peer discovery without randomised startup delay" #689
Use a lock in rabbitmq_peer_discovery_k8s (e.g. as done in controller-runtime leader election)

Use classic peer discovery instead of the rabbit_peer_discovery_k8s plugin. For RabbitMQ clusters deployed by the RabbitMQ Cluster Operator, there is no need for dynamic service discovery since cluster members are known at deploy time. Using the classic config increases the likelihood of nodes discovering peers. In contrast, K8s peer discovery only considers peers that are ready, and becoming ready can take a long time, which can result in more than one node starting a cluster.

If the RabbitMQ cluster is under heavy load and is being scaled out (i.e. more RabbitMQ nodes are added to the RabbitMQ cluster), existing nodes shouldn't be restarted. Before this commit, existing nodes were restarted because the ConfigMap was updated to include the new peers for peer discovery. However, existing nodes do not need this new peer discovery configuration since the cluster is already formed.
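One way to express this rule is to compare the old and new configuration while ignoring the static peer list. The sketch below is an illustration under that assumption and not the operator's actual implementation; the helper name and the flat key/value representation of the config are hypothetical.

```python
from typing import Dict

# Keys that only list peers for classic peer discovery.
PEER_LIST_PREFIX = "cluster_formation.classic_config.nodes."


def needs_restart(old: Dict[str, str], new: Dict[str, str]) -> bool:
    """Return True only if the two configs differ outside the static peer list."""
    def without_peers(conf: Dict[str, str]) -> Dict[str, str]:
        return {k: v for k, v in conf.items() if not k.startswith(PEER_LIST_PREFIX)}

    return without_peers(old) != without_peers(new)


if __name__ == "__main__":
    old = {PEER_LIST_PREFIX + "1": "rabbit@node-0"}
    new = {**old, PEER_LIST_PREFIX + "2": "rabbit@node-1"}
    print(needs_restart(old, new))
```

With a check like this, a scale out that only appends peers leaves existing nodes untouched, while any other configuration change still triggers a restart.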

ansd · 2021-06-07T09:21:24Z

Closing this PR in favor of rabbitmq/rabbitmq-server#3075.

ansd changed the title ~~Use classic peer discovery~~ Explore: classic peer discovery with randomised startup delay May 12, 2021

ansd added 2 commits May 19, 2021 16:03

ansd force-pushed the peer-discovery-classic branch from 54a2b66 to ff33f41 May 19, 2021 14:07

ansd closed this Jun 7, 2021

ansd deleted the peer-discovery-classic branch June 8, 2021 07:59


2 participants
