Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 44 additions & 6 deletions docs/examples/production-ready/README.md
Original file line number Diff line number Diff line change
@@ -1,20 +1,58 @@
# Production Example

This is an example of a good starting point for a production RabbitMQ deployment. It deploys a 3-node cluster with enough resources to handle reasonable traffic.
This is an example of a good starting point for a production RabbitMQ deployment.
It deploys a 3-node cluster with sufficient resources to handle 1 billion messages per day at 8kB payload and a replication factor of three.
The rest of the workload details are outlined in the monthly cost savings calculator on https://rabbitmq.com/tanzu

Please keep in mind that:

1. It may not be suitable for YOUR production deployment. Please go through the [Production Checklist](https://www.rabbitmq.com/production-checklist.html) to learn more about production deployment considerations.
1. It may not be suitable for **your** production deployment.
The official [RabbitMQ Production Checklist](https://www.rabbitmq.com/production-checklist.html) will help you with some of these considerations.

2. While it is important to correctly deploy RabbitMQ cluster for production deployment, it is even more important to correctly use RabbitMQ from your applications. [Production Checklist](https://www.rabbitmq.com/production-checklist.html) covers some of the common issues such as connection churn and polling consumers. Please also consider using [Quorum Queues](https://www.rabbitmq.com/quorum-queues.html) since they provide better data safety.
2. While it is important to correctly deploy RabbitMQ cluster for production workloads, it is equally important for your applications to use RabbitMQ correctly.
[Production Checklist](https://www.rabbitmq.com/production-checklist.html) covers some of the common issues such as connection churn and polling consumers.
This example was tested with [Quorum Queues](https://www.rabbitmq.com/quorum-queues.html) which provide excellent data safety for workloads that require message replication.

You can deploy this example like this:
Before you can deploy this RabbitMQ cluster, you will need a multi-zone Kubernetes cluster with at least 3 nodes, 12 CPUs, 30Gi RAM and 1.5Ti disk space available.
A `storageClass` named `ssd` will need to be defined too.
We have [a GKE-specific example](ssd-gke.yaml) included in this example.
Read more about the expected disk performance [in Google Cloud Documentation](https://cloud.google.com/compute/docs/disks/performance#ssd_persistent_disk).
For what it's worth, disk write throughput is the limiting factor for persistent messages with a payload of 8kB.

To deploy this RabbitMQ cluster, run the following:

```shell
kubectl apply -f rabbitmq.yaml
kubectl apply -f pod-disruption-budget.yaml
```

Please keep in mind that you need a multi-zone Kubernetes cluster with 3 nodes, 12 CPUs, 30Gi RAM, 1.5Ti disk space available as well as a `storageClass` called `ssd` to deploy this example as-is. Of course you can adjust these values to your environment if needed.
## Q & A

### Is 4 CPUs per RabbitMQ node the minimum?

No. The absolute minimum is 2 CPUs.

For our workload - 1 billion messages per day at 8kB payload and a replication factor of three - 4 CPUs is the minimum.

### Will RabbitMQ work with 1 CPU?

Yes. It will work, but poorly, which is why we cannot recommend it for production workloads.
A RabbitMQ with less than 2 full CPUs cannot be considered production.


### Can I assign less than 1 CPU to RabbitMQ?

Yes, this is entirely possible within Kubernetes.
Be prepared for unresponsiveness that cannot be explained.
The kernel will work against RabbitMQ's runtime optimisations, and anything can happen.
A RabbitMQ with less than 2 full CPUs cannot be considered production.

### Does CPU clock speed matter for message throughput?

Yes. Queues are single threaded, and CPUs with higher clock speeds can run more cycles, which means that the queue process can perform more operations per second.
This will not the case when disks or network are the limiting factor, but in benchmarks with sufficient network and disk capacity, faster CPUs translate to higher message throughhput.

### Are vCPUs (virtual CPUs) OK?

An SSD storage class can be defined using [the example](ssd-gke.yaml) (which is GKE-specific and needs to be adjusted for other environments). Read more about the expected disk performance [in Google Cloud Documentation](https://cloud.google.com/compute/docs/disks/performance#ssd_persistent_disk).
Yes. The workload that was used for this production configuration starting point ran on Google Cloud and used 2 real CPU cores with 2 hyper-threads each, meaning 4 vCPUs.
While we would recommend real CPUs and no hyper-threading, we also operate in the cloud and default to using vCPUs, including for our benchmarks.