From a68d83a96681498cfb5b3fe234dc299169f372c8 Mon Sep 17 00:00:00 2001 From: JakeSCahill Date: Mon, 30 Jun 2025 08:48:11 +0100 Subject: [PATCH 1/5] Ensure that users do not enable auto-upgrades in K8s guides --- .../self-hosted/kubernetes/aks-guide.adoc | 19 ++++++++++--------- .../self-hosted/kubernetes/eks-guide.adoc | 18 +++++++++++------- .../self-hosted/kubernetes/gke-guide.adoc | 18 ++++++++++-------- modules/deploy/partials/requirements.adoc | 16 ++++------------ 4 files changed, 35 insertions(+), 36 deletions(-) diff --git a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc index ee8876a3cc..1bf6956159 100644 --- a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc +++ b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc @@ -11,18 +11,16 @@ Deploy a secure Redpanda cluster and Redpanda Console in Azure Kubernetes Servic == Prerequisites -Before you begin, you must have the following: - -* You must satisfy the prerequisites listed in the https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-cli#prerequisites[AKS quickstart^] +* Satisfy the prerequisites listed in the https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-cli#prerequisites[AKS quickstart^] to get access to the Azure CLI. -* https://kubernetes.io/docs/tasks/tools/[`kubectl`^]. Minimum required Kubernetes version: {supported-kubernetes-version}. +* Install https://kubernetes.io/docs/tasks/tools/[`kubectl`^]. Minimum required Kubernetes version: {supported-kubernetes-version}. + [,bash] ---- kubectl version --short --client ---- -* https://helm.sh/docs/intro/install/[Helm^]. Minimum required Helm version: {supported-helm-version} +* Install https://helm.sh/docs/intro/install/[Helm^]. 
Minimum required Helm version: {supported-helm-version} + [,bash] ---- @@ -38,8 +36,6 @@ In this step, you create an AKS cluster with three nodes on https://learn.micros - 2 cores per worker node, which is a requirement for production. - Local NVMe disks, which is recommended for best performance. -NOTE: The Helm chart configures default `podAntiAffinity` rules to make sure that only one Pod running a Redpanda broker is scheduled on each worker node. To learn why, see xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#number-of-workers[Number of workers]. - . Create a resource group for Redpanda: + [,bash] @@ -56,10 +52,15 @@ az aks create -g redpandaResourceGroup -n \ --generate-ssh-keys \ --enable-node-public-ip \ --node-vm-size Standard_L8s_v3 \ - --disable-file-driver + --disable-file-driver \ + --node-os-upgrade-channel None <1> ---- + -TIP: For all available options, see the https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-create[AKS documentation^]. +<1> Set the https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-os-image[OS upgrade channel^] to `None` to prevent AKS from automatically rebooting or upgrading nodes. ++ +For more details, see the xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#node-update[requirements and recommendations] for deploying Redpanda in Kubernetes. + +For all available options, see the https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-create[AKS documentation^]. 
include::deploy:partial$kubernetes/guides/create-storageclass.adoc[leveloffset=+2] diff --git a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/eks-guide.adoc b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/eks-guide.adoc index 1c20b54cec..14cd57aefb 100644 --- a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/eks-guide.adoc +++ b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/eks-guide.adoc @@ -13,7 +13,7 @@ Then, use `rpk` both as an internal client and an external client to interact wi == Prerequisites -Before you begin, you must have the following prerequisites. +Before you begin, you must meet the following prerequisites. === IAM user @@ -252,8 +252,6 @@ In this step, you create an EKS cluster with three nodes on https://aws.amazon.c - 2 cores per worker node, which is a requirement for production. - Local NVMe disks, which is recommended for best performance. -NOTE: The Helm chart configures default `podAntiAffinity` rules to make sure that only one Pod running a Redpanda broker is scheduled on each worker node. To learn why, see xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#number-of-workers[Number of workers]. - . Create an EKS cluster and give it a unique name. If your account is configured with OIDC, add the `--with-oidc` flag to the `create cluster` command. + [,bash,lines=4-6] @@ -266,16 +264,22 @@ eksctl create cluster \ --external-dns-access ---- + -[TIP] +[IMPORTANT] ==== -To see all options: +Do not enable https://docs.aws.amazon.com/eks/latest/userguide/automode.html[auto mode^] (`--enable-auto-mode`) on Amazon EKS clusters running Redpanda. + +Auto mode can trigger automatic reboots or node upgrades that disrupt Redpanda brokers, risking data loss or cluster instability. Redpanda requires manual control over node lifecycle events. 
+For more details, see the xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#node-updates[requirements and recommendations] for deploying Redpanda in Kubernetes. +==== ++ +To see all options: ++ ```bash eksctl create cluster --help ``` - ++ Or, for help creating an EKS cluster, see the https://eksctl.io/usage/creating-and-managing-clusters/[Creating and managing clusters^] in the `eksctl` documentation. -==== . Make sure that your local `kubeconfig` file points to your EKS cluster: + diff --git a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc index 9f9cfa460e..7298c2de33 100644 --- a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc +++ b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc @@ -11,17 +11,15 @@ Deploy a secure Redpanda cluster and Redpanda Console in Google Kubernetes Engin == Prerequisites -Before you begin, you must have the following: - * Complete the 'Before you begin' steps and the 'Launch Cloud Shell' steps of the https://cloud.google.com/kubernetes-engine/docs/deploy-app-cluster#before-you-begin[GKE quickstart^]. Cloud Shell comes preinstalled with the Google Cloud CLI, the `kubectl` command-line tool, and the Helm package manager. -* https://kubernetes.io/docs/tasks/tools/[`kubectl`^]. Minimum required Kubernetes version: {supported-kubernetes-version}. +* Ensure https://kubernetes.io/docs/tasks/tools/[`kubectl`^] is installed. Minimum required Kubernetes version: {supported-kubernetes-version}. + [,bash] ---- kubectl version --short --client ---- -* https://helm.sh/docs/intro/install/[Helm^]. Minimum required Helm version: {supported-helm-version} +* Ensure https://helm.sh/docs/intro/install/[Helm^] is installed. 
Minimum required Helm version: {supported-helm-version} + [,bash] ---- @@ -37,8 +35,6 @@ In this step, you create a GKE cluster with three nodes on https://cloud.google. - 2 cores per worker node, which is a requirement for production. - Local NVMe disks, which is recommended for best performance. -NOTE: The Helm chart configures default `podAntiAffinity` rules to make sure that only one Pod running a Redpanda broker is scheduled on each worker node. To learn why, see xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#number-of-workers[Number of workers]. - Create a GKE cluster. Replace the `` placeholder with your own region. [,bash] @@ -50,12 +46,18 @@ gcloud container clusters create \ --region= ---- -[TIP] +[IMPORTANT] ==== +Do not enable https://docs.aws.amazon.com/eks/latest/userguide/automode.html[node auto-upgrades^] (`--enable-autoupgrade`) on Google GKE clusters running Redpanda. + +Node auto-upgrades can trigger automatic reboots or node upgrades that disrupt Redpanda brokers, risking data loss or cluster instability. Redpanda requires manual control over node lifecycle events. + +For more details, see the xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#node-updates[requirements and recommendations] for deploying Redpanda in Kubernetes. +==== + To see all options that you can specify when creating a cluster, see the https://cloud.google.com/sdk/gcloud/reference/container/clusters/create[Cloud SDK reference^]. Or, for help creating a GKE cluster, see the https://cloud.google.com/kubernetes-engine/docs/deploy-app-cluster#create_cluster[GKE documentation^]. 
-==== include::deploy:partial$kubernetes/guides/create-storageclass.adoc[leveloffset=+2] diff --git a/modules/deploy/partials/requirements.adoc b/modules/deploy/partials/requirements.adoc index 7a87f27b72..9e9fc25b4a 100644 --- a/modules/deploy/partials/requirements.adoc +++ b/modules/deploy/partials/requirements.adoc @@ -52,25 +52,17 @@ ifndef::env-kubernetes[] endif::[] [[node-updates]] -== Node maintenance and operating system upgrades +== Prevent automatic node upgrades in managed Kubernetes clusters Ensure that node and operating system (OS) upgrades are manually managed when running Redpanda in production. Manual control avoids unplanned reboots or replacements that disrupt Redpanda brokers, causing service downtime, data loss, or quorum instability. -=== Limitations of automatic updates +Common issues with automatic node upgrades include: -Redpanda is stateful. Redpanda brokers manage partition data and leadership, making them sensitive to disruptions. Proper handling during maintenance is required to: - -- Avoid data loss, especially for nodes with ephemeral or local storage. -- Ensure smooth leadership transitions by decommissioning brokers before removing a node. -- Minimize service downtime by upgrading nodes one at a time during planned maintenance windows. - -However, automatic update mechanisms provided by cloud platforms may not meet Redpanda's stateful requirements. Common issues include: - -- Hard timeouts for graceful shutdowns that may not allow Redpanda brokers enough time to complete decommissioning or leadership transitions. +- Hard timeouts for graceful shutdowns that do not allow Redpanda brokers enough time to complete decommissioning or leadership transitions. - Replacements or reboots without ensuring data has been safely migrated or replicated, risking data loss. - Parallel upgrades across multiple nodes, which can disrupt quorum or reduce cluster availability. 
-*Recommendations*: +*Requirements*: - Disable automatic node maintenance or upgrades. ifdef::env-kubernetes[] From aae036fd7b0d062326eada5b12cc4c7868d11f4d Mon Sep 17 00:00:00 2001 From: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com> Date: Mon, 30 Jun 2025 08:54:52 +0100 Subject: [PATCH 2/5] Apply suggestions from code review --- modules/deploy/partials/requirements.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/deploy/partials/requirements.adoc b/modules/deploy/partials/requirements.adoc index 9e9fc25b4a..318f1276dc 100644 --- a/modules/deploy/partials/requirements.adoc +++ b/modules/deploy/partials/requirements.adoc @@ -52,7 +52,7 @@ ifndef::env-kubernetes[] endif::[] [[node-updates]] -== Prevent automatic node upgrades in managed Kubernetes clusters +== Prevent automatic node upgrades Ensure that node and operating system (OS) upgrades are manually managed when running Redpanda in production. Manual control avoids unplanned reboots or replacements that disrupt Redpanda brokers, causing service downtime, data loss, or quorum instability. 
From d83628eb1c27831a0ae266d416f1b606d2ad60de Mon Sep 17 00:00:00 2001
From: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com>
Date: Mon, 30 Jun 2025 21:33:24 +0100
Subject: [PATCH 3/5] Update modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---
 .../deployment-option/self-hosted/kubernetes/gke-guide.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc
index 7298c2de33..f0133cc003 100644
--- a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc
+++ b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/gke-guide.adoc
@@ -48,7 +48,7 @@ gcloud container clusters create \
 
 [IMPORTANT]
 ====
-Do not enable https://docs.aws.amazon.com/eks/latest/userguide/automode.html[node auto-upgrades^] (`--enable-autoupgrade`) on Google GKE clusters running Redpanda.
+Do not enable https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades[node auto-upgrades^] (`--enable-autoupgrade`) on Google GKE clusters running Redpanda.
 
 Node auto-upgrades can trigger automatic reboots or node upgrades that disrupt Redpanda brokers, risking data loss or cluster instability. Redpanda requires manual control over node lifecycle events.
 
From 1271781817f0d59f20059209f1cbbf61a048d155 Mon Sep 17 00:00:00 2001
From: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com>
Date: Mon, 30 Jun 2025 21:33:58 +0100
Subject: [PATCH 4/5] Update modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---
 .../deployment-option/self-hosted/kubernetes/aks-guide.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc
index 1bf6956159..c0487790a5 100644
--- a/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc
+++ b/modules/deploy/pages/deployment-option/self-hosted/kubernetes/aks-guide.adoc
@@ -58,7 +58,7 @@ az aks create -g redpandaResourceGroup -n \
 +
 <1> Set the https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-os-image[OS upgrade channel^] to `None` to prevent AKS from automatically rebooting or upgrading nodes.
 +
-For more details, see the xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#node-update[requirements and recommendations] for deploying Redpanda in Kubernetes.
+For more details, see the xref:deploy:deployment-option/self-hosted/kubernetes/k-requirements.adoc#node-updates[requirements and recommendations] for deploying Redpanda in Kubernetes.
 
 For all available options, see the https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-create[AKS documentation^].
 
From 3e5de54194aa86d18e96e5ac41df1894755f85ea Mon Sep 17 00:00:00 2001 From: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com> Date: Tue, 1 Jul 2025 13:09:48 +0100 Subject: [PATCH 5/5] Update requirements.adoc --- modules/deploy/partials/requirements.adoc | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/modules/deploy/partials/requirements.adoc b/modules/deploy/partials/requirements.adoc index 318f1276dc..edaecacb01 100644 --- a/modules/deploy/partials/requirements.adoc +++ b/modules/deploy/partials/requirements.adoc @@ -67,12 +67,12 @@ Common issues with automatic node upgrades include: - Disable automatic node maintenance or upgrades. ifdef::env-kubernetes[] To prevent managed Kubernetes services from automatically rebooting or upgrading nodes: -** **Azure AKS**: Set the OS upgrade channel to `None`. https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-os-image[Azure Documentation^]. -** **Google GKE**: Disable GKE auto-upgrades for node pools. https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades[GCP Documentation^]. -** **Amazon EKS**: Avoid enabling EKS node auto-upgrades. https://docs.aws.amazon.com/eks/latest/userguide/worker.html[AWS Documentation^]. -- xref:upgrade:k-upgrade-kubernetes.adoc[Manually manage node upgrades]. -endif::[] +** **Azure AKS**: https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-os-image[Set the OS upgrade channel to `None`^]. +** **Google GKE**: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades[Disable GKE auto-upgrades for node pools^]. +** **Amazon EKS**: https://docs.aws.amazon.com/eks/latest/userguide/automode.html[Disable EKS node auto-upgrades^]. +See also: xref:upgrade:k-upgrade-kubernetes.adoc[How to manually manage node upgrades]. +endif::[] == CPU and memory