Add 1.27 feature blog article about StatefulSet Start Ordinals (KEP-3335) #37418
Merged

Commits (10):

* 8555b53: Add blog post for StatefulSet Migration using StatefulSetStartOrdinal (pwschuurman)
* bd45ab5: Update blog post headings and add What's Next section (pwschuurman)
* 0043f19: Add a note about copying PV/PVC from source to destination cluster (pwschuurman)
* bd610ae: Review updates for StatefulSet StartOrdinal blog post (pwschuurman)
* 76dae78: Remove MCS references from StatefulSet start ordinal blog post (pwschuurman)
* 8ca5a5d: Minor edits to StatefulSet start ordinal blog post (pwschuurman)
* 13f1c8a: Update formatting and wording for StatefulSet Migration Redis demo (pwschuurman)
* 22101af: Update StatefulSetStartOrdinal blog post for beta v1.27 (pwschuurman)
* 4f223e0: Add publish date for StatefulSet Migration blog (pwschuurman)
* f71a862: Update title to reflect k8s 1.27 (pwschuurman)
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
File changed: `content/en/blog/_posts/2023-04-28-statefulset-migration.md` (223 additions, 0 deletions)
---
layout: blog
title: "Kubernetes 1.27: StatefulSet Start Ordinal Simplifies Migration"
date: 2023-04-28
slug: statefulset-start-ordinal
---

**Author**: Peter Schuurman (Google)

Kubernetes v1.26 introduced a new, alpha-level feature for
[StatefulSets](/docs/concepts/workloads/controllers/statefulset/) that controls
the ordinal numbering of Pod replicas. As of Kubernetes v1.27, this feature is
now beta. Ordinals can start from arbitrary non-negative numbers. This blog
post discusses how this feature can be used.
## Background

StatefulSet ordinals provide sequential identities for Pod replicas. When using
[`OrderedReady` Pod management](/docs/tutorials/stateful-application/basic-stateful-set/#orderedready-pod-management),
Pods are created from ordinal index `0` up to `N-1`.

With Kubernetes today, orchestrating a StatefulSet migration across clusters is
challenging. Backup and restore solutions exist, but these require the
application to be scaled down to zero replicas prior to migration. In today's
fully connected world, even planned application downtime may not allow you to
meet your business goals. You could use
[Cascading Delete](/docs/tutorials/stateful-application/basic-stateful-set/#cascading-delete)
or
[On Delete](/docs/tutorials/stateful-application/basic-stateful-set/#on-delete)
to migrate individual Pods, however this is error prone and tedious to manage.
You lose the self-healing benefit of the StatefulSet controller when your Pods
fail or are evicted.
Kubernetes v1.26 enables a StatefulSet to be responsible for a range of
ordinals within a range {0..N-1} (the ordinals 0, 1, ... up to N-1). With it,
you can scale down a range {0..k-1} in a source cluster, and scale up the
complementary range {k..N-1} in a destination cluster, while maintaining
application availability. This enables you to retain *at most one* semantics
(meaning there is at most one Pod with a given identity running in a
StatefulSet) and
[Rolling Update](/docs/tutorials/stateful-application/basic-stateful-set/#rolling-update)
behavior when orchestrating a migration across clusters.
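To make the split concrete, here is an illustrative sketch. The StatefulSet name `web` and the values N=6 and k=3 are hypothetical, chosen only to show which ordinals each cluster owns mid-migration:

```shell
# Illustrative only: a 6-replica StatefulSet named "web", mid-migration with
# k=3 replicas already moved. The source cluster keeps ordinals 0..k-1 and
# the destination cluster runs the complementary range k..N-1.
N=6
k=3
for i in $(seq 0 $((k - 1))); do echo "source: web-$i"; done
for i in $(seq "$k" $((N - 1))); do echo "destination: web-$i"; done
```

This prints `source: web-0` through `source: web-2`, then `destination: web-3` through `destination: web-5`; no ordinal appears in both clusters, which is the *at most one* property described above.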
## Why would I want to use this feature?

Say you're running your StatefulSet in one cluster, and need to migrate it out
to a different cluster. There are many reasons why you would need to do this:

* **Scalability**: Your StatefulSet has scaled too large for your cluster, and
  has started to disrupt the quality of service for other workloads in your
  cluster.
* **Isolation**: You're running a StatefulSet in a cluster that is accessed
  by multiple users, and namespace isolation isn't sufficient.
* **Cluster Configuration**: You want to move your StatefulSet to a different
  cluster, to use some environment that is not available on your current
  cluster.
* **Control Plane Upgrades**: You want to move your StatefulSet to a cluster
  running an upgraded control plane, and can't handle the risk or downtime of
  in-place control plane upgrades.
## How do I use it?

Enable the `StatefulSetStartOrdinal` feature gate on a cluster, and create a
StatefulSet with a customized `.spec.ordinals.start`.
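For example, a minimal manifest sketch (the name `web` and the container image are illustrative, not part of the demo below) that creates Pods `web-5`, `web-6`, and `web-7`:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web            # hypothetical name, for illustration
spec:
  serviceName: web
  replicas: 3
  ordinals:
    start: 5           # Pods are numbered web-5, web-6, web-7
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.8
```

With `.spec.ordinals.start: 5` and three replicas, the controller manages ordinals 5 through 7 instead of the default 0 through 2.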
## Try it out

In this demo, I'll use the new mechanism to migrate a StatefulSet from one
Kubernetes cluster to another. The
[redis-cluster](https://github.com/bitnami/charts/tree/main/bitnami/redis-cluster)
Bitnami Helm chart will be used to install Redis.

Tools required:

* [yq](https://github.com/mikefarah/yq)
* [helm](https://helm.sh/docs/helm/helm_install/)
### Pre-requisites {#demo-pre-requisites}

To do this, I need two Kubernetes clusters that can both access common
networking and storage; I've named my clusters `source` and `destination`.
Specifically, I need:

* The `StatefulSetStartOrdinal` feature gate enabled on both clusters.
* Client configuration for `kubectl` that lets me access both clusters as an
  administrator.
* The same `StorageClass` installed on both clusters, and set as the default
  StorageClass for both clusters. This `StorageClass` should provision
  underlying storage that is accessible from either or both clusters.
* A flat network topology that allows Pods to send and receive packets to and
  from Pods in either cluster. If you are creating clusters on a cloud
  provider, this configuration may be called private cloud or private network.
1. Create a demo namespace on both clusters:

   ```shell
   kubectl create ns kep-3335
   ```

2. Deploy a Redis cluster with six replicas in the source cluster:

   ```shell
   helm repo add bitnami https://charts.bitnami.com/bitnami
   helm install redis --namespace kep-3335 \
     bitnami/redis-cluster \
     --set persistence.size=1Gi \
     --set cluster.nodes=6
   ```

3. Check the replication status in the source cluster:

   ```shell
   kubectl exec -it redis-redis-cluster-0 -- /bin/bash -c \
     "redis-cli -c -h redis-redis-cluster -a $(kubectl get secret redis-redis-cluster -o jsonpath="{.data.redis-password}" | base64 -d) CLUSTER NODES;"
   ```

   ```
   2ce30362c188aabc06f3eee5d92892d95b1da5c3 10.104.0.14:6379@16379 myself,master - 0 1669764411000 3 connected 10923-16383
   7743661f60b6b17b5c71d083260419588b4f2451 10.104.0.16:6379@16379 slave 2ce30362c188aabc06f3eee5d92892d95b1da5c3 0 1669764410000 3 connected
   961f35e37c4eea507cfe12f96e3bfd694b9c21d4 10.104.0.18:6379@16379 slave a8765caed08f3e185cef22bd09edf409dc2bcc61 0 1669764411000 1 connected
   7136e37d8864db983f334b85d2b094be47c830e5 10.104.0.15:6379@16379 slave 2cff613d763b22c180cd40668da8e452edef3fc8 0 1669764412595 2 connected
   a8765caed08f3e185cef22bd09edf409dc2bcc61 10.104.0.19:6379@16379 master - 0 1669764411592 1 connected 0-5460
   2cff613d763b22c180cd40668da8e452edef3fc8 10.104.0.17:6379@16379 master - 0 1669764410000 2 connected 5461-10922
   ```

4. Deploy a Redis cluster with zero replicas in the destination cluster:

   ```shell
   helm install redis --namespace kep-3335 \
     bitnami/redis-cluster \
     --set persistence.size=1Gi \
     --set cluster.nodes=0 \
     --set redis.extraEnvVars\[0\].name=REDIS_NODES,redis.extraEnvVars\[0\].value="redis-redis-cluster-headless.kep-3335.svc.cluster.local" \
     --set existingSecret=redis-redis-cluster
   ```

5. Scale down the `redis-redis-cluster` StatefulSet in the source cluster by 1,
   to remove the replica `redis-redis-cluster-5`:

   ```shell
   kubectl patch sts redis-redis-cluster -p '{"spec": {"replicas": 5}}'
   ```
6. Migrate dependencies from the source cluster to the destination cluster:

   The following commands copy resources from `source` to `destination`.
   Details that are not relevant in the `destination` cluster are removed
   (e.g. `uid`, `resourceVersion`, `status`).
   **Steps for the source cluster**

   Note: If using a `StorageClass` with `reclaimPolicy: Delete` configured, you
   should patch the PVs in `source` with `reclaimPolicy: Retain` prior to
   deletion, to retain the underlying storage used in `destination`. See
   [Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/)
   for more details.
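   As a sketch, that reclaim-policy patch could look like the following; the PV name is looked up from the PVC for the replica being migrated (this snippet requires a live cluster to run):

   ```shell
   # Look up the PV bound to the migrated replica's PVC, then mark it Retain
   # so deleting the PV object in "source" does not delete the backing disk.
   PV_NAME="$(kubectl get pvc redis-data-redis-redis-cluster-5 -o jsonpath='{.spec.volumeName}')"
   kubectl patch pv "$PV_NAME" -p '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'
   ```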
   ```shell
   kubectl get pvc redis-data-redis-redis-cluster-5 -o yaml | yq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.annotations, .metadata.finalizers, .status)' > /tmp/pvc-redis-data-redis-redis-cluster-5.yaml
   kubectl get pv $(yq '.spec.volumeName' /tmp/pvc-redis-data-redis-redis-cluster-5.yaml) -o yaml | yq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.annotations, .metadata.finalizers, .spec.claimRef, .status)' > /tmp/pv-redis-data-redis-redis-cluster-5.yaml
   kubectl get secret redis-redis-cluster -o yaml | yq 'del(.metadata.uid, .metadata.resourceVersion)' > /tmp/secret-redis-redis-cluster.yaml
   ```
   **Steps for the destination cluster**

   Note: For the PV/PVC, this procedure only works if the underlying storage
   system that your PVs use can support being copied into `destination`.
   Storage that is associated with a specific node or topology may not be
   supported. Additionally, some storage systems may store additional metadata
   about volumes outside of a PV object, and may require a more specialized
   sequence to import a volume.
   ```shell
   kubectl create -f /tmp/pv-redis-data-redis-redis-cluster-5.yaml
   kubectl create -f /tmp/pvc-redis-data-redis-redis-cluster-5.yaml
   kubectl create -f /tmp/secret-redis-redis-cluster.yaml
   ```

7. Scale up the `redis-redis-cluster` StatefulSet in the destination cluster by
   1, with a start ordinal of 5:

   ```shell
   kubectl patch sts redis-redis-cluster -p '{"spec": {"ordinals": {"start": 5}, "replicas": 1}}'
   ```

8. Check the replication status in the destination cluster:

   ```shell
   kubectl exec -it redis-redis-cluster-5 -- /bin/bash -c \
     "redis-cli -c -h redis-redis-cluster -a $(kubectl get secret redis-redis-cluster -o jsonpath="{.data.redis-password}" | base64 -d) CLUSTER NODES;"
   ```

   I should see that the new replica (labeled `myself`) has joined the Redis
   cluster (the IP address belongs to a different CIDR block than the replicas
   in the source cluster).

   ```
   2cff613d763b22c180cd40668da8e452edef3fc8 10.104.0.17:6379@16379 master - 0 1669766684000 2 connected 5461-10922
   7136e37d8864db983f334b85d2b094be47c830e5 10.108.0.22:6379@16379 myself,slave 2cff613d763b22c180cd40668da8e452edef3fc8 0 1669766685609 2 connected
   2ce30362c188aabc06f3eee5d92892d95b1da5c3 10.104.0.14:6379@16379 master - 0 1669766684000 3 connected 10923-16383
   961f35e37c4eea507cfe12f96e3bfd694b9c21d4 10.104.0.18:6379@16379 slave a8765caed08f3e185cef22bd09edf409dc2bcc61 0 1669766683600 1 connected
   a8765caed08f3e185cef22bd09edf409dc2bcc61 10.104.0.19:6379@16379 master - 0 1669766685000 1 connected 0-5460
   7743661f60b6b17b5c71d083260419588b4f2451 10.104.0.16:6379@16379 slave 2ce30362c188aabc06f3eee5d92892d95b1da5c3 0 1669766686613 3 connected
   ```

9. Repeat steps #5 to #7 for the remainder of the replicas, until the Redis
   StatefulSet in the source cluster is scaled to 0, and the Redis StatefulSet
   in the destination cluster is healthy with 6 total replicas.
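The repetition in step #9 could be scripted. The sketch below is one hedged way to do it; it assumes (hypothetically) that your `kubectl` contexts are named `source` and `destination`, and it elides the per-replica copy commands from step #6, which still need to run between the two patches:

```shell
# Sketch only: walk ordinals 4 down to 0, shrinking the source StatefulSet
# and growing the destination one, so at most one Pod per identity exists.
for ordinal in 4 3 2 1 0; do
  # Scale the source down, removing redis-redis-cluster-$ordinal (step #5).
  kubectl --context source -n kep-3335 patch sts redis-redis-cluster \
    -p "{\"spec\": {\"replicas\": $ordinal}}"
  # ... copy the PV, PVC and Secret for redis-redis-cluster-$ordinal
  #     from source to destination, as in step #6 ...
  # Lower the start ordinal and grow the destination by one (step #7).
  kubectl --context destination -n kep-3335 patch sts redis-redis-cluster \
    -p "{\"spec\": {\"ordinals\": {\"start\": $ordinal}, \"replicas\": $((6 - ordinal))}}"
  # In practice, wait for redis-redis-cluster-$ordinal to become Ready
  # in the destination before moving on to the next ordinal.
done
```

When the loop finishes, the destination StatefulSet has `ordinals.start: 0` and `replicas: 6`, matching the end state described in step #9.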
## What's Next?

This feature provides a building block for a StatefulSet to be split up across
clusters, but does not prescribe the mechanism as to how the StatefulSet should
be migrated. Migration requires coordination of StatefulSet replicas, along
with orchestration of the storage and network layer. This is dependent on the
storage and connectivity requirements of the application installed by the
StatefulSet. Additionally, many StatefulSets are managed by
[operators](/docs/concepts/extend-kubernetes/operator/), which adds another
layer of complexity to migration.

If you're interested in building enhancements to make these processes easier,
get involved with
[SIG Multicluster](https://github.com/kubernetes/community/blob/master/sig-multicluster)
to contribute!