Commit 250c4bb

Feature blog for StatefulSet Autodelete beta graduation
1 parent 55efb17 commit 250c4bb

1 file changed

+102
-0
lines changed

@@ -0,0 +1,102 @@
---
layout: blog
title: 'Kubernetes 1.28: StatefulSet PVC Auto-Deletion (beta)'
date: XXX
slug: kubernetes-1-28-statefulset-pvc-auto-deletion-beta
---

**Author:** Matthew Cary (Google)

Kubernetes v1.28 graduates to beta a new policy for
[StatefulSets](/docs/concepts/workloads/controllers/statefulset/) that controls the lifetime of
their [PersistentVolumeClaims](/docs/concepts/storage/persistent-volumes/) (PVCs). The new PVC
retention policy lets users specify whether the PVCs generated from the StatefulSet spec template
should be automatically deleted or retained when the StatefulSet is deleted or replicas in the
StatefulSet are scaled down.

## What problem does this solve?
A StatefulSet spec can include Pod and PVC templates. When a replica is first created, the
Kubernetes control plane creates a PVC for that replica if one does not already exist. Before the
PVC retention policy existed, the control plane never cleaned up the PVCs created for
StatefulSets. Cleanup was left to the cluster administrator, or to some add-on automation that
you’d have to find, vet, and deploy. The common pattern for managing PVCs, either
manually or through tools such as Helm, is that the PVCs are tracked by the tool that manages them,
with an explicit lifecycle. Workflows that use StatefulSets must determine on their own which PVCs
are created by a StatefulSet and what their lifecycle should be.

Before this new feature, when a StatefulSet-managed replica disappeared, either because the
StatefulSet reduced its replica count or because the StatefulSet was deleted, the PVC and its
backing volume remained and had to be deleted manually. While this behavior is appropriate when the
data is critical, in many cases the persistent data in these PVCs is either temporary or can be
reconstructed from another source. In those cases, PVCs and their backing volumes that remain after
their StatefulSet or replicas have been deleted are unnecessary, incur cost, and require manual
cleanup.

## The new StatefulSet PVC retention policy

The new StatefulSet PVC retention policy controls if and when PVCs created from a
StatefulSet’s `volumeClaimTemplates` are deleted. There are two contexts in which this may occur.

The first context is when the StatefulSet resource is deleted (which implies that all replicas are
also deleted). This is controlled by the `whenDeleted` policy. The second context, controlled by
`whenScaled`, is when the StatefulSet is scaled down, which removes some but not all of the replicas
in a StatefulSet. In both cases the policy can be either `Retain`, where the corresponding PVCs are
not touched, or `Delete`, where the PVCs are deleted. The deletion is done with a normal
[object deletion](/docs/concepts/architecture/garbage-collection/), so that, for example, all
retention policies for the underlying PV are respected.

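As a minimal sketch of where these two fields live, the Python snippet below builds a StatefulSet
manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The policy
sits under `spec.persistentVolumeClaimRetentionPolicy`. The names `web` and `www`, the image, and
the storage size are placeholders chosen for illustration, not values from this feature's
documentation.

```python
import json


def statefulset_manifest(when_deleted="Retain", when_scaled="Retain"):
    """Build an apps/v1 StatefulSet manifest with a PVC retention policy.

    All names, the image, and the storage size are hypothetical placeholders.
    """
    for value in (when_deleted, when_scaled):
        if value not in ("Retain", "Delete"):
            raise ValueError(f"policy must be 'Retain' or 'Delete', got {value!r}")

    labels = {"app": "web"}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "web"},
        "spec": {
            "serviceName": "web",
            "replicas": 3,
            "selector": {"matchLabels": labels},
            # The retention policy described above: one field per context.
            "persistentVolumeClaimRetentionPolicy": {
                "whenDeleted": when_deleted,
                "whenScaled": when_scaled,
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "web",
                        "image": "nginx",  # placeholder image
                        "volumeMounts": [{"name": "www", "mountPath": "/data"}],
                    }],
                },
            },
            # One PVC per replica is created from each template listed here.
            "volumeClaimTemplates": [{
                "metadata": {"name": "www"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "1Gi"}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Delete PVCs when the StatefulSet is deleted, keep them on scale-down.
    print(json.dumps(statefulset_manifest(when_deleted="Delete"), indent=2))
```

Leaving both arguments at their defaults produces the `Retain`/`Retain` policy, which matches the
behavior of a StatefulSet that does not set the field at all.
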
These two policies form a matrix with four cases. I’ll walk through each one and give an example.

* **`whenDeleted` and `whenScaled` are both `Retain`.** This matches the existing behavior for
  StatefulSets, where no PVCs are deleted. This is also the default retention policy. It’s
  appropriate to use when data on StatefulSet volumes may be irreplaceable and should only be
  deleted manually.

* **`whenDeleted` is `Delete` and `whenScaled` is `Retain`.** In this case, PVCs are deleted only when
  the entire StatefulSet is deleted. If the StatefulSet is scaled down, PVCs are not touched,
  meaning they are available to be reattached, with any data from the previous replica, if a
  scale-up occurs. This might be used for a temporary StatefulSet, such as in a CI instance or ETL
  pipeline, where the data on the StatefulSet is needed only during the lifetime of the
  StatefulSet, but while the task is running the data is not easily reconstructible. Any
  retained state is needed for replicas that scale down and then up.

* **`whenDeleted` and `whenScaled` are both `Delete`.** PVCs are deleted immediately when their
  replica is no longer needed. Note that this does not include the case where a Pod is deleted and a
  new version rescheduled, for example when a node is drained and Pods need to migrate elsewhere.
  The PVC is deleted only when the replica is no longer needed, as signified by a scale-down or
  StatefulSet deletion. This use case is for when data does not need to live beyond the life of its
  replica. Perhaps the data is easily reconstructible and the cost savings of deleting unused PVCs
  are more important than quick scale-up, or perhaps, when a new replica is created, any data
  from a previous replica is not usable and must be reconstructed anyway.

* **`whenDeleted` is `Retain` and `whenScaled` is `Delete`.** This is similar to the previous case,
  in that there is little benefit to keeping PVCs for fast reuse during scale-up. An example of a
  situation where you might use this is an Elasticsearch cluster. Typically you would scale that
  workload up and down to match demand, whilst ensuring a minimum number of replicas (for example:
  3). When scaling down, data is migrated away from removed replicas and there is no benefit to
  retaining those PVCs. However, it can be useful to bring the entire Elasticsearch cluster down
  temporarily for maintenance. If you need to take the Elasticsearch system offline, you can do
  this by temporarily deleting the StatefulSet, and then bringing the Elasticsearch cluster back
  by recreating the StatefulSet. The PVCs holding the Elasticsearch data will still exist and the
  new replicas will automatically use them.

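The four cases above can be summarized as a small decision function. The sketch below is a
simplified model for illustration, not the StatefulSet controller's implementation. It assumes
replica ordinals start at 0, follows the usual PVC naming of
`<claim template name>-<StatefulSet name>-<ordinal>`, and considers only the PVCs of replicas that
exist when the event happens. The event names, the function name, and the default names `www` and
`web` are hypothetical.

```python
def pvcs_deleted(event, when_deleted, when_scaled, replicas, new_replicas=0,
                 claim="www", statefulset="web"):
    """Return the PVC names this simplified model expects to be deleted.

    event: "scale-down", "delete", or "pod-rescheduled" (hypothetical labels).
    replicas: replica count before the event.
    new_replicas: replica count after a scale-down (ignored otherwise).
    """
    def names(ordinals):
        return [f"{claim}-{statefulset}-{i}" for i in ordinals]

    if event == "scale-down":
        # Only the replicas being removed (the highest ordinals) are affected.
        if when_scaled == "Delete":
            return names(range(new_replicas, replicas))
        return []
    if event == "delete":
        # Deleting the StatefulSet removes every replica.
        if when_deleted == "Delete":
            return names(range(replicas))
        return []
    if event == "pod-rescheduled":
        # A Pod that is deleted and recreated still needs its PVC,
        # so no policy deletes it.
        return []
    raise ValueError(f"unknown event: {event!r}")


if __name__ == "__main__":
    for when_deleted in ("Retain", "Delete"):
        for when_scaled in ("Retain", "Delete"):
            scaled = pvcs_deleted("scale-down", when_deleted, when_scaled, 5, 3)
            deleted = pvcs_deleted("delete", when_deleted, when_scaled, 3)
            print(f"whenDeleted={when_deleted} whenScaled={when_scaled}")
            print(f"  scale 5 -> 3 deletes: {scaled}")
            print(f"  delete (3 replicas) deletes: {deleted}")
```

In this model, scaling from 5 to 3 replicas with `whenScaled: Delete` removes `www-web-3` and
`www-web-4`, while the same scale-down with `whenScaled: Retain` removes nothing.
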
Visit the
[documentation](/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-policies) to
see all the details.

## What’s next?

Try it out! The `StatefulSetAutoDeletePVC` feature gate is beta and enabled by default on
clusters running version 1.28 or later. Create a StatefulSet using the new policy, test it out, and
tell us what you think!

I'm very curious to see whether the owner reference mechanism that implements this feature works
well in practice. For example, I realized there is no mechanism in Kubernetes for knowing who set a
reference, so it’s possible that the StatefulSet controller may fight with custom controllers that
set their own references. Fortunately, maintaining the existing retention behavior does not involve
any new owner references, so the default behavior will be compatible.

Please tag any issues you report with the label `sig/apps` and assign them to Matthew Cary
([@mattcary](https://github.com/mattcary) on GitHub).

Enjoy!
