
Commit 7ba997e

Add more detail to alternatives section
1 parent b7ee691 commit 7ba997e


1 file changed, +55 -11 lines changed

keps/sig-multicluster/3335-statefulset-slice/README.md

@@ -128,14 +128,14 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [X] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
@@ -236,11 +236,12 @@ What is out of scope for this KEP? Listing non-goals helps to focus discussion
 and make progress.
 -->
 
-* Updating a PDB to safeguard more than one StatefulSet slice
+* Updating a PDB to safeguard more than one StatefulSet slice
 * As StatefulSet slices are scaled up or down, corresponding PDBs can also be adjusted. For example, a PDB corresponding to a slice of `k` replicas could be adjusted to `MinAvailable: k-1` on scale up or down events. Providing guidance and functionality to adjust these PDBs is outside the scope of this KEP.
-* Orchestrating pod movement from one StatefulSet slice to another
-* Managing network connectivity between pods in different StatefulSet slices
-* Orchestrating storage lifecycle of PVCs and PVs across different StatefulSet slices
+* Orchestrating pod movement from one StatefulSet slice to another
+* Managing network connectivity between pods in different StatefulSet slices
+* Orchestrating storage lifecycle of PVCs and PVs across different StatefulSet slices
+* Referenced PV/PVCs will need to be migrated in order for a new StatefulSet to reference data that was used by an existing StatefulSet. Orchestration complexity will depend on how volumes are used (RWO with `.spec.volumeClaimTemplates` on a StatefulSet, RWX with pod `.spec.volumes`).
 
 ## Proposal
 
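The PDB non-goal above names one concrete adjustment. A PDB guarding a slice of `k` replicas can be set to `MinAvailable: k-1` on scale events. The sketch below covers only that arithmetic. `PDBSpec`, `minAvailableForSlice`, and `onSliceScaled` are hypothetical names, and `PDBSpec` is a simplified stand-in for the real `policy/v1` PodDisruptionBudget. The KEP itself leaves this functionality out of scope.

```go
package main

import "fmt"

// PDBSpec is a hypothetical, simplified stand-in for the MinAvailable field
// of a policy/v1 PodDisruptionBudget. It is not the real Kubernetes type.
type PDBSpec struct {
	MinAvailable int
}

// minAvailableForSlice returns k-1 for a slice of k replicas, which tolerates
// one voluntary disruption. A slice with no replicas yields 0 rather than -1.
func minAvailableForSlice(k int) int {
	if k < 1 {
		return 0
	}
	return k - 1
}

// onSliceScaled recomputes the PDB after a scale-up or scale-down event.
func onSliceScaled(pdb PDBSpec, newReplicas int) PDBSpec {
	pdb.MinAvailable = minAvailableForSlice(newReplicas)
	return pdb
}

func main() {
	pdb := PDBSpec{MinAvailable: minAvailableForSlice(3)}
	fmt.Println(pdb.MinAvailable) // 2
	pdb = onSliceScaled(pdb, 5)   // slice scaled up to 5 replicas
	fmt.Println(pdb.MinAvailable) // 4
	pdb = onSliceScaled(pdb, 1)   // slice scaled down to 1 replica
	fmt.Println(pdb.MinAvailable) // 0
}
```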
@@ -940,9 +941,52 @@ not need to be as detailed as the proposal, but should include enough
 information to express the idea and why it was not acceptable.
 -->
 
-Users can orphan pods from a StatefulSet, migrate pods across a namespace or cluster, and create a new StatefulSet to manage pods upon migration. In the case of pod eviction or failure, pods will need to be manually restarted, requiring manual intervention and constant monitoring.
-
-Users can backup and restore a StatefulSet (and underlying storage) in a new namespace or cluster. Doing so requires the existing StatefulSet to be deleted, for underlying storage to be backed up and restored, resulting in downtime for the stateful application.
+### Alternative API changes
+
+**ReverseOrderedReady**: A new PodManagementPolicy called
+`ReverseOrderedReady` could be added. This would allow a StatefulSet to be
+started and actuated from the highest ordinal (the current default is from the
+lowest ordinal). For the cross-cluster migration use case, this would allow for
+a source StatefulSet to be scaled down and a target StatefulSet to be scaled up.
+The downside of this API is that `.spec.podManagementPolicy` is immutable.
+So if an orchestrator uses this behavior to scale up a StatefulSet in a
+destination cluster, and then wants to revert the PodManagementPolicy to the
+default, the StatefulSet would need to be deleted and re-created.
+
+**KEP-3521**: [KEP-3521](https://github.com/kubernetes/enhancements/issues/3521)
+proposes a Pod `.spec` level API that enables a pod to be paused at the initial
+scheduling phase of the pod lifecycle. This provides granular control over which
+pods should be started and running (active) and which pods shouldn't be
+scheduled (standby). An orchestrator can control the scheduling of specific
+pods without making changes to the StatefulSet controller, which remains
+responsible for creating pods.
+
+If the StatefulSet controller is using OrderedReady Pod Management, pausing
+scheduling leaves a pod not Ready. This will prevent the StatefulSet
+controller from actuating updates to higher ordinal pods (e.g., pod `m` will
+not be created while pod `n` is unhealthy, where `m` > `n`). This may increase
+orchestrator complexity by requiring the orchestrator of a migration to use
+Parallel Pod Management during the migration, and then to re-create the
+StatefulSet (deleting it with `--cascade=orphan`) to revert to `OrderedReady`
+if desired.
+
+Additionally, if modifying a StatefulSet template is undesired, a webhook must
+be introduced to mark Pods as paused when they are created. This adds a layer
+of complexity to an orchestrator operator, since it needs both an operator
+component capable of making changes through the API server and a webhook that
+reads from a consistent migration state.
+
+### Alternatives without any API changes
+
+**Orphan Pods**: Users can orphan pods from a StatefulSet, migrate pods across a
+namespace or cluster, and create a new StatefulSet to manage pods upon
+migration. In the case of pod eviction or failure, pods will need to be manually
+recreated, requiring manual intervention and constant monitoring.
+
+**Backup/Restore**: Users can back up and restore a StatefulSet (and underlying
+storage) in a new namespace or cluster. Doing so requires deleting the existing
+StatefulSet and backing up and restoring the underlying storage, resulting in
+downtime for the stateful application.
 
 ## Infrastructure Needed (Optional)
 
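The ReverseOrderedReady alternative in the diff above depends on the fact that a default StatefulSet scales down from its highest ordinal. If the target starts from the highest ordinal, the two can hand ordinals over one at a time. Below is a minimal simulation under one stated assumption: a hypothetical ReverseOrderedReady controller knows the total ordinal range `n` and runs the top `replicas` ordinals. `sourceOrdinals`, `reverseOrdinals`, and `migrationPlan` are illustrative names, not Kubernetes APIs. The sketch checks that, when one replica moves per step, every ordinal runs in exactly one cluster at all times.

```go
package main

import (
	"fmt"
	"sort"
)

// sourceOrdinals models the default ordering: replicas r occupy ordinals
// 0..r-1, so scaling down removes the highest ordinal first.
func sourceOrdinals(replicas int) []int {
	out := make([]int, 0, replicas)
	for i := 0; i < replicas; i++ {
		out = append(out, i)
	}
	return out
}

// reverseOrdinals models the HYPOTHETICAL ReverseOrderedReady policy, assuming
// the total ordinal range n is known: replicas r occupy ordinals n-r..n-1,
// created from the highest ordinal downward.
func reverseOrdinals(n, replicas int) []int {
	out := make([]int, 0, replicas)
	for i := n - 1; i >= n-replicas; i-- {
		out = append(out, i)
	}
	sort.Ints(out) // ascending order for readability
	return out
}

// migrationPlan moves one replica per step from source to target and reports
// whether every ordinal is owned by exactly one side at every step.
func migrationPlan(n int) (steps [][2][]int, ok bool) {
	ok = true
	for moved := 0; moved <= n; moved++ {
		src := sourceOrdinals(n - moved)
		dst := reverseOrdinals(n, moved)
		owners := make(map[int]int)
		for _, o := range src {
			owners[o]++
		}
		for _, o := range dst {
			owners[o]++
		}
		for o := 0; o < n; o++ {
			if owners[o] != 1 {
				ok = false
			}
		}
		steps = append(steps, [2][]int{src, dst})
	}
	return steps, ok
}

func main() {
	steps, ok := migrationPlan(3)
	for i, s := range steps {
		fmt.Printf("step %d: source=%v target=%v\n", i, s[0], s[1])
	}
	fmt.Println("each ordinal owned exactly once:", ok)
}
```

This check does not address the drawback raised in the diff. Once the target is created with this policy, reverting it to `OrderedReady` still requires deleting and re-creating the StatefulSet.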

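The KEP-3521 and OrderedReady paragraphs in the diff above interact in a way that is easy to miss. A gated pod is never scheduled, so it never becomes Ready, and OrderedReady then stops creating higher ordinals. The sketch below models that interaction under several assumptions:

- `Pod` is a simplified type whose `SchedulingGates` list loosely mirrors the `.spec.schedulingGates` field from KEP-3521.
- An ungated pod counts as Ready as soon as it is created.
- `reconcile`, `Policy`, and the gate name are hypothetical.

This is not the real StatefulSet controller.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified model. Any scheduling gate keeps the pod
// unscheduled, so it never turns Ready.
type Pod struct {
	Ordinal         int
	SchedulingGates []string
}

// Ready ASSUMES an ungated pod is scheduled and becomes Ready immediately.
func (p Pod) Ready() bool { return len(p.SchedulingGates) == 0 }

type Policy int

const (
	OrderedReady Policy = iota
	Parallel
)

// reconcile creates missing pods for ordinals 0..replicas-1. gatesFor stands
// in for whatever marks standby pods as paused at creation time. Under
// OrderedReady it stops at the first pod that is not Ready, so no pod with a
// higher ordinal is created.
func reconcile(policy Policy, replicas int, pods map[int]Pod, gatesFor func(int) []string) {
	for i := 0; i < replicas; i++ {
		p, exists := pods[i]
		if !exists {
			p = Pod{Ordinal: i, SchedulingGates: gatesFor(i)}
			pods[i] = p
		}
		if policy == OrderedReady && !p.Ready() {
			return
		}
	}
}

func main() {
	// Pod 1 is on standby (paused) in this cluster.
	standby := func(ordinal int) []string {
		if ordinal == 1 {
			return []string{"example.com/migration"} // hypothetical gate name
		}
		return nil
	}

	ordered := map[int]Pod{}
	reconcile(OrderedReady, 3, ordered, standby)
	fmt.Println("OrderedReady pods:", len(ordered)) // 2: pod 2 blocked by gated pod 1

	parallel := map[int]Pod{}
	reconcile(Parallel, 3, parallel, standby)
	fmt.Println("Parallel pods:", len(parallel)) // 3

	// The orchestrator promotes pod 1 from standby to active by clearing its gates.
	p1 := ordered[1]
	p1.SchedulingGates = nil
	ordered[1] = p1
	reconcile(OrderedReady, 3, ordered, standby)
	fmt.Println("OrderedReady pods after ungating:", len(ordered)) // 3
}
```

Under Parallel, the gated pod only holds back itself. Under OrderedReady, it holds back every higher ordinal until the orchestrator removes the gate.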
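In the webhook variant described in the diff above, the orchestrator writes the migration state and the webhook reads it whenever a pod is created. The sketch below covers only the mutation logic, under these assumptions:

- It uses a simplified `Pod` rather than `corev1.Pod`.
- An in-process `MigrationState` guarded by `sync.RWMutex` stands in for whatever shared store a real operator would use.
- Pod names have the form `<statefulset>-<ordinal>`.
- The gate name is hypothetical.

Admission request and response handling is omitted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

const migrationGate = "example.com/statefulset-migration" // hypothetical gate name

// Pod is a hypothetical stand-in for the fields the webhook would touch.
type Pod struct {
	Name            string
	SchedulingGates []string
}

// MigrationState is written by the orchestrator and read by the webhook.
// The RWMutex ensures each read sees a complete write, never a partial one.
type MigrationState struct {
	mu      sync.RWMutex
	standby map[string]map[int]bool // StatefulSet name -> standby ordinals
}

func NewMigrationState() *MigrationState {
	return &MigrationState{standby: make(map[string]map[int]bool)}
}

// SetStandby records which ordinals of a StatefulSet must stay paused here.
func (s *MigrationState) SetStandby(set string, ordinals ...int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	m := make(map[int]bool, len(ordinals))
	for _, o := range ordinals {
		m[o] = true
	}
	s.standby[set] = m
}

func (s *MigrationState) isStandby(set string, ordinal int) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.standby[set][ordinal] // a missing set yields a nil map and false
}

// parsePodName splits a StatefulSet pod name of the form <set>-<ordinal>.
func parsePodName(name string) (string, int, bool) {
	i := strings.LastIndex(name, "-")
	if i <= 0 {
		return "", 0, false
	}
	ord, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return "", 0, false
	}
	return name[:i], ord, true
}

// mutate adds the migration gate to standby pods without duplicating it.
func mutate(state *MigrationState, pod Pod) Pod {
	set, ord, ok := parsePodName(pod.Name)
	if !ok || !state.isStandby(set, ord) {
		return pod
	}
	for _, g := range pod.SchedulingGates {
		if g == migrationGate {
			return pod
		}
	}
	pod.SchedulingGates = append(pod.SchedulingGates, migrationGate)
	return pod
}

func main() {
	state := NewMigrationState()
	state.SetStandby("web", 0, 1) // ordinals 0 and 1 still run in the source cluster
	for _, name := range []string{"web-0", "web-1", "web-2"} {
		p := mutate(state, Pod{Name: name})
		fmt.Println(p.Name, p.SchedulingGates)
	}
}
```

`mutate` is idempotent because Kubernetes may invoke mutating admission webhooks more than once for the same object.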