@@ -3,13 +3,11 @@ title: wide-availability-workload-partitioning
33authors :
44 - " @eggfoobar"
55reviewers :
6- - TBD
6+ - " @jerpeter1 "
77approvers :
8- - TBD
9- api-approvers :
10- - TBD
8+ - " @jerpeter1"
119creation-date : 2022-08-03
12- last-updated : 2022-08-08
10+ last-updated : 2022-08-09
1311tracking-link :
1412 - https://issues.redhat.com/browse/CNF-5562
1513see-also :
@@ -57,6 +55,9 @@ determinism required of my applications.
5755 masters and for workers, 2 hyperthreaded cores.
5856- We want a general approach, that can be applied to all OpenShift control plane
5957 and per-node components via the PerformanceProfile
58+ - We want to be clear with customers that this enhancement is a day 0 supported
59+ feature only. We do not support turning it off after the installation is done
60+ and the feature is on.
6061
6162### Non-Goals
6263
@@ -83,10 +84,7 @@ but slightly different non-goals
8384- This enhancement assumes that the configuration of a management CPU pool is
8485 done as part of installing the cluster. It can be changed after the fact but
8586 we will need to stipulate that, that is currently not supported. The intent
86- here is for this to be supported as a day 0 feature.
87- - This enhancement describes partitioning concepts that could be expanded to be
88- used for other purposes. Use cases for partitioning workloads for other
89- purposes may be addressed by future enhancements.
87+ here is for this to be supported as a day 0 feature only.
9088
9189## Proposal
9290
@@ -278,6 +276,26 @@ day 0 configuration and day n+1 alterations are not supported with this
278276enhancement. Part of that messaging should involve a clear indication that this
279277should be a cluster wide feature.
280278
279+ One risk is that a customer applies a CPU set that is too small or out of
280+ bounds, which can cause problems such as extremely poor performance or startup
281+ errors. To mitigate this, we will provide proper guidance and guidelines for
282+ customers who enable this enhancement. Furthermore, we would need to consult
283+ the performance team for more specific information about the upper and lower
284+ bounds of CPU sets for running an OpenShift cluster.
285+
286+ It is possible to build a cluster with the feature enabled and then add a node
287+ without configuring workload partitioning for that node alone. We do not
288+ support this configuration, as all nodes must have the feature turned on.
289+ However, there might be a race condition where a node is added and is still in
290+ the process of being restarted with workload partitioning. Pods admitted during
291+ that window will trigger a warning. We expect the resulting error message
292+ described in [failure modes](#failure-modes) to explain the problem well enough
293+ for admins to recover.
294+
295+ Another possible risk is cluster upgrades. This is the first time this
296+ enhancement will apply to multi-node clusters, so we need to run more tests on
297+ upgrade cycles to make sure things run as expected.
298+
281299### Drawbacks
282300
283301This feature contains the same drawbacks as the [Management Workload