Conversation

@JoelSpeed (Contributor)

This adds the initial version of the design for the ControlPlaneSet CRD and operator to pair with it.

The aim here is to allow automated vertical scaling of Control Plane Machines within OpenShift HA clusters.
We have designed a protection mechanism (#943) which handles most of the safety aspects of the system separately.
This enhancement is purely about the management of the Machines themselves: how they will be updated when they need to be, and how users can interact with the new CRD.

@openshift-ci openshift-ci bot requested review from dgoodwin and jcantrill January 17, 2022 12:01
## Motivation

As OpenShift adoption increases and our managed services offerings become more popular, the manual effort required to
scale Control Plane Machines as clusters grow or shrink is becoming a significant drain on the OpenShift Dedicated team.
Member

Suggested change
- scale Control Plane Machines as clusters grow or shrink is becoming a significant drain on the OpenShift Dedicated team.
+ scale Control Plane Machines as clusters grow or shrink is becoming a significant drain on the SRE Platform team managing OpenShift Dedicated (OSD), Red Hat OpenShift on AWS (ROSA), and Azure Red Hat OpenShift (ARO).

### Goals

* Provide a "best practices" approach to declarative management of Control Plane Machines
* Allow users to "1-click" scale their Control Plane to larger or smaller instances
Member

Scaling down is a risky change. Is that really a goal? It either requires an acceptance of risk on the part of the cluster administrator or a lot more checks on the OCP side to ensure the cluster will not become unstable. Has this specifically been requested from SREP / customers?

Contributor Author

As I understood it from the initial requirements gathering calls, yes. We need to scale up the control plane to bigger instances when capacity is being stretched, and be able to scale down to smaller instances when it is being under-utilized.

Having any kind of code check for whether the user is scaling down the instance type sounds like it will be very complex; we would have to start interpreting VM types, which is not something we've done before.

@jeremyeder Could you weigh in here?

A new CRD `ControlPlaneSet` will be introduced into OpenShift and a respective `control-plane-set-operator` will be
introduced as a new second level operator within OpenShift to perform the operations described in this proposal.

The new CRD will define the specification for creating Machines for the Control Plane. The operator will be responsible
Member

Suggested change
- The new CRD will define the specification for creating Machines for the Control Plane. The operator will be responsible
+ The new CRD will define the specification for managing (creating, adopting, updating) Machines for the Control Plane. The operator will be responsible

// for the Control Plane Machines.
type ControlPlaneSetStrategyType string

const (
Member

Is there a reason why in-place upgrades are not listed? I expect this would have a much shorter downtime than a replacement and would mitigate concerns with new machine creation. SRE has an SOP for that on OSD here: https://github.com/openshift/ops-sop/blob/2ff3d49cf7b23353f3fc4928b367230f6dd32972/v4/howto/resize-masters.md

Contributor Author

The Machine API is by design an immutable infrastructure product; we don't support in-place mutation of machines.

As far as I understood it, the idea of manually stopping and resizing machines is a support exception to allow a quicker replacement until we have full automation in place.

IMO, bringing up new machines, while potentially wasteful, should actually be safer in the long term, as we know programmatically what we are doing there; with a shut-down-and-update approach, things get more confusing when you try to program the changes.
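
For orientation, a minimal Go sketch of how the strategy type and its values might look; the constant names here are assumptions based on the strategies discussed in this proposal (RollingUpdate, Recreate, OnDelete), not the final API.

```go
// ControlPlaneSetStrategyType names the update strategy applied to Control Plane Machines.
type ControlPlaneSetStrategyType string

const (
	// RollingUpdateStrategyType creates the replacement Machine and waits for it
	// to become ready before removing the outdated Machine.
	RollingUpdateStrategyType ControlPlaneSetStrategyType = "RollingUpdate"

	// RecreateStrategyType removes the outdated Machine before creating its
	// replacement (discussed later in this proposal as future work).
	RecreateStrategyType ControlPlaneSetStrategyType = "Recreate"

	// OnDeleteStrategyType waits for a user to delete a Machine before the
	// controller creates its replacement.
	OnDeleteStrategyType ControlPlaneSetStrategyType = "OnDelete"
)
```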


When adding a ControlPlaneSet to a new cluster, the end user will need to define the ControlPlaneSet resource by
copying the existing Control Plane Machine ProviderSpecs.
Once this is copied, they should remove the failure domain and add the desired failure domains to the `FailureDomains`
Member

Thinking of the OSD/ROSA/ARO case, this is a lot of changes specific to individual clusters, and the logic seems fairly straightforward. Is there a possibility adoption could be part of the ControlPlaneSet creation, triggered by some specific condition such as an annotation? This would ensure the rollout of this feature across managed OpenShift is done correctly and consistently. The alternative is some bespoke logic created by SRE, released in conjunction with the CPS to all managed clusters, that then has to be cleaned up in a future release. Having this as a part of the CPS operator would allow the deletion and recreation of the CR (previous section) to be nearly trivial from an administrator perspective.

Contributor Author

> Is there a possibility adoption could be part of the ControlPlaneSet creation, triggered by some specific condition such as an annotation?

Could you expand on your concern here? I'm not sure exactly what you're saying. The idea is that the adoption process happens when the CPS is created, so I would expect SRE to apply a new CPS to each individual cluster (possibly in batches) if they want to take advantage of the CPS on older clusters. My recommendation though is that we focus on new clusters where this functionality will be included from the start.

> Having this as a part of the CPS operator would allow the deletion and recreation of the CR (previous section) to be nearly trivial from an administrator perspective.

So your suggestion is that if you create an empty CPS, the ControlPlaneSet operator should populate the spec for you? This feels like an anti-pattern for Kube and the idea of declarative APIs. I'm hesitant about us suggesting this as there's no guarantee that the existing configs are accurate, eg if the machines were on a UPI cluster (they should have been removed, but maybe they weren't); at that point, we have introduced a spec that is invalid. If we infer it and get it wrong, that's on us; if a user specs it and gets it wrong, that's on them 🤔

We should also bear in mind the scope of this project: adding adoption into the mix, and having to deal with the safety aspects of that, will significantly expand the scope and extend the development time, which is likely to already be tight if we want to get something running for this within 4.11.

Member

Manually managing these for the managed OCP fleet isn't an option. But understood the concern with automatically creating the spec for CPS. For my use case though it's IPI 100% of the time. I am also not suggesting we populate CPS all the time; would it be a more tenable solution if it were opt-in?

@JoelSpeed (Contributor Author), Jan 19, 2022

I was expecting that the managed fleet would come up with some automation for creating the CPS; I don't think this warrants being in the core of the product though, and I'm concerned about the scope creep.

We could possibly have an opt-in adoption mechanism, but if we were to do that, it would have to be a stretch goal and not part of the MVP IMO; I wouldn't want to jeopardise the main project for this. I'm also not sure I'd want to expose this to all users, which makes me think again that this should be something separate. What is your hesitation about running some job on the cluster to create the CPS? Is it just the clean up?

Member

I can't speak for the use case across all OCP customers but it seems like a useful pattern to adopt via code in the operator instead of documentation and steps executed by a customer. There are a lot of clusters out in the wild, I presume management of masters is a useful capability. I know from the managed OCP perspective it's been a gap that is getting more attention lately.

So I'm not necessarily reluctant to run a job to create CPS resources, though the mechanics on how we can roll that out will make it painful. I'm asking if that's the right way to do it for OCP and our customers.

Contributor

i wonder if a valid option here would be for us to have a way for the user to create a local manifest that could be applied as a CPS?

for example, the user could issue an oc command, and it would return a manifest they could save and then inspect before applying. that might provide a nice middle ground where it could be automated but doesn't need to be.

Contributor Author

> for example, the user could issue an oc command, and it would return a manifest they could save and then inspect before applying. that might provide a nice middle ground where it could be automated but doesn't need to be.

We talked about this during the cluster lifecycle arch call last week. There were others with concerns about having this as an automated thing in-cluster: if we automate it and break the cluster, that's on us; if the customer breaks it, that's on them.

There was some support for a CLI kind of thing/script that would spit out a CPS manifest for the customer to apply if they wanted, with the caveat that they should verify it before they apply it. My initial stance was that this should be a stretch goal though; for the purpose of this enhancement and feature we have been focusing on the experience for new clusters, and older clusters can adopt later IMO.

#### A user deletes the ControlPlaneSet resource

Users will be allowed to remove ControlPlaneSet resources should they so desire. This shouldn't pose a risk to the
control plane as the ControlPlaneSet will orphan Machines it manages before it is removed from the cluster. More detail
Member

What happens if the finalizer on the CPS is removed before orphaning Machines is complete? I guess the machines get deleted due to owner references when CPS is deleted? Admins, especially ones under pressure and who haven't learned not to touch finalizers, will do this.

Contributor Author

> I guess the machines get deleted due to owner references when CPS is deleted?

Yep, the machines will be deleted, but they should also be protected from removal by the etcd operator's machine deletion hooks, so you'll have deleted Machines sitting around for a while doing nothing, an alert will fire, and the only way to resolve it would be to re-add a CPS, have it replace the Machines, and then remove the CPS properly.

Users should not interfere with finalizers, especially in sensitive applications like this one.

control plane as the ControlPlaneSet will orphan Machines it manages before it is removed from the cluster. More detail
is available in the [Removing/disabling the ControlPlaneSet](#Removingdisabling-the-ControlPlane) notes.

#### The ControlPlaneSet spec may not match the existing Machines
Member

If adoption of existing machines is managed by this operator the risk is mitigated a good bit, at least in the case where master machines are consistent with each other except in the failure domain (zone).

Contributor Author

I don't feel that really adds much for us though, as the only safe way for the operator to do an adoption like this is when the masters are all consistent, in which case, there's no risk anyway

## Alternatives

Previous iterations of this enhancement linked in the metadata describe alternative ways we could implement this
feature.
Contributor

Given our general desire to stop adding new things to the release payload, and the fact that this new feature seems to be opt-in, did you explore delivering this new operator via OLM instead of the payload?

Contributor Author

I was imagining that this would eventually be (4.12) default and set up by the installer, so I hadn't investigated the idea of OLM. If we do want to make this opt-in permanently then having it delivered via OLM could be an option.

When I've been talking with the etcd team about this in the past, we had imagined they would also observe the CPS resource to understand control plane scale in the longer term, as they don't currently have a good way to see how many replicas they should have.

Contributor Author

I'm going to add an open question about whether this is optional or not.

reviewers:
- TBD
approvers:
- TBD
Contributor

@soltysh @tkashem @hasbro17 and myself please.

@elmiko (Contributor) left a comment

@JoelSpeed i think this looks nice and i appreciate the level of detail. i left a few comments/questions inline. i also want to say that i like the idea of creating this as an operator with an operand, regardless of whether we deploy as part of the release payload or as something on OLM.

containing the business logic of the control plane set activity in an operator seems like a good way for us to plan for a future where we have more cluster-api integration, and also be flexible to other structural changes like that.

lastly, as kind of a cheeky question, given the discussion of OLM and deploying this operator through the standard marketplace, is there anything that would prevent a developer from creating this type of operator and releasing it on the community operator hub for use with okd?
(with credit to @lobziik who has also mentioned something like this)

prevent the Control Plane Machine from being removed until a replacement for the etcd member has been introduced into
the cluster. To allow for this use case, the end user can manually remove the etcd protection (via removing the Machine
deletion hook on the deleted Machine) allowing the rollout to proceed, the etcd operator will not re-add the
protection if the Machine is already marked for deletion.
Contributor

will the operator be able to detect that these manual intervention steps have been performed?

i'm curious how a user might discover that they should have done it, or what might happen if they forget.

(no worries if this falls under one of the open questions below)

Contributor Author

The operator will see that the Machine is not being drained (Machines have a condition for this), and it can report that back in its own status: machine foo isn't being drained because etcd is blocking it.

Then, when the etcd operator's pre-drain hook is removed by the end user, this will go away, the drain will occur, and the Machine goes away. The ControlPlaneSet will then spot that the Machine has gone and create the replacement.
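
For orientation, a rough sketch of what "removing the Machine deletion hook" could look like programmatically. The `lifecycleHooks` field shape follows my reading of the machine deletion hooks enhancement, and the etcd hook owner string is a placeholder; the real names may differ.

```go
package example

import (
	machinev1beta1 "github.com/openshift/api/machine/v1beta1"
)

// removeEtcdPreDrainHook drops the etcd operator's pre-drain hook from a
// Machine so that the drain, and therefore the rollout, can proceed.
// "clusteroperator/etcd" is a placeholder for whatever owner string the etcd
// operator actually records on its hook.
func removeEtcdPreDrainHook(machine *machinev1beta1.Machine) {
	kept := machine.Spec.LifecycleHooks.PreDrain[:0]
	for _, hook := range machine.Spec.LifecycleHooks.PreDrain {
		if hook.Owner != "clusteroperator/etcd" {
			kept = append(kept, hook)
		}
	}
	machine.Spec.LifecycleHooks.PreDrain = kept
}
```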

When a change is required, any logic for rolling out the Machine will be paused until the ControlPlaneSet controller
observes a deleted Machine.
When a Machine is marked for deletion, it will create a new Machine based on the current spec.
It will then proceed with the replacement as normal.
Contributor

is there any option here for the user to specify that it should wait until the machine is actually deleted from the infrastructure before proceeding?

i'm thinking about resource constrained environments, and this sounds like it could be similar in some respects to the Recreate strategy.

Contributor Author

A Machine should never be removed from the cluster if its instance isn't gone from the provider, so that's how we would know.

For resource constrained environments we have the Recreate strategy; this strategy is just for trialing new configs or slowing down the rollout by manually choosing when each control plane member is replaced.

Notably it will need to ensure that there are no owner references on the Machines pointing to the ControlPlaneSet
instances. This will prevent the garbage collector from removing the Control Plane Machines when the ControlPlaneSet is
deleted.
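
A minimal sketch, using controller-runtime, of how the deletion path could strip its own owner references from the Control Plane Machines before the finalizer is removed. The namespace, selector label, and helper name are illustrative, not the real controller's code.

```go
package example

import (
	"context"

	machinev1beta1 "github.com/openshift/api/machine/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// orphanControlPlaneMachines removes owner references pointing at the
// ControlPlaneSet (identified by its UID) so the garbage collector will not
// delete the Machines once the ControlPlaneSet itself is deleted.
func orphanControlPlaneMachines(ctx context.Context, c client.Client, cpsUID types.UID) error {
	machines := &machinev1beta1.MachineList{}
	// Illustrative selector: the real controller would use the ControlPlaneSet's own selector.
	if err := c.List(ctx, machines, client.InNamespace("openshift-machine-api"),
		client.MatchingLabels{"machine.openshift.io/cluster-api-machine-role": "master"}); err != nil {
		return err
	}

	for i := range machines.Items {
		machine := &machines.Items[i]
		kept := []metav1.OwnerReference{}
		for _, ref := range machine.OwnerReferences {
			if ref.UID != cpsUID {
				kept = append(kept, ref)
			}
		}
		if len(kept) == len(machine.OwnerReferences) {
			continue // no reference to this ControlPlaneSet
		}
		machine.OwnerReferences = kept
		if err := c.Update(ctx, machine); err != nil {
			return err
		}
	}
	return nil
}
```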

Contributor

if we make the ControlPlaneSet a standard component would this capability go away? (eg user cannot delete ControlPlaneSet)

Contributor Author

I don't think so; I think we will still want to allow users to opt out of this feature (especially with regard to the composable OpenShift concept).



## Proposal

A new CRD `ControlPlaneSet` will be introduced into OpenShift and a respective `control-plane-set-operator` will be
Contributor

technically we have one control plane, not a set. So technically ControlPlaneMachineSet is the right term

Contributor Author

Yes, you're correct. Naming is not something I'm 100% sold on. I originally didn't include Machine in the name because I was trying not to make it sound like a MachineSet (which is akin to a ReplicaSet); I want users to think of this more as a StatefulSet. I was also wondering how it would look in docs, IIRC they write the names of resources out, so you could see "worker machine set" as an item in today's docs, and this resource would be a "control plane machine set"; is that confusing?

I'm not averse to calling this a ControlPlaneMachineSet though. Perhaps bringing docs into the naming conversation might yield some interesting discussion. @jeana-redhat Do you have any thoughts about what would be clearer to an end user in terms of what we want to name this CRD and operator? Currently it's ControlPlaneSet, but Stefan has correctly pointed out that it would be more accurate as ControlPlaneMachineSet.

@jeana-redhat, Jan 24, 2022

I am definitely concerned that a user reading about "machine sets" would see a "control plane machine set" as a subset or type of machine set. If there is a more accurate term that also can't cause accidental conflation of terms, that would be ideal.

NB: this would still be a point of possible confusion even if we are using resource names. ControlPlaneMachineSet also sounds like a type or subset of MachineSet.

@elmiko (Contributor), Jan 24, 2022

i agree with @sttts reasoning about this being a ControlPlaneMachineSet, but i also agree about the naming confusion with respect to using MachineSet in the name.

we chatted about this during our team standup and i'm curious if we could go in a divergent direction by calling this a ControlPlaneInfrastructureSet? this would make it clearer what the resource is defining and also leave us room for expansion in the future as we consider different types of infrastructures and APIs that openshift will use.

i don't have a strong opinion about the name, but i like the idea of making this more clear to users.

// +kubebuilder:validation:Enum:=3;5
// +kubebuilder:default:=3
// +kubebuilder:validation:Required
Replicas *int32 `json:"replicas"`
Contributor

why a pointer?

Contributor Author

I was under the impression that it's pretty normal for replicas to be a pointer within Kube; it allows distinction between an unset value and an explicit 0, both of which are valid in the case of a Deployment or StatefulSet.

Perhaps in this case 0 replicas doesn't make sense though? In that case we maybe don't need the pointer, though I'm still tempted to keep it to be in line with all of the other resources with a scale subresource that I've seen.
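
To illustrate the pointer point with a trivial, self-contained example (not code from the proposal): a `*int32` lets the API distinguish "not set, apply the default" from "explicitly zero".

```go
package main

import "fmt"

// describe shows why replicas is commonly a pointer in Kubernetes APIs:
// nil means "unset" and is distinguishable from an explicit zero.
func describe(replicas *int32) string {
	if replicas == nil {
		return "replicas unset: defaulting applies (3 for this CRD)"
	}
	return fmt.Sprintf("replicas explicitly set to %d", *replicas)
}

func main() {
	var zero, three int32 = 0, 3
	fmt.Println(describe(nil))    // unset
	fmt.Println(describe(&zero))  // explicit 0 (valid for Deployments, probably not here)
	fmt.Println(describe(&three)) // explicit 3
}
```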

// Strategy defines how the ControlPlaneSet will update
// Machines when it detects a change to the ProviderSpec.
// +optional
Strategy ControlPlaneSetStrategy `json:"strategy,omitempty"`
Contributor

needs a default

Contributor Author

I've added this; could you validate that the format is correct? A default tag for a struct is new to me.
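
For reference, a sketch (continuing the strategy-type sketch above) of how the defaulting could be expressed with controller-gen markers. The field-level default on `Type` is standard kubebuilder syntax; the commented struct-level default is my recollection of the marker format and is an assumption.

```go
// ControlPlaneSetStrategy defines how Machines are updated; the shape is illustrative.
type ControlPlaneSetStrategy struct {
	// Type of update strategy. Defaults to RollingUpdate.
	// +kubebuilder:default:=RollingUpdate
	// +optional
	Type ControlPlaneSetStrategyType `json:"type,omitempty"`
}

// On the parent spec, the whole struct could also be defaulted, e.g.:
//
//	// +kubebuilder:default:={type: RollingUpdate}
//	// +optional
//	Strategy ControlPlaneSetStrategy `json:"strategy,omitempty"`
```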

// Template describes the Control Plane Machines that will be created
// by this ControlPlaneSet.
// +kubebuilder:validation:Required
Template ControlPlaneSetTemplate `json:"template"`

Contributor Author

I don't think this is meant to be an embedded resource; it doesn't have a TypeMeta.

The ProviderSpec is an embedded resource however. Though this has just prompted me to realise that we aren't following the spec.template.spec pattern that we have elsewhere; I think the providerSpec needs to be replaced with a reference to https://github.com/openshift/api/blob/d74727069f6fb9193d6ba50ed7e3a5edb2477bec/machine/v1beta1/types_machine.go#L163-L200

It will take the desired configuration, the `ProviderSpec`, and ensure that an appropriate number of Machines exist
within the cluster which match this specification.

It will be introduced as a new second level operator so that, if there are issues with its operation, it may report
Contributor

does this mean it will have a distinct release cycle to OCP ? Is that feasible?

Contributor Author

It was my understanding that a second level operator was the term used to describe the group of operators deployed by CVO, perhaps that's not the correct wording?

I am fully intending for this to be an in-payload project deployed around the same run level as Machine API. Though, due to the impact this controller might have, I was planning to have it as a separate ClusterOperator so that it has a distinct status it can report for cluster health, rather than trying to shoehorn that into the Machine API one

We had some discussion about this being an OLM deployed operator during the Cluster Lifecycle architecture call on Thursday 20th, though I'm still leaning towards this being in payload.

Note: I've clarified this with staff engineers, a second level operator is something deployed by CVO, sometimes "CVO operator" or "OLM operator" are used, I could update the wording to clarify this

// appropriate location for the particular provider.
// +kubebuilder:validation:MinLength:=1
// +kubebuilder:validation:Required
FailureDomains []string `json:"failureDomains"`
Contributor

can this mean that users put regions here? Looks like it encourages that. Cross-region etcd is not supported.

Contributor Author

Ahh yes, I see what you mean, the wording on this isn't the best. The intention is only for availability zones to be put here. The issue we face is that they are called different names across different cloud providers. IIRC the failure domain label on a node is the AZ typically, so the nomenclature has come from that. I will try to clarify the description.
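
To make the intent concrete (example values only): the entries are availability zones within one region, named per the provider, never regions themselves.

```go
package example

// AWS: availability zones within a single region.
var awsFailureDomains = []string{"us-east-1a", "us-east-1b", "us-east-1c"}

// GCP: zones within a single region (cross-region etcd is not supported).
var gcpFailureDomains = []string{"us-central1-a", "us-central1-b", "us-central1-f"}
```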

As we are not planning to tackle horizontal scaling of the control plane initially, we will implement a validating
webhook to deny changes to this value once set.
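
A minimal sketch of the check such a webhook could perform on update, assuming it compares the old and new objects; the helper name and error details are illustrative.

```go
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateReplicasUnchanged rejects any change to spec.replicas once it has
// been set, since horizontal scaling of the control plane is not supported.
func validateReplicasUnchanged(oldReplicas, newReplicas *int32) *field.Error {
	path := field.NewPath("spec", "replicas")
	if newReplicas == nil {
		return field.Required(path, "replicas must be set")
	}
	if oldReplicas != nil && *oldReplicas != *newReplicas {
		return field.Invalid(path, *newReplicas,
			fmt.Sprintf("replicas is immutable once set (current value %d)", *oldReplicas))
	}
	return nil
}
```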

For new clusters, the installer will create (timeline TBD) the ControlPlaneSet resource and will set the value based on
Contributor

I thought this should be a second-level operator. How can the installer then create a CR ?

Contributor Author

A second level operator is deployed by CVO so we can have the installer create a CR, as for example it does with Machine API. See my other comment for more detail

The RollingUpdate strategy will be the default. It mirrors the RollingUpdate strategy familiar to users from
Deployments. When a change is required to be rolled out, the Machine controller will create a new Control Plane
Machine, wait for this to become ready, and then remove the old Machine. It will repeat this process until all Machines
are up to date.
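
In sketch form, the ordering described above (create, wait for ready, then delete, one Machine at a time) might look like the following; the `Machine` type and the helper interface are stand-ins, not the real controller code.

```go
package example

import "context"

// Machine is a stand-in for the real Machine API object.
type Machine struct {
	Name string
}

// rollout's methods are placeholders for the real controller's logic.
type rollout interface {
	NeedsUpdate(m Machine) bool
	CreateReplacement(ctx context.Context, m Machine) (Machine, error)
	WaitForReady(ctx context.Context, m Machine) error
	// Delete is still gated by the etcd operator's Machine deletion hooks.
	Delete(ctx context.Context, m Machine) error
}

// rollingUpdate replaces outdated Control Plane Machines one at a time,
// never deleting a Machine before its replacement has become ready.
func rollingUpdate(ctx context.Context, r rollout, machines []Machine) error {
	for _, m := range machines {
		if !r.NeedsUpdate(m) {
			continue
		}
		replacement, err := r.CreateReplacement(ctx, m)
		if err != nil {
			return err
		}
		if err := r.WaitForReady(ctx, replacement); err != nil {
			return err
		}
		if err := r.Delete(ctx, m); err != nil {
			return err
		}
	}
	return nil
}
```
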
Contributor

Why does this even matter? With the other enhancement about etcd machine hooks, wouldn't there always be some kind of rolling update? Or would the machine api controller create the machine anyway and just the etcd membership would be "rolling"? Maybe reference the etcd machine hook enhancement here.

Contributor Author

I can add a reference for sure

I think the point I was trying to make is that for example, the instance size would be a rolling update as we create and delete machines to migrate over. You're right though, it isn't massively an issue, you could create 3 new machines and then delete all three old ones and etcd's protection mechanism should handle it. I think in previous discussions though we concluded it would be best to only surge a single instance if we can help it, to reduce the likelihood of etcd having issues with balancing the etcd members across zones (the zonal information is under MAPI control not etcd)


The Recreate strategy mirrors the Recreate strategy familiar to users from Deployments.
When a change is required to be rolled out, the Machine controller will first remove a Control Plane Machine, wait for
its removal, and then create a new Control Plane Machine.
Contributor

this won't work with etcd machine hook, will it?

Contributor

ack, you describe it two paragraphs further down 👍

Contributor Author

From a discussion in the architecture call last Thursday, I think we will make this a future goal. It's not required for our current use cases with customers so we can describe it here, but leave it out of the 4.11 plan.

This will give us time to figure out the kinks and how we handle the etcd protection question before we actually implement this part of the overall feature

- What alternative can we provide to satisfy resource constrained environments if we do not implement this strategy?
- Should we teach the etcd operator to remove the protection mechanism when the ControlPlaneSet is configured with this
strategy?
- To minimise risk, should we allow this strategy only on certain platforms? (Eg disallow the strategy on public clouds)
Contributor

an idea: allow it, even automate the whole mechanism, but let the user acknowledge it with some very clear value to set: RecreateDangerUnsupportedYouWillLoseYourDataPotentiallyDontWineDontComplain.

Contributor Author

We totally could rename the strategy to make it more obvious that it's dangerous; I'm not against that, but for now I might park this (add another open question) as we are now planning on skipping this strategy in the first iteration.


As some users may want to remove or disable the ControlPlaneSet, and we have no bulletproof way to prevent a user
deleting the ControlPlaneSet, a finalizer will be placed on the ControlPlaneSet to allow the controller to ensure a
safe removal of the ControlPlaneSet, while leaving the Control Plane Machines in place.
Contributor

you have an admission webhook. It can prevent deletion.

Contributor Author

You're right, we could set the failure policy to block deletion requests when the admission webhook is down, though I don't think we want to block deletion altogether, or else we block users from opting out of this feature.

I've removed the note about "bulletproof" as that was misleading.


1. How do we wish to introduce this feature into the cluster, should it be a Technical Preview in the first instance?
2. Do we want this to be an optional add-on (delivered via OLM) or do we want this to be a default operator for future
clusters?
Contributor

Is a distinct life-cycle from the MCO and the etcd operator even feasible?

Productizing things via cPaaS is still super complex. I would rather see the development time spent on making this mechanism rock-solid than waste it on wrongly understood modularity of the platform.

Contributor Author

I will add a section talking about why this should be in payload, based on some discussion from an arch call last week. I agree, I don't think this should be OLM based personally, though there are some who disagree, so I will write up the reasons/my argument for it being in payload to see if I can convince them otherwise.

- We expect these operations to be very infrequent within a cluster once established, so the impact should be
minimal
- The failure policy will be to fail closed, as such, when the webhook is down, no update operations will succeed
- The Kube API server operator already detects failing webhooks and as such will identify the issue early
Contributor

Add: the ControlPlaneSet is not critical for cluster operation. Everything done by the operator described here can be done manually to the machine(set) objects.

Contributor Author

Good suggestion, thanks.

@JoelSpeed (Contributor Author) left a comment

I've pushed some updates and some comments on each of the existing pieces of feedback that have been added so far, thanks for the reviews.

I have a todo list to get to this week, it includes:

  • Failure Domains: These need to be explained more and discuss how this works across different platforms
  • How does this fit in with Cluster API
  • Clarifying that this will be installed by default and the reasons for doing so
  • Making the Recreate strategy a future feature and exclude it from the initial scope
  • Add some notes about testing
  • Anything I've missed from this list that was called out during the cluster lifecycle call last week



@JoelSpeed (Contributor Author)

@staebler I've added an extra note to the API description that clarifies the subnet defaulting behaviour. I think I've resolved all of your comments now

Comment on lines 292 to 293
// If no subnet reference is provided, the Machine will be created in the first
// subnet returned by AWS when listing subnets within the provided availability zone.
Contributor

nit on spacing

Suggested change
// If no subnet reference is provided, the Machine will be created in the first
// subnet returned by AWS when listing subnets within the provided availability zone.
// If no subnet reference is provided, the Machine will be created in the first
// subnet returned by AWS when listing subnets within the provided availability zone.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2022
@JoelSpeed (Contributor Author)

I sent an email out last time this went stale; there has been no further activity and no changes as we are implementing. I think we should merge this now. CC @sdodson

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2022
@enxebre (Member)

enxebre commented May 3, 2022

Thanks for the exhaustive doc @JoelSpeed.
Just a note: I know there's been a long discussion around naming. Even though ControlPlane + Machine + Set makes sense technically, I find it very unfortunate that out of a universal range of possibilities for the name we chose the only one that clashes with other existing technology that behaves differently, i.e. MachineSet. As a user I would find it very confusing and would find it very difficult to understand the decision. No need to extend the discussion in this PR, but you might want to reconsider before shipping.

underlying hardware of my control plane and have these changes safely applied using immutable hardware concepts
- As a cluster administrator of OpenShift, I would like to be able to control rollout of changes to my Control Plane
Machines such that I can test changes with a single replica before applying the change to all replicas
- (Future work) As a cluster administrator of OpenShift with restricted hardware capacity, I would like to be able to

Does that mean the current approach only handles scaling up, not scaling down? I think it's important to make that clear, and to ensure that scaling down throws an unsupported error.

Contributor Author

The current approach is handling replacement with scale up and down; the point of this line in particular is that currently we require surging the capacity as part of the replacement. This story in particular is referencing a recreate-style replacement rather than a rolling-update-style replacement.


That sounds like a limitation that should be explicitly expressed.

Contributor Author

This is already expanded upon in the "The Recreate update strategy (FUTURE WORK)" section, but I can reword this here if you think there's a way I can make this clearer in the goals.

Contributor Author

Another note on this: this project has been designed with both cloud and local datacenter solutions in mind, though the configuration changes we are targeting, eg resizing, don't make sense in baremetal environments. So we aren't sure whether we actually want to support this Recreate strategy ever, but we left it in for posterity to facilitate a future conversation about this.

// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
// Labels are required to match the ControlPlaneMachineSet selector.
// +kubebuilder:validation:Required
ObjectMeta ControlPlaneMachineSetTemplateObjectMeta `json:"metadata"`

Not sure why not just use metav1.ObjectMeta and set the fields you care about?

Contributor Author

As a rule in OpenShift APIs we avoid reusing upstream objects in our specs as they add fields to our APIs that we don't care about. For example, having the whole metav1.ObjectMeta here would mean fields like the name or creation timestamp are now a part of this API, yet they have no meaning to the controller and if a user sets them, are ignored.

It's better to have a struct with only those fields we actually care about so that the generated schema can give better feedback to the user about what is and isn't supported.
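
A sketch of the restricted metadata type this approach implies; the exact field set is an assumption.

```go
// ControlPlaneMachineSetTemplateObjectMeta carries only the metadata fields the
// controller actually honours, rather than the whole of metav1.ObjectMeta.
type ControlPlaneMachineSetTemplateObjectMeta struct {
	// Labels must include the labels matched by the ControlPlaneMachineSet selector.
	// +kubebuilder:validation:Required
	Labels map[string]string `json:"labels"`

	// Annotations to apply to the generated Machines.
	// +optional
	Annotations map[string]string `json:"annotations,omitempty"`
}
```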

// OpenShiftMachineV1Beta1Machine defines the template for creating Machines
// from the v1beta1.machine.openshift.io API group.
// +kubebuilder:validation:Required
OpenShiftMachineV1Beta1Machine *OpenShiftMachineV1Beta1MachineTemplate `json:"machines_v1beta1_machine_openshift_io,omitempty"`

*Spec should be the type, and Template is usually the field name. See https://github.com/kubernetes/api/blob/9b88471ea2b12de1a16e6c140bd1131c349f3811/apps/v1/types.go#L184 or other fields in that file.

Contributor Author

Because this is a discriminated union, the template is a one-of choice, so we needed to add an extra layer of indentation. We are otherwise compliant with the pattern from an end-user/JSON-structure perspective.
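
A minimal sketch of the discriminated-union shape being described, following the field names in the snippet above; the discriminator field and its value are assumptions.

```go
// OpenShiftMachineType is the discriminator value for the Machine API template.
const OpenShiftMachineType = "machines_v1beta1_machine_openshift_io"

// ControlPlaneMachineSetTemplate is a discriminated union: the discriminator
// selects which of the (pointer) template members is populated.
type ControlPlaneMachineSetTemplate struct {
	// MachineType selects the template member in use.
	// +unionDiscriminator
	// +kubebuilder:validation:Required
	MachineType string `json:"machineType"`

	// OpenShiftMachineV1Beta1Machine is used when MachineType is
	// "machines_v1beta1_machine_openshift_io".
	// +optional
	OpenShiftMachineV1Beta1Machine *OpenShiftMachineV1Beta1MachineTemplate `json:"machines_v1beta1_machine_openshift_io,omitempty"`
}
```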

Comment on lines +117 to +118
// +kubebuilder:validation:Required
Replicas *int32 `json:"replicas"`

So, I'm a bit confused: this enhancement is meant for vertical scaling, since horizontal was explicitly mentioned as a non-goal, but looking through the API I can only see Replicas, which is horizontal scaling, rather than machine descriptions denoting vertical scaling. What am I missing?

Deployments. When a change is required to be rolled out, the Machine controller will create a new Control Plane
Machine, wait for this to become ready, and then remove the old Machine. It will repeat this process until all Machines
are up to date.
During this process the etcd membership will be protected by the mechanism described in a [separate enhancement](https://github.com/openshift/enhancements/pull/943),

This merged so probably worth fixing the link.


There are however a few concerns about this idea which means we are planning to not implement this (at least for the
first iteration of the project):
- It may encourage users to attempt to use `ControlPlaneMachineSets` with UPI clusters which are incorrectly configured

Theoretically in the manual case you have a similar problem; the question is, are you planning on putting the same limitation on UPI as in the etcd enhancement?

Contributor Author

The controller is designed in a way such that, without an existing working set of Control Plane Machines it will degrade and not take any action, we will inform the user via the status of actions they need to take to resolve the issue.

So yes, unless the user has correctly configured a set of Machines, we will not do anything, in the same way the etcd protection does.

@sdodson (Member) left a comment

This all looks fine to me, I appreciate the attention to making this a smooth transition and that it's being made optional from the start.

Comment on lines +454 to +457
The `FailureDomains` field is expected to be populated by the user/installer based on the topology of the Control Plane
Machines. For example, we expect on AWS that this will contain a list of availability zones within a single region.
Note, we are explicitly not expecting users to add different regions to the `FailureDomains` as we do not support
running OpenShift across multiple regions.
Member

@patrickdillon Do we provision all the necessary infrastructure in all AZs within a region or will there need to be some inspection of AZs to determine which are fully capable of supporting a control plane node?

Member

Reading further, I guess perhaps we're just trusting the admin or installer to provide valid input?

Contributor Author

If the user doesn't provide valid input, then the new Machine will fail to come up and the operator will go degraded, prompting them to resolve the issue, so I think we are ok with this.

That said, AFAIK, we do provision across failure domains, eg on AWS I know when I create a US-East cluster, the installer creates 6 machinesets across all 6 of the failure domains, so maybe it does create the infrastructure

@openshift-ci (bot)

openshift-ci bot commented May 12, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2022
@sdodson (Member)

sdodson commented May 12, 2022

/lgtm
/hold
I don't believe anyone else has pending feedback; feel free to remove the hold whenever you feel everyone's required input has been addressed.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2022
@JoelSpeed (Contributor Author)

/hold cancel

I have responded to and addressed all of the comments through the various rounds of reviews this enhancement has had. I've had reviews from Machine API, Installer, Workloads, API/Auth, Service Delivery and docs. I have also sent several emails to the relevant parties to ensure they have had adequate time to review, stating my intention to merge if no further comments came through. The rate of comments has slowed to a point where I think it's better to merge this and resolve any follow ups in a separate PR.

/override ci/prow/makdownlint

New requirements were added for markdownlint since this was authored

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 13, 2022
@openshift-ci (bot)

openshift-ci bot commented May 13, 2022

@JoelSpeed: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • ci/prow/makdownlint

Only the following contexts were expected:

  • ci/prow/markdownlint
  • pull-ci-openshift-enhancements-master-markdownlint
  • tide

In response to this:

/hold cancel

I have responded to and addressed all of the comment through the various rounds of reviews this enhancement has had. I've had reviews from Machine API, Installer, Workloads, API/Auth, Service Delivery and docs. I have also sent several emails to the relevant parties to ensure they have had adequate time to review, stating my intention to merge if no further comments came through. The rate of comments has slowed to a point where I think it's better to merge this and resolve any follow ups in a separate PR.

/override ci/prow/makdownlint

New requirements were added for markdownlint since this was authored

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JoelSpeed (Contributor Author)

/override ci/prow/markdownlint

@openshift-ci (bot)

openshift-ci bot commented May 13, 2022

@JoelSpeed: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • ci/prow/makdownlint

Only the following contexts were expected:

  • ci/prow/markdownlint
  • pull-ci-openshift-enhancements-master-markdownlint
  • tide

In response to this:

/override ci/prow/makdownlint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci (bot)

openshift-ci bot commented May 13, 2022

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/markdownlint


In response to this:

/override ci/prow/markdownlint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci (bot)

openshift-ci bot commented May 13, 2022

@JoelSpeed: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 5198f3a into openshift:master May 13, 2022
@JoelSpeed JoelSpeed deleted the control-plane-set branch May 13, 2022 11:27