---
title: control-plane-machine-set-controller
authors:
  - "@JoelSpeed"
reviewers:
  - "@jewzaam" - service delivery asks
  - "@elmiko" - cluster infrastructure review
  - "@enxebre" - authored previous works
  - "@jstuever" - installer review
  - "@staebler" - installer review
  - "@jeana-redhat" - product docs review
approvers:
  - "@sttts" - impacts on control plane availability
  - "@soltysh" - impacts on control plane availability
  - "@tkashem" - impacts on etcd
  - "@hasbro17" - impacts on etcd
  - "@sdodson" - impacts on cluster lifecycle
api-approvers:
  - "@deads2k"
creation-date: 2022-01-11
last-updated: 2022-02-07
tracking-link:
  - TBD
replaces:
  - "[/enhancements/machine-api/control-plane-machinesets.md](https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/control-plane-machinesets.md)"
  - "https://github.com/openshift/enhancements/pull/278"
  - "https://github.com/openshift/enhancements/pull/292"
---

# Control Plane Machine Set Controller

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

To enable automated management of vertical scaling and replacement of Control Plane Machines, this proposal introduces a new resource and controller that will manage Control Plane Machine instances.

## Motivation

As OpenShift adoption increases and our managed services offerings become more popular, the manual effort required to scale Control Plane Machines as clusters grow or shrink is becoming a significant drain on the SRE Platform team managing OpenShift Dedicated (OSD), Red Hat OpenShift on AWS (ROSA), and Azure Red Hat OpenShift (ARO).

The procedure to resize a Control Plane today is lengthy and very involved. It takes a significant amount of time for an OpenShift expert to perform. We also document this process for our end users; however, due to the complexity of the procedure, there is discomfort in recommending it to end users who may not be as familiar with the product.

To ensure the long term usability of OpenShift, we must provide an automated way for users to scale their Control Plane as and when their cluster needs additional capacity.

As HyperShift adoption grows, HyperShift will solve the same issue by running hosted Control Planes within management clusters. However, HyperShift is not a suitable product for all OpenShift requirements (for example the first management cluster) and as such, we expect that HA clusters will continue to be used and we must solve this problem to allow the continued adoption of HA clusters.

### Goals

* Provide a "best practices" approach to declarative management of Control Plane Machines
* Allow users to "1-click" scale their Control Plane to larger or smaller instances
* Allow the adoption of existing Control Planes into this new mechanism
* Provide a safe mechanism to perform the sensitive operation of scaling the Control Plane and provide adequate feedback to end users about progress of the operation
* Allow users to opt-out of Control Plane management should our mechanism not fit their needs
* Allow users to customise rolling update strategies based on the needs of their environments

### Non-Goals

* Allow horizontal scaling of the Control Plane (this may be required in the future, but today is considered unsupported by OpenShift)
* Management of the etcd cluster state (the etcd operator will handle this separately)
* Automatic adoption of existing clusters (we will provide instructions to allow users to opt-in for existing clusters)
* Management of anything that falls outside the scope of Machine API (e.g. management of load balancers in front of Control Plane Machines; only load balancer membership is managed by Machine API)

## Proposal

A new CRD `ControlPlaneMachineSet` will be introduced into OpenShift and a respective `control-plane-machine-set-operator` will be introduced as a new second level operator within OpenShift to perform the operations described in this proposal.

The new CRD will define the specification for managing (creating, adopting, updating) Machines for the Control Plane. The operator will be responsible for ensuring the desired number of Control Plane Machines are present within the cluster as well as providing update mechanisms akin to those seen in StatefulSets and Deployments to allow rollouts of updated configuration to the Control Plane Machines.

### User Stories

- As a cluster administrator of OpenShift, I would like to be able to safely and automatically vertically resize my control plane as and when the demand on the control plane changes
- As a cluster administrator of OpenShift, I would like to be able to automatically recover failed Control Plane Machines (eg those that have been removed by the cloud provider, or those failing MachineHealthChecks)
- As a cluster administrator of OpenShift, I would like to be able to make changes to the configuration of the underlying hardware of my control plane and have these changes safely applied using immutable hardware concepts
- As a cluster administrator of OpenShift, I would like to be able to control rollout of changes to my Control Plane Machines such that I can test changes with a single replica before applying the change to all replicas
- (Future work) As a cluster administrator of OpenShift with restricted hardware capacity, I would like to be able to scale down my control plane before adding new Control Plane Machines with newer configuration; notably, my environment does not have capacity to add additional Machines during updates

### API Extensions

We will introduce a new `ControlPlaneMachineSet` CRD to the `machine.openshift.io/v1` API group. It will be based on the spec and status structures defined below.

```go
// ControlPlaneMachineSetSpec defines the desired state of the ControlPlaneMachineSet.
type ControlPlaneMachineSetSpec struct {
    // Replicas defines how many Control Plane Machines should be
    // created by this ControlPlaneMachineSet.
    // This field is immutable and cannot be changed after cluster
    // installation.
    // The ControlPlaneMachineSet only operates with 3 or 5 node control planes;
    // 3 and 5 are the only valid values for this field.
    // +kubebuilder:validation:Enum:=3;5
    // +kubebuilder:default:=3
    // +kubebuilder:validation:Required
    Replicas *int32 `json:"replicas"`

    // Strategy defines how the ControlPlaneMachineSet will update
    // Machines when it detects a change to the ProviderSpec.
    // +kubebuilder:default:={type: RollingUpdate}
    // +optional
    Strategy ControlPlaneMachineSetStrategy `json:"strategy,omitempty"`

    // Label selector for Machines. Existing Machines selected by this
    // selector will be the ones affected by this ControlPlaneMachineSet.
    // It must match the template's labels.
    // This field is considered immutable after creation of the resource.
    // +kubebuilder:validation:Required
    Selector metav1.LabelSelector `json:"selector"`

    // Template describes the Control Plane Machines that will be created
    // by this ControlPlaneMachineSet.
    // +kubebuilder:validation:Required
    Template ControlPlaneMachineSetTemplate `json:"template"`
}

// ControlPlaneMachineSetTemplate is a template used by the ControlPlaneMachineSet
// to create the Machines that it will manage in the future.
// +union
// + ---
// + This struct is a discriminated union which allows users to select the type of Machine
// + that the ControlPlaneMachineSet should create and manage.
// + For now, the only supported type is the OpenShift Machine API Machine, but in the future
// + we plan to expand this to allow other Machine types such as Cluster API Machines or a
// + future version of the Machine API Machine.
type ControlPlaneMachineSetTemplate struct {
    // MachineType determines the type of Machines that should be managed by the ControlPlaneMachineSet.
    // Currently, the only valid value is machines_v1beta1_machine_openshift_io.
    // +unionDiscriminator
    // +kubebuilder:validation:Required
    MachineType ControlPlaneMachineSetMachineType `json:"machineType"`

    // OpenShiftMachineV1Beta1Machine defines the template for creating Machines
    // from the v1beta1.machine.openshift.io API group.
    // +optional
    OpenShiftMachineV1Beta1Machine *OpenShiftMachineV1Beta1MachineTemplate `json:"machines_v1beta1_machine_openshift_io,omitempty"`
}

// ControlPlaneMachineSetMachineType is an enumeration of valid Machine types
// supported by the ControlPlaneMachineSet.
// +kubebuilder:validation:Enum:=machines_v1beta1_machine_openshift_io
type ControlPlaneMachineSetMachineType string

const (
    // OpenShiftMachineV1Beta1MachineType is the OpenShift Machine API v1beta1 Machine type.
    OpenShiftMachineV1Beta1MachineType ControlPlaneMachineSetMachineType = "machines_v1beta1_machine_openshift_io"
)

// OpenShiftMachineV1Beta1MachineTemplate is a template for the ControlPlaneMachineSet to create
// Machines from the v1beta1.machine.openshift.io API group.
type OpenShiftMachineV1Beta1MachineTemplate struct {
    // FailureDomains is the list of failure domains (sometimes called
    // availability zones) in which the ControlPlaneMachineSet should balance
    // the Control Plane Machines.
    // This will be merged into the ProviderSpec given in the template.
    // This field is optional on platforms that do not require placement
    // information, eg OpenStack.
    // +optional
    FailureDomains FailureDomains `json:"failureDomains,omitempty"`

    // ObjectMeta is the standard object metadata
    // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
    // Labels are required to match the ControlPlaneMachineSet selector.
    // +kubebuilder:validation:Required
    ObjectMeta ControlPlaneMachineSetTemplateObjectMeta `json:"metadata"`

    // Spec contains the desired configuration of the Control Plane Machines.
    // The ProviderSpec within contains platform specific details
    // for creating the Control Plane Machines.
    // The ProviderSpec should be complete apart from the platform specific
    // failure domain field. This will be overridden when the Machines
    // are created based on the FailureDomains field.
    // +kubebuilder:validation:Required
    Spec machinev1beta1.MachineSpec `json:"spec"`
}

// ControlPlaneMachineSetTemplateObjectMeta is a subset of the metav1.ObjectMeta struct.
// It allows users to specify labels and annotations that will be copied onto Machines
// created from this template.
type ControlPlaneMachineSetTemplateObjectMeta struct {
    // Map of string keys and values that can be used to organize and categorize
    // (scope and select) objects. May match selectors of replication controllers
    // and services.
    // More info: http://kubernetes.io/docs/user-guide/labels
    // +optional
    Labels map[string]string `json:"labels,omitempty"`

    // Annotations is an unstructured key value map stored with a resource that may be
    // set by external tools to store and retrieve arbitrary metadata. They are not
    // queryable and should be preserved when modifying objects.
    // More info: http://kubernetes.io/docs/user-guide/annotations
    // +optional
    Annotations map[string]string `json:"annotations,omitempty"`
}

// ControlPlaneMachineSetStrategy defines the strategy for applying updates to the
// Control Plane Machines managed by the ControlPlaneMachineSet.
type ControlPlaneMachineSetStrategy struct {
    // Type defines the type of update strategy that should be
    // used when updating Machines owned by the ControlPlaneMachineSet.
    // Valid values are "RollingUpdate" and "OnDelete".
    // The current default value is "RollingUpdate".
    // +kubebuilder:default:="RollingUpdate"
    // +kubebuilder:validation:Enum:="RollingUpdate";"OnDelete"
    // +optional
    Type ControlPlaneMachineSetStrategyType `json:"type,omitempty"`

    // This is left as a struct to allow future rolling update
    // strategy configuration to be added later.
}

// ControlPlaneMachineSetStrategyType is an enumeration of different update strategies
// for the Control Plane Machines.
type ControlPlaneMachineSetStrategyType string

const (
    // RollingUpdate is the default update strategy type for a
    // ControlPlaneMachineSet. This will cause the ControlPlaneMachineSet to
    // first create a new Machine and wait for this to be Ready
    // before removing the Machine chosen for replacement.
    RollingUpdate ControlPlaneMachineSetStrategyType = "RollingUpdate"

    // Recreate causes the ControlPlaneMachineSet controller to first
    // remove a Control Plane Machine before creating its
    // replacement. This allows for scenarios with limited capacity
    // such as baremetal environments where additional capacity to
    // perform rolling updates is not available.
    Recreate ControlPlaneMachineSetStrategyType = "Recreate"

    // OnDelete causes the ControlPlaneMachineSet to only replace a
    // Machine once it has been marked for deletion. This strategy
    // makes the rollout of updated specifications into a manual
    // process. This allows users to test new configuration on
    // a single Machine without forcing the rollout of all of their
    // Control Plane Machines.
    OnDelete ControlPlaneMachineSetStrategyType = "OnDelete"
)

// FailureDomains represents the different configurations required to spread Machines
// across failure domains on different platforms.
// +union
type FailureDomains struct {
    // Platform identifies the platform that the FailureDomains represent.
    // +unionDiscriminator
    // +optional
    Platform configv1.PlatformType `json:"platform,omitempty"`

    // AWS configures failure domain information for the AWS platform
    // +optional
    AWS *[]AWSFailureDomain `json:"aws,omitempty"`

    // Azure configures failure domain information for the Azure platform
    // +optional
    Azure *[]AzureFailureDomain `json:"azure,omitempty"`

    // GCP configures failure domain information for the GCP platform
    // +optional
    GCP *[]GCPFailureDomain `json:"gcp,omitempty"`

    // OpenStack configures failure domain information for the OpenStack platform
    // +optional
    OpenStack *[]OpenStackFailureDomain `json:"openstack,omitempty"`
}

// AWSFailureDomain configures failure domain information for the AWS platform
// +kubebuilder:validation:MinProperties:=1
type AWSFailureDomain struct {
    // Subnet is a reference to the subnet to use for this instance.
    // If no subnet reference is provided, the Machine will be created in the first
    // subnet returned by AWS when listing subnets within the provided availability zone.
    // +optional
    Subnet *AWSResourceReference `json:"subnet,omitempty"`

    // Placement configures the placement information for this instance
    // +optional
    Placement AWSFailureDomainPlacement `json:"placement,omitempty"`
}

// AWSFailureDomainPlacement configures the placement information for the AWSFailureDomain
type AWSFailureDomainPlacement struct {
    // AvailabilityZone is the availability zone of the instance
    // +kubebuilder:validation:Required
    AvailabilityZone string `json:"availabilityZone"`
}

// AzureFailureDomain configures failure domain information for the Azure platform
type AzureFailureDomain struct {
    // Availability Zone for the virtual machine.
    // If nil, the virtual machine should be deployed to no zone.
    // +kubebuilder:validation:Required
    Zone string `json:"zone"`
}

// GCPFailureDomain configures failure domain information for the GCP platform
type GCPFailureDomain struct {
    // Zone is the zone in which the GCP machine provider will create the VM.
    // +kubebuilder:validation:Required
    Zone string `json:"zone"`
}

// OpenStackFailureDomain configures failure domain information for the OpenStack platform
type OpenStackFailureDomain struct {
    // The availability zone from which to launch the server.
    // +kubebuilder:validation:Required
    AvailabilityZone string `json:"availabilityZone"`
}

// ControlPlaneMachineSetStatus represents the status of the ControlPlaneMachineSet CRD.
type ControlPlaneMachineSetStatus struct {
    // Conditions represents the observations of the ControlPlaneMachineSet's current state.
    // Known .status.conditions.type are: (TODO)
    // TODO: Identify different condition types/reasons that will be needed.
    // +patchMergeKey=type
    // +patchStrategy=merge
    // +listType=map
    // +listMapKey=type
    // +optional
    Conditions []metav1.Condition `json:"conditions,omitempty"`

    // ObservedGeneration is the most recent generation observed for this
    // ControlPlaneMachineSet. It corresponds to the ControlPlaneMachineSet's generation,
    // which is updated on mutation by the API Server.
    // +optional
    ObservedGeneration int64 `json:"observedGeneration,omitempty"`

    // Replicas is the number of Control Plane Machines created by the
    // ControlPlaneMachineSet controller.
    // Note that during update operations this value may differ from the
    // desired replica count.
    // +optional
    Replicas int32 `json:"replicas,omitempty"`

    // ReadyReplicas is the number of Control Plane Machines created by the
    // ControlPlaneMachineSet controller which are ready.
    // +optional
    ReadyReplicas int32 `json:"readyReplicas,omitempty"`

    // UpdatedReplicas is the number of non-terminated Control Plane Machines
    // created by the ControlPlaneMachineSet controller that have the desired
    // provider spec.
    // +optional
    UpdatedReplicas int32 `json:"updatedReplicas,omitempty"`

    // UnavailableReplicas is the number of Control Plane Machines that are
    // still required before the ControlPlaneMachineSet reaches the desired
    // available capacity. When this value is non-zero, the number of
    // ReadyReplicas is less than the desired Replicas.
    // +optional
    UnavailableReplicas int32 `json:"unavailableReplicas,omitempty"`
}
```
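
To illustrate how these structures compose, a complete `ControlPlaneMachineSet` for an AWS cluster might look like the sketch below. This is illustrative only: the label keys follow the conventions of existing Machine API resources, and the instance type and truncated provider spec are assumptions, not final values.

```yaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: AWS
        aws:
          - placement:
              availabilityZone: us-east-1a
          - placement:
              availabilityZone: us-east-1b
          - placement:
              availabilityZone: us-east-1c
      metadata:
        # must match the selector above
        labels:
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            kind: AWSMachineProviderConfig
            instanceType: m6i.xlarge
            # remaining platform specific fields omitted for brevity;
            # the availabilityZone and subnet are left unset here so
            # that they can be injected from the failureDomains above
```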

### Implementation Details/Notes/Constraints

The ControlPlaneMachineSet controller aims to act similarly to the Kubernetes StatefulSet controller. It will take the desired configuration, the `ProviderSpec`, and ensure that an appropriate number of Machines exist within the cluster which match this specification.

It will be introduced as a new second level operator so that, if there are issues with its operation, it may report this via a `ClusterOperator` and prevent upgrades or further disruption until the issues have been rectified. Due to the nature of the actions performed by this controller, we believe it is important that it have its own `ClusterOperator` in which to report its current status.

The `ControlPlaneMachineSet` CRD will be limited to a singleton within a standard OpenShift HA cluster; the only allowed name will be `cluster`. This matches other high level CRD concepts such as the `Infrastructure` object. The operator will operate solely on the `openshift-machine-api` namespace as with other Machine API components. Only `ControlPlaneMachineSets` in this namespace will be reconciled. The resource must be namespaced to ensure compatibility with future OpenShift projects such as Cluster API and centralised machine management patterns.

The behaviour of such a controller is complex, and as such, various features of the controller and scenarios are outlined in the details below.

#### Desired number of Machines

At present, the only source of the installed control plane size within OpenShift clusters exists within the `cluster-config-v1` ConfigMap in the `kube-system` namespace. This ConfigMap has been deprecated for some time and as such, should not be used as a source of scale information in new projects.

Due to this limitation, we will need to have a desired replicas field within the `ControlPlaneMachineSet` resource. As we are not planning to tackle horizontal scaling of the control plane initially, we will implement a validating webhook to deny changes to this value once set.

For new clusters, the installer will create (timeline TBD) the `ControlPlaneMachineSet` resource and will set the value based on the install configuration. For existing clusters, we will need to validate during creation of the `ControlPlaneMachineSet` resource that the number of existing Control Plane Machines matches the replica count set within the `ControlPlaneMachineSet`. This will prevent end users from attempting to horizontally scale their control plane during creation of the `ControlPlaneMachineSet` resource.

In the future, once we have identified risks and issues with horizontal scale, and mitigated those, we will remove the immutability restriction on the replicas field to allow horizontal scaling of the control plane between 3 and 5 replicas as per the current support policy for control plane sizing.

#### Selecting Machines

The ControlPlaneMachineSet operator will use the selector defined within the CRD to find Machines which it should consider to be within the set of control plane Machines it is responsible for managing.

This set should be the complete control plane and there should be a 1:1 mapping of Machines in this set to the control plane nodes within the cluster.

If there are any control plane nodes (identified by the node role) which do not have a Machine, the operator will mark itself degraded as this is an unexpected state. No further action (ie creating/deleting Control Plane Machines, or any rollouts or updates that need to be applied) will be taken until the unknown node has either been removed from the cluster or a Machine has been manually created for it.

#### Providing high availability/fault tolerance within the control plane

Typically within OpenShift, Machines are created within a MachineSet. MachineSets have a single `ProviderSpec` which defines the configuration for the Machines created by the MachineSet. The failure domain (sometimes called availability zone) is a part of this provider spec and as such, all Machines within the MachineSet share a failure domain.

This is undesirable for Control Plane Machines as we wish to have them spread across multiple availability zones to reduce the likelihood of datacenter level faults degrading the control plane. To this end, the failure domains for the control plane will be set on the `ControlPlaneMachineSet` spec directly.

When creating Machines, the `ControlPlaneMachineSet` controller will balance the Machines across these failure domains by injecting the desired failure domain for the new Machine into the provider spec based on the platform specific failure domain field.

##### Failure domains

The `FailureDomains` field is expected to be populated by the user/installer based on the topology of the Control Plane Machines. For example, we expect on AWS that this will contain a list of availability zones within a single region. Note, we are explicitly not expecting users to add different regions to the `FailureDomains` as we do not support running OpenShift across multiple regions.

Note that the `FailureDomains` field is only supported on certain platforms, currently AWS, Azure, GCP and OpenStack; other platforms may be supported in the future.

The users will be allowed to override a small amount of configuration within the `providerSpec` based on the configuration required to spread Machines across different failure domains. For example, on AWS, both the `availabilityZone` and `subnet` differ depending on which failure domain is configured. On other platforms, eg Azure, GCP or OpenStack, only one field is required to vary the failure domain, in which case this is all that will be allowed.

The overrides will be injected into the given `providerSpec` before creating the Machines as part of the balancing logic within the `ControlPlaneMachineSet` operator.

As an example, a user on AWS may set their `FailureDomains` as:

```yaml
failureDomains:
  aws:
    - placement:
        availabilityZone: us-east-1a
      subnet:
        filters:
          - name: "tag:Name"
            values:
              - "my-cluster-subnet-1a"
    - placement:
        availabilityZone: us-east-1b
      subnet:
        filters:
          - name: "tag:Name"
            values:
              - "my-cluster-subnet-1b"
```

A user on Azure may set their `FailureDomains` as:

```yaml
failureDomains:
  azure:
    - zone: "1"
    - zone: "2"
```
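
The GCP and OpenStack failure domain types defined in the API above follow the same single field pattern. Illustrative sketches (the zone and availability zone names are examples only) might look as follows:

```yaml
failureDomains:
  gcp:
    - zone: us-central1-a
    - zone: us-central1-b
```

```yaml
failureDomains:
  openstack:
    - availabilityZone: az1
    - availabilityZone: az2
```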

###### Failure Domains on vSphere

As of OpenShift 4.10, there is no concept of a zone within vSphere and as such we would expect that the `FailureDomains` would be omitted for vSphere.

However, [there is future work](https://github.com/openshift/enhancements/pull/918) to include zone support for vSphere within 4.11. Once this is implemented, the Machine API provider for vSphere will understand the concept of zones, defined within the `Infrastructure` resource. The `ControlPlaneMachineSet` will then be able to balance Machines across the different zones by setting the new zone information within the provider specs as it does on other platforms.

#### Ensuring Machines match the desired state

To ensure Machines are up to date with the desired configuration, the `ControlPlaneMachineSet` controller will leverage a similar pattern to the workload controllers. It will hash the template and compare the hashed result with the spec of the Machines within the managed set. As we expect the failure domain to vary across the managed Machines, this will need to be omitted before the hash can be calculated.

Should any Machine not match the desired hash, it will be updated based on the chosen update strategy.

##### The RollingUpdate update strategy

The RollingUpdate strategy will be the default. It mirrors the RollingUpdate strategy familiar to users from Deployments. When a change needs to be rolled out, the ControlPlaneMachineSet controller will create a new Control Plane Machine, wait for this to become ready, and then remove the old Machine. It will repeat this process until all Machines are up to date. During this process the etcd membership will be protected by the mechanism described in a [separate enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/protecting-etcd-quorum-during-control-plane-scaling.md). In particular, it is not essential for etcd's sake that the Control Plane Machines are updated in a rolling fashion; however, to avoid potential issues with the spread of etcd members across failure domains during an update, the `ControlPlaneMachineSet` will perform a rolling update domain by domain.

At first, we will not allow any configuration of the RollingUpdate (as you might see in a Deployment) and will pin the surge to 1 replica. We may wish to change this later once we have more data about the stability of this operator and the etcd protection mechanism it relies on.

We expect this strategy to be used in most applications of the `ControlPlaneMachineSet`, though it is not appropriate in all environments. For example, we expect this strategy not to be used in environments where capacity is very limited and a surge of any control plane capacity is unavailable.
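
As an illustrative sketch of what an administrator might observe, midway through the first replacement on a 3 replica control plane the status fields defined earlier would report the surged Machine roughly as follows (values are assumptions based on the field descriptions above, not final behaviour):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
status:
  # a fourth, surged Machine has been created with the new spec
  replicas: 4
  # only the surged Machine matches the desired template hash
  updatedReplicas: 1
  # the three old Machines remain Ready while the new one provisions,
  # so the desired available capacity is still met
  readyReplicas: 3
  unavailableReplicas: 0
```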

##### The Recreate update strategy (FUTURE WORK)

Note: This strategy is not planned for implementation in the initial phase of this project. There are a number of open questions that need to be ironed out and the use case is not immediately required within OpenShift. The details of the strategy are left here as a basis for future work.

The Recreate strategy mirrors the Recreate strategy familiar to users from Deployments. When a change needs to be rolled out, the ControlPlaneMachineSet controller will first remove a Control Plane Machine, wait for its removal, and then create a new Control Plane Machine.

This strategy is intended to be used only in very resource constrained environments where there is no capacity to introduce an extra temporary Control Plane Machine (preventing the RollingUpdate strategy).

At present, when using this strategy, the updates will need manual intervention. The etcd protection design will prevent the Control Plane Machine from being removed until a replacement for the etcd member has been introduced into the cluster. To allow for this use case, the end user can manually remove the etcd protection (by removing the Machine deletion hook on the deleted Machine), allowing the rollout to proceed; the etcd operator will not re-add the protection if the Machine is already marked for deletion.

This strategy introduces risk into the update of the Machines. While the old Machine is being drained/removed and the new Machine is being provisioned, etcd quorum is at risk as a member has been removed from the cluster. In most circumstances this would leave the cluster with just 2 etcd members. This poses a similar risk to the etcd cluster as is present during a cluster upgrade when the Machine Config Daemon reboots the Control Plane Machines; however, in this case, the duration of the member being down is expected to be much longer than in the update process. For example, baremetal clusters can take over an hour to reprovision a host.

There are a number of open questions related to this update strategy:
- Are we comfortable offering this strategy given the risks that it poses?
- What alternative can we provide to satisfy resource constrained environments if we do not implement this strategy?
- Should we teach the etcd operator to remove the protection mechanism when the `ControlPlaneMachineSet` is configured with this strategy?
- To minimise risk, should we allow this strategy only on certain platforms? (Eg disallow the strategy on public clouds)
- Do we want to name the update strategy in a way that highlights the risks associated, eg `RecreateUnsupported`?

##### The OnDelete update strategy

The OnDelete strategy mirrors the OnDelete strategy familiar to users from StatefulSets. When a change is required, any logic for rolling out the Machine will be paused until the `ControlPlaneMachineSet` controller observes a deleted Machine. When a Machine is marked for deletion, the controller will create a new Machine based on the current spec. It will then proceed with the replacement as normal.

This strategy will allow explicit control for end users to decide when their Control Plane Machines are updated. In particular it will also allow them to have different specs across their control plane. While in the normal case this is discouraged, this could be used for short periods to test new configuration before rolling the updated configuration out to all Control Plane Machines.

Note that the OnDelete strategy is similar to the RollingUpdate strategy in that it needs to be able to add new Machines to the cluster to function. It can be thought of as a slower, more controlled RollingUpdate strategy where the user decides when each Control Plane Machine is replaced, and marks it for replacement by deleting it. Otherwise the replacement is identical to the RollingUpdate strategy.

#### Protecting etcd quorum during update operations

With the introduction of the [etcd protection enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/protecting-etcd-quorum-during-control-plane-scaling.md), the ControlPlaneMachineSet does not need to observe anything etcd related during scaling operations. In particular, it is the etcd operator's responsibility to add Machine deletion hooks to prevent Control Plane Machines from being removed until they are no longer needed. When the `ControlPlaneMachineSet` controller observes that a newly created Machine is ready, it will delete the old Control Plane Machine, signalling to the etcd operator to switch the membership of the etcd cluster between the old and new instances. Once the membership has been switched, the Machine deletion hook will be removed on the old Machine, allowing it to be removed by the Machine controller in the normal way.
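
As a sketch, assuming the pre-drain lifecycle hook name and owner proposed in the linked etcd enhancement, a protected Control Plane Machine would carry a deletion hook similar to:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: my-openshift-cluster-abcde-master-0
  namespace: openshift-machine-api
spec:
  lifecycleHooks:
    preDrain:
      # added and removed by the etcd operator, never by the
      # ControlPlaneMachineSet controller
      - name: EtcdQuorumOperator
        owner: clusteroperator/etcd
```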

#### Removing/disabling the ControlPlaneMachineSet

As some users may want to remove or disable the ControlPlaneMachineSet, a finalizer will be placed on the `ControlPlaneMachineSet` to allow the controller to ensure a safe removal of the ControlPlaneMachineSet, while leaving the Control Plane Machines in place.

Notably it will need to ensure that there are no owner references on the Machines pointing to the ControlPlaneMachineSet instance. This will prevent the garbage collector from removing the Control Plane Machines when the `ControlPlaneMachineSet` is deleted.

If users later wish to reinstall the `ControlPlaneMachineSet`, they are free to do so.

Note: Owner references will be added to the Control Plane Machines to identify to other components that a controller is managing the state of these Machines. This allows other systems such as the MachineHealthCheck to identify that, if they were to take an action on the Machine, the Control Plane Machine Set Operator will react to that action.
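
For illustration, the owner reference placed on each managed Machine might look like the snippet below; the finalizer logic removes this reference before the `ControlPlaneMachineSet` itself is deleted, so the garbage collector never considers the Machines for removal. The Machine name is an assumption following the naming scheme described later in this document.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: my-openshift-cluster-abcde-master-fghij-0
  namespace: openshift-machine-api
  ownerReferences:
    - apiVersion: machine.openshift.io/v1
      kind: ControlPlaneMachineSet
      name: cluster
      controller: true
      blockOwnerDeletion: true
      # uid omitted; set to the uid of the cluster ControlPlaneMachineSet
```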

#### Installing a ControlPlaneMachineSet within an existing cluster

When adding a ControlPlaneMachineSet to an existing cluster, the end user will need to define the `ControlPlaneMachineSet` resource by copying the existing Control Plane Machine ProviderSpecs. Once this is copied, they should remove the failure domain and add the desired failure domains to the `FailureDomains` field within the ControlPlaneMachineSet spec.

To ensure adding a `ControlPlaneMachineSet` to the cluster is safe, we will need to ensure via a validating webhook that the replica count defined in the spec is consistent with the actual size of the control plane within the cluster. We will also validate that the failure domains align with those within the cluster already; this will prevent users from accidentally migrating their Control Plane Machines from multiple availability zones to a single zone.

If no Control Plane Machines exist, or they are in a non-Running state, the operator will report degraded until this issue is resolved. This creates a dependency for the `ControlPlaneMachineSet` operator on the Machine API. It will be required to run at a higher run-level than Machine API.

This in turn means that in UPI clusters (where typically Machine objects do not exist) or misconfigured clusters, adding a `ControlPlaneMachineSet` will result in a degraded cluster. Users will need to remove the invalid `ControlPlaneMachineSet` resource, or manually add correctly configured Control Plane Machines to their cluster to restore their cluster to a healthy state.

We do not recommend that UPI users attempt to adopt their control plane instances into Machine API due to the likelihood that the resulting spec does not match the existing Machines. Instead, we recommend that UPI users initially go through the process of replacing their control plane instances with Machines by creating new Control Plane Machines, allowing Machine API to create the instances, and then removing their old, manually created control plane instances. This effectively means migrating their entire control plane onto Machine API before the `ControlPlaneMachineSet` will take over the management. We enforce this recommendation so that users can be confident in the Machine `providerSpec` that they have configured and that Machine API will be able to create valid Machines from the spec before they then start using our automation.

##### Why not to populate the spec for the customer

One idea posed is to allow the `ControlPlaneMachineSet` to be automatically populated if the spec is left blank when it is created. This would improve the UX by allowing customers not to have to worry about the spec and allow it to be inferred from the existing Machines within the cluster. It would also make it easier for managed services to adopt `ControlPlaneMachineSets` throughout the fleet without having to manually check each cluster.

There are however a few concerns about this idea which mean we are not planning to implement it (at least for the first iteration of the project):
- It may encourage users to attempt to use `ControlPlaneMachineSets` with UPI clusters which are incorrectly configured
  - Some users have Machines in their cluster that aren't actually configured correctly; we would therefore be creating an incorrectly configured `ControlPlaneMachineSet` for them. It is then unclear who is at fault for this and we could end up with an increased number of support tickets due to these misconfigurations
  - As an example, this is particularly prevalent in UPI clusters where users can forget to remove the Machine manifests from the install manifest directory; this has been the source of many bugs where users have had Machines stuck in Provisioning for an extended period. Inferring the specs from these Machines would certainly make the `ControlPlaneMachineSet` invalid.
- Some users may have made out of band changes to their Control Plane Machines which are not reflected within the Machine specs
  - We document the procedure within product docs of replacing a Control Plane Machine, though it is a manual process that doesn't involve using the Machine API. Within the process we instruct users to resize their VM within the cloud, and then update their Machine `providerSpec` to match the changes in the cloud. We suspect that there are a number of users who have made this change, or other similar changes that are not reflected in the `providerSpec`, and as such, inferring the configuration from these Machines would create an incorrect spec.

If we do implement some adoption process in the future, we should also include a `paused` field within the spec, set to `true` by the adoption logic, that prevents the `ControlPlaneMachineSet` from taking any action until someone has reviewed the inferred spec and marked `paused: false`. This would allow a sanity check to be enforced before the `ControlPlaneMachineSet` takes any actions based on the inferred spec.
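
If such a process were implemented, this hypothetical field might be used as follows (this field does not exist in the API defined above):

```yaml
spec:
  # hypothetical future field: set to true by the adoption logic so that
  # no action is taken until an administrator has reviewed the inferred
  # spec and explicitly set paused: false
  paused: true
```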
An alternative way to populate the spec could be to build a plugin for `oc` which would inspect the Control Plane Machines and print out a `ControlPlaneMachineSet` for the customer to create. This would also allow the customer to inspect the resource before they create it within the cluster and would make it a very conscious decision on their part to create the `ControlPlaneMachineSet`.

#### Naming of Control Plane Machines owned by the ControlPlaneMachineSet

In an IPI OpenShift cluster, the Control Plane Machines are named after the cluster ID followed by an index, for example `my-openshift-cluster-abcde-master-0`. As we will need to create additional Machines during replacement operations, we cannot reuse the names of the existing Machines. Instead we will generate a random, 5 character string, and add this before the index of the Machine. For example, Machines created by a `ControlPlaneMachineSet` may have a name such as `my-openshift-cluster-abcde-master-fghij-0`. This should make it clear to the end user which Machine we are attempting to replace with the newer Machine. As we typically spread Machines across multiple availability zones, we will keep the indexes consistent with the availability zone to which they were originally assigned.

As users can today replace their Machines with differently named Machines (eg as part of a recovery process), we do not expect other components to be relying on the index of the Machines for any function within OpenShift, but we may discover during testing something that is currently unknown to us. If we discover issues with other components relying on a particular naming scheme for the Control Plane Machines, we will need to solve this issue within the other component.

#### Delivery via the core OpenShift payload

This new operator will form part of the core OpenShift release payload for clusters installed on/upgrading to OpenShift 4.11. While there is a movement within OpenShift to stop adding new components to the release payload, we believe that this new operator should be added to the core for the following reasons:

- The scaling operations enabled by this operator are often required when there is an imminent cluster failure due to an overwhelmed control plane. Having the `ControlPlaneMachineSet` already installed means the users will be able to scale their control plane without first having to learn that this project exists or installing/configuring it.
- By having this installed by the installer, there is a lower likelihood of issues where configuration within the `ControlPlaneMachineSet` is invalid, such that it wouldn't be able to create new Machines when required. The installer already has logic for creating MachineSets, which will form a basis of the installation of the `ControlPlaneMachineSet`.
- Having this installed by default will make it easier for managed services to manage. The managed services team intend to leverage this operator across thousands of clusters. Installing the operator and configuring it correctly for each new cluster is a significant amount of work for them to automate. Including this within the installer would make this easier to roll out to new clusters in the fleet.
- The operator adds a lot of value for recovery of failed Control Plane Machines in clusters using Machine API. If the operator is installed by default, users are more likely to be able to correctly create new Machines during the cluster recovery process.

This operator however is not critical to the OpenShift cluster, and while we believe this should be installed by default, it should be made optional via the install time [component-selection](https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md) mechanism currently being implemented.

#### How does this new operator fit within the Cluster API landscape

Within Cluster API, a concept exists known as a Control Plane Provider. This component, currently with a single upstream reference implementation based on kubeadm, is intended to instantiate and manage a control plane for the Kubernetes guest cluster.

The Control Plane Provider is responsible not only for creating the infrastructure for the Control Plane Machines but also etcd and the control plane Kubernetes components (API server, Controller Manager, Scheduler, Cloud Controller Manager). Within OpenShift, various operators implement the management and responsibility of these components; however, to date we do not have a Machine Infrastructure operator that fits this role.

In a future iteration of the `ControlPlaneMachineSet`, we could use it to satisfy the Cluster API Control Plane Provider contract and fill the role within OpenShift clusters running on CAPI. To ensure that this is a possibility, we are planning to make the `ControlPlaneMachineSet` compatible, as much as possible, with the [CRD contract](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html#crd-contracts) for the Control Plane Provider in Cluster API. Importantly, we are designing the CRD API with the intention of making it API compatible in the future without making any breaking changes or needing to bump the API version of the `ControlPlaneMachineSet`. Notably, an API restriction imposed by making this resource compatible with Cluster API is that the resource MUST be Namespaced, and not Cluster scoped.

The notable exception to this is that because Cluster API uses separate resources for Machine templates, and Machine API embeds these directly within the spec, we will follow the Machine API convention in the first iteration of this CRD and may evolve it at a later date by adding the additional fields required to satisfy the Machine template within Cluster API.

The `template` within the `spec` is designed such that we can add additional supported machine types in the future. We will add a new struct `machines..cluster-x.k8s.io` to the discriminated union to allow users to specify the template for creating Cluster API machines via the `ControlPlaneMachineSet`.
The templates for different machine types are mutually exclusive and as such, users will only be able to set one template type at a time.

We are planning to synchronise and convert resources from Machine API to Cluster API resources as part of our Cluster API proof of concept. When converting resources, such as `MachineSets` and `Machines`, we will also convert the `ControlPlaneMachineSet` in the same manner. Users will be able to select whether they want the Machine API or Cluster API version of the resource to be authoritative, and therefore which should have controllers operate on it.

The Control Plane Provider in Cluster API is also responsible for creating and maintaining a Kubeconfig file that the Cluster API components can use to manage resources within the guest cluster. Within the OpenShift Cluster API Technical Preview, we are handling this Kubeconfig generation with a separate component, and we will continue to do this even if we make this new operator satisfy the Control Plane Provider contract.

Alternatively, instead of making the `ControlPlaneMachineSet` satisfy the Control Plane Provider contract, we may introduce another CRD to act as a proxy, gathering information from the other components within the OpenShift cluster to satisfy the requirements of the Control Plane Provider contract. Further investigation will be required in the future to determine how exactly we want to handle this compatibility.

#### Interaction with Machine Health Check

In OpenShift, customers may choose to use MachineHealthCheck resources to remove failed Machines from their clusters and have them replaced automatically by a MachineSet. MachineHealthCheck requires that a Machine has an Owner Reference before it will remove the Machine; this prevents the removal of a Machine that is not going to be replaced.

As the Control Plane Machines will now be owned by the `ControlPlaneMachineSet`, MachineHealthChecks will now be compatible with Control Plane Machines. We believe this to be safe due to the design of the `ControlPlaneMachineSet` operator and the protection mechanism being implemented to protect etcd quorum. If at any point a Machine is deleted, the protection system will ensure no Machine is actually removed until a replacement has been brought in to replace it within the etcd cluster.

We expect that if a user wants automated remediation for their Control Plane Machines, they will configure a MachineHealthCheck to point to the Control Plane Machines, but we will not configure this for them.

Notably, if a user does wish to use a MachineHealthCheck with the Control Plane Machines, we advise them to configure the MachineHealthCheck to observe only Control Plane Machines and to have the `maxUnhealthy` field set to 1. These recommendations will ensure that if more than one Control Plane Machine appears unhealthy at once, the MachineHealthCheck will take no action on the Machines. It is likely that if more than one Control Plane Machine appears unhealthy, either the etcd cluster is degraded, or a scaling operation is already taking place to replace a failed Machine.
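
A MachineHealthCheck following these recommendations might look like the sketch below; the label selector and timeouts are illustrative assumptions based on the conventions of existing Machine API resources.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: control-plane-health
  namespace: openshift-machine-api
spec:
  # never remediate if more than one Control Plane Machine is unhealthy
  maxUnhealthy: 1
  # observe only Control Plane Machines
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```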

#### Management of Control Plane load balancers

In an OpenShift cluster, in general, there is a concept of an internal and external load balancer in front of the Kubernetes API. These load balancers are created by the installer on various cloud providers but are later considered, for the most part, to be unmanaged.

To enable Control Plane Machine replacement, Machine API handles adding and removing Control Plane Machines from these load balancers on appropriate platforms (eg AWS, Azure and GCP). Therefore, when a customer today replaces their Control Plane Machine, they do not need to worry about the load balancer attachment as this is automated for them.

As this is already a part of the Machine management directly, the `ControlPlaneMachineSet` does not need to be concerned with load balancer management itself.

On other platforms where virtual load balancers are employed (via Keepalived and HAProxy), such as vSphere or OpenStack, the Kubernetes API load balancing is all in cluster and therefore is not required to be modified during a Control Plane Machine replacement.

On platforms that do not yet support load balancer management (eg IBM and Alibaba), this will need to be implemented in a similar manner to that of AWS, Azure and GCP before these platforms can be supported by `ControlPlaneMachineSets`.

### Risks and Mitigations

#### etcd quorum must be preserved throughout scaling operations

As we are planning to scale up/down Control Plane Machines in an automated fashion, scaling operations will inevitably affect the stability of the etcd cluster. To prevent disruption, we have an [existing mechanism](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/protecting-etcd-quorum-during-control-plane-scaling.md) that was designed to allow the etcd operator to protect etcd quorum without other systems, such as Machine API, having any knowledge of the state of the etcd cluster.

The protection mechanism is designed so that, even if a Machine is deleted, nothing will happen to the Machine (ie no drain, no removal on the cloud provider) until the etcd operator allows the removal to proceed. This prevents any data loss or quorum loss as the Machine will remain within the cluster until the etcd operator is confident that it no longer needs the Machine.

The only time etcd quorum may be at risk is during the Recreate update strategy; this is highlighted in more detail above.

#### Users may delete Control Plane Machines manually

If a user were to delete the Control Plane Machines using `oc`, `kubectl` or some other API call, the `ControlPlaneMachineSet` operator is designed in such a way that this should not pose a risk to the cluster.

The etcd protection mechanism will prevent removal of the Machines until they are no longer required for the etcd quorum. The `ControlPlaneMachineSet` operator will, one by one, add new Control Plane Machines based on the existing spec and wait for these to join the cluster. Once the new Machines have joined, they will replace the deleted Machines as normal with the process outlined earlier in this document.

#### Machine Config may change during a rollout

There may be occasions where the Machine Config operator attempts to roll out new Machine Config during a Control Plane scaling operation. We do not believe this will cause issues, but it may extend the time taken for the scaling operation to take place.

While a scaling operation is in progress, etcd quorum is protected by the protection mechanism mentioned above as well as the etcd quorum guard. The quorum guard will ensure that at most one etcd instance is interrupted at any time.
If the Machine Config operator needs to roll out an update, it will proceed in the usual manner, while the etcd learner process (part of scaling up a new etcd member) may suffer a delay due to the restart of the Control Plane Machines.

#### A user deletes the ControlPlaneMachineSet resource

Users will be allowed to remove `ControlPlaneMachineSet` resources should they so desire. This shouldn't pose a risk to the control plane as the `ControlPlaneMachineSet` will orphan Machines it manages before it is removed from the cluster. More detail is available in the [Removing/disabling the ControlPlaneMachineSet](#removingdisabling-the-controlplanemachineset) notes.

#### The ControlPlaneMachineSet spec may not match the existing Machines

It is likely that in a number of scenarios, the spec attached to the `ControlPlaneMachineSet` for creating new Machines may not match that of the existing Machines. In this case, we expect the `ControlPlaneMachineSet` operator will attempt a rolling update when it is created. Apart from potentially refreshing the Machines needlessly, this shouldn't have a negative effect on the cluster assuming the new configuration is valid. If it is invalid, the cluster will degrade as the new Machines created by the `ControlPlaneMachineSet` will fail to launch.

We have designed this operator with IPI style clusters in mind. We expect that the Machine objects within the cluster should match the actual hosts on the infrastructure provider. In IPI clusters this is ensured by the installer. Users may have, out of band, then modified these instances, but unfortunately we have no way to deal with this at present. In these scenarios, when the Machines are replaced, they may not exactly match the previous iteration and therefore may cause issues for the customer. If a customer wishes to mitigate the chances of issues occurring, they may wish to manually control the roll out using the OnDelete update strategy.

## Design Details

### Open Questions

Some open questions currently exist, but are called out in the Recreate Update Strategy notes.

1. How do we wish to introduce this feature into the cluster; should it be a Technical Preview in the first instance?

### Test Plan

As this project operates solely on resources within the Kubernetes environment, we will aim to leverage the controller runtime [envtest](https://book.kubebuilder.io/reference/envtest.html) project to write an extensive integration test suite for the main functionality of the new operator.

In particular, tests should include:
- Behaviour when there is no `ControlPlaneMachineSet` within the cluster
- Behaviour when a new `ControlPlaneMachineSet` is created
  - What happens with regard to the adoption of existing Control Plane Machines
  - What happens if the configuration differs from the existing Machines within the cluster (per strategy)
  - How the ClusterOperator status presents during the adoption
  - What happens if the name is wrong or there are multiple `ControlPlaneMachineSets`
  - How the status gets updated during this process
- Behaviour during a configuration change
  - How the RollingUpdate strategy behaves
  - How the OnDelete strategy behaves
  - How the status gets updated during rollouts
- Behaviour when a `ControlPlaneMachineSet` is removed
  - The effect on the Machines (removing owner references etc)
  - Removal of the finalizer

Notably, these tests will involve a large amount of simulation as we won't have a running Machine controller, nor will we have an etcd operator adding Machine deletion hooks. The functions of these two operators will be simulated to exercise the `ControlPlaneMachineSet` operator based on the behaviours that the operator relies on.

As this isn't a true E2E test, we will also duplicate a number of the above tests into an E2E suite that can run on a real cluster. This E2E suite has the potential to be very disruptive; however, we expect that a sufficiently well written integration suite should prevent this.

We will ensure that the origin E2E suite is updated to include tests for this component to prevent regressions in either the etcd operator or Machine API ecosystem from breaking the functionality provided by this operator.

### Graduation Criteria

TBD

#### Dev Preview -> Tech Preview

TBD

#### Tech Preview -> GA

TBD

#### Removing a deprecated feature

This enhancement does not describe removing a feature of OpenShift, so this section is not applicable.

### Upgrade / Downgrade Strategy

When the new operator is introduced into the cluster, it will not operate until a `ControlPlaneMachineSet` resource has been created. This means we do not need to worry about upgrades during the introduction of this resource.

### Version Skew Strategy

The `ControlPlaneMachineSet` operator relies on the Machine API. This is a stable API within OpenShift and we are not expecting changes that will cause version skew issues over the next handful of releases.

### Operational Aspects of API Extensions

We will introduce a new `ClusterOperator` for the `ControlPlaneMachineSet` operator. Within this we will introduce conditions (TBD) which describe the state of the operator and the control plane which it manages.

This new `ClusterOperator` will be the key resource from which support should discover information if they believe the `ControlPlaneMachineSet` to be degraded.

There should be no effect on existing operators or the supportability of other components due to this enhancement; in particular, this is strictly adding new functionality on top of an existing API, and we do not expect it to impact the functionality of other APIs.
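
For example, assuming the `ClusterOperator` is named `control-plane-machine-set`, a healthy operator would report the standard condition types used by all OpenShift cluster operators (the detailed conditions remain TBD as noted above):

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: control-plane-machine-set
status:
  conditions:
    - type: Available
      status: "True"
    - type: Progressing
      status: "False"
    - type: Degraded
      status: "False"
```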
+
+#### Failure Modes
+
+- A new Machine fails to scale up
+  - In this scenario, we expect the operator to report degraded to signal that there is an issue with the Control
+    Plane Machines
+  - We expect that the Machine will contain information identifying the issue with the launch request, which can be
+    rectified by the user before they delete the Machine; once it is deleted, the ControlPlaneMachineSet will attempt
+    the scale-up again
+  - This process should be familiar from dealing with existing Failed Machines
+  - A MachineHealthCheck targeting Control Plane Machines could automatically resolve this issue
+- The `ControlPlaneMachineSet` webhook is not operational
+  - This will prevent creation and update of ControlPlaneMachineSets
+  - We expect these operations to be very infrequent within a cluster once established, so the impact should be
+    minimal
+  - The failure policy will be to fail closed; as such, when the webhook is down, no create or update operations will
+    succeed
+  - The Kube API server operator already detects failing webhooks and as such will identify the issue early
+  - The ControlPlaneMachineSet operator is not a critical cluster operator. Everything it does, as described here, can
+    be done manually on the Machine(Set) objects.
+
+#### Support Procedures
+
+TBD
+
+## Implementation History
+
+There is no current implementation of this enhancement; however, it will depend on the
+[etcd quorum protection mechanism](https://github.com/openshift/enhancements/blob/master/enhancements/etcd/protecting-etcd-quorum-during-control-plane-scaling.md),
+which is currently being implemented.
+
+## Drawbacks
+
+- Introduction of new CRDs complicates the user experience
+  - Users will have to learn about the new CRD and how it can be used
+  - Counter: The concepts are familiar from the existing apps resources and CRDs
+- We are making it easier for customers to put their clusters at risk
+  - If a scale-up fails, manual intervention will be needed and the cluster will become degraded at that point
+  - Counter: The etcd protection mechanism should mean that the cluster is never at risk of losing quorum, and
+    clusters should be recoverable
+- We will need confidence in the project before we can ship it
+  - We need to decide how to ship the project: will it be GA or Tech Preview in the first instance?
+
+## Alternatives
+
+Previous iterations of this enhancement, linked in the metadata, describe alternative ways we could implement this
+feature.
+
+### Layering MachineSets
+
+There has been previous discussion about the use of MachineSets to create the Machines for the Control Plane.
+In previous iterations of this enhancement there have been recommendations to either create one MachineSet per
+availability zone or to have some `ControlPlaneMachineSet`-like CRD create MachineSets and leverage these to create
+the Machines.
+This proposal deliberately omits MachineSets due to concern over the risks and drawbacks that leveraging MachineSets
+poses in this situation.
+
+Exposing MachineSets within this mechanism introduces risk in a number of ways:
+
+- Users have the ability to scale the MachineSet
+  - We do not intend to support horizontal scaling initially, and there is no easy way to prevent users from scaling
+    Control Plane MachineSets without also affecting workload MachineSets
+  - If a user were to scale up the MachineSet, something would need to sit on top to scale the MachineSet back to the
+    supported control plane replicas count.
+    It is then difficult to ensure that the correct Machine is removed from the cluster without jeopardising the etcd
+    quorum. If a user attempted to use GitOps to manage the MachineSet, this could cause issues as the higher-level
+    controller battles to scale the MachineSet.
+- If no higher-level object exists, it is difficult to ensure consistency across MachineSets
+  - We promote to users that their control plane should be consistent across availability zones. With separate
+    MachineSets, there is nothing to prevent users from modifying the MachineSets between zones and introducing major
+    differences. Collating these differences for support issues could prove difficult. Users may also spread their
+    Machines in an undesirable manner.
+  - Users will still be able to introduce inconsistency while using the OnDelete update strategy with
+    `ControlPlaneMachineSets`, but this should be easy to track via the updated replicas count within the
+    `ControlPlaneMachineSet` status. We may want to degrade the operator if there are discrepancies to encourage users
+    to keep their control plane consistent.
+- Users can delete intermediary MachineSets
+  - In this case, if a user were to delete the Control Plane MachineSet, it is hard to define a safe way to leave the
+    Control Plane Machine(s) behind without baking very specific knowledge into the MachineSet controller
+  - It becomes easier for users to mismanage their Control Plane Machines and put their cluster at risk
+  - Previous enhancements have discussed the use of webhooks to prevent the deletion of Control Plane MachineSets,
+    though these are not foolproof. The removal design of the `ControlPlaneMachineSet` (being operator based) should
+    be more reliable than a webhook.
+
+### StatefulMachineSet
+
+The proposed `ControlPlaneMachineSet` is reasonably similar to a mixture between a `StatefulSet` and a `MachineSet`.
+The `ControlPlaneMachineSet` is targeted specifically at managing Control Plane Machines, but we could also create a
+more generic `StatefulMachineSet` that covers this use case and others. The implementation would be very similar,
+though it would most likely not have its own `ClusterOperator` to report status.
+
+To understand why we don't think a more generic `StatefulMachineSet` is worth implementing, we must look at the
+promises that `StatefulSets` make and why users would want to use them.
+
+- Stable, unique network identities: Users have applications that require stable IP addresses, e.g. something like
+  etcd
+- Stable, persistent storage: Users must reattach applications to the same storage disk as previously used when the
+  workload is rescheduled
+- Ordered, graceful deployment, scaling and rolling updates: Different applications must be updated in a defined and
+  controlled way
+
+In most Kubernetes environments and cloud applications, the IP address of the host does not matter and is not required
+to be stable. Application-layer networking means that the host IP address is insignificant to the functionality of the
+cluster. The only use case we can think of where static IPs over a given set of hosts are required would be a scenario
+where an external load balancer requires reconfiguration if the host IPs change. However, even here, it would most
+likely be more cloud-native to implement an operator that could reconfigure the load balancer on changes rather than
+trying to keep the IP addresses of the hosts static.
+Additionally, there are already projects tackling IPAM within Kubernetes which may resolve this issue without having
+to make pets of the Machines.
+
+For storage, we expect most users to use persistent volumes, which can, in most environments, be attached to multiple
+hosts, whether that is abstracted away as a cloud provider service (e.g. AWS EFS) or provided as an iSCSI storage
+network within a datacenter. For certain applications, however, these network storage provisions may not be suitable
+and access to a local disk or volume may be needed. In cloud environments this doesn't apply. In virtualized
+environments, the local volume would be represented as a persistent volume that can only be attached to VMs on a
+certain physical host. In bare-metal environments, you would need to schedule to a single host, in which case existing
+pod scheduling mechanisms would ensure this, provided the Machine has some persistent labelling; this is already
+achieved through the hardware inventories provided by Metal3.
+
+For graceful, ordered deployments, this isn't typically a property of the host but of the applications running on top
+of it. If we want to provide users with the ability to apply updates to their Machines, we would likely be better off
+implementing a `MachineDeployment` concept similar to that of the Cluster API project. This allows automated updates
+by creating new Machines, as described within this document, but does not treat the Machines with any special
+consideration.
+When OpenShift 4 was conceived, the `MachineDeployment` concept was originally set aside because, although we promote
+immutable infrastructure within OpenShift, our OS layer is updated automatically through the Machine Config Operator
+system. The two ideas would work together, but we didn't want to force users to redeploy every Machine to benefit from
+updates, and so the value of the `MachineDeployment` was diminished.
+
+Aside from the above arguments, there are other, higher-level reasons we don't feel that a generic
+`StatefulMachineSet` is a valuable addition to OpenShift:
+- The concept of stateful Machines goes against the "cattle, not pets" concept on which the rest of Machine API has
+  been built
+- Machines in OpenShift don't, in the majority of cases, have any state, and we shouldn't encourage them to have
+  state. State is handled at the application layer by adding abstractions such as persistent volumes.
+- A lot of the scenarios we could think of for having a stateful group of Machines could also be solved by using a
+  set of well-defined `MachineSets`.
+- To our knowledge, no customers have asked for a `StatefulMachineSet`
+- For the Control Plane case, we want additional monitoring on top of what a `StatefulMachineSet` might provide:
+  - The ability to track the Control Plane infrastructure state via a `ClusterOperator` and to block upgrades if the
+    Control Plane infrastructure is degraded for any reason
+  - The ability to enforce a single source of truth for the Control Plane infrastructure definition
+  - Restricting the replica count of the set to a supported number for the Control Plane within OpenShift
+  - Ensuring that users aren't putting themselves in unsupported scenarios by trying to create additional Control
+    Plane Machines
+  - Ensuring Control Plane Machines are not removed when/if the `ControlPlaneMachineSet` is deleted (a sketch of this
+    orphaning flow follows at the end of this document)
+
+## Infrastructure Needed
+
+For a clean separation of code, we will introduce the new operator in a new repository,
+openshift/cluster-control-plane-machine-set-operator.
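+
+Finally, to make the deletion-safety behaviour referenced above concrete, the following is a minimal sketch of the
+orphan-on-delete flow, using controller-runtime. The function, finalizer name, and package paths are assumptions for
+illustration only, not the final implementation:
+
+```go
+package controllers
+
+import (
+	"context"
+	"fmt"
+
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"sigs.k8s.io/controller-runtime/pkg/client"
+	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
+
+	machinev1 "github.com/openshift/api/machine/v1"
+	machinev1beta1 "github.com/openshift/api/machine/v1beta1"
+)
+
+// finalizerName is an assumption for illustration only.
+const finalizerName = "controlplanemachineset.machine.openshift.io"
+
+// Reconciler is a minimal stand-in for the operator's reconciler.
+type Reconciler struct {
+	Client client.Client
+}
+
+// orphanMachines strips the ControlPlaneMachineSet's owner reference from each
+// managed Machine, then removes the finalizer so the set itself can be deleted
+// without taking the Control Plane Machines with it.
+func (r *Reconciler) orphanMachines(ctx context.Context, cpms *machinev1.ControlPlaneMachineSet, machines []machinev1beta1.Machine) error {
+	for i := range machines {
+		machine := machines[i].DeepCopy()
+
+		// Drop only the owner reference pointing at this ControlPlaneMachineSet,
+		// so garbage collection does not delete the Machine along with it.
+		refs := []metav1.OwnerReference{}
+		for _, ref := range machine.OwnerReferences {
+			if ref.UID != cpms.UID {
+				refs = append(refs, ref)
+			}
+		}
+		machine.OwnerReferences = refs
+
+		if err := r.Client.Update(ctx, machine); err != nil {
+			return fmt.Errorf("failed to orphan machine %s: %w", machine.Name, err)
+		}
+	}
+
+	// Only once every Machine has been orphaned do we remove the finalizer,
+	// allowing the ControlPlaneMachineSet resource to be removed from the cluster.
+	controllerutil.RemoveFinalizer(cpms, finalizerName)
+	return r.Client.Update(ctx, cpms)
+}
+```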