diff --git a/design/cluster-api-provider-metal3/node_reuse.md b/design/cluster-api-provider-metal3/node_reuse.md
new file mode 100644
index 00000000..0318ab89
--- /dev/null
+++ b/design/cluster-api-provider-metal3/node_reuse.md
@@ -0,0 +1,224 @@

# node reuse

## Status

implementable

## Summary

Node reuse during upgrade and remediation operations.

## Motivation

Sometimes it is necessary to upgrade a Metal3 cluster. Such upgrades can be
triggered by changes to the `Metal3MachineTemplate` object (for example, the
user modifies the node image version via the `spec.template.spec.image.url`
field) or to the `KubeadmControlPlane` (for example, the user modifies the
Kubernetes version via the `spec.version` field). Once the update takes place,
the owning controller (e.g. the KCP controller) starts a rolling upgrade. As a
result, CAPI `Machines` and CAPM3 `Metal3Machines` are re-created based on the
KCP changes, and the `BareMetalHosts` (referred to below as hosts) owned by
the `Metal3Machines` are deprovisioned. Normally, when a host is
deprovisioned, Ironic cleans the root and externally attached disks on the
host. During an upgrade, however, we do not want the hosts' external disk data
to be cleaned, so that when a host is provisioned again its disk data is still
untouched.

### Goals

Add logic to label hosts, and to filter and pick those labelled hosts, when
going through a re-provisioning cycle as part of the upgrade procedure.

### Non-Goals

Support node reuse for Machines created independently of the Kubeadm Control
Plane controller or a MachineDeployment (KCP/MD). We are currently not trying
to support the node reuse feature for Machines not owned by higher-level
objects such as a KCP/MD, because if there is no KCP/MD the Machine points to,
the Cluster API Provider Metal3 (CAPM3) Machine controller cannot, first, set
the label on the hosts and, second, filter the hosts based on that label.

## Proposal

We propose, first, an interface to mark hosts that we want to reuse after
deprovisioning and, second, a selection/filtering mechanism in the CAPM3
Machine controller that selects hosts carrying a specific label (namely
`infrastructure.cluster.x-k8s.io/node-reuse`).

To achieve this, we need to

1. be able to disable disk cleaning while deprovisioning.
   This [feature](https://github.com/metal3-io/metal3-docs/blob/master/design/cluster-api-provider-metal3/allow_disabling_node_disk_cleaning.md)
   is WIP right now.

2. be able to reuse the same pool of hosts so that we get the storage
   data back. Currently, there is no mechanism in Metal³ to pick, for the
   next provisioning phase, the same pool of hosts that were released during
   an upgrade or remediation. This proposal tries to solve that.

**NOTE:** This proposal focuses on the upgrade use case, but it is not limited
to upgrades. Other use cases, such as remediation, can benefit from this
feature as well.

### User Stories

#### Story 1

As a cluster admin, I would like to re-use the same nodes during an upgrade
operation so that I do not lose the secondary storage data attached to them.

#### Story 2

As a cluster admin, I would like to re-use the same nodes during a remediation
operation so that I do not lose the secondary storage data attached to them.

#### Story 3

As a cluster admin, when `nodeReuse` is disabled, I want to preserve the
current flow of host selection.

## Design Details

We propose modifying the Metal3MachineTemplate CRD to support enabling and
disabling the node reuse feature by adding a boolean `nodeReuse` field under
the spec of the Metal3MachineTemplate.

When set to `true`, the CAPM3 Machine controller tries to reuse the same pool
of hosts.

If no value is provided by the user, the default value `false` is used and the
current flow is preserved.

E.g. Metal3MachineTemplate CR

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: Metal3MachineTemplate
spec:
  nodeReuse: true
  template:
    spec:
      image:
        ...
```

During an upgrade, to prevent our hosts from being selected for other KCP/MD
pools, the `infrastructure.cluster.x-k8s.io/node-reuse` label is set on them.
The label stores the name of the KCP/MD that the hosts belong to. In the
example CR below, the label is set to the name of an MD, _md-pool1_
(_kcp-pool1_ would be set if the host belonged to a KCP).

E.g. BareMetalHost CR

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  labels:
    infrastructure.cluster.x-k8s.io/node-reuse: md-pool1
```

### Implementation Details/Notes/Constraints

#### Node Reuse Approach

Two steps are needed to re-use the same pool of hosts:

1. During host deprovisioning, set the
   `infrastructure.cluster.x-k8s.io/node-reuse` label;
2. During the next provisioning, try to select a host in the `Ready` state
   that has a matching `infrastructure.cluster.x-k8s.io/node-reuse` label.

The actual implementation will be done within the CAPM3 Machine controller.

Step 1 can be done as follows:

- The CAPM3 controller sets the
  `infrastructure.cluster.x-k8s.io/node-reuse: md-pool1` label on the hosts
  belonging to the same KCP/MD.

Step 2 can be done as follows (see the sketch after this list):

- The CAPM3 controller filters hosts labelled with
  `infrastructure.cluster.x-k8s.io/node-reuse: md-pool1`.

  Next:
  - If a host is found in the `Ready` state:
    - Pick that host for the newly created Metal3Machine;
    - Once it is picked, remove the whole label
      (`infrastructure.cluster.x-k8s.io/node-reuse: md-pool1`)
      from the host.
  - If a host is found in the `Deprovisioning` state:
    - Requeue until that host becomes `Ready`;
  - If no host is found when one is expected (i.e. for some reason the host
    is no longer in the cluster):
    - Fall back to the current flow, which selects a host randomly.
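The following is a minimal, illustrative sketch of the two steps above. It
uses simplified stand-in types rather than the real BareMetalHost and
Metal3Machine APIs, and names such as `markHostForReuse` and
`selectReusableHost` are hypothetical, not part of CAPM3:

```go
package main

import "fmt"

// Label key used to mark hosts for reuse, as described in this proposal.
const nodeReuseLabelName = "infrastructure.cluster.x-k8s.io/node-reuse"

// host is a simplified stand-in for a BareMetalHost.
type host struct {
	Name   string
	Labels map[string]string
	State  string // e.g. "Ready", "Deprovisioning"
}

// markHostForReuse sketches Step 1: before deprovisioning, tag the host with
// the name of the KCP/MD it belongs to so it can be found again later.
func markHostForReuse(h *host, ownerName string) {
	if h.Labels == nil {
		h.Labels = map[string]string{}
	}
	h.Labels[nodeReuseLabelName] = ownerName
}

// selectReusableHost sketches Step 2: return a host previously used by the
// given KCP/MD, removing the label once the host is picked. requeue is true
// when a matching host is still deprovisioning. If no host is returned and
// requeue is false, the caller falls back to the current random selection.
func selectReusableHost(hosts []host, ownerName string) (selected *host, requeue bool) {
	for i := range hosts {
		h := &hosts[i]
		if h.Labels[nodeReuseLabelName] != ownerName {
			continue
		}
		switch h.State {
		case "Ready":
			delete(h.Labels, nodeReuseLabelName)
			return h, false
		case "Deprovisioning":
			requeue = true
		}
	}
	return nil, requeue
}

func main() {
	hosts := []host{
		{Name: "node-0", State: "Deprovisioning"},
		{Name: "node-1", State: "Ready"},
	}
	for i := range hosts {
		markHostForReuse(&hosts[i], "md-pool1")
	}
	if h, requeue := selectReusableHost(hosts, "md-pool1"); h != nil {
		fmt.Println("reusing host:", h.Name)
	} else if requeue {
		fmt.Println("matching host still deprovisioning, requeue")
	} else {
		fmt.Println("no labelled host found, fall back to random selection")
	}
}
```

In the actual controller these operations would be performed on BareMetalHost
objects through the Kubernetes API rather than on in-memory structs.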
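As a complement, this illustrative-only sketch shows how the proposed
`nodeReuse` flag could gate the reuse path so that, when the flag is false or
unset, the current random host selection is preserved unchanged; again, all
types and names here are simplified stand-ins rather than CAPM3 code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// candidate is a simplified stand-in for an available BareMetalHost.
type candidate struct {
	Name  string
	Ready bool
}

// pickRandomReady models the current flow: choose any Ready host at random.
func pickRandomReady(hosts []candidate) *candidate {
	var ready []*candidate
	for i := range hosts {
		if hosts[i].Ready {
			ready = append(ready, &hosts[i])
		}
	}
	if len(ready) == 0 {
		return nil
	}
	return ready[rand.Intn(len(ready))]
}

// chooseHost gates the reuse path on the proposed nodeReuse flag. reuseFn is
// a placeholder for the label-based selection sketched earlier; when it finds
// nothing, or when nodeReuse is false or unset, the current flow is used.
func chooseHost(nodeReuse bool, hosts []candidate, reuseFn func([]candidate) *candidate) *candidate {
	if nodeReuse {
		if h := reuseFn(hosts); h != nil {
			return h
		}
		// No labelled host was found: fall back to the current flow.
	}
	return pickRandomReady(hosts)
}

func main() {
	hosts := []candidate{{Name: "node-0", Ready: true}, {Name: "node-1", Ready: true}}
	// nodeReuse defaults to false, so behaviour matches the current flow.
	noReuse := func([]candidate) *candidate { return nil }
	fmt.Println("selected:", chooseHost(false, hosts, noReuse).Name)
}
```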