From c1e1931bb3c3c37086418c051c32bd73bfd20dae Mon Sep 17 00:00:00 2001 From: David Justice Date: Tue, 23 Feb 2021 11:31:39 -0500 Subject: [PATCH] add azure machine pool machine proposal --- .../20210222-azure-machinepool-machine.md | 417 ++++++++++++++++++ 1 file changed, 417 insertions(+) create mode 100644 docs/proposals/20210222-azure-machinepool-machine.md diff --git a/docs/proposals/20210222-azure-machinepool-machine.md b/docs/proposals/20210222-azure-machinepool-machine.md new file mode 100644 index 00000000000..06dd8181bc1 --- /dev/null +++ b/docs/proposals/20210222-azure-machinepool-machine.md @@ -0,0 +1,417 @@ +--- +title: Azure Machine Pool Machines +authors: + - @devigned +reviewers: + - @CecileRobertMichon + - @nader-ziada +creation-date: 2021-02-22 +last-updated: 2021-02-22 +status: implementable +see-also: + - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/819 + - https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1067 +--- + + +# Azure Machine Pool Machines + +## Table of Contents +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals / Future Work](#non-goals--future-work) + - [Notes About VMSS Terminate Notifications](#notes-about-vmss-terminate-notifications) +- [Proposal](#proposal) + - [User Stories](#user-stories) + - [Story 1 - Upgrading the Kubernetes Version of a MachinePool](#story-1---upgrading-the-kubernetes-version-of-a-machinepool) + - [Story 2 - Reducing the Number of Replicas in a MachinePool](#story-2---reducing-the-number-of-replicas-in-a-machinepool) + - [Story 3 - Deleting an individual Azure Machine Pool Machine](#story-3---deleting-an-individual-azure-machine-pool-machine) + - [Requirements](#requirements) + - [Functional](#functional) + - [Non-Functional](#non-functional) + - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) + - [Existing APIs for Clarity](#existing-apis-for-clarity) + - [Proposed API 
Changes](#proposed-api-changes) + - [Proposed Controller Changes](#proposed-controller-changes) + - [Proposed Changes of Responsibility](#proposed-changes-of-responsibility) +- [Available Options](#available-options) + - [Add Annotations to AzureMachinePool for Instance Delete Selection](#option-1-add-annotations-to-azuremachinepool-for-instance-delete-selection) + - [Pros](#option-1-pros) + - [Cons](#option-1-cons) + - [Separate AzureMachinePool and AzureMachinePoolMachines](#option-2-separate-azuremachinepool-and-azuremachinepoolmachines) + - [Pros](#option-2-pros) + - [Cons](#option-2-cons) +- [Conclusions](#conclusions) +- [Additional Details](#additional-details) + - [Test Plan](#test-plan) +- [Implementation History](#implementation-history) + +## Summary + +AzureMachinePool currently embeds the state of each instance in the MachinePool within the status of the +AzureMachinePool. MachinePool instances should be their own resources to enable individual lifecycles. + +## Motivation + +By giving each AzureMachinePoolMachine an individual lifecycle, a user would be able to inform CAPZ of the specific +instance to delete and then have the AzureMachinePoolMachine controller cordon and drain the node prior to deleting +the underlying infrastructure. 
+ +### Goals +- Be able to delete specific AzureMachinePool instances +- Rolling updates with max unavailable and max surge + - MaxUnavailable is the max number of machines that are allowed to be unavailable at any time + - MaxSurge is the number of machines to add to the current replica count (the surge) during an upgrade of the VMSS model +- Safely update by cordoning and draining nodes prior to deleting the underlying infrastructure +- Be able to take advantage of [Azure's Virtual Machine Scale Set Update Instance API](https://docs.microsoft.com/en-us/rest/api/compute/virtualmachinescalesets/updateinstances) + to in-place update a VMSS instance rather than delete and recreate the infrastructure, which would result in a much + quicker upgrade. + +### Non-Goals / Future Work +- Creating a CAPI Machine owner for each AzureMachinePoolMachine +- Implementing different rollout and scale-down strategies +- Adopting individual Machine instances to be managed by the MachinePool +- Creating or using an on-instance agent to cordon and drain in response to Azure Virtual Machine Scale Set [terminate notifications](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification) + +#### Notes About VMSS Terminate Notifications +Azure Virtual Machine Scale Sets provide [terminate notifications](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-terminate-notification). +These terminate notifications would be helpful to inform Kubernetes when a node is going to be deleted. Unfortunately, +terminate notifications are not sent when an instance is Updated. In this case, "Updated" means the +instance is reimaged to match the updated VMSS model by using the [Update Instance API](https://docs.microsoft.com/en-us/rest/api/compute/virtualmachinescalesets/updateinstances). 
+If a VMSS instance is reimaged rather than deleted and recreated, the instance will not receive a notification. +Due to the design of terminate notifications, the CAPZ controller needs to alert Kubernetes when an instance is being +Updated. Without some way to inform Kubernetes of the specific instance that is to be updated, the underlying +infrastructure may be removed before workloads can be safely migrated from the machine / node. By managing the lifecycle +from CAPZ, we are able to safely delete / upgrade machines / nodes. + +In the future, it would be useful to integrate [awesomenix/drainsafe](https://github.com/awesomenix/drainsafe) or +something similar to handle scenarios when Azure will delete or migrate a VMSS instance. Two scenarios come to mind. + +1. VMSS is configured to use [Spot instances](https://docs.microsoft.com/en-us/azure/virtual-machines/spot-vms) and + Azure must evict an instance. +2. Azure must [perform maintenance on an instance](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-maintenance-notifications). + +## Proposal + +### User Stories + +#### Story 1 - Upgrading the Kubernetes Version of a MachinePool +Alex is an engineer in a large organization which has a MachinePool running 1.18.x and would like to upgrade the +MachinePool to 1.19.x. It is important to Alex that the MachinePool doesn't experience downtime during the upgrade. Alex +has set the MaxUnavailable and MaxSurge values on the AzureMachinePool to limit the number of machines that will be +unavailable during the upgrade, and the number of extra machines VMSS will add during the upgrade. The MachinePool +upgrades each machine in the pool by first cordoning and draining, then replacing the machine in the pool. + +#### Story 2 - Reducing the Number of Replicas in a MachinePool +Alex is an engineer in a large organization which has a MachinePool running. 
Alex has too many nodes running on the +cluster and would like to reduce the number of replicas. It is important to Alex that the MachinePool doesn't experience downtime. +Alex decreases the replica count of the MachinePool by 2. The MachinePool deletes 2 machines from the pool by first +cordoning and draining, then deleting the underlying infrastructure. + +#### Story 3 - Deleting an individual Azure Machine Pool Machine +Alex is an engineer in a large organization which has a MachinePool running with 5 replicas. Alex would like to delete a +specific MachinePool machine. It is important to Alex that the MachinePool doesn't experience downtime while deleting +the individual machine. Alex uses `kubectl` to delete the specific MachinePool machine resource. The MachinePool machine +is cordoned and drained, then the underlying infrastructure is deleted. The MachinePool still has a replica count of 5, +but only has 4 running replicas. The MachinePool creates a new machine to take the place of the deleted instance. + + +### Requirements + +#### Functional + +FR1. CAPZ MUST support deleting an individual Virtual Machine Scale Set instance. + +FR2. CAPZ SHOULD support cordoning and draining workloads from a Virtual Machine Scale Set instance. + +FR3. CAPZ SHOULD support updating an instance in-place using the Virtual Machine Scale Set Update Instance API. + +#### Non-Functional + +NFR1. CAPZ SHOULD provide resource status updates as the Azure resources are provisioned. + +NFR2. CAPZ SHOULD NOT overwhelm Azure API request limits and SHOULD rate limit reconciliation cycles. + +NFR3. Unit tests MUST exist for upgrade and delete instance selection. + +NFR4. e2e tests MUST exist for MachinePool upgrade, scale up / down, and instance delete scenarios. + +### Implementation Details/Notes/Constraints + +The current implementation of CAPZ AzureMachinePool embeds the state of each of the instances in the Scale Set within +the status of the AzureMachinePool. 
+ +```go +// AzureMachinePoolStatus defines the observed state of AzureMachinePool +type AzureMachinePoolStatus struct { + + /* + Other fields omitted for brevity + */ + + // Instances is the VM instance status for each VM in the VMSS + // +optional + Instances []*AzureMachinePoolInstanceStatus `json:"instances,omitempty"` +} + +// AzureMachinePoolInstanceStatus provides status information for each instance in the VMSS +type AzureMachinePoolInstanceStatus struct { + // Version defines the Kubernetes version for the VM Instance + // +optional + Version string `json:"version"` + + // ProvisioningState is the provisioning state of the Azure virtual machine instance. + // +optional + ProvisioningState *infrav1.VMState `json:"provisioningState"` + + // ProviderID is the provider identification of the VMSS Instance + // +optional + ProviderID string `json:"providerID"` + + // InstanceID is the identification of the Machine Instance within the VMSS + // +optional + InstanceID string `json:"instanceID"` + + // InstanceName is the name of the Machine Instance within the VMSS + // +optional + InstanceName string `json:"instanceName"` + + // LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes + // the image version the VM is running. If the instance is not running the latest model, it means the instance + // may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated. + LatestModelApplied bool `json:"latestModelApplied"` +} +``` + +#### Existing APIs for Clarity +These are included here to provide a description of the structures as they exist in CAPI and will be leveraged to +extend AzureMachinePool. There are no changes to these structures. They are simply for reference. + +```go +// MachineDeploymentStrategy describes how to replace existing machines with new ones. +type MachineDeploymentStrategy struct { + // Type of deployment. 
Currently the only supported strategy is + // "RollingUpdate". + // Default is RollingUpdate. + // +optional + Type MachineDeploymentStrategyType `json:"type,omitempty"` + + // Rolling update config params. Present only if + // MachineDeploymentStrategyType = RollingUpdate. + // +optional + RollingUpdate *MachineRollingUpdateDeployment `json:"rollingUpdate,omitempty"` +} + +// MachineRollingUpdateDeployment is used to control the desired behavior of rolling update. +type MachineRollingUpdateDeployment struct { + // The maximum number of machines that can be unavailable during the update. + // Value can be an absolute number (ex: 5) or a percentage of desired + // machines (ex: 10%). + // Absolute number is calculated from percentage by rounding down. + // This can not be 0 if MaxSurge is 0. + // Defaults to 0. + // Example: when this is set to 30%, the old MachineSet can be scaled + // down to 70% of desired machines immediately when the rolling update + // starts. Once new machines are ready, old MachineSet can be scaled + // down further, followed by scaling up the new MachineSet, ensuring + // that the total number of machines available at all times + // during the update is at least 70% of desired machines. + // +optional + MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"` + + // The maximum number of machines that can be scheduled above the + // desired number of machines. + // Value can be an absolute number (ex: 5) or a percentage of + // desired machines (ex: 10%). + // This can not be 0 if MaxUnavailable is 0. + // Absolute number is calculated from percentage by rounding up. + // Defaults to 1. + // Example: when this is set to 30%, the new MachineSet can be scaled + // up immediately when the rolling update starts, such that the total + // number of old and new machines do not exceed 130% of desired + // machines. 
Once old machines have been killed, new MachineSet can + // be scaled up further, ensuring that total number of machines running + // at any time during the update is at most 130% of desired machines. + // +optional + MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"` + + // DeletePolicy defines the policy used by the MachineDeployment to identify nodes to delete when downscaling. + // Valid values are "Random", "Newest", "Oldest" + // When no value is supplied, the default DeletePolicy of MachineSet is used + // +kubebuilder:validation:Enum=Random;Newest;Oldest + // +optional + DeletePolicy *string `json:"deletePolicy,omitempty"` +} +``` + +#### Proposed API Changes +The proposed changes below show the CAPZ AzureMachinePool and AzureMachinePoolMachine. + +```go +const azureMachinePoolUpdateInstanceAnnotation = "azuremachinepool.infrastructure.cluster.x-k8s.io/updateInstance" + +type AzureMachinePoolSpec struct { + // The deployment strategy to use to replace existing machines with + // new ones. + // +optional + Strategy MachineDeploymentStrategy `json:"strategy,omitempty"` + + // NodeDrainTimeout is the total amount of time that the controller will spend on draining a node. + // The default value is 0, meaning that the node can be drained without any time limitations. + // NOTE: NodeDrainTimeout is different from `kubectl drain --timeout` + // +optional + NodeDrainTimeout *metav1.Duration `json:"nodeDrainTimeout,omitempty"` +} + +// AzureMachinePoolMachineSpec defines the desired state of AzureMachinePoolMachine +type AzureMachinePoolMachineSpec struct { + // ProviderID is the provider identification of the Virtual Machine Scale Set instance + ProviderID string `json:"providerID"` +} + +// AzureMachinePoolMachineStatus defines the observed state of AzureMachinePoolMachine +type AzureMachinePoolMachineStatus struct { + // NodeRef will point to the corresponding Node if it exists. 
+ // +optional + NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"` + + // Version defines the Kubernetes version for the VM Instance + // +optional + Version string `json:"version"` + + // ProvisioningState is the provisioning state of the Azure virtual machine instance. + // +optional + ProvisioningState *infrav1.VMState `json:"provisioningState"` + + // InstanceID is the identification of the Machine Instance within the VMSS + InstanceID string `json:"instanceID"` + + // InstanceName is the name of the Machine Instance within the VMSS + // +optional + InstanceName string `json:"instanceName"` + + // FailureReason will be set in the event that there is a terminal problem + // reconciling the MachinePool machine and will contain a succinct value suitable + // for machine interpretation. + // + // Any transient errors that occur during the reconciliation of MachinePools + // can be added as events to the MachinePool object and/or logged in the + // controller's output. + // +optional + FailureReason *errors.MachineStatusError `json:"failureReason,omitempty"` + + // FailureMessage will be set in the event that there is a terminal problem + // reconciling the MachinePool machine and will contain a more verbose string suitable + // for logging and human consumption. + // + // Any transient errors that occur during the reconciliation of MachinePools + // can be added as events to the MachinePool object and/or logged in the + // controller's output. + // +optional + FailureMessage *string `json:"failureMessage,omitempty"` + + // Conditions defines the current service state of the AzureMachinePoolMachine. + // +optional + Conditions clusterv1.Conditions `json:"conditions,omitempty"` + + // LongRunningOperationState saves the state of an Azure long-running operation so it can be continued on the + // next reconciliation loop. 
+ // +optional + LongRunningOperationState *infrav1.Future `json:"longRunningOperationState,omitempty"` + + // LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes + // the image version the VM is running. If the instance is not running the latest model, it means the instance + // may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated. + LatestModelApplied bool `json:"latestModelApplied"` + + // Ready is true when the provider resource is ready. + // +optional + Ready bool `json:"ready"` +} +``` + +#### Proposed Controller Changes + +* Create a new AzureMachinePoolMachine controller. +* Move the VMSS instance status tracking logic from the AzureMachinePool controller to the AzureMachinePoolMachine + controller. +* Introduce rate limiting behavior to AzureMachinePool* controllers to ensure Azure API limits are not + exceeded. + +#### Proposed Changes of Responsibility +Currently in CAPZ, the AzureMachinePool controller is responsible for both the Virtual Machine Scale Set (VMSS) and the +instances created by the VMSS. The proposed change would separate the responsibility of managing the state of the VMSS +from that of managing the instances created by the VMSS. This would introduce a new AzureMachinePoolMachine controller and a new +MachinePoolMachineScope. The responsibilities would be as follows. + +**AzureMachinePool Responsibilities:** +- Create AzureMachinePoolMachine instances when a new VMSS instance is observed. The AzureMachinePoolMachine spec should + have the `ProviderID` field set with the observed resource ID. The AzureMachinePool should also be added to the + AzureMachinePoolMachine's OwnerReferences. +- Selection of AzureMachinePoolMachine instances for deletion or upgrade. 
When a change to the AzureMachinePool model + occurs, the `MachinePoolScope` will be responsible for coordinating the rollout of the updated model by selecting + AzureMachinePoolMachines to delete or upgrade with respect to MaxUnavailable and the DeletePolicy. +- Scale up: AzureMachinePool should increase the number of VMSS replicas if the replica count increases on MachinePool. +- Scale down: AzureMachinePool should select and delete AzureMachinePoolMachines that are overprovisioned with respect + to MaxUnavailable and DeletePolicy from the proposed MachinePool Strategy. +- Upgrade: AzureMachinePool should select the AzureMachinePoolMachines to upgrade, set the + `azureMachinePoolUpdateInstanceAnnotation` on the AzureMachinePoolMachine and wait for the annotation to be removed + before proceeding with the rolling upgrade. +- Clean up: when a VMSS instance is no longer in the list of instances in Azure, but a matching + AzureMachinePoolMachine resource exists, delete the AzureMachinePoolMachine. + +**AzureMachinePoolMachine Responsibilities:** +- Update Azure Provisioning State: when creating a new VMSS instance, the AzureMachinePoolMachine controller will poll + the Azure API until the instance reaches a terminal state. +- Cordon and Drain: when deleting or upgrading the AzureMachinePoolMachine resource, the AzureMachinePoolMachine + controller is responsible for ensuring workload is moved from the node prior to removing the underlying Azure + infrastructure. +- NodeRef: as a VMSS instance joins the cluster, the AzureMachinePoolMachine controller is responsible for ensuring + the node is found and ready before marking the AzureMachinePoolMachine resource as ready. +- Upgrade: the AzureMachinePoolMachine controller is responsible for removing the `azureMachinePoolUpdateInstanceAnnotation` upon + successful instance upgrade. 
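The selection responsibility described above can be illustrated with a minimal Go sketch. It is not the CAPZ implementation: `poolMachine`, `CreatedUnix`, and `selectForUpgrade` are hypothetical names introduced only for this example, MaxUnavailable is assumed to be already resolved to an absolute number, and candidates are assumed to be picked oldest first, similar to an "Oldest" DeletePolicy. Machines that are missing or not ready are counted against the unavailable budget before any ready machine is selected.

```go
package main

import (
	"fmt"
	"sort"
)

// poolMachine is a hypothetical, simplified stand-in for an AzureMachinePoolMachine.
type poolMachine struct {
	Name               string
	CreatedUnix        int64 // creation time in Unix seconds (illustrative field)
	Ready              bool
	LatestModelApplied bool
}

// selectForUpgrade returns the names of ready, out-of-date machines that may be
// disrupted now without pushing the unavailable count above maxUnavailable.
// Assumption: candidates are chosen oldest first.
func selectForUpgrade(machines []poolMachine, desiredReplicas, maxUnavailable int) []string {
	ready := 0
	var candidates []poolMachine
	for _, m := range machines {
		if m.Ready {
			ready++
			if !m.LatestModelApplied {
				candidates = append(candidates, m)
			}
		}
	}

	// Machines that are missing or not ready already consume part of the budget.
	unavailable := desiredReplicas - ready
	if unavailable < 0 {
		unavailable = 0
	}
	budget := maxUnavailable - unavailable
	if budget <= 0 {
		return nil
	}

	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].CreatedUnix < candidates[j].CreatedUnix
	})
	if budget > len(candidates) {
		budget = len(candidates)
	}
	selected := make([]string, 0, budget)
	for _, m := range candidates[:budget] {
		selected = append(selected, m.Name)
	}
	return selected
}

func main() {
	machines := []poolMachine{
		{Name: "vmss_0", CreatedUnix: 100, Ready: true, LatestModelApplied: false},
		{Name: "vmss_1", CreatedUnix: 200, Ready: true, LatestModelApplied: false},
		{Name: "vmss_2", CreatedUnix: 300, Ready: false, LatestModelApplied: true},
	}
	// Three desired replicas, at most two unavailable; one machine is already not ready.
	fmt.Println(selectForUpgrade(machines, 3, 2))
}
```

In this example one machine is already not ready, so only one more machine fits within a MaxUnavailable of 2, and the oldest out-of-date machine is the one selected. A real controller would also need to account for MaxSurge and for machines that are mid-deletion, which this sketch leaves out.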
+ +## Available Options + +### Option 1: Add Annotations to AzureMachinePool for Instance Delete Selection +Create annotations on AzureMachinePool resources to indicate which machine should be upgraded next or deleted. + +#### Option 1 Pros: +- No custom resource schema changes would be needed +- Would enable a user to provide input to help the controller decide the next machine to delete / upgrade + +#### Option 1 Cons: +- Annotations don't have a strong schema +- Controller would be dependent on the application of annotations to inform machine selection, which could be + error-prone and brittle. +- Each machine lifecycle will need to be embedded in the status of the AzureMachinePool to enable cordon and drain + +### Option 2: Separate AzureMachinePool and AzureMachinePoolMachines +Introduce a new custom resource, AzureMachinePoolMachine, to represent AzureMachinePool instances rather than persisting +each instance status in the `AzureMachinePool.Status.Instances`. + +#### Option 2 Pros: +- Allows for easier tracking of state of individual AzureMachinePool instances via their own resource +- Each AzureMachinePoolMachine can be responsible for its own lifecycle, decomposing the logic in the controllers +- Would enable a user to interact with an AzureMachinePoolMachine the same way they would any other machine + +#### Option 2 Cons: +- Breaking change to the status of the AzureMachinePool by removing the instances array + +## Conclusions +Separate AzureMachinePool and AzureMachinePoolMachine resources provide a reasonable way to break down concerns and +offer the functionality to enable safe rolling upgrades and individual instance deletion. 
+ +## Additional Details + +### Test Plan + +* Unit tests to validate the proper selection of VMSS nodes to delete / upgrade +* Unit tests for the new MachinePoolMachineScope +* e2e tests for upgrade, scale down / up, and instance delete + +## Implementation History + +- 2021/02/22: Initial proposal +- 2021/01/06: Initial PR opened https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1105 \ No newline at end of file