Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
149 changes: 121 additions & 28 deletions docs/proposals/20191017-kubeadm-based-control-plane.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ reviewers:
- "@hardikdr"
- "@sbueringer"
creation-date: 2019-10-17
last-updated: 2020-09-07
last-updated: 2021-01-07
status: implementable
---

Expand All @@ -39,15 +39,17 @@ status: implementable
* [Identified features from user stories](#identified-features-from-user-stories)
* [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
* [New API Types](#new-api-types)
* [Rollout strategy](#rollout-strategy)
* [Modifications required to existing API Types](#modifications-required-to-existing-api-types)
* [Behavioral Changes from v1alpha2](#behavioral-changes-from-v1alpha2)
* [Behaviors](#behaviors)
* [Create](#create)
* [Scale Up](#scale-up)
* [Scale Down](#scale-down)
* [Delete of the entire KubeadmControlPlane (kubectl delete controlplane my-controlplane)](#delete-of-the-entire-kubeadmcontrolplane-kubectl-delete-controlplane-my-controlplane)
* [KubeadmControlPlane rollout (using create-swap-and-delete)](#kubeadmcontrolplane-rollout-using-create-swap-and-delete)
* [Constraints and Assumptions](#constraints-and-assumptions)
* [KubeadmControlPlane rollout](#kubeadmcontrolplane-rollout)
* [Rolling update strategy](#rolling update strategy)
* [Constraints and Assumptions](#constraints-and-assumptions)
* [Remediation (using delete-and-recreate)](#remediation-using-delete-and-recreate)
* [Why delete and recreate](#why-delete-and-recreate)
* [Scenario 1: Three replicas, one machine marked for remediation](#scenario-1-three-replicas-one-machine-marked-for-remediation)
Expand Down Expand Up @@ -92,8 +94,7 @@ and proxy services, and the underlying etcd data store.
During 2019 we saw control plane management implementations in each infrastructure provider. Much like
bootstrapping was identified as being reimplemented in every infrastructure provider and then extracted into Cluster API
Bootstrap Provider Kubeadm (CABPK), we believe we can reduce the redundancy of control plane management across providers
and centralize the logic in Cluster API. We also wanted to ensure that any default control plane management that we
for the default implementation would not preclude the use of alternative control plane management solutions.
and centralize the logic in Cluster API. We also wanted to ensure that default control plane management and use of any alternative control plane management solutions are separated.

### Goals

Expand All @@ -104,6 +105,7 @@ for the default implementation would not preclude the use of alternative control
- To provide a default machine-based implementation using kubeadm
- To provide a kubeadm-based implementation that is infrastructure provider agnostic
- To enable declarative orchestrated replacement of control plane machines, such as to roll out an OS-level CVE fix.
- To support Rolling Update type of rollout strategy (similar to MachineDeployment) in KubeadmControlPlane.
- To manage a kubeadm-based, "stacked etcd" control plane
- To manage a kubeadm-based, "external etcd" control plane (using a pre-existing, user-managed, etcd clusters).
- To manage control plane deployments across failure domains.
Expand Down Expand Up @@ -148,6 +150,7 @@ Non-Goals listed in this document are intended to scope bound the current v1alph
8. As a cluster operator, I want to be able to quickly respond to a non-Kubernetes CVE that affects my base image or Kubernetes dependencies by upgrading my clusters in an automated fashion.
9. As a cluster operator, I would like to upgrade to a new minor version of Kubernetes so that my cluster remains supported.
10. As a cluster operator, I want to know that my cluster isn’t working properly after creation. I have ended up with an API server I can access, but kube-proxy isn’t functional or new machines are not registering themselves with the control plane.
11. As a cluster operator I would like to use MachineDeployment like rollout strategy to upgrade my control planes. For example in resource constrained environments I would like to my machines to be removed one-by-one before creating a new ones. I would also like to be able to rely on the default control plane upgrade mechanism without any extra effort when specific rollout strategy is not needed.

#### Identified features from user stories

Expand All @@ -156,6 +159,7 @@ Non-Goals listed in this document are intended to scope bound the current v1alph
3. In service of user story 5, the kubeadm control plane provider must also manage etcd membership via kubeadm as part of scaling down (`kubeadm` takes care of adding the new etcd member when joining).
4. The control plane provider should provide indicators of health to meet user story 6 and 10. This should include at least the state of etcd and information about which replicas are currently healthy or not. For the default implementation, health attributes based on artifacts kubeadm installs on the cluster may also be of interest to cluster operators.
5. The control plane provider must be able to upgrade a control plane’s version of Kubernetes as well as updating the underlying machine image on where applicable (e.g. virtual machine based infrastructure).
6. To address user story 11, the control plane provider must provide Rolling Update strategy similar to MachineDeployment. With `MaxUnavailable` and `MaxSurge` fields user is able to delete old machine first during upgrade. Control plane provider should default the `RolloutStrategy`, `MaxUnavailable` and `MaxSurge` fields such a way that scaling up is the default behavior during upgrade.

### Implementation Details/Notes/Constraints

Expand All @@ -180,6 +184,74 @@ And the following defaulting:

- `KubeadmControlPlane.Spec.Replicas: 1`

##### Rollout strategy

```go
type RolloutStrategyType string

const (
// Replace the old control planes by new one using rolling update
// i.e. gradually scale up or down the old control planes and scale up or down the new one.
RollingUpdateStrategyType RolloutStrategyType = "RollingUpdate"
)
```

- Add `KubeadmControlPlane.Spec.RolloutStrategy` defined as:
Comment thread
jan-est marked this conversation as resolved.

```go
// The RolloutStrategy to use to replace control plane machines with
// new ones.
// +optional
RolloutStrategy *RolloutStrategy `json:"strategy,omitempty"`
```

- Add `KubeadmControlPlane.RolloutStrategy` struct defined as:

```go
// RolloutStrategy describes how to replace existing machines
// with new ones.
type RolloutStrategy struct {
// Type of rollout. Currently the only supported strategy is
// "RollingUpdate".
// Default is RollingUpdate.
// +optional
Type RolloutStrategyType `json:"type,omitempty"`

// Rolling update config params. Present only if
// RolloutStrategyType = RollingUpdate.
// +optional
RollingUpdate *RollingUpdate `json:"rollingUpdate,omitempty"`
}
```

- Add `KubeadmControlPlane.RollingUpdate` struct defined as:

```go
// RollingUpdate is used to control the desired behavior of rolling update.
type RollingUpdate struct {
// The maximum number of control planes that can be unavailable during the rollout.
// Value can be an absolute number 0 or 1.
// This needs to be 1 if MaxSurge is 0.
// Defaults to 0.
// Example: when this is set to 1 and MaxSurge is 0, the control planes can be scaled
// down one-by-one when the rolling update starts.
// Control plane scale down is disabled when desired number of control planes is 1.
// Scale down is possible only if desired number of control planes is 3 or more.
// +optional
MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`

// The maximum number of control planes that can be scheduled above the
// desired number of control planes.
// Value can be an absolute number 1 or 0.
// This needs to be 1 if MaxUnavailable is 0.
// Defaults to 1.
// Example: when this is set to 1 and MaxUnavailable is 0, the control plane can be scaled
// up immediately when the rolling update starts.
// +optional
MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
}
```

#### Modifications required to existing API Types

- Add `Cluster.Spec.ControlPlaneRef` defined as:
Expand Down Expand Up @@ -351,41 +423,61 @@ spec:
- Completely removing the control plane and issuing a delete on the underlying machines.
- User documentation should focus on deletion of the Cluster resource rather than the KubeadmControlPlane resource.

##### KubeadmControlPlane rollout (using create-swap-and-delete)
##### KubeadmControlPlane rollout

- Triggered by:
- Changes to Version
- Changes to the kubeadmConfigSpec
- Changes to the infrastructureRef
- The `upgradeAfter` field, which can be set to a specific time in the future
- Set to `nil` or the zero value of `time.Time` if no upgrades are desired
- An upgrade will run when that timestamp is passed
- Good for scheduling upgrades/SLOs
- Set `upgradeAfter` to now (in RFC3339 form) if an upgrade is required immediately

- Rollout operations rely on scale up and scale down which are be blocked based on Etcd and control plane health checks
KubeadmControlPlane rollout operations rely on [scale up](#scale up) and [scale down](#scale_down) which are be blocked based on Etcd and control plane health checks
- See [Health checks](#Health checks) below.

- The rollout algorithm is the following:
KubeadmControlPlane rollout is triggered by:
- Changes to Version
- Changes to the kubeadmConfigSpec
- Changes to the infrastructureRef
- The `upgradeAfter` field, which can be set to a specific time in the future
- Set to `nil` or the zero value of `time.Time` if no upgrades are desired
- An upgrade will run after that timestamp is passed
- Good for scheduling upgrades/SLOs
- Set `upgradeAfter` to now (in RFC3339 form) if an upgrade is required immediately

Comment thread
jan-est marked this conversation as resolved.
Outdated
- The controller should tolerate the manual or automatic removal of a replica during the upgrade process. A replica that fails during the upgrade may block the completion of the upgrade. Removal or other remedial action may be necessary to allow the upgrade to complete.

- In order to determine if a Machine to be rolled out, KCP implements the following:
- The infrastructureRef link used by each machine at creation time is stored in annotations at machine level.
- The kubeadmConfigSpec used by each machine at creation time is stored in annotations at machine level.
- If the annotation is not present (machine is either old or adopted), we won't roll out on any possible changes made in KCP's ClusterConfiguration given that we don't have enough information to make a decision. Users should use KCP.Spec.UpgradeAfter field to force a rollout in this case.

##### Rolling update strategy

Currently KubeadmControlPlane supports only one rollout strategy type the `RollingUpdateStrategyType`. Rolling upgrade strategy's behavior can be modified by using `MaxUnavailable` and `MaxSurge` fields. Both field values can be an absolute number 0 or 1 with following rules:

- If `MaxUnavailable` is set to 0 `MaxSurge` needs to be 1 (default values)
- If `MaxUnavailable` is set to 1 `MaxSurge` needs to be 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure this validation is enforced when the proposal is implemented


When `MaxUnavailable` is set to 0 and `MaxSurge` is set to 1 the rollout algorithm is as follows:

- Find Machines that have an outdated spec
- If there is a machine requiring rollout
- Scale up control plane creating a machine with the new spec
- Scale down control plane by removing one of the machine that needs rollout (the oldest out-of date machine in the failure domain that has the most control-plane machines on it)

- In order to determine if a Machine to be rolled out, KCP implements the following:
- The infrastructureRef link used by each machine at creation time is stored in annotations at machine level.
- The kubeadmConfigSpec used by each machine at creation time is stored in annotations at machine level.
- If the annotation is not present (machine is either old or adopted), we won't roll out on any possible changes made in KCP's ClusterConfiguration given that we don't have enough information to make a decision.
Users should use KCP.Spec.UpgradeAfter field to force a rollout in this case.
When `MaxUnavailable` is set to 1 and `MaxSurge` is set to 0 the rollout algorithm is as follows:

- The controller should tolerate the manual or automatic removal of a replica during the upgrade process. A replica that fails during the upgrade may block the completion of the upgrade. Removal or other remedial action may be necessary to allow the upgrade to complete.
- KubeadmControlPlane verifies that control plane replica count is >= 3
- Find Machines that have an outdated spec and scale down the control plane by removing the oldest out-of-date machine.
- Scale up control plane by creating a new machine with the updated spec

Comment thread
jan-est marked this conversation as resolved.
> NOTE: Setting `MaxUnavailable` to 1 and `MaxSurge` to 0 could be use in resource constrained environment like bare-metal, OpenStack or vSphere resource pools, etc when there is no capacity to Scale up the control plane.

###### Constraints and Assumptions

* A stable endpoint (provided by DNS or IP) for the API server will be required in order to allow for machines to maintain a connection to control plane machines as they are swapped out during upgrades. This proposal is agnostic to how this is achieved, and is being tracked in https://github.com/kubernetes-sigs/cluster-api/issues/1687. The control plane controller will use the presence of the apiEndpoints status field of the cluster object to determine whether or not to proceed. This behaviour is currently implicit in the implementations for cloud providers that provider a load balancer construct.
* A stable endpoint (provided by DNS or IP) for the API server will be required in order
to allow for machines to maintain a connection to control plane machines as they are swapped out
during upgrades. This proposal is agnostic to how this is achieved, and is being tracked
in https://github.com/kubernetes-sigs/cluster-api/issues/1687. The control plane controller will use
the presence of the apiEndpoints status field of the cluster object to determine whether or not to proceed.
This behaviour is currently implicit in the implementations for cloud providers that provider a load balancer construct.

* Infrastructure templates are expected to be immutable, so infrastructure template contents do not have to hashed in order to detect
changes.
* Infrastructure templates are expected to be immutable, so infrastructure template contents do not have to be hashed in order to detect
changes.

##### Remediation (using delete-and-recreate)

Expand Down Expand Up @@ -556,4 +648,5 @@ For the purposes of designing upgrades, two existing lifecycle managers were exa
- [x] 12/04/2019: Updated References to ErrorMessage/ErrorReason to FailureMessage/FailureReason
- [x] 12/04/2019: Initial stubbed KubeadmControlPlane controller added [#1826](https://github.com/kubernetes-sigs/cluster-api/pull/1826)
- [x] 07/09/2020: Document updated to reflect changes up to v0.3.9 release
- [x] 22/09/2020: KCP remediation added
- [x] 22/09/2020: KCP remediation added
- [x] XX/XX/2020: KCP rollout strategies added