-
Notifications
You must be signed in to change notification settings - Fork 536
Add cluster-api integration enhancement #913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,305 @@ | ||||||
| --- | ||||||
| title: Cluster API Integration | ||||||
| authors: | ||||||
| - "@JoelSpeed" | ||||||
| - "@alexander-demichev" | ||||||
| reviewers: | ||||||
| - "@elmiko" | ||||||
| - "@Fedosin" | ||||||
| - "@lobziik" | ||||||
| - "@asalkeld" | ||||||
| - "@hardys" | ||||||
| approvers: | ||||||
| - "@elmiko" | ||||||
| - "@enxebre" | ||||||
| - "@asalkeld" | ||||||
| creation-date: 2021-09-16 | ||||||
| last-updated: 2021-09-16 | ||||||
| status: implementable | ||||||
| --- | ||||||
|
|
||||||
| # Cluster API Integration | ||||||
|
|
||||||
| ## Release Signoff Checklist | ||||||
|
|
||||||
| - [x] Enhancement is `implementable` | ||||||
| - [ ] Design details are appropriately documented from clear requirements | ||||||
| - [ ] Test plan is defined | ||||||
| - [ ] Operational readiness criteria is defined | ||||||
| - [ ] Graduation criteria for dev preview, tech preview, GA | ||||||
| - [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| This enhancement describes the process of integrating the upstream [Cluster API](https://github.com/kubernetes-sigs/cluster-api) | ||||||
| project into OpenShift standalone clusters. | ||||||
|
|
||||||
| ## Motivation | ||||||
|
|
||||||
| We would like to give users the ablility to use Cluster API for machine management, as an addition or supplement for Machine API. | ||||||
|
|
||||||
| ### Goals | ||||||
|
|
||||||
| - Run Cluster API controllers for managing infrastructure in a similar way to Machine API. | ||||||
| - Provide forward compatibility between Machine API (MAPI) and Cluster API (CAPI). | ||||||
| - Ensure feature parity between MAPI and CAPI before migration. | ||||||
|
|
||||||
| ### Non-Goals | ||||||
|
|
||||||
| - Deprecate or remove any existing APIs. | ||||||
| - Stop providing support for Machine API in near future. | ||||||
| - Provide any automated integration or migration between MAPI and CAPI resources. | ||||||
| - Change current autoscaler behavior to use Cluster API. This will be handled after Technical Preview. | ||||||
|
|
||||||
| ## Proposal | ||||||
|
|
||||||
| This proposal is about introducing Cluster API alongside Machine API as a technical preview in OpenShift clusters. Cluster API on OpenShift has the potential to unlock new infrastructure providers and community engagement for our users. | ||||||
| During the technical preview we will gather feedback on its usefulness as well as evaluate the feasibility of using Cluster API as a primary infrastructure resource API for OpenShift. | ||||||
|
|
||||||
| ### User Stories | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
| #### Story 1 | ||||||
|
|
||||||
| As an OpenShift developer, I would like to leverage the upstream community Cluster API infrastructure providers and reduce the barrier to OpenShift of supporting new providers. | ||||||
|
|
||||||
| #### Story 2 | ||||||
|
|
||||||
| As an OpenShift developer, I would like to collaborate with third parties who already have vested interests in maintaining Machine controllers for various infrastructure providers so that i can benefit from their expertise as I add new features. | ||||||
|
|
||||||
| #### Story 3 | ||||||
|
|
||||||
| As a cloud developer, I would like to easily onboard new infrastructure providers as this process is well documented by the CAPI community and any implementation will be able to be leveraged by both Kubernetes and OpenShift customers, increasing the value of implementing a new provider. | ||||||
|
|
||||||
| #### Story 4 | ||||||
|
|
||||||
| As a developer, I would like to be able to use the same set of tools for infrastructure management in OpenShift as I can for vanilla Kubernetes. | ||||||
|
|
||||||
| #### Story 5 | ||||||
|
|
||||||
| As a cloud operator, I would like to be able to use the CAPI infrastructure resource API for managing mixed infrastructure provider clusters. | ||||||
|
|
||||||
| #### Story 6 | ||||||
|
|
||||||
| As a developer, I would like to have support for hub-spoke OpenShift clusters. Where a management cluster can manage workload clusters that are running on different infrastructure providers. | ||||||
|
|
||||||
| #### Story 7 | ||||||
|
|
||||||
| As a user, I would like to create new MachineSets using CAPI and be able to explore features that are not available in MAPI. | ||||||
|
|
||||||
| ### Implementation Details | ||||||
|
|
||||||
| First, we need to establish Cluster API resource management by ensuring all required components(CRDs, controllers, | ||||||
| RBAC, secrets) are successfully installed and running within the OpenShift cluster. | ||||||
|
|
||||||
| Cluster API will only be present in the cluster (installed by a new operator) if and when a user installs a feature gate. | ||||||
| We will introduce a new, OpenShift specific, feature gate `ClusterAPIEnabled` and include it within the `TechPreviewNoUpgrade` FeatureSet. | ||||||
|
|
||||||
| Once installed, this preview will allow the user to create new MachineSets using CAPI and explore the | ||||||
| features available within CAPI, for comparison with MAPI. For example, availability set support in Azure that | ||||||
| was already in CAPI, but only being introduced to MAPI as of 4.10. | ||||||
|
|
||||||
| During this timeframe, any user wanting to migrate or try out the preview will be left to manually migrate the MachineSet | ||||||
| or create a new one, using either the upstream documentation or documentation provided by OpenShift. | ||||||
|
|
||||||
| This preview will be intended to be used to create day 2 worker MachineSets and is not expected to be integrated into the install process in any way. | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
| It's important to note that the preview will focus on supporting CRDs that our users are familiar with, that includes: | ||||||
| Machines, MachineSets, MachineHealthChecks. We are not planning to document support for other CRDs like MachineDeployment, | ||||||
| however they will be installed. | ||||||
|
|
||||||
| ### Supported platforms | ||||||
|
|
||||||
| The technical preview aims to support: AWS, Azure, GCP, Baremetal, Openstack. | ||||||
|
|
||||||
| #### Cluster API resource management | ||||||
|
|
||||||
| In order to maintain the lifecycle of Cluster API related resources, we will create a new operator `cluster-capi-operator`, this name was chosen for avoiding confusion with upstream Cluster API operator. | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| This operator will be responsible for all administrative tasks related to the deployment of the Cluster API project within the cluster. | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| During tech preview phase, the new operator will leverage the new [CVO feature](https://github.com/openshift/enhancements/blob/master/enhancements/update/cvo-techpreview-manifests.md) for managing all Cluster API related CRDs. | ||||||
|
|
||||||
| `cluster-capi-operator` and it's operands will be provisioned in a new `openshift-cluster-api` namespace. | ||||||
|
|
||||||
| The operator will perform the following tasks: | ||||||
|
|
||||||
| ##### Reconcile FeatureGate object | ||||||
|
|
||||||
| While Cluster API intergration is in tech preview, the operator will reconcile the cluster [`FeatureGate`](https://docs.openshift.com/container-platform/4.8/nodes/clusters/nodes-cluster-enabling-features.html) object and check for `ClusterAPIEnabled` feature gate presence. | ||||||
| The operator will procceed with Cluster API installation if and only if the required feature gate is present. | ||||||
|
|
||||||
| ##### Deploy Cluster Machine Approver | ||||||
|
|
||||||
| In order for Cluster API machines to succefully join the cluster, the Kubelet CSRs need to be approved. | ||||||
| The operator will deploy a separate instance of the `cluster-machine-approver`, which will be configured to be used with Cluster API machines by providing [`--apigroup`](https://github.com/openshift/cluster-machine-approver/blob/master/main.go#L54) flag that was recently introduced. | ||||||
|
|
||||||
| ##### Install upstream CAPI operator | ||||||
|
|
||||||
| We will use the upstream [Cluster API operator](https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20201020-capi-provider-operator.md) for managing CRDs and deploying infrastructure providers. | ||||||
|
|
||||||
| ##### Deploy core Cluster API | ||||||
|
|
||||||
| Once the upstream Cluster API Operator is installed, the next step is to create a `CoreProvider` CR along with a configmap that contains upstream Cluster API CRDs, Deployment, Webhooks and RBAC resources. | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| The example usage is described [here](https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20201020-capi-provider-operator.md#air-gapped-environment). | ||||||
|
|
||||||
| ##### Deploy Cluster API infrastructure provider | ||||||
|
|
||||||
| The `cluster-capi-operator` operator will create an appropriate `InfrastructureProvider` CR (based on the cluster platform) and a configmap that contains upstream Cluster API cloud provider CRDs, Deployment, Webhooks and RBAC resources. | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| ##### Reconcile Cluster object | ||||||
|
|
||||||
| Cluster API's main entity is a `Cluster`, it represents the cluster which is managed by Cluster API and the cluster's infrastructure. More details about cluster object are [here](https://cluster-api.sigs.k8s.io/user/concepts.html). | ||||||
| The `cluster-capi-operator` will need to create the `Cluster` and a proper `InfrastructureCluster` resource for the OpenShift cluster. | ||||||
| Because we have our own infrastructure management strategy in OpenShift, we should leverage the [externally managed cluster infrastructure](https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20210203-externally-managed-cluster-infrastructure.md) feature. | ||||||
| This means that the created `InfrastructureCluster` should have `cluster.x-k8s.io/managed-by:` annotation set. | ||||||
|
|
||||||
| Cluster object reconciler can be done as a separate controller. The controller should: | ||||||
| - Wait before Cluster and InfrastructureCluster CRD is present | ||||||
| - Create both Cluster and InfrastructureCluster objects with externally managed cluster infrastructure annotation. | ||||||
| - Ensure spec/status of InfrastructureCluster are configured for the OpenShift cluster (infrastructure information can be sourced from resources within the OpenShift Cluster). | ||||||
| - Patch `Cluster` status to `Ready=true`. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Who is creating the namespace where the cluster and machinesets CRs are mean to live?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @enxebre does hypershift re-implement the capi controllers or does it install the capi controllers? The CRDs have webhooks specifying a namespace (like this https://github.com/kubernetes-sigs/cluster-api/blob/main/config/crd/patches/webhook_in_clusters.yaml#L17) that will be different in the CRDs and WebHookConfigurations Currently the downstream operator gathers and specifies the resources (including CRDs) and the upstream operator will install them.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They should be able to operate independently. Hypershift owns the lifecycle of their CAPI controllers/CRs.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We will only reconcile CAPI CRs in |
||||||
|
|
||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see any mention to the control plane CR here? Which implementation are you planning to use?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we are currently not installing the controlplane CR (but super easy to do so).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are not planning to support control planes for tech preview
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. might be nice to put this in the non-goals as well |
||||||
| ##### Create user data secret | ||||||
|
|
||||||
| Cluster API Machines will need a user data secret, similar to the one that Machine API uses. | ||||||
| This secret is created by installer for Machine API. | ||||||
| While Cluster API components are in technical preview, and therefore not integrated into the OpenShift Installer, the operator can copy the worker user data secret from `openshift-machine-api` namespace to `openshift-cluster-api`. | ||||||
|
|
||||||
| At this point all Cluster API components should be installed and ready to use. | ||||||
|
|
||||||
| #### CVO management | ||||||
|
|
||||||
| A new `cluster-capi-operator` image will be built and included in every release payload. | ||||||
|
|
||||||
| #### Credentials management | ||||||
|
|
||||||
| The `cluster-capi-operator`'s manifests should contain an appropriate `CredentialsRequest` for each supported infrastructure provider. | ||||||
| This is similiar to [machine-api-operator](https://github.com/openshift/machine-api-operator/blob/6f629682b791a6f4992b78218bfc6e41a32abbe9/install/0000_30_machine-api-operator_00_credentials-request.yaml) | ||||||
|
|
||||||
| #### Cluster API cloud providers | ||||||
|
|
||||||
| Cluster API infrastructure providers will live in forks, similar to what is now done for Machine API. We now evaluating moving | ||||||
| current providers implementation to new repos that will be called `machine-api-provider-*` and reseting current | ||||||
| `cluster-api-provider-*` to latest upstream. | ||||||
|
|
||||||
| #### Example usage | ||||||
|
|
||||||
| Usage is similar to Machine API with small differences, MachineSets reference the `Cluster` object and infrastructure machine template. | ||||||
|
|
||||||
| ```yaml | ||||||
| --- | ||||||
| apiVersion: cluster.x-k8s.io/v1alpha4 | ||||||
| kind: MachineSet | ||||||
| metadata: | ||||||
| name: capi-ms | ||||||
| namespace: openshift-cluster-api | ||||||
| spec: | ||||||
| clusterName: cluster-name | ||||||
| replicas: 1 | ||||||
| selector: | ||||||
| matchLabels: | ||||||
| test: example | ||||||
| template: | ||||||
| metadata: | ||||||
| labels: | ||||||
| test: example | ||||||
| spec: | ||||||
| bootstrap: | ||||||
| dataSecretName: worker-user-data | ||||||
| clusterName: cluster-name | ||||||
| infrastructureRef: | ||||||
| apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4 | ||||||
| kind: AWSMachineTemplate | ||||||
| name: cluster-name | ||||||
|
|
||||||
| --- | ||||||
| apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4 | ||||||
| kind: AWSMachineTemplate | ||||||
| metadata: | ||||||
| name: capi-machine-template | ||||||
| namespace: openshift-cluster-api | ||||||
| spec: | ||||||
| template: | ||||||
| spec: | ||||||
| uncompressedUserData: true | ||||||
| iamInstanceProfile: .... | ||||||
| instanceType: m5.large | ||||||
| cloudInit: | ||||||
| secureSecretsBackend: secrets-manager | ||||||
| insecureSkipSecretsManager: true | ||||||
| ami: | ||||||
| id: .... | ||||||
| subnet: | ||||||
| filters: | ||||||
| - name: tag:Name | ||||||
| values: | ||||||
| - ... | ||||||
| additionalSecurityGroups: | ||||||
| - filters: | ||||||
| - name: tag:Name | ||||||
| values: | ||||||
| - ... | ||||||
| ``` | ||||||
|
|
||||||
| ### Risks and Mitigations | ||||||
|
|
||||||
| - During tech preview `cluster-capi-operator` will have permissions to manage CRDs, this might be a not secure permission for an operator. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. minor nit
Suggested change
|
||||||
| - Note, this permission should be restricted to creating CRDs only, as once installed, the technical preview cannot be uninstalled. | ||||||
alexander-demicev marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| - CLI usage, once Cluster API is installed command like `oc get machine` will return Cluster API machines, in order to use Machine API users will have to use fully qualified name `oc get machines.machine.openshift.io`. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this makes me wonder if we shouldn't have some sort of warning message associated with
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, we still have to figure out what to do here.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where did the conversation end up with API team about changing the priority so we don't make this breaking change?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have a "hack" to set the preference in openshift. If we do this then any scripts should not break and users will have to use fully qualified names for CAPI resources.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should make a note of that hack and ideally get something to track that so we don't forget to do it
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've just added a note |
||||||
| If we want to not introduce this breaking change then we have to set prefered API group in our [kubernetes fork](https://github.com/openshift/kubernetes/blob/master/pkg/controlplane/controller/crdregistration/patch.go). | ||||||
| - Feature parity, for last year we've been trying to upstream all features introduced to Machine API but we can't be sure all of them work in upstream. We need to have a good set of regression tests running periodically. | ||||||
|
|
||||||
| ### API Extensions | ||||||
|
|
||||||
| With `ClusterAPIEnabled` feature enabled, the following API extensions will be added: | ||||||
|
|
||||||
| - Core Cluster API resources and webhooks, they can be found [here](https://github.com/kubernetes-sigs/cluster-api/tree/main/api/v1beta1) | ||||||
| - Depending on the provider where a cluster is running, infrastructure provider CRD and webhooks will be added, see | ||||||
| [AWS](https://github.com/kubernetes-sigs/cluster-api-provider-aws/tree/main/api/v1beta1), [Azure](https://github.com/kubernetes-sigs/cluster-api-provider-azure/tree/main/api/v1beta1), [GCP](https://github.com/kubernetes-sigs/cluster-api-provider-gcp/tree/main/api/v1alpha4). | ||||||
| - Cluster API Operator CRDs will be added, see [here](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20201020-capi-provider-operator.md#new-api-types) | ||||||
|
|
||||||
| ## Design Details | ||||||
|
|
||||||
| ### Test Plan | ||||||
|
|
||||||
| - Cluster API providers should already come with a set of e2e tests, we will run these on each PR. | ||||||
| - `cluster-capi-operator` will include it's own e2e suite ensuring that all Cluter API components are successfully installed. | ||||||
|
|
||||||
| ### Operational Aspects of API Extensions | ||||||
| #### Failure Modes | ||||||
|
|
||||||
| If Cluster API starts failing, it will affect worker machine management, which is a critical | ||||||
| component of the OCP system. In case of Cluster API failures, users will be able to use the Machine API. | ||||||
|
|
||||||
| #### Support Procedures | ||||||
|
|
||||||
| The process of troubleshooting failure is similar to the process of troubleshooting Machine API failures. | ||||||
| We will be working on making sure that similar or equivalent events, metrics and alerts are present. | ||||||
|
|
||||||
| ### Graduation Criteria | ||||||
|
|
||||||
| #### Dev Preview -> Tech Preview | ||||||
|
|
||||||
| - Write symptoms-based alerts for the component(s) | ||||||
| - Ability to have Cluster API installed using a feature gate | ||||||
| - Ability to use Cluster API for machine management | ||||||
| - End user documentation | ||||||
| - Running upstream e2e workflow on openshift | ||||||
|
|
||||||
| #### Tech Preview -> GA (Future Work) | ||||||
|
|
||||||
| - Cluster API will be installed in all OpenShift clusters by default. | ||||||
| - Bidirectional migration for MAPI and CAPI. | ||||||
| - New infrastructure providers implemented as CAPI. | ||||||
|
|
||||||
| #### Removing a deprecated feature | ||||||
|
|
||||||
| ### Upgrade / Downgrade Strategy | ||||||
|
|
||||||
| ### Version Skew Strategy | ||||||
|
|
||||||
| ## Implementation History | ||||||
|
|
||||||
| ## Drawbacks | ||||||
|
|
||||||
| ## Alternatives | ||||||
|
|
||||||
| ## Infrastructure Needed | ||||||
Uh oh!
There was an error while loading. Please reload this page.