---
title: managed-cluster-version-crd
authors:
  - "@2uasimojo"
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
  - "@wking"
  - "@jewzaam: Managed OpenShift"
  - "@cblecker: Managed OpenShift"
  - "@pvasant (or delegate): Clusters Service (OCM)"
  - "@berenss (or delegate): ACM"
  - "@csrwng (or delegate): Hypershift"
approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval.
  - "@LalatenduMohanty"
api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None"
  - "@deads2k"
creation-date: yyyy-mm-dd
last-updated: yyyy-mm-dd
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
  - https://issues.redhat.com//browse/HIVE-2366
see-also:
  - n/a
replaces:
  - n/a
superseded-by:
  - n/a
---

# ManagedClusterVersion CRD

## Summary

Introduce a new namespaced openshift/api CustomResourceDefinition,
`ManagedClusterVersion`, as a vehicle for managers of OpenShift fleets to
expose version and upgrade information in the management cluster.

## Motivation

OpenShift fleet management software is stratified.
For example, ACM (Advanced Cluster Management) and OCM (OpenShift Cluster
Manager, the front end for the Managed OpenShift product) both provide users
with access and visibility into clusters which may internally be controlled by
different pieces of fleet management software such as
[hive](https://github.com/openshift/hive/) and
[hypershift](https://github.com/openshift/hypershift/). In this picture, hive
and hypershift interact directly with the "spoke" clusters, whereas ACM and OCM
rely on data visible only on the "hub" (the management cluster on which the
hive/hypershift/ACM/OCM software is running). Thus, any spoke-specific
information \*CM requires to operate must be brokered by the hive/hypershift
layer.

These different levels may have different responsibilities. Salient to this
enhancement, the \*CM layer is responsible for driving upgrades. To do so, it
needs visibility into the spoke cluster's ClusterVersion data. Today the only
mechanisms available for accessing this information entail logging into, or
running an agent in, the spoke cluster. This is not ideal:
* The \*CM software is often subject to security boundaries that make such
  access difficult or impossible, mechanically and/or by policy strictures.
* The hive/hypershift layer is already responsible for such communication with
  the spoke cluster. Introducing the same into the \*CM layer would be a
  duplication of effort and an overlap of domain.

This enhancement enables the hive/hypershift layer to copy ClusterVersion
status information for each of its spokes into a ManagedClusterVersion object
on the hub. There should be exactly one ManagedClusterVersion instance
associated with each spoke.

This information can then be consumed and exposed by the \*CM layer to inform
upgrade decisions driven by or through that layer.

### User Stories

* As a (human or programmatic) manager of a fleet of OpenShift clusters,
  potentially at disparate versions and/or on disparate infrastructures, I
  want to have visibility into the version and upgrade information of all
  clusters in my fleet, so that I can appropriately coordinate upgrades.
* As a programmatic manager of a fleet of OpenShift clusters (i.e. ACM/OCM), I
  want a common way to view version and upgrade information, regardless of the
  software layer between me and the spokes, so that I can simplify my code,
  reduce my test surface, and spend less on maintenance.
* As a Site Reliability Engineer (SRE), I want to get the recommended version
  information from the cluster-version-operator because it has the capability
  to evaluate conditional update risks and recommend preferred upgrade paths.

### Goals

* Provide visibility into the version and upgrade status of managed clusters.
* Provide upper-layer management software a common view into the version and
  upgrade status of managed clusters, irrespective of the low-level component
  managing those clusters.

### Non-Goals

* Understand the factors and considerations of an upgrade decision. This
  business logic remains the responsibility of the CVO on the spoke and of the
  management layer consuming the ClusterVersion information.
* Introspect ClusterVersion.
* Solve hypershift's "how do we deal with control plane config objects that
  live in the hosted cluster's etcd rather than in the HCP namespace?"
  conundrum.

## Proposal

### Workflow Description

Managed OpenShift example:

1. **SRE-Platform human** merges a change to the HiveConfig Custom Resource
   for a production Managed OpenShift shard, flipping on the switch that
   requests...
1. **Hive Operator**, at the behest of a HiveConfig field, instructs its
   **clusterversion controller** that ClusterVersion information is requested
   for all managed clusters.
1. Hive's **clusterversion controller**, in the course of reconciling the
   ClusterDeployment object for spoke cluster `foo`, uses its admin kubeconfig
   to retrieve the singleton `version` instance of the ClusterVersion CRD from
   `foo`'s kube-apiserver.
1. Seeing that there is currently no ManagedClusterVersion object associated
   with `foo`'s ClusterDeployment object on the hub, hive's **clusterversion
   controller** creates one. The ManagedClusterVersion.Status is an exact copy
   of the spoke ClusterVersion.Status. In this example:
   * The association is indicated by the ManagedClusterVersion's namespace and
     name exactly matching those of the ClusterDeployment.
   * To simplify cleanup, the ManagedClusterVersion object is created with an
     OwnerReference to the ClusterDeployment.
1. The **owner** of the `foo` cluster, a Managed OpenShift customer, wishing
   to perform an upgrade of the `foo` cluster, accesses the OCM console and
   selects the "I want to upgrade my cluster" widget.
1. **OCM** queries the hub cluster, retrieving the ManagedClusterVersion
   instance associated with `foo` (as well as other necessary objects such as
   ClusterDeployment -- that logic is outside the scope of this document).
1. **OCM** parses the ManagedClusterVersion object and exposes a useful view
   to the **owner** of the `foo` cluster.
1. The **owner** of the `foo` cluster uses the information presented by
   **OCM** to make an informed decision about the desired upgrade path for the
   `foo` cluster, and requests that **OCM** orchestrate that upgrade (via
   existing means, outside the scope of this document).
1. When the upgrade is complete, `foo`'s **cluster-version-operator** updates
   `foo`'s ClusterVersion object with new version and upgrade information.
1. Hive's **clusterversion controller**, in the course of reconciling the
   ClusterDeployment object for `foo`, retrieves the ClusterVersion object
   and, seeing that its Status differs from that of the (now extant)
   ManagedClusterVersion object on the hub, updates the latter.
1. Later, when the **owner** of the `foo` cluster wishes to perform another
   upgrade, **OCM** shows an updated view of the available upgrades.
1. Repeat. Profit.

### API Extensions

```go
import (
	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ManagedClusterVersion is the Schema for the managedclusterversions API
// +k8s:openapi-gen=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:path=managedclusterversions,shortName=mcv,scope=Namespaced
type ManagedClusterVersion struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// NOTE: No Spec!
	Spec struct{} `json:"spec,omitempty"`

	// +k8s:deepcopy-gen=true
	// +optional
	Status configv1.ClusterVersionStatus `json:"status,omitempty"`
}
```

Extensions to hive/hypershift/etc. and ACM/OCM/etc. are outside the scope of
this document.

### Topology Considerations

#### Hypershift / Hosted Control Planes

At the time of this writing, hypershift
[brokers ClusterVersion Status piecemeal](https://github.com/openshift/hypershift/blob/9adaf7778dd3595ab57e06e4eee7dc3bff353b6b/api/hypershift/v1beta1/hostedcluster_types.go#L2101),
cloning top-level fields and selectively copying in the lower-level pieces it
deems important. As a result of this enhancement, it is expected that
hypershift will pivot to a design similar to that described
[above](#workflow-description).

Importantly, having a unified solution for brokering this information from the
hive/hypershift layer up to the \*CM layer is intended to simplify the
processing needed by the latter.
With hive and hypershift both presenting this unified view, \*CM can use
common code to retrieve and process that object.

#### Standalone Clusters

Standalone clusters are not expected to use this object or carry any instances
of it.

#### Single-node Deployments or MicroShift

These are affected only if they themselves are running a fleet manager. In
that case, each instance of ManagedClusterVersion is simply a CR to manage
like any other.

If SNO/MicroShift clusters are part of a fleet, their fleet manager may broker
their ClusterVersion objects in the manner described
[above](#workflow-description). In this scenario they are the same as any
other OpenShift spoke.

### Implementation Details/Notes/Constraints

What are some important details that didn't come across above in the
**Proposal**? Go into as much detail as necessary here. This might be a good
place to talk about core concepts and how they relate. While it is useful to
go into the details of the code changes required, it is not necessary to show
how the code will be rewritten in the enhancement.

### Risks and Mitigations

A sufficiently powerful user on the spoke cluster could disable the CVO and
manually edit the ClusterVersion object, influencing upgrade decisions made at
the management layer. This kind of malicious hackery would only affect the
spoke from which it was perpetrated, and would presumably break the seal, void
the warranty, and give SRE cause to drop support.

/*
What are the risks of this proposal and how do we mitigate? Think broadly. For
example, consider both security and how this will impact the larger OKD
ecosystem.

How will security be reviewed and by whom?

How will UX be reviewed and by whom?

Consider including folks that also work outside your immediate sub-project.
*/

### Drawbacks

The idea is to find the best form of an argument why this enhancement should
_not_ be implemented.

What trade-offs (technical/efficiency cost, user experience, flexibility,
supportability, etc.) must be made in order to implement this? What are the
reasons we might not want to undertake this proposal, and how do we overcome
them?

Does this proposal implement a behavior that's new/unique/novel? Is it poorly
aligned with existing user expectations? Will it be a significant maintenance
burden? Is it likely to be superseded by something else in the near future?

## Open Questions [optional]

This is where to call out areas of the design that require closure before
deciding to implement the design. For instance:
 > 1. This requires exposing previously private resources which contain
 >    sensitive information. Can we do this?

## Test Plan

**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs with other components?
- What additional testing is necessary to support managed OpenShift
  service-based offerings?

No need to outline all of the test cases, just the general strategy. Anything
that would count as tricky in the implementation, and anything particularly
challenging to test, should be called out.

All code is expected to have adequate tests (eventually with coverage
expectations).

## Graduation Criteria

**Note:** *Section not required until targeted at a release.*

Define graduation milestones.

These may be defined in terms of API maturity, or as something else. The
initial proposal should keep this high-level with a focus on what signals will
be looked at to determine graduation.

Consider the following in developing the graduation criteria for this
enhancement:

- Maturity levels
  - [`alpha`, `beta`, `stable` in upstream Kubernetes][maturity-levels]
  - `Dev Preview`, `Tech Preview`, `GA` in OpenShift
- [Deprecation policy][deprecation-policy]

Clearly define what graduation means by either linking to the [API doc
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
or by redefining what graduation means.

In general, we try to use the same stages (alpha, beta, GA), regardless of how
the functionality is accessed.

[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

**If this is a user facing change requiring new or updated documentation in
[openshift-docs](https://github.com/openshift/openshift-docs/), please be sure
to include it in the graduation criteria.**

**Examples**: These are generalized examples to consider, in addition to the
aforementioned [maturity levels][maturity-levels].

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing
- User facing documentation created in
  [openshift-docs](https://github.com/openshift/openshift-docs/)

**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

## Upgrade / Downgrade Strategy

If applicable, how will the component be upgraded and downgraded? Make sure
this is in the test plan.

Consider the following in developing an upgrade/downgrade strategy for this
enhancement:
- What changes (in invocations, configurations, API use, etc.) is an existing
  cluster required to make on upgrade in order to keep previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing
  cluster required to make on upgrade in order to make use of the enhancement?

Upgrade expectations:
- Each component should remain available for user requests and workloads
  during upgrades. Ensure the components leverage best practices in handling
  [voluntary disruption](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/).
  Any exception to this should be identified and discussed here.
- Micro version upgrades - users should be able to skip forward versions
  within a minor release stream without being required to pass through
  intermediate versions - i.e. `x.y.N->x.y.N+2` should work without requiring
  `x.y.N->x.y.N+1` as an intermediate step.
- Minor version upgrades - you only need to support `x.N->x.N+1` upgrade
  steps. So, for example, it is acceptable to require a user running 4.3 to
  upgrade to 4.5 with a `4.3->4.4` step followed by a `4.4->4.5` step.
- While an upgrade is in progress, new component versions should continue to
  operate correctly in concert with older component versions (aka "version
  skew"). For example, if a node is down, and an operator is rolling out a
  daemonset, the old and new daemonset pods must continue to work correctly
  even while the cluster remains in this partially upgraded state for some
  time.

Downgrade expectations:
- If an `N->N+1` upgrade fails mid-way through, or if the `N+1` cluster is
  misbehaving, it should be possible for the user to rollback to `N`. It is
  acceptable to require some documented manual steps in order to fully restore
  the downgraded cluster to its previous state. Examples of acceptable steps
  include:
  - Deleting any CVO-managed resources added by the new version. The CVO does
    not currently delete resources that no longer exist in the target version.

## Version Skew Strategy

It is essential that the schema definition for ManagedClusterVersion
incorporate the/a [fix](https://github.com/openshift/api/pull/1624) for
[OTA-1223](https://issues.redhat.com/browse/OTA-1223) so that it can properly
handle spoke clusters at any version (including future versions).

Example failure mode:
- The hub CRD for ManagedClusterVersion enumerates `x,y` as legal values for
  `$field`.
- Hive is managing a spoke whose ClusterVersion CRD is newer, where `x,y,z`
  are legal values.
- The spoke's ClusterVersion contains value `z` for `$field`.
- Hive decodes the spoke ClusterVersion into the hub ManagedClusterVersion and
  attempts to write it to etcd on the hub.
- The hub kube-apiserver rejects the request because, according to its schema
  for ManagedClusterVersion, `z` is not a valid value for `$field`.

/*
How will the component handle version skew with other components?
What are the guarantees? Make sure this is in the test plan.

Consider the following in developing a version skew strategy for this
enhancement:
- During an upgrade, we will always have skew among components; how will this
  impact your work?
- Does this enhancement involve coordinating behavior in the control plane and
  in the kubelet? How does an n-2 kubelet without this feature available
  behave when this feature is used?
- Will any other components on the node change? For example, changes to CSI,
  CRI or CNI may require updating that component before the kubelet.
*/

## Operational Aspects of API Extensions

Describe the impact of API extensions (mentioned in the proposal section, i.e.
CRDs, admission and conversion webhooks, aggregated API servers, finalizers)
here in detail, especially how they impact the OCP system architecture and
operational aspects.

- For conversion/admission webhooks and aggregated apiservers: what are the
  SLIs (Service Level Indicators) an administrator or support can use to
  determine the health of the API extensions

  Examples (metrics, alerts, operator conditions)
  - authentication-operator condition `APIServerDegraded=False`
  - authentication-operator condition `APIServerAvailable=True`
  - openshift-authentication/oauth-apiserver deployment and pods health

- What impact do these API extensions have on existing SLIs (e.g. scalability,
  API throughput, API availability)

  Examples:
  - Adds 1s to every pod update in the system, slowing down pod scheduling by
    5s on average.
  - Fails creation of ConfigMap in the system when the webhook is not
    available.
  - Adds a dependency on the SDN service network for all resources, risking
    API availability in case of SDN issues.
  - Expected use-cases require less than 1000 instances of the CRD, not
    impacting general API throughput.

- How is the impact on existing SLIs to be measured and when (e.g. every
  release by QE, or automatically in CI), and by whom (e.g. perf team; name
  the responsible person and let them review this enhancement)

- Describe the possible failure modes of the API extensions.
- Describe how a failure or behaviour of the extension will impact the overall
  cluster health (e.g. which kube-controller-manager functionality will stop
  working), especially regarding stability, availability, performance and
  security.
- Describe which OCP teams are likely to be called upon in case of escalation
  with one of the failure modes and add them as reviewers to this enhancement.

## Support Procedures

Describe how to
- detect the failure modes in a support situation, and describe possible
  symptoms (events, metrics, alerts, which log output in which component)

  Examples:
  - If the webhook is not running, kube-apiserver logs will show errors like
    "failed to call admission webhook xyz".
  - Operator X will degrade with message "Failed to launch webhook server" and
    reason "WebhookServerFailed".
  - The metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")`
    will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire.

- disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`,
  remove APIService `foo`)

  - What consequences does it have on the cluster health?

    Examples:
    - Garbage collection in kube-controller-manager will stop working.
    - Quota will be wrongly computed.
    - Disabling/removing the CRD is not possible without removing the CR
      instances. The customer will lose data. Disabling the conversion webhook
      will break garbage collection.

  - What consequences does it have on existing, running workloads?

    Examples:
    - New namespaces won't get the finalizer "xyz" and hence might leak
      resource X when deleted.
    - SDN pod-to-pod routing will stop updating, potentially breaking
      pod-to-pod communication after some minutes.

  - What consequences does it have for newly created workloads?

    Examples:
    - New pods in namespaces with Istio support will not get sidecars
      injected, breaking their networking.

- Does functionality fail gracefully and will work resume when re-enabled
  without risking consistency?

  Examples:
  - The mutating admission webhook "xyz" has FailPolicy=Ignore and hence will
    not block the creation or updates of objects when it fails. When the
    webhook comes back online, there is a controller reconciling all objects,
    applying labels that were not applied during admission webhook downtime.
  - Namespace deletion will not delete all objects in etcd, leading to zombie
    objects when another namespace with the same name is created.

## Alternatives

* The \*CM layer logs directly into spoke clusters.

Similar to the `Drawbacks` section, the `Alternatives` section is used to
highlight and record other possible approaches to delivering the value
proposed by an enhancement, including especially information about why the
alternative was not selected.

## Infrastructure Needed [optional]

Use this section if you need things from the project. Examples include a new
subproject, repos requested, github details, and/or testing infrastructure.