diff --git a/dev-guide/cluster-version-operator/OWNERS b/dev-guide/cluster-version-operator/OWNERS new file mode 100644 index 0000000000..bb006bb6d5 --- /dev/null +++ b/dev-guide/cluster-version-operator/OWNERS @@ -0,0 +1,11 @@ +# See the OWNERS docs: https://git.k8s.io/community/contributors/guide/owners.md + +approvers: + - crawford + - smarterclayton + - abhinavdahiya + - sdodson + - wking + - vrutkovs + - jottofar + - LalatenduMohanty diff --git a/dev-guide/cluster-version-operator/dev/clusteroperator.md b/dev-guide/cluster-version-operator/dev/clusteroperator.md new file mode 100644 index 0000000000..6c904269bf --- /dev/null +++ b/dev-guide/cluster-version-operator/dev/clusteroperator.md @@ -0,0 +1,209 @@ +# ClusterOperator Custom Resource + +The ClusterOperator is a custom resource object which holds the current state of an operator. This object is used by operators to convey their state to the rest of the cluster. + +Ref: [godoc](https://godoc.org/github.com/openshift/api/config/v1#ClusterOperator) for more info on the ClusterOperator type. + +## Why I want ClusterOperator Custom Resource in /manifests + +Everyone installed by the ClusterVersionOperator must include the ClusterOperator Custom Resource in [`/manifests`](operators.md#what-do-i-put-in-manifests). +The CVO sweeps the release image and applies it to the cluster. On upgrade, the CVO uses clusteroperators to confirm successful upgrades. +Cluster-admins make use of these resources to check the status of their clusters. + +## How should I include ClusterOperator Custom Resource in /manifests + +### How ClusterVersionOperator handles ClusterOperator in release image + +When ClusterVersionOperator encounters a ClusterOperator Custom Resource, + +- It uses the `.metadata.name` to find the corresponding ClusterOperator instance in the cluster +- It then waits for the instance in the cluster until + - `.status.versions[name=operator].version` in the live instance matches the `.status.version` from the release image and + - the live instance `.status.conditions` report available +- It then continues to the next task. + +ClusterVersionOperator will only deploy files with `.yaml`, `.yml`, or `.json` extensions, like `kubectl create -f DIR`. + +**NOTE**: ClusterVersionOperator sweeps the manifests in the release image in alphabetical order, therefore if the ClusterOperator Custom Resource exists before the deployment for the operator that is supposed to report the Custom Resource, ClusterVersionOperator will be stuck waiting and cannot proceed. +Also note that the ClusterVersionOperator will pre-create the ClusterOperator resource found in the `/manifests` folder (to provide better support to must-gather operation in case of install or upgrade failure). +It remains a responsibility of the respective operator to properly update (or recreate if deleted) the ClusterOperator Custom Resource. + +### What should be the contents of ClusterOperator Custom Resource in /manifests + +There are 2 important things that need to be set in the ClusterOperator Custom Resource in /manifests for CVO to correctly handle it. + +- `.metadata.name`: name for finding the live instance +- `.status.versions[name=operator].version`: this is the version that the operator is expected to report. ClusterVersionOperator only respects the `.status.conditions` from instances that report their version. + +Additionally you might choose to include some fundamental relatedObjects. 
+The must-gather tool and the insights operator depend on cluster operators and related objects in order to identify resources to gather.
+Because cluster operator creation is delegated to the operator itself, install and upgrade failures of new operators can prevent the requisite info from being gathered if the cluster degrades before those steps.
+To mitigate this scenario, the ClusterVersionOperator makes a best effort to fast-fill cluster operators using the ClusterOperator Custom Resource in /manifests.
+
+Example:
+
+Consider a cluster operator `my-cluster-operator` that reports its status using a ClusterOperator instance named `my-cluster-operator`.
+
+The ClusterOperator Custom Resource in /manifests should look like:
+
+```yaml
+apiVersion: config.openshift.io/v1
+kind: ClusterOperator
+metadata:
+  name: my-cluster-operator
+spec: {}
+status:
+  versions:
+  - name: operator
+    # The string "0.0.1-snapshot" is substituted in the manifests when the payload is built
+    version: "0.0.1-snapshot"
+```
+
+## What should an operator report with ClusterOperator Custom Resource
+
+The ClusterOperator exists to communicate status about a functional area of the cluster back to both an admin
+and the higher level automation in the CVO in an opinionated and consistent way. Because of this, we document
+expectations around the outcome and have specific guarantees that apply.
+
+Of note, in the docs below we use the word `operand` to describe the "thing the operator manages", which might be:
+
+* A deployment or daemonset, like a cluster networking provider
+* An API exposed via a CRD and the operator updates other API objects, like a secret generator
+* Just some controller loop invariant that the operator manages, like "all certificate signing requests coming from valid machines are approved"
+
+An operand doesn't have to be code running on the cluster - that might be the operator. When we say "is the operand available" that might mean "the new code is rolled out" or "all old API objects have been updated" or "we're able to sign certificate requests".
+
+Here are the guarantees components can get when they follow the rules we define:
+
+1. Cause an installation to fail because a component is not able to become available for use
+2. Cause an upgrade to hang because a component is not able to successfully reach the new version
+3. Prevent a user from clicking the upgrade button because components have one or more preflight criteria that are not met (e.g. nodes are at version 4.0 so the control plane can't be upgraded to 4.2 and break N-1 compat)
+4. Ensure other components are upgraded *after* your component (guarantee "happens before" in upgrades, such as kube-apiserver being updated before kube-controller-manager)
+
+### There is a set of guarantees components are expected to honor in return
+
+- An operator should not report the `Available` status condition the first time
+  until it is completely rolled out (or within some reasonable percentage if
+  the component must be installed to all nodes)
+- An operator reports `Degraded` when its current state does not match its
+  desired state over a period of time resulting in a reduced quality of service.
+  The period of time may vary by component, but a `Degraded` state represents
+  persistent observation of a condition. As a result, a component should not
+  oscillate in and out of `Degraded` state.
+  - A service may be `Available` even if it is degraded. For example, your service
+    may desire three running pods, but one pod is crash-looping.
The service is `Available` + but `Degraded` because it may have a lower quality of service. + - A component may be `Progressing` but not `Degraded` because the transition from one state to another + does not persist over a long enough period to report `Degraded`. +- A service should not report `Degraded` during the course of a normal upgrade. A service may report + `Degraded` in response to a persistent infrastructure failure that requires + administrator intervention. For example, if a control plane host is unhealthy + and must be replaced. An operator should report `Degraded` if unexpected + errors occur over a period, but the expectation is that all unexpected errors + are handled as operators mature. +- An operator reports `Progressing` when it is rolling out new code, + propagating config changes, or otherwise moving from one steady state to + another. It should not report progressing when it is reconciling a previously + known state. If it is progressing to a new version, it should include the + version in the message for the condition like "Moving to v1.0.1". +- An operator reports `Upgradeable` as `false` when it wishes to prevent an + upgrade for an admin-correctable condition. The component should include a + message that describes what must be fixed. +- An operator reports a new version when it has rolled out the new version to + all of its operands. + +### Status + +The operator should ensure that all the fields of `.status` in ClusterOperator are atomic changes. This means that all the fields in the `.status` are only valid together and do not partially represent the status of the operator. + +### Version + +The operator reports an array of versions. A version struct has a name, and a version. There MUST be a version with the name `operator`, which is watched by the CVO to know if a cluster operator has achieved the new level. The operator MAY report additional versions of its underlying operands. + +Example: + +```yaml +apiVersion: config.openshift.io/v1 +kind: ClusterOperator +metadata: + name: kube-apiserver +spec: {} +status: + ... + versions: + - name: operator + # Watched by the CVO + version: 4.0.0-0.alpha-2019-03-05-054505 + - name: kube-apiserver + # Used to report underlying upstream version + version: 1.12.4 +``` + +#### Version reporting during an upgrade + +When your operator begins rolling out a new version it must continue to report the previous operator version in its ClusterOperator status. +While any of your operands are still running software from the previous version then you are in a mixed version state, and you should continue to report the previous version. +As soon as you can guarantee you are not and will not run any old versions of your operands, you can update the operator version on your ClusterOperator status. + +### Conditions + +Refer [the godocs](https://godoc.org/github.com/openshift/api/config/v1#ClusterStatusConditionType) for conditions. + +In general, ClusterOperators should contain at least three core conditions: + +* `Progressing` must be true if the operator is actually making change to the operand. +The change may be anything: desired user state, desired user configuration, observed configuration, version update, etc. +If this is false, it means the operator is not trying to apply any new state. +If it remains true for an extended period of time, it suggests something is wrong in the cluster. It can probably wait until Monday. +* `Available` must be true if the operand is functional and available in the cluster at the level in status. 
+If this is false, it means there is an outage. Someone is probably getting paged. +* `Degraded` should be true if the operator has encountered an error that is preventing it or its operand from working properly. +The operand may still be available, but intent may not have been fulfilled. +If this is true, it means that the operand is at risk of an outage or improper configuration. It can probably wait until the morning, but someone needs to look at it. + +The message reported for each of these conditions is important. +All messages should start with a capital letter (like a sentence) and be written for an end user / admin to debug the problem. +`Degraded` should describe in detail (a few sentences at most) why the current controller is blocked. +The detail should be sufficient for an engineer or support person to triage the problem. +`Available` should convey useful information about what is available, and be a single sentence without punctuation. +`Progressing` is the most important message because it is shown by default in the CLI as a column and should be a terse, human-readable message describing the current state of the object in 5-10 words (the more succinct the better). + +For instance, if the CVO is working towards 4.0.1 and has already successfully deployed 4.0.0, the conditions might be reporting: + +* `Degraded` is false with no message +* `Available` is true with message `Cluster has deployed 4.0.0` +* `Progressing` is true with message `Working towards 4.0.1` + +If the controller reaches 4.0.1, the conditions might be: + +* `Degraded` is false with no message +* `Available` is true with message `Cluster has deployed 4.0.1` +* `Progressing` is false with message `Cluster version is 4.0.1` + +If an error blocks reaching 4.0.1, the conditions might be: + +* `Degraded` is true with a detailed message `Unable to apply 4.0.1: could not update 0000_70_network_deployment.yaml because the resource type NetworkConfig has not been installed on the server.` +* `Available` is true with message `Cluster has deployed 4.0.0` +* `Progressing` is true with message `Unable to apply 4.0.1: a required object is missing` + +The progressing message is the first message a human will see when debugging an issue, so it should be terse, succinct, and summarize the problem well. The degraded message can be more verbose. Start with simple, easy to understand messages and grow them over time to capture more detail. + + +#### Conditions and Install/Upgrade + +Conditions determine when the CVO considers certain actions complete, the following table summarizes what it looks at and when. + + +| operation | version | available | degraded | progressing | upgradeable +|-----------|---------|-----------|----------|-------------|-------------| +| Install completion[1] | any(but should be the current version) | true | any | any | any +| Begin upgrade(patch) | any | any | any | any | any +| Begin upgrade(minor) | any | any | any | any | not false +| Begin upgrade (w/ force) | any | any | any | any | any +| Upgrade completion[2]| newVersion(target version for the upgrade) | true | false | any | any + +[1] Install works on all components in parallel, it does not wait for any component to complete before starting another one. + +[2] Upgrade will not proceed with upgrading components in the next runlevel until the previous runlevel completes. 
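+
+To see the values the CVO is waiting on for a particular operator, you can inspect the live ClusterOperator directly. A sketch using `oc` and `jq` (substitute the ClusterOperator name you care about for `kube-apiserver`):
+
+```console
+$ oc get clusteroperator kube-apiserver -o json \
+    | jq '{versions: .status.versions, conditions: [.status.conditions[] | {type, status, reason}]}'
+```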
+
+See also: https://github.com/openshift/cluster-version-operator/blob/a5f5007c17cc14281c558ea363518dcc5b6675c7/pkg/cvo/internal/operatorstatus.go#L176-L189
diff --git a/dev-guide/cluster-version-operator/dev/clusterversion.md b/dev-guide/cluster-version-operator/dev/clusterversion.md
new file mode 100644
index 0000000000..0cc0fb174f
--- /dev/null
+++ b/dev-guide/cluster-version-operator/dev/clusterversion.md
@@ -0,0 +1,114 @@
+# ClusterVersion Custom Resource
+
+The `ClusterVersion` is a custom resource object which holds the current version of the cluster.
+The administrator uses this object to declare their target cluster state, and the cluster-version operator (CVO) then works to transition the cluster to that target state.
+
+## Finding your current update image
+
+You can extract the current update image from the `ClusterVersion` object:
+
+```console
+$ oc get clusterversion -o jsonpath='{.status.desired.image}{"\n"}' version
+registry.ci.openshift.org/openshift/origin-release@sha256:c1f11884c72458ffe91708a4f85283d591b42483c2325c3d379c3d32c6ac6833
+```
+
+## Setting objects unmanaged
+
+For testing operators, it is sometimes helpful to disable CVO management so you can alter objects without the CVO stomping on your changes.
+To get a list of objects managed by the CVO, run:
+
+```console
+$ oc adm release extract --from=registry.ci.openshift.org/openshift/origin-release@sha256:c1f11884c72458ffe91708a4f85283d591b42483c2325c3d379c3d32c6ac6833 --to=release-image
+$ ls release-image | head -n5
+0000_07_cluster-network-operator_00_namespace.yaml
+0000_07_cluster-network-operator_01_crd.yaml
+0000_07_cluster-network-operator_02_rbac.yaml
+0000_07_cluster-network-operator_03_deployment.yaml
+0000_08_cluster-dns-operator_00-cluster-role.yaml
+```
+
+To get a list of current overrides, run:
+
+```console
+$ oc get -o json clusterversion version | jq .spec.overrides
+[
+  {
+    "kind": "APIService",
+    "name": "v1alpha1.packages.apps.redhat.com",
+    "unmanaged": true
+  }
+]
+```
+
+To add an entry to that list, you can use a [JSON Patch][json-patch] to add a [`ComponentOverride`][ComponentOverride].
+For example, to set the network operator's deployment unmanaged:
+
+```console
+$ head -n5 release-image/0000_07_cluster-network-operator_03_deployment.yaml
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: network-operator
+  namespace: openshift-network-operator
+```
+If there are currently no other overrides configured:
+```console
+$ cat <<EOF >version-patch-first-override.yaml
+- op: add
+  path: /spec/overrides
+  value:
+  - kind: Deployment
+    group: apps/v1
+    name: network-operator
+    namespace: openshift-network-operator
+    unmanaged: true
+EOF
+```
+To add to the list of already existing overrides:
+```console
+$ cat <<EOF >version-patch-add-override.yaml
+- op: add
+  path: /spec/overrides/-
+  value:
+    kind: Deployment
+    group: apps/v1
+    name: network-operator
+    namespace: openshift-network-operator
+    unmanaged: true
+EOF
+```
+Then apply whichever patch file you created, for example:
+```console
+$ oc patch clusterversion version --type json -p "$(cat version-patch-first-override.yaml)"
+```
+
+You can verify the update with:
+
+```console
+$ oc get -o json clusterversion version | jq .spec.overrides
+[
+  {
+    "kind": "APIService",
+    "name": "v1alpha1.packages.apps.redhat.com",
+    "unmanaged": true
+  },
+  {
+    "kind": "Deployment",
+    "name": "network-operator",
+    "namespace": "openshift-network-operator",
+    "unmanaged": true
+  }
]
+```
+
+After updating the `ClusterVersion`, you can make your desired edits to the unmanaged object.
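+
+When you are done testing, you can remove the override so the CVO resumes managing the object. A sketch using another JSON Patch (the index `0` assumes the override you want to drop is the first entry in `spec.overrides`):
+
+```console
+$ oc patch clusterversion version --type json -p '[{"op": "remove", "path": "/spec/overrides/0"}]'
+```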
+ +## Disabling the cluster-version operator + +When you just want to turn off the cluster-version operator instead of fiddling with per-object overrides, you can: + +```console +$ oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator +``` + +[ComponentOverride]: https://godoc.org/github.com/openshift/api/config/v1#ComponentOverride +[json-patch]: https://tools.ietf.org/html/rfc6902 diff --git a/dev-guide/cluster-version-operator/dev/metrics.md b/dev-guide/cluster-version-operator/dev/metrics.md new file mode 100644 index 0000000000..da463f1161 --- /dev/null +++ b/dev-guide/cluster-version-operator/dev/metrics.md @@ -0,0 +1,73 @@ +# CVO Metrics + +The Cluster Version Operator reports the following metrics: + +The cluster version is reported as seconds since the epoch with labels for `version` and +`image`. The `type` label reports which value is reported: + +* `current` - the version the operator is applying right now (the running CVO version) and the age of the payload +* `initial` - the initial version of the cluster, and value is the creation timestamp of the cluster version (cluster age) +* `cluster` - the current version of the cluster with `from_version` set to the initial version, and value is the creation timestamp of the cluster version (cluster age) +* `failure` - if the failure condition is set, reports the last transition time for the condition. +* `desired` - reported if different from current as the most recent timestamp on the cluster version +* `completed` - the time the most recent version was completely applied, or absent if not reached +* `updating` - if the operator is moving to a new version, the time the update started + +The `from_version` label is set where appropriate and is the previous completed version for the provided `type`. Empty for +`initial`, and otherwise empty if there was no previous completed version (still installing). + +```prometheus +# HELP cluster_version Reports the version of the cluster. +# TYPE cluster_version gauge +cluster_version{image="test/image:2",type="current",version="4.0.3",from_version="4.0.2"} 130000000 +cluster_version{image="test/image:2",type="failure",version="4.0.3",from_version="4.0.2"} 132000400 +cluster_version{image="test/image:4",type="desired",version="4.0.4",from_version="4.0.2"} 132000400 +cluster_version{image="test/image:2",type="completed",version="4.0.3",from_version="4.0.2"} 132000100 +cluster_version{image="test/image:1",type="initial",version="4.0.1",from_version=""} 131000000 +cluster_version{image="test/image:2",type="cluster",version="4.0.3",from_version="4.0.1"} 131000000 +cluster_version{image="test/image:3",type="updating",version="4.0.4",from_version="4.0.3"} 132000400 +# HELP cluster_version_available_updates Report the count of available versions for an upstream and channel. +# TYPE cluster_version_available_updates gauge +cluster_version_available_updates{channel="fast",upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 0 +``` + +Metrics about cluster operators: + +```prometheus +# HELP cluster_operator_conditions Report the conditions for active cluster operators. 0 is False and 1 is True. 
+# TYPE cluster_operator_conditions gauge +cluster_operator_conditions{condition="Available",name="version",namespace="openshift-cluster-version",reason="Happy"} 1 +cluster_operator_conditions{condition="Degraded",name="version",namespace="openshift-cluster-version",reason=""} 0 +cluster_operator_conditions{condition="Progressing",name="version",namespace="openshift-cluster-version",reason=""} 0 +cluster_operator_conditions{condition="RetrievedUpdates",name="version",namespace="openshift-cluster-version",reason=""} 0 +# HELP cluster_operator_up Reports key highlights of the active cluster operators. +# TYPE cluster_operator_up gauge +cluster_operator_up{name="version",namespace="openshift-cluster-version",version="4.0.1"} 1 +``` + +Metrics reported while applying the image: + +```prometheus +# HELP cluster_version_payload Report the number of entries in the image. +# TYPE cluster_version_payload gauge +cluster_version_payload{type="applied",version="4.0.3"} 0 +cluster_version_payload{type="pending",version="4.0.3"} 1 +# HELP cluster_operator_payload_errors Report the number of errors encountered applying the image. +# TYPE cluster_operator_payload_errors gauge +cluster_operator_payload_errors{version="4.0.3"} 10 +``` + +Metrics about the installation: + +`cluster_installer` records information about the installation process. +The type is either "openshift-install", indicating that `openshift-install` was used to install the cluster (IPI) or "other", indicating that an unknown process installed the cluster (UPI). +When `openshift-install` creates a cluster, it will also report its version and invoker. +When an unknown process installed the cluster, the version and invoker reported will be that of the `openshift-install` invocation which created the manifests. +The version is helpful for determining exactly which builds are being used to install (e.g. were they official builds or had they been modified). +The invoker is "user" by default, but it may be overridden by a consuming tool (e.g. Hive, CI, Assisted Installer). + +```prometheus +# TYPE cluster_installer gauge +cluster_installer{type="openshift-install",invoker="user",version="unreleased-master-1209-gfd08f44181f2111486749e2fb38399088f315cfb"} 1 +cluster_installer{type="other",invoker="user",version="unreleased-master-1209-gfd08f44181f2111486749e2fb38399088f315cfb"} 1 +``` diff --git a/dev-guide/cluster-version-operator/dev/object-deletion.md b/dev-guide/cluster-version-operator/dev/object-deletion.md new file mode 100644 index 0000000000..7c95b605f3 --- /dev/null +++ b/dev-guide/cluster-version-operator/dev/object-deletion.md @@ -0,0 +1,116 @@ +# Manifest Annotation For Object Deletion + +Developers can remove any of the currently managed CVO objects by modifying an existing manifest and adding the delete annotation `.metadata.annotations["release.openshift.io/delete"]="true"`. This manifest annotation is a request for the CVO to delete the in-cluster object instead of creating/updating it. + +Actual object deletion and subsequent deletion monitoring and status checking will only occur during an upgrade. During initial installation the delete annotation prevents further processing of the manifest since the given object should not be created. + +## Implementation Details + +When the following annotation appears in a CVO supported manifest and is set to `true` the associated object will be removed from the cluster by the CVO. +Values other than `true` will result in a CVO failure and should therefore result in CVO CI failure. 
+```yaml +apiVersion: apps/v1 +... +metadata: +... + annotations: + release.openshift.io/delete: "true" +``` +The existing CVO ordering scheme defined [here](operators.md) is also used for object removal. +This provides a simple and familiar method of deleting multiple objects by reversing the order in which they were created. +It is the developer's responsibility to ensure proper deletion ordering and to ensure that all items originally created by an object are deleted when that object is deleted. +For example, an operator may have to be modified, or a new operator created, to take explicit actions to remove external resources. +The modified or new operator would then be removed in a subsequent update. + +Similar to how CVO handles create/update requests, deletion requests are implemented in a non-blocking manner whereby CVO issues the initial request to delete an object kicking off resource finalization and after which resource removal. +CVO does not wait for actual resource removal but instead continues. +CVO logs when a delete is initiated, that the delete is ongoing when a manifest is processed again and found to have a deletion time stamp, and delete completion upon resource finalization. + +If an object cannot be successfully removed CVO will set `Upgradeable=False` which in turn blocks cluster update to the next minor release. + +## Examples + +The following examples provide guidance to OpenShift developers on how resources may be removed but this will vary depending on the component. +Ultimately it is the developer's responsibility to ensure the removal works by thoroughly testing. +In all cases, and as general guidance, an operator should never allow itself to be removed if the operator's operand has not been removed. + +### The autoscaler operator + +Remove the cluster-autoscaler-operator deployment. +The existing cluster-autoscaler-operator deployment manifest 0000_50_cluster-autoscaler-operator_07_deployment.yaml is modified to contain the delete annotation: +```yaml +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: cluster-autoscaler-operator + namespace: openshift-machine-api + annotations: + release.openshift.io/delete: "true" +... +``` +Additional manifest properties such as `spec` may be set if convenient (e.g. because you are looking to make a minimal change vs. a previous version of the manifest), but those properties have no affect on manifests with the delete annotation. + +### The service-catalog operators + +In release 4.5 two jobs were created to remove the Service Catalog - openshift-service-catalog-controller-manager-remover and openshift-service-catalog-apiserver-remover. +In release 4.6, these jobs and all their supporting cluster objects also needed to be removed. +The following example shows how to do Service Catalog removal using the object deletion manifest annotation. + +The Service Catalog is composed of two components, the cluster-svcat-apiserver-operator and the cluster-svcat-controller-manager-operator. +Each of these components use manifests for creation/update of the component's required resources: namespace, roles, operator deployment, etc. +The cluster-svcat-apiserver-operator had [these associated manifests][svcat-apiserver-4.4-manifests]. + +The deletion annotation would be added to these manifests: + +* `0000_50_cluster-svcat-apiserver-operator_00_namespace.yaml` containing the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_50_cluster-svcat-apiserver-operator_02_config.crd.yaml` containing a cluster-scoped CRD. 
+* `0000_50_cluster-svcat-apiserver-operator_03_config.cr.yaml` containing a cluster-scoped, create-only ServiceCatalogAPIServer. +* `0000_50_cluster-svcat-apiserver-operator_04_roles.yaml` containing a cluster-scoped ClusterRoleBinding. +* `0000_50_cluster-svcat-apiserver-operator_08_cluster-operator.yaml` containing a cluster-scoped ClusterOperator. + +These manifests would be dropped because their removal would occur as part of one of the above resource deletions: + +* `0000_50_cluster-svcat-apiserver-operator_03_configmap.yaml` containing a ConfigMap in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_50_cluster-svcat-apiserver-operator_03_version-configmap.yaml` containing another ConfigMap in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_50_cluster-svcat-apiserver-operator_05_serviceaccount.yaml` containing a ServiceAccount in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_50_cluster-svcat-apiserver-operator_06_service.yaml` containing a Service in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_50_cluster-svcat-apiserver-operator_07_deployment.yaml` containing a Deployment in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_90_cluster-svcat-apiserver-operator_00_prometheusrole.yaml` containing a Role in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_90_cluster-svcat-apiserver-operator_01_prometheusrolebinding.yaml` containing a RoleBinding in the `openshift-service-catalog-apiserver-operator` namespace. +* `0000_90_cluster-svcat-apiserver-operator_02-operator-servicemonitor.yaml` containing a ServiceMonitor in the `openshift-service-catalog-apiserver-operator` namespace. + +So the remaining manifests with deletion annotations would be the namespace and the cluster-scoped CRD, ServiceCatalogAPIServer, ClusterRoleBinding, and ClusterOperator. +The ordering of the surviving manifests would not be particularly important, although keeping the namespace first would avoid removing the ClusterRoleBinding while the consuming Deployment was still running. +If multiple deletions are required, it is up to the developer to name the manifests such that deletions occur in the correct order. + +Similar handling would be required for the svcat-controller-manager operator. + +If resources external to kubernetes must be removed the developer must provide the means to do so. +This is expected to be done through modification of an operator to do the removals during it's finalization. +If operator modification for object removal is necessary that operator would be deleted in a subsequent update. + +The deletion manifests described above would have been preserved through 4.5.z release and removed in 4.6. +See the [Subsequent Releases Strategy](#subsequent-releases-strategy) section for details. + +## Removing functionality that users might notice + +Below is the flow for removing functionality that users might notice, like the web console. + +* The first step is deprecating the functionality. + During the deprecation release 4.y, the functionality should remain available, with the operator setting Upgradeable=False and linking release notes like [these][deprecated-marketplace-apis]. +* Cluster Administrators must follow the linked release notes to opt in to the removal before updating to the next minor release in 4.(y+1). + When the administrator opts in to the removal, the operator should stop setting Upgradeable=False. 
+* Depending on how the engineering team that owns the functionality implements its removal, the operand components may be removed when Cluster Administrators opt in to the removal, or they may be left running and be removed during the transition to the next minor release 4.(y+1).
+* During the update to the next minor release 4.(y+1), the manifest delete annotation would be used to remove the operator, and, if they have not already been removed as part of the opt-in in release 4.y, any remaining operand components.
+
+## Subsequent Releases Strategy
+
+Special consideration must be given to subsequent updates which may still contain manifests with the delete annotation.
+These manifests will result in `object no longer exists` errors, assuming the current release had properly and fully removed the given objects.
+It is acceptable for subsequent z-level updates to still contain the delete manifests, but minor-level updates should not, and therefore the handling of the delete error will differ between these update levels.
+A z-level update will be allowed to proceed while a minor-level update will be blocked.
+This will be accomplished through the existing CVO precondition mechanism, which already behaves in this manner with regard to z-level and minor updates.
+
+[svcat-apiserver-4.4-manifests]: https://github.com/openshift/cluster-svcat-apiserver-operator/tree/aa7927fbfe8bf165c5b84167b7c3f5d9cb394e14/manifests
+[deprecated-marketplace-apis]: https://docs.openshift.com/container-platform/4.4/release_notes/ocp-4-4-release-notes.html#ocp-4-4-marketplace-apis-deprecated
diff --git a/dev-guide/cluster-version-operator/dev/operators.md b/dev-guide/cluster-version-operator/dev/operators.md
new file mode 100644
index 0000000000..2eb3ee3469
--- /dev/null
+++ b/dev-guide/cluster-version-operator/dev/operators.md
@@ -0,0 +1,167 @@
+# Operator integration with CVO
+
+The CVO installs other operators onto a cluster. It is responsible for applying the manifests each
+operator uses (without any parameterization) and for ensuring an order that installation and
+updates follow.
+
+## What is the order that resources get created/updated in?
+
+The CVO will load a release image and look at the contents of two directories: `/manifests` and
+`/release-manifests` within that image. The `/manifests` directory is part of the CVO image and
+contains the basic deployment and other roles for the CVO. The `/release-manifests` directory is
+created by the `oc adm release new` command from the `/manifests` directories of all other
+candidate operators.
+
+In install and reconciliation, the CVO runs components in a random, parallel fashion, retrying
+as necessary. Within a component the manifests are run in lexicographic order, but between components
+no ordering is enforced. Installs run fully parallel, while reconciliation runs small batches at
+a time.
+
+During upgrades, the contents of `/release-manifests` are applied in order, exactly as `ls` would
+return on a standard Linux or Unix derivative. The CVO supports the idea of "run levels" by
+defining a convention for how operators that wish to run before other operators should name
+their manifests. Manifest files are of the form `0000_<runlevel>_<component>_<manifest_filename>`, declaring the run level (see [below for a list of assigned levels](#how-do-i-get-added-as-a-special-run-level)), the component name that
+usually matches your operator name (e.g. `kube-apiserver` or `cluster-monitoring-operator`),
+and a local name to order manifests within a given runlevel/component block.
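+
+As an illustration of the convention (the operator name here is hypothetical), a manifest might be named:
+
+```linter
+0000_70_example-operator_01_deployment.yaml
+```
+
+which the CVO reads as run level `70`, component `example-operator`, and local manifest name `01_deployment.yaml`.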
+ +A few special optimizations are applied above linear ordering - if the CVO sees two different +components that have the same run level - for instance, `0000_70_cluster-monitoring-operator_*` and +`0000_70_cluster-samples-operator_*` - each component will execute in parallel to the others, +preserving the order of tasks within the component. + +Ordering is only applied during upgrades, where some components rely on another component +being updated or deleted first. As a convenience, the CVO guarantees that components at an +earlier run level will be created, updated, or deleted before your component is invoked. Note +however that components without `ClusterOperator` objects defined may not be fully deployed +when your component is executed, so always ensure your prerequisites know that they must +correctly obey the `ClusterOperator` protocol to be available. More sophisticated components +should observe the prerequisite `ClusterOperator`s directly and use the `versions` field to +enforce safety. + +## How do I get added to the release image? + +Add the following to your Dockerfile + +```Dockerfile +FROM … + +ADD manifests-for-operator/ /manifests +LABEL io.openshift.release.operator=true +``` + +Ensure your image is published into the cluster release tag by ci-operator +Wait for a new release image to be created (usually once you push to master in your operator). + +## What do I put in /manifests? + +You need the following: + +1..N manifest yaml or JSON files (preferably YAML for readability) that deploy your operator, including: + +- Namespace for your operator +- Roles your operator needs +- A service account and a service account role binding +- Deployment for your operator +- A ClusterOperator CR [more info here](clusteroperator.md) +- Any other config objects your operator might need +- An image-references file (See below) + +In your deployment you can reference the latest development version of your operator image (quay.io/openshift/origin-machine-api-operator:latest). If you have other hard-coded image strings, try to put them as environment variables on your deployment or as a config map. + +Manifest files may also be used to delete your object [more info here](object-deletion.md). + +### Names of manifest files + +Your manifests will be applied in alphabetical order by the CVO, so name your files in the order you want them run. +If you are a normal operator (don’t need to run before the kube apiserver), you should name your manifest files in a way that feels easy: + +```linter +/manifests/ + deployment.yaml + roles.yaml +``` + +If you’d like to ensure your manifests are applied in order to the cluster add a numeric prefix to sort in the directory: + +```linter +/manifests/ + 01_roles.yaml + 02_deployment.yaml +``` + +When your manifests are added to the release image, they’ll be given a prefix that corresponds to the name of your repo/image: + +```linter +/release-manifests/ + 99_ingress-operator_01_roles.yaml + 99_ingress-operator_02_deployment.yaml +``` + +Only manifests with the extensions `.yaml`, `.yml`, or `.json` will be applied, like `kubectl create -f DIR`. + +### What if I only want the CVO to create my resource, but never update it? + +This is only applicable to cases where the contents of a resource are not managed, but the presence is required for +usability. Today the only known use-case is config.openshift.io, so that `oc edit foo.config.openshift.io` "just works". + +To do this, you can set .metadata.annotations["release.openshift.io/create-only"]="true". 
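+
+For example (a sketch; whether a given resource actually carries this annotation depends on its operator), a manifest for a configuration object that should exist but never be overwritten might look like:
+
+```yaml
+apiVersion: config.openshift.io/v1
+kind: Console
+metadata:
+  name: cluster
+  annotations:
+    release.openshift.io/create-only: "true"
+spec: {}
+```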
+
+### How do I get added as a special run level?
+
+Some operators need to run at a specific time in the release process (OLM, kube, openshift core operators, network, service CA). These components can ensure they run in a specific order across operators by prefixing their manifests with:
+
+```linter
+ 0000_<runlevel>_<component>_<manifest_filename>
+```
+
+For example, the Kube core operators run in runlevel 10-19 and have filenames like:
+
+```linter
+ 0000_13_cluster-kube-scheduler-operator_03_crd.yaml
+```
+
+Assigned runlevels:
+
+- 00-04 - CVO
+- 05 - cluster-config-operator
+- 07 - Network operator
+- 08 - DNS operator
+- 09 - Service certificate authority and machine approver
+- 10-29 - Kubernetes operators (master team)
+- 30-39 - Machine API
+- 50-59 - Operator-lifecycle manager
+- 60-69 - OpenShift core operators (master team)
+
+## How do I ensure the right images get used by my manifests?
+
+Your manifests can contain a tag to the latest development image published by Origin. You'll annotate your manifests by creating a file that identifies those images.
+
+Assume you have two images in your manifests - `quay.io/openshift/origin-ingress-operator:latest` and `quay.io/openshift/origin-haproxy-router:latest`. Those correspond to the tags `ingress-operator` and `haproxy-router` when the CI runs.
+
+Create a file `image-references` in the /manifests dir with the following contents:
+
+```yaml
+kind: ImageStream
+apiVersion: image.openshift.io/v1
+spec:
+  tags:
+  - name: ingress-operator
+    from:
+      kind: DockerImage
+      name: quay.io/openshift/origin-ingress-operator
+  - name: haproxy-router
+    from:
+      kind: DockerImage
+      name: quay.io/openshift/origin-haproxy-router
+```
+
+The release tooling will read image-references and do the following operations:
+
+- Verify that the tags `ingress-operator` and `haproxy-router` exist from the release / CI tooling (in the image stream `openshift/origin-v4.0` on api.ci). If they don't exist, you'll get a build error.
+- Do a find and replace in your manifests (effectively a sed) that replaces `quay.io/openshift/origin-haproxy-router(:.*|@:.*)` with `registry.ci.openshift.org/openshift/origin-v4.0@sha256:<digest>`
+- Store the fact that operator ingress-operator uses both of those images in a metadata file alongside the manifests
+- Bundle up your manifests and the metadata file as a docker image and push them to a registry
+
+Later on, when someone wants to mirror a particular release, there will be tooling that can take the list of all images used by operators and mirror them to a new repo.
+
+This pattern tries to balance between having the manifests in your source repo be able to deploy your latest upstream code *and* allowing us to get a full listing of all images used by various operators.
diff --git a/dev-guide/cluster-version-operator/dev/upgrades.md b/dev-guide/cluster-version-operator/dev/upgrades.md
new file mode 100644
index 0000000000..2ab63743b7
--- /dev/null
+++ b/dev-guide/cluster-version-operator/dev/upgrades.md
@@ -0,0 +1,84 @@
+# Upgrades and order
+
+In the CVO, upgrades are performed in the order described in the payload (lexicographic), with operators only running in parallel if they share the same file prefix `0000_NN_` and differ by the next chunk `OPERATOR`, i.e. `0000_70_authentication-operator_*` and `0000_70_image-registry-operator_*` are run in parallel.
+
+Priorities during upgrade:
+
+1. Simplify the problems that can occur
+2. Ensure applications aren't disrupted during upgrade
+3. 
Complete within a reasonable time period (30m-1h for control plane) + +During upgrade we bias towards predictable ordering for operators that lack sophistication about detecting their prerequisites. Over time, operators should be better at detecting their prerequisites without overcomplicating or risking the predictability of upgrades. + +Currently, upgrades proceed in operator order without distinguishing between node and control plane components. Future improvements may allow nodes to upgrade independently and at different schedules to reduce production impact. This in turn complicates the investment operator teams must make in testing and understanding how to version their control plane components independently of node infrastructure. + +All components must be N-1 minor version (4.y and 4.y-1) compatible - a component must update its operator first, then its dependents. All operators and control plane components MUST handle running with their dependents at a N-1 minor version for extended periods and test in that scenario. + +## Generalized ordering + +The following rough order is defined for how upgrades should proceed: + +```linter +config-operator + +kube-apiserver + +kcm/ksched + +important, internal apis, likely to break unless tested: +* cloud-credential-operator +* openshift-apiserver + +non-disruptive: +* everything + +olm (maybe later move to post disruptive) + +maximum disruptive: + node-specific daemonsets are on-disruptive: + * network + * dns + + eventually push button separate node upgrade: + * mco, mao, cloud-operators +``` + +Which in practice can be described in runlevels: + +```linter +0000_10_*: config-operator +0000_20_*: kube-apiserver +0000_25_*: kube scheduler and controller manager +0000_30_*: other apiservers: openshift and machine +0000_40_*: reserved +0000_50_*: all non-order specific components +0000_60_*: reserved +0000_70_*: disruptive node-level components: dns, sdn, multus +0000_80_*: machine operators +0000_90_*: reserved for any post-machine updates +``` + +## Why does the OpenShift 4 upgrade process "restart" in the middle? + +Since the release of OpenShift 4, a somewhat frequently asked question is: Why sometimes during an `oc adm upgrade` (cluster upgrade) does the process appear to re-start partway through? [This bugzilla](https://bugzilla.redhat.com/show_bug.cgi?id=1690816) for example has a number of duplicates, and I've seen the question appear in chat and email forums. + +The answer to this question is worth explaining in detail, because it illustrates some fundamentals of the [self-driving, operator-focused OpenShift 4](https://blog.openshift.com/openshift-4-a-noops-platform/). +During the initial development of OpenShift 4, the toplevel [cluster-version-operator](https://github.com/openshift/cluster-version-operator/) (CVO) and the [machine-config-operator](https://github.com/openshift/machine-config-operator/) (MCO) were developed concurrently (and still are). + +The MCO is just one of a number of "second level" operators that the CVO manages. However, the relationship between the CVO and MCO is somewhat special because the MCO [updates the operating system itself](https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md) for the control plane. + +If the new release image has an updated operating system (`machine-os-content`), the CVO pulling down an update ends up causing it to (indirectly) restart itself. 
+ +This is because in order to apply the OS update (or any config changes) MCO will drain each node it is working on updating, then reboot. The CVO is just a regular pod (driven by a `deployment`) running in the cluster (`oc -n openshift-cluster-version get pods`); it gets drained and rescheduled just like the rest of the platform it manages, as well as user applications. + +Also, besides operating system updates, there's the case where an updated payload changes the CVO image itself. + +Today, there's no special support in the CVO for passing "progress" between the previous and new pod; the new pod just looks at the current cluster state and attemps to reconcile between the observed and desired state. This is generally true of the "second level" operators as well, from the MCO to the network operator, the router, etc. + +Hence, the fact that the CVO is terminated and restarted is visible to components watching the `clusterversion` object as the status is recalculated. + +I could imagine at some point adding clarification for this; perhaps a basic boolean flag state in e.g. a `ConfigMap` or so that denoted that the pod was drained due to an upgrade, and the new CVO pod would "consume" that flag and include "Resuming upgrade..." text in its status. But I think that's probably all we should do. + +By not special casing upgrading itself, the CVO restart works the same way as it would if the kernel hit a panic and froze, or the hardware died, there was an unrecoverable network partition, etc. By having the "normal" code path work in exactly the same way as the "exceptional" path, we ensure the upgrade process is robust and tested constantly. + +In conclusion, OpenShift 4 installations by default have the cluster "self-manage", and the transient cosmetic upgrade status blip is a normal and expected consequence of this. diff --git a/dev-guide/cluster-version-operator/user/reconciliation.md b/dev-guide/cluster-version-operator/user/reconciliation.md new file mode 100644 index 0000000000..735c242f88 --- /dev/null +++ b/dev-guide/cluster-version-operator/user/reconciliation.md @@ -0,0 +1,195 @@ +# Reconciliation + +This document describes the cluster-version operator's reconciliation logic and explains how the operator applies a release image to the cluster. + +## Release image content + +Unpack a release image locally, so we can look at what's inside: + +```console +$ mkdir /tmp/release +$ oc image extract quay.io/openshift-release-dev/ocp-release:4.5.1-x86_64 --path /:/tmp/release +``` + +Here are all the manifests supplied by images that set the `io.openshift.release.operator=true` label: + +```console +$ ls /tmp/release/release-manifests +0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml +0000_03_config-operator_01_operatorhub.crd.yaml +0000_03_config-operator_01_proxy.crd.yaml +0000_03_quota-openshift_01_clusterresourcequota.crd.yaml +0000_03_security-openshift_01_scc.crd.yaml +0000_05_config-operator_02_apiserver.cr.yaml +0000_05_config-operator_02_authentication.cr.yaml +... +0000_90_service-ca-operator_02_prometheusrolebinding.yaml +0000_90_service-ca-operator_03_servicemonitor.yaml +image-references +release-metadata +``` + +`release-metadata` holds information about the release image, including recommended update sources and an errata link: + +```console +$ cat /tmp/release/release-manifests/release-metadata +{ + "kind": "cincinnati-metadata-v0", + "version": "4.5.1", + "previous": [ + "4.4.10", + ... 
+ "4.5.0-rc.7", + "4.5.1-rc.0" + ], + "metadata": { + "description": "", + "url": "https://access.redhat.com/errata/RHBA-2020:2409" + } +} +``` + +`image-references` holds references to all the images needed to run OpenShift's core (these are the images that `oc adm release mirror ...` will mirror, in addition to the release image itself): + +```console +$ cat /tmp/release/release-manifests/image-references +{ + "kind": "ImageStream", + "apiVersion": "image.openshift.io/v1", + "metadata": { + "name": "4.5.1", + "creationTimestamp": "2020-07-11T00:40:39Z", + "annotations": { + "release.openshift.io/from-image-stream": "ocp/4.5-art-latest-2020-07-10-055255", + "release.openshift.io/from-release": "registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-07-10-055255" + } + }, + "spec": { + "lookupPolicy": { + "local": false + }, + "tags": [ + { + "name": "aws-machine-controllers", + "annotations": { + "io.openshift.build.commit.id": "cca9ed80024c9c63a218fd4e421fdde48dfdc4a2", + "io.openshift.build.commit.ref": "", + "io.openshift.build.source-location": "https://github.com/openshift/cluster-api-provider-aws" + }, + "from": { + "kind": "DockerImage", + "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7d29cf4dc690bfee58b9247bc7e67210ce9634139b15ce2bd56fbe905b246c83" + }, + "generation": 2, + "importPolicy": {}, + "referencePolicy": { + "type": "Source" + } + }, + ... + ] + }, + "status": { + "dockerImageRepository": "" + } +} +``` + +## Manifest graph + +The cluster-version operator unpacks the release image, ingests manifests, loads them into a graph. +For upgrades, the graph is ordered by the number and component of the manifest file: + +
+ +
+ +The `0000_03_authorization-openshift_*` manifest gets its own node, the `0000_03_quota-openshift_01_*` manifest gets its own node, and the `0000_03_security-openshift_*` manifest gets its own node. +The next group of manifests are under `0000_05_config-operator_*`. +Because the number is bumped, the graph blocks until the previous `0000_03_*` are all complete before beginning the `0000_05_*` block. + +We are more relaxed for the initial install, because there is not yet any user data in the cluster to be worried about. +So the graph nodes are all parallelized with the by-number ordering flattened out: + +
+ +
+ +For the usual reconciliation loop (neither an upgrade between releases nor a fresh install), the flattened graph is also randomly permuted to avoid hanging on ordering bugs. + +## Reconciling the graph + +The cluster-version operator spawns worker goroutines that walk the graph, pushing manifests in their queue. +For each manifest in the node, the worker reconciles the cluster with the manifest using a resource builder. +On error (or timeout), the worker abandons the manifest, graph node, and any dependencies of that graph node. +On success, the worker proceeds to the next manifest in the graph node. + +## Resource builders + +Resource builders reconcile a cluster object with a manifest from the release image. +The general approach is to generates a merged manifest combining critical spec properties from the release-image manifest with data from a preexisting in-cluster object, if any. +If the merged manifest differs from the in-cluster object, the merged manifest is pushed back into the cluster. + +Some types have additional logic, as described in the following subsections. +Note that this logic only applies to manifests included in the release image itself. +For example, only [ClusterOperator](../dev/clusteroperator.md) from the release image will have the blocking logic described [below](#clusteroperator); if an admin or secondary operator pushed a ClusterOperator object, it would not impact the cluster-version operator's graph reconciliation. + +### ClusterOperator + +The cluster-version operator does not push [ClusterOperator](../dev/clusteroperator.md) into the cluster. +Instead, the operators create ClusterOperator themselves. +The ClusterOperator builder only monitors the in-cluster object and blocks until it is: + +* Available +* The ClusterOperator contains at least the versions listed in the associated release image manifest. + For example, an OpenShift API server ClusterOperator entry in the release image like: + + ```yaml + apiVersion: config.openshift.io/v1 + kind: ClusterOperator + metadata: + name: openshift-apiserver + spec: {} + status: + versions: + - name: operator + version: "4.1.0" + ``` + + would block until the in-cluster ClusterOperator reported `operator` at version 4.1.0. +* Not degraded (except during initialization, where we ignore the degraded status) + +### CustomResourceDefinition + +After pushing the merged CustomResourceDefinition into the cluster, the builder monitors the in-cluster object and blocks until it is established. + +### DaemonSet + +The builder does not block after an initial DaemonSet push (when the in-cluster object has generation 1). + +For subsequent updates, the builder blocks until: + +* The in-cluster object's observed generation catches up with the specified generation. +* Pods with the release-image-specified configuration are scheduled on each node. +* There are no nodes without available, ready pods. + +### Deployment + +The builder does not block after an initial Deployment push (when the in-cluster object has generation 1). + +For subsequent updates, the builder blocks until: + +* The in-cluster object's observed generation catches up with the specified generation. +* Sufficient pods with the release-image-specified configuration are scheduled to fulfill the requested `replicas`. +* There are no unavailable replicas. + +### Job + +After pushing the merged Job into the cluster, the builder blocks until the Job succeeds. 
+ +The cluster-version operator will panic if spec.selector is set because there are no clear use-cases for setting it in release manifests. + +Subsequent updates: + +* The cluster-version operator is currently unable to delete and recreate a Job to track changes in release manifests. Please avoid making changes to Job manifests until the cluster-version operator supports Job delete/recreate. +* A Job's spec.selector will never be updated because spec.selector is immutable. diff --git a/dev-guide/cluster-version-operator/user/status.md b/dev-guide/cluster-version-operator/user/status.md new file mode 100644 index 0000000000..0f16744139 --- /dev/null +++ b/dev-guide/cluster-version-operator/user/status.md @@ -0,0 +1,138 @@ +# Conditions + +[The ClusterVersion object](../dev/clusterversion.md) sets `conditions` describing the state of the cluster-version operator (CVO). +This document describes those conditions and, where appropriate, suggests possible mitigations. + +## Failing + +When `Failing` is True, the CVO is failing to reconcile the cluster with the desired release image. +In all cases, the impact on the cluster will be that dependent nodes in [the manifest graph](reconciliation.md#manifest-graph) may not be [reconciled](reconciliation.md#reconciling-the-graph). +Note that the graph [may be flattened](reconciliation.md#manifest-graph), in which case there are no dependent nodes. + +Most reconciliation errors will result in `Failing=True`, although [`ClusterOperatorNotAvailable`](#clusteroperatornotavailable) has special handling. + +### NoDesiredImage + +The CVO has not been given a release image to reconcile. + +If this happens it is a CVO coding error, because clearing [`desiredUpdate`][api-desired-update] should return you to the current CVO's release image. + +### ClusterOperatorNotAvailable + +`ClusterOperatorNotAvailable` (or the consolidated `ClusterOperatorsNotAvailable`) is set when the CVO fails to retrieve the ClusterOperator from the cluster or when the retrieved ClusterOperator does not satisfy [the reconciliation conditions](reconciliation.md#clusteroperator). + +Unlike most manifest-reconciliation failures, this error does not immediately result in `Failing=True`. +Under some conditions during installs and updates, the CVO will treat this condition as a `Progressing=True` condition and give the operator up to fourty minutes to level before reporting `Failing=True`. + +## RetrievedUpdates + +When `RetrievedUpdates` is `True`, the CVO is succesfully retrieving updates, which is good. +When `RetrievedUpdates` is `False`, `reason` will be set to explain why, as discussed in the following subsections. +In all cases, the impact is that the cluster will not be able to retrieve recommended updates, so cluster admins will need to monitor for available updates on their own or risk falling behind on security or other bugfixes. +When CVO is unable to retrieve recommended updates the CannotRetrieveUpdates alert will fire containing the reason. This alert will not fire when the reason updates cannot be retrieved is NoChannel. + +### NoUpstream + +No `upstream` server has been set to retrieve updates. + +Fix by setting `spec.upstream` in ClusterVersion to point to a [Cincinnati][] server, for example https://api.openshift.com/api/upgrades_info/v1/graph . + +### InvalidURI + +The configured `upstream` URI is not valid. + +Fix by setting `spec.upstream` in ClusterVersion to point to a valid [Cincinnati][] URI, for example https://api.openshift.com/api/upgrades_info/v1/graph . 
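+
+For example, one way to set it (a sketch; adjust the URL for your own update service):
+
+```console
+$ oc patch clusterversion version --type merge -p '{"spec":{"upstream":"https://api.openshift.com/api/upgrades_info/v1/graph"}}'
+```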
+
+### InvalidID
+
+The configured `clusterID` is not a valid UUID.
+
+Fix by setting `spec.clusterID` to a valid UUID.
+The UUID should be unique to a given cluster, because it is the default value used for reporting Telemetry and Insights.
+It may also be used by the CVO when making Cincinnati requests, so that Cincinnati can return update recommendations tailored to the specific cluster.
+
+### NoArchitecture
+
+The set of architectures has not been configured.
+
+If this happens it is a CVO coding error.
+There is no mitigation short of updating to a new release image with a fixed CVO.
+
+### NoCurrentVersion
+
+The cluster version does not have a semantic version assigned and cannot calculate valid upgrades.
+
+If this happens it is a release-image creation error.
+There is no mitigation short of updating to a new release image with fixed metadata.
+
+### NoChannel
+
+The update `channel` has not been configured.
+
+Fix by setting `channel` to [a valid value][channels], e.g. `stable-4.3`.
+
+### InvalidCurrentVersion
+
+The current cluster version is not a valid semantic version and cannot be used to calculate upgrades.
+
+If this happens it is a release-image creation error.
+There is no mitigation short of updating to a new release image with fixed metadata.
+
+### InvalidRequest
+
+The CVO was unable to construct a valid Cincinnati request.
+
+If this happens it is a CVO coding error.
+There is no mitigation short of updating to a new release image with a fixed CVO.
+
+### RemoteFailed
+
+The CVO was unable to connect to the configured `upstream`.
+
+This could be caused by a misconfigured `upstream` URI.
+It could also be caused by networking/connectivity issues (e.g. firewalls, air gaps, hardware failures, etc.) between the CVO and the Cincinnati server.
+It could also be caused by a crashed or otherwise broken Cincinnati server.
+
+### ResponseFailed
+
+The Cincinnati server returned a non-200 response or the connection failed before the CVO read the full response body.
+
+This could be caused by the CVO failing to construct a valid request.
+It could also be caused by networking/connectivity issues (e.g. hardware failures, network partitions, etc.).
+It could also be caused by an overloaded or otherwise failing Cincinnati server.
+
+### ResponseInvalid
+
+The Cincinnati server returned a response that was not valid JSON or was otherwise corrupted.
+
+This could be caused by a buggy Cincinnati server.
+It could also be caused by response corruption, e.g. if the configured `upstream` was served in the clear over HTTP or via a man-in-the-middle HTTPS proxy, and an intervening component altered the response in flight.
+
+### VersionNotFound
+
+The currently reconciling cluster version was not found in the configured `channel`.
+
+This usually means that the configured `channel` is known to Cincinnati, but the version the cluster is currently applying is not found in that channel's graph.
+You have some options to fix:
+
+* Set `channel` to [a valid value][channels].
+  For example, `stable-4.7`.
+* Clear `channel` if you do not want the operator polling the configured `upstream` for recommended updates.
+  For example, if your operator is unable to reach any upstream update service, or if you updated to a release that is not in any channel.
+* Update back to a release that occurs in a channel, although you are on your own to determine a safe update path.
+
+### Unknown
+
+The CVO failed to retrieve updates for a reason that does not match any of the cases documented above.
+
+If this happens it is a CVO coding error.
+There is no mitigation short of updating to a new release image with a fixed CVO.
+
+[api-desired-update]: https://github.com/openshift/api/blob/34f54f12813aaed8822bb5bc56e97cbbfa92171d/config/v1/types_cluster_version.go#L40-L54
+[channels]: https://docs.openshift.com/container-platform/4.7/updating/updating-cluster-between-minor.html#understanding-upgrade-channels_updating-cluster-between-minor
+[Cincinnati]: https://github.com/openshift/cincinnati/blob/master/docs/design/openshift.md
diff --git a/dev-guide/cluster-version-operator/user/update-workflow.md b/dev-guide/cluster-version-operator/user/update-workflow.md
new file mode 100644
index 0000000000..0dedfdce37
--- /dev/null
+++ b/dev-guide/cluster-version-operator/user/update-workflow.md
@@ -0,0 +1,30 @@
+# Update Process
+
+The Cluster Version Operator (CVO) runs in every cluster. The CVO is in charge of performing updates to the cluster. It does this primarily by updating the manifests for all of the Second-Level Operators (SLOs).
+
+The Cluster Version Operator, like all operators, is driven by its corresponding custom resource.
+This custom resource (the ClusterVersion object) reports the next available updates considered by the CVO.
+The CVO gets the available-update information from the policy engine of the OpenShift Update Service (OSUS).
+The OSUS endpoint the CVO polls is configured in the ClusterVersion object (its `upstream` field).
+This allows cluster updates to be driven from the console, from the `oc` command-line interface, or by modifying the ClusterVersion object manually (an illustrative `desiredUpdate` snippet appears after the step list below).
+The ClusterVersion object can also be modified to direct the CVO to the policy-engine API endpoint provided by any other OSUS instance.
+
+The series of steps that the Cluster Version Operator follows is detailed below:
+
+1. CVO sleeps for a set duration of time plus some jitter.
+2. CVO checks in with the upstream Policy Engine, downloading the latest update graph for the channel to which it’s subscribed.
+3. CVO determines the next update(s) in the graph and writes them to the "available updates" field in its Operator custom resource.
+    1. If there are no updates available, CVO goes back to step 1.
+4. If automatic updates are enabled, CVO writes the newest update into the "desired update" field in its Operator custom resource.
+5. CVO waits for the "desired update" field in its Operator custom resource to be set to something other than its current version.
+6. CVO instructs the local container runtime to download the image specified in the "desired update" field.
+7. CVO validates the digest in the downloaded image and verifies that it was signed by the private half of one of its hard-coded keys.
+    1. If the image is invalid, it is removed from the local system and CVO goes back to step 1.
+8. CVO validates that the downloaded image can be applied to the currently running version by inspecting `release-metadata`.
+    1. If the image cannot be applied, it is removed from the local system and CVO goes back to step 1.
+9. CVO applies the deployment for itself, triggering Kubernetes to replace CVO with a newer version.
+10. CVO applies the remainder of the deployments from the downloaded image, in order, triggering the SLOs to begin updating.
+11. CVO waits for all of the SLOs to report that they are in a done state.
+12. CVO goes back to step 1.
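+
+As one illustration of step 5 above, an update can be triggered by hand by setting the "desired update" field (`spec.desiredUpdate`) on the ClusterVersion object; the version below is a placeholder:
+
+```yaml
+apiVersion: config.openshift.io/v1
+kind: ClusterVersion
+metadata:
+  name: version
+spec:
+  # Placeholder version; pick one from status.availableUpdates,
+  # or pin an exact release image by digest with the `image` field.
+  desiredUpdate:
+    version: "4.7.13"
+```
+
+The CVO then proceeds from step 6 with the corresponding release image.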