From 4de0979f33ca3fcb71995ed7b7be4363640613af Mon Sep 17 00:00:00 2001 From: Sebastian Kopacz Date: Wed, 30 Aug 2023 09:11:24 -0400 Subject: [PATCH] OSDOCS-6630: second iteration of how updates work doc --- modules/update-cluster-version-object.adoc | 127 ++++++++++++++++++ modules/update-cvo.adoc | 12 ++ modules/update-evaluate-availability.adoc | 93 ------------- modules/update-manifest-application.adoc | 13 +- modules/update-process-workflow.adoc | 8 +- .../how-updates-work.adoc | 8 +- 6 files changed, 160 insertions(+), 101 deletions(-) create mode 100644 modules/update-cluster-version-object.adoc create mode 100644 modules/update-cvo.adoc diff --git a/modules/update-cluster-version-object.adoc b/modules/update-cluster-version-object.adoc new file mode 100644 index 000000000000..13c98a347757 --- /dev/null +++ b/modules/update-cluster-version-object.adoc @@ -0,0 +1,127 @@ +// Module included in the following assemblies: +// +// * updating/understanding_updates/how-updates-work.adoc + +:_content-type: CONCEPT +[id="update-cluster-version-object_{context}"] += The ClusterVersion object + +One of the resources that the Cluster Version Operator (CVO) monitors is the `ClusterVersion` resource. + +Administrators and OpenShift components can communicate or interact with the CVO through the `ClusterVersion` object. +The desired CVO state is declared through the `ClusterVersion` object and the current CVO state is reflected in the object's status. + +[NOTE] +==== +Do not directly modify the `ClusterVersion` object. Instead, use interfaces such as the `oc` CLI or the web console to declare your update target. +==== + +The CVO continually reconciles the cluster with the target state declared in the `spec` property of the `ClusterVersion` resource. +When the desired release differs from the actual release, that reconciliation updates the cluster. 
+ +//to-do: this might be heading overload, consider deleting this heading if the context switch from the previous paragraph to this content is smooth enough to not require one. +[discrete] +== Update availability data + +The `ClusterVersion` resource also contains information about updates that are available to the cluster. +This includes updates that are available, but not recommended due to a known risk that applies to the cluster. +These updates are known as conditional updates. +To learn how the CVO maintains this information about available updates in the `ClusterVersion` resource, see the "Evaluation of update availability" section. + +* You can inspect all available updates with the following command: ++ +[source,terminal] +---- +$ oc adm upgrade --include-not-recommended +---- ++ +[NOTE] +==== +The additional `--include-not-recommended` parameter includes updates that are available but not recommended due to a known risk that applies to the cluster. +==== ++ +.Example output +[source,terminal] +---- +Cluster version is 4.10.22 + +Upstream is unset, so the cluster will use an appropriate default. 
+Channel: fast-4.11 (available channels: candidate-4.10, candidate-4.11, eus-4.10, fast-4.10, fast-4.11, stable-4.10) + +Recommended updates: + + VERSION IMAGE + 4.10.26 quay.io/openshift-release-dev/ocp-release@sha256:e1fa1f513068082d97d78be643c369398b0e6820afab708d26acda2262940954 + 4.10.25 quay.io/openshift-release-dev/ocp-release@sha256:ed84fb3fbe026b3bbb4a2637ddd874452ac49c6ead1e15675f257e28664879cc + 4.10.24 quay.io/openshift-release-dev/ocp-release@sha256:aab51636460b5a9757b736a29bc92ada6e6e6282e46b06e6fd483063d590d62a + 4.10.23 quay.io/openshift-release-dev/ocp-release@sha256:e40e49d722cb36a95fa1c03002942b967ccbd7d68de10e003f0baa69abad457b + +Supported but not recommended updates: + + Version: 4.11.0 + Image: quay.io/openshift-release-dev/ocp-release@sha256:300bce8246cf880e792e106607925de0a404484637627edf5f517375517d54a4 + Recommended: False + Reason: RPMOSTreeTimeout + Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22 +---- ++ +The `oc adm upgrade` command queries the `ClusterVersion` resource for information about available updates and presents it in a human-readable format. + +* One way to directly inspect the underlying availability data created by the CVO is by querying the `ClusterVersion` resource with the following command: ++ +[source,terminal] +---- +$ oc get clusterversion version -o json | jq '.status.availableUpdates' +---- ++ +.Example output +[source,terminal] +---- +[ + { + "channels": [ + "candidate-4.11", + "candidate-4.12", + "fast-4.11", + "fast-4.12" + ], + "image": "quay.io/openshift-release-dev/ocp-release@sha256:400267c7f4e61c6bfa0a59571467e8bd85c9188e442cbd820cc8263809be3775", + "url": "https://access.redhat.com/errata/RHBA-2023:3213", + "version": "4.11.41" + }, + ... 
+] +---- + +* A similar command can be used to check conditional updates: ++ +[source,terminal] +---- +$ oc get clusterversion version -o json | jq '.status.conditionalUpdates' +---- ++ +.Example output +[source,terminal] +---- +[ + { + "conditions": [ + { + "lastTransitionTime": "2023-05-30T16:28:59Z", + "message": "The 4.11.36 release only resolves an installation issue https://issues.redhat.com//browse/OCPBUGS-11663 , which does not affect already running clusters. 4.11.36 does not include fixes delivered in recent 4.11.z releases and therefore upgrading from these versions would cause fixed bugs to reappear. Red Hat does not recommend upgrading clusters to 4.11.36 version for this reason. https://access.redhat.com/solutions/7007136", + "reason": "PatchesOlderRelease", + "status": "False", + "type": "Recommended" + } + ], + "release": { + "channels": [...], + "image": "quay.io/openshift-release-dev/ocp-release@sha256:8c04176b771a62abd801fcda3e952633566c8b5ff177b93592e8e8d2d1f8471d", + "url": "https://access.redhat.com/errata/RHBA-2023:1733", + "version": "4.11.36" + }, + "risks": [...] + }, + ... +] +---- \ No newline at end of file diff --git a/modules/update-cvo.adoc b/modules/update-cvo.adoc new file mode 100644 index 000000000000..547f4369bc1e --- /dev/null +++ b/modules/update-cvo.adoc @@ -0,0 +1,12 @@ +// Module included in the following assemblies: +// +// * updating/understanding_updates/how-updates-work.adoc + +:_content-type: CONCEPT +[id="update-cvo_{context}"] += The Cluster Version Operator + +// adding a poorly written, technically inaccurate skeleton of a module for now, which can be replaced/refined by SMEs as they see fit + +The Cluster Version Operator (CVO) is the primary component that orchestrates and facilitates the {product-title} update process. 
+During installation and standard cluster operation, the CVO is constantly comparing the manifests of managed cluster Operators to in-cluster resources, and reconciling discrepancies to ensure that the actual state of these resources matches their desired state. diff --git a/modules/update-evaluate-availability.adoc b/modules/update-evaluate-availability.adoc index 45df8e6f14e5..036087f4a7b1 100644 --- a/modules/update-evaluate-availability.adoc +++ b/modules/update-evaluate-availability.adoc @@ -19,96 +19,3 @@ If the CVO finds that the cluster does not match the risks of an update, or that The user interface, either the web console or the OpenShift CLI (`oc`), presents this information in sectioned headings to the administrator. Each *supported but not recommended* update recommendation contains a link to further resources about the risk so that the administrator can make an informed decision about the update. - -You can inspect all available updates with the following command: - -[source,terminal] ----- -$ oc adm upgrade --include-not-recommended ----- - -The additional `--include-not-recommended` parameter includes updates that are available but not recommended due to a known risk that applies to the cluster. - -.Example output -[source,terminal] ----- -Cluster version is 4.10.22 - -Upstream is unset, so the cluster will use an appropriate default. 
-Channel: fast-4.11 (available channels: candidate-4.10, candidate-4.11, eus-4.10, fast-4.10, fast-4.11, stable-4.10) - -Recommended updates: - - VERSION IMAGE - 4.10.26 quay.io/openshift-release-dev/ocp-release@sha256:e1fa1f513068082d97d78be643c369398b0e6820afab708d26acda2262940954 - 4.10.25 quay.io/openshift-release-dev/ocp-release@sha256:ed84fb3fbe026b3bbb4a2637ddd874452ac49c6ead1e15675f257e28664879cc - 4.10.24 quay.io/openshift-release-dev/ocp-release@sha256:aab51636460b5a9757b736a29bc92ada6e6e6282e46b06e6fd483063d590d62a - 4.10.23 quay.io/openshift-release-dev/ocp-release@sha256:e40e49d722cb36a95fa1c03002942b967ccbd7d68de10e003f0baa69abad457b - -Supported but not recommended updates: - - Version: 4.11.0 - Image: quay.io/openshift-release-dev/ocp-release@sha256:300bce8246cf880e792e106607925de0a404484637627edf5f517375517d54a4 - Recommended: False - Reason: RPMOSTreeTimeout - Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22 ----- - -One way to inspect the underlying availability data created by the CVO is by querying the `ClusterVersion` resource with the following command: - -[source,terminal] ----- -$ oc get clusterversion version -o json | jq '.status.availableUpdates' ----- - -.Example output -[source,terminal] ----- -[ - { - "channels": [ - "candidate-4.11", - "candidate-4.12", - "fast-4.11", - "fast-4.12" - ], - "image": "quay.io/openshift-release-dev/ocp-release@sha256:400267c7f4e61c6bfa0a59571467e8bd85c9188e442cbd820cc8263809be3775", - "url": "https://access.redhat.com/errata/RHBA-2023:3213", - "version": "4.11.41" - }, - ... 
-] ----- - -A similar command can be used to check conditional updates: - -[source,terminal] ----- -$ oc get clusterversion version -o json | jq '.status.conditionalUpdates' ----- - -.Example output -[source,terminal] ----- -[ - { - "conditions": [ - { - "lastTransitionTime": "2023-05-30T16:28:59Z", - "message": "The 4.11.36 release only resolves an installation issue https://issues.redhat.com//browse/OCPBUGS-11663 , which does not affect already running clusters. 4.11.36 does not include fixes delivered in recent 4.11.z releases and therefore upgrading from these versions would cause fixed bugs to reappear. Red Hat does not recommend upgrading clusters to 4.11.36 version for this reason. https://access.redhat.com/solutions/7007136", - "reason": "PatchesOlderRelease", - "status": "False", - "type": "Recommended" - } - ], - "release": { - "channels": [...], - "image": "quay.io/openshift-release-dev/ocp-release@sha256:8c04176b771a62abd801fcda3e952633566c8b5ff177b93592e8e8d2d1f8471d", - "url": "https://access.redhat.com/errata/RHBA-2023:1733", - "version": "4.11.36" - }, - "risks": [...] - }, - ... -] ----- diff --git a/modules/update-manifest-application.adoc b/modules/update-manifest-application.adoc index 400a9767cfd5..d19970440503 100644 --- a/modules/update-manifest-application.adoc +++ b/modules/update-manifest-application.adoc @@ -38,24 +38,27 @@ The CVO then applies manifests following the generated dependency graph. [NOTE] ==== For some resource types, the CVO monitors the resource after its manifest is applied, and considers it to be successfully updated only after the resource reaches a stable state. -Achieving this stable state can take some time. -This is especially true for cluster Operators, which might perform their own update actions in the cluster after the CVO deploys their new versions. -While the additional update actions take place, these cluster Operators temporarily set their `Progressing` condition to `True`. 
+Achieving this state can take some time. +This is especially true for `ClusterOperator` resources, because the CVO waits for a cluster Operator to update itself and then update its `ClusterOperator` status. ==== +// to do: potentially reword the note above to clarify that specific resources are being applied at one time, and not necessarily all the resources for that component. + The CVO waits until all cluster Operators in the Runlevel meet the following conditions before it proceeds to the next Runlevel: * The cluster Operators have an `Available=True` condition. * The cluster Operators have a `Degraded=False` condition. +// to do: potentially clarify that this condition is not applicable during installations, and also potentially add documentation (here or elsewhere) that explains how the CVO is constantly reconciling states whether or not an update is happening. + * The cluster Operators declare they have achieved the desired version in their ClusterOperator resource. Some actions can take significant time to finish. The CVO waits for the actions to complete in order to ensure the subsequent Runlevels can proceed safely. -The process of applying all manifests is expected to take 60 to 120 minutes in total; see *Understanding {product-title} update duration* for more information about factors that influence update duration. +Initially reconciling the new release's manifests is expected to take 60 to 120 minutes in total; see *Understanding {product-title} update duration* for more information about factors that influence update duration. image::update-runlevels.png[A diagram displaying the sequence of Runlevels and the manifests of components within each level] In the previous example diagram, the CVO is waiting until all work is completed at Runlevel 20. The CVO has applied all manifests to the Operators in the Runlevel, but the `kube-apiserver-operator ClusterOperator` performs some actions after its new version was deployed. 
The `kube-apiserver-operator ClusterOperator` declares this progress through the `Progressing=True` condition and by not declaring the new version as reconciled in its `status.versions`. -The CVO waits until the ClusterOperator reports an acceptable status, and then it will start applying manifests at Runlevel 25. +The CVO waits until the ClusterOperator reports an acceptable status, and then it will start reconciling manifests at Runlevel 25. diff --git a/modules/update-process-workflow.adoc b/modules/update-process-workflow.adoc index 3c2091952759..4f3c02836596 100644 --- a/modules/update-process-workflow.adoc +++ b/modules/update-process-workflow.adoc @@ -27,14 +27,18 @@ The job then extracts the manifests and metadata from the release image to a sha Certain conditions can prevent updates from proceeding. These conditions are either determined by the CVO itself, or reported by individual cluster Operators that detect some details about the cluster that the Operator considers problematic for the update. +// to do: potentially add an example of a precondition to the bullet above. + . The CVO records the accepted release in `status.desired` and creates a `status.history` entry about the new update. -. The CVO begins applying the manifests from the release image. +. The CVO begins reconciling the manifests from the release image. Cluster Operators are updated in separate stages called Runlevels, and the CVO ensures that all Operators in a Runlevel finish updating before it proceeds to the next level. . Manifests for the CVO itself are applied early in the process. When the CVO deployment is applied, the current CVO pod terminates, and a CVO pod using the new version starts. -The new CVO proceeds to apply the remaining manifests. +The new CVO proceeds to reconcile the remaining manifests. 
+ +// to do: potentially replace some instances of "apply" in this doc with something like "reconcile" to imply that a lot of these processes are constantly repeating, rather than happening only once. . The update proceeds until the entire control plane is updated to the new version. Individual cluster Operators might perform update tasks on their domain of the cluster, and while they do so, they report their state through the `Progressing=True` condition. diff --git a/updating/understanding_updates/how-updates-work.adoc b/updating/understanding_updates/how-updates-work.adoc index e48f820e2e69..ee381126b963 100644 --- a/updating/understanding_updates/how-updates-work.adoc +++ b/updating/understanding_updates/how-updates-work.adoc @@ -8,8 +8,14 @@ toc::[] The following sections describe each major aspect of the {product-title} (OCP) update process in detail. For a general overview of how updates work, see the xref:../../updating/understanding_updates/intro-to-updates.adoc#understanding-openshift-updates[Introduction to OpenShift updates]. +// The Cluster Version Operator +include::modules/update-cvo.adoc[leveloffset=+1] + +// The ClusterVersion object +include::modules/update-cluster-version-object.adoc[leveloffset=+2] + // Evaluation of update availability -include::modules/update-evaluate-availability.adoc[leveloffset=+1] +include::modules/update-evaluate-availability.adoc[leveloffset=+2] [role="_additional-resources"] .Additional resources
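
Reviewer-side illustration, not part of the patch: the new `update-cluster-version-object.adoc` module shows `status.availableUpdates` and `status.conditionalUpdates` as raw JSON. The following minimal Python sketch shows one way that data could be summarized after it is saved from `oc get clusterversion version -o json`. The function name `summarize_updates` and the `SAMPLE` dictionary are hypothetical. `SAMPLE` is trimmed from the example output in the module, and only fields that appear in that output are read.

```python
import json

# Hypothetical sample, trimmed from the example output in the module.
SAMPLE = {
    "status": {
        "availableUpdates": [
            {"version": "4.11.41"},
        ],
        "conditionalUpdates": [
            {
                "release": {"version": "4.11.36"},
                "conditions": [
                    {
                        "type": "Recommended",
                        "status": "False",
                        "reason": "PatchesOlderRelease",
                    }
                ],
            }
        ],
    }
}


def summarize_updates(cluster_version: dict) -> dict:
    """Split update targets into recommended and not-recommended lists.

    Assumes the input is the parsed JSON of a ClusterVersion resource.
    Missing or null lists are treated as empty.
    """
    status = cluster_version.get("status") or {}

    # Versions listed under status.availableUpdates.
    recommended = [u["version"] for u in status.get("availableUpdates") or []]

    # Conditional updates whose Recommended condition is not "True".
    not_recommended = []
    for entry in status.get("conditionalUpdates") or []:
        version = entry["release"]["version"]
        for cond in entry.get("conditions") or []:
            if cond.get("type") == "Recommended" and cond.get("status") != "True":
                not_recommended.append((version, cond.get("reason")))

    return {"recommended": recommended, "not_recommended": not_recommended}


if __name__ == "__main__":
    print(json.dumps(summarize_updates(SAMPLE), indent=2))
```

To try it against a real cluster, the output of the `oc get clusterversion version -o json` command could be saved to a file, loaded with `json.load`, and passed to `summarize_updates` in place of `SAMPLE`.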