docs/dev/upgrades.md: 25 additions & 1 deletion

@@ -56,4 +56,28 @@ Which in practice can be described in runlevels:
0000_70_*: disruptive node-level components: dns, sdn, multus
0000_80_*: machine operators
0000_90_*: reserved for any post-machine updates
```

## Why does the OpenShift 4 upgrade process "restart" in the middle?

Since the release of OpenShift 4, a frequently asked question has been: why does a cluster upgrade (`oc adm upgrade`) sometimes appear to restart partway through? [This bugzilla](https://bugzilla.redhat.com/show_bug.cgi?id=1690816), for example, has a number of duplicates, and I've seen the question come up in chat and email forums.

The answer to this question is worth explaining in detail, because it illustrates some fundamentals of the [self-driving, operator-focused OpenShift 4](https://blog.openshift.com/openshift-4-a-noops-platform/). During the initial development of OpenShift 4, the top-level [cluster-version-operator](https://github.com/openshift/cluster-version-operator/) (CVO) and the [machine-config-operator](https://github.com/openshift/machine-config-operator/) (MCO) were developed concurrently (and still are).

The MCO is just one of a number of "second level" operators that the CVO manages. However, the relationship between the CVO and MCO is somewhat special because the MCO [updates the operating system itself](https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md) for the control plane.
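
These "second level" operators surface in the cluster as `ClusterOperator` objects. A quick way to list them (a sketch, assuming a running cluster and the usual `machine-config` operator name) is:

```
# List the "second level" operators the CVO manages
oc get clusteroperators

# Inspect the machine-config operator specifically
oc get clusteroperator machine-config -o yaml
```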

If the new release image contains an updated operating system (`machine-os-content`), then pulling down the update ends up causing the CVO to (indirectly) restart itself.
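
One way to check whether a particular release bumps the OS is to ask the release image which `machine-os-content` it references. A sketch (the release pullspec below is purely illustrative):

```
# Show the machine-os-content image a release references
# (substitute the release pullspec or version you actually care about)
oc adm release info quay.io/openshift-release-dev/ocp-release:4.6.1-x86_64 \
  --image-for=machine-os-content
```

Comparing that output for the current and target releases tells you whether the MCO will be rolling out a new OS as part of the upgrade.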

> **Review comment (Member):** Idle curiosity, what percentage of OCP releases bump `machine-os-content`?


This is because, in order to apply the OS update (or any configuration changes), the MCO drains each node it is updating and then reboots it. The CVO is just a regular pod (driven by a `deployment`) running in the cluster (`oc -n openshift-cluster-version get pods`); it gets drained and rescheduled just like the rest of the platform it manages, as well as user applications.
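
You can watch this happen directly during an upgrade; a sketch, assuming the usual `cluster-version-operator` deployment name:

```
# The CVO is an ordinary deployment-managed pod
oc -n openshift-cluster-version get deployment cluster-version-operator

# Watch the CVO pod get evicted and rescheduled as its node is drained
oc -n openshift-cluster-version get pods -o wide -w
```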

Also, besides operating system updates, there's the case where an updated payload changes the CVO image itself.

Today, there's no special support in the CVO for passing "progress" between the previous and new pod; the new pod just looks at the current cluster state and attempts to reconcile between the observed and desired state. This is generally true of the "second level" operators as well, from the MCO to the network operator, the router, etc.

Hence, the termination and restart of the CVO is visible to components watching the `clusterversion` object, because the new pod recalculates the status.
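
In practice the blip looks like a brief status recalculation while the new CVO pod comes up. One way to observe it (assuming the conventional `version` ClusterVersion object):

```
# Watch overall upgrade progress; expect a brief recalculation when the CVO pod restarts
oc get clusterversion -w

# Or watch just the Progressing condition message
oc get clusterversion version \
  -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}{"\n"}'
```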

I could imagine at some point adding clarification for this; perhaps a basic boolean flag in e.g. a `ConfigMap` denoting that the pod was drained due to an upgrade, which the new CVO pod would "consume" and turn into "Resuming upgrade..." text in its status. But I think that's probably all we should do.
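
To be explicit, nothing like this exists today; a purely hypothetical sketch (all names invented for illustration) might be no more than:

```
# Hypothetical only: no such ConfigMap exists in the product today.
# The outgoing CVO pod would set a flag before being drained...
oc -n openshift-cluster-version create configmap upgrade-progress \
  --from-literal=drained-for-upgrade=true

# ...and the incoming pod would read it, add "Resuming upgrade..." to its status,
# and then clean it up.
oc -n openshift-cluster-version delete configmap upgrade-progress
```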

> **Review comment:** @jottofar @wking correct me if I'm wrong, but a boolean flag for this is not necessary. The new CVO can detect an in-progress update when it starts (since the current version doesn't yet match `desiredVersion`). Not sure if it can autodetect the percentage, though.

> **Review comment (Contributor):** It knows it's in an update, but I don't know how much state it saves, e.g. `RetrievedUpdates`, to know it has already loaded the update.

> **Review comment (Member):** I am in favor of storing information about synced/failing manifests in ClusterVersion, which would allow us to pick back up where the previous CVO left off. But there's no such stored state today.

> **Review comment (@LalatenduMohanty, Sep 8, 2020):** This discussion should be tracked in a Jira.

> **Review comment (Member):** No need to track in Jira; this is already tracked in the bug that these docs link.
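
As the thread above notes, the in-progress state is already observable from outside the CVO: the most recent `status.history` entry stays in the `Partial` state until the update completes. A rough sketch (again assuming the conventional `version` ClusterVersion object):

```
# Compare the desired version with the most recent history entry and its state
oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'
oc get clusterversion version \
  -o jsonpath='{.status.history[0].version}{" "}{.status.history[0].state}{"\n"}'
```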


Because the CVO does not special-case upgrading itself, its restart works the same way it would if the kernel hit a panic and froze, the hardware died, there was an unrecoverable network partition, and so on. By having the "normal" code path work in exactly the same way as the "exceptional" path, we ensure the upgrade process is robust and tested constantly.

In conclusion, OpenShift 4 installations by default have the cluster "self-manage", and the transient cosmetic upgrade status blip is a normal and expected consequence of this.