From b79a7ba30c0624037fc507a4f7c55b6f34ea0c12 Mon Sep 17 00:00:00 2001 From: Zane Bitter Date: Thu, 26 Aug 2021 16:32:29 -0400 Subject: [PATCH 1/4] baremetal: Add strategy for upgrading CoreOS-based deploy image --- .../baremetal/upgrade-coreos-deploy-image.md | 268 ++++++++++++++++++ 1 file changed, 268 insertions(+) create mode 100644 enhancements/baremetal/upgrade-coreos-deploy-image.md diff --git a/enhancements/baremetal/upgrade-coreos-deploy-image.md b/enhancements/baremetal/upgrade-coreos-deploy-image.md new file mode 100644 index 0000000000..fa06b2bf2e --- /dev/null +++ b/enhancements/baremetal/upgrade-coreos-deploy-image.md @@ -0,0 +1,268 @@ +--- +title: upgrade-coreos-deploy-image +authors: + - "@zaneb" +reviewers: + - "@hardys" + - "@dtantsur" + - "@elfosardo" + - "@sadasu" + - "@kirankt" + - "@asalkeld" + - "@cgwalters" + - "@cybertron" + - "@dhellmann" +approvers: + - "@hardys" + - "@sadasu" +creation-date: 2021-08-24 +last-updated: 2021-08-24 +status: implementable +see-also: + - "/enhancements/coreos-bootimages.md" +--- + +# Upgrades of the CoreOS-based deploy image + +## Release Signoff Checklist + +- [ ] Enhancement is `implementable` +- [ ] Design details are appropriately documented from clear requirements +- [ ] Test plan is defined +- [ ] Operational readiness criteria is defined +- [ ] Graduation criteria for dev preview, tech preview, GA +- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) + +## Summary + +To ensure that ironic-python-agent runs on top of an up-to-date OS, we will +update the CoreOS image URLs in the baremetal Provisioning CR to the latest +specified by the release metadata. For users running disconnected installs, we +will require them to make the latest versions available and block further +upgrades until they do so. + +## Motivation + +Currently, the deploy disk image (i.e. 
the image running IPA - +ironic-python-agent) is a RHEL kernel plus initrd that is installed (from an +RPM) into the `ironic-ipa-downloader` container image, which in turn is part of +the OpenShift release payload. When the metal3 Pod starts up, the disk image is +copied from the container to a HostPath volume whence it is available to +Ironic. + +The provisioning OS disk image is a separate CoreOS QCOW2 image. The URL for +this is known by the installer. It points to the cloud by default and may be +customised by the user to allow disconnected installs. The URL is stored in the +Provisioning CR at install time and never updated automatically. The image +itself is downloaded once and permanently cached on all of the master nodes. +Never updating the image is tolerable because, upon booting, the CoreOS image +will update itself to the version matching the cluster it is to join. It +remains suboptimal because new Machines will take longer and longer (and more +and more bandwidth) to boot as the cluster ages, and also because support for +particular hardware may theoretically require a particular version of CoreOS. +(The former issue at least exists on all platforms, and this is the subject of +a [long-standing enhancement +proposal](https://github.com/openshift/enhancements/pull/201).) + +We want to change the deploy disk image to use CoreOS. This may take the form +of both an ISO (for hosts that can use virtualmedia) and of a kernel + initrd + +rootfs (for hosts that use PXE). Like the provisioning disk image, the URLs for +these are known by the installer, but they point to the cloud by default and +may be customised by the user to allow disconnected installs. IPA itself is +delivered separately, as a container image as part of the OpenShift release +payload. We do not wish to continue maintaining or shipping the +ironic-ipa-downloader as part of the payload as well, since it (a) is huge and +(b) requires maintenance effort. 
This effectively extends the limitation that +we are not updating the provisioning OS image to include the deploy image as +well, although we will continue to be able to update IPA itself. + +Once this is in place, we no longer need the QCOW2 image at all, since we can +‘provision’ by asking CoreOS in the deploy image to install itself (using +custom deploy steps in Ironic, exposed as a custom deploy method in Metal³). +However, this requires updating any existing MachineSets, which is not planned +for the first release. + +A naive approach would mean that upon upgrading from an existing cluster, we +would no longer have a guaranteed way of booting into _either_ deploy image: + +* The existing deploy kernel + initrd will still exist on at least one master, + but may not exist on all of them, and not all that do exist are necessarily + the most recent version. Even if we found a way to sync them, we would have no + mechanism to update the image to match the current Ironic version, or fix + bugs, including security bugs. +* We have no way of knowing the URLs for the new deploy image, because they can + only be supplied at install time by the installer. + +### Goals + +* Ensure that no matter which version of OpenShift a cluster was installed + with, we are able to deliver updates to IPA and the OS it runs on. +* Stop maintaining the non-CoreOS RHEL-based IPA image within 1-2 releases. +* Never break existing clusters, even if they are deployed in disconnected + environments. + +### Non-Goals + +* Automatically switch pre-existing MachineSets to deploy with `coreos-install` + instead of via QCOW2 images. +* Update the CoreOS QCOW2 image in the cluster with each OpenShift release. +* Provide the CoreOS images as part of the release payload. 
+ +## Proposal + +We will both ship the code to use the CoreOS image for IPA and continue to ship +the current ironic-ipa-downloader container image (which has the RHEL IPA image +built in) in parallel for one release to ensure no immediate loss of +functionality after an upgrade. + +The release payload [includes metadata](/enhancements/coreos-bootimages.md) +that points to the CoreOS artifacts corresponding to the current running +release. This includes the QCOW2, ISO, kernel, initrd, and rootfs. The actual +images in use are defined in the Provisioning CR Spec. These are fixed at the +time of the initial installation, and may have been customised by the user for +installation in a disconnected environment. Since OpenShift 4.9 there are +fields for each of the image types/parts, although in clusters installed before +this enhancement is implemented, only the QCOW2 field +(`ProvisioningOSDownloadURL`) is set. + +The cluster-baremetal-operator will verify the image URLs as part of +reconciling the Provisioning CR. + +If any of the `PreprovisioningOSDownloadURLs` are not set and the +`ProvisioningOSDownloadURL` is set to point to the regular location (i.e. the +QCOW location has not been customised), then the cluster-baremetal-operator +will update the Provisioning Spec to use the latest images in the +`PreprovisioningOSDownloadURLs`. + +If any of the `PreprovisioningOSDownloadURLs` are set to point to the regular +location (i.e. the ISO or kernel/initramfs/rootfs have not been customised) but +they point to a version that is not the latest, then the +cluster-baremetal-operator will update the Provisioning Spec to use the latest +images in the `PreprovisioningOSDownloadURLs`. Note that the kernel, initramfs +and rootfs must always be changed (or not) in lockstep. + +The `ProvisioningOSDownloadURL` (QCOW2 link) will never be modified +automatically, since there may be MachineSets relying on it (indirectly, via +the image cache). 
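The update rules above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual cluster-baremetal-operator (which is written in Go); the field names, the `latest` tuple, and the single `DEFAULT_HOST` check are all simplifying assumptions that loosely mirror the Provisioning CR fields discussed in this proposal:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the Provisioning CR spec; real field names differ.
@dataclass
class ProvisioningSpec:
    os_download_url: str = ""   # ProvisioningOSDownloadURL (QCOW2) - never auto-updated
    iso_url: str = ""           # PreprovisioningOSDownloadURLs fields follow
    kernel_url: str = ""
    initramfs_url: str = ""
    rootfs_url: str = ""

# Assumed default image host; the real check would cover every known default.
DEFAULT_HOST = "releases-art-rhcos.svc.ci.openshift.org"

def is_default_location(url: str) -> bool:
    return url == "" or DEFAULT_HOST in url

def reconcile(spec: ProvisioningSpec, latest: tuple) -> ProvisioningSpec:
    """Apply the rules described above: update the preprovisioning URLs only
    when they still point at the regular location (changing kernel, initramfs
    and rootfs in lockstep), and never touch the QCOW2 URL."""
    pre = (spec.iso_url, spec.kernel_url, spec.initramfs_url, spec.rootfs_url)
    if any(not u for u in pre):
        # Some preprovisioning URLs unset: adopt the latest defaults only if
        # the QCOW2 URL has not been customised by the user.
        if is_default_location(spec.os_download_url):
            spec.iso_url, spec.kernel_url, spec.initramfs_url, spec.rootfs_url = latest
    elif all(is_default_location(u) for u in pre) and pre != latest:
        # Default location but stale version: update all four together.
        spec.iso_url, spec.kernel_url, spec.initramfs_url, spec.rootfs_url = latest
    return spec
```

Note how a customised `ProvisioningOSDownloadURL` suppresses the automatic fill-in entirely in this sketch; the heuristic inference for that case is described separately below.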
If the `ProvisioningOSDownloadURL` has been customised to point to a
non-standard location and any of the `PreprovisioningOSDownloadURLs` are not
set, the cluster-baremetal-operator will attempt to heuristically infer the
correct URLs. It will do so by substitution the release version and file
extension with the latest version and appropriate extension (respectively)
wherever those appear in the QCOW path. It will then attempt to verify the
existence of these files by performing an HTTP HEAD request to the generated
URLs. If the request succeeds, the cluster-baremetal-operator will update the
Provisioning Spec with the generated URL. If it fails, the
cluster-baremetal-operator will report its status as incomplete. This will
prevent upgrading to the next release (which will *not* continue to ship
ironic-ipa-downloader as a backup), until such time as the user manually makes
the required images available.

If any of the `PreprovisioningOSDownloadURLs` have been customised to point to
a non-standard location and a version that is not the latest, the
cluster-baremetal-operator will perform the same procedure, except that the new
URL for each field will be the existing URL with the version replaced with the
latest one wherever it appears in the path. The status will be reported as
incomplete on failure, which means that users must provide the latest images at
a predictable location for every upgrade of the cluster.

### User Stories

As an operator of a disconnected cluster, I want to upgrade my cluster and
have it continue to work for provisioning baremetal machines.

As an operator of an OpenShift cluster, I want to add to my cluster new
hardware that was not fully supported in RHEL at the time I installed the
cluster.

As an operator of an OpenShift cluster, I want to ensure that the OS running on
hosts prior to them being provisioned as part of the cluster is up to date with
bug and security fixes.
As an OpenShift user, I want to stop downloading an extra massive image that is
separate from the one used for cluster members, and based on a different
distribution of RHEL, as part of the release payload.

### Risks and Mitigations

If the HEAD request succeeds but does not result in a valid image, we may
report success when in fact we will be unable to boot any hosts with the given
image. The machine-os-downloader container should do some basic due diligence
on the images it downloads.

## Design Details

### Open Questions

* Will marking the cluster-baremetal-operator as incomplete cause an upgrade to
  be rolled back? Is there a better form of alert that will prevent a future
  upgrade without rolling back the current one?
* Should we try to heuristically infer the URLs at all when they are missing,
  or just require the user to manually set them?
* Is it acceptable to require users of disconnected installs to take action on
  every upgrade? Is that better or worse than leaving them with an out-of-date
  OS to run IPA on (since it will *not* be updated until actually provisioned
  as a cluster node)?

### Test Plan

We will need to test all of the following scenarios:

* Install with release N
* Upgrade from release N-1 -> N
* Simulated upgrade from release N -> N+1

The simulated upgrade will require modifying the image metadata inside the
release payload.

Furthermore, we will need to test each of these scenarios both with the default
image URLs and with custom URLs for disconnected installs.

### Graduation Criteria

N/A

#### Dev Preview -> Tech Preview

N/A

#### Tech Preview -> GA

N/A

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

See... everything above.

### Version Skew Strategy

Changes will happen once both the new metadata and a version of the
cluster-baremetal-operator that supports this feature are present.
The order in +which these appear is irrelevant, and changes will only have any discernable +effect on BaremetalHosts newly added or deprovisioned after the update anyway. + +## Implementation History + +N/A + +## Drawbacks + +Users operating disconnected installs will be required to manually make +available the latest CoreOS images on each cluster version upgrade. + +## Alternatives + +Don't try to keep the CoreOS image up to date with each release, and instead +require only that working images have been specified at least once. + +Instead of upgrading, have CoreOS somehow try to update itself in place before +running IPA. (This is likely to be slow, and it's not clear that it is even +possible since we will be running as a live ISO, not written to disk at this +point.) + +Don't try to guess the locations of the images if they are not set, and require +the user to manually specify them. From 0c77b525347a547fae3df2348142bc88110f037f Mon Sep 17 00:00:00 2001 From: Zane Bitter Date: Mon, 30 Aug 2021 14:07:39 -0400 Subject: [PATCH 2/4] Clarifications in response to feedback --- .../baremetal/upgrade-coreos-deploy-image.md | 35 +++++++++++-------- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/enhancements/baremetal/upgrade-coreos-deploy-image.md b/enhancements/baremetal/upgrade-coreos-deploy-image.md index fa06b2bf2e..3e1a49add6 100644 --- a/enhancements/baremetal/upgrade-coreos-deploy-image.md +++ b/enhancements/baremetal/upgrade-coreos-deploy-image.md @@ -76,14 +76,16 @@ ironic-ipa-downloader as part of the payload as well, since it (a) is huge and we are not updating the provisioning OS image to include the deploy image as well, although we will continue to be able to update IPA itself. -Once this is in place, we no longer need the QCOW2 image at all, since we can -‘provision’ by asking CoreOS in the deploy image to install itself (using -custom deploy steps in Ironic, exposed as a custom deploy method in Metal³). 
-However, this requires updating any existing MachineSets, which is not planned -for the first release. +Once the CoreOS-based deploy image is in place, we no longer need the QCOW2 +image at all for newly-deployed clusters, since we can ‘provision’ by asking +CoreOS to install itself (using custom deploy steps in Ironic, exposed as a +custom deploy method in Metal³). However, to use this method on pre-existing +clusters requires updating any existing MachineSets, which is not planned for +the first release. -A naive approach would mean that upon upgrading from an existing cluster, we -would no longer have a guaranteed way of booting into _either_ deploy image: +A naive approach to rolling out the CoreOS-based deploy image would mean that +upon upgrading from an existing cluster, we would no longer have a guaranteed +way of booting into _either_ deploy image: * The existing deploy kernel + initrd will still exist on at least one master, but may not exist on all of them, and not all that do exist are necessarily @@ -148,16 +150,16 @@ the image cache). If the `ProvisioningOSDownloadURL` has been customised to point to a non-standard location and any of the `PreprovisioningOSDownloadURLs` are not set, the cluster-baremetal-operator will attempt to heuristically infer the -correct URLs. It will do so by substitution the release version and file +correct URLs. It will do so by substituting the release version and file extension with the latest version and appropriate extension (respectively) wherever those appear in the QCOW path. It will then attempt to verify the existence of these files by performing an HTTP HEAD request to the generated URLs. If the request succeeds, the cluster-baremetal-operator will update the Provisioning Spec with the generated URL. If it fails, the -cluster-baremetal-operator will report its status as incomplete. 
This will
-prevent upgrading to the next release (which will *not* continue to ship
-ironic-ipa-downloader as a backup), until such time as the user manually makes
-the required images available.
+cluster-baremetal-operator will report its status as Degraded and not
+Upgradeable. This will prevent upgrading, since the next major release will
+*not* continue to ship ironic-ipa-downloader as a backup, until such time as
+the user manually makes the required images available.

 If any of the `PreprovisioningOSDownloadURLs` have been customised to point to
 a non-standard location and a version that is not the latest, the
@@ -194,9 +196,9 @@ on the images it downloads.

 ### Open Questions

-* Will marking the cluster-baremetal-operator as incomplete cause an upgrade to
+* Will marking the cluster-baremetal-operator as degraded cause an upgrade to
   be rolled back? Is there a better form of alert that will prevent a future
-  upgrade without rolling back the current one?
+  major upgrade?
 * Should we try to heuristically infer the URLs at all when they are missing,
   or just require the user to manually set them?
 * Is it acceptable to require users of disconnected installs to take action on
@@ -218,6 +220,11 @@ release payload.

 Furthermore, we will need to test each of these scenarios both with the default
 image URLs and with custom URLs for disconnected installs.

+In the case of disconnected installs, we should be sure to test the scenario
+where the user initially fails to make the new image available. This must block
+further upgrades without breaking anything else (including the ability to
+provision new servers), and recover complete once the image is provided.
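The Degraded / not-Upgradeable behaviour described above amounts to a simple mapping, sketched here in Python purely for illustration. The condition names follow the standard ClusterOperator status schema, but the decision logic in the real operator is more involved than a single boolean:

```python
def operator_conditions(images_available: bool) -> dict:
    """Illustrative mapping (not the real cluster-baremetal-operator code)
    from the image-verification outcome onto ClusterOperator-style
    conditions."""
    if images_available:
        return {"Degraded": False, "Upgradeable": True}
    # Required deploy images are missing: report Degraded and block upgrades
    # (the next major release will not ship ironic-ipa-downloader as a
    # fallback) until the user makes the images available.
    return {"Degraded": True, "Upgradeable": False}
```

The key property to preserve, per the test scenario above, is that entering this state blocks further upgrades without tearing down the existing provisioning stack.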
+ ### Graduation Criteria N/A From 81dd7e3d4e9473475c5f761f3678bf2cd6c75657 Mon Sep 17 00:00:00 2001 From: Zane Bitter Date: Wed, 8 Sep 2021 21:43:24 -0400 Subject: [PATCH 3/4] Improve details --- .../baremetal/upgrade-coreos-deploy-image.md | 73 +++++++++++-------- 1 file changed, 43 insertions(+), 30 deletions(-) diff --git a/enhancements/baremetal/upgrade-coreos-deploy-image.md b/enhancements/baremetal/upgrade-coreos-deploy-image.md index 3e1a49add6..47e15f2199 100644 --- a/enhancements/baremetal/upgrade-coreos-deploy-image.md +++ b/enhancements/baremetal/upgrade-coreos-deploy-image.md @@ -12,6 +12,8 @@ reviewers: - "@cgwalters" - "@cybertron" - "@dhellmann" + - "@sdodson" + - "@LorbusChris" approvers: - "@hardys" - "@sadasu" @@ -64,7 +66,7 @@ particular hardware may theoretically require a particular version of CoreOS. a [long-standing enhancement proposal](https://github.com/openshift/enhancements/pull/201).) -We want to change the deploy disk image to use CoreOS. This may take the form +We are changing the deploy disk image to use CoreOS. This may take the form of both an ISO (for hosts that can use virtualmedia) and of a kernel + initrd + rootfs (for hosts that use PXE). Like the provisioning disk image, the URLs for these are known by the installer, but they point to the cloud by default and @@ -108,7 +110,7 @@ way of booting into _either_ deploy image: * Automatically switch pre-existing MachineSets to deploy with `coreos-install` instead of via QCOW2 images. * Update the CoreOS QCOW2 image in the cluster with each OpenShift release. -* Provide the CoreOS images as part of the release payload. +* Build the CoreOS images from the release payload. ## Proposal @@ -134,7 +136,10 @@ If any of the `PreprovisioningOSDownloadURLs` are not set and the `ProvisioningOSDownloadURL` is set to point to the regular location (i.e. 
the QCOW location has not been customised), then the cluster-baremetal-operator will update the Provisioning Spec to use the latest images in the -`PreprovisioningOSDownloadURLs`. +`PreprovisioningOSDownloadURLs`. The servers set by previous versions of the +installer are `rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com` and +`releases-art-rhcos.svc.ci.openshift.org`. OKD uses a separate mirror for +(Fedora) CoreOS, and this too should be treated as a non-customised location. If any of the `PreprovisioningOSDownloadURLs` are set to point to the regular location (i.e. the ISO or kernel/initramfs/rootfs have not been customised) but @@ -150,24 +155,35 @@ the image cache). If the `ProvisioningOSDownloadURL` has been customised to point to a non-standard location and any of the `PreprovisioningOSDownloadURLs` are not set, the cluster-baremetal-operator will attempt to heuristically infer the -correct URLs. It will do so by substituting the release version and file +correct URL(s). It will do so by substituting the release version and file extension with the latest version and appropriate extension (respectively) -wherever those appear in the QCOW path. It will then attempt to verify the -existence of these files by performing an HTTP HEAD request to the generated -URLs. If the request succeeds, the cluster-baremetal-operator will update the -Provisioning Spec with the generated URL. If it fails, the -cluster-baremetal-operator will report its status as Degraded and not -Upgradeable. This will prevent upgrading, since the next major release will -*not* continue to ship ironic-ipa-downloader as a backup, until such time as -the user manually makes the required images available. +wherever those appear in the QCOW path. The default paths include both short +version strings (of the form `/4\.[4-9]\b/`) and longer build versions (of the +form `/4[4-9]\.[0-9]{2}\.[0-9]{12}-[0-9]/`). 
The cluster-baremetal-operator +will then attempt to verify the existence of these files by performing an HTTP +HEAD request to the generated URLs. If the request succeeds, the +cluster-baremetal-operator will update the Provisioning Spec with the generated +URL. If it fails, the cluster-baremetal-operator will report its status as +Degraded and not Upgradeable. This will prevent upgrading, since the next major +release will *not* continue to ship ironic-ipa-downloader as a backup, until +such time as the user manually makes the required images available. +(Unfortunately this prevents even minor version upgrades, when it is only the +next major version that we really want to block.) If any of the `PreprovisioningOSDownloadURLs` have been customised to point to a non-standard location and a version that is not the latest, the cluster-baremetal-operator will perform the same procedure, except that the new URL for each field will be the existing URL with the version replaced with the -latest one wherever it appears in the path. The status will be reported as -incomplete on failure, which means that users must provide the latest images at -a predictable location for every upgrade of the cluster. +latest one wherever it appears in the path. If this fails, the operator will +report its status as Degraded but still Upgradeable. + +To ensure that the machine config will work with different versions of +Ignition, we will restore a [change to the Machine Config +Operator](https://github.com/openshift/machine-config-operator/pull/1792) that +was [previously +reverted](https://github.com/openshift/machine-config-operator/pull/2126) but +should now be viable after [fixes to the +installer](https://github.com/openshift/installer/pull/4413). ### User Stories @@ -187,32 +203,26 @@ distribution of RHEL, as part of the release payload. 
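The substitution heuristic can be illustrated using the two version patterns quoted above. This is a hedged Python sketch, not the operator's actual (Go) implementation; in particular, the extension-rewriting rule is an assumption shown only for the ISO case, since the real kernel/initramfs/rootfs artifact naming is not spelled out here:

```python
import re
import urllib.request

# Version patterns quoted above: short versions like "4.8" and longer build
# versions like "48.84.202106091622-0".
SHORT_VERSION = re.compile(r"4\.[4-9]\b")
BUILD_VERSION = re.compile(r"4[4-9]\.[0-9]{2}\.[0-9]{12}-[0-9]")

def infer_url(qcow_url: str, short_ver: str, build_ver: str, new_ext: str) -> str:
    """Guess a preprovisioning image URL from a customised QCOW2 URL by
    substituting the latest version strings wherever they appear in the path,
    then swapping the file extension."""
    url = SHORT_VERSION.sub(short_ver, qcow_url)
    url = BUILD_VERSION.sub(build_ver, url)
    return re.sub(r"\.qcow2(\.gz)?$", new_ext, url)

def url_exists(url: str) -> bool:
    """Verify a generated URL with an HTTP HEAD request, as described above."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False
```

For example, a hypothetical customised URL ending in `/rhcos/4.8/rhcos-48.84.202106091622-0-openstack.x86_64.qcow2.gz` would be rewritten to `/rhcos/4.9/rhcos-49.84.202109241100-0-openstack.x86_64.iso` (version numbers invented for illustration), and only adopted if `url_exists` succeeds.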
### Risks and Mitigations

-If the HEAD request succeeds but does not result in a valid image, we may
-report success when in fact we will be unable to boot any hosts with the given
-image. The machine-os-downloader container should do some basic due diligence
-on the images it downloads.
+If the HEAD request succeeds but does not refer to a valid image, we may report
+success when in fact we will be unable to boot any hosts with the given image.
+The expected checksum is available in the metadata, so the
+machine-os-downloader should check it. However, if this fails it will block the
+rollout of Metal³ altogether.

 ## Design Details

 ### Open Questions

-* Will marking the cluster-baremetal-operator as degraded cause an upgrade to
-  be rolled back? Is there a better form of alert that will prevent a future
-  major upgrade?
 * Should we try to heuristically infer the URLs at all when they are missing,
   or just require the user to manually set them?
-* Is it acceptable to require users of disconnected installs to take action on
-  every upgrade? Is that better or worse than leaving them with an out-of-date
-  OS to run IPA on (since it will *not* be updated until actually provisioned
-  as a cluster node).

 ### Test Plan

 We will need to test all of the following scenarios:

-* Install with release N
-* Upgrade from release N-1 -> N
-* Simulated upgrade from release N -> N+1
+* Install with release 4.y
+* Upgrade from release 4.(y-1) -> 4.y
+* Simulated upgrade from release 4.y -> 4.y.z

 The simulated upgrade will require modifying the image metadata inside the
 release payload.

@@ -223,7 +233,7 @@ image URLs and with custom URLs for disconnected installs.

 In the case of disconnected installs, we should be sure to test the scenario where
 the user initially fails to make the new image available. This must block
 further upgrades without breaking anything else (including the ability to
-provision new servers), and recover complete once the image is provided.
+provision new servers), and recover completely once the image is provided. ### Graduation Criteria @@ -266,6 +276,9 @@ available the latest CoreOS images on each cluster version upgrade. Don't try to keep the CoreOS image up to date with each release, and instead require only that working images have been specified at least once. +Wait for it to be possible to [build images within the cluster using only the +release payload](https://github.com/openshift/enhancements/pull/201). + Instead of upgrading, have CoreOS somehow try to update itself in place before running IPA. (This is likely to be slow, and it's not clear that it is even possible since we will be running as a live ISO, not written to disk at this From 42ae607e6d0393d78659f65b94bb02a935badab4 Mon Sep 17 00:00:00 2001 From: Zane Bitter Date: Wed, 8 Sep 2021 21:43:45 -0400 Subject: [PATCH 4/4] Don't propose keeping the image up to date Limit ourselves to finding an image URL when there is none; do not attempt to keep it up to date in perpetuity. --- .../baremetal/upgrade-coreos-deploy-image.md | 48 +++++++++---------- 1 file changed, 23 insertions(+), 25 deletions(-) diff --git a/enhancements/baremetal/upgrade-coreos-deploy-image.md b/enhancements/baremetal/upgrade-coreos-deploy-image.md index 47e15f2199..1b193f83be 100644 --- a/enhancements/baremetal/upgrade-coreos-deploy-image.md +++ b/enhancements/baremetal/upgrade-coreos-deploy-image.md @@ -141,13 +141,6 @@ installer are `rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com` and `releases-art-rhcos.svc.ci.openshift.org`. OKD uses a separate mirror for (Fedora) CoreOS, and this too should be treated as a non-customised location. -If any of the `PreprovisioningOSDownloadURLs` are set to point to the regular -location (i.e. 
the ISO or kernel/initramfs/rootfs have not been customised) but -they point to a version that is not the latest, then the -cluster-baremetal-operator will update the Provisioning Spec to use the latest -images in the `PreprovisioningOSDownloadURLs`. Note that the kernel, initramfs -and rootfs must always be changed (or not) in lockstep. - The `ProvisioningOSDownloadURL` (QCOW2 link) will never be modified automatically, since there may be MachineSets relying on it (indirectly, via the image cache). @@ -170,21 +163,6 @@ such time as the user manually makes the required images available. (Unfortunately this prevents even minor version upgrades, when it is only the next major version that we really want to block.) -If any of the `PreprovisioningOSDownloadURLs` have been customised to point to -a non-standard location and a version that is not the latest, the -cluster-baremetal-operator will perform the same procedure, except that the new -URL for each field will be the existing URL with the version replaced with the -latest one wherever it appears in the path. If this fails, the operator will -report its status as Degraded but still Upgradeable. - -To ensure that the machine config will work with different versions of -Ignition, we will restore a [change to the Machine Config -Operator](https://github.com/openshift/machine-config-operator/pull/1792) that -was [previously -reverted](https://github.com/openshift/machine-config-operator/pull/2126) but -should now be viable after [fixes to the -installer](https://github.com/openshift/installer/pull/4413). - ### User Stories As an operator of a disconnected cluster, I want to upgrade my cluster and have it to continue to work for provisioning baremetal machines. @@ -213,6 +191,12 @@ rollout of Metal³ altogether. 
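The checksum verification suggested in the Risks section above could look something like the following. This is an illustrative sketch under the assumption that the metadata publishes a SHA-256 digest; the actual machine-os-downloader behaviour and metadata format are not specified here:

```python
import hashlib

def verify_image(path: str, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Stream a downloaded image from disk and compare its SHA-256 digest
    against the checksum published in the release metadata."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte images are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

As noted above, the trade-off is that a failed check here would block the rollout of Metal³ altogether rather than merely flagging the image as suspect.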
### Open Questions +* Do we need to restore a [change to the Machine Config + Operator](https://github.com/openshift/machine-config-operator/pull/1792) to + allow working with different versions of Ignition that was [previously + reverted](https://github.com/openshift/machine-config-operator/pull/2126) but + should now be viable after [fixes to the + installer](https://github.com/openshift/installer/pull/4413)? * Should we try to heuristically infer the URLs at all when they are missing, or just require the user to manually set them? @@ -235,6 +219,9 @@ the user initially fails to make the new image available. This must block further upgrades without breaking anything else (including the ability to provision new servers), and recover completely once the image is provided. +In future releases we will need to test that new versions of IPA work with all +of the previous releases of CoreOS. + ### Graduation Criteria N/A @@ -269,12 +256,23 @@ N/A ## Drawbacks Users operating disconnected installs will be required to manually make -available the latest CoreOS images on each cluster version upgrade. +available the latest CoreOS images either before or after the initial upgrade. + +Hosts that are running the ironic agent (i.e. that have not yet been +provisioned as cluster members) will be running out-of-date versions of CoreOS +(though not the agent) that may be missing specific hardware support or contain +bugs including CVEs. (However, this was already the case for the deployed image +in the window where it is updating itself, so the problem of missing hardware +support may not get any worse.) Updating to the latest images requires manual +intervention from the user. ## Alternatives -Don't try to keep the CoreOS image up to date with each release, and instead -require only that working images have been specified at least once. 
+Use similar heuristics to also update the CoreOS image on each subsequent +release (including minor releases if the image in the metadata changes). In a +disconnected environment, if the image is not available we would mark the +operator as Degraded but leave it as Upgradeable provided that _some_ image is +specified. Wait for it to be possible to [build images within the cluster using only the release payload](https://github.com/openshift/enhancements/pull/201).