-
Notifications
You must be signed in to change notification settings - Fork 533
baremetal: Include CoreOS ISO in the release payload #909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
openshift-merge-robot
merged 7 commits into
openshift:master
from
zaneb:coreos-image-in-release
Oct 26, 2021
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
1264dfd
baremetal: Include CoreOS ISO in the release payload
zaneb d9ef46a
Document update triggering mechanism
zaneb 29940bf
Add discussion about multi-architecture support
zaneb 0cbcc14
Answered questions about OKD
zaneb dc30aea
Mention benefits to assisted-service
zaneb 308574b
Further clarifications
zaneb 35c0e09
Describe implementation details for ART
zaneb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,285 @@ | ||
| --- | ||
| title: coreos-image-in-release | ||
| authors: | ||
| - "@zaneb" | ||
| reviewers: | ||
| - "@hardys" | ||
| - "@dtantsur" | ||
| - "@elfosardo" | ||
| - "@sadasu" | ||
| - "@kirankt" | ||
| - "@asalkeld" | ||
| - "@cgwalters" | ||
| - "@aravindhp" | ||
| - "@jlebon" | ||
| - "@sosiouxme" | ||
| - "@dhellmann" | ||
| - "@sdodson" | ||
| - "@LorbusChris" | ||
| approvers: | ||
| - "@hardys" | ||
| - "@aravindhp" | ||
| creation-date: 2021-09-22 | ||
| last-updated: 2021-09-22 | ||
| status: implementable | ||
| see-also: | ||
| - "/enhancements/coreos-bootimages.md" | ||
| --- | ||
|
|
||
| # Include the CoreOS image in the release for baremetal | ||
|
|
||
| ## Release Signoff Checklist | ||
|
|
||
| - [ ] Enhancement is `implementable` | ||
| - [ ] Design details are appropriately documented from clear requirements | ||
| - [ ] Test plan is defined | ||
| - [ ] Operational readiness criteria is defined | ||
| - [ ] Graduation criteria for dev preview, tech preview, GA | ||
| - [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
|
||
| ## Summary | ||
|
|
||
| The baremetal platform is switching from the OpenStack QCOW2 CoreOS image to | ||
| the live ISO (as also used for UPI). To ensure that existing disconnected | ||
| clusters can update to this, the ISO image will be included in the release | ||
| payload. This will be balanced out by removing the RHEL image currently shipped | ||
| in the release payload. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Currently, the deploy disk image (i.e. the image running IPA - | ||
| `ironic-python-agent`) is a RHEL kernel plus initrd that is installed (from an | ||
| RPM) into the `ironic-ipa-downloader` container image, which in turn is part of | ||
| the OpenShift release payload. When the metal3 Pod starts up, the disk image is | ||
| copied from the container to a HostPath volume whence it is available to | ||
| Ironic. | ||
|
|
||
| The target OS disk image is a separate CoreOS QCOW2 image. The URL for this is | ||
| known by the installer. It points to the public Internet by default and may be | ||
| customised by the user to allow disconnected installs. The URL is stored in the | ||
| Provisioning CR at install time and never updated automatically. The image | ||
| itself is downloaded once and permanently cached on all of the master nodes. | ||
| Never updating the image is tolerable because, upon booting, the CoreOS image | ||
| will update itself to the version matching the cluster it is to join. It | ||
| remains suboptimal because new Machines will take longer (and use more | ||
| bandwidth) to join once the cluster is upgraded. This issue exists on all | ||
| platforms, and is the subject of a [long-standing enhancement | ||
| proposal](https://github.com/openshift/enhancements/pull/201). Other issues | ||
| specific to the baremetal platform are that boot times for bare metal servers | ||
| can be very long (and therefore the reboot is costly), and that support for | ||
| particular hardware may theoretically require a particular version of CoreOS. | ||
|
|
||
| We are changing the deploy disk image to use the same CoreOS images used for | ||
| UPI deployments. These take the form of both a live ISO (for hosts that can use | ||
| virtualmedia) and of a kernel + initrd + rootfs (for hosts that use PXE). When | ||
| upgrading an existing disconnected cluster, we currently have no way to acquire | ||
| these images without the user manually intervening to mirror them. | ||
|
|
||
| Like the QCOW2 provisioning disk image, the URLs for these images are known by | ||
| the installer, but they point to the cloud by default and would have to be | ||
| customised by the user at install time to allow disconnected installs. | ||
| Following a similar approach to that currently used with the QCOW2 also | ||
| effectively extends the limitation that we are not updating the provisioning OS | ||
| image to include the deploy image as well. | ||
|
|
||
| The assisted-service already uses a similar method to that being implemented | ||
| for IPI, customising a RHCOS ISO to deploy with. To use the on-premises | ||
| assisted-service to install clusters using ZTP, users must similarly mirror the | ||
| ISO. | ||
|
|
||
| The agent itself (IPA) is delivered separately, in a container image as part of | ||
| the OpenShift release payload, so in any event we will continue to be able to | ||
| update IPA. | ||
|
|
||
| We wish to solve the problems with obtaining an up-to-date CoreOS by including | ||
| it in the release payload. | ||
|
|
||
| ### Goals | ||
|
|
||
| * Ensure that no matter which version of OpenShift a cluster was installed | ||
| with, we are able to deliver updates to IPA and the OS it runs on. | ||
zaneb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| * Stop maintaining and shipping the non-CoreOS, RHEL-based IPA PXE files. | ||
| * Eliminate the need for users doing disconnected installations to mirror | ||
| anything other than the release payload. | ||
| * Allow adding hardware to the cluster that is supported in RHCOS but may not | ||
| have been at the time the cluster was initially created. | ||
| * Eliminate the window in which hosts run an out-of-date (and therefore | ||
| potentially buggy) OS image prior to joining the cluster. | ||
| * Never break existing clusters, even if they are deployed in disconnected | ||
| environments. | ||
|
|
||
| ### Non-Goals | ||
|
|
||
| * Automatically switch pre-existing MachineSets to deploy with | ||
| `coreos-installer` instead of via QCOW2 images. | ||
| * Update the CoreOS QCOW2 image in the cluster with each OpenShift release. | ||
| * Provide CoreOS images for platforms other than baremetal. | ||
| * Eliminate the extra reboot performed to update CoreOS after initial | ||
| provisioning. | ||
|
|
||
| ## Proposal | ||
|
|
||
| Build a container image containing the latest CoreOS ISO and the | ||
| `coreos-installer` binary. This container can be used e.g. as an init container | ||
| to make the ISO available where required. The `coreos-installer iso extract | ||
| pxe` command can also be used to produce the kernel, initrd, and rootfs from | ||
| the ISO for the purposes of PXE booting. | ||
|
|
||
| This image could either replace the existing content of the | ||
| `ironic-ipa-downloader` repo, be built from the new | ||
| `image-customization-controller` repo, or be built from a new repo. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally I think it'd be clearer if we created a new repo/image than re-purposing either of these? |
||
|
|
||
| ### User Stories | ||
|
|
||
| As an operator of a disconnected cluster, I want to upgrade my cluster and have it to continue to work for provisioning baremetal machines. | ||
|
|
||
| As an operator of an OpenShift cluster, I want to add to my cluster new | ||
| hardware that was not fully supported in RHEL at the time I installed the | ||
| cluster. | ||
|
|
||
| As an operator of an OpenShift cluster, I want to ensure that the OS running on | ||
| hosts prior to them being provisioned as part of the cluster is up to date with | ||
| bug and security fixes. | ||
|
|
||
| ### Implementation Details/Notes/Constraints | ||
|
|
||
| We will need to restore a [change to the Machine Config | ||
| Operator](https://github.com/openshift/machine-config-operator/pull/1792) to | ||
| allow working with different versions of Ignition that was [previously | ||
| reverted](https://github.com/openshift/machine-config-operator/pull/2126) but | ||
| should now be viable after [fixes to the | ||
| installer](https://github.com/openshift/installer/pull/4413). | ||
|
|
||
| The correct version of CoreOS to use is available from the [stream | ||
| metadata](https://github.com/openshift/installer/blob/master/data/data/rhcos-stream.json) | ||
| in the openshift-installer. This could be obtained by using the installer | ||
| container image as a build image. Deriving the container image from the | ||
| installer container has the convenient side-effect of always triggering a | ||
| rebuild when the version changes. | ||
|
|
||
| Because OpenShift container images are built in an offline environment, it is | ||
| not possible to simply download the image from the public Internet at container | ||
| build time. Instead this will be accomplished by ART defining a "modifications" | ||
| hook that, each time the source is updated before a build, checks whether | ||
| `data/data/rhcos-stream.json` has changed. If it has, ISOs from the locations | ||
| given there are downloaded and made available (via dist-git lookaside cache) in | ||
| the build context with an arch-specific name, something like `rhcos.$arch.iso` | ||
| so that the Dockerfile can select the appropriate arch. | ||
|
|
||
| Different processor architectures require different ISOs (though all are | ||
| represented in the stream metadata). For the foreseeable future, we only need to | ||
| be able to pull a container for the local architecture that contains an ISO for | ||
| that same architecture. In the future, it is possible that the containers for | ||
| each architecture's ISO may themselves need to be multi-architecture, so that | ||
| it is possible to run a container image containing an ISO for a *different* | ||
| architecture. | ||
|
|
||
| The RHCOS release pipeline may need to be adjusted to ensure that the latest | ||
| ISO is always available in the necessary location. | ||
|
|
||
| OKD uses Fedora CoreOS images instead of RHEL CoreOS, but the URLs are | ||
| nevertheless listed in the installer's [stream | ||
| metadata](https://github.com/openshift/installer/blob/master/data/data/fcos-stream.json). | ||
| The build environment for OKD is *not* disconnected, so the container build can | ||
| download the image directly. | ||
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| If the build pipeline does not guarantee the latest RHCOS version gets built | ||
zaneb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| into the container image, then baremetal platform users may miss out on the | ||
| latest bug and security fixes. | ||
|
|
||
| ## Design Details | ||
|
|
||
| ### Open Questions [optional] | ||
|
|
||
| How will we provide up-to-date images to the container build? | ||
|
|
||
| ### Test Plan | ||
|
|
||
| The expected SHA256 hashes of the ISO and PXE files are available metadata in | ||
| the cluster, so we should be able to verify at runtime that we have the correct | ||
| image. | ||
|
|
||
| ### Graduation Criteria | ||
|
|
||
| #### Dev Preview -> Tech Preview | ||
|
|
||
| N/A | ||
|
|
||
| #### Tech Preview -> GA | ||
|
|
||
| N/A | ||
|
|
||
| #### Removing a deprecated feature | ||
|
|
||
| N/A | ||
|
|
||
| ### Upgrade / Downgrade Strategy | ||
|
|
||
| The container registry will always contain an image with latest ISO. This will | ||
| get rolled out to the `image-customization-controller` pod, and future boots of | ||
| the deploy image will be based on the new ISO. Because the same container image | ||
| will need to be used in an init container for the Pod running ironic (in order | ||
| to provide the PXE files), and changes to it will result in a restart of ironic | ||
| that should ensure that any BaremetalHosts currently booted into the deploy | ||
| image (i.e. not provisioned) will be rebooted into the new one. | ||
zaneb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| For the initial release, pre-existing clusters will continue to provision with | ||
| QCOW2 images (but now via the new CoreOS-based IPA). Since the MachineSets will | ||
| not be automatically updated, everything will continue to work after | ||
| downgrading again (now via the old RHEL-based IPA). | ||
|
|
||
| Newly-installed clusters cannot be downgraded to a version prior to their | ||
| initial install version. | ||
|
|
||
| ### Version Skew Strategy | ||
|
|
||
| The Cluster Baremetal Operator should ensure that the | ||
| `image-customization-controller` is updated before Ironic, so that reboots of | ||
| non-provisioned nodes triggered by the Ironic restart use the new image. | ||
|
|
||
| ## Implementation History | ||
|
|
||
| N/A | ||
|
|
||
| ## Drawbacks | ||
|
|
||
| Users with disconnected installs will end up mirroring the ISO as part of the | ||
| release payload, even if they are not installing on the baremetal platform and | ||
| thus will make no use of it. The extra data is substantially a duplicate of the | ||
| ostree data already stored in the release payload. While this is less than | ||
| ideal, the fact that this change allows us to remove the RHEL-based IPA image | ||
| already being shipped in the release payload means it is actually a net | ||
| improvement. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| We could use the prototype | ||
| [coreos-diskimage-rehydrator](https://github.com/cgwalters/coreos-diskimage-rehydrator) | ||
| to generate the ISO. This would allow us to generate images for other platforms | ||
| without doubling up on storage. However it would be much simpler to wait until | ||
| building images for other platforms is actually a requirement before | ||
| introducing this complexity. The ISO generating process is an implementation | ||
| detail from the user's perspective, so it can be modified when required. | ||
|
|
||
| We could attempt to generate an ISO from the existing ostree data in the | ||
| Machine Config. However, there is no known way to generate an ISO that is | ||
| bit-for-bit identical to the release, so this presents an unacceptably high | ||
| risk as the ISO used will not be the one that has been tested. | ||
|
|
||
| We could attempt to de-duplicate the data in the reverse direction, by having | ||
| the Machine Config extract its ostree from the ISO. This is theoretically | ||
| possible, but by no means straightforward. It could always be implemented at a | ||
| later date if required. | ||
|
|
||
| We could require users installing or upgrading disconnected clusters to | ||
| [manually mirror the ISO](https://github.com/openshift/enhancements/pull/879). | ||
|
|
||
| ## Infrastructure Needed | ||
|
|
||
| We will need to have the latest RHCOS images available for download into a | ||
| container image in the build environment. | ||
|
|
||
| We may need a new repo to host the Dockerfile to build the container image | ||
| containing CoreOS. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.