Merged
24 changes: 14 additions & 10 deletions docs/HACKING.md
@@ -271,10 +271,10 @@ Then, to directly push your images, run:
$ ./hack/cluster-push.sh
```

## Hacking on `machine-os-content`
## Hacking on `rhel-coreos` image

If you own part of the operating system (from kernel to kubelet) you
are part of the `machine-os-content`. More information in [OSUpgrades.md](OSUpgrades.md).
are part of the `rhel-coreos` image. More information in [OSUpgrades.md](OSUpgrades.md).
You will want a workflow for testing changes to a cluster.

### Directly applying changes live to a node
@@ -287,38 +287,42 @@ replace binaries there (e.g. `/usr/bin/crio`). For anything that requires a reb

### Applying a custom oscontainer

A more advanced flow is to build a custom `machine-os-content`
With OCP 4.12+, we are using `rhel-coreos`, which is an OCI container image and
can be used as a base image to create a custom container.
The easiest way to create and apply a custom image is to use [layering](https://docs.openshift.com/container-platform/4.15/post_installation_configuration/coreos-layering.html).

A more advanced flow is to build a custom `rhel-coreos`
container; this exercises applying updates the same way that is used by
the default upgrade path. This can be useful if for example you're testing
code related to upgrades. For this, see https://github.com/coreos/coreos-assembler/pull/489
(A future iteration of this document will better describe this part)
But let's assume you have a custom container and have pushed to a registry, for
this example `quay.io/example/machine-os-content:latest`.
this example `quay.io/example/rhel-coreos:latest`.

Once you have an oscontainer, you can again use `oc debug node/` and `pivot` to directly switch
to the target oscontainer, e.g. `pivot quay.io/example/machine-os-content:latest`.
to the target oscontainer, e.g. `rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/example/rhel-coreos:latest`.
If you choose this path though the MCD will go degraded until you revert the change.
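
As a rough illustration of the updated command, the refspec is just the image pull spec with the `ostree-unverified-registry:` transport prefix in front. The helper name `os_refspec` below is hypothetical and not part of the MCO or rpm-ostree; the sketch only prints the command so it can be reviewed before running it on a node.

```shell
# Hypothetical helper: build the rpm-ostree refspec for a container image.
# The "ostree-unverified-registry:" prefix is taken from the command above.
os_refspec() {
    image="$1"
    if [ -z "$image" ]; then
        echo "usage: os_refspec IMAGE" >&2
        return 1
    fi
    printf 'ostree-unverified-registry:%s\n' "$image"
}

# Print (do not run) the rebase command for the example image.
echo "rpm-ostree rebase --experimental $(os_refspec quay.io/example/rhel-coreos:latest)"
```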

If you want to roll it out to the entire cluster using the MCO, first scale down the CVO:
`oc -n openshift-cluster-version scale --replicas=0 deploy/cluster-version-operator`.

Then,
`oc -n openshift-machine-config-operator edit configmap/machine-config-osimageurl`
and change the `osImageURL: quay.io/example/machine-os-content@sha256:...`.
and change the `osImageURL: quay.io/example/rhel-coreos@sha256:...`.
Notice the use of the pull-by-digest form `@sha256`; this is required by the MCO.

This will follow the upgrade process that's normally used for upgrades and only
drain/reboot a single node at a time.

### Replacing `machine-os-content` in a new release image
### Replacing `rhel-coreos` content in a new release image

The method that best matches the way true upgrades work though is to build
a custom release image that includes your custom `machine-os-content` as an
a custom release image that includes your custom `rhel-coreos` as an
override. To do this, follow the instructions above for creating a custom
release image, but instead of overriding `machine-config-operator`, override
`machine-os-content`. Note that today the MCO code requires that the OS
`rhel-coreos`. Note that today the MCO code requires that the OS
come from a "digested" pull spec, e.g.
`oc adm release new ... machine-os-content=quay.io/user/machine-os@sha256:49aefeabe1459e4091859b89ac1bc43d4161296cf80113fb633d59a56018ffa6`.
`oc adm release new ... rhel-coreos=quay.io/user/rhel-coreos@sha256:49aefeabe1459e4091859b89ac1bc43d4161296cf80113fb633d59a56018ffa6`.
It will fail if you use e.g. `:latest`.
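
Because a tag such as `:latest` is rejected, it can help to check the pull spec before building the release image. The following is a minimal sketch of such a pre-flight check; `is_digested_pullspec` is a hypothetical name, and this is not the validation the MCO itself performs.

```shell
# Hypothetical pre-flight check: succeed only if the pull spec is pinned
# by a sha256 digest (image@sha256:<64 lowercase hex characters>).
is_digested_pullspec() {
    case "$1" in
        *@sha256:*) ;;
        *) return 1 ;;
    esac
    digest="${1##*@sha256:}"
    # A sha256 digest is exactly 64 hex characters long.
    [ "${#digest}" -eq 64 ] || return 1
    case "$digest" in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

# Example: a tag-based pull spec is reported as unusable.
if is_digested_pullspec "quay.io/user/rhel-coreos:latest"; then
    echo "digested pull spec"
else
    echo "not a digested pull spec"
fi
```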

At the time of this writing, the [kubelet](https://github.com/smarterclayton/origin/blob/4de957b019aee56931b1a29af148cf64865a969b/images/os/Dockerfile)
62 changes: 30 additions & 32 deletions docs/OSUpgrades.md
@@ -1,26 +1,26 @@
# OS updates

The MCO manages the Red Hat Enterprise Linux CoreOS (RHCOS) operating system. Further,
the operating system itself is just another part of the [release image](https://github.com/openshift/cluster-version-operator/), called `machine-os-content`.
the operating system itself is just another part of the [release image](https://github.com/openshift/cluster-version-operator/), called `rhel-coreos`.

In other words, the cluster controls the operating system.

# "Bootimage" vs machine-os-content
# "Bootimage" vs rhel-coreos image

We will use the term "bootimage" to mean an initial RHCOS disk image, such
as an AMI, bare metal raw disk image, VMWare VMDK, OpenStack qcow2, etc.
These bootimages are built using [coreos-assembler](https://github.com/coreos/coreos-assembler).

Today, [the installer](https://github.com/openshift/installer/) pins the "bootimages"
it uses, and released installers also pin the release image. As noted above,
release images contain `machine-os-content`, which can be a *different*
release images contain `rhel-coreos`, which can be a *different*
RHCOS version. You can find the installer-pinned bootimage in e.g. [this file](https://github.com/openshift/installer/blob/release-4.4/data/data/rhcos.json).

[A pending enhancement](https://github.com/openshift/enhancements/pull/201) describes
generating and inspecting bootimage data from the release image
(not yet implemented).

It's essential to understand that both the bootimage and the `machine-os-content` container
It's essential to understand that both the bootimage and the `rhel-coreos` container
are essentially wrappers for an [OSTree](https://github.com/ostreedev/ostree) commit.
The OSTree format is an image format designed for in-place operating system updates; it operates
at the filesystem level (like container images) but (unlike container runtimes) has
@@ -37,7 +37,7 @@ and in general it can be hard to require that in every environment (for
example, bare metal PXE setups).

[As of today](https://github.com/openshift/machine-config-operator/pull/1766/), when a node boots the MCO serves it Ignition for configuration,
including a systemd unit called `machine-config-operator-firstboot.service`
including a systemd unit called `machine-config-daemon-firstboot.service`
which pulls code onto the host, and then it runs `Before=kubelet.service`
to perform an OS update and reboot.

@@ -59,15 +59,15 @@ as well as for the control plane.

The bootstrap node's `bootkube.sh` service pulls the release image, which
contains a reference to the MCO (`machine-config-operator`) and also a
reference to a newer `machine-os-content`. The `bootkube.sh` service runs the MCO in
reference to a newer `rhel-coreos`. The `bootkube.sh` service runs the MCO in
"bootstrap" mode to generate and serve Ignition to the master machines.

The control plane nodes wait in the initramfs, retrying until they are able to
fetch the Ignition config from the bootstrap node.

When that succeeds, the above process of `machine-config-daemon-firstboot.service`
runs which extracts OS updates from the `machine-os-content` container image,
and the control plane nodes each reboot (before `kubelet.service` has started).
runs, which performs the OS update using rpm-ostree by specifying a refspec that references the `rhel-coreos`
image, and the control plane nodes each reboot (before `kubelet.service` has started).

When the control plane nodes reboot and form a cluster, the bootstrap
node is torn down.
@@ -88,7 +88,7 @@ After a node (whether control plane or worker) has joined the cluster, the MCO
takes over. Previously, each individual node was running systemd units;
now changes are coordinated via the MCO.

When the administrator starts an `oc adm upgrade`, if a new `machine-os-content`
When the administrator starts an `oc adm upgrade`, if a new `rhel-coreos`
is provided in the release image, it will be rolled out to the control plane
and workers.

@@ -143,11 +143,9 @@ The overall approach here is that the operating system is just one part of the c
Integrity of the OpenShift platform is handled to start by the
[cluster version operator](https://github.com/openshift/cluster-version-operator).
Today the CVO will by default GPG verify the integrity of the release image
before applying it. The release image contains a `sha256` digest of `machine-os-content`
which is used by the MCO for updates. On the host, the container runtime
`podman` verifies the integrity of that `sha256` when pulling the image,
before the MCO reads its content. Hence, there is end-to-end GPG-verified integrity
for the operating system updates (as well as the rest of the cluster components
before applying it. The release image contains a `sha256` digest of `rhel-coreos`
which is used by the MCO for updates. The MCD performs the update by supplying the `rhel-coreos` image reference directly to rpm-ostree.
Hence, there is end-to-end GPG-verified integrity for the operating system updates (as well as the rest of the cluster components
which run as regular containers).

Q: Why do you do this weird "ostree repository in container" thing? Why ostree?
@@ -157,8 +155,8 @@ system that has been in use for many years by multiple distributions. It handle
SELinux and bootloaders, etc. We're just "encapsulating" that system inside
a container image for all of the above reasons (management, etc.).

At some point in the future though it's likely that we will try to change
the `machine-os-content` container to look more like an unpacked container image.
With OCP 4.12 and later releases, we use an OCI container image
called `rhel-coreos` in the release payload.

Q: How do I look at the content in the ostree repository inside the container?

@@ -169,23 +167,23 @@ container for example.
From there, probably the simplest thing is to use `oc image extract`
to unpack the container image. Something like this:
```
$ mkdir machine-os-content
$ oc image extract quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02d810d3eb284e684bd20d342af3a800e955cccf0bb55e23ee0b434956221bdd --path /:machine-os-content
$ find machine-os-content/srv/repo/ -name '*.commit'
machine-os-content/srv/repo/objects/33/dd81479490fbb61a58af8525a71934e7545b9ed72d846d3e32a3f33f6fac9d.commit
$ ostree --repo=machine-os-content/srv/repo ls 33dd81479490fbb61a58af8525a71934e7545b9ed72d846d3e32a3f33f6fac9d
$ mkdir rhel-coreos
$ oc image extract quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6973e9353f29e678cad79fe768c22cfd6697d8aa30d2aeaa78cceea989925ded --path /:rhel-coreos
$ find rhel-coreos/ -name '*.commit'
rhel-coreos/sysroot/ostree/repo/objects/bf/445baa02dbbe3bb4b8a6f216c478c78d2e22f45343879e405a12550ba8664a.commit
$ ostree --repo=rhel-coreos/sysroot/ostree/repo ls bf445baa02dbbe3bb4b8a6f216c478c78d2e22f45343879e405a12550ba8664a
d00755 0 0 0 /
l00777 0 0 0 /bin -> usr/bin
l00777 0 0 0 /home -> var/home
l00777 0 0 0 /lib -> usr/lib
l00777 0 0 0 /lib64 -> usr/lib64
l00777 0 0 0 /media -> run/media
l00777 0 0 0 /mnt -> var/mnt
l00777 0 0 0 /opt -> var/opt
l00777 0 0 0 /ostree -> sysroot/ostree
l00777 0 0 0 /root -> var/roothome
l00777 0 0 0 /sbin -> usr/sbin
l00777 0 0 0 /srv -> var/srv
l00777 1000 1000 0 /bin -> usr/bin
Review comment (Member): Not a big deal but this looks...wrong, that content shouldn't be owned by uid 1000.

l00777 1000 1000 0 /home -> var/home
l00777 1000 1000 0 /lib -> usr/lib
l00777 1000 1000 0 /lib64 -> usr/lib64
l00777 1000 1000 0 /media -> run/media
l00777 1000 1000 0 /mnt -> var/mnt
l00777 1000 1000 0 /opt -> var/opt
l00777 1000 1000 0 /ostree -> sysroot/ostree
l00777 1000 1000 0 /root -> var/roothome
l00777 1000 1000 0 /sbin -> usr/sbin
l00777 1000 1000 0 /srv -> var/srv
d00755 0 0 0 /boot
d00755 0 0 0 /dev
d00755 0 0 0 /proc
1 change: 0 additions & 1 deletion pkg/controller/common/helpers_test.go
@@ -287,7 +287,6 @@ func TestMergeMachineConfigs(t *testing.T) {
require.Nil(t, err)
expectedMachineConfig := &mcfgv1.MachineConfig{
Spec: mcfgv1.MachineConfigSpec{
// TODO(jkyros): take this back out when we drop machine-os-content
OSImageURL: GetDefaultBaseImageContainer(&cconfig.Spec),
KernelArguments: []string{},
Config: runtime.RawExtension{
2 changes: 1 addition & 1 deletion pkg/daemon/daemon.go
@@ -1243,7 +1243,7 @@ func (dn *Daemon) RunFirstbootCompleteMachineconfig() error {
}

// Currently, we generally expect the bootimage to be older, but in the special
// case of having bootimage == machine-os-content, and no kernel arguments
// case of having bootimage == rhel-coreos, and no kernel arguments
// specified, then we don't need to do anything here.
mcDiffNotEmpty, err := dn.compareMachineConfig(oldConfig, &mc)
if err != nil {
2 changes: 1 addition & 1 deletion pkg/daemon/update.go
@@ -328,7 +328,7 @@ func podmanCopy(imgURL, osImageContentDir string) (err error) {
// only delete created container, we will delete container image later as we may need it for podmanInspect()
defer podmanRemove(containerName)

// copy the content from create container locally into a temp directory under /run/machine-os-content/
// copy the content from create container locally into a temp directory under /run/
cid := strings.TrimSpace(string(cidBuf))
args = []string{"cp", fmt.Sprintf("%s:/", cid), osImageContentDir}
_, err = pivotutils.RunExtBackground(numRetriesNetCommands, "podman", args...)
2 changes: 1 addition & 1 deletion pkg/daemon/update_test.go
@@ -210,7 +210,7 @@ func TestReconcilableDiff(t *testing.T) {
assert.Equal(t, diff.files, true)

newConfig = newMachineConfigFromFiles(oldFiles)
newConfig.Spec.OSImageURL = "example.com/machine-os-content:new"
newConfig.Spec.OSImageURL = "example.com/rhel-coreos:new"
diff, err = reconcilable(oldConfig, newConfig)
checkReconcilableResults(t, "os update", err)
assert.Equal(t, diff.osUpdate, true)