Merged
11 changes: 9 additions & 2 deletions docs/MachineConfiguration.md
@@ -180,7 +180,8 @@ When a machine boots with `nosmt` Kernel Argument, it disables multi-threading o
### KernelType

This feature is available with OCP 4.4 and onward releases as both `day 1` and `day 2` operation. It allows to choose between traditional and Real Time (RT) kernel on an RHCOS node. Supported values are
`""` or `default` for traditional kernel and `realtime` for RT kernel.
`""` or `default` for traditional kernel, `realtime` for RT kernel and `64k-pages` for 64k memory pages on aarch64.
Note that `64k-pages` and `realtime` cannot be selected at the same time. Also, 64k pages support is limited to aarch64 architecture.
Contributor

Is there some way we can limit this change to apply only to aarch64 nodes, and possibly throw some sort of error if the associated machine pool is not aarch64-based?

As far as I know, the RT kernel is not supported on ARM64 clusters, so applying kernelType realtime together with 64k-pages on an ARM64 cluster is an invalid case.

If we apply kernelType 64k-pages on an AMD64 cluster, the machine config pool becomes degraded with the following error message:

- lastTransitionTime: '2023-10-18T06:55:53Z'
  message: 'Failed to render configuration for pool worker: kernelType=64k-pages is
    invalid'
  reason: ''
  status: 'True'
  type: RenderDegraded

Contributor Author

Contributor

Oops, I did not notice that. Thanks.

Contributor

Also, does this error mean that the machine will be unusable when the update fails? I am thinking of a multi-arch compute cluster with x86 + arm64 compute nodes. Would the x86 machines error out and end up in a "NotReady" state?

Contributor Author

@jbtrystram jbtrystram Oct 20, 2023

@Prashanth684 I'll try it out today and follow up.
Update: Clusterbot is not working for me today, so I can't try this quickly without building a whole release payload, which I won't be able to do today.

Member

> Note that 64k-pages and realtime cannot be selected at the same time. Also, 64k pages support is limited to aarch64 architecture.

Should we also say that realtime cannot be selected on aarch64 to make things clearer (not sure for P/Z)?

> Also, does this error mean that the machine will be unusable when the update fails? I am thinking of a multi-arch compute cluster with x86 + arm64 compute nodes. Would the x86 machines error out and end up in a "NotReady" state?

If the realtime kernel is applied on an arm64 node, the machine will be degraded, i.e. `machineconfiguration.openshift.io/state: Degraded`, and the `machineconfiguration.openshift.io/reason` annotation will show that the RT packages are not available.

Contributor

@sinnykumari sinnykumari Oct 20, 2023

If a user applies an unsupported combination, the node will go degraded. MCO applies config per pool, so it is the admin's responsibility to make sure that they apply a generic config when a pool has nodes of multiple architectures. The good thing is that if one node fails to apply the config, MCD marks the node degraded and hence won't start upgrading another node (when maxUnavailable: 1), so the issue does not cascade. On the degraded node, existing workloads keep working fine, but before a new pod can be scheduled, the issue on the node needs to be resolved so that the update can complete and the node is marked schedulable again.


To set kernelType field during cluster install, see the [installer guide](https://github.com/openshift/installer/blob/master/docs/user/customization.md#Switching-RHCOS-host-kernel-using-KernelType).

@@ -201,7 +202,13 @@ spec:
**Note:** The RT kernel lowers throughput (performance) in return for improved worst-case latency bounds. This feature is intended only for use cases that require consistent low latency. For more information, see the [Linux Foundation wiki](https://wiki.linuxfoundation.org/realtime/start) and the [RHEL RT portal](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/8/).

### RHCOS Extensions
RHCOS is a minimal OCP focused OS which provides capabilities common across all the platforms. With extensions support, OCP 4.6 and onward users can enable a limited set of additional functionality on the RHCOS nodes. In OCP 4.6 the supported extensions is `usbguard`. In OCP 4.8 the supported extensions are `usbguard` and `sandboxed-containers`. In OCP 4.11 the supported extensions are `usbguard`, `sandboxed-containers`, and `kerberos`. In OCP 4.14 the supported extensions are `usbguard`, `sandboxed-containers`, `kerberos`, `ipsec` and `wasm`.
RHCOS is a minimal OCP focused OS which provides capabilities common across all the platforms. With extensions support, OCP 4.6 and onward users can enable a limited set of additional functionality on the RHCOS nodes.
| OCP version | Supported extensions |
| ------------- | ---------------------------- |
| 4.6 | `usbguard` |
| 4.8 | `usbguard`, `sandboxed-containers` |
| 4.11 | `usbguard`, `sandboxed-containers`, `kerberos` |
| 4.14 | `usbguard`, `sandboxed-containers`, `kerberos`, `ipsec`, `wasm` |

Extensions can be installed by creating a MachineConfig object. Extensions can be enabled as both day1 and day2. Check [installer guide](https://github.com/openshift/installer/blob/master/docs/user/customization.md#Enabling-RHCOS-Extensions) to enable extensions during cluster install.

3 changes: 3 additions & 0 deletions pkg/controller/common/constants.go
@@ -22,6 +22,9 @@ const (
// KernelTypeRealtime denominates the realtime kernel type
KernelTypeRealtime = "realtime"

// KernelType64kPages denominates the 64k pages kernel
KernelType64kPages = "64k-pages"

// MasterLabel defines the label associated with master node. The master taint uses the same label as taint's key
MasterLabel = "node-role.kubernetes.io/master"

8 changes: 4 additions & 4 deletions pkg/controller/common/helpers.go
@@ -120,17 +120,17 @@ func MergeMachineConfigs(configs []*mcfgv1.MachineConfig, cconfig *mcfgv1.Contro
return nil, err
}

// Setting FIPS to true or kerneType to realtime in any MachineConfig takes priority in setting that field
// Setting FIPS to true or kernelType to a non-default value in any MachineConfig takes priority in setting that field
for _, cfg := range configs {
if cfg.Spec.FIPS {
fips = true
}
if cfg.Spec.KernelType == KernelTypeRealtime {
if cfg.Spec.KernelType == KernelTypeRealtime || cfg.Spec.KernelType == KernelType64kPages {
kernelType = cfg.Spec.KernelType
}
}

// If no MC sets kerneType, then set it to 'default' since that's what it is using
// If no MC sets kernelType, then set it to 'default' since that's what it is using
if kernelType == "" {
kernelType = KernelTypeDefault
}
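The priority rule in this hunk can be sketched in isolation. The snippet below is only an illustration: `mergeKernelType` is a hypothetical helper that does not exist in the MCO code base, and it assumes the merged value starts out empty. It shows one consequence of the loop as written: when several MachineConfigs set different non-default kernel types, the last one iterated wins.

```go
package main

import "fmt"

// Kernel type values mirrored from the constants touched by this diff.
const (
	kernelTypeDefault  = "default"
	kernelTypeRealtime = "realtime"
	kernelType64kPages = "64k-pages"
)

// mergeKernelType is a hypothetical helper (not part of the MCO code base).
// It reduces the kernelType values of several MachineConfigs the same way
// the loop in the hunk does: a non-default value takes priority over an
// empty or default one, and the last non-default value seen wins.
func mergeKernelType(kernelTypes []string) string {
	merged := "" // assumption: the merged value starts out empty
	for _, kt := range kernelTypes {
		if kt == kernelTypeRealtime || kt == kernelType64kPages {
			merged = kt
		}
	}
	// If no config sets a non-default kernelType, fall back to "default".
	if merged == "" {
		merged = kernelTypeDefault
	}
	return merged
}

func main() {
	fmt.Println(mergeKernelType([]string{"", "default"}))           // default
	fmt.Println(mergeKernelType([]string{"64k-pages", ""}))         // 64k-pages
	fmt.Println(mergeKernelType([]string{"realtime", "64k-pages"})) // 64k-pages
}
```

The third call illustrates that the merge itself does not reject a mix of `realtime` and `64k-pages`; it simply keeps whichever comes later in the slice.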
@@ -569,7 +569,7 @@ func InSlice(elem string, slice []string) bool {

// ValidateMachineConfig validates that given MachineConfig Spec is valid.
func ValidateMachineConfig(cfg mcfgv1.MachineConfigSpec) error {
if !(cfg.KernelType == "" || cfg.KernelType == KernelTypeDefault || cfg.KernelType == KernelTypeRealtime) {
if !(cfg.KernelType == "" || cfg.KernelType == KernelTypeDefault || cfg.KernelType == KernelTypeRealtime || cfg.KernelType == KernelType64kPages) {
return fmt.Errorf("kernelType=%s is invalid", cfg.KernelType)
}
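As a rough sketch of what the updated check accepts and rejects, the following standalone program reproduces only the kernelType part of the validation. `validateKernelType` is a hypothetical stand-in that takes a plain string rather than a `mcfgv1.MachineConfigSpec`; the error text is copied from the hunk.

```go
package main

import "fmt"

// validateKernelType is a hypothetical, simplified stand-in for the
// kernelType check in ValidateMachineConfig. It accepts the empty string,
// "default", "realtime" and "64k-pages", and rejects everything else.
func validateKernelType(kernelType string) error {
	switch kernelType {
	case "", "default", "realtime", "64k-pages":
		return nil
	}
	return fmt.Errorf("kernelType=%s is invalid", kernelType)
}

func main() {
	for _, kt := range []string{"", "default", "realtime", "64k-pages", "4k-pages"} {
		if err := validateKernelType(kt); err != nil {
			fmt.Printf("%q rejected: %v\n", kt, err)
			continue
		}
		fmt.Printf("%q accepted\n", kt)
	}
}
```

Note that this check is architecture-agnostic: `64k-pages` passes validation on any architecture, and the aarch64 restriction is enforced separately in the daemon.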

55 changes: 37 additions & 18 deletions pkg/daemon/update.go
@@ -11,6 +11,7 @@ import (
"os/user"
"path/filepath"
"reflect"
goruntime "runtime"
"strconv"
"strings"
"syscall"
@@ -273,7 +274,7 @@ func (dn *CoreOSDaemon) applyOSChanges(mcDiff machineConfigDiff, oldConfig, newC
}
}

// Only check the image type and excute OS changes if:
// Only check the image type and execute OS changes if:
// - machineconfig changed
// - we're staying on a realtime kernel ( need to run rpm-ostree update )
// - we have extensions ( need to run rpm-ostree update )
@@ -282,7 +283,9 @@ func (dn *CoreOSDaemon) applyOSChanges(mcDiff machineConfigDiff, oldConfig, newC
// if they were in use, so we also need to preserve that behavior.
// https://issues.redhat.com/browse/OCPBUGS-4049
if mcDiff.osUpdate || mcDiff.extensions || mcDiff.kernelType || mcDiff.kargs ||
canonicalizeKernelType(newConfig.Spec.KernelType) == ctrlcommon.KernelTypeRealtime || len(newConfig.Spec.Extensions) > 0 {
canonicalizeKernelType(newConfig.Spec.KernelType) == ctrlcommon.KernelTypeRealtime ||
canonicalizeKernelType(newConfig.Spec.KernelType) == ctrlcommon.KernelType64kPages ||
len(newConfig.Spec.Extensions) > 0 {

// Throw started/staged events only if there is any update required for the OS
if dn.nodeWriter != nil {
@@ -722,6 +725,8 @@ func (mcDiff *machineConfigDiff) osChangesString() string {
func canonicalizeKernelType(kernelType string) string {
if kernelType == ctrlcommon.KernelTypeRealtime {
return ctrlcommon.KernelTypeRealtime
} else if kernelType == ctrlcommon.KernelType64kPages {
return ctrlcommon.KernelType64kPages
}
return ctrlcommon.KernelTypeDefault
}
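A compact way to see the effect of the canonicalization is to run it over a few inputs. The sketch below restates the function with a `switch` and literal strings instead of the `ctrlcommon` constants, which is an illustrative choice rather than the code in this PR. Any value other than `realtime` or `64k-pages`, including the empty string, maps to `default`.

```go
package main

import "fmt"

// canonicalizeKernelType restates the function from the hunk using literal
// strings in place of the ctrlcommon constants (an illustrative choice).
// Unknown or empty values collapse to "default".
func canonicalizeKernelType(kernelType string) string {
	switch kernelType {
	case "realtime", "64k-pages":
		return kernelType
	default:
		return "default"
	}
}

func main() {
	for _, kt := range []string{"", "default", "realtime", "64k-pages", "bogus"} {
		fmt.Printf("%q -> %q\n", kt, canonicalizeKernelType(kt))
	}
}
```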
@@ -1127,7 +1132,7 @@ func (dn *CoreOSDaemon) applyExtensions(oldConfig, newConfig *mcfgv1.MachineConf
}

// switchKernel updates kernel on host with the kernelType specified in MachineConfig.
// Right now it supports default (traditional) and realtime kernel
// Right now it supports default (traditional), realtime kernel and 64k pages kernel
func (dn *CoreOSDaemon) switchKernel(oldConfig, newConfig *mcfgv1.MachineConfig) error {
// We support Kernel update only on RHCOS and SCOS nodes
if !dn.os.IsEL() {
@@ -1144,12 +1149,17 @@ func (dn *CoreOSDaemon) switchKernel(oldConfig, newConfig *mcfgv1.MachineConfig)
return nil
}

// 64K memory pages kernel is only supported for aarch64
if newKtype == ctrlcommon.KernelType64kPages && goruntime.GOARCH != "arm64" {
return fmt.Errorf("64k-pages is only supported for aarch64 architecture")
}

// TODO: Drop this code and use https://github.com/coreos/rpm-ostree/issues/2542 instead
defaultKernel := []string{"kernel", "kernel-core", "kernel-modules", "kernel-modules-core", "kernel-modules-extra"}
// Note this list explicitly does *not* include kernel-rt as that is a meta-package that tries to pull in a lot
// of other dependencies we don't want for historical reasons.
// kernel-rt also has a split off kernel-rt-kvm subpackage because it's in a separate subscription in RHEL.
realtimeKernel := []string{"kernel-rt-core", "kernel-rt-modules", "kernel-rt-modules-extra", "kernel-rt-kvm"}
hugePagesKernel := []string{"kernel-64k-core", "kernel-64k-modules", "kernel-64k-modules-core", "kernel-64k-modules-extra"}

if oldKtype != newKtype {
logSystem("Initiating switch to kernel %s", newKtype)
@@ -1165,6 +1175,15 @@ func (dn *CoreOSDaemon) switchKernel(oldConfig, newConfig *mcfgv1.MachineConfig)
args = append(args, "--install", pkg)
}

return runRpmOstree(args...)
} else if newKtype == ctrlcommon.KernelType64kPages {
// Switch to 64k pages kernel
args := []string{"override", "remove"}
args = append(args, defaultKernel...)
for _, pkg := range hugePagesKernel {
args = append(args, "--install", pkg)
}

return runRpmOstree(args...)
}
return fmt.Errorf("unhandled kernel type %s", newKtype)
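To make the resulting `rpm-ostree` invocation easier to picture, here is a minimal sketch that only builds the argument list and runs nothing. `kernelSwitchArgs` is a hypothetical helper, and the architecture is passed in as a parameter (rather than read from `runtime.GOARCH`) so the guard can be exercised on any machine. The `realtime` branch is assumed to follow the same `override remove` / `--install` pattern as the 64k branch shown in the hunk, since part of it is elided here. Switching back to the default kernel is not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// Package lists copied from the hunk above.
var (
	defaultKernel   = []string{"kernel", "kernel-core", "kernel-modules", "kernel-modules-core", "kernel-modules-extra"}
	realtimeKernel  = []string{"kernel-rt-core", "kernel-rt-modules", "kernel-rt-modules-extra", "kernel-rt-kvm"}
	hugePagesKernel = []string{"kernel-64k-core", "kernel-64k-modules", "kernel-64k-modules-core", "kernel-64k-modules-extra"}
)

// kernelSwitchArgs is a hypothetical, side-effect-free helper. It returns the
// rpm-ostree arguments for switching from the default kernel to newKtype,
// or an error if the kernel type is unhandled or unsupported on goarch.
func kernelSwitchArgs(newKtype, goarch string) ([]string, error) {
	var layered []string
	switch newKtype {
	case "realtime":
		// Assumption: same pattern as the 64k branch.
		layered = realtimeKernel
	case "64k-pages":
		// The 64k pages kernel is only available on aarch64 (GOARCH "arm64").
		if goarch != "arm64" {
			return nil, fmt.Errorf("64k-pages is only supported for aarch64 architecture")
		}
		layered = hugePagesKernel
	default:
		return nil, fmt.Errorf("unhandled kernel type %s", newKtype)
	}

	// Remove the default kernel packages and layer the replacement ones.
	args := []string{"override", "remove"}
	args = append(args, defaultKernel...)
	for _, pkg := range layered {
		args = append(args, "--install", pkg)
	}
	return args, nil
}

func main() {
	args, err := kernelSwitchArgs("64k-pages", "arm64")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rpm-ostree " + strings.Join(args, " "))

	// On a non-arm64 architecture the guard rejects the request.
	_, err = kernelSwitchArgs("64k-pages", "amd64")
	fmt.Println("error:", err)
}
```

Keeping argument construction separate from execution is a design choice of this sketch only; it makes the architecture guard and the package lists testable without a real RHCOS host.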
@@ -1878,49 +1897,49 @@ func (dn *Daemon) InplaceUpdateViaNewContainer(target string) error {
return nil
}

// queueRevertRTKernel undoes the layering of the RT kernel
func (dn *Daemon) queueRevertRTKernel() error {
// queueRevertKernelSwap undoes the layering of the RT kernel or kernel-64k hugepages
func (dn *Daemon) queueRevertKernelSwap() error {
booted, _, err := dn.NodeUpdaterClient.GetBootedAndStagedDeployment()
if err != nil {
return err
}

// Before we attempt to do an OS update, we must remove the kernel-rt switch
// Before we attempt to do an OS update, we must remove the kernel-rt or kernel-64k switch
// because in the case of updating from RHEL8 to RHEL9, the kernel packages are
// OS version dependent. See also https://github.com/coreos/rpm-ostree/issues/2542
// (Now really what we want to do here is something more like rpm-ostree override reset --kernel
// i.e. the inverse of https://github.com/coreos/rpm-ostree/pull/4322 so that
// we're again not hardcoding even the prefix of kernel packages)
kernelOverrides := []string{}
kernelRtLayers := []string{}
kernelExtLayers := []string{}
for _, removal := range booted.RequestedBaseRemovals {
if removal == "kernel" || strings.HasPrefix(removal, "kernel-") {
kernelOverrides = append(kernelOverrides, removal)
}
}
for _, pkg := range booted.RequestedPackages {
if strings.HasPrefix(pkg, "kernel-rt-") {
kernelRtLayers = append(kernelRtLayers, pkg)
if strings.HasPrefix(pkg, "kernel-rt-") || strings.HasPrefix(pkg, "kernel-64k-") {
kernelExtLayers = append(kernelExtLayers, pkg)
}
}
// We *only* do this switch if the node has done a switch from kernel -> kernel-rt.
// We *only* do this switch if the node has done a switch from kernel -> kernel-rt or kernel-64k.
// We don't want to override any machine-local hotfixes for the kernel package.
// Implicitly in this we don't really support machine-local hotfixes for kernel-rt.
// Implicitly in this we don't really support machine-local hotfixes for kernel-rt or kernel-64k.
// The only sane way to handle that is declarative drop-ins, but really we want to
// just go to deploying pre-built images and not doing per-node mutation with rpm-ostree
// at all.
if len(kernelOverrides) > 0 && len(kernelRtLayers) > 0 {
if len(kernelOverrides) > 0 && len(kernelExtLayers) > 0 {
args := []string{"override", "reset"}
args = append(args, kernelOverrides...)
for _, pkg := range kernelRtLayers {
for _, pkg := range kernelExtLayers {
args = append(args, "--uninstall", pkg)
}
err := runRpmOstree(args...)
if err != nil {
return err
}
} else if len(kernelOverrides) > 0 || len(kernelRtLayers) > 0 {
klog.Infof("notice: detected %d overrides and %d kernel-rt layers", len(kernelOverrides), len(kernelRtLayers))
} else if len(kernelOverrides) > 0 || len(kernelExtLayers) > 0 {
klog.Infof("notice: detected %d kernel overrides and %d kernel-rt or kernel-64k layers", len(kernelOverrides), len(kernelExtLayers))
} else {
klog.Infof("No kernel overrides or replacement detected")
}
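The revert logic can be sketched without a live deployment. In the snippet below, `revertKernelArgs` is a hypothetical helper that takes the two lists the hunk reads from the booted deployment (`RequestedBaseRemovals` and `RequestedPackages`) as plain string slices and returns the `rpm-ostree` arguments, or `nil` when nothing should be reverted. It only illustrates the filtering and the "both lists must be non-empty" condition; logging and command execution are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// revertKernelArgs is a hypothetical helper (not part of the MCO code base).
// baseRemovals and requestedPackages stand in for the booted deployment's
// RequestedBaseRemovals and RequestedPackages fields.
func revertKernelArgs(baseRemovals, requestedPackages []string) []string {
	var kernelOverrides, kernelExtLayers []string

	// Collect removed base packages that belong to the default kernel.
	for _, removal := range baseRemovals {
		if removal == "kernel" || strings.HasPrefix(removal, "kernel-") {
			kernelOverrides = append(kernelOverrides, removal)
		}
	}
	// Collect layered packages that belong to the RT or 64k kernels.
	for _, pkg := range requestedPackages {
		if strings.HasPrefix(pkg, "kernel-rt-") || strings.HasPrefix(pkg, "kernel-64k-") {
			kernelExtLayers = append(kernelExtLayers, pkg)
		}
	}

	// Only revert when the node shows both halves of a kernel swap, so that
	// machine-local kernel hotfixes are left alone.
	if len(kernelOverrides) == 0 || len(kernelExtLayers) == 0 {
		return nil
	}

	args := []string{"override", "reset"}
	args = append(args, kernelOverrides...)
	for _, pkg := range kernelExtLayers {
		args = append(args, "--uninstall", pkg)
	}
	return args
}

func main() {
	args := revertKernelArgs(
		[]string{"kernel", "kernel-core"},
		[]string{"kernel-64k-core", "usbguard"},
	)
	fmt.Println("rpm-ostree " + strings.Join(args, " "))

	// No layered kernel packages: nothing to revert.
	fmt.Println(revertKernelArgs([]string{"kernel"}, []string{"usbguard"}) == nil)
}
```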
@@ -2094,10 +2113,10 @@ func (dn *CoreOSDaemon) applyLayeredOSChanges(mcDiff machineConfigDiff, oldConfi
}
}()

// If we have an OS update *or* a kernel type change, then we must undo the RT kernel
// If we have an OS update *or* a kernel type change, then we must undo the kernel swap
// enablement.
if mcDiff.osUpdate || mcDiff.kernelType {
if err := dn.queueRevertRTKernel(); err != nil {
if err := dn.queueRevertKernelSwap(); err != nil {
mcdPivotErr.Inc()
return err
}