Skip to content

Conversation

@miabbott
Copy link
Member

This proposal covers adding support for real-time kernels to RHCOS and
the ability to select those kernels with the MCO.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 19, 2019
@miabbott
Copy link
Member Author

cc: @cgwalters @ashcrow @runcom @sinnykumari @imcleod @darkmuggle @mrguitar

Copy link
Contributor

@darkmuggle darkmuggle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions, but otherwise it looks good to me. The prose is clear and concise.

If there are additional changes required across the OCP product to properly support
real-time kernel (i.e. container runtime, etc) which cannot be delivered at the
same time, the real-time kernel packages can still be included as part of the
`machine-os-content` image. Additionally, exposing the tunable in the MachineConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind clarifying this? The part about lacking full support pivots to removing the tunable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My logic was:

  1. we prep RT kernel support in RHCOS + MCO
  2. we discover there are other components that are not ready
  3. mitigation: keep the RT kernel in the machine-os-content and backout the MCO change, effectively hiding the knob from users

Does that help?


1. Include the `kernel-rt` packages in the `machine-os-content` image
2. Provide tunable in MachineConfig that selects the type of kernel to use
3. Do initial tuning of RHCOS node after real-time kernel is applied
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to clarify what "tuning" is affected. For example, FIPS and encryption?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this falls into the bucket of tuned profiles. @cgwalters can you provide some details?

Copy link
Contributor

@sinnykumari sinnykumari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me. One inline comment

- Test removal of `kernel-rt` packages on single RHCOS node
- Test `kernelType: realtime` on OCP cluster with RHCOS nodes using default kernel
- Test `kernelType: default` on OCP cluster with RHCOS nodes using default kernel
- Test `kernelType: default` on OCP cluster with RHCOS nodes already using real-time kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying one MC with setting kerneType to realtime and another one to default can lead to uncertain behavior because rendered MC will pick one of them. Instead of creating another MC with kernelType: default, We might want to test here deleting MC which adds kernelType: realtime .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; will update that test case.


### Providing the Packages

The proposal is to include the `kernel-rt` packages in the `machine-os-content`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In the standard machine-os-content. Basically denote it's not a special or different one.


When the MCO parses a MachineConfig with `kernelType: default`, it shall instruct `rpm-ostree`
on the RHCOS node to remove any `kernel-rt` packages and use the default kernel. If the
`kernel-rt` packages are not present, it should be a no-op.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any logging/warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't going to get into those details, but no harm in calling it out explicitly.

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple minor items but it looks quite good!

for things like nVidia GPUs will need to rebuild the kernel modules
to support the real-time kernel.

- Customers requiring FIPS support **SHOULD NOT** use the real-time kernel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we handle this situation explicitly in the MCO? Like, the MCO can refuse to enable it (e.g. by setting the kerneltype=default on the rendered MC when fips=true)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems feasible. @sinnykumari WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes should be possible. MCO can print some message saying Not supported when FIPS is already enabled.

This proposal covers adding support for real-time kernels to RHCOS and
the ability to select those kernels with the MCO.
@miabbott
Copy link
Member Author

Finally got around to making the updates ⬆️

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the latest updates

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 29, 2020
@ashcrow
Copy link
Member

ashcrow commented Jan 30, 2020

Anyone object to merging?

@miabbott
Copy link
Member Author

Can someone give a /lgtm here?

It's just a formality at this point; the machine-os-content already contains the RT kernel as described and the MCO has the functionality to switch kernels

@sdodson
Copy link
Member

sdodson commented Feb 18, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 18, 2020
@openshift-merge-robot openshift-merge-robot merged commit 6d25129 into openshift:master Feb 18, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, miabbott, mike-nguyen, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants