Skip to content

Conversation

@jbtrystram
Copy link
Contributor

@jbtrystram jbtrystram commented Sep 4, 2023

Since RHEL 9 reverted to 4K memory pages for aarch64, add a way to
switch to a hugepage kernel.
The MachineConfig should contain the following to trigger the kernel
switch:
spec:
kernelType: 64k-pages

This is exclusive with the realtime kernel option.

xref https://issues.redhat.com/browse/COS-2402
This requires openshift/os#1351

@openshift-ci openshift-ci bot requested review from cdoern and djoshy September 4, 2023 13:24
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 4, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 4, 2023

Hi @jbtrystram. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@travier
Copy link
Member

travier commented Sep 4, 2023

Not an MCO expert but this extension is more a "kernel-rt" extension than a classic extension so it should likely follow a similar code path.

@travier
Copy link
Member

travier commented Sep 4, 2023

Found:

So we should likely introduce a new value for kernelType.

@jbtrystram
Copy link
Contributor Author

jbtrystram commented Sep 4, 2023

Updated the PR with a first round of changes, following the kernel-rt code path.

Should I duplicate the e2e test TestKernelType and test for the 64k kernel ? Or is this test enough to validate that switching kernels works without testing both cases ?

Also, please wait a bit while I go through a deployment of this to test on a real cluster :)

@cgwalters
Copy link
Member

/ok-to-test
/approve

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 6, 2023
@jbtrystram
Copy link
Contributor Author

/retest

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to have something similar to queueRevertRTKernel() but for this new 64k-hugepages kernel.

@jbtrystram
Copy link
Contributor Author

I just got this to work nicely on an ARM64 cluster after creating the following machine config :

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-hugepages
spec:
  kernelType: 64k-pages

The rendered machineConfig is updated successfully, and the node get updated through the MCD.
After a reboot I get in a debug pod :

sh-4.4# getconf PAGESIZE
65536

@jbtrystram
Copy link
Contributor Author

/retest

@jbtrystram
Copy link
Contributor Author

Here is the recap of some testing I've done today :

  • Deploy a 4.14 OCP on aarch64
  • Upgraded it to a custom release containing this PR + openshift/os
  • Create the following MachineConfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-hugepages
spec:
  kernelType: 64k-pages

A new MachineConfig is rendered : rendered-worker-bf7b27b3afb07f5c8659e0ca73363294

MCD log :

I0914 12:40:31.274415 2442 update.go:1975] Initiating switch to kernel 64k-pages
I0914 12:40:31.276810 2442 update.go:1960] Running: rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-core kernel-modules-extra --install kernel-64k-core --install kernel-64k-modules --install kernel-64k-modules-core --install kernel-64k-modules-extra
Checking out tree 03d5f14...done
Enabled rpm-md repositories: coreos-extensions
Importing rpm-md...done
rpm-md repo 'coreos-extensions' (cached); generated: 2023-09-07T09:25:22Z solvables: 52
Resolving dependencies...done
Importing packages...done
Applying 5 overrides and 4 overlays
Processing packages...done
Running pre scripts...done
Running post scripts...done
Running posttrans scripts...done
Writing rpmdb...done
Generating initramfs...done
Writing OSTree commit...done
Staging deployment...done
Removed:
kernel-5.14.0-284.30.1.el9_2.aarch64
kernel-core-5.14.0-284.30.1.el9_2.aarch64
kernel-modules-5.14.0-284.30.1.el9_2.aarch64
kernel-modules-core-5.14.0-284.30.1.el9_2.aarch64
kernel-modules-extra-5.14.0-284.30.1.el9_2.aarch64
Added:
kernel-64k-core-5.14.0-284.30.1.el9_2.aarch64
kernel-64k-modules-5.14.0-284.30.1.el9_2.aarch64
kernel-64k-modules-core-5.14.0-284.30.1.el9_2.aarch64
kernel-64k-modules-extra-5.14.0-284.30.1.el9_2.aarch64
Use "rpm-ostree override reset" to undo overrides
Run "systemctl reboot" to start a reboot
I0914 12:43:45.102765 2442 update.go:1975] Rebooting node
I0914 12:43:45.104782 2442 update.go:2005] Removing SIGTERM protection
I0914 12:43:45.104812 2442 update.go:1975] initiating reboot: Node will reboot into config rendered-worker-bf7b27b3afb07f5c8659e0ca73363294
I0914 12:43:45.169801 2442 update.go:1975] reboot successful
I0914 12:43:45.171769 2442 daemon.go:652] Node ip-10-0-9-8.us-east-2.compute.internal is queued for a reboot, skipping sync.

[...]

I0914 12:44:49.339598 2479 rpm-ostree.go:307] Running captured: rpm-ostree status
I0914 12:44:49.374411 2479 daemon.go:1473] State: idle
Deployments:
* ostree-unverified-registry:quay.io/jbtrystramtestimages/rhcos-container:latest
  Digest: sha256:0d68cf52beba8ef17fd549d22cdff97abe41701549fb778300f0bfb69484e942
  Version: 414.92.202309070913-0 (2023-09-14T12:18:33Z)
  RemovedBasePackages: kernel-modules-core kernel-modules-extra kernel-core kernel kernel-modules 5.14.0-284.30.1.el9_2
  LayeredPackages: kernel-64k-core kernel-64k-modules kernel-64k-modules-core
  kernel-64k-modules-extra
  ostree-unverified-registry:quay.io/jbtrystramtestimages/rhcos-container:latest
  Digest: sha256:0d68cf52beba8ef17fd549d22cdff97abe41701549fb778300f0bfb69484e942
  Version: 414.92.202309070913-0 (2023-09-14T12:18:33Z)
  • Debug terminal in a node :
sh-4.4# getconf PAGESIZE
65536

All the nodes are up with the new MachineConfig:

$ oc get mcp                                                                                                                                
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-b355df462a7693e9bc4627068958c679   True      False      False      3              3                   3                     0                      119m
worker   rendered-worker-bf7b27b3afb07f5c8659e0ca73363294   True      False      False      3              3                   3                     0                      119m
  • revert MachineConfig to default : oc delete machineconfig 99-worker-hugepages

MCD log

I0914 13:03:32.024426 2484 update.go:1960] Running: rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-core kernel-modules-extra --uninstall kernel-64k-core --uninstall kernel-64k-modules --uninstall kernel-64k-modules-core --uninstall kernel-64k-modules-extra
Staging deployment...done
Removed:
kernel-64k-core-5.14.0-284.30.1.el9_2.aarch64
kernel-64k-modules-5.14.0-284.30.1.el9_2.aarch64
kernel-64k-modules-core-5.14.0-284.30.1.el9_2.aarch64
kernel-64k-modules-extra-5.14.0-284.30.1.el9_2.aarch64
Added:
kernel-5.14.0-284.30.1.el9_2.aarch64
kernel-core-5.14.0-284.30.1.el9_2.aarch64
kernel-modules-5.14.0-284.30.1.el9_2.aarch64
kernel-modules-core-5.14.0-284.30.1.el9_2.aarch64
kernel-modules-extra-5.14.0-284.30.1.el9_2.aarch64
Run "systemctl reboot" to start a reboot
  • Debug terminal in a node :
sh-4.4# getconf PAGESIZE
4096
sh-5.1# rpm-ostree status
State: idle
Deployments:
* ostree-unverified-registry:quay.io/jbtrystramtestimages/rhcos-container:latest
  Digest: sha256:0d68cf52beba8ef17fd549d22cdff97abe41701549fb778300f0bfb69484e942
  Version: 414.92.202309070913-0 (2023-09-14T12:18:33Z)

All the nodes are back to the previously rendered MachineConfig:

[machine-config-operator] oc get mcp                                              
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-b355df462a7693e9bc4627068958c679   True      False      False      3              3                   3                     0                      151m
worker   rendered-worker-02235dd022ac27fd6988786d72b22f16   True      False      False      3              3                   3                     0                      151m

@travier
Copy link
Member

travier commented Sep 15, 2023

/retest

1 similar comment
@jbtrystram
Copy link
Contributor Author

/retest

Copy link
Contributor

@sinnykumari sinnykumari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please link jira story associated to this PR.
This will need e2e testing added as well, existing kernelType related test is at https://github.com/openshift/machine-config-operator/blob/master/test/e2e/mcd_test.go#L195 .
One question: Are we going to have 64K kernelType for RT kernel as well in future or it is irrelevant?
Putting hold for QE pre-merge testing
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 25, 2023
@jbtrystram jbtrystram changed the title Add kernel-64k extension Add kernel-64k kernelType Sep 25, 2023
@sinnykumari
Copy link
Contributor

Looking at epic https://issues.redhat.com/browse/COS-2402 and PR description, it seems like this is only relvent to aarch64. I am curious, are we allowing users to switch to kernel-64k on all arches or only on aarch64? If so, mentioning this explicitly in doc would avoid accidental use of this field on other arches. Also perhaps throwing an error message in daemon log on non aarch64 nodes.

@jbtrystram
Copy link
Contributor Author

/hold Now that: openshift/api#1453 has merged I think we should be modifying the CRD there rather than here.

@cdoern see openshift/api#1612

jbtrystram added a commit to jbtrystram/openshift-api that referenced this pull request Oct 10, 2023
@jbtrystram
Copy link
Contributor Author

/hold Now that: openshift/api#1453 has merged I think we should be modifying the CRD there rather than here.

@cdoern see openshift/api#1612

this is now merged :)

@jbtrystram
Copy link
Contributor Author

/retest

@aleskandro
Copy link
Member

aleskandro commented Oct 18, 2023

Hi, this feature cannot be supported on the GCP platform as their VMs do not allow booting kernels with 64k pagesize.

We'll document this, but we wonder if we could error out something in case a user tries to apply an MC for that in GCP. I'd see it as a nice to have for improving the UX behind this feature.

Cc @andymcc @Prashanth684 @jeffdyoung @sergiordlr

This feature is available with OCP 4.4 and onward releases as both `day 1` and `day 2` operation. It allows to choose between traditional and Real Time (RT) kernel on an RHCOS node. Supported values are
`""` or `default` for traditional kernel and `realtime` for RT kernel.
`""` or `default` for traditional kernel, `realtime` for RT kernel and `64k-pages` for 64k memory pages on aarch64.
Note that `64k-pages` and `realtime` cannot be selected at the same time. Also, 64k pages support is limited to aarch64 architecture.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there some way we can limit this change to apply only on aarch64 nodes? and possibly throw some sort of error if the associated machine pool is not aarch64 based?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know, RT kernel is not supported on ARM64 cluster, So apply kernelType realtime + 64k-pages on ARM64 cluster is invalid case.

If we apply kernelType 64k-pages on AMD64 cluster, machine config pool will be degraded with below error message

- lastTransitionTime: '2023-10-18T06:55:53Z'
  message: 'Failed to render configuration for pool worker: kernelType=64k-pages is
    invalid'
  reason: ''
  status: 'True'
  type: RenderDegraded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops..did not notice that ...thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also does this error mean that the machine will be unusable when the update fails? i am thinking of a case where we have a multi-arch compute cluster with x86+arm64 compute nodes? would the x86 machines error out and be in a "NotReady" state?

Copy link
Contributor Author

@jbtrystram jbtrystram Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Prashanth684 I'll try it out today and followup
update: Clusterbot is not working for me today so I can't try that quickly without going through the process of building a whole release payload, which I won't be able to do today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that 64k-pages and realtime cannot be selected at the same time. Also, 64k pages support is limited to aarch64 architecture.

Should we also say that realtime cannot be selected on aarch64 to make things clearer (not sure for P/Z)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also does this error mean that the machine will be unusable when the update fails? i am thinking of a case where we have a multi-arch compute cluster with x86+arm64 compute nodes? would the x86 machines error out and be in a "NotReady" state?

If the realtime kernel is applying on arm64 node, the machine will be degraded i.e. machineconfiguration.openshift.io/state: Degraded annotation machineconfiguration.openshift.io/reason will show you that rt packages are not available

Copy link
Contributor

@sinnykumari sinnykumari Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user applies unsupported combination, node will goes degraded. MCO applies config per pool. So, it is admin responsibility to make sure that they apply generic config when a pool has multi architecture nodes. Good thing is if one node fails to apply the config, MCD marks the node degraded and hence it won't start upgrading another node (when maxUnavialble: 1) to cascade the issue. On the degraded node, existing stuff would work fine but in order to schedule a new pod, issue on the node will need to be first resolved so that update can complete and node is marked again schedulable.

@rioliu-rh
Copy link

rioliu-rh commented Oct 19, 2023

Update latest test result
executed 4 cases (details can be found in MCO-798) on AWS cluster, all pass
cases are failed on GCP cluster, like mentioned in above comment, CPU does not support 64k-pages kernel, node cannot be started

EFI stub: ERROR: This 64 KB granular kernel is not supported by your CPU

  Failed to boot both default and fallback entries.

Press any key to continue...

MCO QE will execute cases on other platforms later

@rioliu-rh
Copy link

Update latest test result
Executed 4 cases on Azure, no issue found

@jlebon
Copy link
Member

jlebon commented Oct 19, 2023

Hi, this feature cannot be supported on the GCP platform as their VMs do not allow booting kernels with 64k pagesize.

Interesting. Do you have a link about this with more information? Is it for all machine types or only the smaller ones?

We'll document this, but we wonder if we could error out something in case a user tries to apply an MC for that in GCP. I'd see it as a nice to have for improving the UX behind this feature.

If indeed it's not supported at all, my vote would also be to give a nicer error here rather than wait until the user hits a likely much more obscure error down the line. That said, I wouldn't necessarily block this PR on this if @jbtrystram would rather do that as a follow-up.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 19, 2023
@aleskandro
Copy link
Member

aleskandro commented Oct 19, 2023

Hi, this feature cannot be supported on the GCP platform as their VMs do not allow booting kernels with 64k pagesize.

Interesting. Do you have a link about this with more information? Is it for all machine types or only the smaller ones?

Any machine size cannot boot a 64k pagesize Linux kernel on Google Cloud. We have some internal discussions about this, and there should be BZs. I'll try to dig them again.

@rioliu-rh
Copy link

Update latest test result
@sergiordlr did test on both baremetal HA/SNO cluster, no issue found.

Summarize the test status

  • tests are executed on AWS, Azure, GCP and Baremetal platforms
  • tests are passed on AWS, Azure, Baremetal platforms
  • tests are failed on GCP platform, CPU used by GCP VM does not support 64k page size kernel (Doc needed)

@rioliu-rh
Copy link

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 20, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 20, 2023

@rioliu-rh: The label(s) /label qe-approve cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed, rebase/manual, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

Details

In response to this:

/label qe-approve

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rioliu-rh
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 20, 2023
Since RHEL 9 reverted to 4K memory pages for aarch64, add a way to
switch to a hugepage kernel.
The MachineConfig should contain the following to trigger the kernel
switch:
spec:
  kernelType: 64k-pages

This is exclusive with the `realtime` kernel option.

xref https://issues.redhat.com/browse/COS-2402
This requires openshift/os#1351

Signed-off-by: jbtrystram <jbtrystram@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 20, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 20, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 20, 2023

@jbtrystram: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn ddd975a link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sinnykumari
Copy link
Contributor

QE testing has been done.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 23, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jbtrystram, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot merged commit c0ee989 into openshift:master Oct 23, 2023
@jbtrystram jbtrystram deleted the arm-64pages branch October 23, 2023 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants