CNF-10238: NTO render command for SNO boot arguments #844

jlojosnegros · 2023-11-03T12:32:09Z

We need to speed up bootstrap.
To do so, we need to apply kernel boot arguments without restarting the node.

Add a new render command that prepares all Tuned profiles, runs TuneD, reads the resulting bootcmdline, and renders a MachineConfig that applies those kernel arguments.

Warning: running TuneD modifies some system files, so this command should be executed in a properly isolated environment.

see: openshift/installer#7692
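
For illustration only, the last two steps of the flow described above (read bootcmdline, render a MachineConfig) could look roughly like the following. This is a minimal sketch, not the code from this PR: it assumes the bootcmdline file uses the `TUNED_BOOT_CMDLINE="..."` variable layout, and the function names `parseBootCmdline` and `renderMachineConfig`, the manifest name, and the sample kernel arguments are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cmdlineKey is the variable assumed to hold the kernel arguments in the
// bootcmdline file written by TuneD (an assumption of this sketch).
const cmdlineKey = "TUNED_BOOT_CMDLINE="

// parseBootCmdline returns the kernel arguments found in the given
// bootcmdline file content, or nil if the variable is absent.
func parseBootCmdline(content string) []string {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, cmdlineKey) {
			continue
		}
		// Drop the surrounding double quotes, then split on whitespace.
		value := strings.Trim(strings.TrimPrefix(line, cmdlineKey), `"`)
		return strings.Fields(value)
	}
	return nil
}

// renderMachineConfig builds a MachineConfig manifest as YAML text that
// sets spec.kernelArguments for the given MachineConfigPool role.
func renderMachineConfig(name, role string, kernelArgs []string) string {
	var b strings.Builder
	b.WriteString("apiVersion: machineconfiguration.openshift.io/v1\n")
	b.WriteString("kind: MachineConfig\n")
	b.WriteString("metadata:\n")
	fmt.Fprintf(&b, "  name: %s\n", name)
	b.WriteString("  labels:\n")
	fmt.Fprintf(&b, "    machineconfiguration.openshift.io/role: %s\n", role)
	b.WriteString("spec:\n")
	if len(kernelArgs) == 0 {
		b.WriteString("  kernelArguments: []\n")
		return b.String()
	}
	b.WriteString("  kernelArguments:\n")
	for _, arg := range kernelArgs {
		// Quote each argument so characters such as ',' stay literal.
		fmt.Fprintf(&b, "    - %q\n", arg)
	}
	return b.String()
}

func main() {
	// Hypothetical file content; real values depend on the applied profile.
	content := "TUNED_BOOT_INITRD_ADD=\"\"\nTUNED_BOOT_CMDLINE=\"skew_tick=1 nohz=on\"\n"
	args := parseBootCmdline(content)
	fmt.Print(renderMachineConfig("50-example-kargs", "master", args))
}
```

The manifest is built as plain text here only to keep the sketch dependency-free; a real implementation would more likely populate the typed MachineConfig object and serialize it.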

openshift-ci · 2023-11-03T12:32:14Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

jlojosnegros · 2023-11-07T15:05:08Z

/test all

jlojosnegros · 2023-11-08T07:52:37Z

/retest

jlojosnegros · 2023-11-15T08:31:06Z

/cc @vitus133

jlojosnegros · 2023-11-15T08:32:07Z

/CC @MarSik

MarSik · 2023-12-05T11:37:56Z

pkg/tuned/cmd/render/render.go

+		tuneD = append(tuneD, tunedFromPP)
+	}
+
+	tuneDrecommended := operator.TunedRecommend(tuneD)


I wonder if this should go to an MCP loop in case there are multiple pools with multiple profiles.

Actually, I just realized we need to do the opposite. We need to make sure we select and render only the Tuned and performance profiles for the master MCP (probably). We do not know the proper CPU topology for the other MCPs.

So, do we still need to load all the PerformanceProfiles, or just those that match the master MCP?

I think master only and only on SNO.
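
As a rough illustration of the "master only" selection discussed in this thread, the filtering could be sketched as below. This is not the code from the PR: `performanceProfile` is a hypothetical, trimmed-down stand-in for the real PerformanceProfile type, and matching on the `node-role.kubernetes.io/master` node selector key is an assumption about how a profile is tied to the master pool.

```go
package main

import "fmt"

// masterRoleLabel is the node label assumed to identify master nodes.
const masterRoleLabel = "node-role.kubernetes.io/master"

// performanceProfile is a hypothetical stand-in for the real
// PerformanceProfile type; only the fields this sketch needs are kept.
type performanceProfile struct {
	Name         string
	NodeSelector map[string]string
}

// selectMasterProfiles keeps only the profiles whose node selector
// targets master nodes and drops every other profile.
func selectMasterProfiles(profiles []performanceProfile) []performanceProfile {
	var selected []performanceProfile
	for _, p := range profiles {
		if _, ok := p.NodeSelector[masterRoleLabel]; ok {
			selected = append(selected, p)
		}
	}
	return selected
}

func main() {
	profiles := []performanceProfile{
		{Name: "sno", NodeSelector: map[string]string{masterRoleLabel: ""}},
		{Name: "workers", NodeSelector: map[string]string{"node-role.kubernetes.io/worker": ""}},
	}
	for _, p := range selectMasterProfiles(profiles) {
		fmt.Println(p.Name)
	}
}
```

Filtering before running TuneD means kernel arguments are rendered only for the pool whose CPU topology is known at bootstrap time, which is the concern raised above.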

yanirq · 2023-12-07T10:43:01Z

/retest

MarSik · 2023-12-07T11:02:03Z

/retitle CNF-10238: NTO render command for SNO boot arguments
/approve
/lgtm

We want to merge the current working solution (tested) and iterate from there to avoid unnecessary duplication of work wrt OCP branching.

openshift-ci-robot · 2023-12-07T11:02:28Z

@jlojosnegros: This pull request references CNF-10238 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.15.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

MarSik · 2023-12-07T11:04:13Z

/lgtm

MarSik · 2023-12-07T11:04:20Z

/approve

openshift-ci · 2023-12-07T11:07:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlojosnegros, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [MarSik]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2023-12-07T13:15:18Z

/retest-required

Remaining retests: 0 against base HEAD e9fa899 and 2 for PR HEAD f92e268 in total

openshift-ci · 2023-12-07T15:14:48Z

@jlojosnegros: all tests passed!

Full PR test history. Your PR dashboard.


openshift-bot · 2023-12-07T18:45:47Z

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-node-tuning-operator-container-v4.15.0-202312071813.p0.g901f395.assembly.stream for distgit cluster-node-tuning-operator.
All builds following this will include this PR.

This change combines PRs 970, 998 and 1024 which fixed OCPBUGS-30647 in 4.16.

Summary of changes:
* Even though there is currently no namespace collision with TuneD using /var/lib/tuned, change this path to /var/lib/ocp-tuned
* Remove bin/run. While this means a little code duplication across Containerfiles, we no longer need to do anything special at run time. This should make things easier for the future.
* Do not inherit --enable-leader-election and --version NTO flags as they are not handled by subcommands anyway (yet)
* Remove openshift-tuned binary and use NTO subcommand instead.
* /var/lib/tuned/profiles-data is no longer used, remove it.
* Remove openshift-tuned PID file code. It is no longer used.
* Clean up after #844
* Remove TuneD timeout code and reload on ERRORs
* Fix logging in updateTunedProfile() and optimize the calls to update node annotations and update Profile.Status
* Clean up tunedStop() to return only one value
* During TuneD process shutdown, handle the fact the TuneD process might have already exited
* The openshift-tuned operand now no longer unnecessarily exits when TuneD process exits; when TuneD process exits, wait for k8s object changes and only then restart TuneD
* Do not use buffered channels
* The indication that TuneD is reloading is now a status bit potentially reportable back to the operator
* Introduce Change type for the TuneD event processor to avoid races, where it was previously possible to change TuneD configuration during TuneD profile reload
* Register the fact TuneD finished reloading in case the primary TuneD profile does not exist
* Conditional TuneD reload when Cloud Provider changes
* Minor logging and comment improvements

Resolves: OCPBUGS-36355

Co-authored-by: Jiri Mencak <[email protected]>

This is a backport of #1095 which fixed OCPBUGS-36355 in 4.15.

Summary of changes:
* Change the operand's home directory from TuneD's artifacts directory /var/lib/tuned to /var/lib/ocp-tuned
* Remove bin/run. While this means a little code duplication across Containerfiles, we no longer need to do anything special at run time. This should make things easier for the future.
* Do not inherit --enable-leader-election and --version NTO flags as they are not handled by subcommands anyway (yet)
* Remove openshift-tuned binary and use NTO subcommand instead.
* /var/lib/tuned/profiles-data is no longer used, remove it.
* Remove openshift-tuned PID file code. It is no longer used.
* Clean up after #844
* Remove TuneD timeout code and reload on ERRORs
* Fix logging in updateTunedProfile() and optimize the calls to update node annotations and update Profile.Status
* Clean up tunedStop() to return only one value
* During TuneD process shutdown, handle the fact the TuneD process might have already exited
* The openshift-tuned operand now no longer unnecessarily exits when TuneD process exits; when TuneD process exits, wait for k8s object changes and only then restart TuneD
* Do not use buffered channels
* The indication that TuneD is reloading is now a status bit potentially reportable back to the operator
* Introduce Change type for the TuneD event processor to avoid races, where it was previously possible to change TuneD configuration during TuneD profile reload
* Register the fact TuneD finished reloading in case the primary TuneD profile does not exist
* Conditional TuneD reload when Cloud Provider changes
* Minor logging and comment improvements

Resolves: OCPBUGS-37734

Co-authored-by: Jiri Mencak <[email protected]>

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2023

jlojosnegros mentioned this pull request Nov 6, 2023

Tao mod #828

Closed

jlojosnegros force-pushed the reduce-reboots branch 2 times, most recently from c99cc52 to bf92d4c Compare November 7, 2023 15:03

jlojosnegros changed the title ~~WIP render command~~ render command Nov 7, 2023

jlojosnegros force-pushed the reduce-reboots branch from bf92d4c to a5cfd36 Compare November 7, 2023 15:32

jlojosnegros mentioned this pull request Nov 7, 2023

CNF-10170: bootkube.sh: Render kernel boot arguments for SNO openshift/installer#7692

Merged

jlojosnegros commented Nov 7, 2023

pkg/tuned/cmd/render/render.go (outdated)

jlojosnegros force-pushed the reduce-reboots branch 2 times, most recently from fffd0e4 to da99c2c Compare November 8, 2023 15:40

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2023

jlojosnegros force-pushed the reduce-reboots branch from da99c2c to 03c4288 Compare November 14, 2023 11:51

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2023

jlojosnegros marked this pull request as ready for review November 14, 2023 11:51

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023

openshift-ci bot requested review from kpouget and yanirq November 14, 2023 11:52

jlojosnegros force-pushed the reduce-reboots branch from 03c4288 to c044f65 Compare November 14, 2023 11:54

openshift-ci bot requested a review from vitus133 November 15, 2023 08:31

openshift-ci bot requested a review from MarSik November 15, 2023 08:32

MarSik reviewed Nov 16, 2023

pkg/tuned/cmd/render/cmd.go (outdated)

MarSik reviewed Nov 16, 2023

pkg/tuned/cmd/render/render.go (outdated)

jlojosnegros force-pushed the reduce-reboots branch 3 times, most recently from 6d33a60 to f5940f4 Compare November 23, 2023 11:18

jlojosnegros force-pushed the reduce-reboots branch from 853de8c to 4fabcc5 Compare December 5, 2023 09:29

MarSik reviewed Dec 5, 2023

New MachineConfig render command

f92e268

Signed-off-by: Jose Luis Ojosnegros Manchón <[email protected]>

jlojosnegros force-pushed the reduce-reboots branch from 4fabcc5 to f92e268 Compare December 5, 2023 14:11

openshift-ci bot changed the title ~~render command~~ CNF-10238: NTO render command for SNO boot arguments Dec 7, 2023

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 7, 2023

openshift-ci bot assigned MarSik Dec 7, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2023

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2023

openshift-merge-bot bot merged commit 901f395 into openshift:master Dec 7, 2023

jlojosnegros mentioned this pull request Dec 12, 2023

CNF-10587: NTO: Fine tuning render bootcmd mc command #878

Merged

This was referenced Feb 13, 2024

CNF-10170: bootkube.sh: Render kernel boot arguments for SNO openshift/installer#8007

Closed

CNF-10170: Avoid command to fail if there is no PP on input folders #935

Merged

vitus133 mentioned this pull request Mar 5, 2024

CNF-10170: bootkube.sh: Render kernel boot arguments for SNO openshift/installer#8099

Merged

jmencak mentioned this pull request Jun 30, 2024

OCPBUGS-36355: Backport fix for OCPBUGS-30647 #1095

Merged

jmencak mentioned this pull request Jul 30, 2024

OCPBUGS-37734: Backport fix for OCPBUGS-36355 #1126

Merged
