Skip to content

Conversation

@jlojosnegros
Copy link
Contributor

@jlojosnegros jlojosnegros commented Nov 3, 2023

We need to speed up bootstrap.
To do so we need to apply kernel boot arguments without restarting node.

Add a new render command to prepare all tuned profiles, run tuneD, read bootcmdline and render a MachineConfig to apply those kernel arguments.

Warning: running tuneD would modify some system files so this command should be executed in a properly isolated environment.

see: openshift/installer#7692

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 3, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jlojosnegros jlojosnegros mentioned this pull request Nov 6, 2023
@jlojosnegros jlojosnegros force-pushed the reduce-reboots branch 2 times, most recently from c99cc52 to bf92d4c Compare November 7, 2023 15:03
@jlojosnegros jlojosnegros changed the title WIP render command render command Nov 7, 2023
@jlojosnegros
Copy link
Contributor Author

/test all

@jlojosnegros
Copy link
Contributor Author

/retest

@jlojosnegros jlojosnegros force-pushed the reduce-reboots branch 2 times, most recently from fffd0e4 to da99c2c Compare November 8, 2023 15:40
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2023
@jlojosnegros jlojosnegros marked this pull request as ready for review November 14, 2023 11:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023
@openshift-ci openshift-ci bot requested review from kpouget and yanirq November 14, 2023 11:52
@jlojosnegros
Copy link
Contributor Author

/cc @vitus133

@openshift-ci openshift-ci bot requested a review from vitus133 November 15, 2023 08:31
@jlojosnegros
Copy link
Contributor Author

/CC @MarSik

@openshift-ci openshift-ci bot requested a review from MarSik November 15, 2023 08:32
@jlojosnegros jlojosnegros force-pushed the reduce-reboots branch 3 times, most recently from 6d33a60 to f5940f4 Compare November 23, 2023 11:18
tuneD = append(tuneD, tunedFromPP)
}

tuneDrecommended := operator.TunedRecommend(tuneD)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should go to an MCP loop in case there are multiple pools with multiple profiles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just realized we need to do the opposite. We need to make sure we will select only render the tuned and perf profile for the master MCP (probably). We do not know the proper cpu topology for the other MCPs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ... still need to load all the PerformanceProfiles? or just those which will match with MCP master?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think master only and only on SNO.

Signed-off-by: Jose Luis Ojosnegros Manchón <[email protected]>
@yanirq
Copy link
Contributor

yanirq commented Dec 7, 2023

/retest

@MarSik
Copy link
Contributor

MarSik commented Dec 7, 2023

/retitle CNF-10238: NTO render command for SNO boot arguments
/approve
/lgtm

We want to merge the current working solution (tested) and iterate from there to avoid unnecessary duplication of work wrt OCP branching.

@openshift-ci openshift-ci bot changed the title render command CNF-10238: NTO render command for SNO boot arguments Dec 7, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 7, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Dec 7, 2023

@jlojosnegros: This pull request references CNF-10238 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.15.0" version, but no target version was set.

Details

In response to this:

We need to speed up bootstrap.
To do so we need to apply kernel boot arguments without restarting node.

Add a new render command to prepare all tuned profiles, run tuneD, read bootcmdline and render a MachineConfig to apply those kernel arguments.

Warning: running tuneD would modify some system files so this command should be executed in a properly isolated environment.

see: openshift/installer#7692

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MarSik
Copy link
Contributor

MarSik commented Dec 7, 2023

/lgtm

@MarSik
Copy link
Contributor

MarSik commented Dec 7, 2023

/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlojosnegros, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2023
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD e9fa899 and 2 for PR HEAD f92e268 in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 7, 2023

@jlojosnegros: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 901f395 into openshift:master Dec 7, 2023
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-node-tuning-operator-container-v4.15.0-202312071813.p0.g901f395.assembly.stream for distgit cluster-node-tuning-operator.
All builds following this will include this PR.

jmencak added a commit to jmencak/cluster-node-tuning-operator that referenced this pull request Jun 30, 2024
This change combines PRs 970, 998 and 1024 which fixed OCPBUGS-30647 in 4.16.

Summary of changes:
  * Even though there is currently no namespace collision with TuneD
    using in/var/lib/tuned, change this path to /var/lib/ocp-tuned
  * Remove bin/run.  While this means a little code duplication across
    Containerfiles, we no longer need to do anything special at run
    time.  This should make things easier for the future.
  * Do not inherit --enable-leader-election and --version NTO flags
    as they are not handled by subcommands anyway (yet)
  * Remove openshift-tuned binary and use NTO subcommand instead.
  * /var/lib/tuned/profiles-data is no longer used, remove it.
  * Remove openshift-tuned PID file code.  It is no longer used.
  * Clean up after openshift#844
  * Remove TuneD timeout code and reload on ERRORs
  * fix logging in updateTunedProfile() and optimize the calls
    to update node annotations and update Profile.Status
  * clean up tunedStop() to return only one value
  * during TuneD process shutdown, handle the fact the TuneD process
    might have already exitted
  * the openshift-tuned operand now no longer unnecessarily exits when
    TuneD process exits; when TuneD process exits, wait for k8s object
    changes and only then restart TuneD
  * do not use buffered channels
  * the indication that TuneD is reloading is now a status bit potentially
    reportable back to the operator
  * introduce Change type for the TuneD event processor to avoid races, where
    it was previously possible to change TuneD configuration during TuneD profile
    reload
  * register the fact TuneD finished reloading in case the primary TuneD profile
    does not exist
  * conditional TuneD reload when Cloud Provider changes
  * minor logging and comment improvements

Resolves: OCPBUGS-36355
jmencak added a commit to jmencak/cluster-node-tuning-operator that referenced this pull request Jul 1, 2024
This change combines PRs 970, 998 and 1024 which fixed OCPBUGS-30647 in 4.16.

Summary of changes:
  * Even though there is currently no namespace collision with TuneD
    using in/var/lib/tuned, change this path to /var/lib/ocp-tuned
  * Remove bin/run.  While this means a little code duplication across
    Containerfiles, we no longer need to do anything special at run
    time.  This should make things easier for the future.
  * Do not inherit --enable-leader-election and --version NTO flags
    as they are not handled by subcommands anyway (yet)
  * Remove openshift-tuned binary and use NTO subcommand instead.
  * /var/lib/tuned/profiles-data is no longer used, remove it.
  * Remove openshift-tuned PID file code.  It is no longer used.
  * Clean up after openshift#844
  * Remove TuneD timeout code and reload on ERRORs
  * fix logging in updateTunedProfile() and optimize the calls
    to update node annotations and update Profile.Status
  * clean up tunedStop() to return only one value
  * during TuneD process shutdown, handle the fact the TuneD process
    might have already exitted
  * the openshift-tuned operand now no longer unnecessarily exits when
    TuneD process exits; when TuneD process exits, wait for k8s object
    changes and only then restart TuneD
  * do not use buffered channels
  * the indication that TuneD is reloading is now a status bit potentially
    reportable back to the operator
  * introduce Change type for the TuneD event processor to avoid races, where
    it was previously possible to change TuneD configuration during TuneD profile
    reload
  * register the fact TuneD finished reloading in case the primary TuneD profile
    does not exist
  * conditional TuneD reload when Cloud Provider changes
  * minor logging and comment improvements

Resolves: OCPBUGS-36355
openshift-merge-bot bot pushed a commit that referenced this pull request Jul 10, 2024
This change combines PRs 970, 998 and 1024 which fixed OCPBUGS-30647 in 4.16.

Summary of changes:
  * Even though there is currently no namespace collision with TuneD
    using in/var/lib/tuned, change this path to /var/lib/ocp-tuned
  * Remove bin/run.  While this means a little code duplication across
    Containerfiles, we no longer need to do anything special at run
    time.  This should make things easier for the future.
  * Do not inherit --enable-leader-election and --version NTO flags
    as they are not handled by subcommands anyway (yet)
  * Remove openshift-tuned binary and use NTO subcommand instead.
  * /var/lib/tuned/profiles-data is no longer used, remove it.
  * Remove openshift-tuned PID file code.  It is no longer used.
  * Clean up after #844
  * Remove TuneD timeout code and reload on ERRORs
  * fix logging in updateTunedProfile() and optimize the calls
    to update node annotations and update Profile.Status
  * clean up tunedStop() to return only one value
  * during TuneD process shutdown, handle the fact the TuneD process
    might have already exitted
  * the openshift-tuned operand now no longer unnecessarily exits when
    TuneD process exits; when TuneD process exits, wait for k8s object
    changes and only then restart TuneD
  * do not use buffered channels
  * the indication that TuneD is reloading is now a status bit potentially
    reportable back to the operator
  * introduce Change type for the TuneD event processor to avoid races, where
    it was previously possible to change TuneD configuration during TuneD profile
    reload
  * register the fact TuneD finished reloading in case the primary TuneD profile
    does not exist
  * conditional TuneD reload when Cloud Provider changes
  * minor logging and comment improvements

Resolves: OCPBUGS-36355

Co-authored-by: Jiri Mencak <[email protected]>
jmencak added a commit to jmencak/cluster-node-tuning-operator that referenced this pull request Jul 30, 2024
This is a backport of openshift#1095 which fixed OCPBUGS-36355 in 4.15.

Summary of changes:
  * Change the operand's home directory from TuneD's artifacts
    directory /var/lib/tuned to /var/lib/ocp-tuned
  * Remove bin/run.  While this means a little code duplication across
    Containerfiles, we no longer need to do anything special at run
    time.  This should make things easier for the future.
  * Do not inherit --enable-leader-election and --version NTO flags
    as they are not handled by subcommands anyway (yet)
  * Remove openshift-tuned binary and use NTO subcommand instead.
  * /var/lib/tuned/profiles-data is no longer used, remove it.
  * Remove openshift-tuned PID file code.  It is no longer used.
  * Clean up after openshift#844
  * Remove TuneD timeout code and reload on ERRORs
  * Fix logging in updateTunedProfile() and optimize the calls
    to update node annotations and update Profile.Status
  * Clean up tunedStop() to return only one value
  * During TuneD process shutdown, handle the fact the TuneD process
    might have already exitted
  * The openshift-tuned operand now no longer unnecessarily exits when
    TuneD process exits; when TuneD process exits, wait for k8s object
    changes and only then restart TuneD
  * Do not use buffered channels
  * The indication that TuneD is reloading is now a status bit potentially
    reportable back to the operator
  * Introduce Change type for the TuneD event processor to avoid races, where
    it was previously possible to change TuneD configuration during TuneD profile
    reload
  * Register the fact TuneD finished reloading in case the primary TuneD profile
    does not exist
  * Conditional TuneD reload when Cloud Provider changes
  * Minor logging and comment improvements

Resolves: OCPBUGS-37734
openshift-merge-bot bot pushed a commit that referenced this pull request Aug 14, 2024
This is a backport of #1095 which fixed OCPBUGS-36355 in 4.15.

Summary of changes:
  * Change the operand's home directory from TuneD's artifacts
    directory /var/lib/tuned to /var/lib/ocp-tuned
  * Remove bin/run.  While this means a little code duplication across
    Containerfiles, we no longer need to do anything special at run
    time.  This should make things easier for the future.
  * Do not inherit --enable-leader-election and --version NTO flags
    as they are not handled by subcommands anyway (yet)
  * Remove openshift-tuned binary and use NTO subcommand instead.
  * /var/lib/tuned/profiles-data is no longer used, remove it.
  * Remove openshift-tuned PID file code.  It is no longer used.
  * Clean up after #844
  * Remove TuneD timeout code and reload on ERRORs
  * Fix logging in updateTunedProfile() and optimize the calls
    to update node annotations and update Profile.Status
  * Clean up tunedStop() to return only one value
  * During TuneD process shutdown, handle the fact the TuneD process
    might have already exitted
  * The openshift-tuned operand now no longer unnecessarily exits when
    TuneD process exits; when TuneD process exits, wait for k8s object
    changes and only then restart TuneD
  * Do not use buffered channels
  * The indication that TuneD is reloading is now a status bit potentially
    reportable back to the operator
  * Introduce Change type for the TuneD event processor to avoid races, where
    it was previously possible to change TuneD configuration during TuneD profile
    reload
  * Register the fact TuneD finished reloading in case the primary TuneD profile
    does not exist
  * Conditional TuneD reload when Cloud Provider changes
  * Minor logging and comment improvements

Resolves: OCPBUGS-37734

Co-authored-by: Jiri Mencak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants