OCPBUGS-23167: Add performance real time tuned template #954
@rbaturov: This pull request references Jira Issue OCPBUGS-23167, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
| return nil, err | ||
| } | ||
| name := components.GetComponentName(profile.Name, components.ProfileNamePerformance) | ||
| RealTimeKernelProfileName := components.ProfileNamePerformanceRT |
You need to use the GetComponentName method here too, otherwise two perf profiles will try to create the same profile.
I don't get why two perf profiles will try to create the same profile.
Anyway, I tried that before. What happened was that the tuned profile got posted to the cluster with the name of the profile appended, i.e. with the "-performance" suffix (openshift-node-performance-rt-performance), while the include section was looking for the hardcoded "openshift-node-performance-rt".
2024-02-14 10:49:19,013 ERROR tuned.daemon.daemon: Cannot set initial profile. No tunings will be enabled: Cannot load profile(s) 'openshift-node-performance-performance': Cannot find profile 'openshift-node-performance-rt' in '['/etc/tuned', '/usr/lib/tuned']'.
Think about the flow when you have two PerformanceProfiles in a cluster:
Each performance profile generates a new Tuned object that contains the tuned profiles. So two performance profiles result in two Tuned objects that will both contain the same openshift-node-performance-rt tuned profile. This results in NTO trying to save both of the defined profiles into the same filename.
You are right, the include will have to be improved to refer to the proper name. A template variable with the perf profile name can do that.
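The collision and the suggested fix can be sketched as follows. This is a minimal, self-contained illustration rather than the operator's code: `componentName` is a hypothetical stand-in for `components.GetComponentName` and is assumed to return `<prefix>-<profile name>` (which matches the `openshift-node-performance-rt-performance` name reported earlier in this thread), and the `include=` template line and the `RealTimeProfileName` variable are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

const rtPrefix = "openshift-node-performance-rt"

// componentName is a hypothetical stand-in for components.GetComponentName.
// Assumption: the real helper yields "<prefix>-<profileName>".
func componentName(profileName, prefix string) string {
	return fmt.Sprintf("%s-%s", prefix, profileName)
}

// renderInclude fills an illustrative include line with the per-profile RT
// name, so the include refers to the name the Tuned object actually defines.
func renderInclude(rtProfileName string) (string, error) {
	tmpl, err := template.New("include").Parse("include={{.RealTimeProfileName}}")
	if err != nil {
		return "", err
	}
	var out strings.Builder
	data := map[string]string{"RealTimeProfileName": rtProfileName}
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Two PerformanceProfiles in the same cluster (example names).
	for _, perfProfile := range []string{"performance", "performance-worker-cnf"} {
		name := componentName(perfProfile, rtPrefix)
		include, err := renderInclude(name)
		if err != nil {
			panic(err)
		}
		// Each PerformanceProfile gets its own RT tuned profile name, so the
		// two Tuned objects no longer define the same profile.
		fmt.Println(name, "->", include)
	}
}
```

With a hardcoded name, both Tuned objects would define `openshift-node-performance-rt`; deriving the name from the performance profile and passing the same value into the template keeps the definition and the include in sync.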
Fixed that.
bcbe9fb to 9f6360c
MarSik left a comment
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MarSik, rbaturov
/jira refresh
@MarSik: This pull request references Jira Issue OCPBUGS-23167, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.
5d0d263 to aba67d3
hack/render-sync.sh (Outdated)
| rendersync bootstrap-cluster/performance pinned-cluster/default bootstrap/no-mcp | ||
| rendersync bootstrap-cluster/performance pinned-cluster/default bootstrap-cluster/extra-mcp bootstrap/extra-mcp No newline at end of file | ||
| rendersync bootstrap-cluster/performance pinned-cluster/default bootstrap-cluster/extra-mcp bootstrap/extra-mcp | ||
| rendersync base/performance manual-cluster/performance no-ref |
the no-ref set should not include any reference label, we need to parametrize the render to deal with this.
I'm not sure why we are adding the render-sync invocations in this PR btw.
I will issue a different PR that would parametrize the render to deal with this.
/hold depends on this PR to get merged
Kernel parameters that are not supported on RT kernel systems are being applied, causing errors to be logged to the tuned daemon, resulting in a degraded profile state. Therefore, I added the openshift-node-performance-rt profile that would be included if an RT kernel is detected, thereby dropping the unsupported kernel parameters before they are applied.
It would be great to have a generic test that makes sure that, after a performance profile is applied, the tuned profile is not degraded. This would prevent future issues like OCPBUGS-23167.
Updated render-sync to include artifacts that are needed for the e2e tests. Committing the rendered items to this commit.
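The idea behind the first commit message above (pull in the RT profile only when an RT kernel is detected) can be illustrated with a small sketch. In the PR the condition lives in the tuned profile template, not in Go; the code below only models the decision. The function names are hypothetical, and the regular expression is an assumption about how RT kernels show up in `uname -r` (a `.rt<N>` component or a `+rt` suffix), not something taken from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// rtRelease matches kernel release strings that look like RT builds.
// Assumption: RT kernels carry ".rt<N>" or "+rt" in the output of `uname -r`.
var rtRelease = regexp.MustCompile(`\.rt\d+|\+rt`)

// isRealTimeKernel reports whether the release string looks like an RT kernel.
func isRealTimeKernel(release string) bool {
	return rtRelease.MatchString(release)
}

// extraIncludes returns the RT profile name only for RT kernels, so the
// RT-specific overrides are applied only on nodes that need them.
func extraIncludes(release, rtProfile string) []string {
	if isRealTimeKernel(release) {
		return []string{rtProfile}
	}
	return nil
}

func main() {
	releases := []string{
		"5.14.0-427.13.1.el9_4.x86_64",    // example non-RT release string
		"5.14.0-427.13.1.el9_4.x86_64+rt", // example RT release string
	}
	for _, r := range releases {
		fmt.Printf("%s -> %v\n", r, extraIncludes(r, "openshift-node-performance-rt"))
	}
}
```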
aba67d3 to ba51af0
/unhold
@rbaturov: all tests passed!
ffromani left a comment
this seems like a good direction, a few comments inside
| } | ||
| }) | ||
|
|
||
| It("Tuned profile shouldn't be degraded", func() { |
should not be degraded after what? in this test you check a condition (this part LGTM) but there is no obvious link to how to get to this state. IOW, does this test depend on a specific order? which part of the system creates the conditions we're checking?
should not be degraded after tuned profile has been applied to the system.
The object will be degraded if error messages are observed by the tuned daemon when applying the TuneD daemon profile.
@ffromani Any error in the tuned log causes the state to switch to degraded. This fix removes a typical case - a sysctl that does not exist for the given kernel.
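The condition the test checks can be sketched without any cluster access. This is a simplified model, not the test from the PR: the `condition` struct is a hypothetical mirror of a Tuned Profile status condition, and the `"Degraded"` type and `"True"` status strings are assumptions about how the status is reported.

```go
package main

import "fmt"

// condition is a simplified, hypothetical mirror of one status condition
// reported for a Tuned Profile object.
type condition struct {
	Type    string // assumed values: "Applied", "Degraded"
	Status  string // assumed values: "True", "False", "Unknown"
	Message string
}

// degraded reports whether any condition marks the profile as degraded and
// returns the accompanying message so a test failure can explain itself.
func degraded(conds []condition) (bool, string) {
	for _, c := range conds {
		if c.Type == "Degraded" && c.Status == "True" {
			return true, c.Message
		}
	}
	return false, ""
}

func main() {
	conds := []condition{
		{Type: "Applied", Status: "True", Message: "TuneD profile applied."},
		{Type: "Degraded", Status: "True", Message: "TuneD daemon issued one or more error message(s)."},
	}
	if bad, msg := degraded(conds); bad {
		fmt.Println("profile is degraded:", msg)
	}
}
```

A check like this only observes state; as the review comments note, it says nothing about which earlier step applied the profile.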
sure, and what in the test causes any change in the system which could lead to Degraded state?
After reading the surrounding tests (something that could have been pointed out earlier), I see other tests are written in this same debatable way, so this is a more general issue that is out of scope here.
| rendersync bootstrap-cluster/performance pinned-cluster/default bootstrap-cluster/extra-mcp bootstrap/extra-mcp No newline at end of file | ||
| rendersync bootstrap-cluster/performance pinned-cluster/default bootstrap-cluster/extra-mcp bootstrap/extra-mcp | ||
| rendersync --owner-ref none -- base/performance manual-cluster/performance no-ref | ||
| rendersync --owner-ref none -- base/performance manual-cluster/cpuFrequency default/cpuFrequency No newline at end of file |
don't we want owner reference in this case? it should be only no-ref without the reference (hence the name)
This syncs the manifests for this test https://github.com/openshift/cluster-node-tuning-operator/blob/master/test/e2e/performanceprofile/functests-render-command/1_render_command/render_test.go#L115
And it uses no owner references. Probably copy&paste. I believe we should default to having the references most of the time, but I do not think we should fix that as part of this PR.
fair enough
ffromani left a comment
/lgtm
@rbaturov: Jira Issue OCPBUGS-23167: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-23167 has been moved to the MODIFIED state.
/cherry-pick release-4.15
@yanirq: #954 failed to apply on top of branch "release-4.15"
[ART PR BUILD NOTIFIER] This PR has been included in build cluster-node-tuning-operator-container-v4.16.0-202402211939.p0.g19686cd.assembly.stream.el9 for distgit cluster-node-tuning-operator.
Fix included in accepted release 4.16.0-0.nightly-2024-02-22-021321
/cherry-pick release-4.15
@rbaturov: #954 failed to apply on top of branch "release-4.15"
…d template (#984)
* Backport: Add performance real time tuned template (#954)
  * Add performance real time tuned template
    Kernel parameters that are not supported on RT kernel systems are being applied, causing errors to be logged to the tuned daemon, resulting in a degraded profile state. Therefore, I added the openshift-node-performance-rt profile that would be included if an RT kernel is detected, thereby dropping the unsupported kernel parameters before they are applied.
  * Added a test that makes sure the tuned profile is not degraded
    It would be great to have a generic test that makes sure that, after a performance profile is applied, the tuned profile is not degraded. This would prevent future issues like OCPBUGS-23167.
  * render-sync update
    Updated render-sync to include artifacts that are needed for the e2e tests. Committing the rendered items to this commit.
  Signed-off-by: rbaturov <[email protected]>
* NO-JIRA: Update tuned profile degraded test (#1005)
  * Added isVM func to util
    Signed-off-by: Ronny Baturov <[email protected]>
  * Update tuned profile degraded test
    Updated the tuned degradation test to fail only when the tuned profile is found degraded on a BM host. If the tuned profile is found degraded on a VM, only a warning is reported in the logs. The reason is that CI uses VMs, and some configurations, like certain kernel args, can't be applied there and thereby lead to the tuned profile being degraded.
    Signed-off-by: Ronny Baturov <[email protected]>
  ---------
  Signed-off-by: Ronny Baturov <[email protected]>
---------
Signed-off-by: rbaturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
…d template (#984) (#1025)
* Backport: Add performance real time tuned template (#954)
  * Add performance real time tuned template
    Kernel parameters that are not supported on RT kernel systems are being applied, causing errors to be logged to the tuned daemon, resulting in a degraded profile state. Therefore, I added the openshift-node-performance-rt profile that would be included if an RT kernel is detected, thereby dropping the unsupported kernel parameters before they are applied.
  * Added a test that makes sure the tuned profile is not degraded
    It would be great to have a generic test that makes sure that, after a performance profile is applied, the tuned profile is not degraded. This would prevent future issues like OCPBUGS-23167.
  * render-sync update
    Updated render-sync to include artifacts that are needed for the e2e tests. Committing the rendered items to this commit.
* NO-JIRA: Update tuned profile degraded test (#1005)
  * Added isVM func to util
  * Update tuned profile degraded test
    Updated the tuned degradation test to fail only when the tuned profile is found degraded on a BM host. If the tuned profile is found degraded on a VM, only a warning is reported in the logs. The reason is that CI uses VMs, and some configurations, like certain kernel args, can't be applied there and thereby lead to the tuned profile being degraded.
  ---------
---------
Signed-off-by: rbaturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
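The policy described in the "Update tuned profile degraded test" commit message (fail on bare metal, only warn on a VM) can be modelled as a small decision function. This is a sketch under assumptions: the `outcome` type and `evaluate` function are hypothetical names, and how the real `isVM` helper detects virtualization is not shown in this thread, so the VM flag is simply passed in.

```go
package main

import "fmt"

// outcome is a hypothetical result type for the degraded-profile check.
type outcome int

const (
	pass outcome = iota // profile is healthy
	warn                // degraded on a VM: log a warning only
	fail                // degraded on bare metal: fail the test
)

// evaluate applies the policy from the commit message: a degraded profile
// fails the test only on bare metal; on a VM it is downgraded to a warning.
func evaluate(degraded, isVM bool) outcome {
	switch {
	case !degraded:
		return pass
	case isVM:
		return warn
	default:
		return fail
	}
}

func main() {
	fmt.Println(evaluate(false, false) == pass) // healthy bare-metal host
	fmt.Println(evaluate(true, true) == warn)   // degraded VM
	fmt.Println(evaluate(true, false) == fail)  // degraded bare-metal host
}
```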