Skip to content

Vendor and run performance-addon-operator as part of the controller-runtime NTO manager#314

Closed
cynepco3hahue wants to merge 3 commits intoopenshift:masterfrom
cynepco3hahue:pao_integration
Closed

Vendor and run performance-addon-operator as part of the controller-runtime NTO manager#314
cynepco3hahue wants to merge 3 commits intoopenshift:masterfrom
cynepco3hahue:pao_integration

Conversation

@cynepco3hahue
Copy link

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 6, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cynepco3hahue
To complete the pull request process, please assign kpouget after the PR has been reviewed.
You can assign the PR to them by writing /assign @kpouget in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jmencak
Copy link
Contributor

jmencak commented Feb 8, 2022

/cc @jmencak

@jmencak
Copy link
Contributor

jmencak commented Feb 8, 2022

/uncc @ArangoGutierrez

@openshift-ci openshift-ci bot requested review from jmencak and removed request for ArangoGutierrez February 8, 2022 13:29
@cynepco3hahue cynepco3hahue force-pushed the pao_integration branch 2 times, most recently from dee875d to de53168 Compare February 13, 2022 15:22
@cynepco3hahue
Copy link
Author

/retest

1 similar comment
@cynepco3hahue
Copy link
Author

/retest

@yanirq
Copy link
Contributor

yanirq commented Feb 14, 2022

@dagrayvid we can start testing the performance of this combination of PAO and NTO with this PR (based on #316)
The manifests in this folder should be applied manually (including the new additions).

@@ -0,0 +1,55 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably increase the prefix number for this and the following files by 10. Or perhaps use 45-webhook-configuration.yaml if changing the name of the other manifests could cause problems?

Copy link
Author

@cynepco3hahue cynepco3hahue Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yee, I wanted to avoid the renaming, 45 will work for me

@dagrayvid
Copy link
Contributor

This PR performs very similarly to the current implementation of NTO. As we expected, the addition of the PAO control loop has no impact when there are no PerformanceProfiles to reconcile.

  1. Fully idle:
Old implementation:                 pr-314:
2022-04-02 17:39:25: 101 18         2022-14-02 11:27:42: 45 9
2022-04-02 17:40:25: 103 19         2022-14-02 11:28:42: 46 10
2022-04-02 17:41:25: 105 20         2022-14-02 11:29:42: 48 10
2022-04-02 17:42:25: 105 20         2022-14-02 11:30:42: 49 12
delta:               4   2                               4  3
  1. Creating pods with no label matching used.
Old implementation:                 pr-314:
2022-04-02 17:46:33: 112 22         2022-14-02 11:32:54: 52 13
2022-04-02 17:47:33: 113 22         2022-14-02 11:33:54: 53 13
2022-04-02 17:48:33: 115 23         2022-14-02 11:34:54: 54 14
2022-04-02 17:49:33: 117 23         2022-14-02 11:35:54: 56 14
delta:               5   1                               4 1
  1. Creating pods with a profile that matches to the pods (3 master 3 worker cluster)
Old implementation:                 pr-314:
2022-04-02 17:53:18: 169  33        2022-14-02 11:43:02: 130 28
2022-04-02 17:54:18: 539  56        2022-14-02 11:44:02: 501 51
2022-04-02 17:55:18: 969  72        2022-14-02 11:45:02: 981 71
2022-04-02 17:56:18: 1488 86        2022-14-02 11:46:02: 1547 86
delta:               1319 53                             1417 58

With this, I am comfortable merging #316. @jmencak, WDYT?

@jmencak
Copy link
Contributor

jmencak commented Feb 14, 2022

With this, I am comfortable merging #316. @jmencak, WDYT?

This looks good. Do we have any data wrt. memory usage?

@dagrayvid
Copy link
Contributor

dagrayvid commented Feb 14, 2022

This looks good. Do we have any data wrt. memory usage?

I haven't looked at memory usage yet. I can probably gather some data on this today if needed. Is memory usage as much a concern as CPU usage for core OCP operators?

@jmencak
Copy link
Contributor

jmencak commented Feb 14, 2022

This looks good. Do we have any data wrt. memory usage?

I haven't looked at memory usage yet. I can probably gather some data on this today if needed. Is memory usage as much a concern as CPU usage for core OCP operators?

Not as much as CPU usage, but it would be nice to know.

@dagrayvid
Copy link
Contributor

I ran the first and third test cases to get an idea of the memory usage before / after these changes.

It's probably easiest to just share it with a graph of memory usage over time. There is a line for fully idle, and then a line for memory over time while creating 1000 pods using pod-label matching, and then deleting them to see if the usage goes back down to where it was before creating any pods.

Screen Shot 2022-02-14 at 6 16 48 PM

One note: In case you wonder why the idle memory usage is lower than the memory usage at the start of the pod creation case. I created the custom profile before starting the second test case, without creating any matching pods. This jump was seen in both the new and old implementations (corrected from previous comment)

@cynepco3hahue
Copy link
Author

/retest

1 similar comment
@cynepco3hahue
Copy link
Author

/retest

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Copy link
Contributor

@jmencak jmencak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Artyom! Untested (yet), but the changes look good to me. Only nits right now.

Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("performance-profile-controller"),
}).SetupWithManager(mgr); err != nil {
klog.Exitf("unable to create PerformanceProfile controller : %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency. Not sure about PAO, but in NTO, we don't have spaces after text in text: %v. Can we keep it that way? Also, I notice you don't have a space in a similar message above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, done :)

webHookServer.KeyName = webhookKeyName

if err = (&performancev2.PerformanceProfile{}).SetupWebhookWithManager(mgr); err != nil {
klog.Exitf("unable to create PerformanceProfile v2 webhook : %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, as noted above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["create","get","update","patch"]
- apiGroups: ["node.k8s.io"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past, I was asked by ProdSec to review RBAC. After the review, I added a few comments here. Can we also do this for PAO? Especially here for runtimeclasses. performance.openshift.io should be clear to all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

controller, err := operator.NewController()
if err != nil {
klog.Fatal(err)
klog.Fatalf("failed to create new controller %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency. s/controller %v/controller: %v/ ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

klog.Fatalf("failed to create new controller %v", err)
}

if err := mgr.Add(controller); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this, we missed this in the earlier review. :/

@yanirq
Copy link
Contributor

yanirq commented Feb 24, 2022

@cynepco3hahue in the current vendoring PAO still uses openshift-performance-addon-operator namespace:

https://github.com/openshift-kni/performance-addon-operators/blob/c61353605e60f48ac75432a37489d278d32eb4ce/controllers/performanceprofile_controller.go#L227

If we will keep it that way we also need to add the namespace to the manifests.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
@cynepco3hahue
Copy link
Author

@cynepco3hahue in the current vendoring PAO still uses openshift-performance-addon-operator namespace:

https://github.com/openshift-kni/performance-addon-operators/blob/c61353605e60f48ac75432a37489d278d32eb4ce/controllers/performanceprofile_controller.go#L227

If we will keep it that way we also need to add the namespace to the manifests.

It should not affect us, it is used only to generate correct manifests automatically. We are adding manifests manually so all kube builder thing is not relevant for us.

@cynepco3hahue
Copy link
Author

/hold waiting for upgrade handling under the PAO.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 28, 2022
@cynepco3hahue cynepco3hahue changed the title WIP: vendor and run performance-addon-operator as part of the controller-runtime NTO manager Vendor and run performance-addon-operator as part of the controller-runtime NTO manager Feb 28, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 28, 2022

@cynepco3hahue: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-pao d4fd838 link true /test e2e-gcp-pao

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 28, 2022

@cynepco3hahue: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cynepco3hahue
Copy link
Author

not relevant anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments