
Conversation

@sairameshv
Member

- What I did
Implemented changes to the Operator, RenderConfig, and other data structures to support monitoring of the nodes.config.openshift.io custom resource.

- How to verify it
Create an OpenShift cluster with this change, then create a Node custom resource conforming to the CRD; the controller's sync function gets triggered (an example manifest follows below).

- Description for the changelog
Reference EP: https://github.com/openshift/enhancements/blob/master/enhancements/worker-latency-profile/worker-latency-profile.md
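
For reference, a minimal sketch of such a Node custom resource, assuming the nodes.config.openshift.io schema described in the enhancement proposal (the singleton is cluster-scoped and named "cluster"; the exact schema is owned by openshift/api):

```yaml
# Hypothetical example manifest; field names follow the worker-latency-profile
# enhancement proposal. The profiles listed there are Default,
# MediumUpdateAverageReaction, and LowUpdateSlowReaction.
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  workerLatencyProfile: MediumUpdateAverageReaction
```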

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@sairameshv
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2022
@sairameshv sairameshv marked this pull request as draft February 22, 2022 08:29
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 22, 2022
@openshift-ci openshift-ci bot requested review from cgwalters and mkenigs February 22, 2022 08:29
@openshift-ci
Contributor

openshift-ci bot commented Feb 22, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sairameshv
To complete the pull request process, please assign sinnykumari after the PR has been reviewed.
You can assign the PR to them by writing /assign @sinnykumari in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@kikisdeliveryservice
Contributor

As per the comments on two other PRs, #2943 and #2950:

Is there an enhancement for the new controller you are adding via this PR? I was told that the approach would be reworked and it would be removed. cc: @rphillips

@sairameshv
Member Author

> As per the comments on two other PRs, #2943 and #2950:
>
> Is there an enhancement for the new controller you are adding via this PR? I was told that the approach would be reworked and it would be removed. cc: @rphillips

Hello @kikisdeliveryservice, thanks for the review.
The new controller will not be added, and the relevant code will be removed. This PR is experimental for now, hence a draft (I am deploying the cluster with these changes via the cluster-bot).
Once the complete code is ready, I will re-request review.

@sairameshv
Member Author

Hey Kirsten (@kikisdeliveryservice),

I have not been able to deploy a cluster with the above changes using the cluster-bot, and I'm not sure whether these changes are the cause.
If they are, could you help me identify the issue? I suspect I may have missed something here.

Thanks in advance.

@kikisdeliveryservice
Contributor

/test verify

@kikisdeliveryservice
Contributor

I'll take a look.

@sairameshv
Member Author

sairameshv commented Mar 3, 2022

> I'll take a look.

Hey @kikisdeliveryservice,
I created a custom MCO image, built a release, and tried bringing up a cluster manually to test my changes.
As expected, the cluster bringup fails, and the bootstrap log bundle points to the same location I suspected.
The bootstrap process fails with the following logs:

Mar 03 06:20:55 ip-10-0-1-242 bootkube.sh[5244]: Rendering MCO manifests...
Mar 03 06:21:03 ip-10-0-1-242 bootkube.sh[5244]: I0303 06:21:03.463073       1 bootstrap.go:86] Version: machine-config-daemon-4.6.0-202006240615.p0-1277-g88371abc (88371abc7f48164f0ab8d31805cac2eee8bfcba4)
Mar 03 06:21:03 ip-10-0-1-242 bootkube.sh[5244]: F0303 06:21:03.463338       1 bootstrap.go:121] error rendering bootstrap manifests: open /assets/manifests/cluster-node-02-config.yml: no such file or directory
Mar 03 06:21:06 ip-10-0-1-242 systemd[1]: bootkube.service: Main process exited, code=exited, status=255/n/a
Mar 03 06:21:06 ip-10-0-1-242 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Mar 03 06:21:11 ip-10-0-1-242 systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Mar 03 06:21:11 ip-10-0-1-242 systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 2.
Mar 03 06:21:11 ip-10-0-1-242 systemd[1]: Stopped Bootstrap a Kubernetes cluster.

Could you tell me where the file "/assets/manifests/cluster-infrastructure-02-config.yml" is placed (by the CVO or the MCO) so that I can place "assets/manifests/cluster-node-02-config.yml" accordingly?

Thanks,
Ramesh
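
The failure above comes from the bootstrap renderer treating the manifest as mandatory. As a rough illustration (not the actual MCO bootstrap code; `loadNodeConfig` is a hypothetical helper), the difference between a mandatory and an optional manifest is whether a missing file is an error:

```go
package render

import (
	"os"
	"path/filepath"

	configv1 "github.com/openshift/api/config/v1"
	"sigs.k8s.io/yaml"
)

// loadNodeConfig reads the rendered Node manifest from the bootstrap
// assets directory. Propagating the error on a missing file makes the
// manifest mandatory (the behavior seen in the bootkube logs above);
// returning (nil, nil) instead treats it as optional.
func loadNodeConfig(assetsDir string) (*configv1.Node, error) {
	path := filepath.Join(assetsDir, "manifests", "cluster-node-02-config.yml")
	data, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, nil // optional: the installer generated no Node manifest
		}
		return nil, err
	}
	node := &configv1.Node{}
	if err := yaml.Unmarshal(data, node); err != nil {
		return nil, err
	}
	return node, nil
}
```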

Contributor

I'm not sure whether you are trying to make this file mandatory. If you are, you would need to add it to the installer-generated manifests first; I'd cross-check with the installer team to see if they are OK with this path.

Since this is for testing purposes, you can just run openshift-install create manifests, add that file, then openshift-install create cluster to have it render your manually generated manifest (commands below).
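
Under those assumptions, the test flow is three commands (directory name and manifest file name are placeholders):

```console
$ openshift-install create manifests --dir=mycluster
$ cp cluster-node-02-config.yml mycluster/manifests/
$ openshift-install create cluster --dir=mycluster
```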

Member Author

Yes, I'm trying to figure out where I need to make the changes.
openshift/installer#5676 is a related draft PR that I created for testing purposes.
If everything works fine, I will make the proper changes along with the required test cases.
Could you help list all the other changes needed to introduce this file?

@yuqi-zhang
Contributor

I guess there's also the larger picture of this being another resource the MCO manages and syncs. The enhancement at https://github.com/openshift/enhancements/blob/master/enhancements/worker-latency-profile/worker-latency-profile.md says:

> [Machine Config Operator (MCO)](https://github.com/openshift/machine-config-operator) sets the appropriate value of the Kubelet flag --node-status-update-frequency

Would we be able to just make this a flag in the kubeletconfig or something instead of adding a whole new controller path and manifest rendering?

@sairameshv
Member Author

> I guess there's also the larger picture of this being another resource the MCO manages and syncs. The enhancement at https://github.com/openshift/enhancements/blob/master/enhancements/worker-latency-profile/worker-latency-profile.md says:
>
> [Machine Config Operator (MCO)](https://github.com/openshift/machine-config-operator) sets the appropriate value of the Kubelet flag --node-status-update-frequency
>
> Would we be able to just make this a flag in the kubeletconfig or something instead of adding a whole new controller path and manifest rendering?

Yes @yuqi-zhang,
The idea is to manage the Node object, embedded in the ControllerConfig object similarly to the Infrastructure object, and to update the KubeletConfig whenever the Node object changes (roughly as sketched below).
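
A minimal sketch of that shape, with hypothetical field names (the real ControllerConfigSpec lives in the MCO's API package; this only illustrates embedding Node alongside Infra):

```go
package v1

import configv1 "github.com/openshift/api/config/v1"

// ControllerConfigSpec fragment: the cluster's Node config object is
// embedded next to the Infrastructure object, so controllers that watch
// ControllerConfig observe worker-latency-profile changes the same way
// they already observe infrastructure changes.
type ControllerConfigSpec struct {
	// Infra holds the cluster Infrastructure object (already present today).
	Infra *configv1.Infrastructure `json:"infra"`
	// Node would hold the nodes.config.openshift.io object (the addition
	// discussed in this PR).
	Node *configv1.Node `json:"node"`
	// ... remaining fields elided ...
}
```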

@kikisdeliveryservice
Contributor

Adding some of the node team to this PR, as they have experience adding things to the installer/MCO.

cc: @harche @rphillips

added code to modify kubelet config based on the worker latency profile
if cc.Spec.Node != nil {
switch cc.Spec.Node.Spec.WorkerLatencyProfile {
case configv1.MediumUpdateAverageReaction:
originalKubeConfig.NodeStatusUpdateFrequency = metav1.Duration{Duration: 20 * time.Second}
Contributor

@harche commented Mar 7, 2022

Instead of hard-coding it here, create a const variable.

Contributor
These consts should probably be in the API as consts.

Member Author

openshift/api#1136 has been raised for this.

}
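
Taking the two review comments above together, the hard-coded durations would move behind named constants defined in the API. A sketch, with illustrative constant names and a Low-profile value taken from the enhancement proposal (not necessarily what openshift/api#1136 merged):

```go
package kubeletconfig

import (
	"time"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
)

// Hypothetical constants; per the review, these belong in openshift/api.
const (
	// MediumNodeStatusUpdateFrequency matches the 20s hard-coded above.
	MediumNodeStatusUpdateFrequency = 20 * time.Second
	// LowNodeStatusUpdateFrequency assumes the 1m value from the proposal.
	LowNodeStatusUpdateFrequency = 1 * time.Minute
)

// applyWorkerLatencyProfile is a const-based version of the switch above.
func applyWorkerLatencyProfile(profile configv1.WorkerLatencyProfileType, cfg *kubeletconfigv1beta1.KubeletConfiguration) {
	switch profile {
	case configv1.MediumUpdateAverageReaction:
		cfg.NodeStatusUpdateFrequency = metav1.Duration{Duration: MediumNodeStatusUpdateFrequency}
	case configv1.LowUpdateSlowReaction:
		cfg.NodeStatusUpdateFrequency = metav1.Duration{Duration: LowNodeStatusUpdateFrequency}
	}
}
```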

- spec, err := createDiscoveredControllerConfigSpec(infra, network, proxy, dns)
+ spec, err := createDiscoveredControllerConfigSpec(infra, node, network, proxy, dns)
Contributor
It would be nice to wrap all these variables into a structure to make it easier to extend later.

return nil, nil, nil, nil, nil, err
}
- return infra, network, proxy, dns, nil
+ return infra, node, network, proxy, dns, nil
Contributor
Ditto here: perhaps wrap all of this into a structure (see the sketch below).
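
A sketch of the structure suggested in these two comments, with illustrative names: the discovered cluster resources travel as one value, so adding Node (or a future resource) no longer widens every signature and return list:

```go
package operator

import configv1 "github.com/openshift/api/config/v1"

// discoveredConfig bundles the cluster-scoped config objects that the
// render path discovers; createDiscoveredControllerConfigSpec and its
// callers would take *discoveredConfig instead of a growing argument list.
type discoveredConfig struct {
	infra   *configv1.Infrastructure
	node    *configv1.Node
	network *configv1.Network
	proxy   *configv1.Proxy
	dns     *configv1.DNS
}
```

The discovery helper would then return (*discoveredConfig, error), and adding a resource means adding one field rather than touching every caller.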

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2022
@openshift-ci
Contributor

openshift-ci bot commented Mar 24, 2022

@sairameshv: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi | 92ebda3 | link | false | /test e2e-metal-ipi |
| ci/prow/okd-e2e-aws | 92ebda3 | link | false | /test okd-e2e-aws |
| ci/prow/e2e-aws-workers-rhel7 | 92ebda3 | link | false | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-gcp-op-single-node | 92ebda3 | link | false | /test e2e-gcp-op-single-node |
| ci/prow/e2e-vsphere-upgrade | 92ebda3 | link | false | /test e2e-vsphere-upgrade |
| ci/prow/e2e-ovn-step-registry | 92ebda3 | link | false | /test e2e-ovn-step-registry |
| ci/prow/e2e-aws-workers-rhel8 | 92ebda3 | link | false | /test e2e-aws-workers-rhel8 |
| ci/prow/e2e-aws-upgrade-single-node | 92ebda3 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-aws-disruptive | 92ebda3 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-serial | 92ebda3 | link | false | /test e2e-aws-serial |
| ci/prow/e2e-aws-single-node | 92ebda3 | link | false | /test e2e-aws-single-node |
| ci/prow/4.12-upgrade-from-stable-4.11-images | 62c41de | link | true | /test 4.12-upgrade-from-stable-4.11-images |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sairameshv sairameshv marked this pull request as ready for review March 31, 2022 13:49
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2022
@sairameshv
Member Author

/close

Closing this PR, as #3015 serves the same purpose.

@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2022

@sairameshv: Closed this PR.


In response to this:

> /close
>
> Closing this PR, as #3015 serves the same purpose.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
