
Conversation

@sairameshv
Member

- What I did
Implemented changes to the Operator, RenderConfig, and other data structures to support monitoring of the nodes.config.openshift.io custom resource.

- How to verify it
Create an OpenShift cluster with this change, then create a Node custom resource conforming to the CRD; the controller's sync function gets triggered (an example manifest follows below).

- Description for the changelog
Reference EP: https://github.com/openshift/enhancements/blob/master/enhancements/worker-latency-profile/worker-latency-profile.md
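
For reference, a minimal sketch of such a Node custom resource, assuming the nodes.config.openshift.io schema described in the enhancement proposal (the singleton is cluster-scoped and named "cluster"; the exact schema is owned by openshift/api):

```yaml
# Hypothetical example manifest; field names follow the worker-latency-profile
# enhancement proposal. The profiles listed there are Default,
# MediumUpdateAverageReaction, and LowUpdateSlowReaction.
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  workerLatencyProfile: MediumUpdateAverageReaction
```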

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@sairameshv
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2022
@sairameshv sairameshv marked this pull request as draft February 22, 2022 08:29
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 22, 2022
@openshift-ci openshift-ci bot requested review from cgwalters and mkenigs February 22, 2022 08:29
@openshift-ci
Contributor

openshift-ci bot commented Feb 22, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sairameshv
To complete the pull request process, please assign sinnykumari after the PR has been reviewed.
You can assign the PR to them by writing /assign @sinnykumari in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@kikisdeliveryservice
Contributor

As per the comments on two other PRs, #2943 and #2950:

Is there an enhancement for the new controller you are adding via this PR? I was told that the approach would be reworked and it would be removed. cc: @rphillips

@sairameshv
Member Author

> As per the comments on two other PRs, #2943 and #2950:
>
> Is there an enhancement for the new controller you are adding via this PR? I was told that the approach would be reworked and it would be removed. cc: @rphillips

Hello @kikisdeliveryservice, thanks for the review.
The new controller will not be added, and the relevant code will be removed. This PR is experimental for now, hence a draft (I am deploying the cluster with these changes via the cluster-bot).
Once the complete code is ready, I will re-request review.

@sairameshv
Member Author

Hey Kirsten (@kikisdeliveryservice),

I have not been able to deploy a cluster with the above changes using the cluster-bot, and I'm not sure whether these changes are the cause.
If they are, could you help me identify the issue? I suspect I may have missed something here.

Thanks in advance.

@kikisdeliveryservice
Contributor

/test verify

@kikisdeliveryservice
Contributor

I'll take a look.

@sairameshv
Member Author

sairameshv commented Mar 3, 2022

> I'll take a look.

Hey @kikisdeliveryservice,
I created a custom MCO image, built a release, and tried bringing up a cluster manually to test my changes.
As expected, the cluster bringup fails, and the bootstrap log bundle points to the same location I suspected.
The bootstrap process fails with the following logs:

Mar 03 06:20:55 ip-10-0-1-242 bootkube.sh[5244]: Rendering MCO manifests...
Mar 03 06:21:03 ip-10-0-1-242 bootkube.sh[5244]: I0303 06:21:03.463073       1 bootstrap.go:86] Version: machine-config-daemon-4.6.0-202006240615.p0-1277-g88371abc (88371abc7f48164f0ab8d31805cac2eee8bfcba4)
Mar 03 06:21:03 ip-10-0-1-242 bootkube.sh[5244]: F0303 06:21:03.463338       1 bootstrap.go:121] error rendering bootstrap manifests: open /assets/manifests/cluster-node-02-config.yml: no such file or directory
Mar 03 06:21:06 ip-10-0-1-242 systemd[1]: bootkube.service: Main process exited, code=exited, status=255/n/a
Mar 03 06:21:06 ip-10-0-1-242 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Mar 03 06:21:11 ip-10-0-1-242 systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Mar 03 06:21:11 ip-10-0-1-242 systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 2.
Mar 03 06:21:11 ip-10-0-1-242 systemd[1]: Stopped Bootstrap a Kubernetes cluster.

Could you tell me where the file "/assets/manifests/cluster-infrastructure-02-config.yml" is placed (by the CVO or the MCO) so that I can place "assets/manifests/cluster-node-02-config.yml" accordingly?

Thanks,
Ramesh
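
The failure above comes from the bootstrap renderer treating the manifest as mandatory. As a rough illustration (not the actual MCO bootstrap code; `loadNodeConfig` is a hypothetical helper), the difference between a mandatory and an optional manifest is whether a missing file is an error:

```go
package render

import (
	"os"
	"path/filepath"

	configv1 "github.com/openshift/api/config/v1"
	"sigs.k8s.io/yaml"
)

// loadNodeConfig reads the rendered Node manifest from the bootstrap
// assets directory. Propagating the error on a missing file makes the
// manifest mandatory (the behavior seen in the bootkube logs above);
// returning (nil, nil) instead treats it as optional.
func loadNodeConfig(assetsDir string) (*configv1.Node, error) {
	path := filepath.Join(assetsDir, "manifests", "cluster-node-02-config.yml")
	data, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, nil // optional: the installer generated no Node manifest
		}
		return nil, err
	}
	node := &configv1.Node{}
	if err := yaml.Unmarshal(data, node); err != nil {
		return nil, err
	}
	return node, nil
}
```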

Contributor

I'm not sure whether you are trying to make this file mandatory. If you are, you would need to add it to the installer-generated manifests first; I'd cross-check with the installer team to see if they are OK with this path.

Since this is for testing purposes, you can just run openshift-install create manifests, add that file, then openshift-install create cluster to have it render your manually generated manifest (commands below).
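
Under those assumptions, the test flow is three commands (directory name and manifest file name are placeholders):

```console
$ openshift-install create manifests --dir=mycluster
$ cp cluster-node-02-config.yml mycluster/manifests/
$ openshift-install create cluster --dir=mycluster
```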

Member Author

Yes, I'm trying to figure out where I need to make the changes.
openshift/installer#5676 is a related draft PR that I created for testing purposes.
If everything works fine, I will make the proper changes along with the required test cases.
Could you help list all the other changes needed to introduce this file?

@yuqi-zhang
Contributor

I guess there's also the larger picture of this being another resource the MCO manages and syncs. The enhancement at https://github.com/openshift/enhancements/blob/master/enhancements/worker-latency-profile/worker-latency-profile.md says:

> [Machine Config Operator (MCO)](https://github.com/openshift/machine-config-operator) sets the appropriate value of the Kubelet flag --node-status-update-frequency

Would we be able to just make this a flag in the kubeletconfig or something instead of adding a whole new controller path and manifest rendering?

@sairameshv
Member Author

> I guess there's also the larger picture of this being another resource the MCO manages and syncs. The enhancement at https://github.com/openshift/enhancements/blob/master/enhancements/worker-latency-profile/worker-latency-profile.md says:
>
> [Machine Config Operator (MCO)](https://github.com/openshift/machine-config-operator) sets the appropriate value of the Kubelet flag --node-status-update-frequency
>
> Would we be able to just make this a flag in the kubeletconfig or something instead of adding a whole new controller path and manifest rendering?

Yes @yuqi-zhang,
The idea is to manage the Node object, embedded in the ControllerConfig object similarly to the Infrastructure object, and to update the KubeletConfig whenever the Node object changes (roughly as sketched below).
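
A minimal sketch of that shape, with hypothetical field names (the real ControllerConfigSpec lives in the MCO's API package; this only illustrates embedding Node alongside Infra):

```go
package v1

import configv1 "github.com/openshift/api/config/v1"

// ControllerConfigSpec fragment: the cluster's Node config object is
// embedded next to the Infrastructure object, so controllers that watch
// ControllerConfig observe worker-latency-profile changes the same way
// they already observe infrastructure changes.
type ControllerConfigSpec struct {
	// Infra holds the cluster Infrastructure object (already present today).
	Infra *configv1.Infrastructure `json:"infra"`
	// Node would hold the nodes.config.openshift.io object (the addition
	// discussed in this PR).
	Node *configv1.Node `json:"node"`
	// ... remaining fields elided ...
}
```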

@kikisdeliveryservice
Contributor

Adding some of the node team to this PR, as they have experience adding things to the installer/MCO.

cc: @harche @rphillips

added code to modify kubelet config based on the worker latency profile
if cc.Spec.Node != nil {
switch cc.Spec.Node.Spec.WorkerLatencyProfile {
case configv1.MediumUpdateAverageReaction:
originalKubeConfig.NodeStatusUpdateFrequency = metav1.Duration{Duration: 20 * time.Second}
Contributor

@harche commented Mar 7, 2022

Instead of hard-coding it here, create a const variable.

Contributor
These consts should probably be in the API as consts.

Member Author

openshift/api#1136 has been raised for this.

}
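
Taking the two review comments above together, the hard-coded durations would move behind named constants defined in the API. A sketch, with illustrative constant names and a Low-profile value taken from the enhancement proposal (not necessarily what openshift/api#1136 merged):

```go
package kubeletconfig

import (
	"time"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
)

// Hypothetical constants; per the review, these belong in openshift/api.
const (
	// MediumNodeStatusUpdateFrequency matches the 20s hard-coded above.
	MediumNodeStatusUpdateFrequency = 20 * time.Second
	// LowNodeStatusUpdateFrequency assumes the 1m value from the proposal.
	LowNodeStatusUpdateFrequency = 1 * time.Minute
)

// applyWorkerLatencyProfile is a const-based version of the switch above.
func applyWorkerLatencyProfile(profile configv1.WorkerLatencyProfileType, cfg *kubeletconfigv1beta1.KubeletConfiguration) {
	switch profile {
	case configv1.MediumUpdateAverageReaction:
		cfg.NodeStatusUpdateFrequency = metav1.Duration{Duration: MediumNodeStatusUpdateFrequency}
	case configv1.LowUpdateSlowReaction:
		cfg.NodeStatusUpdateFrequency = metav1.Duration{Duration: LowNodeStatusUpdateFrequency}
	}
}
```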

- spec, err := createDiscoveredControllerConfigSpec(infra, network, proxy, dns)
+ spec, err := createDiscoveredControllerConfigSpec(infra, node, network, proxy, dns)
Contributor
It would be nice to wrap all these variables into a structure to make it easier to extend later.

return nil, nil, nil, nil, nil, err
}
- return infra, network, proxy, dns, nil
+ return infra, node, network, proxy, dns, nil
Contributor
Ditto here: perhaps wrap all of this into a structure (see the sketch below).
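
A sketch of the structure suggested in these two comments, with illustrative names: the discovered cluster resources travel as one value, so adding Node (or a future resource) no longer widens every signature and return list:

```go
package operator

import configv1 "github.com/openshift/api/config/v1"

// discoveredConfig bundles the cluster-scoped config objects that the
// render path discovers; createDiscoveredControllerConfigSpec and its
// callers would take *discoveredConfig instead of a growing argument list.
type discoveredConfig struct {
	infra   *configv1.Infrastructure
	node    *configv1.Node
	network *configv1.Network
	proxy   *configv1.Proxy
	dns     *configv1.DNS
}
```

The discovery helper would then return (*discoveredConfig, error), and adding a resource means adding one field rather than touching every caller.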

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2022
@openshift-ci
Contributor

openshift-ci bot commented Mar 24, 2022

@sairameshv: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi | 92ebda3 | link | false | /test e2e-metal-ipi |
| ci/prow/okd-e2e-aws | 92ebda3 | link | false | /test okd-e2e-aws |
| ci/prow/e2e-aws-workers-rhel7 | 92ebda3 | link | false | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-gcp-op-single-node | 92ebda3 | link | false | /test e2e-gcp-op-single-node |
| ci/prow/e2e-vsphere-upgrade | 92ebda3 | link | false | /test e2e-vsphere-upgrade |
| ci/prow/e2e-ovn-step-registry | 92ebda3 | link | false | /test e2e-ovn-step-registry |
| ci/prow/e2e-aws-workers-rhel8 | 92ebda3 | link | false | /test e2e-aws-workers-rhel8 |
| ci/prow/e2e-aws-upgrade-single-node | 92ebda3 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-aws-disruptive | 92ebda3 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-serial | 92ebda3 | link | false | /test e2e-aws-serial |
| ci/prow/e2e-aws-single-node | 92ebda3 | link | false | /test e2e-aws-single-node |
| ci/prow/4.12-upgrade-from-stable-4.11-images | 62c41de | link | true | /test 4.12-upgrade-from-stable-4.11-images |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sairameshv sairameshv marked this pull request as ready for review March 31, 2022 13:49
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2022
@sairameshv
Member Author

/close

Closing this PR, as #3015 serves the same purpose.

@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2022

@sairameshv: Closed this PR.


In response to this:

> /close
>
> Closing this PR, as #3015 serves the same purpose.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
