Skip to content

Conversation

@dagrayvid
Copy link
Contributor

This PR enables NTO to generate MachineConfigs for setting kernel boot parameters in HyperShift environments. NTO will embed the MachineConfig manifests in ConfigMaps with labels specifying that it is NTO-generated and what NodePool it should be applied to. The ConfigMaps will be created in the hosted control plane namespace, and will need to be picked up by the NodePool controller in order to be applied (will add link to HyperShift PR once it is open).

This is "phase-2" of the design outlined in the following enhancement: openshift/enhancements#1229

/cc @jmencak

@openshift-ci openshift-ci bot requested a review from jmencak September 2, 2022 15:43
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dagrayvid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 2, 2022
@dagrayvid
Copy link
Contributor Author

I still have more manual testing and cleanup to do. Opening the PR now for some early review.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 2, 2022
Copy link
Contributor

@jmencak jmencak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at this PR and also tested. Unfortunatly, cannot make this work today:

E0906 15:34:09.666708       1 controller.go:200] unable to sync(profile/openshift-cluster-node-tuning-operator/ip-10-0-135-78.eu-central-1.compute.internal) reached max retries (15): failed to sync Profile ip-10-0-135-78.eu-central-1.compute.internal: failed to update Profile ip-10-0-135-78.eu-central-1.compute.internal: failed to create ConfigMap nto-mc- for MachineConfig 50-nto-: ConfigMap "nto-mc-" is invalid: metadata.name: Invalid value: "nto-mc-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

will continue with the review later on.


operatorGeneratedMachineConfig = "hypershift.openshift.io/operator-machine-config"
mcConfigMapDataKey = "config"
mcConfigMapLabelKey = "tuned.openshift.io/machineConfig"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use something like "tuned.openshift.io/machine-config" vs "tuned.openshift.io/machineConfig". While camelCase can look cool to some and saves some characters, I'm not sure how much it is used elsewhere as label keys. It would be nice to have consistency, but we already don't have it with labels like nodePool and nodePoolName. :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jiri, at least for the tuned.openshfit.io label keys, we can try to be consistent with machine-config.

I may not need this specific label key at all, depending on whether the HyperShift team wants these ConfigMaps labeled with an NTO-specific label, or if they expect this functionality to be used by other operators, in which case we will probably only use hypershift.openshift.io/operator-machine-config.

@dagrayvid dagrayvid force-pushed the hypershift-machineconfigs branch 5 times, most recently from 2a09ebe to d474c14 Compare September 8, 2022 14:13
Copy link
Contributor

@jmencak jmencak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting the review again. There are a few nits to address and unnecessary syncing. What worries me most at the moment is the issue pointed out here . I believe it was because I used .spec.recommend.match for 1 node in two-node hosted cluster. While this is documented, I think this needs to be catched and logged earlier.

@dagrayvid dagrayvid force-pushed the hypershift-machineconfigs branch from d474c14 to 8b30ee4 Compare September 8, 2022 17:13
@dagrayvid
Copy link
Contributor Author

E0906 15:34:09.666708       1 controller.go:200] unable to sync(profile/openshift-cluster-node-tuning-operator/ip-10-0-135-78.eu-central-1.compute.internal) reached max retries (15): failed to sync Profile ip-10-0-135-78.eu-central-1.compute.internal: failed to update Profile ip-10-0-135-78.eu-central-1.compute.internal: failed to create ConfigMap nto-mc- for MachineConfig 50-nto-: ConfigMap "nto-mc-" is invalid: metadata.name: Invalid value: "nto-mc-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

I was able to reproduce this issue, and found a fix. We shouldn't be syncing MachineConfigs if node label based matching is used, since the nodePoolName will be "".

Copy link
Contributor

@jmencak jmencak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes, David! Only few last nits now.

// Start with node/pod label based matching
if recommend.Match != nil && pc.profileMatches(recommend.Match, nodeName) {
klog.V(2).Infof("calculateProfileHyperShift: node / pod label matching used. node: %s, tunedProfileName: %s, nodePoolName: %s, operand: %v", nodeName, *recommend.Profile, "", recommend.Operand)
klog.V(3).Infof("calculateProfileHyperShift: node / pod label matching used for node: %s, tunedProfileName: %s, nodePoolName: %s, operand: %v", nodeName, *recommend.Profile, "", recommend.Operand)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the logging here and below, it is useful and the verbosity is just right. Thank you. I should add something like that to calculateProfile().

@dagrayvid dagrayvid force-pushed the hypershift-machineconfigs branch from 8b30ee4 to c374808 Compare September 9, 2022 14:09
@jmencak
Copy link
Contributor

jmencak commented Sep 9, 2022

Thank you for your changes, David. I've also tested this code.
/lgtm
Please unhold once you're happy with your testing.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2022
- Fix typo from hypershift changes for mcpInformer
- Add informer for NTO-generated MachineConfig ConfigMaps
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 10, 2022

@dagrayvid: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jmencak
Copy link
Contributor

jmencak commented Sep 12, 2022

Thank you David, the fixes make sense to me. Still
/lgtm
but leaving the hold for more testing.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 12, 2022
@dagrayvid
Copy link
Contributor Author

/unhold

I have completed my planned manual testing.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2022
@openshift-merge-robot openshift-merge-robot merged commit ad3968e into openshift:master Sep 12, 2022
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request May 23, 2023
…hift#456)

* Fix logs and verbosity during HyperShift profile calculation

* HyperShift: Embed NTO-generated MachineConfigs in ConfigMaps

* Fix bugs in MachineConfig related informers

- Fix typo from hypershift changes for mcpInformer
- Add informer for NTO-generated MachineConfig ConfigMaps
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request Jun 13, 2023
…hift#456)

* Fix logs and verbosity during HyperShift profile calculation

* HyperShift: Embed NTO-generated MachineConfigs in ConfigMaps

* Fix bugs in MachineConfig related informers

- Fix typo from hypershift changes for mcpInformer
- Add informer for NTO-generated MachineConfig ConfigMaps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants