Enable NodePool controller to apply generated MachineConfigs #1729
Changes from all commits: 4b089ad, f98df3d, 3943a4e, bff593a
@@ -1,6 +1,7 @@

# Manage node-level tuning with the Node Tuning Operator

If you would like to set some node-level tuning on the nodes in your hosted cluster, you can use the [Node Tuning Operator](https://docs.openshift.com/container-platform/latest/scalability_and_performance/using-node-tuning-operator.html). In HyperShift, node tuning can be configured by creating ConfigMaps which contain Tuned objects, and referencing these ConfigMaps in your NodePools.

## Creating a simple TuneD profile for setting sysctl settings

1. Create a ConfigMap which contains a valid Tuned manifest and reference it in a NodePool. The example Tuned manifest below defines a profile which sets `vm.dirty_ratio` to 55 on Nodes that have the Node label `tuned-1-node-label` with any value.
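The Tuned manifest itself is unchanged context that is not shown in this diff. As a rough sketch only (the names `tuned-1`, `tuned-1-profile` and the `clusters` namespace are assumptions, not taken from this change), such a ConfigMap could look like:

```
# Illustrative sketch only; names below are assumed, not from this change
apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-1
  namespace: clusters
data:
  tuned: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: tuned-1
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Custom OpenShift profile
          include=openshift-node
          [sysctl]
          vm.dirty_ratio="55"
        name: tuned-1-profile
      recommend:
      - priority: 20
        profile: tuned-1-profile
        match:
        - label: tuned-1-node-label
```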
@@ -84,7 +85,7 @@ If you would like to set some node-level tuning on the nodes in your hosted clus

```
nodepool-1-worker-2   tuned-1-profile   True      False      7m14s
```

As we can see, both worker nodes in the NodePool have the tuned-1-profile applied. Note that if no custom profiles are created, the `openshift-node` profile will be applied by default.

3. To confirm the tuning was applied correctly, we can start a debug shell on a Node and check the sysctl values:
@@ -95,4 +96,126 @@ If you would like to set some node-level tuning on the nodes in your hosted clus

Example output:
```
vm.dirty_ratio = 55
```
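The debug command that produces this output is unchanged context not shown in this diff. A possible invocation, assuming the node name `nodepool-1-worker-1` used elsewhere in this example:

```
# Hypothetical example: query the sysctl value directly on one of the tuned nodes
oc --kubeconfig="$HC_KUBECONFIG" debug node/nodepool-1-worker-1 -- chroot /host sysctl vm.dirty_ratio
```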
## Applying tuning which requires kernel boot parameters

You can also use the Node Tuning Operator for more complex tuning which requires setting kernel boot parameters. As an example, the following steps can be followed to create a NodePool with huge pages reserved.
1. Create the following ConfigMap, which contains a Tuned object manifest for creating 50 huge pages of size 2M.
Save this ConfigMap manifest in a file called `tuned-hugepages.yaml`:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-hugepages
  namespace: clusters
data:
  tuned: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: hugepages
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Boot time configuration for hugepages
          include=openshift-node
          [bootloader]
          cmdline_openshift_node_hugepages=hugepagesz=2M hugepages=50
        name: openshift-node-hugepages
      recommend:
      - priority: 20
        profile: openshift-node-hugepages
```
> **_NOTE:_** The `.spec.recommend.match` field is intentionally left blank. In this case this Tuned will be applied to all Nodes in the NodePool where this ConfigMap is referenced. It is advised to group Nodes with the same hardware configuration into the same NodePool. Not following this practice might result in TuneD operands calculating conflicting kernel parameters for two or more nodes sharing the same NodePool.

Create the ConfigMap in the management cluster:
```
oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-hugepages.yaml
```
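This quick check is not part of the original steps, but confirming the ConfigMap exists before referencing it from a NodePool can save a debugging round trip (the `clusters` namespace and kubeconfig variable are those used above):

```
# Hypothetical example: verify the ConfigMap was created in the management cluster
oc --kubeconfig="$MGMT_KUBECONFIG" get configmap tuned-hugepages -n clusters
```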
2. Create a new NodePool manifest YAML file, customize the NodePool's upgrade type, and reference the previously created ConfigMap in the `spec.tunedConfig` section before creating it in the management cluster.
Create the NodePool manifest and save it in a file called `hugepages-nodepool.yaml`:
```
NODEPOOL_NAME=hugepages-nodepool
INSTANCE_TYPE=m5.2xlarge
NODEPOOL_REPLICAS=2

hypershift create nodepool aws \
  --cluster-name $CLUSTER_NAME \
  --name $NODEPOOL_NAME \
  --node-count $NODEPOOL_REPLICAS \
  --instance-type $INSTANCE_TYPE \
  --render > hugepages-nodepool.yaml
```
Edit `hugepages-nodepool.yaml`. Set `.spec.management.upgradeType` to `InPlace`, and set `.spec.tunedConfig` to reference the `tuned-hugepages` ConfigMap you created.
```
apiVersion: hypershift.openshift.io/v1alpha1
kind: NodePool
metadata:
  name: hugepages-nodepool
  namespace: clusters
...
spec:
  management:
    ...
    upgradeType: InPlace
  ...
  tunedConfig:
  - name: tuned-hugepages
```
> **_NOTE:_** Setting `.spec.management.upgradeType` to `InPlace` is recommended to avoid unnecessary Node recreations when applying the new MachineConfigs. With the `Replace` upgrade type, Nodes will be fully deleted and new nodes will replace them when applying the new kernel boot parameters that are calculated by the TuneD operand.
Create the NodePool in the management cluster:
```
oc --kubeconfig="$MGMT_KUBECONFIG" create -f hugepages-nodepool.yaml
```
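The original steps do not include a check at this point, but one way to watch the new NodePool and its Nodes come up (assuming the `clusters` namespace and the kubeconfig variables used above) is:

```
# Hypothetical example: watch NodePool status from the management cluster
oc --kubeconfig="$MGMT_KUBECONFIG" get nodepools -n clusters

# ...and the Nodes from the hosted cluster
oc --kubeconfig="$HC_KUBECONFIG" get nodes
```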
3. After the Nodes become available, the containerized TuneD daemon will calculate the required kernel boot parameters based on the applied TuneD profile. After the Nodes become `Ready` and reboot once to apply the generated MachineConfig, you can verify that the Tuned profile is applied and that the kernel boot parameters have been set.

List the Tuned objects in the hosted cluster:
```
oc --kubeconfig="$HC_KUBECONFIG" get Tuneds -n openshift-cluster-node-tuning-operator
```

Example output:
```
NAME                 AGE
default              123m
hugepages-8dfb1fed   1m23s
rendered             123m
```
List the Profiles in the hosted cluster:
```
oc --kubeconfig="$HC_KUBECONFIG" get Profiles -n openshift-cluster-node-tuning-operator
```

Example output:
```
NAME                          TUNED                      APPLIED   DEGRADED   AGE
nodepool-1-worker-1           openshift-node             True      False      132m
nodepool-1-worker-2           openshift-node             True      False      131m
hugepages-nodepool-worker-1   openshift-node-hugepages   True      False      4m8s
hugepages-nodepool-worker-2   openshift-node-hugepages   True      False      3m57s
```

**Member:** What happens if someone modifies these profiles?

**Contributor (Author):** @enxebre I'm not sure I know what you are asking. Are you wondering what would happen if someone modified the Profile objects from the hosted cluster side?

**Member:** @dagrayvid Yes, is it possible to change something on the guest cluster side which the NTO watches and reconciles against management-side config, and so trigger an upgrade?

**Contributor (Author):** In theory, yes. This was discussed in some of the earlier design discussions about enabling NTO on HyperShift. As in standalone OCP, the NTO Operand (the containerized TuneD daemon) writes the kernel boot parameters calculated by TuneD, based on the applied profile, to the Profile object's `status.bootcmdline` field. This field is read by the Operator before creating or updating the NTO-generated MachineConfig. If the Profile object were edited by someone with admin privileges on the guest cluster, the Operator and Operand would reconcile simultaneously. The Operand would reconcile the Profile, overwriting any change in the status. The Operator would also reconcile the Profile, potentially updating the NTO-generated MachineConfig based on the changed `status.bootcmdline`, in a race with the Operand. If the Operator "loses" the race, then after the Operand overwrites any admin-user changes to the Profile, the Operator will reconcile the Profile again, syncing the MachineConfig. When we discussed this earlier on, the answer was that this should be "okay", as admin users of the hosted cluster already have root access to the nodes (i.e. `oc debug`).
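As a hedged illustration of the mechanism discussed above (the field name `status.bootcmdline` is taken from the comment itself; the Profile name is assumed from the example output), the value the TuneD operand calculated could be inspected with:

```
# Hypothetical example: print the boot cmdline the TuneD operand wrote for one Profile
oc --kubeconfig="$HC_KUBECONFIG" get profile hugepages-nodepool-worker-1 \
  -n openshift-cluster-node-tuning-operator -o jsonpath='{.status.bootcmdline}'
```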
Both worker nodes in the new NodePool have the `openshift-node-hugepages` profile applied.

4. To confirm the tuning was applied correctly, we can start a debug shell on a Node and check `/proc/cmdline`:
```
oc --kubeconfig="$HC_KUBECONFIG" debug node/hugepages-nodepool-worker-1 -- chroot /host cat /proc/cmdline
```

Example output:
```
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-... hugepagesz=2M hugepages=50
```
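As an additional check not in the original steps, the reserved huge pages can also be confirmed from the same kind of debug shell (node name assumed as above); if the boot parameters took effect, `HugePages_Total` should report 50:

```
# Hypothetical example: confirm the huge page reservation on the node
oc --kubeconfig="$HC_KUBECONFIG" debug node/hugepages-nodepool-worker-1 -- chroot /host grep HugePages_Total /proc/meminfo
```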
**Review comment:** This is IMPORTANT, but let's leave it just a NOTE. This bogged me down as the first "user" of the new code. An aside, yesterday I was testing with the `.spec.recommend.match` field targeting a single node in the node pool. It is probably the reason I was getting this and something that still needs to be addressed on the NTO side.

**Review comment:** There should be no possible choice for the user to target particular nodes within a NodePool.

**Review comment:** This behaviour is based on how NTO works in standalone OCP. There are many cases where a TuneD profile is making in-place changes to node tunables and no MachineConfig is needed. For example, if a user wants to set some sysctl values on one node with particular labels and assign some Pods only to that Node by label. If we do decide to remove this feature in HyperShift, we can do that, but it would be a change to the NTO code.

**Review comment:** Also note that the original issue Jiri hit here was fixed. If a user does use Node label based matching, no MachineConfig will be generated based on that Profile.