[layering] update mcbs branch with master #2912

kikisdeliveryservice · 2022-01-14T18:16:28Z

Looking at @mkenigs's PR, I realized that the MCBS branch was based on a very old version of master (from November!).

Updating the branch with today's master as a merge commit into mcbs, which will let us merge the two one day.

This makes it easier to reuse parts of the function.

With this we let the installer take care of installing the initial user-data secret and we then take over with our managed secret. If we are upgrading (and thus no installer is at play), we just create the new (managed) secret. This is a cherry-pick of the original patch 1f52e48, which was later reverted by 1c65355. Signed-off-by: Antonio Murdaca <[email protected]> Co-authored-by: Zane Bitter <[email protected]>

This will allow us to manage a secret to scale up nodes with ignition v2 binary installed. Signed-off-by: Yu Qi Zhang <[email protected]> (cherry picked from commit 22b8b24)

(cherry picked from commit f6b6938)

openshift#2752 fixed the 1-1 mapping for kubeletconfig; this PR fixes the 1-1 mapping between machineconfig name and pool name for containerruntime config. Signed-off-by: Qi Wang <[email protected]>

This drop-in, which exists for the baremetal and vsphere platforms, is unnecessary. CRI-O already respects $CONTAINER_STREAM_ADDRESS when it is set in the environment, without having to edit the command line. Removing this drop-in reduces code fragility without any change in functionality. Signed-off-by: Jim Ramsay <[email protected]>

GPG keys were added to the list of reboot exceptions, and inaccurate documentation was updated. Selected registries.conf changes were previously listed under the "None" action, but registries.conf changes always trigger the "Reload Crio" action. Some organizational and wording changes were made to make this clearer.

Manage user data

Bug 2028731: fixes 1 to 1 containerruntime config mapping

All upgrade-candidate nodes in an MCP are given the `UpdateInProgress: PreferNoSchedule` taint. The MCD removes the taint once the upgrade is complete. Since kubernetes/kubernetes#104251 landed, nodes without a PreferNoSchedule taint receive a higher score. Before the upgrade starts, the MCC taints all nodes in the cluster that are supposed to be upgraded. Because the MCD removes the taint once the upgrade is complete, none of the nodes will have the `UpdateInProgress: PreferNoSchedule` taint afterwards, which ensures the nodes' scores are equal again. Why is this needed? It reduces pod churn while a cluster upgrade is in progress. While the non-upgraded nodes in the cluster have the `UpdateInProgress: PreferNoSchedule` taint, they get a lower score, so pods prefer to land on untainted (upgraded) nodes, thereby reducing the chance of landing on a not-yet-upgraded node, which would cause one more reschedule.
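The taint lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Taint` and `Node` are simplified stand-ins for the Kubernetes API types, and `addTaint` / `removeTaint` are hypothetical helper names.

```go
package main

import "fmt"

// Taint is a simplified stand-in for the Kubernetes taint type (assumption).
type Taint struct {
	Key    string
	Effect string
}

// Node is a simplified stand-in for a Kubernetes node (assumption).
type Node struct {
	Name   string
	Taints []Taint
}

// updateInProgress mirrors the taint named in the commit message.
var updateInProgress = Taint{Key: "UpdateInProgress", Effect: "PreferNoSchedule"}

// hasTaint reports whether the node already carries the given taint.
func hasTaint(n *Node, t Taint) bool {
	for _, existing := range n.Taints {
		if existing == t {
			return true
		}
	}
	return false
}

// addTaint models the controller side: taint a node before its upgrade starts.
// Adding the same taint twice is a no-op.
func addTaint(n *Node, t Taint) {
	if !hasTaint(n, t) {
		n.Taints = append(n.Taints, t)
	}
}

// removeTaint models the daemon side: drop the taint once the node is upgraded.
func removeTaint(n *Node, t Taint) {
	kept := n.Taints[:0]
	for _, existing := range n.Taints {
		if existing != t {
			kept = append(kept, existing)
		}
	}
	n.Taints = kept
}

func main() {
	nodes := []*Node{{Name: "worker-0"}, {Name: "worker-1"}}
	for _, n := range nodes {
		addTaint(n, updateInProgress)
	}
	// worker-0 finishes upgrading first, so only its taint is removed.
	removeTaint(nodes[0], updateInProgress)
	for _, n := range nodes {
		fmt.Printf("%s tainted=%v\n", n.Name, hasTaint(n, updateInProgress))
	}
}
```

In this model, a scheduler that prefers untainted nodes would favor worker-0 (already upgraded) over worker-1 until worker-1's taint is removed as well.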

To better support multiple release architectures, as well as multiple developer workstation OSes and architectures, outputting the current GOOS / GOARCH values helps the developer ensure they are building for the correct target architecture.

This reduces reliance upon the oc command by further leveraging the Kubernetes API provided by the framework.ClientSet object. In particular, this eliminates the need to shell out to oc to set / remove labels on nodes. In cases where we do have to shell out (e.g., ExecCmdOnNode), the following assurances will now be made:
1. Make sure that we have the oc command in our $PATH.
2. Ensure that if we set the path to our Kubeconfig file via the NewClientSet constructor (as opposed to setting $KUBECONFIG), that oc is aware of that path.
There are cases where we cannot get the Kubeconfig file because we're either running in-cluster or with a code-defined Kubeconfig object. Running ExecCmdOnNode will still fail in those cases. However, the error message will be more explicit about the cause.
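The two assurances can be illustrated with a small sketch using only the Go standard library. `ocCommand` is a hypothetical helper, not the function from the test framework, and the assumption here is that an empty Kubeconfig path represents the in-cluster / code-defined case.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// ocCommand (hypothetical) prepares an oc invocation. It fails early with an
// explicit message when no kubeconfig path is known or oc is not in $PATH.
func ocCommand(kubeconfigPath string, args ...string) (*exec.Cmd, error) {
	if kubeconfigPath == "" {
		return nil, errors.New("no kubeconfig path available (in-cluster or code-defined config); cannot shell out to oc")
	}
	ocPath, err := exec.LookPath("oc")
	if err != nil {
		return nil, fmt.Errorf("oc not found in $PATH: %w", err)
	}
	cmd := exec.Command(ocPath, args...)
	// Appending last means this value wins over any inherited KUBECONFIG.
	cmd.Env = append(os.Environ(), "KUBECONFIG="+kubeconfigPath)
	return cmd, nil
}

func main() {
	cmd, err := ocCommand("/path/to/kubeconfig", "get", "nodes")
	if err != nil {
		fmt.Println("cannot run oc:", err)
		return
	}
	fmt.Println("would run:", cmd.Args)
}
```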

This introduces a way of proactively checking for the divergence or "drift" of the on-disk configuration state from what is specified within a MachineConfig. Using fsnotify to listen for filesystem events, the node's on-disk state is validated upon detection of a write event for any of the files specified by the currently applied MachineConfig. Files whose contents or mode have changed will cause the node to be marked Degraded until the cluster admin takes remedial action. This can be resetting the file back to its known contents / mode, or creating the forcefile, which causes the current MachineConfig to be re-applied.
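The validation step that runs after a write event might look roughly like the sketch below. It leaves out the fsnotify watcher and shows only the comparison of contents and mode; `expectedFile`, `checkDrift`, and `driftDemo` are hypothetical names used for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// expectedFile (hypothetical) describes one file as a MachineConfig specifies it.
type expectedFile struct {
	Path     string
	Contents []byte
	Mode     os.FileMode
}

// checkDrift returns an error when the on-disk file differs from the expected
// permission bits or contents, and nil when they match.
func checkDrift(f expectedFile) error {
	info, err := os.Stat(f.Path)
	if err != nil {
		return fmt.Errorf("stat %s: %w", f.Path, err)
	}
	if info.Mode().Perm() != f.Mode.Perm() {
		return fmt.Errorf("mode mismatch for %s: expected %v, got %v", f.Path, f.Mode.Perm(), info.Mode().Perm())
	}
	onDisk, err := os.ReadFile(f.Path)
	if err != nil {
		return fmt.Errorf("read %s: %w", f.Path, err)
	}
	if !bytes.Equal(onDisk, f.Contents) {
		return fmt.Errorf("content mismatch for %s", f.Path)
	}
	return nil
}

// writeFile writes data and then sets the mode explicitly so the result does
// not depend on the process umask.
func writeFile(path string, data []byte, mode os.FileMode) error {
	if err := os.WriteFile(path, data, mode); err != nil {
		return err
	}
	return os.Chmod(path, mode)
}

// driftDemo writes a file as expected, checks it, then modifies it and checks again.
func driftDemo() (before, after error) {
	dir, err := os.MkdirTemp("", "drift-demo")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)

	want := expectedFile{Path: filepath.Join(dir, "example.conf"), Contents: []byte("a=1\n"), Mode: 0o644}
	if err := writeFile(want.Path, want.Contents, want.Mode); err != nil {
		return err, err
	}
	before = checkDrift(want)
	// Simulate someone editing the file out of band.
	if err := writeFile(want.Path, []byte("a=2\n"), want.Mode); err != nil {
		return before, err
	}
	after = checkDrift(want)
	return before, after
}

func main() {
	before, after := driftDemo()
	fmt.Println("before edit:", before)
	fmt.Println("after edit: ", after)
}
```

In a real monitor, a non-nil result from a check like `checkDrift` would be the signal to mark the node Degraded.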

This PR is to resolve a panic when `PlatformStatus.VSphere` is nil.
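A nil guard of this kind can be sketched as follows. The structs are simplified, hypothetical stand-ins for the real API types, and `apiServerIP` is an illustrative helper rather than the function changed in the PR.

```go
package main

import "fmt"

// VSphereStatus and PlatformStatus are simplified stand-ins (assumption) for
// the platform status types; only the pointer relationship matters here.
type VSphereStatus struct {
	APIServerInternalIP string
}

type PlatformStatus struct {
	VSphere *VSphereStatus
}

// apiServerIP returns the vSphere API server IP, or false when either the
// platform status or its VSphere field is nil, instead of dereferencing it.
func apiServerIP(ps *PlatformStatus) (string, bool) {
	if ps == nil || ps.VSphere == nil {
		return "", false
	}
	return ps.VSphere.APIServerInternalIP, true
}

func main() {
	ip, ok := apiServerIP(&PlatformStatus{})
	fmt.Printf("ip=%q ok=%v\n", ip, ok)
}
```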

…aint [MCC][MCD]: Introduce in progress taint

…cally-check-config-drift Proactively detect config drift

The event ordering in the controller is not guaranteed when there are identical operations on an object; sometimes they are combined and sometimes they are ignored. This commit makes the TestMakeProgress test more robust by using subtests. Test output:
go test ./pkg/controller/node -race -run ^\TestShouldMakeProgress\$ -count 100
ok github.com/openshift/machine-config-operator/pkg/controller/node 84.172s

These functions will be needed by both openshift#2802 and openshift#2851, so adding them here to avoid merge conflicts later. Moved/renamed newFile -> NewIgnFile.

…nits fix races while syncing node events

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>

* On error, clean up any configuration to avoid interference. This configuration will be saved in a temporary directory for troubleshooting.
* Consolidated logic to roll back any applied configuration, to activate a connection profile, reload NM, print network state information and exit handling.
* Avoid using `nmcli device connect` as it will generate a persistent connection profile if there wasn't any, possibly changing the state the node was initially deployed with.
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>

Write MTU migration configuration

The Config Drift Monitor (openshift#2795) was previously unaware of compressed files. The MCD would decompress a compressed file payload and write the result to disk. However, because the Config Drift Monitor did not know the file was compressed, it compared the compressed contents from the MachineConfig against the uncompressed contents that were written to disk. As a result, the Config Drift Monitor would erroneously degrade the node / MCP. Fixes: #2032565
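The idea behind the fix can be sketched as: decompress the payload first, then compare. The sketch below assumes gzip compression and uses a hypothetical `configFile` type with an explicit `Compressed` flag; the actual change may detect and decode compressed payloads differently.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// configFile (hypothetical) is a simplified view of a file entry in a MachineConfig.
type configFile struct {
	Payload    []byte // bytes as stored in the config
	Compressed bool   // true when Payload is gzip-compressed (assumption)
}

// expectedContents returns the bytes that should be on disk, decompressing
// the payload first when it is marked as compressed.
func expectedContents(f configFile) ([]byte, error) {
	if !f.Compressed {
		return f.Payload, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(f.Payload))
	if err != nil {
		return nil, fmt.Errorf("open gzip payload: %w", err)
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// matchesDisk compares the decoded payload against what was read from disk.
func matchesDisk(f configFile, onDisk []byte) (bool, error) {
	want, err := expectedContents(f)
	if err != nil {
		return false, err
	}
	return bytes.Equal(want, onDisk), nil
}

// gzipBytes is a small helper that produces a compressed payload for the demo.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	plain := []byte("hello from a MachineConfig\n")
	payload, err := gzipBytes(plain)
	if err != nil {
		fmt.Println("compress:", err)
		return
	}
	ok, err := matchesDisk(configFile{Payload: payload, Compressed: true}, plain)
	fmt.Println("matches:", ok, "err:", err)
}
```

Comparing `payload` directly against `plain` would report a mismatch, which corresponds to the false drift described above.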

Add helper functions to work with Ignition Configs

If a config change does not contain changes to registries.conf, don't apply checks specific to registries.conf. Also start using helper functions added in openshift#2870.

configure-ovs: improvements & reset openvswitch configuration on every boot

openshift-bot · 2022-01-14T22:14:49Z

/retest-required