Conversation

@sohankunkerkar
Member

@sohankunkerkar sohankunkerkar commented Nov 22, 2024

We need to handle the following cases:

  • The user has already set the default runtime and then updates to 4.17.z with this change.
  • The user updates to 4.17.z with this change and then tries setting the default runtime of their choice.
  • The user updates from 4.17.z with the defaulting logic to 4.18 and then wants to set the runtime.

@sohankunkerkar
Member Author

/retest

@openshift-ci openshift-ci bot requested review from mtrmac and wgahnagl November 22, 2024 20:36
@openshift-merge-robot openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Nov 23, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Nov 25, 2024
@sohankunkerkar sohankunkerkar force-pushed the up-release-4.17 branch 2 times, most recently from ce4319d to efde22d on November 25, 2024 14:19
@sohankunkerkar sohankunkerkar changed the title crio: skip MC creation if the containerruntimeconfig already exists OCPBUGS-38292: controller: default to runc when upgrading clusters from 4.17 to 4.18 Nov 25, 2024
@openshift-ci-robot openshift-ci-robot added the jira/severity-critical, jira/valid-reference, and jira/invalid-bug labels Nov 25, 2024
@openshift-ci-robot
Contributor

@sohankunkerkar: This pull request references Jira Issue OCPBUGS-38292, which is invalid:

  • expected Jira Issue OCPBUGS-38292 to depend on a bug targeting a version in 4.18.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

We need to handle the case where users are upgrading from 4.16 to 4.17: if the containerruntimeconfig already exists, then MCO will not create an MC to set the default runtime.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

continue
}
// Check if ContainerRuntimeConfig exists
managedKeyForCtr, err := getManagedKeyCtrCfg(pool, ctrl.client, cfg)
Member

getManagedKeyCtrCfg returns the MC name for the passed-in cfg object. That can be an existing MC or a new MC that does not exist yet.
There may be multiple containerruntimeconfig objects, and only the last one takes effect.
I think that to check the current runtime configuration, we have to traverse the existing 99-pool-generated-containerruntimeconfig-x machineconfigs in reverse order and use the first existing DefaultRuntime configuration we find. @yuqi-zhang What do you think?
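
A minimal sketch of that reverse-order lookup, using stand-in types rather than the controller's real API (the helper shape is an assumption; only the 99-<pool>-generated-containerruntimeconfig naming and the 01-ctrcfg-defaultRuntime drop-in path come from this thread):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// machineConfig is a minimal stand-in for mcfgv1.MachineConfig: just a name
// plus the file paths its Ignition payload renders.
type machineConfig struct {
	Name  string
	Files []string
}

// findExistingDefaultRuntimeMC walks the generated
// 99-<pool>-generated-containerruntimeconfig[-N] MachineConfigs from the
// newest suffix to the oldest and returns the first one carrying the
// default-runtime drop-in, or nil if no existing config pins a runtime.
func findExistingDefaultRuntimeMC(mcs []machineConfig, pool string) *machineConfig {
	prefix := fmt.Sprintf("99-%s-generated-containerruntimeconfig", pool)
	var generated []machineConfig
	for _, mc := range mcs {
		if strings.HasPrefix(mc.Name, prefix) {
			generated = append(generated, mc)
		}
	}
	// Reverse lexical order puts "-2" ahead of the unsuffixed name; a real
	// implementation would parse the numeric suffix so "-10" sorts after "-9".
	sort.Slice(generated, func(i, j int) bool { return generated[i].Name > generated[j].Name })
	for i := range generated {
		for _, f := range generated[i].Files {
			if f == "/etc/crio/crio.conf.d/01-ctrcfg-defaultRuntime" {
				return &generated[i]
			}
		}
	}
	return nil
}

func main() {
	mcs := []machineConfig{
		{Name: "99-worker-generated-containerruntimeconfig", Files: []string{"/etc/crio/crio.conf.d/01-ctrcfg-defaultRuntime"}},
		{Name: "99-worker-generated-containerruntimeconfig-2", Files: []string{"/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"}},
	}
	if mc := findExistingDefaultRuntimeMC(mcs, "worker"); mc != nil {
		fmt.Println("default runtime already pinned by", mc.Name)
	}
}
```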

Contributor

Hmm, I guess this is a scenario where the design of the CRCC is a bit weird. Thinking through this scenario:

  1. the user provides their own runtime pin, which generates into 99-pool-generated-containerruntimeconfig and has contents for /etc/crio/crio.conf.d/01-ctrcfg-defaultRuntime
  2. the user then adds a second config that sets e.g. pidsLimit, which then translates to /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit as 99-pool-generated-containerruntimeconfig-2

Technically, the way we frame kubelet/containerruntimeconfigs, we don't merge them, so we now have 2 machineconfigs, but both containerruntimeconfigs still exist and are taking effect since they define different files. If I were to ever unpin by deleting the runtime default, I guess it still works, since 99-pool-generated-containerruntimeconfig should get deleted alongside it?

Then in that case I guess we do have to parse through all the existing configuration... either through parsing all MCs that exist or all containerruntimeconfigs that exist (assuming that's the only way you can set a default runtime)
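
If the controller instead inspects the ContainerRuntimeConfig objects themselves, the check could look roughly like the fragment below; the DefaultRuntime field name only loosely mirrors the CRD and is an assumption:

```go
// containerRuntimeConfig is a stand-in for the ContainerRuntimeConfig CR;
// the real spec lives in the mcfgv1 API package.
type containerRuntimeConfig struct {
	Name           string
	DefaultRuntime string // empty when the CR does not pin a runtime
}

// defaultRuntimeAlreadySet reports whether any existing CR pins a default
// runtime, in which case the upgrade path should skip creating its own MC.
func defaultRuntimeAlreadySet(crcs []containerRuntimeConfig) (string, bool) {
	for _, crc := range crcs {
		if crc.DefaultRuntime != "" {
			return crc.DefaultRuntime, true
		}
	}
	return "", false
}
```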

Member

If I were to ever unpin by deleting the runtime default, I guess it still works, since 99-pool-generated-containerruntimeconfig should get deleted alongside it?

Yes, 99-pool-generated-containerruntimeconfig will be deleted automatically when the corresponding ContainerRuntimeConfig objects are deleted.

}

// create the MC for the drop in default-container-runtime crio.conf file
if err := ctrl.createDefaultContainerRuntimeMC(cfg); err != nil {
Member

We would not get a cfg object if the key passed to syncContainerRuntimeConfig is forceSyncOnUpgrade; syncContainerRuntimeConfig would return before this line because the ContainerRuntimeConfig does not exist.

Member Author

Oh, so the best way here is to query an API and get the cfg?

Member

I just think that if we have to fetch the machineconfigs to check whether an existing runtime configuration has been set on the cluster, we don't need the cfg argument for createDefaultContainerRuntimeMC.
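
Fetching the machineconfigs would make the creation path stateless, since the drop-in it writes never varies. A rough sketch with stand-in types (the generated MC name is hypothetical; the drop-in path and the runc default come from this thread):

```go
package main

import "fmt"

// ignFile and machineConfig are minimal stand-ins for the Ignition file entry
// and mcfgv1.MachineConfig types the controller really uses.
type ignFile struct {
	Path     string
	Contents string
}

type machineConfig struct {
	Name  string
	Files []ignFile
}

// The payload never varies: on upgrade the controller always pins the
// pre-4.18 default, runc. That is why no cfg argument is needed.
const defaultRuntimeDropIn = `[crio.runtime]
default_runtime = "runc"
`

// newDefaultContainerRuntimeMC builds the MC that keeps upgraded clusters on
// runc. The "99-<pool>-generated-crio-default-runtime" name is illustrative.
func newDefaultContainerRuntimeMC(pool string) machineConfig {
	return machineConfig{
		Name: fmt.Sprintf("99-%s-generated-crio-default-runtime", pool),
		Files: []ignFile{{
			Path:     "/etc/crio/crio.conf.d/01-ctrcfg-defaultRuntime",
			Contents: defaultRuntimeDropIn,
		}},
	}
}

func main() {
	mc := newDefaultContainerRuntimeMC("worker")
	fmt.Printf("%s -> %s\n%s", mc.Name, mc.Files[0].Path, mc.Files[0].Contents)
}
```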

@QiWang19
Member

Could you clarify the purpose of this PR? My understanding is:

  • If the cluster sets a default runtime to crun, we don't create a MachineConfig to switch to runc.
  • If the cluster does not set a runtime configuration, we default to runc.
  • If the cluster explicitly sets the runtime to runc, we leave the default as runc without making changes.

@sohankunkerkar
Member Author

Could you clarify the purpose of this PR? My understanding is:

  • If the cluster sets a default runtime to crun, we don't create a MachineConfig to switch to runc.
  • If the cluster does not set a runtime configuration, we default to runc.
  • If the cluster explicitly sets the runtime to runc, we leave the default as runc without making changes.

I think the idea here is to capture the following scenarios:

  • If the user has already set the default_runtime to crun via the container runtime config (not possible via an MC because we are making crun the default in 4.18) and then updates to 4.17.z with this change, the cluster should retain the default_runtime set by the user.
  • If the user updates to 4.17.z with this change and then tries setting the default_runtime to crun, it should work.
  • If the user updates from 4.17.z with the defaulting logic to 4.18 and then wants to set the default_runtime to runc, they will need to delete the MC manually, as we will not be handling the auto-deletion logic in MCO.

And for other cases where the cluster version is < 4.17, I have already confirmed with the OTA team that all updates will go through this change, regardless of the cluster's previous state.
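
Condensed into one decision, the upgrade-time logic for those scenarios amounts to roughly the sketch below (a sketch under assumed helper names and shapes, not the PR's actual code):

```go
// ensureDefaultRuntimeOnUpgrade sketches the decision the scenarios above
// describe. userPinnedRuntime is whatever an existing ContainerRuntimeConfig
// sets; pinningMCExists is whether the generated runc-pinning MC is present.
func ensureDefaultRuntimeOnUpgrade(userPinnedRuntime string, pinningMCExists bool) string {
	switch {
	case userPinnedRuntime != "":
		// Scenario 1: the user already chose a runtime; keep their choice.
		return "skip"
	case pinningMCExists:
		// Scenario 3: the MC from an earlier 4.17.z upgrade is still there;
		// reverting to the new default means deleting that MC manually.
		return "skip"
	default:
		// Otherwise pin runc so the upgrade does not silently flip the
		// cluster to the new 4.18 default; setting a runtime afterwards
		// (scenario 2) is expected to take precedence over this MC.
		return "create runc-pinning MC"
	}
}
```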

@sohankunkerkar sohankunkerkar force-pushed the up-release-4.17 branch 3 times, most recently from 7458888 to aa124e4 on November 26, 2024 22:33
@sohankunkerkar
Member Author

/retest

@sohankunkerkar sohankunkerkar force-pushed the up-release-4.17 branch 2 times, most recently from 24abfa3 to 736bd12 on November 27, 2024 13:48
@sohankunkerkar
Member Author

/test unit

Member

@QiWang19 QiWang19 left a comment

The code should work fine, but we will need to perform some manual upgrade testing to ensure everything functions correctly.

@sohankunkerkar
Member Author

/test e2e-gcp-op

@openshift-ci openshift-ci bot changed the title [release-4.17] NO-JIRA: OCPBUGS-38292: controller: default to runc when upgrading clusters from 4.17 to 4.18 [release-4.17] OCPBUGS-38292: controller: default to runc when upgrading clusters from 4.17 to 4.18 Dec 16, 2024
@openshift-ci-robot openshift-ci-robot added the jira/severity-critical and jira/invalid-bug labels Dec 16, 2024
@openshift-ci-robot
Contributor

@sohankunkerkar: This pull request references Jira Issue OCPBUGS-38292, which is invalid:

  • expected Jira Issue OCPBUGS-38292 to depend on a bug targeting a version in 4.18.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

We need to handle the following cases:

  • The user has already set the default runtime and then updates to 4.17.z with this change.
  • The user updates to 4.17.z with this change and then tries setting the default runtime of their choice.
  • The user updates from 4.17.z with the defaulting logic to 4.18 and then wants to set the runtime.

@mrunalp mrunalp added the jira/valid-bug label and removed the jira/invalid-bug label Dec 16, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 119a374 and 2 for PR HEAD 1fb734a in total

2 similar comments
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 119a374 and 2 for PR HEAD 1fb734a in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 119a374 and 2 for PR HEAD 1fb734a in total

@sohankunkerkar
Member Author

/retest

3 similar comments
@sohankunkerkar
Member Author

/retest

@sohankunkerkar
Member Author

/retest

@sohankunkerkar
Member Author

/retest

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 119a374 and 2 for PR HEAD 1fb734a in total

@openshift-ci
Contributor

openshift-ci bot commented Dec 18, 2024

@sohankunkerkar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/e2e-gcp-op-techpreview (commit 1fb734a, not required): /test e2e-gcp-op-techpreview
  • ci/prow/e2e-azure-ovn-upgrade-out-of-change (commit 1fb734a, not required): /test e2e-azure-ovn-upgrade-out-of-change


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit d7c30c8 into openshift:release-4.17 Dec 18, 2024
15 of 17 checks passed
@openshift-ci-robot
Contributor

@sohankunkerkar: Jira Issue OCPBUGS-38292: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38292 has been moved to the MODIFIED state.

In response to this:

We need to handle the following cases:

  • The user has already set the default runtime and then updates to 4.17.z with this change.
  • The user updates to 4.17.z with this change and then tries setting the default runtime of their choice.
  • The user updates from 4.17.z with the defaulting logic to 4.18 and then wants to set the runtime.

@sohankunkerkar sohankunkerkar deleted the up-release-4.17 branch December 19, 2024 00:35
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.17.0-202412182334.p0.gd7c30c8.assembly.stream.el9.
All builds following this will include this PR.

haircommander added a commit to haircommander/machine-config-operator that referenced this pull request Dec 23, 2025
In cri-o 1.33, a change (cri-o/cri-o#8962) was made to the default limits set
for CRI-O. Now the ulimit nofile is set much lower, with room to set it higher. However, some workloads
don't expect this change and fail (see https://issues.redhat.com/browse/OCPBUGS-62095).

This was worked around temporarily in openshift#5308,
but that workaround was not intended to be carried into 4.21.

Instead, we should drop in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters
don't get this change, but new clusters started in 4.21 do.

This was entirely based on openshift#4715

Signed-off-by: Peter Hunt <pehunt@redhat.com>
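
For context on the approach that commit describes, a minimal sketch of the kind of upgrade-only drop-in it means; default_ulimits is a real crio.conf option under [crio.runtime], but the nofile bounds here are illustrative placeholders, not the actual pre-1.33 defaults:

```go
// Sketch only: content of a drop-in that would keep upgraded 4.20 clusters on
// their previous file-descriptor limit after the CRI-O 1.33 change. The
// "nofile" soft:hard values below are placeholders.
const nofileDropIn = `[crio.runtime]
default_ulimits = ["nofile=1048576:1048576"]
`
```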

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
