OCPBUGS-53427: pkg/operator/status: Drop kubelet skew guard, add RHEL guard #4956
@wking: This pull request references Jira Issue OCPBUGS-53427, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
68256ae to 5fc0354
5fc0354 to 9915680
aabb1bf to 56ba6cb
The kubelet skew guards are from 1471d2c (Bug 1986453: Check for API server and node versions skew, 2021-07-27, openshift#2658). But the Kube API server also landed similar guards in openshift/cluster-kube-apiserver-operator@9ce4f74775 (add KubeletVersionSkewController, 2021-08-26, openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (eus-upgrades-mvp: don't enforce skew check in MCO, 2021-04-29, openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not entirely clear on why the MCO guards landed at all. But it's convenient for me that they did, because while I'm dropping them here, I'm recycling the Node lister for a new check.

4.19 is dropping bare, package-managed RHEL support. I'd initially thought about looking for RHEL entries like:

  osImage: Red Hat Enterprise Linux 8.6 (Ootpa)

while excluding RHCOS entries like:

  osImage: Red Hat Enterprise Linux CoreOS 419.96.202503032242-0

But instead of switching on osImage, I'm using the node.openshift.io/os_id label to find package-managed RHEL Nodes. The machine-config operator is setting up the label [1] based on the ID value in /etc/os-release. On RHCOS instances, the ID value is 'rhcos' [2]. On package-managed RHEL, it's 'rhel' [3,4].

[1]: https://github.com/openshift/machine-config-operator/blob/ddc18e84f4a0650e0e87aa0a4f90f9cf01b5259c/templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml#L19-L31
[2]: https://github.com/openshift/os/blob/41f6a028d37b750db0bf4257447d809bd9cbe4bf/manifest-ocp-rhel-9.6.yaml#L41
[3]: https://github.com/openshift/enhancements/blob/ea465e192bfb58ec8654f1c904a4af68777f68ec/enhancements/rhcos/split-rhcos-into-layers.md?plain=1#L416
[4]: https://github.com/openshift/machine-config-operator/blob/ddc18e84f4a0650e0e87aa0a4f90f9cf01b5259c/pkg/daemon/osrelease/osrelease.go#L69
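The label-based check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Node` struct is a hypothetical stand-in for the Kubernetes Node type, and `packageManagedRHELNodes` is an invented helper name. Only the label key and the 'rhel' / 'rhcos' values come from the text above.

```go
package main

import (
	"fmt"
	"sort"
)

// osIDLabel is the Node label named in the commit message.
const osIDLabel = "node.openshift.io/os_id"

// Node is a hypothetical, simplified stand-in for a Kubernetes Node,
// carrying only the fields this sketch needs.
type Node struct {
	Name   string
	Labels map[string]string
}

// packageManagedRHELNodes (hypothetical helper) returns the sorted names of
// Nodes whose os_id label is "rhel". Nodes labeled "rhcos", and Nodes without
// the label, are not reported.
func packageManagedRHELNodes(nodes []Node) []string {
	var names []string
	for _, n := range nodes {
		// Indexing a nil map is safe in Go and yields the empty string.
		if n.Labels[osIDLabel] == "rhel" {
			names = append(names, n.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	nodes := []Node{
		{Name: "worker-b", Labels: map[string]string{osIDLabel: "rhel"}},
		{Name: "worker-a", Labels: map[string]string{osIDLabel: "rhcos"}},
		{Name: "worker-c"}, // no labels at all
	}
	fmt.Println(packageManagedRHELNodes(nodes))
}
```

Matching on the label rather than on osImage avoids having to tell "Red Hat Enterprise Linux 8.6 (Ootpa)" apart from "Red Hat Enterprise Linux CoreOS ..." by string prefix. In this sketch, a Node that lacks the label is treated as not being package-managed RHEL, which is an assumption of the sketch rather than something the commit message states.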
56ba6cb to 13cceb0
/jira refresh
@wking: This pull request references Jira Issue OCPBUGS-53427, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug.
No GitHub users were found matching the public email listed for the QA contact in Jira (gpei+old@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.
@wking: The following tests failed, say
Generally seems sane to me, just one question. We seem to be setting the
/approve
Will add final backport tags after QE has done pre-merge testing.
so we're failing-open. And maybe not alerting; I could see us growing alerting for any |
Pre-merge verification steps: Verified using an IPI-based AWS 4.18 cluster. To add the RHEL node, used the Jenkins job. Detailed steps are described here: https://issues.redhat.com/browse/OCPBUGS-53427?focusedId=27088479&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-27088479
/label cherry-pick-approved
/lgtm
/label backport-risk-assessed
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: djoshy, wking
Merged commit 57836ff into openshift:release-4.18
@wking: Jira Issue OCPBUGS-53427: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-53427 has been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator
Fix included in accepted release 4.18.0-0.nightly-2025-05-06-231850
In 4.19:

* 377a78b (pkg/operator/status: Drop PoolUpdating as an Upgradeable=False condition, 2024-12-16, openshift#4760).
* 0c21907 (pkg/operator/status: Drop kubelet skew guard, 2025-04-03, openshift#4970).

But in 4.18, we're using the other order:

* 13cceb0 (pkg/operator/status: Drop kubelet skew guard, add RHEL guard, 2025-03-26, openshift#4956).
* 20fe075 (pkg/operator/status: Drop PoolUpdating as an Upgradeable=False condition, 2024-12-16, openshift#5065).

So I'm adding this follow-up commit within openshift#5065 to remove the 'updating' variable that both the kubelet-skew-guard and the PoolUpdating guard had used, but which we no longer need now that both are gone in 4.18.
Closes: OCPBUGS-53427
- What I did
The kubelet skew guards are from 1471d2c (#2658). But the Kube API server also landed similar guards in openshift/cluster-kube-apiserver-operator@9ce4f74775 (openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not entirely clear on why the MCO guards landed at all. But it's convenient for me that they did, because while I'm dropping them here, I'm recycling the Node lister for a new check.
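4.19 is dropping bare-RHEL support, and I want the Node lister to look for RHEL entries like:

  osImage: Red Hat Enterprise Linux 8.6 (Ootpa)

but we are ok with RHCOS entries like:

  osImage: Red Hat Enterprise Linux CoreOS 419.96.202503032242-0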
- How to verify it
Install a 4.18 cluster with this fix. Its machine-config ClusterOperator should be Upgradeable=True. Install a bare-RHEL node. The ClusterOperator should become Upgradeable=False and complain about that node. Remove the bare-RHEL node or somehow convert it to RHCOS. The ClusterOperator should become Upgradeable=True again.
- Description for the changelog
The machine-config operator now detects bare-RHEL Nodes and warns that they will not be compatible with OpenShift 4.19.