OCPBUGS-20321: pkg/cvo/sync_worker: Always enable the DeploymentConfig capability #981
Conversation
The capability is new in 4.14, via openshift/api@d557f9784b (WRKLDS-728: Make Build and DeploymentConfig API optional through capabilities, 2023-05-24, openshift/api#1462) and ba3aeb9 (vendor: bump openshift/api, 2023-08-02, openshift#950). But as pointed out in [1], 4.14 releases do not declare any manifests as linked to the new capability:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.5-x86_64
Extracted release payload from digest sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301 created at 2023-10-10T18:00:08Z
$ grep -ohr 'capability.openshift.io/name:.*' manifests | sort | uniq
capability.openshift.io/name: baremetal
capability.openshift.io/name: Console
capability.openshift.io/name: CSISnapshot
capability.openshift.io/name: ImageRegistry
capability.openshift.io/name: Insights
capability.openshift.io/name: MachineAPI
capability.openshift.io/name: marketplace
capability.openshift.io/name: NodeTuning
capability.openshift.io/name: openshift-samples
capability.openshift.io/name: Storage

That means our existing logic, which compares reconciled-manifest requirements to detect the need to implicitly enable capabilities, breaks down. In this commit, I'm teaching the outgoing 4.13 CVO that all 4.13 clusters have the DeploymentConfig capability enabled (even if it is not declared by a ClusterVersion capability in 4.13), so that the capability persists into 4.14 releases and admins are not surprised by dropped functionality. Folks who do want to drop DeploymentConfig functionality will need to perform fresh installs with 4.14 or later, because capabilities cannot be uninstalled [2].

[1]: https://issues.redhat.com/browse/OCPBUGS-20321
[2]: https://github.com/openshift/enhancements/blob/d2edd51b600c5490eaa3650aac3b45a0bff5b3d5/enhancements/installer/component-selection.md#capabilities-cannot-be-uninstalled
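For context, the reconciled-manifest comparison that breaks down here can be sketched roughly as follows. This is a simplified illustration, not the actual CVO implementation: the `Capability` type and `implicitlyEnabled` function are stand-ins. The point is that the comparison only sees capabilities that manifests actually declare, so an undeclared capability like DeploymentConfig in 4.14 payloads is invisible to it.

```go
package main

import "fmt"

// Capability is a stand-in for configv1.ClusterVersionCapability.
type Capability string

// implicitlyEnabled sketches the comparison the commit message describes:
// any capability required by a manifest the cluster is already reconciling,
// but not currently enabled, must be implicitly enabled so an update cannot
// silently drop running functionality.
func implicitlyEnabled(enabled map[Capability]struct{}, manifestCaps []Capability) []Capability {
	var implicit []Capability
	seen := map[Capability]struct{}{}
	for _, c := range manifestCaps {
		if _, ok := enabled[c]; ok {
			continue // already explicitly enabled
		}
		if _, dup := seen[c]; dup {
			continue // already queued for implicit enablement
		}
		seen[c] = struct{}{}
		implicit = append(implicit, c)
	}
	return implicit
}

func main() {
	enabled := map[Capability]struct{}{"Console": {}}
	// Because no 4.14 manifest declares DeploymentConfig, it never appears in
	// manifestCaps, which is exactly why this comparison cannot detect it and
	// the hard-coded 4.13 rule in this PR is needed.
	fmt.Println(implicitlyEnabled(enabled, []Capability{"Console", "MachineAPI"}))
}
```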
|
@wking: This pull request references Jira Issue OCPBUGS-20321, which is invalid.
Comment: The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh
|
@wking: This pull request references Jira Issue OCPBUGS-20321, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug.
Requesting review from QA contact. |
… Temporarily drop MachineAPI from AWS Cluster Bot job

This was added here (among other places) in 365b996 (add MachineAPI cap to no-caps job, 2023-07-18, openshift#39892), because 4.14 and later None-set clusters need the MachineAPI capability enabled to allow installer/bootstrap-provisioned compute machines. But I need to be able to install a 4.13 None-capability cluster to test [1], and currently:

launch 4.13,openshift/cluster-version-operator#981 aws,no-capabilities

fails with [2]:

level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: capabilities.additionalEnabledCapabilities[0]: Unsupported value: "MachineAPI": supported values: "CSISnapshot", "Console", "Insights", "NodeTuning", "Storage", "baremetal", "marketplace", "openshift-samples"

In this commit, I'm dropping the MachineAPI injection from the AWS job. This should get my aws,no-capabilities invocation working on 4.13 while I test my pull, after which we can revert this change. In the meantime, folks who need to use no-capabilities Cluster Bot clusters to test 4.14 and later installs can use Azure or GCP or other non-AWS platforms.

[1]: openshift/cluster-version-operator#981
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1712189151484317696#1:build-log.txt%3A108
|
Pre-merge testing, via Cluster Bot:

$ curl -s 'https://api.openshift.com/api/upgrades_info/graph?channel=candidate-4.15&arch=amd64' | jq -r '.nodes[] | .version + " " + .payload' | sort | tail -n3
4.14.0-rc.4 quay.io/openshift-release-dev/ocp-release@sha256:4d3c8199b50cd1e129755336468759342255a0090d09424133df1ea60253da13
4.14.0-rc.5 quay.io/openshift-release-dev/ocp-release@sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301
4.15.0-ec.0 quay.io/openshift-release-dev/ocp-release@sha256:217c3265267f7695bba068d328ab513c5f6c607d936abb884e84e9b615bb8841

Updating my test cluster to rc.4:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities}'
{
"desired": {
"image": "registry.build05.ci.openshift.org/ci-ln-1zqndwt/release@sha256:b18e5278c0f66f5a48d7933f8da52cb064d2a7e6ff6824e687d49b47f05f9bb9",
"version": "4.13.0-0.test-2023-10-11-210131-ci-ln-1zqndwt-latest"
},
"capabilities": {
"knownCapabilities": [
"CSISnapshot",
"Console",
"Insights",
"NodeTuning",
"Storage",
"baremetal",
"marketplace",
"openshift-samples"
]
}
}
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-1.27-api-removals-in-4.14":"true"}}' --type=merge
$ oc adm upgrade --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:4d3c8199b50cd1e129755336468759342255a0090d09424133df1ea60253da13

And once the update is running:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities, conditions: ([.conditions[] | select(.type == "ImplicitlyEnabledCapabilities")])}'
{
"desired": {
"image": "quay.io/openshift-release-dev/ocp-release@sha256:4d3c8199b50cd1e129755336468759342255a0090d09424133df1ea60253da13",
"url": "https://access.redhat.com/errata/RHSA-2023:5006",
"version": "4.14.0-rc.4"
},
"capabilities": {
"enabledCapabilities": [
"Console",
"DeploymentConfig",
"ImageRegistry",
"MachineAPI"
],
"knownCapabilities": [
"Build",
"CSISnapshot",
"Console",
"DeploymentConfig",
"ImageRegistry",
"Insights",
"MachineAPI",
"NodeTuning",
"Storage",
"baremetal",
"marketplace",
"openshift-samples"
]
},
"conditions": [
{
"lastTransitionTime": "2023-10-11T21:38:36Z",
"message": "The following capabilities could not be disabled: Console, DeploymentConfig, ImageRegistry, MachineAPI",
"reason": "CapabilitiesImplicitlyEnabled",
"status": "True",
"type": "ImplicitlyEnabledCapabilities"
}
]
}

so that looks like it's working. Without bothering to wait for the rc.4 update to complete, head to rc.5, forcing through:

$ oc adm upgrade --force --allow-upgrade-with-warnings --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301

Status still looks good:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities, conditions: ([.conditions[] | select(.type == "ImplicitlyEnabledCapabilities")])}'
{
"desired": {
"image": "quay.io/openshift-release-dev/ocp-release@sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301",
"url": "https://access.redhat.com/errata/RHSA-2023:5006",
"version": "4.14.0-rc.5"
},
"capabilities": {
"enabledCapabilities": [
"Console",
"DeploymentConfig",
"ImageRegistry",
"MachineAPI"
],
"knownCapabilities": [
"Build",
"CSISnapshot",
"Console",
"DeploymentConfig",
"ImageRegistry",
"Insights",
"MachineAPI",
"NodeTuning",
"Storage",
"baremetal",
"marketplace",
"openshift-samples"
]
},
"conditions": [
{
"lastTransitionTime": "2023-10-11T21:38:36Z",
"message": "The following capabilities could not be disabled: Console, DeploymentConfig, ImageRegistry, MachineAPI",
"reason": "CapabilitiesImplicitlyEnabled",
"status": "True",
"type": "ImplicitlyEnabledCapabilities"
}
]
}

And then push on to 4.15, again without waiting for a complete update:

$ oc adm upgrade --force --allow-upgrade-with-warnings --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:217c3265267f7695bba068d328ab513c5f6c607d936abb884e84e9b615bb8841

Status still looks good:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities, conditions: ([.conditions[] | select(.type == "ImplicitlyEnabledCapabilities")])}'
{
"desired": {
"image": "quay.io/openshift-release-dev/ocp-release@sha256:217c3265267f7695bba068d328ab513c5f6c607d936abb884e84e9b615bb8841",
"version": "4.15.0-ec.0"
},
"capabilities": {
"enabledCapabilities": [
"Console",
"DeploymentConfig",
"ImageRegistry",
"MachineAPI"
],
"knownCapabilities": [
"Build",
"CSISnapshot",
"Console",
"DeploymentConfig",
"ImageRegistry",
"Insights",
"MachineAPI",
"NodeTuning",
"Storage",
"baremetal",
"marketplace",
"openshift-samples"
]
},
"conditions": [
{
"lastTransitionTime": "2023-10-11T21:38:36Z",
"message": "The following capabilities could not be disabled: Console, DeploymentConfig, ImageRegistry, MachineAPI",
"reason": "CapabilitiesImplicitlyEnabled",
"status": "True",
"type": "ImplicitlyEnabledCapabilities"
}
]
}
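The jq checks above can also be automated. Below is a minimal sketch of such a check; `hasEnabledCapability` and the `clusterVersion` struct are hypothetical helpers covering only the status fields used here, not part of any OpenShift library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterVersion models just the slice of `oc get -o json clusterversion version`
// output needed to verify that a capability survived the update as enabled.
type clusterVersion struct {
	Status struct {
		Capabilities struct {
			EnabledCapabilities []string `json:"enabledCapabilities"`
		} `json:"capabilities"`
	} `json:"status"`
}

// hasEnabledCapability reports whether the named capability appears in
// status.capabilities.enabledCapabilities of the given ClusterVersion JSON.
func hasEnabledCapability(raw []byte, name string) (bool, error) {
	var cv clusterVersion
	if err := json.Unmarshal(raw, &cv); err != nil {
		return false, err
	}
	for _, c := range cv.Status.Capabilities.EnabledCapabilities {
		if c == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Abbreviated form of the rc.5 status shown above.
	raw := []byte(`{"status":{"capabilities":{"enabledCapabilities":["Console","DeploymentConfig","ImageRegistry","MachineAPI"]}}}`)
	ok, err := hasEnabledCapability(raw, "DeploymentConfig")
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // prints true
}
```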
|
/lgtm
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-ovn |
|
@wking: The following test failed, say
Full PR test history. Your PR dashboard. |
|
Update pre-merge testing result from QE side:

- Install a non-cap cluster with the image built from the PR
- Get clusterversion
- Get build and dc
- Upgrade to 4.14 nightly
- In the middle of the upgrade, check clusterversion; dc is implicitly enabled
- After upgrade is complete, check clusterversion
- Get build and dc resource
|
/label qe-approved
|
Verification by @wking as well as @shellyyang1989 looks good to me.
|
@wking: This pull request references Jira Issue OCPBUGS-20321, which is valid. 6 validation(s) were run on this bug.
Requesting review from QA contact. |
|
label /cherry-pick-approved
|
/label cherry-pick-approved |
if strings.HasPrefix(payloadUpdate.Release.Version, "4.14.") {
	deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
	if _, ok := work.Capabilities.EnabledCapabilities[deploymentConfig]; !ok && !capability.Contains(implicitlyEnabledCaps, deploymentConfig) {
		implicitlyEnabledCaps = append(implicitlyEnabledCaps, deploymentConfig)
@wking The issue is we have declared a capability but there are no associated manifests which has this capability. We can have this issue in future as well. Can we declare a generic code structure for this? A method which can return capability names which should be enabled when no manifests has the capability. I would expect the code structure to be present in master and then we should back-port this appropriately.
we can assess a refactor in the future but right now we need this in a 4.13.z asap because users will need to upgrade to a 4.13.z containing this logic before they can upgrade to 4.14.0 (otherwise 4.13 clusters that have a static set of caps enabled in 4.13 will have the deploymentconfig cap disabled when they upgrade to 4.14.0)
@bparees ack. Will add the labels
I've created OTA-1040 trying to feel out a long-term plan for this kind of thing.
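The generic structure the reviewer suggests above — a method returning capability names to force-enable when no manifest in the target release declares them — could look roughly like the sketch below. This is hypothetical code, not anything merged into the CVO; the names forcedImplicitCaps and capsToForce are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// forcedImplicitCaps is a hypothetical table of capabilities that must be
// treated as enabled for a given target-version prefix, because the release
// declares the capability but ships no manifests annotated with it.
var forcedImplicitCaps = map[string][]string{
	"4.14.": {"DeploymentConfig"},
}

// capsToForce returns the capabilities to implicitly enable for targetVersion,
// or nil when no special-case entry applies.
func capsToForce(targetVersion string) []string {
	for prefix, caps := range forcedImplicitCaps {
		if strings.HasPrefix(targetVersion, prefix) {
			return caps
		}
	}
	return nil
}

func main() {
	fmt.Println(capsToForce("4.14.0-rc.5")) // prints [DeploymentConfig]
	fmt.Println(capsToForce("4.13.12"))     // prints []
}
```

A table like this would let future releases add entries instead of hard-coding another version check, which is roughly the refactor deferred to OTA-1040.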
LalatenduMohanty left a comment:
/label backport-risk-assessed
|
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here. |
|
@wking: Jira Issue OCPBUGS-20321: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-20321 has been moved to the MODIFIED state. |
|
Fix included in accepted release 4.13.0-0.nightly-2023-10-12-193258 |