@wking wking commented Oct 11, 2023

The capability is new in 4.14, via openshift/api@d557f9784b
(WRKLDS-728: Make Build and DeploymentConfig API optional through
capabilities, 2023-05-24, openshift/api#1462) and ba3aeb9 (vendor:
bump openshift/api, 2023-08-02, openshift#950).  But as pointed out in [1],
4.14 releases do not declare any manifests as linked to the new
capability:

  $ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.5-x86_64
  Extracted release payload from digest sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301 created at 2023-10-10T18:00:08Z
  $ grep -ohr 'capability.openshift.io/name:.*' manifests | sort | uniq
  capability.openshift.io/name: baremetal
  capability.openshift.io/name: Console
  capability.openshift.io/name: CSISnapshot
  capability.openshift.io/name: ImageRegistry
  capability.openshift.io/name: Insights
  capability.openshift.io/name: MachineAPI
  capability.openshift.io/name: marketplace
  capability.openshift.io/name: NodeTuning
  capability.openshift.io/name: openshift-samples
  capability.openshift.io/name: Storage

That means our existing logic to compare reconciled-manifest
requirements for detecting the need to implicitly enable capabilities
breaks down.  In this commit, I'm teaching the outgoing 4.13 CVO that
all 4.13 clusters have the DeploymentConfig capability enabled (even
if it is not declared by a ClusterVersion capability in 4.13), so that
capability needs to persist into 4.14 releases, to avoid surprising
admins by dropping functionality.

Folks who do want to drop DeploymentConfig functionality will need to
perform fresh installs with 4.14 or later, because capabilities cannot
be uninstalled [2].

[1]: https://issues.redhat.com/browse/OCPBUGS-20321
[2]: https://github.com/openshift/enhancements/blob/d2edd51b600c5490eaa3650aac3b45a0bff5b3d5/enhancements/installer/component-selection.md#capabilities-cannot-be-uninstalled
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 11, 2023
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-20321, which is invalid:

  • expected Jira Issue OCPBUGS-20321 to depend on a bug targeting a version in 4.14.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

wking commented Oct 11, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 11, 2023
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-20321, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.z) matches configured target version for branch (4.13.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-20431 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-20431 targets the "4.14.0" version, which is one of the valid target versions: 4.14.0
  • bug has dependents

Requesting review from QA contact:
/cc @evakhoni

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2023
wking added a commit to wking/openshift-release that referenced this pull request Oct 11, 2023
… Temporarily drop MachineAPI from AWS Cluster Bot job

This was added here (among other places) in 365b996 (add MachineAPI
cap to no-caps job, 2023-07-18, openshift#39892), because 4.14 and later
None-set clusters need the MachineAPI capability enabled to allow
installer/bootstrap-provisioned compute machines.  But I need to be
able to install a 4.13 None-capability cluster to test [1], and currently:

  launch 4.13,openshift/cluster-version-operator#981 aws,no-capabilities

fails with [2]:

  level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: capabilities.additionalEnabledCapabilities[0]: Unsupported value: "MachineAPI": supported values: "CSISnapshot", "Console", "Insights", "NodeTuning", "Storage", "baremetal", "marketplace", "openshift-samples"

In this commit, I'm dropping the MachineAPI injection from the AWS
job.  This should get my aws,no-capabilities invocation working on
4.13 while I test my pull, after which we can revert this change.  In
the meantime, folks who need to use no-capabilities Cluster Bot
clusters to test 4.14 and later installs can use Azure or GCP or other
non-AWS platforms.

[1]: openshift/cluster-version-operator#981
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1712189151484317696#1:build-log.txt%3A108
openshift-ci bot pushed a commit to openshift/release that referenced this pull request Oct 11, 2023
… Temporarily drop MachineAPI from AWS Cluster Bot job (#44248)

wking commented Oct 11, 2023

Pre-merge testing via the Cluster Bot command launch 4.13,openshift/cluster-version-operator#981 aws,no-capabilities (logs), after openshift/release#44248 temporarily enabled the 4.13 aws,no-capabilities Cluster Bot job. Figuring out update targets:

$ curl -s 'https://api.openshift.com/api/upgrades_info/graph?channel=candidate-4.15&arch=amd64' | jq -r '.nodes[] | .version + " " + .payload' | sort | tail -n3
4.14.0-rc.4 quay.io/openshift-release-dev/ocp-release@sha256:4d3c8199b50cd1e129755336468759342255a0090d09424133df1ea60253da13
4.14.0-rc.5 quay.io/openshift-release-dev/ocp-release@sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301
4.15.0-ec.0 quay.io/openshift-release-dev/ocp-release@sha256:217c3265267f7695bba068d328ab513c5f6c607d936abb884e84e9b615bb8841

Updating my test cluster to rc.4:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities}'
{
  "desired": {
    "image": "registry.build05.ci.openshift.org/ci-ln-1zqndwt/release@sha256:b18e5278c0f66f5a48d7933f8da52cb064d2a7e6ff6824e687d49b47f05f9bb9",
    "version": "4.13.0-0.test-2023-10-11-210131-ci-ln-1zqndwt-latest"
  },
  "capabilities": {
    "knownCapabilities": [
      "CSISnapshot",
      "Console",
      "Insights",
      "NodeTuning",
      "Storage",
      "baremetal",
      "marketplace",
      "openshift-samples"
    ]
  }
}
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-1.27-api-removals-in-4.14":"true"}}' --type=merge
$ oc adm upgrade --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:4d3c8199b50cd1e129755336468759342255a0090d09424133df1ea60253da13

And once the update is running:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities, conditions: ([.conditions[] | select(.type == "ImplicitlyEnabledCapabilities")])}'
{
  "desired": {
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:4d3c8199b50cd1e129755336468759342255a0090d09424133df1ea60253da13",
    "url": "https://access.redhat.com/errata/RHSA-2023:5006",
    "version": "4.14.0-rc.4"
  },
  "capabilities": {
    "enabledCapabilities": [
      "Console",
      "DeploymentConfig",
      "ImageRegistry",
      "MachineAPI"
    ],
    "knownCapabilities": [
      "Build",
      "CSISnapshot",
      "Console",
      "DeploymentConfig",
      "ImageRegistry",
      "Insights",
      "MachineAPI",
      "NodeTuning",
      "Storage",
      "baremetal",
      "marketplace",
      "openshift-samples"
    ]
  },
  "conditions": [
    {
      "lastTransitionTime": "2023-10-11T21:38:36Z",
      "message": "The following capabilities could not be disabled: Console, DeploymentConfig, ImageRegistry, MachineAPI",
      "reason": "CapabilitiesImplicitlyEnabled",
      "status": "True",
      "type": "ImplicitlyEnabledCapabilities"
    }
  ]
}

so that looks like it's working. Without bothering to wait for the rc.4 update to complete, head to rc.5, forcing through a KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade Upgradeable=False condition, as the CVO is confused about the target version that it is complaining about:

$ oc adm upgrade --force --allow-upgrade-with-warnings --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301

Status still looks good:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities, conditions: ([.conditions[] | select(.type == "ImplicitlyEnabledCapabilities")])}'
{
  "desired": {
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:042899f17f33259ed9f2cfc179930af283733455720f72ea3483fd1905f9b301",
    "url": "https://access.redhat.com/errata/RHSA-2023:5006",
    "version": "4.14.0-rc.5"
  },
  "capabilities": {
    "enabledCapabilities": [
      "Console",
      "DeploymentConfig",
      "ImageRegistry",
      "MachineAPI"
    ],
    "knownCapabilities": [
      "Build",
      "CSISnapshot",
      "Console",
      "DeploymentConfig",
      "ImageRegistry",
      "Insights",
      "MachineAPI",
      "NodeTuning",
      "Storage",
      "baremetal",
      "marketplace",
      "openshift-samples"
    ]
  },
  "conditions": [
    {
      "lastTransitionTime": "2023-10-11T21:38:36Z",
      "message": "The following capabilities could not be disabled: Console, DeploymentConfig, ImageRegistry, MachineAPI",
      "reason": "CapabilitiesImplicitlyEnabled",
      "status": "True",
      "type": "ImplicitlyEnabledCapabilities"
    }
  ]
}

And then push on to 4.15, again without waiting for a complete update:

$ oc adm upgrade --force --allow-upgrade-with-warnings --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:217c3265267f7695bba068d328ab513c5f6c607d936abb884e84e9b615bb8841

Status still looks good:

$ oc get -o json clusterversion version | jq '.status | {desired, capabilities, conditions: ([.conditions[] | select(.type == "ImplicitlyEnabledCapabilities")])}'
{
  "desired": {
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:217c3265267f7695bba068d328ab513c5f6c607d936abb884e84e9b615bb8841",
    "version": "4.15.0-ec.0"
  },
  "capabilities": {
    "enabledCapabilities": [
      "Console",
      "DeploymentConfig",
      "ImageRegistry",
      "MachineAPI"
    ],
    "knownCapabilities": [
      "Build",
      "CSISnapshot",
      "Console",
      "DeploymentConfig",
      "ImageRegistry",
      "Insights",
      "MachineAPI",
      "NodeTuning",
      "Storage",
      "baremetal",
      "marketplace",
      "openshift-samples"
    ]
  },
  "conditions": [
    {
      "lastTransitionTime": "2023-10-11T21:38:36Z",
      "message": "The following capabilities could not be disabled: Console, DeploymentConfig, ImageRegistry, MachineAPI",
      "reason": "CapabilitiesImplicitlyEnabled",
      "status": "True",
      "type": "ImplicitlyEnabledCapabilities"
    }
  ]
}

bparees commented Oct 11, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 11, 2023
wking commented Oct 11, 2023

CI failures are orthogonal.

/override ci/prow/e2e-agnostic-ovn
/test e2e-agnostic-operator

openshift-ci bot commented Oct 11, 2023

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-ovn

openshift-ci bot commented Oct 11, 2023

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                 Commit   Details  Required  Rerun command
ci/prow/e2e-agnostic-ovn  baf7ba7  link     true      /test e2e-agnostic-ovn

@shellyyang1989

Pre-merge testing results from the QE side:

// Install a no-capabilities cluster with the image built from the PR

# oc adm upgrade 
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.13.0-0.ci.test-2023-10-12-014708-ci-ln-ygnnb6b-latest not found in the "stable-4.13" channel

Cluster version is 4.13.0-0.ci.test-2023-10-12-014708-ci-ln-ygnnb6b-latest

Upgradeable=False

  Reason: AdminAckRequired
  Message: Kubernetes 1.27 and therefore OpenShift 4.14 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6958395 for details and instructions.

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13

// Get clusterversion

# oc get clusterversion -oyaml
...
spec:
    capabilities:
      baselineCapabilitySet: None
    channel: stable-4.13
    clusterID: a0693b9e-75c4-4644-aba9-7ae6c0624ece
  status:
    availableUpdates: null
    capabilities:
      knownCapabilities:
      - CSISnapshot
      - Console
      - Insights
      - NodeTuning
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
...
conditions:
    - lastTransitionTime: "2023-10-12T02:13:44Z"
      message: Capabilities match configured spec
      reason: AsExpected
      status: "False"
      type: ImplicitlyEnabledCapabilities

// Get build and dc

# oc get -A build
No resources found
# oc get -A dc
No resources found

# oc api-resources | grep -E -- "build|deploymentconfig"
deploymentconfigs                     dc                                                                                     apps.openshift.io/v1                          true         DeploymentConfig
buildconfigs                          bc                                                                                     build.openshift.io/v1                         true         BuildConfig
builds                                                                                                                       build.openshift.io/v1                         true         Build
builds                                                                                                                       config.openshift.io/v1                        false        Build

// Upgrade to 4.14 nightly

# oc adm upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:eadcddba44e95b1e5d77c57a8f5eda35f3d94e5e0679f900bd5d8dd2cae573f8 --allow-explicit-upgrade --force
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.ci.openshift.org/ocp/release@sha256:eadcddba44e95b1e5d77c57a8f5eda35f3d94e5e0679f900bd5d8dd2cae573f8

// In the middle of the upgrade, check clusterversion. DeploymentConfig is implicitly enabled

# oc get clusterversion -oyaml
...
spec:
    capabilities:
      baselineCapabilitySet: None
    channel: stable-4.13
    clusterID: a0693b9e-75c4-4644-aba9-7ae6c0624ece
...
 status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - Console
      - DeploymentConfig
      - ImageRegistry
      - MachineAPI
      knownCapabilities:
      - Build
      - CSISnapshot
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
...
conditions:
  - lastTransitionTime: "2023-10-12T03:19:38Z"
      message: 'The following capabilities could not be disabled: Console, DeploymentConfig,
        ImageRegistry, MachineAPI'
      reason: CapabilitiesImplicitlyEnabled
      status: "True"
      type: ImplicitlyEnabledCapabilities

// After upgrade is complete, check clusterversion

# oc adm upgrade 
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.14.0-0.nightly-2023-10-10-084534 not found in the "stable-4.13" channel

Cluster version is 4.14.0-0.nightly-2023-10-10-084534

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13
# oc get clusterversion -oyaml
...
status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - Console
      - DeploymentConfig
      - ImageRegistry
      - MachineAPI
      knownCapabilities:
      - Build
      - CSISnapshot
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
    conditions:
    - lastTransitionTime: "2023-10-12T03:19:38Z"
      message: 'The following capabilities could not be disabled: Console, DeploymentConfig,
        ImageRegistry, MachineAPI'
      reason: CapabilitiesImplicitlyEnabled
      status: "True"
      type: ImplicitlyEnabledCapabilities

// Get build and dc resource

# oc get -A build
NAME      AGE
cluster   159m
# oc get -A dc
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
No resources found

# oc api-resources | grep -E -- "build|deploymentconfig"
deploymentconfigs                     dc                                                                                     apps.openshift.io/v1                          true         DeploymentConfig
builds                                                                                                                       config.openshift.io/v1                        false        Build

@jianlinliu

/label qe-approved

@evakhoni

Verification by @wking as well as @shellyyang1989 looks good to me.
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 12, 2023
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-20321, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.z) matches configured target version for branch (4.13.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-20431 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-20431 targets the "4.14.0" version, which is one of the valid target versions: 4.14.0
  • bug has dependents

Requesting review from QA contact:
/cc @evakhoni

@jianlinliu

label /cherry-pick-approved

@jianlinliu

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 12, 2023
if strings.HasPrefix(payloadUpdate.Release.Version, "4.14.") {
	deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
	if _, ok := work.Capabilities.EnabledCapabilities[deploymentConfig]; !ok && !capability.Contains(implicitlyEnabledCaps, deploymentConfig) {
		implicitlyEnabledCaps = append(implicitlyEnabledCaps, deploymentConfig)
	}
}

@wking The issue is that we have declared a capability, but no manifests are associated with it. We could hit this again in the future. Can we add a generic code structure for this, such as a method that returns the names of capabilities that should be enabled even though no manifests declare them? I would expect that structure to land in master first and then be backported as appropriate.

We can assess a refactor in the future, but right now we need this in a 4.13.z ASAP, because users will need to upgrade to a 4.13.z containing this logic before they can upgrade to 4.14.0. Otherwise, 4.13 clusters that have a static set of capabilities enabled will have the DeploymentConfig capability disabled when they upgrade to 4.14.0.

@bparees ack. Will add the labels

I've created OTA-1040 trying to feel out a long-term plan for this kind of thing.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2023
@LalatenduMohanty LalatenduMohanty left a comment

/label backport-risk-assessed

@LalatenduMohanty

/lgtm

@openshift-ci openshift-ci bot added backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. lgtm Indicates that a PR is ready to be merged. labels Oct 12, 2023
openshift-ci bot commented Oct 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, LalatenduMohanty, wking

Details Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,wking]

@openshift-ci openshift-ci bot merged commit c622124 into openshift:release-4.13 Oct 12, 2023
@openshift-ci-robot

@wking: Jira Issue OCPBUGS-20321: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-20321 has been moved to the MODIFIED state.

@openshift-merge-robot

Fix included in accepted release 4.13.0-0.nightly-2023-10-12-193258
