Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Sep 17, 2024

What this PR does / why we need it:

The pull propagates RetrievedUpdates from ClusterVersion up to HostedCluster

To make it easier for folks working at the management-cluster level to notice and address issues with update-recommendation retrieval.

Especially useful since de4bcbe (#3576) made updateService a configurable knob, because:

  • Users might configure a broken updateService, and the new condition would tell them that it wasn't working.
  • Users might install a 4.15 or earlier HostedCluster, where the HostedControlPlane controller (which is extracted from the HostedCluster's release image) was too old to have de4bcbe, so it doesn't propagate the configured updateService, and the cluster-version operator ends up using the default upstream update service. That doesn't work in disconnected/restricted-network environments where api.openshift.com is unreachable. The new condition would tell them that it wasn't working, and that the CVO was using the api.openshift.com service. They'd be left on their own to realize that the updateService feature required a 4.16 or later HostedCluster release image.

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

To make it easier for folks working at the management-cluster level to
notice and address issues with update-recommendation retrieval.

Especially useful since de4bcbe (api/v1beta1/hostedcluster_types:
Add spec.updateService, 2024-03-29, openshift#3576) made updateService a
configurable knob, because:

* Users might configure a broken updateService, and the new condition
  would tell them that it wasn't working.
* Users might install a 4.15 or earlier HostedCluster, where the
  HostedControlPlane controller (which is extracted from the
  HostedCluster's release image) was too old to have de4bcbe, so it
  doesn't propagate the configured updateService, and the
  cluster-version operator ends up using the default upstream update
  service.  That doesn't work in disconnected/restricted-network
  environments where api.openshift.com is unreachable.  The new
  condition would tell them that it wasn't working, and that the CVO
  was using the api.openshift.com service.  They'd be left on their
  own to realize that the updateService feature required a 4.16 or
  later HostedCluster release image.
@openshift-ci openshift-ci bot requested review from enxebre and hasueki September 17, 2024 23:42
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Sep 17, 2024
@openshift-ci openshift-ci bot added the area/documentation Indicates the PR includes changes for documentation label Sep 17, 2024
@netlify
Copy link

netlify bot commented Sep 17, 2024

Deploy Preview for hypershift-docs ready!

Name Link
🔨 Latest commit beff04d
🔍 Latest deploy log https://app.netlify.com/sites/hypershift-docs/deploys/66ea16e86079250008f7ba1d
😎 Deploy Preview https://deploy-preview-4744--hypershift-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

… to pick up ClusterVersionRetrievedUpdates

Generated with:

  $ go mod tidy
  $ go mod vendor

using:

  $ go version
  go version go1.23.1 linux/amd64
@wking wking force-pushed the propagate-ClusterVersion-RetrievedUpdates branch from 8ff1494 to 5e8cf46 Compare September 18, 2024 01:08
@wking
Copy link
Member Author

wking commented Sep 18, 2024

@wking
Copy link
Member Author

wking commented Sep 18, 2024

e2e-aws-4.17 TestCreateCluster artifacts:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

The HostedCluster doesn't set spec.channel, which means the cluster should not be attempting to retrieve update recommendations anyway. I dunno if the test collects ClusterVersion YAML from the hosted Kube API. So not all that much of a test, but 🤷

@wking
Copy link
Member Author

wking commented Sep 18, 2024

Ok, and in e2e-aws, where the HostedControlPlane controller understands the piping, TestCreateCluster artifacts:

  - lastTransitionTime: "2024-09-18T02:55:19Z"
    message: The update channel has not been configured.
    observedGeneration: 3
    reason: NoChannel
    status: "False"
    type: ClusterVersionRetrievedUpdates

So now we have examples of the new HyperShift operator code vs. both new (works) and old (Condition not found in the CVO) hosted cluster versions, and that all looks good to me. It's not found in HostedControlPlane despite being in ClusterVersion, but probably not worth nit-picking the existing not-found string over.

wking added a commit to wking/hypershift that referenced this pull request Sep 18, 2024
…wn ClusterVersionRetrievedUpdates

ClusterVersion has had the RetrievedUpdates condition since 2018 [1],
before OpenShift v4 went GA.  But until e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2023-01-21, openshift#1954), HostedCluster wasn't
interested in update-service recommendations.  And the
HostedControlPlane only learned how to propagate the condition in
5216915 (*: Propagate RetrievedUpdates from ClusterVersion up to
HostedCluster, 2024-09-17, openshift#4744).  To avoid long-running status like
[2]:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

when a new HyperShift operator (with e438076) is running a 4.17 or
earlier HostedCluster (and its HostedControlPlane operator, without
e438076), this commit silently drops the StatusUnknown condition
from HostedCluster.  We can revert this commit once all
still-supported HostedControlPlane operator versions understand how to
propagate the condition.

This commit's "if we could find it, pass it along" approach is also
compatible with 4.17.z HostedControlPlane controllers picking up the
ability to propagate the condition in a backport, if for some reason
we decide that behavior is worth backporting.

[1]: openshift/cluster-version-operator@286641d#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R100
[2]: openshift#4744 (comment)
@wking wking force-pushed the propagate-ClusterVersion-RetrievedUpdates branch from f16d279 to f8f5ceb Compare September 18, 2024 22:32
wking added a commit to wking/hypershift that referenced this pull request Sep 18, 2024
…wn ClusterVersionRetrievedUpdates

ClusterVersion has had the RetrievedUpdates condition since 2018 [1],
before OpenShift v4 went GA.  But until e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2023-01-21, openshift#1954), HostedCluster wasn't
interested in update-service recommendations.  And the
HostedControlPlane only learned how to propagate the condition in
5216915 (*: Propagate RetrievedUpdates from ClusterVersion up to
HostedCluster, 2024-09-17, openshift#4744).  To avoid long-running status like
[2]:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

when a new HyperShift operator (with e438076) is running a 4.17 or
earlier HostedCluster (and its HostedControlPlane operator, without
e438076), this commit silently drops the StatusUnknown condition
from HostedCluster.  We can revert this commit once all
still-supported HostedControlPlane operator versions understand how to
propagate the condition.

This commit's "if we could find it, pass it along" approach is also
compatible with 4.17.z HostedControlPlane controllers picking up the
ability to propagate the condition in a backport, if for some reason
we decide that behavior is worth backporting.

[1]: openshift/cluster-version-operator@286641d#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R100
[2]: openshift#4744 (comment)
…wn ClusterVersionRetrievedUpdates

ClusterVersion has had the RetrievedUpdates condition since 2018 [1],
before OpenShift v4 went GA.  But until e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2023-01-21, openshift#1954), HostedCluster wasn't
interested in update-service recommendations.  And the
HostedControlPlane only learned how to propagate the condition in
5216915 (*: Propagate RetrievedUpdates from ClusterVersion up to
HostedCluster, 2024-09-17, openshift#4744).  To avoid long-running status like
[2]:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

when a new HyperShift operator (with e438076) is running a 4.17 or
earlier HostedCluster (and its HostedControlPlane operator, without
e438076), this commit silently drops the StatusUnknown condition
from HostedCluster.  We can revert this commit once all
still-supported HostedControlPlane operator versions understand how to
propagate the condition.

This commit's "if we could find it, pass it along" approach is also
compatible with 4.17.z HostedControlPlane controllers picking up the
ability to propagate the condition in a backport, if for some reason
we decide that behavior is worth backporting.

[1]: openshift/cluster-version-operator@286641d#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R100
[2]: openshift#4744 (comment)
@wking wking force-pushed the propagate-ClusterVersion-RetrievedUpdates branch from f8f5ceb to fa5743e Compare September 18, 2024 22:56
@wking
Copy link
Member Author

wking commented Sep 20, 2024

e2e-aws has:

  - lastTransitionTime: "2024-09-18T23:37:31Z"
    message: The update channel has not been configured.
    observedGeneration: 3
    reason: NoChannel
    status: "False"
    type: ClusterVersionRetrievedUpdates

and e2e-aws-4.17 has:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/4744/pull-ci-openshift-hypershift-main-e2e-aws-4-17/1836539839235756032/artifacts/e2e-aws-4-17/hypershift-aws-run-e2e/artifacts/TestCreateCluster/namespaces/e2e-clusters-6lmhs/hypershift.openshift.io/hostedclusters/example-tjwjg.yaml | yaml2json | jq -r '.status.conditions[].type' | grep ClusterVersion | sort
ClusterVersionAvailable
ClusterVersionProgressing
ClusterVersionReleaseAccepted
ClusterVersionSucceeding
ClusterVersionUpgradeable

so fa5743e is working as intended.

@sjenning
Copy link
Contributor

/test e2e-aks

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2024
@sjenning
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 24, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking wking changed the title *: Propagate RetrievedUpdates from ClusterVersion up to HostedCluster OTA-1349: *: Propagate RetrievedUpdates from ClusterVersion up to HostedCluster Sep 24, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 24, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 24, 2024

@wking: This pull request references OTA-1349 which is a valid jira issue.

Details

In response to this:

What this PR does / why we need it:

The pull propagates RetrievedUpdates from ClusterVersion up to HostedCluster

To make it easier for folks working at the management-cluster level to notice and address issues with update-recommendation retrieval.

Especially useful since de4bcbe (#3576) made updateService a configurable knob, because:

  • Users might configure a broken updateService, and the new condition would tell them that it wasn't working.
  • Users might install a 4.15 or earlier HostedCluster, where the HostedControlPlane controller (which is extracted from the HostedCluster's release image) was too old to have de4bcbe, so it doesn't propagate the configured updateService, and the cluster-version operator ends up using the default upstream update service. That doesn't work in disconnected/restricted-network environments where api.openshift.com is unreachable. The new condition would tell them that it wasn't working, and that the CVO was using the api.openshift.com service. They'd be left on their own to realize that the updateService feature required a 4.16 or later HostedCluster release image.

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 9f15b4c and 2 for PR HEAD fa5743e in total

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 45aefe9 and 2 for PR HEAD fa5743e in total

@wking
Copy link
Member Author

wking commented Sep 24, 2024

/retest-required

@wking
Copy link
Member Author

wking commented Sep 24, 2024

ah, possibly we were failing until openshift/cloud-provider-aws#95 merged. Trying again now:

/retest-required

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD acc4f25 and 2 for PR HEAD fa5743e in total

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD bfb0477 and 1 for PR HEAD fa5743e in total

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD c34a1f6 and 0 for PR HEAD fa5743e in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 25, 2024

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 574c274 into openshift:main Sep 25, 2024
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: hypershift
This PR has been included in build ose-hypershift-container-v4.18.0-202409251142.p0.g574c274.assembly.stream.el9.
All builds following this will include this PR.

bryan-cox pushed a commit to bryan-cox/hypershift that referenced this pull request Sep 25, 2024
…wn ClusterVersionRetrievedUpdates

ClusterVersion has had the RetrievedUpdates condition since 2018 [1],
before OpenShift v4 went GA.  But until e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2023-01-21, openshift#1954), HostedCluster wasn't
interested in update-service recommendations.  And the
HostedControlPlane only learned how to propagate the condition in
5216915 (*: Propagate RetrievedUpdates from ClusterVersion up to
HostedCluster, 2024-09-17, openshift#4744).  To avoid long-running status like
[2]:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

when a new HyperShift operator (with e438076) is running a 4.17 or
earlier HostedCluster (and its HostedControlPlane operator, without
e438076), this commit silently drops the StatusUnknown condition
from HostedCluster.  We can revert this commit once all
still-supported HostedControlPlane operator versions understand how to
propagate the condition.

This commit's "if we could find it, pass it along" approach is also
compatible with 4.17.z HostedControlPlane controllers picking up the
ability to propagate the condition in a backport, if for some reason
we decide that behavior is worth backporting.

[1]: openshift/cluster-version-operator@286641d#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R100
[2]: openshift#4744 (comment)
@wking wking deleted the propagate-ClusterVersion-RetrievedUpdates branch September 25, 2024 15:53
pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025
…wn ClusterVersionRetrievedUpdates

ClusterVersion has had the RetrievedUpdates condition since 2018 [1],
before OpenShift v4 went GA.  But until e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2023-01-21, openshift#1954), HostedCluster wasn't
interested in update-service recommendations.  And the
HostedControlPlane only learned how to propagate the condition in
5216915 (*: Propagate RetrievedUpdates from ClusterVersion up to
HostedCluster, 2024-09-17, openshift#4744).  To avoid long-running status like
[2]:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

when a new HyperShift operator (with e438076) is running a 4.17 or
earlier HostedCluster (and its HostedControlPlane operator, without
e438076), this commit silently drops the StatusUnknown condition
from HostedCluster.  We can revert this commit once all
still-supported HostedControlPlane operator versions understand how to
propagate the condition.

This commit's "if we could find it, pass it along" approach is also
compatible with 4.17.z HostedControlPlane controllers picking up the
ability to propagate the condition in a backport, if for some reason
we decide that behavior is worth backporting.

[1]: openshift/cluster-version-operator@286641d#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R100
[2]: openshift#4744 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants