
Conversation

@wking (Member) commented Dec 14, 2022

So that folks driving a HostedCluster have a way to steer their update-recommendation engine.
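
As a rough sketch of the shape these additions take, assuming the new v1beta1 fields mirror the corresponding ClusterVersion types in github.com/openshift/api (the type names, field set, and tags below are illustrative, not copied from the merged change):

  package v1beta1

  import (
      configv1 "github.com/openshift/api/config/v1"
  )

  // Sketch only: the spec side gains a channel the hosted CVO can report against.
  type ClusterVersionSpecSketch struct {
      // channel requests a specific set of update recommendations,
      // for example "stable-4.13".
      Channel string `json:"channel,omitempty"`
  }

  // Sketch only: the status side surfaces the CVO's update recommendations.
  type ClusterVersionStatusSketch struct {
      // desired is the release the cluster-version operator is reconciling toward.
      Desired configv1.Release `json:"desired"`
      // availableUpdates lists releases that are recommended without known risks.
      AvailableUpdates []configv1.Release `json:"availableUpdates"`
      // conditionalUpdates lists releases that are recommended only for clusters
      // not exposed to the associated risks.
      ConditionalUpdates []configv1.ConditionalUpdate `json:"conditionalUpdates,omitempty"`
  }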

@openshift-ci openshift-ci bot requested review from enxebre and sjenning December 14, 2022 18:53
@wking wking force-pushed the expand-cluster-version-api-compat branch 11 times, most recently from 075ddf7 to 5312e26 Compare December 14, 2022 22:10
git diff-index --cached --quiet --ignore-submodules HEAD --
git diff-files --quiet --ignore-submodules
echo "WTK"
git diff --exit-code HEAD --
@wking (Member, Author) commented:

I'm trying to figure out what auto-generated changes the verify presubmit is asking for, but this code doesn't seem to be running. At least, I don't see any WTK output before the command exits a few lines down...

@wking (Member, Author) commented:

And I was finally able to drag out a patch by base64-encoding it so the wrapping Make/ci-operator business wouldn't garble the output: 03dfcc2. But obviously it would be nice to have a more accessible approach...
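
(For anyone repeating that trick: a small throwaway helper along these lines, which is not part of this PR, can turn the base64 blob copied from the CI log back into a patch that git apply understands.)

  package main

  import (
      "encoding/base64"
      "fmt"
      "io"
      "os"
      "strings"
  )

  // Read the base64-encoded diff (as printed by the Makefile's $(error ...) hack)
  // from stdin, decode it, and write it out as verify.patch.
  func main() {
      raw, err := io.ReadAll(os.Stdin)
      if err != nil {
          fmt.Fprintln(os.Stderr, "read stdin:", err)
          os.Exit(1)
      }
      patch, err := base64.StdEncoding.DecodeString(strings.TrimSpace(string(raw)))
      if err != nil {
          fmt.Fprintln(os.Stderr, "decode:", err)
          os.Exit(1)
      }
      if err := os.WriteFile("verify.patch", patch, 0o644); err != nil {
          fmt.Fprintln(os.Stderr, "write verify.patch:", err)
          os.Exit(1)
      }
  }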

@wking wking force-pushed the expand-cluster-version-api-compat branch from 6f4f07e to a14f023 Compare December 14, 2022 22:51
@wking wking changed the title OTA-852: WIP: api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and conditionalUpdates OTA-852: api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and conditionalUpdates Dec 14, 2022
@wking wking force-pushed the expand-cluster-version-api-compat branch 3 times, most recently from 07ec5df to 03dfcc2 Compare December 14, 2022 23:51
Makefile Outdated
git diff-files --quiet --ignore-submodules
git diff --exit-code HEAD -- > "$(ARTIFACT_DIR)/x.patch"
$(eval STATUS = $(shell sh -c 'git diff --exit-code HEAD -- | base64 -w0'))
$(if $(strip $(STATUS)),$(error Git diff detected changes: ${STATUS}))
@wking (Member, Author) commented:

We'll want to drop this cruft that helps me update autogenerated content once folks feel like the rest of the changes look good. It would be awesome if there were a change we could actually land to expose this diff in CI artifacts, but even if someone figures that out, it would be orthogonal work for a separate pull request.

/hold

@wking (Member, Author) commented:

I've dropped the git-diff WIP commit with 27b2a8a -> 97445d7, now that I seem to have everything we need.

/hold cancel

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2022
@wking (Member, Author) commented Dec 15, 2022

I don't understand the TestFuzzyConversion failures. Seems like those are defined for v1alpha1:

$ git grep 'for HostedCluster' | grep _test.go
api/v1alpha1/zz_conversion_test.go:     t.Run("for HostedCluster", conversiontest.FuzzTestFunc(conversiontest.FuzzTestFuncInput{

Maybe those need to be dropped? Told to test v1alpha1? Or ported to v1beta1 and modified? Or...?

  // If the cluster is not yet fully initialized desired will be set
  // with the information available, which may be an image or a tag.
- Desired Release `json:"desired"`
+ Desired configv1.Release `json:"desired"`
@wking (Member, Author) commented:

Strict superset, property-wise, so this should be a non-breaking change for clients.

@petr-muller (Member) commented:

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller December 15, 2022 12:47
// lastReleaseImageTransitionTime is the time of the last update to the current
// releaseImage property.
//
// Deprecated: Use version.history[0].startedTime instead.
A reviewer (Member) commented:

Same as the above comment

A reviewer (Contributor) commented:

This is fine as long as we don't remove it

@wking wking force-pushed the expand-cluster-version-api-compat branch 2 times, most recently from b462287 to d290932 Compare January 10, 2023 01:34
@wking wking force-pushed the expand-cluster-version-api-compat branch from 811cc13 to 4af5ffa Compare January 22, 2023 00:08
@LalatenduMohanty (Member) commented:

/retest

openshift-ci bot (Contributor) commented Jan 26, 2023

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-ibmcloud-iks
Commit: 4af5ffa
Required: false
Rerun command: /test e2e-ibmcloud-iks

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@LalatenduMohanty (Member) left a comment:

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 26, 2023
@csrwng (Contributor) commented Jan 26, 2023

/hold cancel
/approve

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2023
openshift-ci bot (Contributor) commented Jan 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 26, 2023
@openshift-merge-robot openshift-merge-robot merged commit 928b93b into openshift:main Jan 26, 2023
@wking wking deleted the expand-cluster-version-api-compat branch January 27, 2023 00:10
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift currently forces ClusterVersion spec.upstream
empty.  This commit is part of making the upstream update service
configurable on HyperShift, so those clusters can hear about and
assess known issues with updates in environments where the default
https://api.openshift.com/api/upgrades_info update service is not
accessible, or in which an alternative update service is desired for
testing or policy reasons.  This includes [2], mentioned above, but
would also include any other instance of
disconnected/restricted-network use.  The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service.  This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.
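
A minimal sketch of step 3 above, assuming the HostedControlPlane exposes the value as a configv1.URL (the helper and field names here are illustrative, not the merged code):

  package sketch

  import configv1 "github.com/openshift/api/config/v1"

  // appendUpstreamArg illustrates the pass-through: when the HostedControlPlane
  // carries a spec.upstream value, it becomes a --upstream flag on the
  // cluster-version operator container; otherwise the CVO keeps its default.
  func appendUpstreamArg(args []string, upstream configv1.URL) []string {
      if upstream != "" {
          args = append(args, "--upstream="+string(upstream))
      }
      return args
  }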

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17,
openshift#1954).  The recent 1ae3a36 (Always include AWS default
security group in worker security groups, 2024-02-05, openshift#3527)
suggests that the "v1alpha1 is a superset of v1beta1" policy remains
current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Mar 29, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
is desired for testing or policy reasons.  This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use.  The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service.  This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Sep 18, 2024
…wn ClusterVersionRetrievedUpdates

ClusterVersion has had the RetrievedUpdates condition since 2018 [1],
before OpenShift v4 went GA.  But until e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2023-01-21, openshift#1954), HostedCluster wasn't
interested in update-service recommendations.  And the
HostedControlPlane only learned how to propagate the condition in
5216915 (*: Propagate RetrievedUpdates from ClusterVersion up to
HostedCluster, 2024-09-17, openshift#4744).  To avoid long-running status like
[2]:

  - lastTransitionTime: "2024-09-18T01:45:13Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionRetrievedUpdates

when a new HyperShift operator (with e438076) is running a 4.17 or
earlier HostedCluster (and its HostedControlPlane operator, without
e438076), this commit silently drops the StatusUnknown condition
from HostedCluster.  We can revert this commit once all
still-supported HostedControlPlane operator versions understand how to
propagate the condition.

This commit's "if we could find it, pass it along" approach is also
compatible with 4.17.z HostedControlPlane controllers picking up the
ability to propagate the condition in a backport, if for some reason
we decide that behavior is worth backporting.

[1]: openshift/cluster-version-operator@286641d#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R100
[2]: openshift#4744 (comment)
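
A minimal sketch of that "if we could find it, pass it along" behavior, assuming the standard apimachinery condition helpers (the helper name and the surrounding HyperShift field paths are illustrative, not the actual code):

  package sketch

  import (
      "k8s.io/apimachinery/pkg/api/meta"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  // propagateRetrievedUpdates copies the ClusterVersionRetrievedUpdates condition
  // from the HostedControlPlane's conditions to the HostedCluster's conditions,
  // but only when the HostedControlPlane controller actually reported it.  Older
  // controllers that never set the condition contribute nothing, instead of a
  // long-lived Unknown entry.
  func propagateRetrievedUpdates(hcpConditions []metav1.Condition, hcConditions *[]metav1.Condition) {
      if cond := meta.FindStatusCondition(hcpConditions, "ClusterVersionRetrievedUpdates"); cond != nil {
          meta.SetStatusCondition(hcConditions, *cond)
      }
  }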
bryan-cox pushed a commit to bryan-cox/hypershift that referenced this pull request Sep 25, 2024
…wn ClusterVersionRetrievedUpdates

pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025
pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025
…wn ClusterVersionRetrievedUpdates
