
Conversation

@wking
Member

@wking wking commented Feb 14, 2024

XCMSTRAT-513 includes OTA-1185 pitching local update services in managed regions. However, HyperShift currently forces ClusterVersion spec.upstream empty. This commit is part of making the upstream update service configurable on HyperShift, so those clusters can hear about and assess known issues with updates in environments where the default https://api.openshift.com/api/upgrades_info update service is not accessible, or in which an alternative update service was desired for testing or policy reasons. This includes OTA-1185, mentioned above, but would also include any other instance of disconnected/restricted-network use. The alternatives for folks who want HyperShift updates in places where api.openshift.com is inaccessible are:

  • Don't use an update service and manage that aspect manually. But the update service declares multiple new releases each week, as well as delivering information about known risks/issues for local clusters to evaluate their exposure. That's a lot of information to manage manually, if folks decide not to plug into the existing update service tooling.
  • Run a local update service, and fiddle with DNS and X.509 certs so that packets aimed at api.openshift.com get routed to your local service. This one requires less long-term effort than manually replacing the entire update service system, but it requires your clusters to trust an X.509 Certificate Authority that is willing to sign certificates for your local service saying "yup, that one is definitely api.openshift.com".

One possible data path would be:

  1. HostedCluster (management cluster).
  2. HostedControlPlane (management cluster).
  3. ClusterVersion spec.upstream (hosted API).
  4. Cluster-version operator container (management cluster).

That pathway would only require changes to the HyperShift repo. But to avoid the URI passing through the customer-accessible hosted API, this commit adds a new --upstream command-line option to support:

  1. HostedCluster (management cluster).
  2. HostedControlPlane (management cluster).
  3. Cluster-version operator Deployment --upstream command line option (management cluster).
  4. Cluster-version operator container (management cluster).

If, in the future, we grow an option to give the hosted CVO kubeconfig access to both the management and hosted Kubernetes APIs, we could drop --upstream and have the hosted CVO reach out and read this configuration off HostedControlPlane or HostedCluster directly.
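
As a rough illustration of where the option lands, a spot-check on the management cluster might look like the following sketch (the hosted control plane namespace, Deployment name, and update-service URL are placeholders, not values taken from this PR):

  # Hypothetical check that the flag reached the hosted control plane's
  # CVO Deployment on the management cluster; names and URL are illustrative.
  $ oc -n clusters-example get deployment cluster-version-operator -o yaml | grep -- '--upstream'
        - --upstream=https://updates.example.com/api/upgrades_info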

@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 14, 2024

@wking: This pull request references OTA-1210 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


In response to this:

XCMSTRAT-513 includes OTA-1185 pitching local update services in managed regions. However, HyperShift currently forces ClusterVersion spec.upstream empty. This commit is part of making the upstream update service configurable on HyperShift, so those clusters can hear about and assess known issues with updates in environments where the default https://api.openshift.com/api/upgrades_info update service is not accessible, or in which an alternative update service was desired for testing or policy reasons. This includes OTA-1185, mentioned above, but would also include any other instance of disconnected/restricted-network use. The alternatives for folks who want HyperShift updates in places where api.openshift.com is inaccessible are:

  • Don't use an update service and manage that aspect manually. But the update service declares multiple new releases each week, as well as delivering information about known risks/issues for local clusters to evaluate their exposure. That's a lot of information to manage manually, if folks decide not to plug into the existing update service tooling.
  • Run a local update service, and fiddle with DNS and X.509 certs so that packets aimed at api.openshift.com get routed to your local service. This one requires less long-term effort than manually replacing the entire update service system, but it requires your clusters to trust an X.509 Certificate Authority that is willing to sign certificates for your local service saying "yup, that one is definitely api.openshift.com".

One possible data path would be:

  1. HostedCluster (management cluster).
  2. HostedControlPlane (management cluster).
  3. ClusterVersion spec.upstream (hosted API).
  4. Cluster-version operator container (management cluster).

That pathway would only require changes to the HyperShift repo. But to avoid the URI passing through the customer-accessible hosted API, this commit adds a new --upstream command-line option to support:

  1. HostedCluster (management cluster).
  2. HostedControlPlane (management cluster).
  3. Cluster-version operator Deployment --upstream command line option (management cluster).
  4. Cluster-version operator container (management cluster).

If, in the future, we grow an option to give the hosted CVO kubeconfig access to both the management and hosted Kubernetes APIs, we could drop --upstream and have the hosted CVO reach out and read this configuration off HostedControlPlane or HostedCluster directly.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 14, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2024
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift currently forces ClusterVersion spec.upstream
empty [3].  This commit is part of making the upstream update service
configurable on HyperShift, so those clusters can hear about and
assess known issues with updates in environments where the default
https://api.openshift.com/api/upgrades_info update service is not
accessible, or in which an alternative update service was desired for
testing or policy reasons. This includes [2], mentioned above, but
would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
@wking wking force-pushed the command-line-upstream branch 3 times, most recently from 4aff8b4 to 922014d on February 14, 2024 10:10
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, docs/content/reference/api.md, and vendor
changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
@wking wking force-pushed the command-line-upstream branch 2 times, most recently from 20c0fa3 to 64fb226 on February 23, 2024 05:52
@DavidHurta
Contributor

/cc

@openshift-ci openshift-ci bot requested a review from DavidHurta February 26, 2024 18:24
Member

@petr-muller petr-muller left a comment


/lgtm
/hold

I'm not a great fan of the upstream name, so I'm holding the PR in case you or anyone else has similar doubts, but feel free to ignore that and unhold.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2024
@wking wking force-pushed the command-line-upstream branch from 64fb226 to 66cc580 on March 7, 2024 19:51
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 7, 2024
wking added a commit to wking/cluster-version-operator that referenced this pull request Mar 7, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift currently forces ClusterVersion spec.upstream
empty [3].  This commit is part of making the upstream update service
configurable on HyperShift, so those clusters can hear about and
assess known issues with updates in environments where the default
https://api.openshift.com/api/upgrades_info update service is not
accessible, or in which an alternative update service was desired for
testing or policy reasons. This includes [2], mentioned above, but
would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

One possible data path would be:

1. HostedCluster (management cluster).
2. HostedControlPlane (management cluster).
3. ClusterVersion spec.upstream (hosted API).
4. Cluster-version operator container (management cluster).

That pathway would only require changes to the HyperShift repo.  But
to avoid the URI passing through the customer-accessible hosted API, this
commit adds a new --update-service command-line option to support:

1. HostedCluster (management cluster).
2. HostedControlPlane (management cluster).
3. Cluster-version operator Deployment --update-service command line
   option (management cluster).
4. Cluster-version operator container (management cluster).

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'd initially gone with --upstream, but moved to --update-service and
updateService, etc. based on Petr's reasonable point that "upstream"
is pretty generic, "update service" is not much longer and is much
more specific, and diverging from the ClusterVersion spec.upstream
precedent isn't that terrible [4].

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: https://github.com/openshift/hypershift/blob/5e50e633fefd88aab9588d660c4b5daddd950d9a/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L1059
[4]: openshift#1035 (review)
@wking wking force-pushed the command-line-upstream branch from 66cc580 to 04c32b5 on March 7, 2024 19:56
@wking wking changed the title from "OTA-1210: *: Add --upstream command-line option" to "OTA-1210: *: Add --update-service command-line option" Mar 7, 2024
@wking wking force-pushed the command-line-upstream branch from 04c32b5 to b4abe48 on March 7, 2024 20:08
@wking
Member Author

wking commented Mar 9, 2024

/retest-required

@wking
Member Author

wking commented Mar 11, 2024

Rename request addressed; thanks!

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2024
@openshift-ci
Contributor

openshift-ci bot commented Mar 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shellyyang1989
Contributor

PR pre-merge tested and passed.

  1. Set upstream on HCP and got correct available updates
# oc edit hc -n clusters pr-3576
spec:
  autoscaling: {}
  channel: candidate-4.16
  upstream: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge.json
  ...

# oc adm upgrade --kubeconfig=./hc.kubeconfig --include-not-recommended
Cluster version is 4.16.0-0.test-2024-03-28-103443-ci-ln-7s8866t-latest

Upstream: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge.json
Channel: candidate-4.16

Recommended updates:

  VERSION                            IMAGE
  4.17.0-0.nightly-2023-12-20-333333 registry.ci.openshift.org/ocp/release@sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0
  4.17.0-0.nightly-2023-12-19-222222 registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28

Supported but not recommended updates:

  Version: 4.17.0-0.nightly-2023-12-18-111111
  Image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
  Recommended: False
  Reason: ExposedToRisks
  Message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634

  Version: 4.17.0-0.nightly-2023-12-17-000000
  Image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
  Recommended: False
  Reason: SomeInvokerThing
  Message: On clusters on default invoker user, this imaginary bug can happen. https://bug.example.com/a

  2. Regression test on OCP by setting upstream in CV and got correct updates
# oc adm upgrade 
Cluster version is 4.16.0-0.test-2024-03-29-025123-ci-ln-bzlt8y2-latest

Upstream: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy.json
Channel: stable-4.16 (available channels: channel-a, channel-b, nightly-4.9)

Recommended updates:

  VERSION                            IMAGE
  4.17.0-0.nightly-2021-08-28-150051 registry.svc.ci.openshift.org/ocp/release@sha256:4f0ee87e83419d2e0a86bb386585a66652e6a072f50bcb42180ff547b0c995d6

@shellyyang1989
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Mar 29, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 29, 2024

@wking: This pull request references OTA-1210 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


In response to this:

XCMSTRAT-513 includes OTA-1185 pitching local update services in managed regions. However, HyperShift currently forces ClusterVersion spec.upstream empty. This commit is part of making the upstream update service configurable on HyperShift, so those clusters can hear about and assess known issues with updates in environments where the default https://api.openshift.com/api/upgrades_info update service is not accessible, or in which an alternative update service was desired for testing or policy reasons. This includes OTA-1185, mentioned above, but would also include any other instance of disconnected/restricted-network use. The alternatives for folks who want HyperShift updates in places where api.openshift.com is inaccessible are:

  • Don't use an update service and manage that aspect manually. But the update service declares multiple new releases each week, as well as delivering information about known risks/issues for local clusters to evaluate their exposure. That's a lot of information to manage manually, if folks decide not to plug into the existing update service tooling.
  • Run a local update service, and fiddle with DNS and X.509 certs so that packets aimed at api.openshift.com get routed to your local service. This one requires less long-term effort than manually replacing the entire update service system, but it requires your clusters to trust an X.509 Certificate Authority that is willing to sign certificates for your local service saying "yup, that one is definitely api.openshift.com".

One possible data path would be:

  1. HostedCluster (management cluster).
  2. HostedControlPlane (management cluster).
  3. ClusterVersion spec.upstream (hosted API).
  4. Cluster-version operator container (management cluster).

That pathway would only require changes to the HyperShift repo. But to avoid the URI passing through the customer-accessible hosted API, this commit adds a new --upstream command-line option to support:

  1. HostedCluster (management cluster).
  2. HostedControlPlane (management cluster).
  3. Cluster-version operator Deployment --upstream command line option (management cluster).
  4. Cluster-version operator container (management cluster).

If, in the future, we grow an option to give the hosted CVO kubeconfig access to both the management and hosted Kubernetes APIs, we could drop --upstream and have the hosted CVO reach out and read this configuration off HostedControlPlane or HostedCluster directly.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD e3677e0 and 2 for PR HEAD b4abe48 in total

@wking
Member Author

wking commented Mar 29, 2024

cluster-node-tuning-operator watches are unrelated (openshift/machine-config-operator#4295). Not worth a retest, since that was the only failure.

/override ci/prow/e2e-agnostic-ovn

@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2024

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-ovn


In response to this:

cluster-node-tuning-operator watches are unrelated (openshift/machine-config-operator#4295). Not worth a retest, since that was the only failure.

/override ci/prow/e2e-agnostic-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2024

@wking: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 69bee91 into openshift:master Mar 29, 2024
@wking wking deleted the command-line-upstream branch March 29, 2024 20:19
wking added a commit to wking/hypershift that referenced this pull request Mar 29, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).
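
Concretely, step 3 amounts to appending one more argument when the
HostedControlPlane controller renders the hosted CVO Deployment.  The
sketch below is illustrative only; the helper and field names are
placeholders rather than the exact identifiers in this repo:

  package main

  import "fmt"

  // appendUpdateServiceArg sketches step 3 above: thread a
  // HostedControlPlane spec.updateService value into the hosted CVO
  // container's args as --update-service.  Names are illustrative.
  func appendUpdateServiceArg(args []string, updateService string) []string {
      if updateService == "" {
          // Nothing configured; leave the CVO's default behavior alone.
          return args
      }
      return append(args, fmt.Sprintf("--update-service=%s", updateService))
  }

  func main() {
      // Placeholder base args; the real Deployment carries the CVO's
      // full argument list.
      args := []string{"start"}
      args = appendUpdateServiceArg(args, "https://updates.example.com/api/upgrades_info")
      fmt.Println(args)
  }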

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.
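
Stated as code, the precedence is roughly the following; this is a
behavioral sketch only, not the CVO's actual implementation (which
lives in [3]):

  package main

  import "fmt"

  // resolveUpdateService sketches the precedence described above: a
  // non-empty --update-service flag wins, then ClusterVersion
  // spec.upstream, then the well-known default.  Illustrative only.
  func resolveUpdateService(updateServiceFlag, specUpstream string) string {
      if updateServiceFlag != "" {
          return updateServiceFlag
      }
      if specUpstream != "" {
          return specUpstream
      }
      return "https://api.openshift.com/api/upgrades_info"
  }

  func main() {
      // With --update-service set, spec.upstream is present but ignored.
      fmt.Println(resolveUpdateService(
          "https://updates.example.com/api/upgrades_info",
          "https://ignored.example.com/api/upgrades_info"))
  }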

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.
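
If that follow-up ever lands, I'd expect it to be roughly a
one-constant addition beside the ones grepped above, plus the wiring
to copy the condition up from ClusterVersion.  A hypothetical sketch,
not part of this commit:

  package v1beta1

  // ConditionType already exists in hostedcluster_conditions.go; it is
  // repeated here only so this sketch stands alone.
  type ConditionType string

  // ClusterVersionRetrievedUpdates would mirror ClusterVersion's
  // RetrievedUpdates condition [4], surfacing update-service fetch
  // problems on HostedCluster status.  Hypothetical follow-up only.
  const ClusterVersionRetrievedUpdates ConditionType = "ClusterVersionRetrievedUpdates"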

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-version-operator-container-v4.16.0-202403292312.p0.g69bee91.assembly.stream.el9 for distgit cluster-version-operator.
All builds following this will include this PR.

pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
qe-approved: Signifies that QE has signed off on this PR.
