-
Notifications
You must be signed in to change notification settings - Fork 434
OTA-1211: api/v1beta1/hostedcluster_types: Add spec.updateService #3576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTA-1211: api/v1beta1/hostedcluster_types: Add spec.updateService #3576
Conversation
|
@wking: This pull request references OTA-1211 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
3a93d89 to
e204679
Compare
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Outdated
Show resolved
Hide resolved
e204679 to
872d597
Compare
✅ Deploy Preview for hypershift-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
|
openshift/cluster-version-operator#1035 needs to land support for /hold |
872d597 to
7d39a44
Compare
|
PR pre-merge tested and passed, so /label qe-approved |
|
@wking: This pull request references OTA-1211 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[1] includes [2] pitching local update services in managed regions. However, HyperShift had (until this commit) been forcing ClusterVersion spec.upstream empty. This commit is part of making the upstream update service configurable on HyperShift, so those clusters can hear about and assess known issues with updates in environments where the default https://api.openshift.com/api/upgrades_info update service is not accessible, or in which an alternative update service was desired for testing or policy reasons. This includes [2], mentioned above, but would also include any other instance of disconnected/restricted-network use. The alternative for folks who want HyperShift updates in places where api.openshift.com is inaccessible are: * Don't use an update service and manage that aspect manually. But the update service declares multiple new releases each week, as well as delivering information about known risks/issues for local clusters to evalute their exposure. That's a lot of information to manage manually, if folks decide not to plug into the existing update service tooling. * Run a local update service, and fiddle with DNS and X.509 certs so that packets aimed at api.openshift.com get routed to your local service . This one requires less long-term effort than manually replacing the entire update service system, but it requires your clusters to trust an X.509 Certificate Authority that is willing to sign certificates for your local service saying "yup, that one is definitely api.openshift.com". The implementation is similar to e438076 (api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related status properties piped up via CVO -> ClusterVersion.status -> HostedControlPlane.status -> HostedCluster.status. This commit adds a new spec property that is piped down via: 1. HostedCluster (management cluster) spec.updateService. 2. HostedControlPlane (management cluster) spec.updateService. 3. Cluster-version operator Deployment --update-service command line option (management cluster), and also ClusterVersion spec.upstream. 4. Cluster-version operator container (managment cluster). The CVO's new --update-service option is from [3], and we're using that pathway instead of ClusterVersion spec, because we don't want the update service URI to come via the the customer-accessible hosted API. It's ok for the channel (which is used as a query parameter in the update-service GET requests) to continue to flow in via ClusterVersion spec. I'm still populating ClusterVersion's spec.upstream in this commit for discoverability, although because --update-service is being used, the CVO will ignore the ClusterVersion spec.upstream value. When the HyperShift operator sets this in HostedControlPlane for an older cluster version, the old HostedControlPlane controller (launched from the hosted release payload) will not recognize or propagate the new property. But that's not a terrible thing, and issues with fetching update recommendations from the default update service will still be reported in the RetrievedUpdates condition [4] with a message mentioning the update service URI if there are problems [5]. Although it doesn't look like we capture that condition currently: $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/ api/hypershift/v1beta1/hostedcluster_conditions.go: ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding" api/hypershift/v1beta1/hostedcluster_conditions.go: ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable" api/hypershift/v1beta1/hostedcluster_conditions.go: ClusterVersionFailing ConditionType = "ClusterVersionFailing" api/hypershift/v1beta1/hostedcluster_conditions.go: ClusterVersionProgressing ConditionType = "ClusterVersionProgressing" api/hypershift/v1beta1/hostedcluster_conditions.go: ClusterVersionAvailable ConditionType = "ClusterVersionAvailable" api/hypershift/v1beta1/hostedcluster_conditions.go: ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted" We could opt to collect it in follow-up work if issues occur frequently enough that dropping down to the ClusterVersion level to debug becomes tedious. If, in the future, we grow an option to give the hosted CVO kubeconfig access to both the management and hosted Kubernetes APIs, we could drop --update-service and have the hosted CVO reach out and read this configuration off HostedControlPlane or HostedCluster directly. I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954). The recent 1ae3a36 (Always include AWS default security group in worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1 is a superset of v1beta1" policy remains current. The cmd/install/assets, client/applyconfiguration, docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and vendor changes are automatic via: $ make update [1]: https://issues.redhat.com/browse/XCMSTRAT-513 [2]: https://issues.redhat.com/browse/OTA-1185 [3]: openshift/cluster-version-operator#1035 [4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706 [5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
7d39a44 to
de4bcbe
Compare
|
openshift/cluster-version-operator#1035 merged and I've updated here with 7d39a44 -> de4bcbe to use the |
|
@wking: This pull request references OTA-1211 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
I can launch clusters with this new HostedControlPlane controller via Cluster Bot and |
|
/hold cancel |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
sjenning
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just one nit
| ControlPlaneReleaseImage *string `json:"controlPlaneReleaseImage,omitempty"` | ||
|
|
||
| // updateService may be used to specify the preferred upstream update service. | ||
| // By default it will use the appropriate update service for the cluster and region. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, the field is empty. Any defaulting would occur above this API in OCM/ACM/etc. This comment confused me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a follow-on clean up s/By default it/By default, the CVO/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More future alternative wording options:
By default, lower-level components will use the appropriate update service for the cluster and region.
or:
When empty, lower-level components will use the appropriate update service for the cluster and region.
to avoid committing to the CVO as the lower-level actor that is filling in an opinion when HostedControlPlane has an empty-string here.
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sjenning, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
To make it easier for folks working at the management-cluster level to notice and address issues with update-recommendation retrieval. Especially useful since de4bcbe (api/v1beta1/hostedcluster_types: Add spec.updateService, 2024-03-29, openshift#3576) made updateService a configurable knob, because: * Users might configure a broken updateService, and the new condition would tell them that it wasn't working. * Users might install a 4.15 or earlier HostedCluster, where the HostedControlPlane controller (which is extracted from the HostedCluster's release image) was too old to have de4bcbe, so it doesn't propagate the configured updateService, and the cluster-version operator ends up using the default upstream update service. That doesn't work in disconnected/restricted-network environments where api.openshift.com is unreachable. The new condition would tell them that it wasn't working, and that the CVO was using the api.openshift.com service. They'd be left on their own to realize that the updateService feature required a 4.16 or later HostedCluster release image.
To make it easier for folks working at the management-cluster level to notice and address issues with update-recommendation retrieval. Especially useful since de4bcbe (api/v1beta1/hostedcluster_types: Add spec.updateService, 2024-03-29, openshift#3576) made updateService a configurable knob, because: * Users might configure a broken updateService, and the new condition would tell them that it wasn't working. * Users might install a 4.15 or earlier HostedCluster, where the HostedControlPlane controller (which is extracted from the HostedCluster's release image) was too old to have de4bcbe, so it doesn't propagate the configured updateService, and the cluster-version operator ends up using the default upstream update service. That doesn't work in disconnected/restricted-network environments where api.openshift.com is unreachable. The new condition would tell them that it wasn't working, and that the CVO was using the api.openshift.com service. They'd be left on their own to realize that the updateService feature required a 4.16 or later HostedCluster release image.
To make it easier for folks working at the management-cluster level to notice and address issues with update-recommendation retrieval. Especially useful since de4bcbe (api/v1beta1/hostedcluster_types: Add spec.updateService, 2024-03-29, openshift#3576) made updateService a configurable knob, because: * Users might configure a broken updateService, and the new condition would tell them that it wasn't working. * Users might install a 4.15 or earlier HostedCluster, where the HostedControlPlane controller (which is extracted from the HostedCluster's release image) was too old to have de4bcbe, so it doesn't propagate the configured updateService, and the cluster-version operator ends up using the default upstream update service. That doesn't work in disconnected/restricted-network environments where api.openshift.com is unreachable. The new condition would tell them that it wasn't working, and that the CVO was using the api.openshift.com service. They'd be left on their own to realize that the updateService feature required a 4.16 or later HostedCluster release image.
To make it easier for folks working at the management-cluster level to notice and address issues with update-recommendation retrieval. Especially useful since de4bcbe (api/v1beta1/hostedcluster_types: Add spec.updateService, 2024-03-29, openshift#3576) made updateService a configurable knob, because: * Users might configure a broken updateService, and the new condition would tell them that it wasn't working. * Users might install a 4.15 or earlier HostedCluster, where the HostedControlPlane controller (which is extracted from the HostedCluster's release image) was too old to have de4bcbe, so it doesn't propagate the configured updateService, and the cluster-version operator ends up using the default upstream update service. That doesn't work in disconnected/restricted-network environments where api.openshift.com is unreachable. The new condition would tell them that it wasn't working, and that the CVO was using the api.openshift.com service. They'd be left on their own to realize that the updateService feature required a 4.16 or later HostedCluster release image.
XCMSTRAT-513 includes OTA-1185 pitching local update services in managed regions. However, HyperShift had (until this pull) been forcing ClusterVersion
spec.upstreamempty. This pull is part of making the upstream update service configurable on HyperShift, so those clusters can hear about and assess known issues with updates in environments where the default https://api.openshift.com/api/upgrades_info update service is not accessible, or in which an alternative update service was desired for testing or policy reasons. This includes OTA-1185, mentioned above, but would also include any other instance of disconnected/restricted-network use. The alternative for folks who want HyperShift updates in places where api.openshift.com is inaccessible are:The implementation is similar to e438076 (#1954), where I added update-related status properties piped up via CVO -> ClusterVersion
.status-> HostedControlPlane.status-> HostedCluster.status. This commit adds a new spec property that is piped down via:spec.updateService.spec.UpdateService.--update-servicecommand line option (management cluster), and also ClusterVersionspec.upstream.The CVO's new
--update-serviceoption is from openshift/cluster-version-operator#1035, and we're using that pathway instead of ClusterVersionspec, because we don't want the update service URI to come via the the customer-accessible hosted API. It's ok for thechannel(which is used as a query parameter in the update-service GET requests) to continue to flow in via ClusterVersionspec. I'm still populating ClusterVersion'sspec.upstreamin this commit for discoverability, although because--update-serviceis being used, the CVO will ignore the ClusterVersionspec.upstreamvalue.When the HyperShift operator sets this in HostedControlPlane for an older cluster version, the old HostedControlPlane controller (launched from the hosted release payload) will not recognize or propagate the new property. But that's not a terrible thing, and issues with fetching update recommendations from the default update service will still be reported in the
RetrievedUpdatescondition with a message mentioning the update service URI if there are problems. Although it doesn't look like we capture that condition currently:We could opt to collect it in follow-up work if issues occur frequently enough that dropping down to the ClusterVersion level to debug becomes tedious.
If, in the future, we grow an option to give the hosted CVO kubeconfig access to both the management and hosted Kubernetes APIs, we could drop
--update-serviceand have the hosted CVO reach out and read this configuration off HostedControlPlane or HostedCluster directly.I'm bumping v1alpha1 as described in 4af5ffa (#1954). The recent 1ae3a36 (#3527) suggests that the "v1alpha1 is a superset of v1beta1" policy remains current.
Fixes OTA-1211