
Conversation

@csrwng (Contributor) commented Feb 6, 2024

What this PR does / why we need it:
Before this commit, the default security group was only used if no security groups were specified on a NodePool. After this change, the default security group will always be used on all workers. Any security groups defined in the NodePool will be added to the default.

Security group creation has been removed from the create infra CLI, given that it's a duplicate of the default one created by the control plane operator.
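
To illustrate the intended behavior, here is a minimal, self-contained Go sketch. The function name, signature, and sg-* IDs are hypothetical and are not the actual controller code; the sketch only shows that the default group created by the control plane operator is always present and that NodePool-specified groups are appended without duplicates.

  package main

  import "fmt"

  // mergeSecurityGroups sketches the intended behavior: the default worker
  // security group is always included first, and any groups listed on the
  // NodePool are appended, skipping duplicates. Names are illustrative only.
  func mergeSecurityGroups(defaultSG string, nodePoolSGs []string) []string {
      merged := []string{defaultSG}
      seen := map[string]bool{defaultSG: true}
      for _, sg := range nodePoolSGs {
          if sg == "" || seen[sg] {
              continue
          }
          seen[sg] = true
          merged = append(merged, sg)
      }
      return merged
  }

  func main() {
      // Previously, listing any group on the NodePool replaced the default;
      // after this change the default is always present.
      fmt.Println(mergeSecurityGroups("sg-default", []string{"sg-custom", "sg-default"}))
      // Output: [sg-default sg-custom]
  }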

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #HOSTEDCP-1419

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 6, 2024
@openshift-ci-robot commented Feb 6, 2024

@csrwng: This pull request references HOSTEDCP-1419 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


In response to this:

What this PR does / why we need it:
Before this commit, the default security group was only used if no security groups were specified on a NodePool. After this change, the default security group will always be used on all workers. Any security groups defined in the NodePool will be added to the default.

Security group creation has been removed from the create infra CLI, given that it's a duplicate of the default one created by the control plane operator.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #HOSTEDCP-1419

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added do-not-merge/needs-area do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Feb 6, 2024
@openshift-ci bot commented Feb 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@netlify bot commented Feb 6, 2024

Deploy Preview for hypershift-docs ready!

🔨 Latest commit: 493b1c8
🔍 Latest deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/65c1c5544dad0f0008ce3f0b
😎 Deploy Preview: https://deploy-preview-3527--hypershift-docs.netlify.app/reference/api

@csrwng (Contributor, Author) commented Feb 6, 2024

/test verify
/test unit
/test e2e-aws

@openshift-ci openshift-ci bot added area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/documentation Indicates the PR includes changes for documentation labels Feb 6, 2024
@openshift-ci bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Feb 6, 2024
@csrwng csrwng marked this pull request as ready for review February 6, 2024 17:35
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2024
Before this commit, the default security group was only used if no
security groups were specified on a NodePool. After this change, the
default security group will always be used on all workers. Any security
groups defined in the NodePool will be added to the default.

Security group creation has been removed from the create infra CLI,
given that it's a duplicate of the default one created by the control
plane operator.
@bryan-cox (Member)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 8b92350 and 2 for PR HEAD 1ae3a36 in total

@bryan-cox (Member)

/retest

@csrwng (Contributor, Author) commented Feb 7, 2024

/retest-required

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 5e50e63 and 1 for PR HEAD 1ae3a36 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 14d2436 and 0 for PR HEAD 1ae3a36 in total

@openshift-ci bot commented Feb 8, 2024

@csrwng: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                  Commit   Required  Rerun command
ci/prow/e2e-ibmcloud-iks   1ae3a36  false     /test e2e-ibmcloud-iks
ci/prow/e2e-ibmcloud-roks  1ae3a36  false     /test e2e-ibmcloud-roks

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot

/hold

Revision 1ae3a36 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2024
@csrwng (Contributor, Author) commented Feb 8, 2024

/override ci/prow/e2e-kubevirt-aws-ovn

@openshift-ci bot commented Feb 8, 2024

@csrwng: Overrode contexts on behalf of csrwng: ci/prow/e2e-kubevirt-aws-ovn


In response to this:

/override ci/prow/e2e-kubevirt-aws-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@csrwng (Contributor, Author) commented Feb 8, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 77ffd43 into openshift:main Feb 8, 2024
@openshift-bot

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-hypershift-container-v4.16.0-202402090109.p0.g77ffd43.assembly.stream.el9 for distgit hypershift.
All builds following this will include this PR.

wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift currently forces ClusterVersion spec.upstream
empty [3].  This commit is part of making the upstream update service
configurable on HyperShift, so those clusters can hear about and
assess known issues with updates in environments where the default
https://api.openshift.com/api/upgrades_info update service is not
accessible, or in which an alternative update service was desired for
testing or policy reasons. This includes [2], mentioned above, but
would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17,
openshift#1954). The recent 1ae3a36 (Always include AWS default
security group in worker security groups, 2024-02-05, openshift#3527)
suggests that the "v1alpha1 is a superset of v1beta1" policy remains
current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17,
openshift#1954). The recent 1ae3a36 (Always include AWS default
security group in worker security groups, 2024-02-05, openshift#3527)
suggests that the "v1alpha1 is a superset of v1beta1" policy remains
current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, docs/content/reference/api.md, and vendor
changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Mar 29, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
@csrwng csrwng deleted the additional_sgs branch July 12, 2024 15:55
pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365