
Conversation

@csrwng (Contributor) commented Feb 6, 2024

What this PR does / why we need it:
Before this commit, the default security group was only used if no security groups were specified on a NodePool. After this change, the default security group will always be used on all workers. Any security groups defined in the NodePool will be added to the default.

Security group creation has been removed from the create infra CLI, given that it's a duplicate of the default one created by the control plane operator.
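
To illustrate the intended behavior, here is a minimal, self-contained Go sketch. The function name, signature, and sg-* IDs are hypothetical and are not the actual controller code; the sketch only shows that the default group created by the control plane operator is always present and that NodePool-specified groups are appended without duplicates.

  package main

  import "fmt"

  // mergeSecurityGroups sketches the intended behavior: the default worker
  // security group is always included first, and any groups listed on the
  // NodePool are appended, skipping duplicates. Names are illustrative only.
  func mergeSecurityGroups(defaultSG string, nodePoolSGs []string) []string {
      merged := []string{defaultSG}
      seen := map[string]bool{defaultSG: true}
      for _, sg := range nodePoolSGs {
          if sg == "" || seen[sg] {
              continue
          }
          seen[sg] = true
          merged = append(merged, sg)
      }
      return merged
  }

  func main() {
      // Previously, listing any group on the NodePool replaced the default;
      // after this change the default is always present.
      fmt.Println(mergeSecurityGroups("sg-default", []string{"sg-custom", "sg-default"}))
      // Output: [sg-default sg-custom]
  }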

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #HOSTEDCP-1419

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 6, 2024
@openshift-ci-robot commented Feb 6, 2024

@csrwng: This pull request references HOSTEDCP-1419 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


In response to this:

What this PR does / why we need it:
Before this commit, the default security group was only used if no security groups were specified on a NodePool. After this change, the default security group will always be used on all workers. Any security groups defined in the NodePool will be added to the default.

Security group creation has been removed from the create infra CLI, given that it's a duplicate of the default one created by the control plane operator.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #HOSTEDCP-1419

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added do-not-merge/needs-area do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Feb 6, 2024
@openshift-ci bot commented Feb 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@netlify bot commented Feb 6, 2024

Deploy Preview for hypershift-docs ready!

🔨 Latest commit: 493b1c8
🔍 Latest deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/65c1c5544dad0f0008ce3f0b
😎 Deploy Preview: https://deploy-preview-3527--hypershift-docs.netlify.app/reference/api

@csrwng (Contributor, Author) commented Feb 6, 2024

/test verify
/test unit
/test e2e-aws

@openshift-ci openshift-ci bot added area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/documentation Indicates the PR includes changes for documentation labels Feb 6, 2024
@openshift-ci bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Feb 6, 2024
@csrwng csrwng marked this pull request as ready for review February 6, 2024 17:35
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2024
Before this commit, the default security group was only used if no
security groups were specified on a NodePool. After this change, the
default security group will always be used on all workers. Any security
groups defined in the NodePool will be added to the default.

Security group creation has been removed from the create infra CLI,
given that it's a duplicate of the default one created by the control
plane operator.
@bryan-cox (Member)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 8b92350 and 2 for PR HEAD 1ae3a36 in total

@bryan-cox (Member)

/retest

@csrwng (Contributor, Author) commented Feb 7, 2024

/retest-required

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 5e50e63 and 1 for PR HEAD 1ae3a36 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 14d2436 and 0 for PR HEAD 1ae3a36 in total

@openshift-ci bot commented Feb 8, 2024

@csrwng: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                  Commit   Required  Rerun command
ci/prow/e2e-ibmcloud-iks   1ae3a36  false     /test e2e-ibmcloud-iks
ci/prow/e2e-ibmcloud-roks  1ae3a36  false     /test e2e-ibmcloud-roks

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot

/hold

Revision 1ae3a36 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2024
@csrwng (Contributor, Author) commented Feb 8, 2024

/override ci/prow/e2e-kubevirt-aws-ovn

@openshift-ci bot commented Feb 8, 2024

@csrwng: Overrode contexts on behalf of csrwng: ci/prow/e2e-kubevirt-aws-ovn


In response to this:

/override ci/prow/e2e-kubevirt-aws-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@csrwng (Contributor, Author) commented Feb 8, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 77ffd43 into openshift:main Feb 8, 2024
@openshift-bot

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-hypershift-container-v4.16.0-202402090109.p0.g77ffd43.assembly.stream.el9 for distgit hypershift.
All builds following this will include this PR.

wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift currently forces ClusterVersion spec.upstream
empty [3].  This commit is part of making the upstream update service
configurable on HyperShift, so those clusters can hear about and
assess known issues with updates in environments where the default
https://api.openshift.com/api/upgrades_info update service is not
accessible, or in which an alternative update service was desired for
testing or policy reasons. This includes [2], mentioned above, but
would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17,
openshift#1954). The recent 1ae3a36 (Always include AWS default
security group in worker security groups, 2024-02-05, openshift#3527)
suggests that the "v1alpha1 is a superset of v1beta1" policy remains
current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17,
openshift#1954). The recent 1ae3a36 (Always include AWS default
security group in worker security groups, 2024-02-05, openshift#3527)
suggests that the "v1alpha1 is a superset of v1beta1" policy remains
current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, docs/content/reference/api.md, and vendor
changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Feb 14, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.upstream.
2. HostedControlPlane (management cluster) spec.upstream.
3. Cluster-version operator Deployment --upstream command line option
   (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --upstream option is from [3], and we're using that
pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --upstream is being used,
the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --upstream and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
wking added a commit to wking/hypershift that referenced this pull request Mar 29, 2024
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
@csrwng csrwng deleted the additional_sgs branch July 12, 2024 15:55
pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025
[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or in which an alternative update service
was desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service. This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365