
@2uasimojo
Member

@2uasimojo 2uasimojo commented Feb 27, 2024

Propose an enhancement to introduce a new CRD, ManagedClusterVersion. This is a namespaced object to be used by fleet management software to provide a common view into managed clusters' version/upgrade information.

HIVE-2366
HIVE-2428

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2024
@openshift-ci
Contributor

openshift-ci bot commented Feb 27, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci bot commented Feb 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from 2uasimojo. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


If SNO/MicroShift clusters are part of a fleet, their fleet manager may
broker their ClusterVersion objects in the manner described [above](#workflow-description).
In this scenario they are the same as any other OpenShift spoke.
Contributor

The intent is to use ACM to manage some aspects of MicroShift deployments. I don't know if hive is involved in the integration between ACM and MicroShift.

MicroShift does not have a ClusterVersion API because upgrades are not driven by the CVO. MicroShift uses a ConfigMap to report its version data.

If hive is part of the integration of ACM and MicroShift, will hive have a separate implementation of where to get the version details for MicroShift?

If hive is not present, would something else need to create the ManagedClusterVersion CR in the ACM hub cluster? What will do that?

Member Author

> MicroShift does not have a ClusterVersion API

I didn't realize.

> MicroShift uses a ConfigMap to report its version data.

Does that ConfigMap have the same scope of information as ClusterVersion? Not that we need to get into it here, but if so... why wouldn't we be using CVO?

> If hive is part of the integration of ACM and MicroShift

It's not. Uh, unless Assisted supports MicroShift? Does it?

> If hive is not present, would something else need to create the ManagedClusterVersion CR in the ACM hub cluster? What will do that?

Yes, that's exactly the point of making this CRD common rather than scoped to hive. In the case of hypershift, the idea is for hypershift to do it. If there are other fleet manager thingies in the world, they would (or could) do the same.

In the ACM scenario, both hive and hypershift would be present, each managing their own subset of clusters, each generating ManagedClusterVersion CRs for their subset, resulting in (identically-schemaed) objects for every spoke the ACM instance manages. That's the dream :)

Member Author

I confirmed that Assisted doesn't do MicroShift, so I think we're in the clear here in terms of hive (not) having to understand the ConfigMap thing.

That doesn't mean ACM doesn't/won't support MicroShift, but since upgrades there are such a different beast, I don't imagine they'll be using this mechanism at all. I'll update accordingly.

enhancement, the \*CM layer is responsible for driving upgrades. To do so,
it needs visibility into the spoke cluster's ClusterVersion data. Today the
only mechanisms available for accessing this information entail logging into,
or running an agent in, the spoke cluster. This is not ideal:
Member

What is an example of such an agent? Klusterlet (https://operatorhub.io/operator/klusterlet)?

Member Author

I think so, yes.

Except does klusterlet have a way to initiate communication with the hub, or does it only pull from the hub?

I get confused with the different OCMs -- is this the one ACM uses? Does Hypershift use this one as well?

In any case, is it worth mentioning/discussing in the document?

want a common way to view version and upgrade information, regardless of the
software layer between me and the spokes, so that I can simplify my code,
reduce my test surface, and spend less on maintenance.

Member

As SREs, we want to get recommended version information from the cluster-version-operator (CVO), because it can evaluate conditional update risks and compute recommended updates.

Member

We might need to expand this a little more. Let me know if you need more context on this.

Member Author

I'll add your suggestion above. What else are you thinking?

@2uasimojo 2uasimojo force-pushed the HIVE-2428/ManagedClusterVersion branch from 0e67aa6 to 16f08dd Compare March 6, 2024 23:41

# ManagedClusterVersion CRD

## Summary
Contributor

This summary looks similar to https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/4322-cluster-inventory . Determining if usage is appropriate and how we would build extensions could assist both efforts.

Member Author

Okay, I've read that KEP. It seems intentionally vague and non-prescriptive, and also not far enough along to obviate the need for us to invent pieces that are out of its scope. As currently proposed, the ManagedClusterVersion CRD is not intended to replace or wrap the ClusterDeployment/HostedCluster, nor to satisfy most of the use cases described (or hinted at) in the KEP. IMHO attempting to design in anticipation of that goal would a) be impossible; and b) inflate the effort and extend the timeline untenably.

I can see this EP including a `spec.clusterManager.name` field and a matching `x-k8s.io/cluster-manager` label on the proposed CRD, if you think that's a good idea.

Re generated names: I can see value in prefixing the name of each ManagedClusterVersion object with the name of its manager (hive-$cdname/hypershift-$hcname) to preclude conflicts when a single hub manages spokes under different managers. However, I don't see value in adding a unique slug. In fact, I see a benefit in not doing so, since it lets me map deterministically between the two objects without relying on additional labels or fields. Thoughts?
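A minimal sketch of the naming scheme described above, assuming names of the form `<manager>-<source name>` with no random suffix, and assuming the result must fit the usual 253-character limit for DNS-subdomain object names. The helper names and the `KNOWN_MANAGERS` tuple are hypothetical, not part of hive or hypershift. It illustrates the deterministic-mapping point: the ClusterDeployment/HostedCluster name can be recovered from the ManagedClusterVersion name without consulting labels or fields.

```python
"""Hypothetical naming helpers for ManagedClusterVersion objects.

Assumes "<manager>-<source object name>" with no random suffix, so the
mapping is deterministic in both directions.
"""

# Hypothetical manager prefixes; not defined by hive or hypershift.
KNOWN_MANAGERS = ("hive", "hypershift")

# Assumed limit for DNS-subdomain object names.
MAX_NAME_LEN = 253


def mcv_name(manager: str, source_name: str) -> str:
    """Derive the ManagedClusterVersion name for a ClusterDeployment
    (hive) or HostedCluster (hypershift) named source_name."""
    if manager not in KNOWN_MANAGERS:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    name = f"{manager}-{source_name}"
    if len(name) > MAX_NAME_LEN:
        raise ValueError(f"derived name exceeds {MAX_NAME_LEN} characters")
    return name


def parse_mcv_name(name: str) -> tuple[str, str]:
    """Recover (manager, source_name) from a ManagedClusterVersion name."""
    for manager in KNOWN_MANAGERS:
        prefix = manager + "-"
        # Strip only the known prefix; the source name may contain hyphens.
        if name.startswith(prefix) and len(name) > len(prefix):
            return manager, name[len(prefix):]
    raise ValueError(f"name has no known manager prefix: {name!r}")


if __name__ == "__main__":
    name = mcv_name("hive", "prod-east-1")
    print(name)                  # hive-prod-east-1
    print(parse_mcv_name(name))  # ('hive', 'prod-east-1')
```

Because the source name may itself contain hyphens, parsing strips only a known manager prefix rather than splitting on the first hyphen.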

want a common way to view version and upgrade information, regardless of the
software layer between me and the spokes, so that I can simplify my code,
reduce my test surface, and spend less on maintenance.
* As a Site Reliability Engineer (SRE) I want to get the recommended version
Contributor

Does it make more sense to develop a tool for SRE that sits outside of a cluster and scans clusters instead of running the agent in every cluster?

Member Author

That's actually exactly what this proposal is all about. Hive and hypershift are exactly such tools today: they sit on a hub cluster and collect data from the spokes*. This proposal is about adding ClusterVersion data to what is collected, and doing it in a CRD that both hive and hypershift (and others) can share.

*Though TBH I don't know whether hypershift does it via an in-cluster agent that reports back to the hub. Hive for sure does not -- the controller on the hub polls spoke clusters via clients constructed from admin kubeconfigs.
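As a rough illustration of "adding ClusterVersion data to what is collected," the sketch below shows the projection a hub-side controller might perform after reading a spoke's `ClusterVersion` (for example, through a client built from the admin kubeconfig; that step is not shown). The input fields (`desired`, `history`, `availableUpdates`, `conditionalUpdates`) are real `config.openshift.io/v1` ClusterVersion status fields; the output shape and the function name are hypothetical, since the proposal's schema is not reproduced here.

```python
from typing import Any, Optional


def project_cluster_version(cv: dict[str, Any], name: str,
                            namespace: str) -> dict[str, Any]:
    """Build a hypothetical ManagedClusterVersion from a spoke ClusterVersion.

    Only the input field names come from the ClusterVersion API; the
    output schema is an assumption for illustration.
    """
    status = cv.get("status") or {}
    history = status.get("history") or []
    # ClusterVersion history is ordered newest first, so the first
    # Completed entry is the most recent version the cluster fully reached.
    completed: Optional[dict[str, Any]] = next(
        (h for h in history if h.get("state") == "Completed"), None)
    return {
        "kind": "ManagedClusterVersion",
        "metadata": {"name": name, "namespace": namespace},
        "status": {
            "desiredVersion": (status.get("desired") or {}).get("version"),
            "currentVersion": completed.get("version") if completed else None,
            # availableUpdates lists the updates the CVO recommends for this
            # cluster after evaluating conditional update risks.
            "recommendedVersions": [
                u.get("version") for u in status.get("availableUpdates") or []
            ],
            # Carried over as-is so consumers can inspect the declared risks.
            "conditionalUpdates": status.get("conditionalUpdates") or [],
        },
    }
```

Copying `availableUpdates` rather than recomputing recommendations on the hub keeps the CVO's conditional-risk evaluation as the single source of truth.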

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 16, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 23, 2024
Member Author

@2uasimojo 2uasimojo left a comment
/remove-lifecycle rotten


@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 23, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 29, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jun 6, 2024
@openshift-ci
Contributor

openshift-ci bot commented Jun 6, 2024

@openshift-bot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@2uasimojo
Member Author

/cc @derekwaynecarr @csrwng

@openshift-ci openshift-ci bot requested review from csrwng and derekwaynecarr July 24, 2024 18:33
@2uasimojo
Member Author

/cc @jnpacker @vkareh @JoelSpeed @berenss

@openshift-ci openshift-ci bot requested a review from berenss July 24, 2024 18:48
@openshift-ci openshift-ci bot requested a review from jupierce July 24, 2024 18:51
@2uasimojo
Member Author

Note to self: address how the CRD is lifecycled on a given hub. Maybe each controller ensures the installed CRD is at least the highest version it can handle: upgrade if lower; no-op if it is already greater than or equal.
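One way to read the "upgrade if lower, no-op if greater or equal" idea, sketched under the assumption that each controller stamps its bundled CRD manifest with a dotted schema revision in an annotation. The annotation key and function names are hypothetical; nothing in this thread defines how the CRD's version would be recorded.

```python
from typing import Any, Optional

# Hypothetical annotation key; not defined by hive, hypershift, or OpenShift.
REVISION_ANNOTATION = "example.com/managedclusterversion-schema-revision"


def parse_revision(rev: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so fields compare numerically."""
    return tuple(int(part) for part in rev.split("."))


def revision_of(crd: dict[str, Any]) -> Optional[str]:
    annotations = crd.get("metadata", {}).get("annotations") or {}
    return annotations.get(REVISION_ANNOTATION)


def crd_action(installed: Optional[dict[str, Any]],
               bundled: dict[str, Any]) -> str:
    """Return "create", "update", or "noop" for this controller's bundled CRD."""
    if installed is None:
        return "create"
    have, want = revision_of(installed), revision_of(bundled)
    if want is None:
        raise ValueError("bundled CRD has no schema revision")
    if have is None:
        # Unversioned CRD on the hub: treat it as older than anything bundled.
        return "update"
    a, b = parse_revision(have), parse_revision(want)
    # Pad so that "1.2" and "1.2.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if a < b:
        return "update"
    # Already at least as new as what we bundle: leave it alone.
    return "noop"
```

Since no controller ever downgrades, two controllers bundling different revisions converge on the higher one instead of overwriting each other in a loop. This assumes newer revisions remain readable by the older controller.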

@LalatenduMohanty
Member

/remove-lifecycle rotten

@LalatenduMohanty
Member

/lifecycle frozen

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 27, 2024
@openshift-ci
Contributor

openshift-ci bot commented Aug 27, 2024

@LalatenduMohanty: The lifecycle/frozen label cannot be applied to Pull Requests.


@2uasimojo
Member Author

/reopen
/remove-lifecycle rotten
/lifecycle frozen

@openshift-ci openshift-ci bot reopened this Aug 27, 2024
@openshift-ci
Contributor

openshift-ci bot commented Aug 27, 2024

@2uasimojo: Reopened this PR.


@openshift-ci
Contributor

openshift-ci bot commented Aug 27, 2024

@2uasimojo: The lifecycle/frozen label cannot be applied to Pull Requests.

Details

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2024
@2uasimojo
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2024
@2uasimojo
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 22, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 29, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 7, 2024
@openshift-ci
Contributor

openshift-ci bot commented Dec 7, 2024

@openshift-bot: Closed this PR.


@2uasimojo
Member Author

Just a note in light of recent discussions around the UX of upgrades in managed services: we still have a gap, in that nothing is Watch()ing the ClusterVersion object on the spoke. So updating the hub's ManagedClusterVersion requires a poke or a poll; otherwise it can become stale and stay that way indefinitely.
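A minimal sketch of the poke/poll scheduling this note implies, assuming the hub controller records when it last read the spoke's ClusterVersion and can be poked (for example, when it initiates an upgrade itself). The function and parameter names are hypothetical. Absent a Watch, the hub copy can lag the spoke by up to roughly one poll interval plus read latency, and a poke only helps for changes the hub already expects.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sync_delay(last_synced: Optional[datetime], now: datetime,
               poll_interval: timedelta, poked: bool) -> timedelta:
    """How long to wait before re-reading the spoke's ClusterVersion.

    Without a Watch on the spoke, this schedule is the only thing bounding
    how stale the hub's ManagedClusterVersion can get.
    """
    if poked or last_synced is None:
        # Never synced, or explicitly poked: refresh immediately.
        return timedelta(0)
    due = last_synced + poll_interval
    # Overdue refreshes run now rather than returning a negative delay.
    return max(due - now, timedelta(0))


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(sync_delay(t0, t0 + timedelta(minutes=4),
                     timedelta(minutes=10), False))  # 0:06:00
```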

@2uasimojo
Member Author

This EP represents a consolidated solution that would (could) apply to both HCP and hive. Both have now implemented standalone solutions (cf. openshift/hive#2206; I don't have the HCP one on hand).

@2uasimojo 2uasimojo deleted the HIVE-2428/ManagedClusterVersion branch September 3, 2025 18:43
@wking
Member

wking commented Sep 3, 2025

> I don't have the HCP one on hand

HostedCluster lifted up the properties relevant to update planning in openshift/hypershift#1954 and openshift/hypershift#4744.


Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.


6 participants