Cluster Profiles #200
Conversation
Looks good. It'd be nice to see something regarding testing, e.g. a requirement that we have a CI job that assures us we don't inadvertently create a new profile via a typo, by checking against a list of cataloged/approved profile names.
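A check along those lines could be sketched as follows. This is purely illustrative: the catalog contents and manifests are invented, and the annotation prefix is the one proposed in this enhancement.

```python
# Hypothetical CI check: collect every profile identifier referenced by
# cluster-profile annotations and fail on any that is not in an approved
# catalog. The catalog and manifests here are made up for illustration.
PREFIX = "include.release.openshift.io/"
APPROVED_PROFILES = {"ibm-cloud-managed"}  # assumed catalog of known profiles

def unknown_profiles(manifests):
    """Return identifiers referenced in manifests but missing from the catalog."""
    seen = set()
    for manifest in manifests:
        annotations = manifest.get("metadata", {}).get("annotations", {})
        for key in annotations:
            if key.startswith(PREFIX):
                seen.add(key[len(PREFIX):])
    return sorted(seen - APPROVED_PROFILES)
```

A CI job could run this over the release payload's manifests and fail the build when `unknown_profiles()` is non-empty, catching a typo like `ibm-clodu-managed` before it silently creates a new profile.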
/cc @abhinavdahiya |
This enhancement supports #202
/cc @smarterclayton |
@deads2k ptal |
enhancements/cvo/cluster-profiles.md (Outdated)

> The following annotations may be used to include/exclude manifests for a given profile:
>
> ```
> exclude.release.openshift.io/[identifier]=true
> ```
I think we should avoid excludes. Every label selector we have today is a positive selection. Negative exclusion will include manifests without the owners of those manifests choosing to support a profile. Instead of owners knowing the profiles they need to support, we'll end up with cases where bugs are opened against components, who then respond with, "wait, what profile? that isn't going to work".

To the common counter argument of "but then it takes a lot of work to create a new profile and all the teams have to be aware of it", I say "good!". We don't want that many profiles, or our support matrix will grow and customers will be trying to figure out which profile is right for them. Profiles should come with a cost, and if component owners are expected to support their components in these profiles, they really should be aware, even if it's essentially an optional assignment.
That said, an alternative would be to have a positive list somewhere else which qualifies an operator for a certain profile. Then excludes might make it easier to express profiles.
@deads2k I think this is a good argument.
Aside from what @deads2k already commented about a profile being strictly a list of inclusions, I'm missing information on whether we envision profiles supporting other use cases, such as platform or deployment configuration models (i.e. number of masters, workers), etc. From the initial description of the problem I assumed they should, but it's not covered there.
@soltysh yes, there are likely other uses for profiles (CRC comes to mind). However, at this time, the only concrete use case I have is that of IBM public cloud, which is what drives the 2 user stories I have here. It'd be great to get input from other folks that have use cases for this, to provide feedback on whether they think what I'm proposing here makes sense to them. I also like @deads2k's proposal of only having positive selectors, so I will be updating the enhancement to reflect that. With that we'll also need to come up with a plan for rollout, given that manifests will first need to be prepared for profiles support before it's switched on in the CVO.
@praveenkumar @gbraad fyi |
Thanks @csrwng. For CRC we want to have 2 different use cases.
What I'm proposing should allow this use case, given that each operator can provide a different manifest per profile. In the case of the operator deployment itself, the alternate manifest for your profile could specify lower resource requirements. For the operand part, the operator itself would need to be modified to create deployments with lower resource requirements given some setting, such as an environment variable, telling it to do so.
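As a purely illustrative sketch of that idea: an operator could ship its usual Deployment manifest plus an alternate annotated for a profile requesting a smaller footprint. The profile name `low-resource`, the operator name, and all the numbers below are invented; the exact selection semantics are what this enhancement defines.

```python
# Illustrative only: two candidate manifests for the same operator, one
# default and one annotated for a hypothetical "low-resource" profile.
# Names, profile identifier, and resource values are all made up.
PREFIX = "include.release.openshift.io/"

default_deployment = {
    "metadata": {"name": "example-operator", "annotations": {}},
    "requests": {"cpu": "500m", "memory": "512Mi"},
}

low_resource_deployment = {
    "metadata": {
        "name": "example-operator",
        "annotations": {PREFIX + "low-resource": "true"},
    },
    "requests": {"cpu": "100m", "memory": "128Mi"},
}
```

The operand side (deployments the operator itself creates) would still need an in-operator knob, since the CVO only selects among the manifests in the payload.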
enhancements/cvo/cluster-profiles.md (Outdated)

> ## Motivation
>
> In order to support different deployment models in which not all operators rendered by
Is there any other known use case for profiles we can include here to provide a better level of context/expectations?
What do we consider best practices for profiles vs no-op operators? E.g. would we want a "baremetal" profile, or would we want a no-op "baremetal" operator deployed everywhere?
I agree it would be good to include other scenarios we want this design to cover, and the bare metal one is a good one to consider.
Perhaps "no-op operator" isn't a good way to classify the baremetal-operator case though - for example, do we want bare metal specific CRDs to be installed on all platforms? It's not just a question of a process running that does nothing ...
@markmc I think it's an error that we have namespaces and CRDs deployed to a cluster for contexts where they are not appropriate. We should aspire to move away from that rather than continue to lean into it. For example, every cluster has an openshift-kni-infra or openshift-ovirt-infra namespace even where it is not appropriate.
The initial use case for cluster profiles was to exclude specific operators entirely from being applied. The most notable example was the cluster-version operator not installing kube-* operators in deployments where the control plane is hosted on an external supervisor cluster, a la IBM Public Cloud.
(feedback taken - #212 won't pursue the cluster profile path, instead it will propose a new SLO for bare metal)
But on the point of CRDs ... your POV seems at odds with Clayton's example of the CNO in this comment: #212 (comment)
It would be great to get a resolution on this, because the path of a bare metal SLO that is installed, running, but "disabled" on other platforms would seem to inevitably introduce more stuff "deployed to a cluster for contexts that are not appropriate".
enhancements/cvo/cluster-profiles.md (Outdated)

> This would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
> has been specified. For the default cluster profile (no `CLUSTER_PROFILE` variable),
> manifests that have any include annotation will always be excluded.
It would be good to cover how we expect cluster profiles to be exposed to users other than this environment variable - i.e. I presume in most cases we're not expecting customers to modify the CVO deployment to add the env var
e.g. would this be an install-config.yaml thing, templated into bootkube.sh, passed to cvo render, and templated into the CVO deployment?
@markmc the initial use case was the ROKS offering on IBM Cloud. I would want to find a model that works there first before expanding scope much more. If we want to think about two use cases, CRC (excluding-HA-deployment-topology versus single-node concerns) may be a good counter-example.
That's fine ... it's just entirely unclear to me how we expect this env variable to be set in the hypershift, CRC, or any other case. Maybe I'm missing something obvious.
I left that part undefined (and am now adding text that explicitly says so) because potential use cases such as hypershift and CRC take completely different approaches to getting a usable cluster up and running. Both bypass the regular installer in some form, but describing how that should be done would confuse this proposal.
For CRC, we are currently leaning towards a configmap containing the cluster information which is set at install time openshift/cluster-version-operator#404
> As a user, I can create a cluster in which node selectors for certain operators target
> worker nodes instead of master nodes.
>
> ### Design
This section is not part of the template. Should it be removed and have the content moved up to the Proposal section?
> variable. For a given cluster, only one CVO profile may be in effect.
>
> NOTE: The mechanism by which the environment variable is set on the CVO deployment is
> out of the scope of this design.
Are there notes about how this works for IBM today? Are the CRC folks on-board with managing their own CVO Deployment (seems like that might make updates difficult, as the CVO updates its own Deployment to bump itself?)?
We (CRC) were actually considering using a config map to set the profile, but I don't know how well it fits the IBM use case.
See https://github.com/openshift/cluster-version-operator/pull/404/files#diff-d540a41404f678a1c438f4c1e5b92a87R330-R337
> Cluster profiles are a way to support different deployment models for OpenShift clusters.
> A profile is an identifier that the Cluster Version Operator uses to determine
> which manifests to apply. Operators can be excluded completely or can have different
Are "excluded" here and some subsequent "include/exclude" references stale after this discussion?
I would expect that when you don't specify a profile explicitly in a manifest through an 'include' annotation, the manifest is excluded from that profile. So I think this specific reference is not stale.
/hold
@wking PTAL
> ```
> include.release.openshift.io/[identifier]=true
> ```
>
> This would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
> has been specified.
Probably add a follow-up paragraph that explicitly says manifests with no `include.release.openshift.io/*` annotations will always be included.
This would be additional information to clarify, but would this hold back merging?
Note: per #200 (comment), if that is addressed this should not be an issue; I'm just concerned about the freeze deadline.
Adding text like:

> Manifests with no annotations prefixed by `include.release.openshift.io/` are always included. Manifests with annotations prefixed by `include.release.openshift.io/` with values other than `true` are undefined.

would cover my concerns (with undefined in the nasal demons sense).
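Combining this with the quoted proposal text, the inclusion rule can be sketched as follows. This is a simplification for discussion, not the CVO's actual implementation, and it arbitrarily treats the undefined non-`true` values as "not included".

```python
# Simplified sketch of the proposed inclusion rule, not the CVO's real code:
# - a manifest with no include.release.openshift.io/* annotations is always
#   included, in every profile;
# - with CLUSTER_PROFILE set, a manifest annotated
#   include.release.openshift.io/<profile>=true is also included;
# - annotation values other than "true" are treated here as "not included"
#   (the enhancement leaves their behavior undefined).
PREFIX = "include.release.openshift.io/"

def included(manifest, profile=None):
    annotations = manifest.get("metadata", {}).get("annotations", {})
    profile_keys = [k for k in annotations if k.startswith(PREFIX)]
    if not profile_keys:
        return True   # unannotated manifests apply everywhere
    if profile is None:
        return False  # default profile: annotated manifests are excluded
    return annotations.get(PREFIX + profile) == "true"
```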
@gbraad can you open a follow-up to add this? I think we really need to get something in about this.
Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".
FWIW, there's some text in #212 under "Alternatives: Use a CVO cluster profile" that you can reuse or reference - https://github.com/openshift/enhancements/pull/212/files#diff-10846701f494dacbcae49f251fe274b0R415-R435
@csrwng Can we address this concern? I also felt the same when I went through the PR again.
We can borrow some of the text from here.
Support different sets of manifests applied by the CVO based on an installation cluster profile.
Thx @LalatenduMohanty, updated and squashed.
LalatenduMohanty left a comment:
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, csrwng, LalatenduMohanty, sdodson

The full list of commands accepted by this bot can be found here. The pull request process is described here.
@sdodson We should be able to merge this PR now.
/hold cancel |