Cluster Profiles #200

csrwng · 2020-02-04T20:41:58Z

Support different sets of manifests applied by the CVO based on an installation cluster profile.

sdodson · 2020-02-04T20:59:55Z

Looks good. It'd be nice to see something regarding testing, like a requirement that we have a CI job that assures us that we don't inadvertently create a new profile via a typo by checking against a list of cataloged/approved profile names?

sdodson · 2020-02-04T21:00:11Z

/cc @abhinavdahiya

csrwng · 2020-02-04T23:56:02Z

This enhancement supports #202

derekwaynecarr · 2020-02-05T01:26:49Z

/cc @smarterclayton

csrwng · 2020-02-05T18:17:17Z

@deads2k ptal

deads2k · 2020-02-06T13:39:52Z

enhancements/cvo/cluster-profiles.md

+The following annotations may be used to include/exclude manifests for a given profile:
+
+```
+exclude.release.openshift.io/[identifier]=true


I think we should avoid excludes. Every labels selector we have today is a positive selection. Negative exclusion will include manifests without the owners of those manifests choosing to support a profile. Instead of owners knowing the profiles they need to support, we'll end up with case where bugs are opened against components who then respond with, "wait, what profile? that isn't going to work".

To the common counter argument of "but then it takes a lot of work to create a new profile and all the teams have to aware of it", I say "good!". We don't want that many profiles or our support matrix will grow and customers will be trying to figure out which profile is right for them. Profiles should come with a cost and if component owners are expected to support their components in these profiles, they really should be aware even if it's exactly an optional assignment.

That said, an alternative would be to have a positive list somewhere else which qualifies an operator for a certain profile. Then excludes might make it easier to express profiles.

@deads2k i think this is a good argument.

soltysh · 2020-02-06T14:33:21Z

Aside from what @deads2k already commented about profile being strictly a list of inclusions, I'm missing information are we envisioning profile to support other use cases, such as platform or deployment configuration model (iow. number of masters, workers), etc. From the initial description of the problem I was assuming it should but it's not covered there.

csrwng · 2020-02-06T15:48:03Z

Aside from what @deads2k already commented about profile being strictly a list of inclusions, I'm missing information are we envisioning profile to support other use cases, such as platform or deployment configuration model (iow. number of masters, workers), etc. From the initial description of the problem I was assuming it should but it's not covered there.

@soltysh yes there are likely other uses for profiles (CRC comes to mind). However, at this time, the only concrete use case I have is that of IBM public cloud, which is what drives the 2 user stories I have here. It'd be great to get input from other folks that have use cases for this to provide feedback on whether they think what I'm proposing here makes sense to them.

I also like @deads2k's proposal of only having positive selectors, so I will be updating the enhancement to reflect that. With that we'll also need to come up with a plan for rollout given that manifests will first need to be prepared for profiles support before it's switched on in the CVO.

csrwng · 2020-02-07T13:21:59Z

@praveenkumar @gbraad fyi

praveenkumar · 2020-02-07T14:05:52Z

Thanks @csrwng, For CRC we want to have 2 different use case.

Exclusions of some manifests which we don't want to use by default like monitoring.
If profile can also support the resource limit then operators should use less resource if it is a single node cluster (not sure if that is in the scope of this enhancement)

csrwng · 2020-02-07T14:17:20Z

2. If profile can also support the resource limit then operators should use less resource if it is a single node cluster (not sure if that is in the scope of this enhancement)

What I'm proposing should allow this use case, given that per profile each operator can provide a different manifest. In the case of the operator deployment itself, the alternate manifest for your profile could specify lower resource requirements. For the operand part, the operator itself would need to be modified to create deployments with lower resource requirements given some setting such as an environment variable telling it to do so.

enxebre · 2020-02-17T08:44:03Z

enhancements/cvo/cluster-profiles.md

+
+## Motivation
+
+In order to support different deployment models in which not all operators rendered by


is there any other known use case for profiles we can include here to include a better level of context/expectations?
What do we consider best practices for profiles vs no-op operators? e.g would we want a "baremetal profile" or would we want no-op "baremetal" operator deployed everywhere?

I agree it would be good to include other scenarios we want this design to cover and the bare metal one is a good one consider

Perhaps "no-op operator" isn't a good way to classify the baremetal-operator case though - for example, do we want bare metal specific CRDs to be installed on all platforms? It's not just a question of a process running that does nothing ...

@markmc i think its an error that we have namespaces and crds deployed to a cluster for contexts that are not appropriate. we should aspire to move away from that rather than continue to lean into it. for example, every cluster has a openshift-kni-infra or openshift-ovirt-infra even where it is not appropriate.

the initial use case for cluster profile was to exclude specific operators entirely from being applied. the most notable example was the cluster version operator not installing kube-* operators in deployments where the control plane is hosted on an external supervisor cluster a la IBM Public Cloud.

(feedback taken - #212 won't pursue the cluster profile path, instead it will propose a new SLO for bare metal)

But on the point of CRDs ... your POV seems at odds with Clayton's example of the CNO in this comment: #212 (comment)

Would be great to get a resolution on this, because the path of a bare metal SLO that is installed, running, but "disabled" on other platforms would seem to inevitably be introducing more stuff "deployed to a cluster for contexts that are not appropriate".

enhancements/cvo/cluster-profiles.md

markmc · 2020-02-17T17:32:18Z

enhancements/cvo/cluster-profiles.md

+```
+This would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
+has been specified. For the default cluster profile (no `CLUSTER_PROFILE` variable), 
+manifests that have any include annotation will always be excluded.


It would be good to cover how we expect cluster profiles to be exposed to users other than this environment variable - i.e. I presume in most cases we're not expecting customers to modify the CVO deployment to add the env var

e.g. would this be an install-config.yaml thing, templated into bootkube.sh, passed to cvo render, and templated into the CVO deployment?

@markmc the initial use case scenario was for ROKS offering on IBM Cloud. I would want to find a model that worked there first before expanding scope much more. If we want to think about two use cases, CRC (excluding HA deployment topology versus single node concerns) maybe a good counter example.

That's fine ... it's just entirely unclear to me how we expect this env variable to be set in the hypershift, CRC, or any other case. Maybe I'm missing something obvious.

I left that part undefined (and now adding text that explicitly says so) because potential use cases such as hypershift and CRC take completely different approaches to getting a useable cluster up and running. Both bypass the regular installer in some form, but describing how that should be done would confuse this proposal.

For CRC, we are currently leaning towards a configmap containing the cluster information which is set at install time openshift/cluster-version-operator#404

enhancements/cvo/cluster-profiles.md

wking · 2020-03-18T21:35:33Z

enhancements/cvo/cluster-profiles.md

+As a user, I can create a cluster in which node selectors for certain operators target
+worker nodes instead of master nodes.
+
+### Design


This section is not part of the template. Should it be removed and have the content moved up to the Proposal section?

wking · 2020-03-18T21:38:35Z

enhancements/cvo/cluster-profiles.md

+variable. For a given cluster, only one CVO profile may be in effect.
+
+NOTE: The mechanism by which the environment variable is set on the CVO deployment is 
+out of the scope of this design.


Are there notes about how this works for IBM today? Are the CRC folks on-board with managing their own CVO Deployment (seems like that might make updates difficult, as the CVO updates its own Deployment to bump itself?)?

We (CRC) were actually considering using a config map to set the profile, but I don't know how well it fits the IBM use case.
See https://github.com/openshift/cluster-version-operator/pull/404/files#diff-d540a41404f678a1c438f4c1e5b92a87R330-R337

wking · 2020-03-18T21:44:37Z

enhancements/cvo/cluster-profiles.md

+
+Cluster profiles are a way to support different deployment models for OpenShift clusters. 
+A profile is an identifier that the Cluster Version Operator uses to determine
+which manifests to apply. Operators can be excluded completely or can have different


Are "excluded" here and some subsequent "include/exclude" references stale after this discussion?

I would expect that when you don't specify a profile explicitly in a manifest through an 'include', this manifest is excluded from this profile. So I think this specific reference is not stale.

sdodson · 2020-07-20T12:17:23Z

/hold

LalatenduMohanty · 2020-07-20T12:21:29Z

@wking PTAL

wking · 2020-07-20T13:56:55Z

enhancements/cvo/cluster-profiles.md

+include.release.openshift.io/[identifier]=true
+```
+This would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
+has been specified. 


Probably add a follow-up paragraph that explicitly says manifests with no
include.release.openshift.io/* annotations will always be included.

This would be additional information to clarify, but would this hold back merging?

Note: #200 (comment) if this is addressed this should not be an issue, but just concerned about the freeze deadline

Adding text like:

Manifests with no annotations prefixed by include.release.openshift.io/ are always included. Manifests with annotations prefixed by include.release.openshift.io/ with values other than true are undefined.

would cover my concerns (with undefined in the nasal demons sense).

@gbraad can you open a follow up to add this, I think we really need to get something in about this.

wking · 2020-07-20T13:59:58Z

Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".

markmc · 2020-07-20T14:10:48Z

Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".

FWIW, there's some text in #212 under "Alternatives: Use a CVO cluster profile" that you can reuse or reference - https://github.com/openshift/enhancements/pull/212/files#diff-10846701f494dacbcae49f251fe274b0R415-R435

LalatenduMohanty · 2020-07-20T14:20:33Z

Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".

@csrwng Can we address this concern? I also felt the same when I gone through the PR again.

FWIW, there's some text in #212 under "Alternatives: Use a CVO cluster profile" that you can reuse or reference - https://github.com/openshift/enhancements/pull/212/files#diff-10846701f494dacbcae49f251fe274b0R415-R435

We can borrow some of the texts from here.

Support different sets of manifests applied by the CVO based on an installation cluster profile.

csrwng · 2020-07-20T14:38:26Z

Thx @LalatenduMohanty, updated and squashed.
@wking is the text in the alternatives section what you had in mind?

LalatenduMohanty

/lgtm

openshift-ci-robot · 2020-07-20T15:30:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, csrwng, LalatenduMohanty, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [sdodson]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

LalatenduMohanty · 2020-07-20T15:31:10Z

@sdodson We should be able to merge this PR now.

sdodson · 2020-07-21T14:47:02Z

/hold cancel

openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 4, 2020

openshift-ci-robot requested review from ericavonb and jsafrane February 4, 2020 20:42

openshift-ci-robot requested a review from abhinavdahiya February 4, 2020 21:00

openshift-ci-robot requested a review from smarterclayton February 5, 2020 01:26

openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 5, 2020

deads2k reviewed Feb 6, 2020

View reviewed changes

enxebre reviewed Feb 17, 2020

View reviewed changes

markmc mentioned this pull request Feb 17, 2020

baremetal: propose a new Second Level Operator for bare metal #212

Merged

markmc reviewed Feb 17, 2020

View reviewed changes

enhancements/cvo/cluster-profiles.md Show resolved Hide resolved

markmc reviewed Feb 17, 2020

View reviewed changes

enhancements/cvo/cluster-profiles.md Show resolved Hide resolved

markmc reviewed Feb 17, 2020

View reviewed changes

markmc reviewed Feb 18, 2020

View reviewed changes

enhancements/cvo/cluster-profiles.md Show resolved Hide resolved

openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 20, 2020

enxebre mentioned this pull request Mar 17, 2020

Monitoring for ironic-prometheus-exporter openshift/machine-api-operator#340

Closed

wking reviewed Mar 18, 2020

View reviewed changes

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 20, 2020

wking reviewed Jul 20, 2020

View reviewed changes

Cluster Profiles

2f13911

Support different sets of manifests applied by the CVO based on an installation cluster profile.

csrwng force-pushed the cluster_profiles branch from 2185fda to 2f13911 Compare July 20, 2020 14:37

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2020

LalatenduMohanty approved these changes Jul 20, 2020

View reviewed changes

openshift-ci-robot assigned LalatenduMohanty Jul 20, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2020

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2020

openshift-merge-robot merged commit d38ee88 into openshift:master Jul 21, 2020

guillaumerose mentioned this pull request Jul 30, 2020

Add cluster profile parameter in install-config.yaml openshift/installer#3986

Closed

wking mentioned this pull request Jul 30, 2020

Cluster profile: add details about the default profile #414

Closed

praveenkumar mentioned this pull request Aug 25, 2020

Avoid running replicas for cluster services crc-org/snc#224

Closed

hardys mentioned this pull request Sep 29, 2020

*: add InstallType to InstallConfig openshift/installer#4209

Closed

andfasano mentioned this pull request Oct 1, 2020

Allow CVO to deploy CBO openshift/cluster-baremetal-operator#25

Merged

rkukura mentioned this pull request Oct 12, 2020

Add single-node-developer Cluster Profile #482

Merged

30 tasks

guillaumerose mentioned this pull request Nov 23, 2020

Per-profile system reserved resources openshift/machine-config-operator#2249

Closed

ravidbro mentioned this pull request Dec 2, 2020

Pass CLUSTER_PROFILE env var to CVO render openshift/installer#4444

Merged

csrwng mentioned this pull request Dec 3, 2020

oc adm release new: add check for the existence of profile annotations in release manifests openshift/oc#663

Closed

csrwng mentioned this pull request Apr 6, 2021

Bug 1948714: IBM Cloud manifest profile patch openshift/cluster-image-registry-operator#643

Merged

csrwng mentioned this pull request Nov 1, 2021

Omit 3CMO manifests from CVO openshift/ibm-roks-toolkit#284

Merged


		## Motivation

		In order to support different deployment models in which not all operators rendered by

Cluster Profiles #200

Cluster Profiles #200

Uh oh!

Conversation

csrwng commented Feb 4, 2020

Uh oh!

sdodson commented Feb 4, 2020

Uh oh!

sdodson commented Feb 4, 2020

Uh oh!

csrwng commented Feb 4, 2020

Uh oh!

derekwaynecarr commented Feb 5, 2020

Uh oh!

csrwng commented Feb 5, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

soltysh commented Feb 6, 2020

Uh oh!

csrwng commented Feb 6, 2020

Uh oh!

csrwng commented Feb 7, 2020

Uh oh!

praveenkumar commented Feb 7, 2020

Uh oh!

csrwng commented Feb 7, 2020

Uh oh!

enxebre Feb 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

derekwaynecarr Feb 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sdodson commented Jul 20, 2020

Uh oh!

LalatenduMohanty commented Jul 20, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gbraad Jul 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

enxebre Feb 17, 2020 •

edited

Loading

derekwaynecarr Feb 17, 2020 •

edited

Loading

gbraad Jul 20, 2020 •

edited

Loading

wking commented Jul 20, 2020 •

edited

Loading