Skip to content

Conversation

@csrwng
Copy link
Contributor

@csrwng csrwng commented Feb 4, 2020

Support different sets of manifests applied by the CVO based on an installation cluster profile.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 4, 2020
@sdodson
Copy link
Member

sdodson commented Feb 4, 2020

Looks good. It'd be nice to see something regarding testing, like a requirement that we have a CI job that assures us that we don't inadvertently create a new profile via a typo by checking against a list of cataloged/approved profile names?

@sdodson
Copy link
Member

sdodson commented Feb 4, 2020

/cc @abhinavdahiya

@csrwng
Copy link
Contributor Author

csrwng commented Feb 4, 2020

This enhancement supports #202

@derekwaynecarr
Copy link
Member

/cc @smarterclayton

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 5, 2020
@csrwng
Copy link
Contributor Author

csrwng commented Feb 5, 2020

@deads2k ptal

The following annotations may be used to include/exclude manifests for a given profile:

```
exclude.release.openshift.io/[identifier]=true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid excludes. Every labels selector we have today is a positive selection. Negative exclusion will include manifests without the owners of those manifests choosing to support a profile. Instead of owners knowing the profiles they need to support, we'll end up with case where bugs are opened against components who then respond with, "wait, what profile? that isn't going to work".

To the common counter argument of "but then it takes a lot of work to create a new profile and all the teams have to aware of it", I say "good!". We don't want that many profiles or our support matrix will grow and customers will be trying to figure out which profile is right for them. Profiles should come with a cost and if component owners are expected to support their components in these profiles, they really should be aware even if it's exactly an optional assignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, an alternative would be to have a positive list somewhere else which qualifies an operator for a certain profile. Then excludes might make it easier to express profiles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k i think this is a good argument.

@soltysh
Copy link

soltysh commented Feb 6, 2020

Aside from what @deads2k already commented about profile being strictly a list of inclusions, I'm missing information are we envisioning profile to support other use cases, such as platform or deployment configuration model (iow. number of masters, workers), etc. From the initial description of the problem I was assuming it should but it's not covered there.

@csrwng
Copy link
Contributor Author

csrwng commented Feb 6, 2020

Aside from what @deads2k already commented about profile being strictly a list of inclusions, I'm missing information are we envisioning profile to support other use cases, such as platform or deployment configuration model (iow. number of masters, workers), etc. From the initial description of the problem I was assuming it should but it's not covered there.

@soltysh yes there are likely other uses for profiles (CRC comes to mind). However, at this time, the only concrete use case I have is that of IBM public cloud, which is what drives the 2 user stories I have here. It'd be great to get input from other folks that have use cases for this to provide feedback on whether they think what I'm proposing here makes sense to them.

I also like @deads2k's proposal of only having positive selectors, so I will be updating the enhancement to reflect that. With that we'll also need to come up with a plan for rollout given that manifests will first need to be prepared for profiles support before it's switched on in the CVO.

@csrwng
Copy link
Contributor Author

csrwng commented Feb 7, 2020

@praveenkumar @gbraad fyi

@praveenkumar
Copy link

Thanks @csrwng, For CRC we want to have 2 different use case.

  1. Exclusions of some manifests which we don't want to use by default like monitoring.
  2. If profile can also support the resource limit then operators should use less resource if it is a single node cluster (not sure if that is in the scope of this enhancement)

@csrwng
Copy link
Contributor Author

csrwng commented Feb 7, 2020

2. If profile can also support the resource limit then operators should use less resource if it is a single node cluster (not sure if that is in the scope of this enhancement)

What I'm proposing should allow this use case, given that per profile each operator can provide a different manifest. In the case of the operator deployment itself, the alternate manifest for your profile could specify lower resource requirements. For the operand part, the operator itself would need to be modified to create deployments with lower resource requirements given some setting such as an environment variable telling it to do so.


## Motivation

In order to support different deployment models in which not all operators rendered by
Copy link
Member

@enxebre enxebre Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any other known use case for profiles we can include here to include a better level of context/expectations?
What do we consider best practices for profiles vs no-op operators? e.g would we want a "baremetal profile" or would we want no-op "baremetal" operator deployed everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be good to include other scenarios we want this design to cover and the bare metal one is a good one consider

Perhaps "no-op operator" isn't a good way to classify the baremetal-operator case though - for example, do we want bare metal specific CRDs to be installed on all platforms? It's not just a question of a process running that does nothing ...

Copy link
Member

@derekwaynecarr derekwaynecarr Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markmc i think its an error that we have namespaces and crds deployed to a cluster for contexts that are not appropriate. we should aspire to move away from that rather than continue to lean into it. for example, every cluster has a openshift-kni-infra or openshift-ovirt-infra even where it is not appropriate.

the initial use case for cluster profile was to exclude specific operators entirely from being applied. the most notable example was the cluster version operator not installing kube-* operators in deployments where the control plane is hosted on an external supervisor cluster a la IBM Public Cloud.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(feedback taken - #212 won't pursue the cluster profile path, instead it will propose a new SLO for bare metal)

But on the point of CRDs ... your POV seems at odds with Clayton's example of the CNO in this comment: #212 (comment)

Would be great to get a resolution on this, because the path of a bare metal SLO that is installed, running, but "disabled" on other platforms would seem to inevitably be introducing more stuff "deployed to a cluster for contexts that are not appropriate".

```
This would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
has been specified. For the default cluster profile (no `CLUSTER_PROFILE` variable),
manifests that have any include annotation will always be excluded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to cover how we expect cluster profiles to be exposed to users other than this environment variable - i.e. I presume in most cases we're not expecting customers to modify the CVO deployment to add the env var

e.g. would this be an install-config.yaml thing, templated into bootkube.sh, passed to cvo render, and templated into the CVO deployment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markmc the initial use case scenario was for ROKS offering on IBM Cloud. I would want to find a model that worked there first before expanding scope much more. If we want to think about two use cases, CRC (excluding HA deployment topology versus single node concerns) maybe a good counter example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine ... it's just entirely unclear to me how we expect this env variable to be set in the hypershift, CRC, or any other case. Maybe I'm missing something obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left that part undefined (and now adding text that explicitly says so) because potential use cases such as hypershift and CRC take completely different approaches to getting a useable cluster up and running. Both bypass the regular installer in some form, but describing how that should be done would confuse this proposal.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For CRC, we are currently leaning towards a configmap containing the cluster information which is set at install time openshift/cluster-version-operator#404

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 20, 2020
As a user, I can create a cluster in which node selectors for certain operators target
worker nodes instead of master nodes.

### Design
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is not part of the template. Should it be removed and have the content moved up to the Proposal section?

variable. For a given cluster, only one CVO profile may be in effect.

NOTE: The mechanism by which the environment variable is set on the CVO deployment is
out of the scope of this design.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there notes about how this works for IBM today? Are the CRC folks on-board with managing their own CVO Deployment (seems like that might make updates difficult, as the CVO updates its own Deployment to bump itself?)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We (CRC) were actually considering using a config map to set the profile, but I don't know how well it fits the IBM use case.
See https://github.com/openshift/cluster-version-operator/pull/404/files#diff-d540a41404f678a1c438f4c1e5b92a87R330-R337


Cluster profiles are a way to support different deployment models for OpenShift clusters.
A profile is an identifier that the Cluster Version Operator uses to determine
which manifests to apply. Operators can be excluded completely or can have different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are "excluded" here and some subsequent "include/exclude" references stale after this discussion?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that when you don't specify a profile explicitly in a manifest through an 'include', this manifest is excluded from this profile. So I think this specific reference is not stale.

@sdodson
Copy link
Member

sdodson commented Jul 20, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 20, 2020
@LalatenduMohanty
Copy link
Member

@wking PTAL

include.release.openshift.io/[identifier]=true
```
This would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
has been specified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably add a follow-up paragraph that explicitly says manifests with no
include.release.openshift.io/* annotations will always be included.

Copy link

@gbraad gbraad Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be additional information to clarify, but would this hold back merging?

Note: #200 (comment) if this is addressed this should not be an issue, but just concerned about the freeze deadline

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding text like:

Manifests with no annotations prefixed by include.release.openshift.io/ are always included. Manifests with annotations prefixed by include.release.openshift.io/ with values other than true are undefined.

would cover my concerns (with undefined in the nasal demons sense).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbraad can you open a follow up to add this, I think we really need to get something in about this.

@wking
Copy link
Member

wking commented Jul 20, 2020

Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".

@markmc
Copy link
Contributor

markmc commented Jul 20, 2020

Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".

FWIW, there's some text in #212 under "Alternatives: Use a CVO cluster profile" that you can reuse or reference - https://github.com/openshift/enhancements/pull/212/files#diff-10846701f494dacbcae49f251fe274b0R415-R435

@LalatenduMohanty
Copy link
Member

Can we pull this reasoning about non-goals and scoping out into the enhancement somewhere? Like the template's Alternatives section? I think it's useful to make it more discoverable than "dig through resolved threads on the PR".

@csrwng Can we address this concern? I also felt the same when I gone through the PR again.

FWIW, there's some text in #212 under "Alternatives: Use a CVO cluster profile" that you can reuse or reference - https://github.com/openshift/enhancements/pull/212/files#diff-10846701f494dacbcae49f251fe274b0R415-R435

We can borrow some of the texts from here.

Support different sets of manifests applied by the CVO based
on an installation cluster profile.
@csrwng csrwng force-pushed the cluster_profiles branch from 2185fda to 2f13911 Compare July 20, 2020 14:37
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2020
@csrwng
Copy link
Contributor Author

csrwng commented Jul 20, 2020

Thx @LalatenduMohanty, updated and squashed.
@wking is the text in the alternatives section what you had in mind?

Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, csrwng, LalatenduMohanty, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@LalatenduMohanty
Copy link
Member

@sdodson We should be able to merge this PR now.

@sdodson
Copy link
Member

sdodson commented Jul 21, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2020
@openshift-merge-robot openshift-merge-robot merged commit d38ee88 into openshift:master Jul 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.