
Conversation

@guillaumerose
Contributor

This PR better defines the default profile. Previously, there was room for confusion between no profile, an empty profile, and the default profile.

More context: openshift/cluster-version-operator#404 (comment)

cc @wking @csrwng

Manifests may support inclusion in multiple profiles by including as many of these annotations
as needed.
Manifests with no explicit inclusion annotations implicitly belong to the `default` profile.
If no profile is configured, the cluster-version operator should use the `default` profile.
Member

Thinking this over some more, we may need a more involved reword. Above in master we have:

This would make the CVO render this manifest only when CLUSTER_PROFILE=[identifier]
has been specified.

But that's not actually true, because as the next master paragraph points out, you can list multiple profiles. I suggest we reword this section to say something like:

This environment variable would have to be specified in the CVO deployment.
When no CLUSTER_PROFILE=[identifier] variable is specified, the default cluster profile
is in effect. [NOTE: so I guess no need to echo this with a new line]

The following annotation may be used to include manifests for a given profile:

include.release.openshift.io/[identifier]=true

Manifests may support inclusion in multiple profiles by including as many of these annotations
as needed.
Manifests with any explicit inclusion annotations only belong to the declared profiles.
Manifests with no explicit inclusion annotations implicitly belong to the default profile.

The cluster-version operator will only reconcile manifests which belong to its configured cluster profile.
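For concreteness, a minimal Go sketch of the inclusion rule this wording implies (hypothetical names, not the CVO's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// includeAnnotationPrefix is the annotation family discussed above.
const includeAnnotationPrefix = "include.release.openshift.io/"

// belongsToProfile reports whether a manifest with the given annotations
// belongs to the named cluster profile, per the suggested wording:
//   - any explicit include annotations => only the declared profiles;
//   - no include annotations => implicitly the "default" profile.
func belongsToProfile(annotations map[string]string, profile string) bool {
	hasExplicit := false
	for key, value := range annotations {
		if strings.HasPrefix(key, includeAnnotationPrefix) {
			hasExplicit = true
			if key == includeAnnotationPrefix+profile && value == "true" {
				return true
			}
		}
	}
	// No explicit inclusion annotations: implicit membership in "default" only.
	return !hasExplicit && profile == "default"
}

func main() {
	annotated := map[string]string{"include.release.openshift.io/some-profile": "true"}
	fmt.Println(belongsToProfile(annotated, "some-profile")) // true: declared profile
	fmt.Println(belongsToProfile(annotated, "default"))      // false: explicit annotations drop implicit default
	fmt.Println(belongsToProfile(nil, "default"))            // true: implicit default membership
}
```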

Also note that the implicit default here somewhat pushes back against @deads2k's point about avoiding exclusion. It means that you don't need to put in the awareness-raising work he wants for the default profile. But it also means that we don't break compat with the existing default-only profile out of the gate. I'm fine with that. I'd also be ok with "too bad, you need to get everyone to annotate their existing manifests with include.release.openshift.io/[identifier]=true before teaching the CVO anything", if folks feel like the implicit default membership is too magical.


It means that you don't need to put in the awareness-raising work he wants for the default profile. But it also means that we don't break compat with the existing default-only profile out of the gate.

I'd favour first getting the 'magic' default behaviour into the CVO, then adding 'default' awareness to manifests, and finally removing this magic behaviour once everyone has switched.

Contributor

I suggest we specify instead that "Manifests with no explicit inclusion annotations implicitly belong to all profiles."

This would mean that all existing manifests, as well as all newly introduced manifests, are included in all profiles, unless someone specifically adds annotations to indicate otherwise. If an annotation is added to a manifest, then annotations will be needed for all profiles in which that manifest should be included.

This approach will help avoid accidental divergence between what functionality is included in different cluster profiles. Any new manifest will be included in all profiles, unless it has annotations for specific profiles.

It will also mean that a new profile (such as a single-node-cluster profile for CRC, for which we are working on an enhancement PR) can be added and then modified incrementally, rather than requiring immediate PRs to be merged into every repo containing any manifests that should be included in that profile.

My assumption here is that many or most manifests, at least initially, will not need to vary between the small set of profiles that exist, and thus will not require any annotations. But if that assumption proves invalid over time, once we've added explicit annotations to every manifest, we could then change the CVO to require an explicit annotation for any manifest to be included in any profile.
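Under this alternative, only the fallback branch of the earlier sketch would flip (again a hypothetical sketch, not actual CVO code): a manifest with no include annotations would match every profile rather than only `default`:

```go
package main

import (
	"fmt"
	"strings"
)

// Variant of the earlier sketch: a manifest with no explicit include
// annotations belongs to ALL profiles, not only "default". Only the
// final fallback differs.
func belongsToProfile(annotations map[string]string, profile string) bool {
	hasExplicit := false
	for key, value := range annotations {
		if strings.HasPrefix(key, "include.release.openshift.io/") {
			hasExplicit = true
			if key == "include.release.openshift.io/"+profile && value == "true" {
				return true
			}
		}
	}
	return !hasExplicit // unannotated manifests are included in every profile
}

func main() {
	fmt.Println(belongsToProfile(nil, "any-new-profile")) // true under this rule
}
```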

Member

This would mean that all existing manifests, as well as all newly introduced manifests, are included in all profiles, unless someone specifically adds annotations to indicate otherwise... This approach will help avoid accidental divergence between what functionality is included in different cluster profiles

That diverges from David's:

To the common counter argument of "but then it takes a lot of work to create a new profile and all the teams have to be aware of it", I say "good!"...

If folks want to push back against all (but the default?) profile being explicitly opt-in for each manifest, I'm not going to stand in the way, but please go fight it out with @deads2k and report back here once you've reached a consensus ;).

Contributor

Not looking for any fights - just for clarity on what we'll need to do to add and maintain the single-node profile for CRC, for which we are working on a separate enhancement PR. This profile will ideally remain identical to normal OCP deployments, except without HA, with a few features not important for application development left out, and with things tuned to minimize resource requirements.

Member

@deads2k 's previous comment seemed to call for explicitly talking new additions into joining each profile. That's obviously more work, but his point is that thinking each manifest addition through is useful. Including by default in the absence of explicit exclusion-action is also internally consistent. CVO can easily implement either. I want some kind of consensus, so we don't flop back and forth depending on who asked most recently.

Contributor

I still like my reasoning in #200 (comment). Having to modify every manifest you want to include in a profile does a couple of things:

  1. It puts a cost on adding a profile. That cost is updating a few hundred files in a few dozen repos. While this seems large initially, the long-term cost borne by teams for each new topology is a much larger ongoing effort. If it's not worth the few days to find and update, then the profile may not be valuable enough to support.
  2. It exposes the profile to every component that can be impacted. And because the repos are separate, each component owner will be explicitly aware and cannot claim ignorance in the future. Component owners should question the change and ensure their component/operator can handle it.

To help provide an example, and to be sure we avoid accidental defaults in code, I'm in support of actually updating all the existing manifests to include a reference to the default profile. I'm even willing to help make those PRs and get them merged.

Contributor

I'm OK with @deads2k's reasoning on this, as long as getting patches merged to add annotations to manifests across all needed repositories does not turn out to be an insurmountable obstacle. We plan to have an enhancement PR for adding a single node profile for CRC in review soon. This will show that most manifests in OpenShift will need to be included in this profile, many with no profile-specific changes. I'd expect we'd need to merge a complete set of initial annotations early in development in order to allow the profile to be used to produce a cluster equivalent to the current CRC. Then additional PRs would be merged to further optimize the profile for CRC. Does this seem workable?

Member

@deads2k : #482 reminded me that this might still be in flight, and seems like rhbz#1871890 still links a bunch of open PRs, mostly in the kube-API sphere of influence. And seems like maybe there aren't even PRs up yet for many components outside the kube-API sphere of influence. Any preliminary thoughts on whether this is something that can be accomplished with a single driving dev, vs. something where you need to call for teams to self-serve in aos-devel@, and then run around and poke folks who don't act on that call?

@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Aug 11, 2020
@derekwaynecarr
Member

I prefer explicit profile-to-manifest association over implicit.

Agree with @deads2k on the cost and clear impact to each component.

@csrwng please ack.

@derekwaynecarr
Member

Prefer names that map to the responsibility matrix:

instead of `default` we use `self-managed-ha`
instead of `roks` we use `external-managed`
etc.

@deads2k
Contributor

deads2k commented Aug 18, 2020

Prefer names that map to the responsibility matrix:

instead of `default` we use `self-managed-ha`
instead of `roks` we use `external-managed`
etc.

I like these names:

  1. include.release.openshift.io/self-managed-ha=true
  2. include.release.openshift.io/external-managed=true

@csrwng @wking

@csrwng
Contributor

csrwng commented Aug 18, 2020

@deads2k 👍 the names sound good to me

@csrwng
Contributor

csrwng commented Aug 18, 2020

And yes, agree that all manifests should be modified

@wking
Member

wking commented Aug 18, 2020

include.release.openshift.io/self-managed-ha=true

I dislike acronyms, so I would prefer self-managed-high-availability or some such as the profile name.

include.release.openshift.io/external-managed=true

I'm ok with this. I guess the concern would be that at some point in the future we might have a profile where something else is managed externally, and might want to guard against this with external-control-plane or some such?

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.6.0-fc.1-x86_64
$ grep -lr 'exclude.release.openshift.io/internal-openshift-hosted: "true"' manifests | sort

lists a bunch of things that are hard for me to fit under one name. For example, the etcd and kube operators fall under "control plane", but we also exclude the config, insights, autoscaling, and machine-API operators, and those aren't things I'd naively expect to fall under "control plane" naming.

@deads2k
Contributor

deads2k commented Aug 18, 2020

I dislike acronyms, so I would prefer self-managed-high-availability or some such as the profile name.

ok

I'm ok with this. I guess the concern would be that at some point in the future we might have a profile where something else is managed externally, and might want to guard against this with external-control-plane or some such?

as you found below, it's more than just the control plane

@wking
Member

wking commented Aug 18, 2020

it's more than just the control plane

But it's not everything. Is there a word or phrase that clearly reflects what is still managed?

@csrwng
Contributor

csrwng commented Aug 18, 2020

@wking not sure I could give you one word, but in general for hosted clusters, we are running components that generally do not need to run on master nodes. Components that should be excluded include the machine API and related components, since those need to be running on masters so they can provision workers. There are some exceptions specific to IBM's use case, such as the oauth server and the insights operator. The config operator so far seems to manage cloud-specific config stuff, which is also not needed for IBM, but I'm not sure it shouldn't be included if we had a more general use case for master-node-less clusters.

As painful as it sounds, it may make more sense to have a profile specifically for IBM Cloud and, if we have a more general use case in the future, add a new profile for that use case. @derekwaynecarr thoughts?

@derekwaynecarr
Member

Since this PR is oriented around naming the existing self-managed-highly-available profile that is our present default, I think we can avoid bike-shedding too much on the name of the other variations.

deads2k added a commit to deads2k/cluster-kube-apiserver-operator that referenced this pull request Aug 19, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
deads2k added a commit to deads2k/cluster-kube-apiserver-operator that referenced this pull request Aug 19, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
deads2k added a commit to deads2k/cluster-etcd-operator that referenced this pull request Aug 24, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
deads2k added a commit to deads2k/api that referenced this pull request Aug 24, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
deads2k added a commit to deads2k/api that referenced this pull request Aug 24, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
deads2k added a commit to deads2k/cluster-kube-controller-manager-operator that referenced this pull request Aug 24, 2020
deads2k added a commit to deads2k/api that referenced this pull request Aug 24, 2020
deads2k added a commit to deads2k/cluster-kube-apiserver-operator that referenced this pull request Aug 24, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
deads2k added a commit to deads2k/cluster-kube-scheduler-operator that referenced this pull request Aug 24, 2020
guillaumerose added a commit to guillaumerose/cluster-csi-snapshot-controller-operator that referenced this pull request Oct 22, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-csi-snapshot-controller-operator that referenced this pull request Oct 22, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-image-registry-operator that referenced this pull request Oct 22, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-csi-snapshot-controller-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-image-registry-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-ingress-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-machine-approver that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-machine-approver that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-network-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-node-tuning-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-samples-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-csi-snapshot-controller-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cluster-autoscaler-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
Before:

no `CLUSTER_PROFILE=[identifier]` variable is specified, the `default` cluster profile
is in effect.

After:

no `CLUSTER_PROFILE=[identifier]` variable is specified, the `self-managed-highly-available`
cluster profile is in effect.
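A minimal sketch (assumed names, not the CVO's actual code) of what the renamed fallback would look like when the environment variable is unset:

```go
package main

import (
	"fmt"
	"os"
)

// clusterProfile returns the configured profile; when no CLUSTER_PROFILE
// variable is specified, the self-managed-highly-available profile is in effect.
func clusterProfile() string {
	if profile := os.Getenv("CLUSTER_PROFILE"); profile != "" {
		return profile
	}
	return "self-managed-highly-available"
}

func main() {
	fmt.Println(clusterProfile())
}
```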
Contributor

self-managed-highly-available seems like a bad name for the "default" profile unless "who manages it" and "whether it is highly available" are the only axes on which future profiles are allowed to differ from the default.

Like, if you weren't peeking ahead at future plans, why would you call out self-managed and highly-available as the defining attributes of the default OCP experience, but not mention live-upgradeable or rhcos-control-plane or various other things?

Member

These are the known axes, e.g. ROKS moves some control plane components off-cluster (so not self-managed) and CRC is dev-oriented, low resource (so not HA). If we invent more axes later, maybe we'll wish we had been more specific, but 🤷🔮

Contributor

If we invent more axes later, maybe we'll wish we had been more specific

Yeah, I meant we should be less specific; just call it default, rather than trying to guess now all of the ways it's going to be different from other future profiles.

guillaumerose added a commit to guillaumerose/cluster-ingress-operator that referenced this pull request Oct 23, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cloud-credential-operator that referenced this pull request Nov 7, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
guillaumerose added a commit to guillaumerose/cloud-credential-operator that referenced this pull request Nov 18, 2020
This matches openshift/enhancements#414 and doesn't change existing behavior
@guillaumerose
Contributor Author

Everything here has been captured in other PRs.

ming1013 pushed a commit to ming1013/cloud-credential-operator that referenced this pull request Dec 15, 2025
This matches openshift/enhancements#414 and doesn't change existing behavior