Skip to content

Allow excluding manifests from CVO payload#252

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
csrwng:hosted
Nov 14, 2019
Merged

Allow excluding manifests from CVO payload#252
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
csrwng:hosted

Conversation

@csrwng
Copy link
Copy Markdown
Contributor

@csrwng csrwng commented Oct 17, 2019

Allows specifying an identifier with the EXCLUDE_MANIFESTS environment variable that will be used to skip manifests that contain an annotation like exclude.release.openshift.io/<identifier>=true

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2019
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 17, 2019
@csrwng csrwng changed the title WIP: Add flags to specify which manifests to exclude or include Add flags to specify which manifests to exclude or include Nov 4, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2019
@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 4, 2019

@crawford @abhinavdahiya this should be ready for a review. This is a first step towards introducing the concept of profiles where each component will declare what files go in a given profile. When that is introduced, these flags will be removed.

@abhinavdahiya
Copy link
Copy Markdown
Contributor

this should be ready for a review. This is a first step towards introducing the concept of profiles where each component will declare what files go in a given profile.

Is there a backing enhancement that provides some context on how this change is on the path to profiles.

Who can set these cli flags? How do we find out about these flags are being set?
Setting these flags should probably make your clusters Not Upgradeable for now.

Also this change overlaps with what clusterversion.overrides do? Why do we this new feature when we can achieve the same thing with clusterversion API

Comment thread pkg/cvo/sync_worker.go Outdated
// updated by the run method only
payload *payload.Update

excludeFilter []string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this called a filter but is a string slice? this doesn't really provide context on how to use this to filter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a slice of regular expressions. I can add a comment explaining what they are.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe plural Filters?

Comment thread pkg/payload/payload.go Outdated
return false
}

func LoadUpdateWithFilters(dir, releaseImage string, excludes, includes []string) (*Update, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name is LoadUpdateWithFilters, but it doen't take any filters though...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does (excludes, includes)

@abhinavdahiya
Copy link
Copy Markdown
Contributor

Who can set these cli flags? How do we find out about these flags are being set?
Setting these flags should probably make your clusters Not Upgradeable for now.

if these flags are not supposed to be set by any Openshift 4 user, these should be behind a build time flag.

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 5, 2019

Is there a backing enhancement that provides some context on how this change is on the path to profiles.

There isn't an enhancement proposal for profiles yet, I will create one. The target for profiles would be 4.4.

Who can set these cli flags? How do we find out about these flags are being set?

These flags would be set in the deployment for the CVO in a hosted environment, where the cvo runs as a pod in a management server but operates on a user cluster control plane (also hosted as pods on the management server)

Setting these flags should probably make your clusters Not Upgradeable for now.

Ack, will make that change.

Also this change overlaps with what clusterversion.overrides do? Why do we this new feature when we can achieve the same thing with clusterversion API

One of the benefits of hosted clusters is enabling cluster admin rights for users to their own cluster and making it hard for them to bring down the control plane on their own. Having the clusterversion resource (inside the user cluster) control which manifests get laid down puts too much of the responsibility on the user to keep it in a good working state.

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 5, 2019

if these flags are not supposed to be set by any Openshift 4 user, these should be behind a build time flag.

The release is meant to be run unchanged in hosted mode, it won't get its own build.

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 5, 2019

@abhinavdahiya thinking more about this.... since this is only a temporary enhancement that will be removed once we add a way to specify a profile, maybe instead of adding flags it should just be a set of env vars, wdyt?

@csrwng csrwng changed the title Add flags to specify which manifests to exclude or include Allow excluding manifests from CVO payload Nov 7, 2019
@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 7, 2019

@abhinavdahiya I've now simplified this some and removed the includes part (less to remove later). We should only need to specify what to exclude. I've also removed the additional command flags and switched to reading environment variables to specify the regular expressions for manifest exclusion. I did not make the change to switch to not Upgradeable when these filters are specified because we will still need the CVO to apply upgrades to non-control plane components.

We need to get this in for the current release. Please let me know what else you'd like to see changed so we can get this merged.

/cc @derekwaynecarr

@abhinavdahiya
Copy link
Copy Markdown
Contributor

Who can set these cli flags? How do we find out about these flags are being set?

These flags would be set in the deployment for the CVO in a hosted environment, where the cvo runs as a pod in a management server but operates on a user cluster control plane (also hosted as pods on the management server)

So that means the customer is not allowed to set/effect these values in external-hosted-profile.

but what about self-hosted-profile clusters, there the users have permission and access to modify their cluster-version-operator deployment to add these flags/envs. How do we prevent that or if we can't prevent that, how do we know the customer is doing this.

Also this change overlaps with what clusterversion.overrides do? Why do we this new feature when we can achieve the same thing with clusterversion API

One of the benefits of hosted clusters is enabling cluster admin rights for users to their own cluster and making it hard for them to bring down the control plane on their own. Having the clusterversion resource (inside the user cluster) control which manifests get laid down puts too much of the responsibility on the user to keep it in a good working state.

So the users are allowed to modify the clusterversion object in external-hosted-profile? which will affect how cvo behaves, and affect how it managed the cluster components, but we think using overrides section would be too much access to the user?

@abhinavdahiya I've now simplified this some and removed the includes part (less to remove later). We should only need to specify what to exclude. I've also removed the additional command flags and switched to reading environment variables to specify the regular expressions for manifest exclusion. I did not make the change to switch to not Upgradeable when these filters are specified because we will still need the CVO to apply upgrades to non-control plane components.

As I mentioned above, there is a chance users will set these values in self-hosted cases, and not knowing how/when these are being set is a blocker in my opinion.

We need to get this in for the current release.

current? 4.3 or 4.4 ?

Please let me know what else you'd like to see changed so we can get this merged.

/cc @derekwaynecarr

Comment thread pkg/cvo/cvo.go Outdated

// excludeManifestRegularExpressions is a set of regular expresions that are used
// to exclude manifest file names that match them.
excludeManifestRegularExpressions []string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not store regex compiled here? compiling these regex every time is known to cause problems.

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 8, 2019

but what about self-hosted-profile clusters, there the users have permission and access to modify their cluster-version-operator deployment to add these flags/envs. How do we prevent that or if we can't prevent that, how do we know the customer is doing this.

Users can do this today. They can modify the cvo deployment in all kinds of ways. They can specify an alternate image, alternate namespace or a different payload directory (https://github.com/openshift/cluster-version-operator/blob/master/pkg/start/start.go#L85-L87).
How are these variables different?

So the users are allowed to modify the clusterversion object in external-hosted-profile? which will affect how cvo behaves, and affect how it managed the cluster components, but we think using overrides section would be too much access to the user?

Yes they will be able to change the clusterversion resource and if they like, add additional overrides, but they can't add things that are already excluded on the management side.

As I mentioned above, there is a chance users will set these values in self-hosted cases, and not knowing how/when these are being set is a blocker in my opinion.

Again, not sure how it changes what you can do today. If you are privileged enough to change the cluster version operator deployment, then there's really not much we can do to stop you from shooting yourself in the foot.

current? 4.3 or 4.4 ?

4.3 (we have exception approval)

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 8, 2019

Pushed update to parse regular expressions once on startup

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 8, 2019

@abhinavdahiya I've submitted a change to make the cvo not upgradable when manifest excludes are set (verified that it doesn't affect upgrades for externally hosted mode). Please let me know if this addresses your concern.

Comment thread pkg/cvo/upgradeable.go Outdated
return nil
}
return &configv1.ClusterOperatorStatusCondition{
Type: configv1.ClusterStatusConditionType("UpgradableManifestExcludes"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/openshift/cluster-version-operator/pull/252/files#diff-d9212d08828a9f9d8b54a2e90f31500aR195
the conditions need to start with Upgradeable

UpgradableManifestExcludes -> UpgradeableManifestExcludes

hmm, also this conditions names is not that great.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Comment thread pkg/cvo/upgradeable.go Outdated
Type: configv1.ClusterStatusConditionType("UpgradableManifestExcludes"),
Status: configv1.ConditionFalse,
Reason: "ManifestExcludesSet",
Message: "Certain manifests are excluded from the operator's payload. The cluster cannot be upgraded.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/openshift/cluster-version-operator/pull/252/files#diff-d9212d08828a9f9d8b54a2e90f31500aR215

based on that, something like Excluding manifests from the release image prevents upgrades. Please remove the exclude filters before continuing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like regex on manifests, I'd like this to be a higher level name that manifests are associated with. If that's the case, then this shouldn't even be a condition because expectation is that they are tested and we don't need this alert.

@abhinavdahiya
Copy link
Copy Markdown
Contributor

/retest

Comment thread pkg/cvo/cvo.go Outdated

// excludeManifestsFilter is a set of regular expresions that are used
// to exclude manifest file names that match them.
excludeManifestsFilter payload.RegexpFilter
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the payload should be required to identify the manifests that match a string, rather than a regex. I do not like regex and I don't want arbitrary regular expressions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, string matching should be sufficient for what we need, will switch to that

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the payload should be required to identify the manifests that match a string...

I'm trying to catch up with the previous discussion in this PR, but if the folks setting this are creating release images, can't they just remove the manifests they don't want pushed? And if they aren't writing to release images, why is the existing overrides insufficient. Do we have the enhancement explaining how this is expected to fit together yet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want a separate release image for this case. The release image should be the same we use everywhere.

And no, the enhancement is not ready, since I think the design of profiles needs a little bit more discussion. The purpose of this PR is simply to a stop-gap in 4.3 until we get to profiles.

@abhinavdahiya
Copy link
Copy Markdown
Contributor

Updated to use the single getOverrides method. Is that what you meant @abhinavdahiya ?

So the sync_work already has a list of overrides,

Overrides []configv1.ComponentOverride

and this change adds another style of override/skip at the syncworker level.

// exclude is an identifier used to determine which
// manifests should be excluded based on an annotation
// of the form exclude.release.openshift.io/<identifier>=true
exclude string

And the sync work can change at each sync, but the sync worker is only configured once at startup.
is that acceptable?

secondly, i think we lost the reporting excludes are set with the current change, until we have supported profiles I would like to make sure we know when users are setting this using env.

@abhinavdahiya
Copy link
Copy Markdown
Contributor

cc @smarterclayton

does this look more in-line to what you were hoping?

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 13, 2019

And the sync work can change at each sync, but the sync worker is only configured once at startup.
is that acceptable?

Yes I think that is acceptable because the exclude identifier will be statically configured for the cvo deployment, unlike overrides which can be modified via the api on the clusterversion CR.

secondly, i think we lost the reporting excludes are set with the current change, until we have supported profiles I would like to make sure we know when users are setting this using env.

I didn't add the condition back because of this comment from @smarterclayton

@smarterclayton
Copy link
Copy Markdown
Contributor

Looks fine to me.

Comment thread pkg/start/start.go
PayloadOverride: os.Getenv("PAYLOAD_OVERRIDE"),
ResyncInterval: minResyncPeriod,
EnableMetrics: true,
Exclude: os.Getenv("EXCLUDE_MANIFESTS"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When exclude is set we should log this.

Excluding manifests for %q

in the main logs so we can see it if we have bugs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@csrwng
Copy link
Copy Markdown
Contributor Author

csrwng commented Nov 13, 2019

listing hosted zones: Throttling: Rate exceeded\n\tstatus code: 400

/retest

@wking
Copy link
Copy Markdown
Member

wking commented Nov 13, 2019

4.3 (we have exception approval)

Is there a bug we can link from this PR for this exception?

Comment thread pkg/cvo/sync_worker.go
// exclude is an identifier used to determine which
// manifests should be excluded based on an annotation
// of the form exclude.release.openshift.io/<identifier>=true
exclude string
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a single string, here or a []string of identifiers to exclude? Related to this thread.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking the idea is that the identifier would be the equivalent of a single profile. If you have a different exclusion profile, you should use a different identifier. Manifests in the individual operators can contain exclusion annotations for multiple profiles.

@abhinavdahiya
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 14, 2019
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2019
@openshift-merge-robot openshift-merge-robot merged commit ac394f1 into openshift:master Nov 14, 2019
jonesbr17 added a commit to jonesbr17/cluster-cloud-controller-manager-operator that referenced this pull request Nov 1, 2021
Configures the CVO to exclude manifests in deployments with external control planes
See openshift/cluster-version-operator#252
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants