Merged
284 changes: 284 additions & 0 deletions enhancements/olm/max-openshift-versions-for-operators.md
@@ -0,0 +1,284 @@
---
title: Allow OLM Managed Operators to Specify a Max OpenShift Version

Hi @awgreene,

I know that this EP is merged already. However, it only came to my attention now.
I raised some concerns/questions about it in the channel that might be worth addressing here.

  1. We already have the OCP label LABEL com.redhat.openshift.versions: v4.5-v4.7 that can be used in the image index. What happens if a user adds the label com.redhat.openshift.versions to the image and the maxOCPversion annotation to the CSV? Also, as described in https://github.com/openshift/enhancements/pull/592/files#r570641713, how will we ensure that the value in the annotation is compatible with minKubeversion?
     • Should CVP not verify it?
     • Should this annotation and its compatibility with minKubeversion not be validated via operator-framework/api, so that it is also checked by operator-sdk bundle validate and similar tools?
  2. Could we also add an annotation for maxKubeVersion to keep things consistent? Also, would it not make sense to deprecate the minKubeVersion attribute and add it via an annotation as well? And then have a minOCPVersion too?
  3. Would it be possible to make this annotation generic and valid for any vendor instead of being specific to OpenShift?

c/c @kevinrizza @dmesser

Contributor Author

Hello @camilamacedo86 - thanks for the review!

> We have the OCP label LABEL com.redhat.openshift.versions: v4.5-v4.7 that can be used in the image index already. What happens if a user adds the label com.redhat.openshift.versions in the image and the maxOCPversion annotation in the CSV?

Can you elaborate on this label and its impact? I believe that @gallettilance and I discussed this earlier and I had some concerns with overloading the existing field.

> Also, as described in https://github.com/openshift/enhancements/pull/592/files#r570641713, how will we ensure that the value in the annotation is compatible with minKubeversion? Should CVP not verify it?

There is no guarantee that these annotations will not conflict. If either of these constraints is not met, the operator will not be installed.

> Should this annotation and its compatibility with minKubeversion not be validated via operator-framework/api, so that it is also checked by operator-sdk bundle validate and similar tools?

Again, these are independent constraints. Also, if this approach was pursued, are you proposing that the validation library hardcodes the relationship between OCP and K8s version?

> Could we also add an annotation for maxKubeVersion to keep things consistent?

This could certainly be added.

> Also, would it not make sense to deprecate the minKubeVersion attribute and add it via an annotation as well? And then have a minOCPVersion too?

We could remove this field from existing CRDs but I do not believe that it is worth doing so until we release a new version of the CRD as existing CRs would need to be migrated or OLM would need to ship a conversion webhook.

> Would it be possible to make this annotation generic and valid for any vendor instead of being specific to OpenShift?

While I agree with this in principle, I am not sure how users could provide specific versions for different Kubernetes variations without unique annotations. For example, how could an Operator Author generically specify that their operator's maxVersion is 4.7 for openshift but 1.21 for vanilla k8s?

@camilamacedo86 commented on Feb 10, 2021

Hi @awgreene,

Sorry if I was not clear. My main questions/concerns are about:

Is this aligned with what already exists, or are we planning to change what exists to align with it?

Otherwise, how it works and how it should be used can end up being confusing and unintuitive for end users, and for us as well (e.g. skipRange, a runtime check, does not work exactly like skips, which removes entries from the index catalogue).

How will users know that they did not do something wrong? How will we know what the expected behaviour should be? How can we prevent an operator from being published with an OCP label and a max version or minKubeVersion that are incompatible, so that it can never be installed?

Example A:

  • LABEL com.redhat.openshift.versions: v4.5-v4.7
  • maxOCPversion=4.6

Will the operator be installed or not in 4.7?

Example B:

  • LABEL com.redhat.openshift.versions: v4.5-v4.7
  • maxOCPversion=4.8

Will the operator be installed in 4.8 or not?

  • And then, will maxOCPversion keep the operator out of the index catalogue, as is done for skips, or will it work like skipRange (a runtime check)?
  • How will users know that they added an invalid value for this field before they publish and find that the operator is not installed? Could we prevent this scenario? Would it be possible to add a check to bundle validate for that? E.g. ensure that the value in maxOCPVersion is within the range of com.redhat.openshift.versions, OR raise a warning that com.redhat.openshift.versions will be overridden by maxOCPVersion.

Could we provide a solution that is valid for upstream and downstream?

Do we really need to create a specific annotation for OCP or could we have for example maxVendorVersion and compare that version with the vendor version one? I am not sure if it would be possible, however, did we check if we could use k8s api to get the vendor version and do the check no matter the vendor kind or this information cannot be obtained by k8s api?

WDYT? Did I manage to clarify my concerns?

Contributor

@camilamacedo86 the label decides which catalogs the bundle will be in. The min/maxVersion field decides which clusters it can be installed on.

They are related, but independent. There is nothing invalid about saying "put this bundle in the v4.8 catalog" and then saying "this bundle can't be installed on v4.8". It's a poor choice by the operator team, but there's no need to explicitly forbid it. (Among other things, a cluster admin could still reference the v4.8 catalog from another cluster version which might be able to install the operator. Also someday we're going to have the ability for admins to force the installation despite requirements not being met).

> Do we really need to create a specific annotation for OCP or could we have for example maxVendorVersion and compare that version with the vendor version one? I am not sure if it would be possible, however, did we check if we could use k8s api to get the vendor version and do the check no matter the vendor kind or this information cannot be obtained by k8s api?

What would operator teams set "maxVendorVersion" to? OpenShift has both an OCP version and a kube version. Both versions can be retrieved and compared.

@camilamacedo86 commented on Feb 10, 2021

Thank you for the clarifications. Feel free to close this one :-)

authors:
- "@awgreene"
reviewers:
- TBD
- "@kevinrizza"
approvers:
- "@ecordell"
- "@spadgett"
creation-date: 2021-01-11
last-updated: 2021-01-19
status: provisional
see-also:
- N/A
replaces:
- N/A
superseded-by:
- N/A
---

# Allow OLM Managed Operators to Specify a Max OpenShift Version

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Definitions

**OLM Managed Operators**: Operators that are installed, managed, and upgraded by the [Operator Lifecycle Manager (OLM)](https://github.com/operator-framework/operator-lifecycle-manager) project.

**Operator Author**: The team responsible for the development of an operator.

**Index Maintainers**: The team that maintains a collection of [Operator Bundles](https://olm.operatorframework.io/docs/glossary/#bundle) in the form of an [Index](https://olm.operatorframework.io/docs/glossary/#index).

**Minor OpenShift Upgrade**: When the OpenShift Cluster is upgraded to the next minor version, for example the upgrade from version 4.6 to 4.7. See [Semantic Versioning](https://semver.org/) for more details.

**Patch OpenShift Upgrades**: When the OpenShift Cluster is upgraded to the next patch version, for example the upgrade from version 4.6.1 to 4.6.2. See [Semantic Versioning](https://semver.org/) for more details.

**maxOpenShiftVersion**: The maximum `{major}.{minor}` OpenShift version that an operator supports, example: `4.5`.

## Summary

OpenShift Cluster Admins introduce new services to their clusters by way of OLM Managed Operators. When performing a Minor OpenShift Upgrade, there is no guarantee that these operators will continue to run on the upgraded cluster version.

Many operators shipped by Red Hat are rigorously tested on a specific set of OpenShift versions. The teams responsible for this testing know exactly which `{major}.{minor}` versions of OpenShift their operators can run on. OLM should allow Operator Authors to specify the maximum `{major}.{minor}` version of OpenShift that their operator may run on.

With this information OLM should:

- Prevent Minor OpenShift Upgrades when it is possible to determine that one or more installed operators will not run on the next minor OpenShift version.
- Prevent the operator from being installed on OpenShift clusters whose version is greater than the `maxOpenShiftVersion` specified by the operator.

## Motivation

The primary purpose of this enhancement is to define how OpenShift can protect existing services introduced by OLM Managed Operators when performing Minor OpenShift Upgrades.

### Goals

- Prevent Minor OpenShift Upgrades when an installed operator declares that it does not support the next minor OpenShift version.
- Warn Cluster Admins when an installed operator does not explicitly declare support for the next minor OpenShift version.
- Prevent OLM from installing operators whose `maxOpenShiftVersion` is less than the current [ClusterVersion](https://github.com/openshift/api/blob/a9e731090f5ed361e5ab887d0ccd55c1db7fc633/config/v1/types_cluster_version.go#L11-L13).

### Non-Goals

- Ensure that services will not be disrupted from a cluster upgrade.
- Guarantee that all installed OLM Managed Operators will continue to run after a Minor OpenShift Upgrade.
- Prevent Patch OpenShift Upgrades.

## Proposal

The [ClusterVersionOperator](https://github.com/openshift/cluster-version-operator) provides OLM with the means to prevent Minor OpenShift Upgrades by setting the [upgradeable condition](https://github.com/openshift/api/blob/5935a5beec4bb8e1e81dd0fe9ebc2af36b9a09ae/config/v1/types_cluster_operator.go#L171-L175) of the `operator-lifecycle-manager` ClusterOperator to `False`.

OLM currently reports that it is upgradeable as soon as its deployments are successfully rolled out. OLM should instead set its `upgradeable condition` to reflect whether any OLM Managed Operators indicate that they will not run on the next minor OpenShift version.

The bulk of this enhancement focuses on defining:

- How operators will define their `maxOpenShiftVersion`.
- The steps that OLM will take to determine upgrade safety based on the collection of installed operators.
- The steps that OLM will take to prevent the operator from being installed on OpenShift clusters whose version is greater than the `maxOpenShiftVersion` specified by the operator.

### Allowing Operator Authors to Define a Maximum OpenShift Version

Contributor

Can I specify both minKubeversion and maxOpenShiftVersion? If yes, what happens if minKubeversion is v0.19 and maxOpenShiftVersion is v4.6? I suggest you call out the various permutations / combinations and the expected result.

Contributor

They're independent constraints. I would expect that both of them must be met to install, along with any other constraints.


Given that this is an OpenShift specific feature, the OLM team would rather not expand the `Spec` of the [ClusterServiceVersion (CSV)](https://olm.operatorframework.io/docs/concepts/crds/clusterserviceversion/), used on vanilla Kubernetes clusters, to include a `maxOpenShiftVersion` field.

Instead, OLM will rely on the presence of an annotation on a CSV to specify the operator's `maxOpenShiftVersion`. This annotation can be added in one of two ways:

1. By defining the `maxOpenShiftVersion` as a property of the bundle using the [Declarative Index Config](https://github.com/operator-framework/enhancements/blob/1cf4a0363d918e810f638036539388622265d466/enhancements/declarative-index-config.md) (recommended approach).
2. By adding the annotation to the CSV directly.

Let's review the benefits associated with defining an operator's `maxOpenShiftVersion` using the Declarative Index Config.

#### Defining the Maximum OpenShift Version using the Declarative Index Config

The Declarative Index Config provides the means to define [indexes](https://olm.operatorframework.io/docs/glossary/#index) in a declarative way. When this feature is available, OLM will allow Index Maintainers to specify each bundle's `maxOpenShiftVersion`, an example of which can be seen below:

```json
"etcd": {
  "name": "etcd",
  "bundles": [
    {
      "path": "quay.io/foo/etcdv0.0.1",
      "version": "v0.0.1",
      "channels": ["alpha"],
      "operators.coreos.com/maxOpenShiftVersion": "4.6"
    },
    {
      "path": "quay.io/foo/etcdv0.0.2",
      "version": "v0.0.2",
      "channels": ["alpha"],
      "operators.coreos.com/maxOpenShiftVersion": "4.7"
    }
  ],
  ...
}
```

In this example, bundle `v0.0.1` declares a `maxOpenShiftVersion` of `4.6`, which prevents upgrades to OpenShift 4.7, and bundle `v0.0.2` declares `4.7`, which prevents upgrades to OpenShift 4.8.

Like all bundle properties, these properties will be propagated to the CSV as an annotation. The value of the bundle's `maxOpenShiftVersion` can be easily updated without releasing a new bundle as soon as the existing bundle has been tested against a new version of OpenShift.
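
The propagation step can be pictured with a small sketch. The proposal does not define how properties are copied, so the helper name `csv_annotations_from_bundle` and the rule "copy every key under the `operators.coreos.com/` prefix" are assumptions made purely for illustration, not OLM's actual mechanism.

```python
from typing import Dict, Optional

# Assumption: properties destined for the CSV share this key prefix.
PROPERTY_PREFIX = "operators.coreos.com/"


def csv_annotations_from_bundle(
    bundle: dict, existing: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return CSV annotations with the bundle's prefixed properties merged in.

    Hypothetical helper: index-level values win over annotations already on
    the CSV, which is what would let an Index Maintainer raise the maximum
    version without releasing a new bundle.
    """
    annotations = dict(existing or {})
    for key, value in bundle.items():
        if key.startswith(PROPERTY_PREFIX):
            annotations[key] = str(value)
    return annotations


bundle = {
    "path": "quay.io/foo/etcdv0.0.1",
    "version": "v0.0.1",
    "channels": ["alpha"],
    "operators.coreos.com/maxOpenShiftVersion": "4.6",
}
annotations = csv_annotations_from_bundle(bundle)
```

With the bundle entry from the index example, `annotations` would hold only the `operators.coreos.com/maxOpenShiftVersion` key with the value `"4.6"`.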

#### How OLM Determines Upgradeability Status

When OLM reconciles a CSV it will check the set of annotations for a key that matches `operators.coreos.com/maxOpenShiftVersion`. The value associated with this key specifies the maximum `{major}.{minor}` OpenShift version that the operator supports. An example of this annotation can be seen below:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
    # Prevent upgrades to OpenShift Version 4.9
    operators.coreos.com/maxOpenShiftVersion: "4.8"
```

Assume that the CSV shown above is the only CSV on the Cluster. In this case, OLM would compare the value of the `operators.coreos.com/maxOpenShiftVersion` annotation against the version of the cluster to decide its upgradeable status.

As this is an opt-in feature, there are instances where OLM will be unable to determine if the upgrade is safe:

- The CSV is missing the `operators.coreos.com/maxOpenShiftVersion` annotation
- The value associated with the `operators.coreos.com/maxOpenShiftVersion` is not a valid `{major}.{minor}` Semantic Version (Valid Example: x.y).
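
A minimal sketch of how these two cases could be detected follows. The function name `parse_max_openshift_version` is hypothetical, and treating anything other than two plain decimal numbers separated by a dot as invalid is an assumption about what "valid `{major}.{minor}`" means here.

```python
import re
from typing import Dict, Optional, Tuple

ANNOTATION_KEY = "operators.coreos.com/maxOpenShiftVersion"

# Assumption: a valid value is exactly "<major>.<minor>" with no prefix,
# patch component, or surrounding whitespace.
_MAJOR_MINOR = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_max_openshift_version(
    annotations: Dict[str, str],
) -> Optional[Tuple[int, int]]:
    """Return (major, minor), or None if the annotation is missing or invalid."""
    raw = annotations.get(ANNOTATION_KEY)
    if not isinstance(raw, str):
        return None
    match = _MAJOR_MINOR.fullmatch(raw)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Returning a tuple of integers rather than a string or float keeps later comparisons numeric, so a value such as `4.10` sorts after `4.9`.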

OLM will report the following upgradeable conditions based on the collection of CSVs present on cluster:

| Key                 | Meaning                                                                                                    |
| ------------------- | ---------------------------------------------------------------------------------------------------------- |
| Upgradeable CSV     | A CSV with a valid `maxOpenShiftVersion` annotation that will run on the next minor version of OpenShift   |
| Non-Upgradeable CSV | A CSV with a valid `maxOpenShiftVersion` annotation that will not run on the next minor version of OpenShift |
| Indeterminate CSV   | A CSV that does not include a valid `maxOpenShiftVersion` annotation                                       |

| Upgradeable CSV Present | Indeterminate CSV Present | Non-Upgradeable CSV Present | Status          | Message                                                                                          |
| ----------------------- | ------------------------- | --------------------------- | --------------- | ------------------------------------------------------------------------------------------------ |
| Yes                     | No                        | No                          | Upgradeable     | Ready for upgrade                                                                                |
| Don't Care              | Yes                       | No                          | Upgradeable     | The following operators may not run on the next OpenShift Version: namespace/foo, namespace/bar  |
| Don't Care              | Don't Care                | Yes                         | Not Upgradeable | The following operators will not run on the next OpenShift Version: namespace/foo, namespace/bar |
### Preventing OLM Managed Operators Installation on Unsupported OpenShift Cluster Versions

OLM Managed Operators can already specify a `minKubeVersion` on their Operator Bundles, which prevents OLM from installing the operator on Kubernetes clusters whose version is less than the specified value.
Similarly, the resolver will use the `operators.coreos.com/maxOpenShiftVersion` annotation to prevent operators from being installed on an OpenShift version that is not supported. The resolver will be updated to:

1. Check for the presence of the ClusterVersion API.
2. If present, the resolver will check if the operator being installed includes a `maxOpenShiftVersion`. If a `maxOpenShiftVersion` is specified and the cluster version is greater than the specified version, OLM will prevent the operator from being installed.
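
The two resolver steps can be sketched as a single predicate, following the Goals section: installation is refused only when the cluster's `{major}.{minor}` exceeds the operator's `maxOpenShiftVersion`. The function name `may_install` and the use of `None` to represent "no ClusterVersion API" and "no `maxOpenShiftVersion`" are assumptions for illustration. How an invalid annotation value should be treated at install time is not specified by the proposal, so this sketch does not address it.

```python
from typing import Optional, Tuple


def may_install(
    max_openshift_version: Optional[Tuple[int, int]],
    cluster_version: Optional[Tuple[int, int, int]],
) -> bool:
    """Return True if the maxOpenShiftVersion constraint allows installation.

    cluster_version is None when the ClusterVersion API is absent, i.e. the
    cluster is assumed not to be OpenShift.
    """
    # Step 1: without the ClusterVersion API the constraint does not apply.
    if cluster_version is None:
        return True
    # Step 2: operators that do not opt in are never blocked by this check.
    if max_openshift_version is None:
        return True
    # Patch versions are ignored: only {major}.{minor} is compared.
    return cluster_version[:2] <= max_openshift_version
```

This check is independent of `minKubeVersion`. Both constraints would have to pass for the operator to be installed.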

### User Stories

#### Story 1

As an OpenShift cluster admin, I want OLM to block cluster upgrades if one or more of the operators that it manages will not run on the next minor OpenShift version.

#### Story 2

As an operator author using OLM to manage the lifecycle of my operator, I want OLM to prevent upgrades to specific `{major}.{minor}` versions of OpenShift that I know are not supported by my operator.

#### Story 3

As an operator author using OLM to manage the lifecycle of my operator, I want OLM to prevent installations of my operators on OpenShift Clusters whose versions are greater than the `maxOpenShiftVersion` defined by my operator.

#### Story 4

As the index maintainer, I want to be able to dynamically change which `{major}.{minor}` OpenShift versions are supported by my operator without releasing a new version of the CSV.

### Risks and Mitigations

- The most immediate risk to this solution is the fact that OLM will prevent Minor OpenShift Upgrades when an OLM Managed Operator does not support the next minor OpenShift version.
This risk is mitigated given that admins will still be able to apply security fixes with patch updates whether or not OLM is blocking Minor OpenShift Upgrades. This gives them a safe window to either remove the minor-blocking operator or update it to a version that is compatible with the next minor OpenShift version.
In extreme cases, cluster admins can override the CVO's upgrade checks via means documented elsewhere.

## Design Details

### Open Questions

- When a CSV is present that does not include a `maxOpenShiftVersion` annotation, how will this information be surfaced by the UI and CLI?

Contributor

has the UI team been looped into this so they know work is coming for them?
cc @spadgett


### Test Plan

#### Proposed Unit Tests

- Logic used to determine upgrade safety should be thoroughly tested

#### Proposed E2E Tests

- The cluster upgrade is not blocked by CSVs that do not report a `maxOpenShiftVersion`.
- The cluster upgrade is not blocked by CSVs that report an invalid `maxOpenShiftVersion`.
- The cluster upgrade is not blocked by CSVs that report a `maxOpenShiftVersion` greater than the current OpenShift version.
- The cluster upgrade is blocked by CSVs that report a `maxOpenShiftVersion` less than or equal to the current OpenShift version.

For operators that specify a `maxOpenShiftVersion`:

- The operator can be installed on a vanilla Kubernetes cluster.
- The operator can be installed on OpenShift clusters whose version is less than or equal to the `maxOpenShiftVersion` specified by the operator.
- The operator cannot be installed on OpenShift clusters whose version is greater than the `maxOpenShiftVersion` specified by the operator.

For operators that do not specify a `maxOpenShiftVersion`:

- The operator continues to be installed on both OpenShift and vanilla Kubernetes clusters.

### Graduation Criteria

The goal of this enhancement is to provide Cluster Admins with some level of assurance that services made available via OLM Managed Operators remain available on a targeted OpenShift Cluster Version.

This feature will initially be introduced as a Generally Available Feature, but upgrade compatibility checks may be added over time.

Proposed GA Features:

- OLM determines upgrade safety based on CSV annotations.
- OLM prevents operators from being installed on OpenShift clusters whose versions are greater than the specified `maxOpenShiftVersion`.
- Console highlights upgrade safety in the UI.
- User-facing documentation available at olm.operatorframework.io.
- Comprehensive unit tests exist.
- Comprehensive e2e tests exist.

#### Removing a deprecated feature

Not applicable.

#### Upgrade / Downgrade Strategy

Not applicable.

#### Version Skew Strategy

The proposed feature should only require integration with the CVO component and OLM Managed Operators.

## Implementation History

- Initial enhancement proposal created.

## Drawbacks

- OpenShift users might be upset when a Minor OpenShift Upgrade is blocked by OLM if they are unfamiliar with this feature.
- There are instances where an operator might not be supported on the OpenShift version that the cluster is being upgraded to. If the customer is unable to remove the existing operator, this may place them in a position where they must choose between upgrading the cluster and keeping the operator.
This is arguably better than upgrading the cluster only to find out that a core service is no longer available on cluster, but I suspect that multiple tickets will be opened against OLM and the operators it installs when the customer finds themselves unable to upgrade.

## Possible Future Work

Contributor

another bit of future work that's at least related to this is providing the ability to override the install requirement.


### Determine upgrade safety using OLM Managed Operator RBAC

In addition to giving Operator Authors the ability to declare a maximum OpenShift Version, OLM could determine upgrade readiness based on default OpenShift GVKs and those required by installed operators. At a high level, the following workflow could be implemented:

1. CVO provides OLM with a list of default GVKs at OpenShift version `x.y.z`. This information should be provided by the CVO team, who could possibly retrieve it from the same source that stores OpenShift upgrade channels.
2. OLM retrieves the list of default GVKs available on OpenShift version `x.y.z`.
3. OLM generates a list of required APIs by aggregating the GVKs defined as permissions and ClusterPermissions in existing ClusterServiceVersions. Any operator that uses a wildcard (*) for the Group, Version, or Kind in its permissions/clusterPermissions would automatically be flagged, as OLM would be unable to guarantee that all GVKs required by the operator are available on cluster version `x.y.z`.
4. OLM would then compare the list of APIs on the cluster at version `x.y.z` against those required by installed OLM Managed Operators. If a required GVK is unavailable, OLM would report that it is no longer upgradeable.
If the required GVKs are available, OLM would mark itself as upgradeable, possibly using a special Message/Reason to signify that it has compared required GVKs against available GVKs at the target version.
If OLM is unable to guarantee that the required GVKs are available on OpenShift version `x.y.z`, that could be highlighted as well. The cluster admin could then force the upgrade if they feel comfortable doing so.
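
Steps 3 and 4 of this workflow can be sketched as a set comparison. The sketch assumes that the default GVKs for the target version and the GVKs required by each operator have already been collected as `(group, version, kind)` tuples. The function name `check_gvk_readiness` and the input shapes are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

GVK = Tuple[str, str, str]  # (group, version, kind)


def check_gvk_readiness(
    available: Set[GVK], required_by_operator: Dict[str, Iterable[GVK]]
) -> Tuple[Dict[str, List[GVK]], List[str]]:
    """Compare required GVKs against those available at the target version.

    Returns (missing, unverifiable):
      missing      maps an operator to the required GVKs that are absent
      unverifiable lists operators that use a wildcard, which cannot be checked
    """
    missing: Dict[str, List[GVK]] = {}
    unverifiable: List[str] = []
    for operator, gvks in required_by_operator.items():
        gvks = list(gvks)
        # A wildcard in the group, version, or kind makes the check impossible.
        if any("*" in gvk for gvk in gvks):
            unverifiable.append(operator)
            continue
        absent = sorted(gvk for gvk in gvks if gvk not in available)
        if absent:
            missing[operator] = absent
    return missing, sorted(unverifiable)


available = {("apps", "v1", "Deployment"), ("batch", "v1", "CronJob")}
required = {
    "namespace/foo": [("apps", "v1", "Deployment")],
    "namespace/bar": [("batch", "v1beta1", "CronJob")],
    "namespace/baz": [("*", "v1", "Pod")],
}
missing, unverifiable = check_gvk_readiness(available, required)
```

Under this sketch OLM would report not upgradeable when `missing` is non-empty, and would surface `unverifiable` operators as a warning that the cluster admin can choose to override.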

#### Pros

- This feature is available by default and does not require any additional effort from operator authors, giving some level of assurance that the upgrade is safe.

#### Cons

- This solution forces the CVO (or API team?) to host and provide available GVK information for each `{major}.{minor}` OpenShift version
- This solution could falsely identify an upgrade as safe. Rather than tell customers that their operators will keep working, we should specify that all GVKs required by the operator are available on the `x.y.z` version of OpenShift.
- There are many instances where an operator may not be supported on a specific version of OpenShift outside of required GVKs versus available GVKs.
- Operator authors must specify complete GVKs in their CSVs if OLM is to report true/false for upgrade safety.

## Infrastructure Needed

No additional infrastructure is needed.