89 changes: 56 additions & 33 deletions enhancements/update/eus-upgrades-mvp.md
@@ -117,34 +117,31 @@ use.
This allows us to *inform* the admin of removals that are more than one minor
version away and *block* upgrades for removals which are imminent.

### MCO - Enforce OpenShift's defined host component version skew policies

The MCO will set Upgradeable=False whenever any MachineConfigPool has one or
more nodes present which fall outside of a defined list of constraints. For
instance, if OpenShift has a defined Kubelet Version Skew of N-1, the node
constraints enforced by the MCO in OCP 4.7 (Kube 1.20) would be as follows:

```yaml
node.status.nodeInfo.kubeletVersion:
- v1.20
```

If the policy were to change, allowing for a version skew of N-2, v1.19 would
be added to the list of acceptable matches. As a result, a cluster which had
been upgraded from 4.6 to 4.7 would allow a subsequent upgrade to 4.8 as long
as all kubelets were either v1.19 or v1.20. The 4.8 MCO would then evaluate the
Upgradeable condition based on its own constraints; if v1.19 weren't allowed,
it would inhibit upgrades to 4.9. This means the MCO must set Upgradeable=False
until it has confirmed its constraints have been met.

```yaml
node.status.nodeInfo.kubeletVersion:
- v1.20
- v1.19
```

The MCO is not responsible for defining these constraints, and constraints are
only widened when CI testing proves them to be safe.
### APIServer - Enforce OpenShift's defined kubelet version skew policies

The API Server Operator will set `Upgradeable=False` whenever any of the nodes
within the cluster are at the skew limit; that is, when an upgrade of the API
Server would exceed the allowable kubelet version skew. For instance, if
OpenShift has a defined kubelet version skew of N-1, the API Server Operator
would report `Upgradeable=True` if all of the nodes are at N, and
`Upgradeable=False` if at least one of the nodes is not up to date. If the
kubelet skew policy were to change, allowing for a version skew of N-2, the API
Server Operator would report `Upgradeable=True` if all of the nodes are at N or
N-1, and `Upgradeable=False` if any of the nodes are at N-2.
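
A rough sketch of what such a check might look like is below. This is
illustrative only, not the actual kube-apiserver operator code: the
`upgradeableForSkew` helper, the `maxSkew` parameter, and the standalone `main`
wrapper are all assumptions. The idea is to list the cluster's nodes and
compare each kubelet's minor version against the version the API Server would
reach after an upgrade:

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// minorVersion extracts the minor version from a string such as "v1.20.0+abc".
func minorVersion(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unrecognized version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// upgradeableForSkew reports whether upgrading the API server by one minor
// version would keep every node's kubelet within the allowed skew. If not, it
// returns a human-readable reason describing the offending node.
func upgradeableForSkew(ctx context.Context, client kubernetes.Interface, apiServerVersion string, maxSkew int) (bool, string, error) {
	apiMinor, err := minorVersion(apiServerVersion)
	if err != nil {
		return false, "", err
	}
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, "", err
	}
	for _, node := range nodes.Items {
		kubeletMinor, err := minorVersion(node.Status.NodeInfo.KubeletVersion)
		if err != nil {
			return false, "", err
		}
		// After a minor upgrade the API server would be at apiMinor+1, so any
		// node already at the skew limit would fall out of policy.
		if (apiMinor+1)-kubeletMinor > maxSkew {
			msg := fmt.Sprintf("node %s runs kubelet %s; upgrading the API server would exceed the N-%d skew policy",
				node.Name, node.Status.NodeInfo.KubeletVersion, maxSkew)
			return false, msg, nil
		}
	}
	return true, "", nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ok, reason, err := upgradeableForSkew(context.Background(), client, "v1.20.0", 1)
	if err != nil {
		panic(err)
	}
	// In a real operator this result would feed the Upgradeable condition on
	// the kube-apiserver ClusterOperator rather than being printed.
	fmt.Println(ok, reason)
}
```

In the real operator, a `false` result here would translate into setting
`Upgradeable=False` with an explanatory message, rather than printing to
stdout.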

It's critical to remember that the `Upgradeable` flag doesn't block z-stream
updates. This detail allows the flag to be used indiscriminately, since
OpenShift will never bump the API Server version in a z-stream update.

It's also worth pointing out that this mechanism is a heuristic. It isn't
necessarily guaranteed that every y-stream update will include a bump to the
API Server, so there may be false positives. In a similar vein, in order for
this check to remain valid, OpenShift will need to ensure that there are no
API Server version bumps larger than N+1 between y-stream updates. Otherwise,
the check may return a false negative (e.g. an existing cluster has N-1 skew
and then applies a y-stream update where the API Server version bumps N+2,
resulting in N-3 skew). If there are larger version bumps, the skew check will
need to be modified in a backport to the previous version.
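
To make the arithmetic behind that failure mode concrete, here is a minimal
sketch; the `skewAfterUpgrade` helper and the literal values are illustrative,
not part of any real component:

```go
package main

import "fmt"

// skewAfterUpgrade returns the minor-version distance between the API server
// and the oldest kubelet after the API server moves up by `bump` minor
// versions, given the skew before the upgrade.
func skewAfterUpgrade(currentSkew, bump int) int {
	return currentSkew + bump
}

func main() {
	// The check assumes a y-stream update bumps the API server by at most one
	// minor version, so a cluster whose oldest kubelet is one minor behind
	// (N-1 skew) lands at N-2 after the update:
	fmt.Println(skewAfterUpgrade(1, 1)) // prints 2, i.e. N-2
	// If a y-stream update instead bumped the API server by two minor
	// versions, the same cluster would land at N-3, outside the policy, even
	// though the pre-upgrade check passed:
	fmt.Println(skewAfterUpgrade(1, 2)) // prints 3, i.e. N-3
}
```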

These changes will need to be backported to 4.7 prior to 4.7 EOL.

@@ -304,8 +301,8 @@ that's broadly scoped as "EUS 4.6 to EUS 4.10 Validator"?

- CI tests are necessary which attempt to upgrade while violating kubelet to API
compatibility, i.e. a 4.6 to 4.7 upgrade with MachineConfigPools paused, then check
for Upgradeable=False condition to be set by MCO assuming that our rules only allow
for N-1 skew.
for Upgradeable=False condition to be set by the API Server Operator, assuming
that our rules only allow for N-1 skew.
- CI tests are necessary which install an OLM Operator which expresses a maxKubeVersion
or maxOCPVersion equal to the current cluster version and checks for Upgradeable=False
on OLM
@@ -347,7 +344,7 @@ In general, we try to use the same stages (alpha, beta, GA), regardless how the

These are generalized examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].

##### Dev Preview -> Tech Preview
#### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
@@ -356,7 +353,7 @@ These are generalized examples to consider, in addition to the aforementioned [m
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

##### Tech Preview -> GA
#### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
@@ -368,7 +365,7 @@ These are generalized examples to consider, in addition to the aforementioned [m
**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

##### Removing a deprecated feature
#### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature
@@ -393,6 +390,32 @@ The idea is to find the best form of an argument why this enhancement should _no

## Alternatives

### MCO Kubelet Skew Enforcement

Instead of the API Server Operator enforcing kubelet skew compliance through
the `Upgradeable` flag, the MCO could provide this functionality. Either of
these two operators is an obvious choice for such a check, since they are
responsible for the two halves of the kubelet-API Server interaction. It makes
more sense for the leading component to implement the check, however, since
it's the leading edge that's going to fall out of skew compliance first. In the
case of OpenShift, that leading edge is the API Server, and it makes more sense
for it to determine whether a step forward is going to violate the skew policy.

**Contributor:** I think this is a critical point for discussion. The two
arguments in the other direction are locality to the resolving action and
locality of skew policy knowledge.

Locality to the resolving action is about where a user has to go to resolve an
Upgradeable=False condition. The natural assumption we've seen from users and
support in the past is that the operator with the unexpected condition is where
to take a resolving action. Now in this particular case, the kube-apiserver
will be the spot for a resolving action 0% of the time: the kube-apiserver will
never be able to upgrade a kubelet to close the skew. The MCO will sometimes
(often?) be the spot to close skew, either by waiting for a rollout or by
unpausing a machine-config pool. It's not perfect (byo-host, for instance), but
it is significantly more than the 0% of the kube-apiserver.

Locality of skew policy knowledge is about which component knows its skew
policy. It's worth remembering that even in a kube control plane, the skew
policies between the kubelet, kube-controller-manager, kube-scheduler, and SDN
(and its various variants) are different. Expanding to everything in our
payload (and what they subsequently install), it gets even larger. While it is
possible for the kube-apiserver to be a rally point for every team enforcing
their skew policy, the distributed nature of ClusterOperator conditions means
that knowledge can be codified close to the component with the skew policy
knowledge. In this case, the MCO is the closest component (since we lack a
kubelet operator).

In the general case, the locality to the resolving action and the locality to
the skew policy knowledge align extremely well. In this specific case, they
meet in the MCO.

I've taken the liberty of demonstrating this in a PR:
openshift/machine-config-operator#2552

**Contributor Author:** I buy the first argument, but I'm struggling with
"locality of skew policy knowledge". Between the kubelet and the API Server, it
would be the API Server that dictates the skew policy based on its adherence to
a backward-compatible API. This boils down to the same argument I've made in
the PR: the leading component determines the acceptable skew. Based on that
understanding, I would think the API Server Operator would be the place to at
least store the skew policy.

**Contributor (@sttts, Apr 30, 2021):**

> it would be the API Server that dictates the skew policy based on its
> adherence to a backward-compatible API

This is the point: the kube-apiserver only dictates which API versions are
served; the clients know which API versions they need and depend on. For that
reason, upstream, the API-Machinery team is NOT the one responsible for testing
and fixing the skew of a release; the node team is. The API-Machinery team just
follows the known, mechanical procedures of introducing new API versions and
deprecating and removing old ones.

This is what David means by "locality of skew policy knowledge": the API
consumers know, not the API provider.

**Contributor:** Or in yet other words:

- locality to resolve the skew problem: a UX problem
- locality to enforce the skew policy: a people problem, i.e. who owns the skew
  policy (the kube-apiserver does not)

**Contributor:** There are two sides: server and client. The client would be
using the correct version of the API, but the server may have a
backwards-incompatible change where it does not serve the correct payload to
the client. In turn, the client may have a backwards-incompatible change and
not know what to do with a particular payload. So there need to be two checks:
one in the MCO and one within the API. Each component has a responsibility to
check the skew, since either side could break.

**Contributor:** I strongly disagree with the server having to know its
clients. This is not how it works in any of the API life cycle policies we
apply in Kubernetes and OpenShift. The server has O(100) clients in an
OpenShift cluster. It cannot follow individual skew policies, and I don't see
why we should introduce another API philosophy here.

> a backwards incompatible change

This is disallowed for the server and hence never happens (if we are doing our
job well, which we nearly always have in the past).

**Contributor Author:** We're not asking the server to know its clients,
though. We're asking its Operator to know them.

**Member:** To recap the discussion we had on this for those who were not able
to attend:

- The API server ships with a set of built-in types (pod and node).
- The upstream project, via API review/human or project operator oversight,
  ensures that skew between the API server and the kubelet is N-2.
- The API server operator is being asked to report Upgradeable=False to capture
  that API review/human/operational oversight because it is the component that
  must not upgrade.
- The Machine Config Operator is the component that must upgrade, so reporting
  Upgradeable=False would be confusing.
- The Machine Config Operator should report a message in its reporting that
  admins should perform an action.

I agree that the API server (and its operators) should not have to understand
all API types it serves and all its clients. For API types that are built-in,
whose upstream set of human operators/API reviewers/etc. are trying to ensure
that skew in either direction (API server or kubelet) is preserved, I think it
is reasonable for both operators in our context to report a desired action (the
API server operator should say "I cannot upgrade"; the Machine Config Operator
should promote an upgrade).

For types that are not built-in (CRDs, etc.), it is extremely reasonable to say
that the future philosophy is not for the API server to block its own upgrade
on those types, because the upgrade of the API server itself is not what will
evolve those CRD definitions, unlike the built-in types.

On top of that, the gating mechanism we have today is the `Upgradeable=False`
flag, which indicates that a particular operator cannot be upgraded, thereby
halting the upgrade of the entire cluster. It doesn't make sense for the MCO to
assert this condition, since an upgrade of the MCO and its operands (RHCOS)
would actually reduce the skew. If the MCO were to use this mechanism to
enforce the skew, it would be a reinterpretation of that flag's function to
instead indicate that the entire cluster cannot be upgraded. It's a subtle but
important distinction that preserves low coupling between individual operators.

**Member:** Just checking this vs. the API, and yeah, current godocs are:

> Upgradeable indicates whether the operator is in a state that is safe to
> upgrade...

**Member (@wking, May 4, 2021):** Although what's being proposed here means
that should actually be:

> Upgradeable indicates whether the operator is safe to upgrade based on the
> current cluster state...

**Member:** I've floated openshift/api#926 with this update.


### MCO Rollout Gating

(This section was written assuming that the MCO would be responsible for
enforcing the node skew policy, but this plan has since been modified to make
the API Server Operator responsible for this enforcement.)

Rather than having the MCO enforce version skew policies between OS-managed
components and operator-managed components, it could simply set Upgradeable=False
whenever a rollout is in progress. This would preclude minor version upgrades in