# eus-upgrades-mvp: don't enforce skew check in MCO #762
@@ -117,34 +117,31 @@ use.
This allows us to *inform* the admin of removals that are more than one minor
version away and *block* upgrades for removals which are imminent.
### MCO - Enforce OpenShift's defined host component version skew policies
The MCO will set Upgradeable=False whenever any MachineConfigPool has one or
more nodes present which fall outside of a defined list of constraints. For
instance, if OpenShift has a defined kubelet version skew of N-1, the node
constraints enforced by the MCO in OCP 4.7 (Kube 1.20) would be as follows:
```yaml
node.status.nodeInfo.kubeletVersion:
- v1.20
```
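For reference, this constraint matches against the version each kubelet reports
in the standard Node status. A minimal sketch of the relevant field follows;
the full version string shown is illustrative (kubelets report a patch version
and build suffix, so a `v1.20` entry would presumably match on the minor
version):

```yaml
# Abbreviated Node object; only the field the constraint inspects is shown.
status:
  nodeInfo:
    kubeletVersion: v1.20.0+558d959
```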
If the policy were to change, allowing for a version skew of N-2, v1.19 would be
added to the list of acceptable matches. As a result, a cluster which had been
upgraded from 4.6 to 4.7 would allow a subsequent upgrade to 4.8 as long as all
kubelets were either v1.19 or v1.20. The 4.8 MCO would then evaluate the Upgradeable
condition based on its own constraints; if v1.19 weren't allowed, it would then
inhibit upgrades to 4.9. This means the MCO must set Upgradeable=False until it
has confirmed its constraints have been met.
```yaml
node.status.nodeInfo.kubeletVersion:
- v1.20
- v1.19
```
The MCO is not responsible for defining these constraints, and constraints are
only widened when CI testing proves them to be safe.
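
To illustrate how such a violation might surface to an admin, here is a minimal
sketch of the machine-config ClusterOperator reporting the condition; the
reason and message strings are hypothetical, not defined by this proposal:

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: machine-config
status:
  conditions:
  - type: Upgradeable
    status: "False"
    reason: KubeletVersionSkew          # hypothetical reason string
    message: >-
      Node worker-2 reports kubeletVersion v1.18.3, which is outside the
      allowed list (v1.20, v1.19); minor version upgrades are blocked until
      the node is updated.
```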

### APIServer - Enforce OpenShift's defined kubelet version skew policies
The API Server Operator will set `Upgradeable=False` whenever any of the nodes
within the cluster are at the skew limit; that is, when an upgrade of the API
Server would exceed the allowable kubelet version skew. For instance, if
OpenShift has a defined kubelet version skew of N-1, the API Server Operator
would report `Upgradeable=True` if all of the nodes are at N, and
`Upgradeable=False` if at least one of the nodes is not up to date. If the
kubelet skew policy were to change, allowing for a version skew of N-2, the API
Server Operator would report `Upgradeable=True` if all of the nodes are at N or
N-1, and `Upgradeable=False` if any of the nodes are at N-2.
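As a sketch of what this could look like on the kube-apiserver ClusterOperator
(again, the reason and message strings are illustrative, not part of this
proposal):

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: kube-apiserver
status:
  conditions:
  - type: Upgradeable
    status: "False"
    reason: KubeletSkewAtLimit          # hypothetical reason string
    message: >-
      3 nodes are running kubelet v1.19 (N-1); upgrading the API server
      past v1.20 would exceed the supported kubelet version skew.
```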
It's critical to remember that the `Upgradeable` flag doesn't block z-stream
updates. This detail allows the flag to be used indiscriminately, since
OpenShift will never bump the API Server minor version in a z-stream update.
It's also worth pointing out that this mechanism is a heuristic. It isn't
necessarily guaranteed that every y-stream update will include a bump to the
API Server, so there may be false positives. In a similar vein, in order for
this check to remain valid, OpenShift will need to ensure that there are no
API Server version bumps larger than N+1 between y-stream updates. Otherwise,
the check may return a false negative (e.g. an existing cluster has N-1 skew
and then applies a y-stream update where the API Server version bumps to N+2,
resulting in N-3 skew). If there are larger version bumps, the skew check will
need to be modified in a backport to the previous version.
These changes will need to be backported to 4.7 prior to 4.7 EOL.
@@ -304,8 +301,8 @@ that's broadly scoped as "EUS 4.6 to EUS 4.10 Validator"?
- CI tests are necessary which attempt to upgrade while violating kubelet to API
  compatibility, i.e. a 4.6 to 4.7 upgrade with MachineConfigPools paused, then check
  for the Upgradeable=False condition to be set by the API Server Operator, assuming
  that our rules only allow for N-1 skew.
- CI tests are necessary which install an OLM Operator that expresses a maxKubeVersion
  or maxOCPVersion equal to the current cluster version, and check for Upgradeable=False
  on OLM (see the sketch below).
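
A hedged sketch of how an operator might declare such a ceiling on its
ClusterServiceVersion; the property encoding shown mirrors the proposal's
maxOCPVersion idea, and the operator name, property key, and value are all
illustrative rather than a confirmed OLM API:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.0.0        # hypothetical operator
  annotations:
    # Illustrative property; OLM would read this and set Upgradeable=False
    # on its ClusterOperator once the cluster reaches the stated version.
    olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.8"}]'
```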
|
|
@@ -347,7 +344,7 @@ In general, we try to use the same stages (alpha, beta, GA), regardless how the
These are generalized examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
#### Dev Preview -> Tech Preview
- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
@@ -356,7 +353,7 @@ These are generalized examples to consider, in addition to the aforementioned [m
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)
#### Tech Preview -> GA
- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
@@ -368,7 +365,7 @@ These are generalized examples to consider, in addition to the aforementioned [m
**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**
#### Removing a deprecated feature
- Announce deprecation and support policy of the existing feature
- Deprecate the feature
@@ -393,6 +390,32 @@ The idea is to find the best form of an argument why this enhancement should _no
## Alternatives
### MCO Kubelet Skew Enforcement
Instead of the API Server Operator enforcing kubelet skew compliance through
the `Upgradeable` flag, the MCO could provide this functionality. These two
operators are the obvious candidates for such a check, since they are
responsible for the two halves of the kubelet-API Server interaction. It makes
more sense for the leading component to implement the check, however, since
it's the leading edge that's going to violate skew compliance first. In the
case of OpenShift, that leading edge is the API Server, so it makes more sense
for it to determine whether a step forward is going to violate the skew policy.
On top of that, the gating mechanism we have today is the `Upgradeable=False`
flag, which indicates that a particular operator cannot be upgraded, thereby
halting the upgrade of the entire cluster. It doesn't make sense for the MCO to
assert this condition, since an upgrade of the MCO and its operands (RHCOS)
would actually reduce the skew. If the MCO were to use this mechanism to
enforce the skew, it would be a reinterpretation of the function of that flag
to instead indicate that the entire cluster cannot be upgraded. It's a subtle
but important distinction that preserves low coupling between individual
operators.
> **Member:** Just checking this vs. the API, and yeah, current godocs are:
>
> **Member:** Although what's being proposed here means that should actually be:
>
> **Member:** I've floated openshift/api#926 with this update.
### MCO Rollout Gating
(This section was written assuming that the MCO would be responsible for
enforcing the node skew policy, but this plan has since been modified to make
the API Server Operator responsible for this enforcement.)
Rather than having the MCO enforce version skew policies between OS-managed
components and operator-managed components, it could simply set Upgradeable=False
whenever a rollout is in progress. This would preclude minor version upgrades in
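For context, a rollout in progress is visible on the MachineConfigPool status;
a minimal sketch of a pool in that state (field values illustrative), which
under this alternative would drive Upgradeable=False:

```yaml
# Abbreviated MachineConfigPool status during a rollout.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
status:
  machineCount: 3
  updatedMachineCount: 1
  conditions:
  - type: Updating
    status: "True"
```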
---

**Comment:** I think this is a critical point for discussion. The two arguments in the other direction are locality to the resolving action and locality of skew policy knowledge.

Locality to the resolving action is about where a user has to go to resolve an Upgradeable=False condition. The natural assumption we've seen from users and support in the past is that the operator with the unexpected condition is where to take a resolving action. Now in this particular case, the kube-apiserver will be the spot for a resolving action 0% of the time. The kube-apiserver will never be able to upgrade a kubelet to close the skew. The MCO will sometimes (often?) be the spot to close skew, either by waiting for a rollout or by unpausing a machine-config pool. It's not perfect (byo-host, for instance), but it is significantly more than the 0% of the kube-apiserver.

Locality of skew policy knowledge is about which component knows its skew policy. It's worth remembering that even in a kube control plane, the skew policies between the kubelet, kube-controller-manager, kube-scheduler, and SDN (and its various variants) are different. Expanding to everything in our payload (and what they subsequently install), it gets even larger. While it is possible for the kube-apiserver to be a rally point for every team enforcing their skew policy, the distributed nature of clusteroperator conditions means that knowledge can be codified close to the component with the skew policy knowledge. In this case, the MCO is the closest component (since we lack a kubelet operator).

In the general case, the locality to the resolving action and the locality to the skew policy knowledge align extremely well. In this specific case, they meet in the MCO.

I've taken the liberty of demonstrating in openshift/machine-config-operator#2552.
**Comment:** I buy the first argument, but I'm struggling with "locality of skew policy knowledge". Between the kubelet and the API Server, it would be the API Server that dictates the skew policy based on its adherence to a backward-compatible API. This boils down to the same argument I've made in the PR: the leading component determines the acceptable skew. Based on that understanding, I would think the API Server Operator would be the place to at least store the skew policy.
**Comment:** This is the point: the kube-apiserver dictates only which API versions are served; the clients know which API versions they need and depend on. For that reason, upstream, the API-Machinery team is NOT the one responsible for testing and fixing the skew of a release; the node team is. The API-Machinery team just follows the known, mechanical procedures of introducing new API versions and deprecating and removing old ones.

This is what David means with "locality of skew policy knowledge": the API consumers know, not the API provider.
**Comment:** There are two sides: server and client. The client would be using the correct version of the API, but the server may have a backwards-incompatible change where it does not serve the correct payload to the client. In turn, the client may have a backwards-incompatible change and not know what to do with a particular payload. So there need to be two checks: one in the MCO and one within the API. Each component has a responsibility to check the skew, since either side could break.
**Comment:** I strongly disagree with the server having to know its clients. This is not how it works in any of the API life cycle policies we apply in Kubernetes and OpenShift. The server has O(100) clients in an OpenShift cluster. It cannot follow individual skew policies, and I don't see why we should introduce another API philosophy here.

A backwards-incompatible change on the server is disallowed and hence never happens (if we are doing our job well, and we nearly always have in the past).
**Comment:** We're not asking the server to know its clients, though. We're asking its Operator to know them.
**Comment:** To recap the discussion we had on this, for those that were not able to attend:

I agree that the API server (and its operators) should not have to understand all API types it serves and all of its clients. For API types that are built-in, whose upstream set of human operators/api-reviewers/etc. are trying to ensure that skew in either direction (api-server or kubelet) is preserved, I think it's reasonable that both operators in our context should report a desired action (the apiserver operator should say "I cannot upgrade"; the machine config operator should promote an upgrade).

For types that are not built-in (CRDs, etc.), it is extremely reasonable to say that the future philosophy is not for the API server to block its own upgrade on those types, because the upgrade of the API server itself is not what will evolve those CRD definitions, unlike the built-in types.