-
Notifications
You must be signed in to change notification settings - Fork 1.9k
OSDOCS#6630: second iteration of how updates work doc #64077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@skopacz1: This pull request references OSDOCS-6630 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
| Certain conditions can prevent updates from proceeding. | ||
| These conditions are either determined by the CVO itself, or reported by individual cluster Operators that detect some details about the cluster that the Operator considers problematic for the update. | ||
|
|
||
| // to do: potentially add an example of a precondition to the bullet above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in reference to this discussion thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think coming up with a specific example would be helpful, I'll try to provide one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one possible option would be LowDesiredVersion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you said that LowDesiredVersion only applies to 4.14 and later: unless there's another good example that applies to earlier versions, I think I'll save the inclusion of this specific example for another PR so I can merge this current PR into all live versions of the doc
| While the additional update actions take place, these cluster Operators temporarily set their `Progressing` condition to `True`. | ||
| ==== | ||
|
|
||
| // to do: potentially reword the note above to clarify that specific resources are being applied at one time, and not necessarily all the resources for that component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in reference to this discussion thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the point about making the doc use "Cluster Operator" (an OpenShift component == piece of software with a bunch of manifests) and ClusterOperator (is a cluster resource created with a single manifest) consistently (and tuning the descriptions around) could be a useful improvement.
| Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22 | ||
| ---- | ||
|
|
||
| // to do: determine whether the rest of the lines in this module should still be included, since this is pretty in-depth even for this sort of descriptive doc, according to Trevor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in reference to this discussion thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if doc-tooling allows this, but a possible compromise between my desire to focus on the abstract description and @petr-muller's desire to give folks a peek under the hood would be to say that everything in the above oc adm upgrade output is sourced from ClusterVersion, and link folks over to the API docs where they can learn about the structure of status.availableUpdates on their own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'm leaning towards keeping the two examples because I think they convey the oc adm upgrade output is sourced from ClusterVersion idea on an example which I believe works slightly better than abstract descriptions, but I could live with what Trevor proposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with a separate "how does oc adm upgrade work?" with the paired examples that demonstrate that currently it's just a pretty-printer for ClusterVersion status. That is interesting information for folks who think the command is too magical. I just don't think that delving into that implementation belongs in this Evaluation of update availability section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discourage folks to run commands like oc get clusterversion version -o json | jq '.status.availableUpdates' unless there is no way they can get some information. If the information can retrieved through oc command then we do not need to talk about directly querying clusterversion . Because we do not users modifying clusterversion or any underlying resources. If they get in to habit of doing that it is risky for multiple reasons. The UX is hard. They need to parse information which might not be useful for them. They accidentally modify a resource which might cause issues because QE does not test by directly modifying or querying the resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discourage folks to run commands like ... unless there is no way they can get some information.
It's hard to make people understand how upgrade works without actually showing them the guts of ClusterVersion. Context is important. These examples are presented in the context of "run command to see internals / how things work", not "run command to get the list of available updates".
I just don't think that delving into that implementation belongs in this Evaluation of update availability section
I agree with this. My synthesis of all these opinions is that we could have a short section on the ClusterVersion object itself: its existence, the fact that you should never modify it directly, that CVO operates over it and that oc adm upgrade pretty prints it. Then we could clean the other sections from mentioning the resource.
|
🤖 Updated build preview is available at: Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/28307 |
|
@skopacz1: This pull request references OSDOCS-6630 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@skopacz1: No Jira issue is referenced in the title of this pull request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
| Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22 | ||
| ---- | ||
|
|
||
| // to do: determine whether the rest of the lines in this module should still be included, since this is pretty in-depth even for this sort of descriptive doc, according to Trevor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discourage folks to run commands like oc get clusterversion version -o json | jq '.status.availableUpdates' unless there is no way they can get some information. If the information can retrieved through oc command then we do not need to talk about directly querying clusterversion . Because we do not users modifying clusterversion or any underlying resources. If they get in to habit of doing that it is risky for multiple reasons. The UX is hard. They need to parse information which might not be useful for them. They accidentally modify a resource which might cause issues because QE does not test by directly modifying or querying the resources.
976974d to
9be1f4d
Compare
| One of the resources that the Cluster Version Operator (CVO) monitors is `ClusterVersion`. | ||
|
|
||
| `ClusterVersion` is a custom resource object that contains information relating to the cluster's version, such as the current and desired versions of the cluster. | ||
| When the CVO observes that the desired version does not match the current version in the `ClusterVersion` resource, it attempts to initiate an update to reconcile the cluster with this new desired state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we de-emphasize updates here with something like:
The CVO continually reconciles the cluster with the target state declared in ClusterVersion
spec. When the desired release differs from the current one, that reconciliation updates the cluster.
to make it clear that "updating or not?" is a subset of the reconciliation the CVO is always doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this now (with more coffee in my system): is sentence 1 meant to cover "reconciliation in general" and sentence 2 meant to cover "updates as a subset of that reconciliation"?
| When the CVO observes that the desired version does not match the current version in the `ClusterVersion` resource, it attempts to initiate an update to reconcile the cluster with this new desired state. | ||
|
|
||
|
|
||
| //to-do: this might be heading overload, consider deleting this heading if the context switch from the previous paragraph to this content is smooth enough to not require one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one way to structure would be a generic "consume spec, reconcile, report in status" to underline how we match the usual Kubernetes pattern. Another way to structure would be to have a few sections:
- Reconciling the currently accepted target release, which effectively consumes
status.desiredand reports viaProgressing,Failing, etc. - Providing next-hop advice, which consumes
spec.upstreamandchanneland reports instatus.availableUpdatesandconditionalUpdatesandRetrievedUpdates.- Also in this line, we're consuming ClusterOperator
Upgradeableand producing ClusterVersionUpgradeable.
- Also in this line, we're consuming ClusterOperator
- Accepting a proposed next hop, which consumes
spec.desiredUpdate,status.availableUpdates,conditionalUpdates,Upgradeable, and release signatures, and reports instatus.desiredandRetrievePayload.
Each of those touches up against admin activity. During an update, an admin will bump up against all of those controller loops. Outside of updates, admins will mostly care if there are issues reconciling the currently accepted target release, until they start planning and preparing for their next round of updates.
I'm personally agnostic about whether it's easier to explain ClusterVersion as a generic Kube spec/status resource that happens to be about cluster reconciliation and updates, or if it's easier to explain ClusterVersion as interacting with a series of controllers broken down by use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the purpose of the PR is to tie up loose ends, I'm thinking this new loose end is worth some deliberation before it's implemented. I like the structure as it is now, so if you're fine with it as well, I think I'll save this feedback for a v3 of this doc
| The CVO continuously evaluates its cluster characteristics against the conditional risk information for each conditional update. If the CVO finds that the cluster matches the criteria, the CVO stores this information in the `conditionalUpdates` field of its `ClusterVersion` resource. | ||
| If the CVO finds that the cluster does not match the risks of an update, or that there are no risks associated with the update, it stores the target version in the `availableUpdates` field of its `ClusterVersion` resource. | ||
|
|
||
| The user interface, either the web console or the OpenShift CLI (`oc`), presents this information in sectioned headings to the administrator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to link out from this concept section to "here's the docs for driving those interfaces to consume this infomation"? Or do only do procedure -> context links, and not context -> proceedure links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky, do you mean linking to the CLI and web console update procedures, where there's a step or two showing how to view the available updates?
I can't link inline in this module, it would have to be in an additional resources list at the end of the section. And if you mean to link to these pages, the context might be lost by the time they get to the additional resources section and see links to update procedures. Maybe I can preface the links with "To learn more about viewing available updates, see the following....".
9be1f4d to
b27f9f5
Compare
|
LGTM |
| One of the resources that the Cluster Version Operator (CVO) monitors is `ClusterVersion`. | ||
|
|
||
| The `ClusterVersion` custom resource object is the primary interface for managing the CVO. | ||
| Cluster administrators and other controllers can declare their desired state through `ClusterVersion` `spec` and observe how the CVO is delivering those requests in `status`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly we want to communicate to users through this paragraph?
I think I understand the intent behind this paragraph, but I do not think we are communicating it in a way which is easy to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is with the text that says "cluster administrators can declare their desired state through clusterversion. @wking WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is just pointing out that ClusterVersion follows Kubernetes' usual spec/status pattern. I think it's worth pointing out that the intended data flows are:
- Admin desires -> ClusterVersion
specdeclarations -> CVO attempts to deliver the desired state. - CVO has opinions on current state and progress -> ClusterVersion
statusreporting -> admin clarity on current situation.
but I'm open to rephrasing if we can express those two directions of data flow more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my suggested text.
OpenShift components and administrators can communicate/interact with CVO through ClusterVersion object. The desired CVO state should be declared through the Clusterversion object and the current CVO state can be seen through status of the ClusterVersion object.
Note: We do not suggest users to directly modify the ClusterVersion object. They should use the standard interfaces e.g. CLI and web console to declare their desired update etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with that suggested text, I can implement it if there's no opposition to it.
|
@shellyyang1989 could you PTAL when you have a chance? Thanks! |
|
LGTM |
|
/label peer-review-needed |
|
/label merge-review-needed |
jldohmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
/cherrypick enterprise-4.14 |
|
/cherrypick enterprise-4.13 |
|
/cherrypick enterprise-4.12 |
|
/cherrypick enterprise-4.11 |
|
@jldohmann: new pull request created: #66397 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@jldohmann: #64077 failed to apply on top of branch "enterprise-4.13": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@jldohmann: #64077 failed to apply on top of branch "enterprise-4.12": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@jldohmann: #64077 failed to apply on top of branch "enterprise-4.11": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherrypick enterprise-4.13 |
|
@jldohmann: #64077 failed to apply on top of branch "enterprise-4.13": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@skopacz1 it looks like all the auto CPs to every branch but 4.14 failed, so you'll need to manually CP. Please lmk if you have any questions and feel free to ping me once those CPs are up and i'll merge them 🙂 thanks! |
[enterprise-4.13] Manual CP of #64077
[enterprise-4.12] Manual CP of #64077
[enterprise-4.11] Manual CP of #64077
OSDOCS-6630
Versions: 4.11+
This PR further refines the documentation for how cluster updates work, since there were a lot of feedback items that were deferred when the original documentation was implemented.
QE review:
Preview: How cluster updates work