OSDOCS#6630: second iteration of how updates work doc #64077

skopacz1 · 2023-08-30T13:13:15Z

Versions: 4.11+

This PR further refines the documentation for how cluster updates work, since there were a lot of feedback items that were deferred when the original documentation was implemented.

QE review:

QE has approved this change.

Preview: How cluster updates work

openshift-ci-robot · 2023-08-30T13:13:18Z

@skopacz1: This pull request references OSDOCS-6630 which is a valid jira issue.

Details

In response to this:

OSDOCS-6630

Versions: 4.10+

This PR further refines the documentation for how cluster updates work, since there were a lot of feedback items that were deferred when the original documentation was implemented.

QE review:

QE has approved this change.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

skopacz1 · 2023-08-30T13:13:38Z

cc. @wking @petr-muller @LalatenduMohanty

modules/update-process-workflow.adoc

skopacz1 · 2023-08-30T13:16:18Z

modules/update-process-workflow.adoc

 Certain conditions can prevent updates from proceeding.
 These conditions are either determined by the CVO itself, or reported by individual cluster Operators that detect some details about the cluster that the Operator considers problematic for the update.

+// to do: potentially add an example of a precondition to the bullet above.


This is in reference to this discussion thread.

I think coming up with a specific example would be helpful. I'll try to provide one.

One possible option would be `LowDesiredVersion`.

Since you said that `LowDesiredVersion` only applies to 4.14 and later: unless there's another good example that applies to earlier versions, I think I'll save this specific example for another PR so that I can merge the current PR into all live versions of the doc.

skopacz1 · 2023-08-30T13:17:36Z

modules/update-manifest-application.adoc

 While the additional update actions take place, these cluster Operators temporarily set their `Progressing` condition to `True`.
 ====

+// to do: potentially reword the note above to clarify that specific resources are being applied at one time, and not necessarily all the resources for that component.


This is in reference to this discussion thread.

I think the point about making the doc use "Cluster Operator" (an OpenShift component, that is, a piece of software with a bunch of manifests) and `ClusterOperator` (a cluster resource created with a single manifest) consistently, and tuning the descriptions around them, could be a useful improvement.

modules/update-manifest-application.adoc

skopacz1 · 2023-08-30T13:18:52Z

modules/update-evaluate-availability.adoc

  Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22
 ----

+// to do: determine whether the rest of the lines in this module should still be included, since this is pretty in-depth even for this sort of descriptive doc, according to Trevor.


This is in reference to this discussion thread.

I'm not sure if doc-tooling allows this, but a possible compromise between my desire to focus on the abstract description and @petr-muller's desire to give folks a peek under the hood would be to say that everything in the above `oc adm upgrade` output is sourced from `ClusterVersion`, and link folks over to the API docs where they can learn about the structure of `status.availableUpdates` on their own.

Personally I'm leaning towards keeping the two examples, because I think they convey the "`oc adm upgrade` output is sourced from `ClusterVersion`" idea through an example, which I believe works slightly better than abstract descriptions. But I could live with what Trevor proposes.

I'm ok with a separate "how does `oc adm upgrade` work?" section with the paired examples that demonstrate that currently it's just a pretty-printer for `ClusterVersion` status. That is interesting information for folks who think the command is too magical. I just don't think that delving into that implementation belongs in this Evaluation of update availability section.

We should discourage folks from running commands like `oc get clusterversion version -o json | jq '.status.availableUpdates'` unless there is no other way to get the information. If the information can be retrieved through an `oc` command, then we do not need to talk about directly querying `ClusterVersion`, because we do not want users modifying `ClusterVersion` or any underlying resources. If they get into the habit of doing that, it is risky for multiple reasons. The UX is hard. They need to parse information that might not be useful for them. They might accidentally modify a resource, which might cause issues because QE does not test by directly modifying or querying the resources.

We should discourage folks to run commands like ... unless there is no way they can get some information.

It's hard to make people understand how upgrade works without actually showing them the guts of ClusterVersion. Context is important. These examples are presented in the context of "run command to see internals / how things work", not "run command to get the list of available updates".

I just don't think that delving into that implementation belongs in this Evaluation of update availability section

I agree with this. My synthesis of all these opinions is that we could have a short section on the ClusterVersion object itself: its existence, the fact that you should never modify it directly, that CVO operates over it and that oc adm upgrade pretty prints it. Then we could clean the other sections from mentioning the resource.
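For illustration only, in the "see how things work" context rather than as a recommended way to list updates: a minimal read-only sketch of the "pretty-printer for `ClusterVersion` status" idea. The sample document, the image references, and the output format are hypothetical and trimmed; this is not the actual `oc adm upgrade` implementation, and nothing here modifies any resource.

```python
import json

# Hypothetical, trimmed ClusterVersion document. Real objects carry many
# more fields; the versions and image references below are made up.
SAMPLE = json.loads("""
{
  "status": {
    "desired": {"version": "4.13.10"},
    "availableUpdates": [
      {"version": "4.13.11", "image": "example.invalid/release@sha256:aaa"},
      {"version": "4.13.12", "image": "example.invalid/release@sha256:bbb"}
    ]
  }
}
""")


def summarize_updates(cluster_version: dict) -> list:
    """Return human-readable lines built only from ClusterVersion status."""
    status = cluster_version.get("status", {})
    current = status.get("desired", {}).get("version", "unknown")
    lines = [f"Cluster version is {current}"]
    # availableUpdates may be null or absent when nothing is recommended.
    updates = status.get("availableUpdates") or []
    if not updates:
        lines.append("No updates available.")
        return lines
    lines.append("Recommended updates:")
    for update in updates:
        lines.append(f"  {update['version']}  {update['image']}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize_updates(SAMPLE)))
```

The point of the sketch is that every printed line is derived from `status`; the function never writes to `spec` or to the object itself.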

ocpdocs-previewbot · 2023-08-30T13:36:11Z

🤖 Updated build preview is available at:
https://64077--docspreview.netlify.app

Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/28307

openshift-ci-robot · 2023-08-30T14:41:06Z

@skopacz1: This pull request references OSDOCS-6630 which is a valid jira issue.


openshift-ci-robot · 2023-08-30T14:41:16Z

@skopacz1: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.


LalatenduMohanty · 2023-09-06T23:00:16Z

modules/update-evaluate-availability.adoc

  Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22
 ----

+// to do: determine whether the rest of the lines in this module should still be included, since this is pretty in-depth even for this sort of descriptive doc, according to Trevor.


We should discourage folks from running commands like `oc get clusterversion version -o json | jq '.status.availableUpdates'` unless there is no other way to get the information. If the information can be retrieved through an `oc` command, then we do not need to talk about directly querying `ClusterVersion`, because we do not want users modifying `ClusterVersion` or any underlying resources. If they get into the habit of doing that, it is risky for multiple reasons. The UX is hard. They need to parse information that might not be useful for them. They might accidentally modify a resource, which might cause issues because QE does not test by directly modifying or querying the resources.

modules/update-cluster-version-object.adoc

wking · 2023-09-11T20:49:31Z

modules/update-cluster-version-object.adoc

+One of the resources that the Cluster Version Operator (CVO) monitors is `ClusterVersion`.
+
+`ClusterVersion` is a custom resource object that contains information relating to the cluster's version, such as the current and desired versions of the cluster.
+When the CVO observes that the desired version does not match the current version in the `ClusterVersion` resource, it attempts to initiate an update to reconcile the cluster with this new desired state.


can we de-emphasize updates here with something like:

The CVO continually reconciles the cluster with the target state declared in ClusterVersion spec. When the desired release differs from the current one, that reconciliation updates the cluster.

to make it clear that "updating or not?" is a subset of the reconciliation the CVO is always doing?

Reading this now (with more coffee in my system): is sentence 1 meant to cover "reconciliation in general" and sentence 2 meant to cover "updates as a subset of that reconciliation"?
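To make the "updates are a subset of reconciliation" framing concrete, here is a minimal sketch. The `ClusterState` class and its fields are hypothetical stand-ins, and the real CVO applies release manifests gradually rather than flipping a version in one step.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical stand-in for what a controller observes."""
    current_version: str
    desired_version: str
    drifted_manifests: int = 0  # resources that no longer match the release


def reconcile(state: ClusterState) -> str:
    """Run one reconciliation pass and describe what it did."""
    if state.desired_version != state.current_version:
        # An update is reconciliation toward a different release.
        state.current_version = state.desired_version
        state.drifted_manifests = 0
        return f"updated to {state.desired_version}"
    if state.drifted_manifests:
        # Same release, but some resources drifted: restore them.
        state.drifted_manifests = 0
        return "restored drifted manifests"
    return "nothing to do"
```

The same function runs whether or not an update is pending; the update branch is only taken when the desired release differs from the current one.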

wking · 2023-09-11T21:02:43Z

modules/update-cluster-version-object.adoc

+When the CVO observes that the desired version does not match the current version in the `ClusterVersion` resource, it attempts to initiate an update to reconcile the cluster with this new desired state.
+
+
+//to-do: this might be heading overload, consider deleting this heading if the context switch from the previous paragraph to this content is smooth enough to not require one.


one way to structure would be a generic "consume spec, reconcile, report in status" to underline how we match the usual Kubernetes pattern. Another way to structure would be to have a few sections:

* Reconciling the currently accepted target release, which effectively consumes `status.desired` and reports via `Progressing`, `Failing`, etc.

* Providing next-hop advice, which consumes `spec.upstream` and `channel` and reports in `status.availableUpdates` and `conditionalUpdates` and `RetrievedUpdates`. Also in this line, we're consuming ClusterOperator `Upgradeable` and producing ClusterVersion `Upgradeable`.

* Accepting a proposed next hop, which consumes `spec.desiredUpdate`, `status.availableUpdates`, `conditionalUpdates`, `Upgradeable`, and release signatures, and reports in `status.desired` and `RetrievePayload`.

Each of those touches up against admin activity. During an update, an admin will bump up against all of those controller loops. Outside of updates, admins will mostly care if there are issues reconciling the currently accepted target release, until they start planning and preparing for their next round of updates.

I'm personally agnostic about whether it's easier to explain ClusterVersion as a generic Kube spec/status resource that happens to be about cluster reconciliation and updates, or if it's easier to explain ClusterVersion as interacting with a series of controllers broken down by use-case.

Although the purpose of the PR is to tie up loose ends, I'm thinking this new loose end is worth some deliberation before it's implemented. I like the structure as it is now, so if you're fine with it as well, I think I'll save this feedback for a v3 of this doc
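As a reading aid for the by-use-case breakdown proposed above, the three loops can be written down as data. This is only a sketch of that comment: the `Loop` type and `loops_touching` helper are hypothetical, and the `spec.`/`status.` prefixes on `channel` and `conditionalUpdates` are an assumption about where those fields live.

```python
from typing import NamedTuple, Tuple


class Loop(NamedTuple):
    name: str
    consumes: Tuple[str, ...]
    reports: Tuple[str, ...]


# Inputs and outputs as listed in the review comment.
LOOPS = (
    Loop(
        "reconcile the accepted target release",
        consumes=("status.desired",),
        reports=("Progressing", "Failing"),
    ),
    Loop(
        "provide next-hop advice",
        consumes=("spec.upstream", "spec.channel", "ClusterOperator Upgradeable"),
        reports=(
            "status.availableUpdates",
            "status.conditionalUpdates",
            "RetrievedUpdates",
            "Upgradeable",
        ),
    ),
    Loop(
        "accept a proposed next hop",
        consumes=(
            "spec.desiredUpdate",
            "status.availableUpdates",
            "status.conditionalUpdates",
            "Upgradeable",
            "release signatures",
        ),
        reports=("status.desired", "RetrievePayload"),
    ),
)


def loops_touching(field: str) -> list:
    """Names of the loops that read or write the given field or condition."""
    return [
        loop.name
        for loop in LOOPS
        if field in loop.consumes or field in loop.reports
    ]
```

For example, `loops_touching("status.desired")` shows that the field is written by the acceptance loop and read by the reconciliation loop, which is the hand-off between the two.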

modules/update-cvo.adoc

wking · 2023-09-11T21:32:19Z

modules/update-evaluate-availability.adoc

 The CVO continuously evaluates its cluster characteristics against the conditional risk information for each conditional update. If the CVO finds that the cluster matches the criteria, the CVO stores this information in the `conditionalUpdates` field of its `ClusterVersion` resource.
 If the CVO finds that the cluster does not match the risks of an update, or that there are no risks associated with the update, it stores the target version in the `availableUpdates` field of its `ClusterVersion` resource.

 The user interface, either the web console or the OpenShift CLI (`oc`), presents this information in sectioned headings to the administrator.


Do we want to link out from this concept section to "here's the docs for driving those interfaces to consume this information"? Or do we only do procedure -> context links, and not context -> procedure links?

It's tricky, do you mean linking to the CLI and web console update procedures, where there's a step or two showing how to view the available updates?

I can't link inline in this module, it would have to be in an additional resources list at the end of the section. And if you mean to link to these pages, the context might be lost by the time they get to the additional resources section and see links to update procedures. Maybe I can preface the links with "To learn more about viewing available updates, see the following....".
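For readers following the module text quoted in this thread, the sorting of updates into `availableUpdates` and `conditionalUpdates` can be sketched as below. This follows the wording of the quoted paragraph only; the candidate structure, the `applies` predicates, and the risk name are hypothetical, and the real CVO evaluates risk declarations in its own way.

```python
# Hypothetical candidate updates. Each risk carries a predicate over
# made-up "cluster facts"; real risk declarations look different.
CANDIDATES = [
    {"version": "4.13.11", "risks": []},
    {
        "version": "4.13.12",
        "risks": [
            {
                "name": "ExampleRisk",
                "applies": lambda facts: facts.get("platform") == "aws",
            }
        ],
    },
]


def classify_updates(candidates, cluster_facts):
    """Split candidates into (available, conditional) for one cluster."""
    available, conditional = [], []
    for candidate in candidates:
        matched = [
            risk["name"]
            for risk in candidate.get("risks", [])
            if risk["applies"](cluster_facts)
        ]
        if matched:
            # The cluster matches at least one risk: keep the update,
            # but record it together with the risks that apply.
            conditional.append({"version": candidate["version"], "risks": matched})
        else:
            # No risks declared, or none match this cluster.
            available.append(candidate["version"])
    return available, conditional
```

With these sample candidates, a cluster whose facts match `ExampleRisk` sees 4.13.12 only as a conditional update, while any other cluster sees both versions as available.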

petr-muller · 2023-09-14T08:53:21Z

LGTM

LalatenduMohanty · 2023-09-14T17:43:49Z

modules/update-cluster-version-object.adoc

+One of the resources that the Cluster Version Operator (CVO) monitors is `ClusterVersion`.
+
+The `ClusterVersion` custom resource object is the primary interface for managing the CVO.
+Cluster administrators and other controllers can declare their desired state through `ClusterVersion` `spec` and observe how the CVO is delivering those requests in `status`.


What exactly do we want to communicate to users through this paragraph?

I think I understand the intent behind this paragraph, but I do not think we are communicating it in a way that is easy to understand.

My main concern is with the text that says "cluster administrators can declare their desired state through `ClusterVersion`". @wking WDYT?

This sentence is just pointing out that ClusterVersion follows Kubernetes' usual spec/status pattern. I think it's worth pointing out that the intended data flows are:

Admin desires -> ClusterVersion spec declarations -> CVO attempts to deliver the desired state.

CVO has opinions on current state and progress -> ClusterVersion status reporting -> admin clarity on current situation.

but I'm open to rephrasing if we can express those two directions of data flow more clearly.

Here is my suggested text.

OpenShift components and administrators can interact with the CVO through the `ClusterVersion` object. The desired CVO state should be declared through the `ClusterVersion` object, and the current CVO state can be seen in the status of the `ClusterVersion` object.

Note: We do not recommend that users modify the `ClusterVersion` object directly. They should use the standard interfaces, such as the CLI and web console, to declare their desired update.

I'm happy with that suggested text, I can implement it if there's no opposition to it.

skopacz1 · 2023-10-10T14:47:36Z

@shellyyang1989 could you PTAL when you have a chance? Thanks!

shellyyang1989 · 2023-10-16T02:02:18Z

LGTM

skopacz1 · 2023-10-16T13:38:06Z

/label peer-review-needed

skopacz1 · 2023-10-17T14:58:28Z

/label merge-review-needed

jldohmann

LGTM

jldohmann · 2023-10-17T16:45:40Z

/cherrypick enterprise-4.14

jldohmann · 2023-10-17T16:45:44Z

/cherrypick enterprise-4.13

jldohmann · 2023-10-17T16:45:48Z

/cherrypick enterprise-4.12

jldohmann · 2023-10-17T16:45:53Z

/cherrypick enterprise-4.11

openshift-cherrypick-robot · 2023-10-17T16:46:28Z

@jldohmann: new pull request created: #66397

Details

In response to this:

/cherrypick enterprise-4.14


openshift-cherrypick-robot · 2023-10-17T16:46:28Z

@jldohmann: #64077 failed to apply on top of branch "enterprise-4.13":

Applying: OSDOCS-6630: second iteration of how updates work doc
Using index info to reconstruct a base tree...
M	updating/understanding_updates/how-updates-work.adoc
Falling back to patching base and 3-way merge...
Auto-merging updating/understanding_updates/how-updates-work.adoc
CONFLICT (content): Merge conflict in updating/understanding_updates/how-updates-work.adoc
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 OSDOCS-6630: second iteration of how updates work doc
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherrypick enterprise-4.13


openshift-cherrypick-robot · 2023-10-17T16:46:31Z

@jldohmann: #64077 failed to apply on top of branch "enterprise-4.12":

Applying: OSDOCS-6630: second iteration of how updates work doc
Using index info to reconstruct a base tree...
M	modules/update-manifest-application.adoc
M	updating/understanding_updates/how-updates-work.adoc
Falling back to patching base and 3-way merge...
Auto-merging updating/understanding_updates/how-updates-work.adoc
CONFLICT (content): Merge conflict in updating/understanding_updates/how-updates-work.adoc
Auto-merging modules/update-manifest-application.adoc
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 OSDOCS-6630: second iteration of how updates work doc
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherrypick enterprise-4.12


openshift-cherrypick-robot · 2023-10-17T16:46:36Z

@jldohmann: #64077 failed to apply on top of branch "enterprise-4.11":

Applying: OSDOCS-6630: second iteration of how updates work doc
Using index info to reconstruct a base tree...
M	modules/update-manifest-application.adoc
M	updating/understanding_updates/how-updates-work.adoc
Falling back to patching base and 3-way merge...
Auto-merging updating/understanding_updates/how-updates-work.adoc
CONFLICT (content): Merge conflict in updating/understanding_updates/how-updates-work.adoc
Auto-merging modules/update-manifest-application.adoc
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 OSDOCS-6630: second iteration of how updates work doc
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherrypick enterprise-4.11


jldohmann · 2023-10-17T17:00:04Z

/cherrypick enterprise-4.13

openshift-cherrypick-robot · 2023-10-17T17:00:49Z

@jldohmann: #64077 failed to apply on top of branch "enterprise-4.13":

Applying: OSDOCS-6630: second iteration of how updates work doc
Using index info to reconstruct a base tree...
M	updating/understanding_updates/how-updates-work.adoc
Falling back to patching base and 3-way merge...
Auto-merging updating/understanding_updates/how-updates-work.adoc
CONFLICT (content): Merge conflict in updating/understanding_updates/how-updates-work.adoc
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 OSDOCS-6630: second iteration of how updates work doc
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherrypick enterprise-4.13


jldohmann · 2023-10-17T17:06:23Z

@skopacz1 it looks like all the auto CPs to every branch but 4.14 failed, so you'll need to manually CP. Please lmk if you have any questions and feel free to ping me once those CPs are up and i'll merge them 🙂 thanks!

[enterprise-4.13] Manual CP of #64077

[enterprise-4.12] Manual CP of #64077

[enterprise-4.11] Manual CP of #64077

openshift-ci-robot added the jira/valid-reference label Aug 30, 2023

openshift-ci bot added the size/S label Aug 30, 2023

skopacz1 commented Aug 30, 2023

modules/update-process-workflow.adoc (resolved)

skopacz1 commented Aug 30, 2023

modules/update-manifest-application.adoc (resolved)

skopacz1 commented Aug 30, 2023

skopacz1 changed the title ~~OSDOCS-6630: second iteration of how updates work doc~~ OSDOCS#6630: second iteration of how updates work doc Aug 30, 2023

openshift-ci-robot removed the jira/valid-reference label Aug 30, 2023

LalatenduMohanty suggested changes Sep 6, 2023

skopacz1 force-pushed the OSDOCS-6630 branch from 976974d to 9be1f4d Compare September 11, 2023 19:24

openshift-ci bot added the size/L label and removed the size/S label Sep 11, 2023

wking reviewed Sep 11, 2023

modules/update-cluster-version-object.adoc (outdated, resolved)

wking reviewed Sep 11, 2023

modules/update-cvo.adoc (resolved)

wking reviewed Sep 11, 2023

skopacz1 force-pushed the OSDOCS-6630 branch from 9be1f4d to b27f9f5 Compare September 12, 2023 16:01

LalatenduMohanty suggested changes Sep 14, 2023

skopacz1 force-pushed the OSDOCS-6630 branch from b27f9f5 to 04129cb Compare October 10, 2023 14:44

kelbrown20 added the peer-review-done label and removed the peer-review-in-progress and peer-review-needed labels Oct 16, 2023

skopacz1 force-pushed the OSDOCS-6630 branch from 04129cb to 8030660 Compare October 17, 2023 14:49

OSDOCS-6630: second iteration of how updates work doc

904d9ad

skopacz1 force-pushed the OSDOCS-6630 branch from 8030660 to 904d9ad Compare October 17, 2023 14:50

openshift-ci bot added the merge-review-needed label Oct 17, 2023

jldohmann approved these changes Oct 17, 2023

jldohmann merged commit 92e47a3 into openshift:main Oct 17, 2023

openshift-cherrypick-robot mentioned this pull request Oct 17, 2023

[enterprise-4.14] OSDOCS#6630: second iteration of how updates work doc #66397

Merged

jldohmann removed the merge-review-needed label Oct 17, 2023

This was referenced Oct 17, 2023

[enterprise-4.13] Manual CP of #64077 #66412

Merged

[enterprise-4.12] Manual CP of #64077 #66414

Merged

[enterprise-4.11] Manual CP of #64077 #66417

Merged

jldohmann added a commit that referenced this pull request Oct 17, 2023

Merge pull request #66412 from skopacz1/OSDOCS-6630_4.13

aecf5ce

[enterprise-4.13] Manual CP of #64077

jldohmann added a commit that referenced this pull request Oct 17, 2023

Merge pull request #66414 from skopacz1/OSDOCS-6630_4.12

c5bfe0c

[enterprise-4.12] Manual CP of #64077

jldohmann added a commit that referenced this pull request Oct 17, 2023

Merge pull request #66417 from skopacz1/OSDOCS-6630_4.11

8204e4b

[enterprise-4.11] Manual CP of #64077

skopacz1 deleted the OSDOCS-6630 branch October 31, 2023 13:46
