@sagidlow
Contributor

@sagidlow sagidlow commented Aug 11, 2021

Applies to 4.7+
QE ack required.
SME ack required.
Preview Link: Understanding CVO conditions
JIRA Link: https://issues.redhat.com/browse/OSDOCS-2428

@netlify

netlify bot commented Aug 11, 2021

✔️ Deploy Preview for osdocs ready!

🔨 Explore the source changes: 5c377fe

🔍 Inspect the deploy log: https://app.netlify.com/sites/osdocs/deploys/61a63a50ed7a0e0008439369

😎 Browse the preview: https://deploy-preview-35453--osdocs.netlify.app/openshift-enterprise/latest/updating/understanding-the-update-service

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 11, 2021
@sagidlow sagidlow added branch/enterprise-4.7 branch/enterprise-4.8 peer-review-needed Signifies that the peer review team needs to review this PR labels Aug 11, 2021
@sagidlow sagidlow modified the milestones: Future Release, Next Release Aug 11, 2021
@sagidlow
Contributor Author

Requesting review from @wking, @sdodson, @jottofar

@sagidlow sagidlow requested review from jottofar and sdodson August 17, 2021 20:51
Member

@wking wking left a comment

I left a bunch of nits inline, but overall this looks good to me, and it will be nice to have these in our official docs :). Apologies for the nits and stale bits that you're inheriting from the CVO's local docs... Among which, I don't think this current list is complete, but I haven't run a full audit of our current code. We could do that audit, but it would take some time, and I dunno if we need to be complete before landing something that is useful. I'm also not clear whether this is really an openshift-docs thing, or whether we'd be better off landing this in runbooks, where we can link it from alerts like CannotRetrieveUpdates that will go off when the cluster is sad. Recent CVO work has overhauled those alerts, and perhaps things like this are now sufficiently self-describing that we don't need to cover them independently in docs? Or maybe there is some subset of especially confusing types/reasons that is worth calling out in docs, beyond what can be covered by runbooks and alerts?

@openshift-ci openshift-ci bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 10, 2021
@sagidlow
Contributor Author

@wking, the content has changed drastically since the last time you looked at it. I pulled out all the stale information and rewrote it mostly based on these two links:

Would love to get your new thoughts, if you have time :)

@sdodson
Member

sdodson commented Oct 19, 2021

LGTM, let's get this in.

@sagidlow
Contributor Author

@sdodson Who is the QE that should review this?

@sdodson
Member

sdodson commented Oct 20, 2021

@jianlinliu is QE for the OTA team; I'm unsure whether they cover CVO / ClusterOperator definitions as well.

@sagidlow
Contributor Author

@jianlinliu Could you review this from the QE side? TY

Contributor

s/cluster-version operator/Cluster Version Operator (CVO)

Contributor

Where do these conditions appear?

Contributor Author

@sdodson - If I'm not mistaken, the CVO conditions appear in the log during an upgrade, right? I'm not 100% sure but thought you might know :)

Member

Sorry for the delay in getting to this. They are reflected in the clusterversion object, which can be used programmatically, and parts of it are shown on the Cluster Settings page during upgrades.
oc get clusterversion -o yaml
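
As a rough illustration of the programmatic route mentioned above, the conditions can be read from the JSON form of the same object. This is a minimal sketch, not an official tool: it assumes `oc` is on the PATH and logged in to a cluster, and that the ClusterVersion object is named `version`. The helper names `conditions_by_type` and `fetch_clusterversion` are made up for this example.

```python
import json
import subprocess


def conditions_by_type(clusterversion: dict) -> dict:
    """Map each condition type to its status string ("True", "False", "Unknown")."""
    # A missing status block or a null conditions list is treated as "no conditions".
    conditions = clusterversion.get("status", {}).get("conditions") or []
    return {c["type"]: c["status"] for c in conditions}


def fetch_clusterversion(name: str = "version") -> dict:
    """Run `oc get clusterversion <name> -o json` and parse the result.

    Assumption: `oc` is installed and logged in, and the object is named "version".
    """
    out = subprocess.run(
        ["oc", "get", "clusterversion", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

On a live cluster, `conditions_by_type(fetch_clusterversion())` would give a small dictionary keyed by condition type. Keeping the parsing separate from the `oc` call means the parsing can be exercised against saved output as well.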

Contributor

a Degraded state represents persistent observation of a condition
Is "condition" the right word here? It seems odd that the Degraded condition reports on a condition. Maybe "problem" or "error" or similar?

Member

Maybe just drop "of a condition" altogether?

Contributor

I'm not sure what "missing condition" means. Does it mean the CV is not reporting any Upgradeable condition?

Member

It's just saying that in the absence of Upgradeable=False, it's assumed to be Upgradeable. Some operators may set Unknown, or not set this condition at all, while initializing, though hopefully that's brief.
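
The defaulting rule described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `blocks_upgrade` is hypothetical, and the input is assumed to be a `status.conditions` list shaped as dicts with `type` and `status` keys.

```python
def blocks_upgrade(conditions: list) -> bool:
    """Return True only when an Upgradeable condition is explicitly "False".

    A missing Upgradeable condition, or one set to "True" or "Unknown",
    is treated as not blocking, following the rule described above.
    """
    for cond in conditions:
        if cond.get("type") == "Upgradeable":
            return cond.get("status") == "False"
    # No Upgradeable condition reported at all: assume upgradeable.
    return False
```

For example, `blocks_upgrade([])` and `blocks_upgrade([{"type": "Upgradeable", "status": "Unknown"}])` are both `False`, while an explicit `"status": "False"` gives `True`.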

@mburke5678
Contributor

@sagidlow A few more nits. Otherwise LGTM

@mburke5678 mburke5678 added peer-review-done Signifies that the peer review team has reviewed this PR and removed peer-review-needed Signifies that the peer review team needs to review this PR labels Oct 20, 2021
@jianlinliu

The ClusterOperator is a custom resource object which holds the current state of the Operator. The ClusterOperator conveys the state to the rest of the cluster.

The topic is about CVO conditions, but the main description talks about "CO".

Going through the whole PR, there does not seem to be much description of the relationship and dependency between the status of the COs and the conditions of the CVO, or of how they affect the final upgrade operations.

As far as I know, the CVO collects and summarizes the status of all COs and then determines the CVO conditions; sometimes the CVO also sets some conditions by itself. For example:

            "status": {
                "availableUpdates": null,
                "conditions": [
                    {
                        "lastTransitionTime": "2021-10-21T00:46:18Z",
                        "message": "Done applying 4.8.0-0.nightly-2021-10-20-155651",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2021-10-21T00:46:00Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2021-10-21T00:46:18Z",
                        "message": "Cluster version is 4.8.0-0.nightly-2021-10-20-155651",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2021-10-21T00:11:57Z",
                        "message": "Unable to retrieve available updates: currently reconciling cluster version 4.8.0-0.nightly-2021-10-20-155651 not found in the \"stable-4.8\" channel",
                        "reason": "VersionNotFound",
                        "status": "False",
                        "type": "RetrievedUpdates"
                    },
                    {
                        "lastTransitionTime": "2021-10-21T00:12:28Z",
                        "message": "Kubernetes 1.22 and therefore OpenShift 4.9 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6329921 for details and instructions.\n",
                        "reason": "AdminAckRequired",
                        "status": "False",
                        "type": "Upgradeable"
                    }
                ],

In the above example, "Upgradeable=False" and "RetrievedUpdates=False" are set by the CVO itself. BTW, this PR also does not mention the RetrievedUpdates condition.
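
To make the example easier to scan, the sketch below filters a trimmed copy of those conditions for entries that carry a `reason`. The helper name `with_reason` is hypothetical, and messages and timestamps are dropped for brevity. In this particular example the two matches are the conditions noted above as set by the CVO itself, but a `reason` field alone should not be taken as proof of who set a condition.

```python
# Trimmed copy of the conditions from the example above (messages and timestamps omitted).
EXAMPLE_CONDITIONS = [
    {"type": "Available", "status": "True"},
    {"type": "Failing", "status": "False"},
    {"type": "Progressing", "status": "False"},
    {"type": "RetrievedUpdates", "status": "False", "reason": "VersionNotFound"},
    {"type": "Upgradeable", "status": "False", "reason": "AdminAckRequired"},
]


def with_reason(conditions):
    """Return (type, status, reason) for every condition that carries a reason."""
    return [
        (c["type"], c["status"], c["reason"])
        for c in conditions
        if c.get("reason")
    ]


for ctype, status, reason in with_reason(EXAMPLE_CONDITIONS):
    print(f"{ctype}={status} ({reason})")
# prints:
# RetrievedUpdates=False (VersionNotFound)
# Upgradeable=False (AdminAckRequired)
```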

So this PR may be a bit confusing for users.

@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 30, 2021
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 28, 2022
@openshift-ci

openshift-ci bot commented Feb 28, 2022

@sagidlow: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bergerhoffer
Contributor

The branch/enterprise-4.11 label has been added to this PR.

This is because your PR targets the main branch and is labeled for enterprise-4.10. And any PR going into main must also target the latest version branch (enterprise-4.11).

If the update in your PR does NOT apply to version 4.11 onward, please retarget this PR to go directly into the appropriate version branch or branches (enterprise-4.x) instead of main.

@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 13, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this May 13, 2022
@openshift-ci

openshift-ci bot commented May 13, 2022

@openshift-bot: Closed this PR.

Labels

branch/enterprise-4.7 branch/enterprise-4.8 branch/enterprise-4.9 branch/enterprise-4.10 branch/enterprise-4.11 lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. peer-review-done Signifies that the peer review team has reviewed this PR size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

7 participants