-
Notifications
You must be signed in to change notification settings - Fork 213
Bug 1826115: pkg/cvo/status: Always set status.desired to match current CVO #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1826115: pkg/cvo/status: Always set status.desired to match current CVO #353
Conversation
|
@wking: This pull request references Bugzilla bug 1826115, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
244ccfc to
3fd35dc
Compare
|
/hold I'm going to have to rework this a bit more, because blindly dropping |
|
/assign @smarterclayton I'm still working on updating the unit tests for this, but assigning to Clayton for review, because in 961873d (#82) he added a bunch of: // Prefers the payload version over the operator's version (although in general they will remain in synccomments, and I'm reversing that here. |
As described in [1], bumping status.desired when a new spec.desiredUpdate is set leads to trouble like: $ oc adm upgrade --to 4.3.13 Updating to 4.3.13 $ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + " " + .status + " " + .message' | sort 2020-04-20T20:57:12Z RetrievedUpdates True 2020-04-20T21:23:40Z Available True Done applying 4.3.10 2020-04-20T22:01:55Z Upgradeable False Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged] 2020-04-20T22:16:40Z Failing True Precondition "ClusterVersionUpgradeable" failed because of "DefaultSecurityContextConstraints_Mutated": Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged] 2020-04-20T22:16:40Z Progressing True Unable to apply 4.3.13: it may not be safe to apply this update $ oc get -o json clusterversion version | jq -r '.status.desired.version' 4.3.13 $ oc adm upgrade --to=4.3.13 --force info: Cluster is already at version 4.3.13 $ oc adm upgrade --clear Cleared the update field, still at 4.3.13 $ oc get -o json clusterversion version | jq -r '.status.desired.version' 4.3.10 Where the "already at..." and "still at ..." messages are relying on the status.desired semantics of [2]: > desired is the version that the cluster is reconciling towards. Ideally the property would have been called 'current' or some such, but it's too late to change it now. When the user sets spec.desiredUpdate, they are making their intention clear (and bumping history at this point is appropriate, because we need *somewhere* to store the 'verified' history entry). But to match the "reconciling towards" semantics, this commit shifts the actual status.desired bump so it happens right after the new CVO comes up (with the new currentVersion). This change reverses the earlier: Prefers the payload version over the operator's version... comments from 961873d (sync: Do config syncing in the background, 2019-01-11, openshift#82). It also sets the stage for a world in which [3] has been fixed and the CVO continues to apply the current release's manifests while vetting a new target's preconditions in parallel. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1826115 [2]: https://github.com/openshift/api/blob/0f159fee64dbf711d40dac3fa2ec8b563a2aaca8/config/v1/types_cluster_version.go#L82-L87 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822752
3fd35dc to
e9c53b1
Compare
|
Ok, 3fd35dc -> e9c53b1 should get unit tests passing in this new world. But it's a pretty large change to the tests which reverses the earlier choice of explicitly preferring the sync worker's status over the operator version. So still would be good to have @smarterclayton sign off on the new direction. |
|
/hold cancel |
|
Hmm, still need to update the integration tests. |
| // TODO: prune Z versions over transitions to Y versions, keep initial installed version | ||
| pruneStatusHistory(config, 50) | ||
|
|
||
| config.Status.Desired = desired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not clear why this does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syncStatus is feeding mergeOperatorHistory status.Actual, which diverges from optr.currentVersion() when an update has been requested and we're working through preconditions. With this commit, we're always setting Desired to currentVersion().
LalatenduMohanty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
@LalatenduMohanty: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/lgtm cancel |
|
@LalatenduMohanty: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
|
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
|
@wking: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@openshift-bot: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking: This pull request references Bugzilla bug 1826115. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
As described in the bug, bumping
status.desiredwhen a newspec.desiredUpdateis set leads to trouble like:Where the "already at..." and "still at ..." messages are relying on the
status.desiredsemantics:When the user sets
spec.desiredUpdate, they are making their intention clear (and bumping history at this point is appropriate, because we need somewhere to store theverifiedhistory entry). But to match the "reconciling towards" semantics, this commit shifts the actualstatus.desiredbump so it happens right after the new CVO comes up (with the newcurrentVersion).