-
Notifications
You must be signed in to change notification settings - Fork 4.8k
Bug 2028930: test/extended/operators: Allow early KubeletMinorVersionUnsupportedNextUpgrade #26531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 2028930: test/extended/operators: Allow early KubeletMinorVersionUnsupportedNextUpgrade #26531
Conversation
0f2c107 to
4179fb3
Compare
test/extended/operators/operators.go
Outdated
| if cond.Get("status").String() != string(expected) { | ||
| reason := cond.Get("reason").String() | ||
| if conditionType == configv1.OperatorUpgradeable && (name == "kube-storage-version-migrator" || // https://bugzilla.redhat.com/show_bug.cgi?id=1928141 | ||
| name == "openshift-controller-manager" || // https://bugzilla.redhat.com/show_bug.cgi?id=1948011 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be removed now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That bug shipped with 4.8, and was not backported. We have chained-update tests that install 4.7 and update through 4.10, and attempt all the hops in a single test-suite invocation, so we can't drop this from the 4.10 suite unless we backport the fix to 4.7.z. We'll be able to drop it from 4.11, unless we extend the number of hops in our multi-hop chain jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the code comment to tell me which openshift release I can remove this in or someone (like me) will "fix" it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ca2b8ad adds a comment like what you request.
test/extended/operators/operators.go
Outdated
| if conditionType == configv1.OperatorUpgradeable && (name == "kube-storage-version-migrator" || // https://bugzilla.redhat.com/show_bug.cgi?id=1928141 | ||
| name == "openshift-controller-manager" || // https://bugzilla.redhat.com/show_bug.cgi?id=1948011 | ||
| name == "service-ca") { // https://bugzilla.redhat.com/show_bug.cgi?id=1948012 | ||
| name == "service-ca") || // https://bugzilla.redhat.com/show_bug.cgi?id=1948012 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be removed now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shipped in 4.8 with no backports, so we need to keep it for the same multi-hop reason discussed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the code comment to tell me which openshift release I can remove this in or someone (like me) will "fix" it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ca2b8ad adds a comment like what you request.
test/extended/operators/operators.go
Outdated
| } | ||
| if cond.Get("status").String() != string(expected) { | ||
| reason := cond.Get("reason").String() | ||
| if conditionType == configv1.OperatorUpgradeable && (name == "kube-storage-version-migrator" || // https://bugzilla.redhat.com/show_bug.cgi?id=1928141 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you get this fix merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just went MODIFIED in the past few hours 🎉. But we'll need to keep this line for many 4.y to avoid issues with chained-update jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just went MODIFIED in the past few hours tada. But we'll need to keep this line for many 4.y to avoid issues with chained-update jobs.
update the comment to tell me which openshift release I can remove this in .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ca2b8ad adds a comment like what you request.
test/extended/operators/operators.go
Outdated
| name == "openshift-controller-manager" || // https://bugzilla.redhat.com/show_bug.cgi?id=1948011 | ||
| name == "service-ca") { // https://bugzilla.redhat.com/show_bug.cgi?id=1948012 | ||
| name == "service-ca") || // https://bugzilla.redhat.com/show_bug.cgi?id=1948012 | ||
| (name == "kube-apiserver" && reason == "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to comment why this is an acceptable upgradeable=false condition or future me will not remember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit messages and git blame ... for the win ;). If you need something inline, tell me how much of that commit message you want copied into the file, or suggest the alternative text you would like to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'd like something inline. much easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you like the whole commit message inline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the whole commit message is way too wordy for an inline comment, but you haven't clarified how you'd like it compacted, and obviously you think "folks who care can blame to get the details" is too compact. Rather than guessing at a middle ground, I've just inlined the relevant motivation from the commit message in a comment with c72bb0c. Let me know if you have particular text you'd like to see instead.
David asked for comments pointing out that we cannot remove these exceptions yet [1]. For example, the fixed-in-4.8 bugs still impact the 4.7 -> ... -> 4.10 jobs [2], so we'd need to wait until we were no longer running 4.10 jobs with this suite code, before we can drop those exceptions from the master test suite. And if we added 4.7 -> ... -> 4.11 jobs, we'd need to wait until 4.11 forked off. But even if these comments go stale, they'll at least remind folks about some of the issues they should think through before dropping the exceptions. [1]: openshift#26531 (comment) [2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci
…xtUpgrade Before running the post-update unpause monitor. Without this, 4.8->4.9->4.10 jobs are updating successfully to 4.10, unpausing compute, running the post-unpause compute-settling monitor, and failing on [1,2]: : [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel] 0s fail [github.com/onsi/[email protected]+incompatible/internal/leafnodes/runner.go:113]: Oct 17 23:28:57.284: Some cluster operators are not ready: kube-apiserver (Upgradeable=False KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade: KubeletMinorVersionUpgradeable: Kubelet minor versions on nodes ip-10-0-135-91.ec2.internal, ip-10-0-168-151.ec2.internal, and ip-10-0-192-244.ec2.internal will not be supported in the next OpenShift minor version upgrade.) With this change, that skew guard from [3] is allowed to trip early in the suite. But if it trips for long enough to set off alerts, we'd still fail on that. Personally, I'd much rather have the motivational message in the commit, and not in the inline comment. Folks could answer "why is this line here?" with 'git blame ...'. But David wanted an inline comment [4], so that's what I've done here. [1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.8-e2e-aws-upgrade-paused [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.8-e2e-aws-upgrade-paused/1449821870344900608 [3]: openshift/cluster-kube-apiserver-operator#1199 [4]: openshift#26531 (comment)
4179fb3 to
c72bb0c
Compare
|
/lgtm |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
David asked for comments pointing out that we cannot remove these exceptions yet [1]. For example, the fixed-in-4.8 bugs still impact the 4.7 -> ... -> 4.10 jobs [2], so we'd need to wait until we were no longer running 4.10 jobs with this suite code, before we can drop those exceptions from the master test suite. And if we added 4.7 -> ... -> 4.11 jobs, we'd need to wait until 4.11 forked off. But even if these comments go stale, they'll at least remind folks about some of the issues they should think through before dropping the exceptions. [1]: openshift#26531 (comment) [2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci
…xtUpgrade Before running the post-update unpause monitor. Without this, 4.8->4.9->4.10 jobs are updating successfully to 4.10, unpausing compute, running the post-unpause compute-settling monitor, and failing on [1,2]: : [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel] 0s fail [github.com/onsi/[email protected]+incompatible/internal/leafnodes/runner.go:113]: Oct 17 23:28:57.284: Some cluster operators are not ready: kube-apiserver (Upgradeable=False KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade: KubeletMinorVersionUpgradeable: Kubelet minor versions on nodes ip-10-0-135-91.ec2.internal, ip-10-0-168-151.ec2.internal, and ip-10-0-192-244.ec2.internal will not be supported in the next OpenShift minor version upgrade.) With this change, that skew guard from [3] is allowed to trip early in the suite. But if it trips for long enough to set off alerts, we'd still fail on that. Personally, I'd much rather have the motivational message in the commit, and not in the inline comment. Folks could answer "why is this line here?" with 'git blame ...'. But David wanted an inline comment [4], so that's what I've done here. [1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.8-e2e-aws-upgrade-paused [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.8-e2e-aws-upgrade-paused/1449821870344900608 [3]: openshift/cluster-kube-apiserver-operator#1199 [4]: openshift#26531 (comment)
c72bb0c to
619200e
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
5 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
8 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
David asked for comments pointing out that we cannot remove these exceptions yet [1]. For example, the fixed-in-4.8 bugs still impact the 4.7 -> ... -> 4.10 jobs [2], so we'd need to wait until we were no longer running 4.10 jobs with this suite code, before we can drop those exceptions from the master test suite. And if we added 4.7 -> ... -> 4.11 jobs, we'd need to wait until 4.11 forked off. But even if these comments go stale, they'll at least remind folks about some of the issues they should think through before dropping the exceptions. [1]: openshift#26531 (comment) [2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci
…xtUpgrade Before running the post-update unpause monitor. Without this, 4.8->4.9->4.10 jobs are updating successfully to 4.10, unpausing compute, running the post-unpause compute-settling monitor, and failing on [1,2]: : [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel] 0s fail [github.com/onsi/[email protected]+incompatible/internal/leafnodes/runner.go:113]: Oct 17 23:28:57.284: Some cluster operators are not ready: kube-apiserver (Upgradeable=False KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade: KubeletMinorVersionUpgradeable: Kubelet minor versions on nodes ip-10-0-135-91.ec2.internal, ip-10-0-168-151.ec2.internal, and ip-10-0-192-244.ec2.internal will not be supported in the next OpenShift minor version upgrade.) With this change, that skew guard from [3] is allowed to trip early in the suite. But if it trips for long enough to set off alerts, we'd still fail on that. Personally, I'd much rather have the motivational message in the commit, and not in the inline comment. Folks could answer "why is this line here?" with 'git blame ...'. But David wanted an inline comment [4], so that's what I've done here. [1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.8-e2e-aws-upgrade-paused [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.8-e2e-aws-upgrade-paused/1449821870344900608 [3]: openshift/cluster-kube-apiserver-operator#1199 [4]: openshift#26531 (comment)
|
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 2028930 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Before running the post-update unpause monitor. Without this, 4.8->4.9->4.10 jobs are updating successfully to 4.10, unpausing compute, running the post-unpause compute-settling monitor, and failing on:
With this change, that skew guard from openshift/cluster-kube-apiserver-operator#1199 is allowed to trip early in the suite. But if it trips for long enough to set off alerts, we'd still fail on that.