Bug 1885322: Increase OVN upgrade timeout by 15m #26878
|
@Davoska: This pull request references Bugzilla bug 1885322, which is invalid:
|
/bugzilla refresh |
|
@Davoska: This pull request references Bugzilla bug 1885322, which is invalid:
|
/bugzilla refresh |
|
@Davoska: This pull request references Bugzilla bug 1885322, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
Requesting review from QA contact:
|
/hold |
Force-pushed: be2b25d to 6ea3f7c
|
@Davoska: This pull request references Bugzilla bug 1885322, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact:
|
/unhold |
I dunno if it's worth picking #26219 back in here too. Might make folks less worried that we'd missed something in manual conflict resolution. But I'm not all that picky about 4.7 update code, so I don't mind some risk of minor nits if we do accidentally land a manual-resolution mistake. |
|
/hold |
durationToSoftFailure was added in 4447a19 (allow longer upgrade times to run tests, but continue to fail at 75 minutes, 2020-08-12, openshift#25411), but didn't get the 2x on rollbacks we've been adding to maximumDuration since a53efd5 (Support --options on upgrade tests to abort in progress, 2019-04-29, openshift#22726). That's recently been causing the cluster-version operator's A->B->A rollback CI jobs to time out [1]. This commit catches durationToSoftFailure up with the "2x on rollbacks" approach, and also mentions "aborted" in messages for those types of tests, to help remind folks what's going on. An alternative approach would be to teach clusterUpgrade to treat rollbacks as two separate hops (one for A->B, and another for B->A). But that would be a more involved restructuring, and since we already had the 2x maximumDuration precedent in place, I haven't gone in that direction. [1]: openshift/cluster-version-operator#514 (comment)
The failure message for "[sig-cluster-lifecycle] cluster upgrade should be fast" is ambiguous:

  upgrade to registry.build01.ci.openshift.org/ci-op-h21l7wld/release@sha256:2148b1c121946ac4f186bb22b166247d3df0ad9cf3966f05dc2a6ea27bc53927 took too long: 86.33481681223333

What does 86.33481681223333 represent? Is it seconds, minutes, hours? What does "too long" mean? Without looking at the test code, one cannot tell. With that info added to the failure string, it's easier to understand:

  upgrade to registry.build01.ci.openshift.org/ci-op-h21l7wld/release@sha256:2148b1c121946ac4f186bb22b166247d3df0ad9cf3966f05dc2a6ea27bc53927 took too long: 86.3 minutes, expected 75 minutes or less
The OVN upgrade jobs are expected to take longer than OpenShiftSDN. There is more context to this here: https://bugzilla.redhat.com/show_bug.cgi?id=1942164 Signed-off-by: Jamo Luhrsen <[email protected]>
The original PR openshift#26202 used parens in the wrong place, so the actual time calculations were wrong and garbage values were being used, as in this job [0], where the test output looked like: : [sig-cluster-lifecycle] cluster upgrade should complete in -1386946h6m26.707003392s minutes This should fix that. [0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade/1414923427529101312 Signed-off-by: Jamo Luhrsen <[email protected]>
5c56275 (Bug 1942164: Fix time calc ordering for upgrades, 2021-07-13, openshift#26324) adjusted how the failure-mode test-case name was formed, but did not adjust the success-mode test-case name. This commit restructures to make it impossible to diverge going forward.
Force-pushed: 6ea3f7c to 5502ec3
Thank you for the feedback. I have removed #26219 from the commit as you suggested. This pull request should now only increase the OVN upgrade timeout by 15 minutes and make some minor changes to the code; it will not add the AWS delay. I have also cherry-picked the #26207 to add the commit of the fix to the history of the I will keep the label
Edit (clarifying commits):
#26207 3c5c821 -> abe8f4e
#26202 5ec3836 -> ee62651
#26324 5c56275 -> 9772ac4
#26327 bbb3a70 -> 5502ec3 |
|
/retest-required |
|
/retest-required |
|
/retest-required |
|
@Davoska: all tests passed! |
|
/unhold |
|
@Davoska: This pull request references Bugzilla bug 1885322, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact:
LalatenduMohanty
left a comment
/lgtm
|
@LalatenduMohanty, #26202 is also backported with this pull request, in commit ee62651. Cherry-picking the original PR #26202 as-is would also bring in changes from #26219, because #26202 modifies some of its code, so I have manually resolved the conflict so that the #26219 changes are not backported. |
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
/remove-lifecycle stale |
|
I have updated the comment #26878 (comment) to clarify the backported commits. |
LalatenduMohanty
left a comment
/lgtm
|
/hold cancel |
|
/assign @soltysh |
|
/approve |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Davoska, LalatenduMohanty The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@soltysh, let me know if there are any complications with the pull request. |
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
As the bug itself has been closed because OCP 4.7 is EOL, I am closing this pull request. /close |
|
@Davoska: Closed this PR. |
|
@Davoska: This pull request references Bugzilla bug 1885322. The bug has been updated to no longer refer to the pull request using the external bug tracker. |
Cherry-picking #25977, #26207, #26202, #26324, and #26327 to backport the fix for bug 1885322 to release-4.7.