Skip to content

Conversation

@DavidHurta
Copy link
Contributor

@DavidHurta DavidHurta commented Mar 3, 2022

Cherry-picking #25977, #26207, #26202, #26324, and #26327 to backport the fix regarding the bug 1885322 back to release-4.7.

@openshift-ci openshift-ci bot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 3, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2022

@Davoska: This pull request references Bugzilla bug 1885322, which is invalid:

  • expected the bug to target the "4.7.z" release, but it targets "---" instead
  • expected dependent Bugzilla bug 1879099 to target a release in 4.8.0, 4.8.z, but it targets "4.7.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1885322: Increase OVN upgrade timeout by 15m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from csrwng and mfojtik March 3, 2022 12:55
@DavidHurta
Copy link
Contributor Author

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2022

@Davoska: This pull request references Bugzilla bug 1885322, which is invalid:

  • expected dependent Bugzilla bug 1879099 to target a release in 4.8.0, 4.8.z, but it targets "4.7.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@DavidHurta
Copy link
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 3, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2022

@Davoska: This pull request references Bugzilla bug 1885322, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.z) matches configured target release for branch (4.7.z)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1929650 is in the state CLOSED (ERRATA), which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1929650 targets the "4.8.z" release, which is one of the valid target releases: 4.8.0, 4.8.z
  • bug has dependents

Requesting review from QA contact:
/cc @jianlinliu

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from jianlinliu March 3, 2022 13:34
@DavidHurta
Copy link
Contributor Author

DavidHurta commented Mar 3, 2022

/hold
I need to double-check the contents of the commits.
Fixing the Pull Request - I have backported too much code.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2022
@DavidHurta DavidHurta force-pushed the cluster-upgrade-should-be-fast-4.7 branch from be2b25d to 6ea3f7c Compare March 3, 2022 19:50
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2022

@Davoska: This pull request references Bugzilla bug 1885322, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.z) matches configured target release for branch (4.7.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1929650 is in the state CLOSED (ERRATA), which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1929650 targets the "4.8.z" release, which is one of the valid target releases: 4.8.0, 4.8.z
  • bug has dependents

Requesting review from QA contact:
/cc @jianlinliu

Details

In response to this:

Bug 1885322: Increase OVN upgrade timeout by 15m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@DavidHurta
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2022
@wking
Copy link
Member

wking commented Mar 4, 2022

$ cherry-pick-diff origin/release-4.7..origin/pr/26878 origin/release-4.8
e7456a82fd -> 6ea3f7c74d test/e2e/upgrade/upgrade: Unify duration-overshoot test-case name
785fe46c2b -> bdf1a8b0ba Bug 1942164: Fix time calc ordering for upgrades
834623fab6 -> 0ecd19ad3d Increase OVN upgrade timeout by 15m
a23e3eacd8 -> 1b35a9f313 test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs

e7456a82fd -> 6ea3f7c74d test/e2e/upgrade/upgrade: Unify duration-overshoot test-case name
...very clean, just some line number offsets...

785fe46c2b -> bdf1a8b0ba Bug 1942164: Fix time calc ordering for upgrades
...very clean, just some line number offsets...

834623fab6 -> 0ecd19ad3d Increase OVN upgrade timeout by 15m
...a bit messy, because 834623fab6 was touching some code from #26219 (AWS delay) that hasn't been taken back to 4.7.  Seems like 0ecd19ad3d is rolling those two changes into one commit...

a23e3eacd8 -> 1b35a9f313 test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs
...pretty clean: line offsets and minor context changes...

I dunno if it's worth picking #26219 back in here too. Might make folks less worried that we'd missed something in manual conflict resolution. But I'm not all that picky about 4.7 update code, so I don't mind some risk of minor nits if we do accidentally land a manual-resolution mistake.

@DavidHurta
Copy link
Contributor Author

/hold
Resolving the #26878 (comment) discussion.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2022
wking and others added 5 commits March 4, 2022 11:27
durationToSoftFailure was added in 4447a19 (allow longer upgrade
times to run tests, but continue to fail at 75 minutes, 2020-08-12, openshift#25411),
but didn't get the 2x on rollbacks we'e been adding to maximumDuration
since a53efd5 (Support --options on upgrade tests to abort in
progress, 2019-04-29, openshift#22726).  That's recently been causing the
cluster-version operator's A->B->A rollback CI jobs to time out [1].
This commit catches durationToSoftFailure up with the "2x on
rollbacks" approach, and also mentions "aborted" in messages for those
types of tests, to help remind folks what's going on.

An alternative approach would be to teach clusterUpgrade to treat
rollbacks as two separate hops (one for A->B, and another for B->A).
But that would be a more involved restructuring, and since we already
had the 2x maximumDuration precedent in place, I haven't gone in that
direction.

[1]: openshift/cluster-version-operator#514 (comment)
The failure message for "[sig-cluster-lifecycle] cluster upgrade should be fast" is ambiguous:

upgrade to registry.build01.ci.openshift.org/ci-op-h21l7wld/release@sha256:2148b1c121946ac4f186bb22b166247d3df0ad9cf3966f05dc2a6ea27bc53927 took too long: 86.33481681223333

What does 86.33481681223333 represent? Is it seconds, minutes, hours? What does "too long" mean? Without looking at the test code, one cannot tell. With that info added to the failure string, it's easier to understand:

upgrade to registry.build01.ci.openshift.org/ci-op-h21l7wld/release@sha256:2148b1c121946ac4f186bb22b166247d3df0ad9cf3966f05dc2a6ea27bc53927 took too long: 86.3 minutes, expected 75 minutes or less
The OVN upgrade jobs are expected to take longer than
OpenShiftSDN.

There is more context to this here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1942164

Signed-off-by: Jamo Luhrsen <[email protected]>
The original PR openshift#26202 use
parens in the wrong place and the actual time calcs were happening
wrong and garbage values were being used, like this job [0] where
the test output looked like:
  : [sig-cluster-lifecycle] cluster upgrade should complete in -1386946h6m26.707003392s minutes

This should fix that.

[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-ovn-upgrade/1414923427529101312

Signed-off-by: Jamo Luhrsen <[email protected]>
5c56275 (Bug 1942164: Fix time calc ordering for upgrades,
2021-07-13, openshift#26324) adjusted how the failure-mode test-case name was
formed, but did not adjust the success-mode test-case name.  This
commit restructures to make it impossible to diverge going forward.
@DavidHurta DavidHurta force-pushed the cluster-upgrade-should-be-fast-4.7 branch from 6ea3f7c to 5502ec3 Compare March 4, 2022 10:42
@DavidHurta
Copy link
Contributor Author

DavidHurta commented Mar 4, 2022

$ cherry-pick-diff origin/release-4.7..origin/pr/26878 origin/release-4.8
e7456a82fd -> 6ea3f7c74d test/e2e/upgrade/upgrade: Unify duration-overshoot test-case name
785fe46c2b -> bdf1a8b0ba Bug 1942164: Fix time calc ordering for upgrades
834623fab6 -> 0ecd19ad3d Increase OVN upgrade timeout by 15m
a23e3eacd8 -> 1b35a9f313 test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs

e7456a82fd -> 6ea3f7c74d test/e2e/upgrade/upgrade: Unify duration-overshoot test-case name
...very clean, just some line number offsets...

785fe46c2b -> bdf1a8b0ba Bug 1942164: Fix time calc ordering for upgrades
...very clean, just some line number offsets...

834623fab6 -> 0ecd19ad3d Increase OVN upgrade timeout by 15m
...a bit messy, because 834623fab6 was touching some code from #26219 (AWS delay) that hasn't been taken back to 4.7.  Seems like 0ecd19ad3d is rolling those two changes into one commit...

a23e3eacd8 -> 1b35a9f313 test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs
...pretty clean: line offsets and minor context changes...

I dunno if it's worth picking #26219 back in here too. Might make folks less worried that we'd missed something in manual conflict resolution. But I'm not all that picky about 4.7 update code, so I don't mind some risk of minor nits if we do accidentally land a manual-resolution mistake.

Thank you for the feedback. I have removed the #26219 from the commit as you suggested. Now, this pull request should only increase the OVN upgrade timeout by 15 minutes and add some minor changes to the code. But it will not add the AWS delay.

I have also cherry-picked the #26207 to add the commit of the fix to the history of the release-4.7 branch.

I will keep the label do-not-merge/hold for the moment. Let me know if my changes were adequate or if I should change something. It's my first backport.

Edit (clarifying commits):
New mapping [original -> new]:
#25977 a23e3ea -> 395a299
Relaxing soft timeout for rollback jobs which is used in the OVN upgrade timeout by 15m fix, minor context changes, wording...

#26207 3c5c821 -> abe8f4e
Clarifying failure message without the mentioning that upgrades on AWS should take longer since we are not backporting #26219.

#26202 5ec3836 -> ee62651
Increasing OVN upgrade timeout by 15m without backporting the #26219.

#26324 5c56275 -> 9772ac4
Fix to the Increase OVN upgrade timeout by 15m.

#26327 bbb3a70 -> 5502ec3
Unifying duration-overshoot test-case names...

@DavidHurta
Copy link
Contributor Author

/retest-required

1 similar comment
@DavidHurta
Copy link
Contributor Author

/retest-required

@DavidHurta
Copy link
Contributor Author

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 7, 2022

@Davoska: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@DavidHurta
Copy link
Contributor Author

DavidHurta commented Mar 7, 2022

/unhold
Failing checks have passed, removing the do-not-merge/hold label.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 7, 2022

@Davoska: This pull request references Bugzilla bug 1885322, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.z) matches configured target release for branch (4.7.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1929650 is in the state CLOSED (ERRATA), which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1929650 targets the "4.8.z" release, which is one of the valid target releases: 4.8.0, 4.8.z
  • bug has dependents

Requesting review from QA contact:
/cc @jianlinliu

Details

In response to this:

Bug 1885322: Increase OVN upgrade timeout by 15m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 7, 2022
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 22, 2022
@LalatenduMohanty
Copy link
Member

/hold

@Davoska I can not confirm if #26202 is also back ported with this PR. Can you confirm?

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2022
@DavidHurta
Copy link
Contributor Author

@LalatenduMohanty, the #26202 is also backported with this pull request in the ee62651 commit.

Although cherry-picking the original PR #26202 would also bring changes from the #26219 because the #26202 modifies some of its code. I have manually resolved the conflict so that the changes regarding the #26219 are not backported.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2022
@DavidHurta
Copy link
Contributor Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2022
@DavidHurta
Copy link
Contributor Author

I have updated the comment #26878 (comment) to clarify the backported commits.

Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@LalatenduMohanty
Copy link
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 5, 2022
@DavidHurta
Copy link
Contributor Author

/assign @soltysh
Hello, tagging regarding the approved and backport-risk-assessed labels 👋

@LalatenduMohanty
Copy link
Member

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 8, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Davoska, LalatenduMohanty
Once this PR has been reviewed and has the lgtm label, please ask for approval from soltysh by writing /assign @soltysh in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@DavidHurta
Copy link
Contributor Author

@soltysh, let me know if there are any complications with the pull request.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 23, 2022
@DavidHurta
Copy link
Contributor Author

As the bug itself is closed due to the OCP 4.7 being EOL, closing the pull request.

/close

@openshift-ci openshift-ci bot closed this Nov 23, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 23, 2022

@Davoska: Closed this PR.

Details

In response to this:

As the bug itself is closed due to the OCP 4.7 being EOL, closing the pull request.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 23, 2022

@Davoska: This pull request references Bugzilla bug 1885322. The bug has been updated to no longer refer to the pull request using the external bug tracker.

Details

In response to this:

Bug 1885322: Increase OVN upgrade timeout by 15m

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants