Conversation

@vrutkovs
Contributor

@vrutkovs vrutkovs commented Jun 10, 2021

Due to an AWS LB issue, the apiserver rollout takes twice as long. This bumps the allowed upgrade duration on AWS to 90 minutes.

Upgrade tests:

[sig-cluster-lifecycle] cluster upgrade should complete in 60m (90m on AWS) | 1h7m50s
[sig-cluster-lifecycle] cluster upgrade should complete in 60m (90m on AWS) | 1h32m30s
upgrade to registry.build01.ci.openshift.org/ci-ln-chzn6vb/release@sha256:1e6a8003e66a4b68e03128e204b679bd6db5ca354edb802f0709699b0490cde8 took too long: 92.50404250443333

@openshift-ci openshift-ci bot requested review from bparees and deads2k June 10, 2021 17:25
infra, err := c.ConfigV1().Infrastructures().Get(context.Background(), "cluster", metav1.GetOptions{})
framework.ExpectNoError(err)
if infra.Status.PlatformStatus.Type == configv1.AWSPlatformType {
// due to AWS LB bug upgrades take longer on AWS
Contributor

is this a bug in our AWS LB code, or in AWS itself?

are we tracking a resolution? this would also impact customers, right?

Contributor

(in that their upgrades will take longer. obviously they don't have the same timeout to worry about)

Contributor Author

A bug in AWS NLB healthchecks, which we worked around by making kubeapi-operator resilient. As a result we wait twice: ~12m during reboot and ~12m during apiserver rollout.

IIUC it's this issue, but @smarterclayton might have a better reference for this.

/cc @smarterclayton

Contributor

It's not the nu26 issue. It's https://bugzilla.redhat.com/show_bug.cgi?id=1943804. As a result, we take 12 extra minutes to roll out the apiserver twice (120s -> 240s per machine, 3 machines, 2 separate rollouts: machines and apiserver).

Contributor

Go ahead and change the JUnit to be "Cluster upgrade should complete within 60m (90m on AWS)"

Contributor Author

Done. Added bug reference to the comment

@openshift-ci openshift-ci bot requested a review from smarterclayton June 10, 2021 17:30
@bparees
Contributor

bparees commented Jun 10, 2021

/lgtm

note, this might push us over the timeout limits on the multistep upgrade tests in the future (e.g. 4.8->4.9->4.10)

@vrutkovs
Contributor Author

note, this might push us over the timeout limits on the multistep upgrade tests in the future (e.g. 4.8->4.9->4.10)

Right, we'll work with DPTP to extend those

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 10, 2021
@wking
Member

wking commented Jun 10, 2021

There is also #26202 in this space with a blanket raise to 90m, vs. this PR's platform-specific raise.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

18 similar comments

Due to an AWS LB issue, the apiserver rollout takes twice as long. This bumps the allowed upgrade duration on AWS to 90 minutes.
@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1970315: upgrade test: expect upgrades to take longer on AWS

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 11, 2021
@vrutkovs
Contributor Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

@vrutkovs
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 11, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: This pull request references Bugzilla bug 1970315, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

Details

In response to this:

/bugzilla refresh

@openshift-ci openshift-ci bot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 11, 2021
@openshift-ci openshift-ci bot requested a review from zhaozhanqi June 11, 2021 12:04
@vrutkovs vrutkovs changed the title Bug 1970315: upgrade test: expect upgrades to take longer on AWS upgrade test: expect upgrades to take longer on AWS Jun 11, 2021
@openshift-ci openshift-ci bot removed the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Jun 11, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

upgrade test: expect upgrades to take longer on AWS

@openshift-ci openshift-ci bot removed the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 11, 2021
@stbenjam
Member

@vrutkovs Do you have a new bug to use for this PR? Looks like the other one which you removed was the wrong bug.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, stbenjam, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-disruptive 587d519 link /test e2e-aws-disruptive
ci/prow/e2e-gcp-disruptive 587d519 link /test e2e-gcp-disruptive
ci/prow/e2e-aws-upgrade 587d519 link /test e2e-aws-upgrade

Full PR test history. Your PR dashboard.

Details

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@vrutkovs vrutkovs changed the title upgrade test: expect upgrades to take longer on AWS Bug 1970975: upgrade test: expect upgrades to take longer on AWS Jun 11, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jun 11, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: This pull request references Bugzilla bug 1970975, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @wangke19

Details

In response to this:

Bug 1970975: upgrade test: expect upgrades to take longer on AWS

@openshift-ci openshift-ci bot requested a review from wangke19 June 11, 2021 15:15
@openshift-merge-robot openshift-merge-robot merged commit 15ba47a into openshift:master Jun 11, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: All pull requests linked via external trackers have merged:

Bugzilla bug 1970975 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1970975: upgrade test: expect upgrades to take longer on AWS

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

7 participants