ci-operator/step-registry/openshift/e2e/test: Raise timeout to 4h #22289
ci-operator/jobs/openshift/release/openshift-release-master-periodics.yaml
a12356c to
2717192
Compare
vrutkovs
left a comment
/approve
/lgtm
@wking: The following tests failed, say
/lgtm cancel
The 4.9 to 4.10 to 4.9 rollbacks keep hitting the 3h timeout:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=96h&type=junit&name=4.10-upgrade-from-stable-4.9&search=Process+did+not+finish+before.*timeout' | grep 'rollback.*failures match'
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback (all) - 4 runs, 100% failed, 75% of failures match = 75% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade-rollback (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
  $ w3m -dump -cols 200 'https://search.ci.openshift.org/search?maxAge=96h&type=junit&context=0&name=4.10-upgrade-from-stable-4.9&search=Process+did+not+finish+before.*timeout' | jq -r 'to_entries[].value | to_entries[].value[] | .name + " " + .context[0]' | sed -n 's/\(.*rollback\) .*before \([^ ]*\) timeout.*/\1 \2/p'
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback 3h0m0s
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback 3h0m0s
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback 3h0m0s
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade-rollback 3h0m0s

We've had the 3h timeout since 90fadfe (steps/openshift-e2e-test: e2e tests can take longer than 2h, 2021-01-06, openshift#14674). We still need some time for setup and teardown in the wrapping Prow job [1], so I'm also setting timeout on the jobs, via [2]. That leaves us exposed to situations where other jobs that use this same step hang up and spend so long in the step, that a wrapping Plank/Prow timeout leaves them with too little time to finish their teardown/gather. But if that happens, maybe the test-platform folks will give us either a way to override a single step's timeout for a job, or a blanket increase in the Plank/Prow cap.

[1]: https://docs.ci.openshift.org/docs/architecture/timeouts/
[2]: openshift/ci-tools#2294
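For readers who want to see what the sed expression in the second command does, here is a minimal sketch that applies the same expression to a hand-written sample line. The function name extract_rollback_timeouts and the sample input (job name and JSON-like context text) are hypothetical and only stand in for whatever the search endpoint and jq actually emit.

```shell
# Hypothetical helper: reads "<job name> <context line>" records on stdin
# and prints "<job name> <timeout>" for rollback jobs only.
# The sed expression is the one used in the commit message above.
extract_rollback_timeouts() {
  sed -n 's/\(.*rollback\) .*before \([^ ]*\) timeout.*/\1 \2/p'
}

# Made-up sample records; real context lines may look different.
printf '%s\n' \
  'periodic-example-e2e-aws-upgrade-rollback {"level":"error","msg":"Process did not finish before 3h0m0s timeout"}' \
  'periodic-example-e2e-aws-upgrade {"level":"error","msg":"Process did not finish before 3h0m0s timeout"}' \
  | extract_rollback_timeouts
```

Only the first sample record is printed, because the second job name does not end in "rollback"; the captured timeout is the token between "before" and "timeout".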
2717192 to
c1c1989
Compare
vrutkovs
left a comment
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: vrutkovs, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
@wking: Updated the following 3 configmaps:
In response to this:
The 4.9 to 4.10 to 4.9 rollbacks keep hitting the 3h timeout:
We've had the 3h timeout since 90fadfe (#14674). We still need some time for setup and teardown in the wrapping Prow job, so I'm also setting decoration_config to raise that limit for these jobs. That leaves us exposed to situations where other jobs that use this same step hang up and spend so long in the step that a wrapping Plank/Prow timeout leaves them with too little time to finish their teardown/gather. But if that happens, maybe the test-platform folks will give us either a way to override a single step's timeout for a job, or a blanket increase in the Plank/Prow cap.
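The headroom concern above is simple arithmetic: whatever the wrapping job timeout is, the time left for setup and teardown/gather is that value minus the step timeout. A rough sketch of the calculation follows; the helper names to_seconds and headroom_seconds are hypothetical, the 5h job timeout is an illustrative assumption rather than a value taken from this PR, and the parser only handles simple durations such as 4h or 3h0m0s.

```shell
# Hypothetical helper: convert a simple Go-style duration (e.g. 3h0m0s, 4h,
# 30m) into seconds by rewriting it as an arithmetic expression.
to_seconds() {
  arith=$(printf '%s' "$1" | sed 's/h/*3600+/; s/m/*60+/; s/s/+/')
  # The trailing 0 closes the dangling "+" left by the rewrite.
  echo $(( ${arith}0 ))
}

# Hypothetical helper: seconds left for setup and teardown/gather when a
# single step uses its whole timeout inside the wrapping job.
# Usage: headroom_seconds <job timeout> <step timeout>
headroom_seconds() {
  echo $(( $(to_seconds "$1") - $(to_seconds "$2") ))
}

# Assumed values for illustration only: a 5h job wrapping the 4h step.
headroom_seconds 5h 4h
```

With those assumed values the result is 3600 seconds. A job whose wrapping timeout is not raised alongside the step timeout would get zero or negative headroom, which is the exposure described for other jobs that share this step.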