Skip to content

Conversation

@staebler
Copy link
Contributor

@staebler staebler commented Jun 24, 2020

The Run function in the Destroyer interface was modified to take a context as a parameter. This provides a way for the user stop the uninstall after a period of time by providing a context with a deadline.

The baremetal, libvirt, openstack, and ovirt providers do not provide a means by which most requests made to the provider can be stopped prematurely. In these cases, the context is checked prior to making the requests as a best effort. But the uninstall may continue for a period of time after the context is done.

The RunWithContext function introduced in #3765 for AWS has been obsoleted since the Run function now accepts a context.

This will be used by Hive to backoff uninstall attempts.

https://issues.redhat.com/browse/CO-974

@staebler staebler force-pushed the destroy_with_context branch 2 times, most recently from 8234c43 to 7e98bab Compare June 24, 2020 17:32
@abhinavdahiya
Copy link
Contributor

A "--timeout" flag was added to the openshift-install destroy cluster command. This allows the user to time out the destroy after a specified number of seconds.

If the users want to add a timeout, they should invoke the installer with timeout openshift-install ...
I don't think the installer destroy should provide that api.

@abhinavdahiya
Copy link
Contributor

also if you can maybe split these into per-platform, switch destroy to pass context that would be very helpful in reviewing.

@staebler staebler force-pushed the destroy_with_context branch 2 times, most recently from e31c1f0 to 0a722d8 Compare June 24, 2020 19:03
@staebler
Copy link
Contributor Author

Update to put the changes for each provider in separate commits and to remove the "--timeout" flag.

@staebler staebler force-pushed the destroy_with_context branch from 0a722d8 to 6ec4a5a Compare June 24, 2020 19:28
@abhinavdahiya
Copy link
Contributor

0195751

So i think GCP needs a relook. there is a global context for Uninstaller that is used to create new contexts with tighter timeouts for requests. This change just overwrites that as part of Run, while also removing the local tigher context for API calls.

1685b1f

hmm.. the placement to the ctx Errs seems a little random to me, is it possible to provide a little more context in the commit message? setting the convention like check for ctx before each item to be deleted or maybe only once per cycle etc. is what we need to try should help guide if change is good.

ed8ccf3

this needs review from openstack folks, cc @mandre @Fedosin

@staebler staebler force-pushed the destroy_with_context branch 2 times, most recently from 2f25439 to 4f68b5b Compare June 26, 2020 15:05
@staebler
Copy link
Contributor Author

0195751

So i think GCP needs a relook. there is a global context for Uninstaller that is used to create new contexts with tighter timeouts for requests. This change just overwrites that as part of Run, while also removing the local tigher context for API calls.

I fixed this to use the 2-minute timeout for the calls to get the session and services.

1685b1f

hmm.. the placement to the ctx Errs seems a little random to me, is it possible to provide a little more context in the commit message? setting the convention like check for ctx before each item to be deleted or maybe only once per cycle etc. is what we need to try should help guide if change is good.

I fixed this to check the context directly before every request.

@staebler staebler force-pushed the destroy_with_context branch from 4f68b5b to 78e20f8 Compare June 26, 2020 16:26
@mandre
Copy link
Member

mandre commented Jun 29, 2020

/test e2e-openstack

@staebler staebler force-pushed the destroy_with_context branch from 78e20f8 to a330776 Compare June 29, 2020 15:01
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign crawford
You can assign the PR to them by writing /assign @crawford in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@staebler
Copy link
Contributor Author

78e20f8a6...a33077627

  • In gcp destroy, change PollImmediateInfinite to PollImmediateUntil to stop the polling as soon as the context is done rather than waiting for the polling interval to elapse.

Copy link
Member

@mandre mandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenStack changes look good, I only have some reservations about the logging.

@pierreprinetti
Copy link
Member

/test e2e-openstack

@pierreprinetti
Copy link
Member

/retest

@jstuever
Copy link
Contributor

gcp changes looks good to me.

staebler added 5 commits July 21, 2020 13:11
The Run function in the Destroyer interface was modified to take a context as
a parameter. This provides a way for the user stop the uninstall after a
period of time by providing a context with a deadline.

This will be used by Hive to backoff uninstall attempts.

https://issues.redhat.com/browse/CO-974
Use the context passed into the Run function for AWS.
Use the context passed into the Run function for Azure.
Use the context passed into the Run function for GCP.
Use the context passed into the Run function for vSphere.
staebler added 3 commits July 21, 2020 13:11
Use the context passed into the Run function for libvirt.

Note that there is no means by which to abort in-flight requests
made to libvirt. If the context completes while the uninstall
is making a request, then the uninstall will not return until
the request has completed.
Use the context passed into the Run function for OpenStack.

Note that there is no means by which to abort in-flight requests
made to OpenStack. If the context completes while the uninstall
is making a request, then the uninstall will not return until
the request has completed.
Use the context passed into the Run function for ovirt.

Note that there is no means by which to abort in-flight requests
made to ovirt. If the context completes while the uninstall
is making a request, then the uninstall will not return until
the request has completed.
@staebler staebler force-pushed the destroy_with_context branch from a330776 to 783ba77 Compare July 21, 2020 17:21
@staebler
Copy link
Contributor Author

a33077627...783ba77d2

@pierreprinetti
Copy link
Member

/retest

@openshift-ci-robot
Copy link
Contributor

@staebler: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-libvirt 783ba77 link /test e2e-libvirt
ci/prow/e2e-ovirt 783ba77 link /test e2e-ovirt
ci/prow/e2e-crc 783ba77 link /test e2e-crc
ci/prow/e2e-aws-fips 783ba77 link /test e2e-aws-fips
ci/prow/e2e-aws-workers-rhel7 783ba77 link /test e2e-aws-workers-rhel7
ci/prow/e2e-metal-ipi 783ba77 link /test e2e-metal-ipi

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pierreprinetti
Copy link
Member

openstack-e2e is passing now.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2020
@openshift-merge-robot
Copy link
Contributor

@staebler: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/openstack-manifests 783ba77 link /test openstack-manifests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 8, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 14, 2021

@staebler: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/openstack-manifests 783ba77 link /test openstack-manifests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Copy link
Contributor

@openshift-bot: Closed this PR.

Details

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants