provide a means to abandon deprovisioning #1017

staebler · 2020-06-17T15:05:01Z

When there are AWS resources that cannot be destroyed, the list of blocked ARNs is added to the ClusterDeprovision status in the blockedResources field. The user can then take action on those blocked resources.

The AWS destroyer will back off destroy attempts. Each attempt is limited to 5 minutes. The backoff starts at 5 minutes, doubles after each failed attempt, and caps at 24 hours.

If the user wants to abandon a deprovision, they can add the "hive.openshift.io/abandon-deprovision" annotation to the ClusterDeployment. When this annotation is present with a true value, the clusterdeployment controller will remove the deprovision finalizer from the ClusterDeployment without waiting for the ClusterDeprovision to complete.
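A rough sketch of the annotation check follows. `shouldAbandon` is a hypothetical helper, not a function from this PR; it assumes that "a true value" means anything `strconv.ParseBool` accepts as true, which matches the controller excerpt quoted later in the review.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation key quoted in the description above.
const abandonDeprovisionAnnotation = "hive.openshift.io/abandon-deprovision"

// shouldAbandon (hypothetical) reports whether the annotation is present
// and parses as a true boolean. Missing or unparseable values count as false.
func shouldAbandon(annotations map[string]string) bool {
	value, ok := annotations[abandonDeprovisionAnnotation]
	if !ok {
		return false
	}
	abandon, err := strconv.ParseBool(value)
	return err == nil && abandon
}

func main() {
	fmt.Println(shouldAbandon(map[string]string{abandonDeprovisionAnnotation: "true"}))
	fmt.Println(shouldAbandon(map[string]string{abandonDeprovisionAnnotation: "yes"}))
	fmt.Println(shouldAbandon(nil))
}
```

Note that `ParseBool` rejects values such as "yes", so under this assumption only "1", "t", "T", "TRUE", "true", and "True" trigger the abandon path.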

https://issues.redhat.com/browse/CO-943

openshift-ci-robot · 2020-06-17T15:05:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler


Needs approval from an approver in each of these files:

~~OWNERS~~ [staebler]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

staebler · 2020-06-17T15:05:58Z

/hold

This is using a fork of the installer until openshift/installer#3765 merges.

staebler · 2020-06-17T15:24:45Z

This is an example of the status of a ClusterDeprovision when there is an untagged instance connected to the VPC.

  status:
    blockedResources:
    - arn:aws:ec2:us-east-1:125931421481:dhcp-options/dopt-0b5ec6b5149cd3f58
    - arn:aws:ec2:us-east-1:125931421481:network-interface/eni-044b8a17b7c24a0fa
    - arn:aws:ec2:us-east-1:125931421481:subnet/subnet-02afe98c8ac60c7a7
    - arn:aws:ec2:us-east-1:125931421481:vpc/vpc-0f19f125f3d24bfc9

joelddiaz

looks good

joelddiaz · 2020-06-18T12:07:28Z

contrib/pkg/deprovision/awstagdeprovision.go

+				return true, err
+			}
+			if o.clusterDeprovision == "" {
+				return


Do we not also return err here even if we will not be posting status to a ClusterDeprovision object?

If there are any blocked resources, then we want to keep trying to uninstall in the backoff loop. If we return err here, then the uninstall will stop and the pod will fail.

However, maybe it would be good to distinguish between an error because the context expired and an error for other reasons. In the former case, we want to keep trying. In the latter case, we want to abort. I'll look into that.

staebler · 2020-06-19T14:09:51Z

Now blocked on openshift/installer#3772.

dgoodwin · 2020-06-22T14:37:03Z

pkg/clusterresource/vsphere.go

 	IngressVIP string

-	// DNSVIP is the virtual IP address for DNS
-	DNSVIP string


A little strange, is this meant to be in this PR?

Yes. The DNSVIP was removed from the installer types in the version of the installer that I am vendoring.

dgoodwin · 2020-06-22T14:42:12Z

pkg/apis/hive/v1/clusterdeprovision_types.go

 	Completed bool `json:"completed,omitempty"`
+
+	// BlockedResources is a list of cloud resources that the deprovision has not been able to delete
+	BlockedResources []string `json:"blockedResources,omitempty"`


How are things looking for the other cloud providers, will a flat list of strings be sufficient as far as we can see? Wondering if we should go with something that lets us store a little more data or not.

That's a good idea. I can change BlockedResources to a struct that for now has nothing but a name field.

dgoodwin · 2020-08-04T16:56:57Z

Installer PR merged but we've got major conflicts in here now.

staebler · 2020-08-04T17:33:47Z

> Installer PR merged but we've got major conflicts in here now.

Only one of the two installer PRs has merged. The openshift/installer#3772 PR still has not.

openshift-merge-robot · 2020-10-14T19:24:46Z

@staebler: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/unit | `e3bef6c` | link | `/test unit` |


abhinavdahiya · 2020-11-02T17:29:20Z

contrib/pkg/deprovision/awstagdeprovision.go

+			Steps:    1 << 8, // large enough to make cap the effective bound
+			Cap:      24 * time.Hour,
+		},
+		func() (done bool, returnErr error) {


nit: named returns are not great, as they allow an empty return in the function body, which makes it difficult to read what is being returned and when it was set.

abhinavdahiya · 2020-11-02T17:34:51Z

contrib/pkg/deprovision/awstagdeprovision.go

+			namespace, _, err := kubeconfig.Namespace()
+			if err != nil {
+				o.logger.WithError(err).Error("could not get the namespace")
+				return
+			}


Does this use the context set up in the kubeconfig to get the namespace? How many users would use something like that? I personally haven't. Should we also allow setting the namespace directly?

abhinavdahiya · 2020-11-02T17:37:31Z

pkg/controller/clusterdeployment/clusterdeployment_controller.go

+	// Stop waiting for deprovision if the abandon-deprovision annotation is true
+	if value, ok := cd.Annotations[constants.AbandonDeprovisionAnnotation]; ok {
+		logger := cdLog.WithField(constants.AbandonDeprovisionAnnotation, value)
+		if abandon, err := strconv.ParseBool(value); abandon && err == nil {


Suggested change

-		if abandon, err := strconv.ParseBool(value); abandon && err == nil {
+		if abandon, err := strconv.ParseBool(value); err == nil && abandon {

wouldn't that be more appropriate?
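For context, `strconv.ParseBool` returns `false` together with a non-nil error when the input cannot be parsed, so the two orderings evaluate to the same result; checking `err` first is the more conventional reading. A small sketch (`bothOrders` is an illustrative helper) that compares them:

```go
package main

import (
	"fmt"
	"strconv"
)

// bothOrders evaluates the condition with each operand order and returns
// the two results (value-first, error-first).
func bothOrders(value string) (bool, bool) {
	abandon, err := strconv.ParseBool(value)
	return abandon && err == nil, err == nil && abandon
}

func main() {
	for _, value := range []string{"true", "1", "false", "", "yes"} {
		valueFirst, errFirst := bothOrders(value)
		fmt.Printf("%q: value-first=%v error-first=%v\n", value, valueFirst, errFirst)
	}
}
```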

abhinavdahiya · 2020-11-02T17:38:50Z

pkg/controller/clusterdeployment/clusterdeployment_controller.go

+			logger.Warn("abandoning deprovision")
+			err = r.removeClusterDeploymentFinalizer(cd, cdLog)
+			if err != nil {
+				cdLog.WithError(err).Log(controllerutils.LogLevel(err), "error removing finalizer")


We will return deprovisioned = true even when removing the finalizer fails. Shouldn't we return false here?

openshift-bot · 2021-01-31T19:23:41Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

gregsheremeta · 2021-02-02T19:25:12Z

/remove-lifecycle stale

openshift-bot · 2021-05-04T01:18:25Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2021-06-03T04:55:41Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci · 2021-06-03T04:55:48Z

@staebler: PR needs rebase.


openshift-ci · 2021-06-09T21:01:57Z

@staebler: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/verify | `e3bef6c` | link | `/test verify` |
| ci/prow/e2e | `e3bef6c` | link | `/test e2e` |
| ci/prow/unit | `e3bef6c` | link | `/test unit` |
| ci/prow/images | `e3bef6c` | link | `/test images` |


dgoodwin · 2021-06-10T11:10:50Z

/close

openshift-ci · 2021-06-10T11:10:58Z

@dgoodwin: Closed this PR.


openshift-ci-robot requested review from joelddiaz and twiest June 17, 2020 15:05

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2020

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2020

staebler force-pushed the stop_deprovision branch from 5461264 to 503f38c Compare June 17, 2020 16:58

joelddiaz reviewed Jun 18, 2020

staebler changed the title ~~provide a means to adandon deprovisioning~~ provide a means to abandon deprovisioning Jun 18, 2020

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2020

dgoodwin reviewed Jun 22, 2020

twiest removed their request for review October 1, 2020 18:18

vendor: bump installer version

673377e

staebler force-pushed the stop_deprovision branch from 503f38c to abc3652 Compare October 14, 2020 18:53

openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2020

staebler force-pushed the stop_deprovision branch from abc3652 to e3bef6c Compare October 14, 2020 19:13

abhinavdahiya reviewed Nov 2, 2020

openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2021

openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2021

openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2021

openshift-ci bot closed this Jun 10, 2021
