Skip to content

Conversation

@staebler
Copy link
Contributor

When there are AWS resources that cannot be destroyed, the list of blocked ARNs is added to the ClusterDeprovision status in the blockedResources field. The user can then take action on those blocked resources.

The AWS destroyer will back off destroy attempts. Each attempt is limited to 5 minutes. The backoff starts at 5 minutes, doubles after each failed attempt, and caps at 24 hours.

If the user wants to abandon a deprovision, the user can add the "hive.openshift.io/abandon-deprovision" annotation to the ClusterDeployment. When this annotation is present with a true value, the clusterdeployment controller will remove the deprovison finalizer from the ClusterDeployment without waiting for the ClusterDeprovision to complete.

https://issues.redhat.com/browse/CO-943

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2020
@staebler
Copy link
Contributor Author

/hold

This is using a fork of the installer until openshift/installer#3765 merges.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2020
@staebler
Copy link
Contributor Author

This is an example of the status of a ClusterDeprovision when there is an untagged instance connected to the VPC.

  status:
    blockedResources:
    - arn:aws:ec2:us-east-1:125931421481:dhcp-options/dopt-0b5ec6b5149cd3f58
    - arn:aws:ec2:us-east-1:125931421481:network-interface/eni-044b8a17b7c24a0fa
    - arn:aws:ec2:us-east-1:125931421481:subnet/subnet-02afe98c8ac60c7a7
    - arn:aws:ec2:us-east-1:125931421481:vpc/vpc-0f19f125f3d24bfc9

Copy link
Contributor

@joelddiaz joelddiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

return true, err
}
if o.clusterDeprovision == "" {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not also return err here even if we will not be posting status to a ClusterDeprovision object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are any blocked resources, then we want to keep trying to uninstall in the backoff loop. If we return err here, then the uninstall will stop and the pod will fail.

However, maybe it would be good to distinguish between an error because the context expired and an error for other reasons. In the former case, we want to keep trying. In the latter case, we want to abort. I'll look into that.

@staebler staebler changed the title provide a means to adandon deprovisioning provide a means to abandon deprovisioning Jun 18, 2020
@staebler
Copy link
Contributor Author

Now blocked on openshift/installer#3772.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2020
IngressVIP string

// DNSVIP is the virtual IP address for DNS
DNSVIP string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little strange, is this meant to be in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The DNSVIP was removed from the installer types in the version of the installer that I am vendoring.

Completed bool `json:"completed,omitempty"`

// BlockedResources is a list of cloud resources that the deprovision has not been able to delete
BlockedResources []string `json:"blockedResources,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are things looking for the other cloud providers, will a flat list of strings be sufficient as far as we can see? Wondering if we should go with something that lets us store a little more data or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I can change BlockedResources to a struct that for now has nothing but a name field.

@dgoodwin
Copy link
Contributor

dgoodwin commented Aug 4, 2020

Installer PR merged but we've got major conflicts in here now.

@staebler
Copy link
Contributor Author

staebler commented Aug 4, 2020

Installer PR merged but we've got major conflicts in here now.

Only one of the two installer PRs has merged. The openshift/installer#3772 PR still has not.

@twiest twiest removed their request for review October 1, 2020 18:18
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2020
When there are AWS resources that cannot be destroyed, the list of
blocked ARNs is added to the ClusterDeprovision status in the
`blockedResources` field. The user can then take action on those
blocked resources.

The AWS destroyer will back off destroy attempts. Each attempt is
limited to 5 minutes. The backoff starts at 5 minutes, doubles
after each failed attempt, and caps at 24 hours.

If the user wants to abandon a deprovision, the user can add the
"hive.openshift.io/abandon-deprovision" annotation to the ClusterDeployment.
When this annotation is present with a true value, the clusterdeployment
controller will remove the deprovison finalizer from the ClusterDeployment
without waiting for the ClusterDeprovision to complete.

https://issues.redhat.com/browse/CO-943
@openshift-merge-robot
Copy link
Contributor

openshift-merge-robot commented Oct 14, 2020

@staebler: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/unit e3bef6c link /test unit

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Steps: 1 << 8, // large enough to make cap the effective bound
Cap: 24 * time.Hour,
},
func() (done bool, returnErr error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: named returns are not great as they allow empty returrn in the function definition which makes it difficult to read what is being returned and when was it set..

Comment on lines +164 to +168
namespace, _, err := kubeconfig.Namespace()
if err != nil {
o.logger.WithError(err).Error("could not get the namespace")
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this use the context setup using the kubeconfig to get the namespace? How many users would use something like that? I personally haven't So should we also allow for setting namespace directly?

// Stop waiting for deprovision if the abandon-deprovision annotation is true
if value, ok := cd.Annotations[constants.AbandonDeprovisionAnnotation]; ok {
logger := cdLog.WithField(constants.AbandonDeprovisionAnnotation, value)
if abandon, err := strconv.ParseBool(value); abandon && err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if abandon, err := strconv.ParseBool(value); abandon && err == nil {
if abandon, err := strconv.ParseBool(value); err == nil && abandon {

wouldn't that be more appropriate?

logger.Warn("adandoning deprovision")
err = r.removeClusterDeploymentFinalizer(cd, cdLog)
if err != nil {
cdLog.WithError(err).Log(controllerutils.LogLevel(err), "error removing finalizer")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will return deprovisioned = true even when there was an error to remove the finalizer.. shouldn't we return false here?

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2021
@gregsheremeta
Copy link
Contributor

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2021
@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2021
@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 3, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 3, 2021

@staebler: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 9, 2021

@staebler: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/verify e3bef6c link /test verify
ci/prow/e2e e3bef6c link /test e2e
ci/prow/unit e3bef6c link /test unit
ci/prow/images e3bef6c link /test images

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dgoodwin
Copy link
Contributor

/close

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 10, 2021

@dgoodwin: Closed this PR.

Details

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants