provide a means to abandon deprovisioning #1017
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: staebler
/hold
This is using a fork of the installer until openshift/installer#3765 merges.
This is an example of the status of a ClusterDeprovision when there is an untagged instance connected to the VPC.
5461264 to
503f38c
Compare
joelddiaz
left a comment
looks good
| return true, err | ||
| } | ||
| if o.clusterDeprovision == "" { | ||
| return |
Do we not also return err here even if we will not be posting status to a ClusterDeprovision object?
If there are any blocked resources, then we want to keep trying to uninstall in the backoff loop. If we return err here, then the uninstall will stop and the pod will fail.
However, maybe it would be good to distinguish between an error because the context expired and an error for other reasons. In the former case, we want to keep trying. In the latter case, we want to abort. I'll look into that.
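One way the distinction could look, as a minimal sketch: retry only when the error chain contains `context.DeadlineExceeded`, and abort otherwise. The helper name `shouldRetry` and the sample errors are hypothetical and are not part of this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Sample errors for illustration only (hypothetical, not from the PR).
var (
	errTimeout = fmt.Errorf("destroying resources: %w", context.DeadlineExceeded)
	errOther   = errors.New("invalid credentials")
)

// shouldRetry reports whether the uninstall loop should keep trying.
// It returns true only when the attempt ended because its context
// deadline expired; any other error is treated as a reason to abort.
func shouldRetry(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timeout:", shouldRetry(errTimeout))
	fmt.Println("other:  ", shouldRetry(errOther))
}
```

`errors.Is` walks the wrapped error chain, so the check still works when the destroyer wraps the context error with extra detail.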
Now blocked on openshift/installer#3772.
pkg/clusterresource/vsphere.go
Outdated
| IngressVIP string | ||
|
|
||
| // DNSVIP is the virtual IP address for DNS | ||
| DNSVIP string |
A little strange, is this meant to be in this PR?
Yes. The DNSVIP was removed from the installer types in the version of the installer that I am vendoring.
| Completed bool `json:"completed,omitempty"` | ||
|
|
||
| // BlockedResources is a list of cloud resources that the deprovision has not been able to delete | ||
| BlockedResources []string `json:"blockedResources,omitempty"` |
How are things looking for the other cloud providers, will a flat list of strings be sufficient as far as we can see? Wondering if we should go with something that lets us store a little more data or not.
That's a good idea. I can change BlockedResources to a struct that for now has nothing but a name field.
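A rough sketch of what that change might look like, assuming a struct named `BlockedResource` with a single `name` field. The type name, the JSON tag on `Name`, and the sample ARN are assumptions for illustration; only `Completed` and the `blockedResources` tag come from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BlockedResource is a hypothetical replacement for the plain string entry.
// Starting with a struct leaves room to add fields later without changing
// the shape of the list.
type BlockedResource struct {
	// Name identifies the cloud resource, for example an AWS ARN.
	Name string `json:"name"`
}

// ClusterDeprovisionStatus mirrors only the two fields shown in the diff.
type ClusterDeprovisionStatus struct {
	Completed        bool              `json:"completed,omitempty"`
	BlockedResources []BlockedResource `json:"blockedResources,omitempty"`
}

// marshalStatus returns the JSON form of the status, or "" on error.
func marshalStatus(s ClusterDeprovisionStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	status := ClusterDeprovisionStatus{
		BlockedResources: []BlockedResource{
			// Placeholder ARN, not a real resource.
			{Name: "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"},
		},
	}
	fmt.Println(marshalStatus(status))
}
```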
Installer PR merged but we've got major conflicts in here now.
Only one of the two installer PRs has merged. The openshift/installer#3772 PR still has not.
503f38c to
abc3652
Compare
When there are AWS resources that cannot be destroyed, the list of blocked ARNs is added to the ClusterDeprovision status in the `blockedResources` field. The user can then take action on those blocked resources.

The AWS destroyer will back off destroy attempts. Each attempt is limited to 5 minutes. The backoff starts at 5 minutes, doubles after each failed attempt, and caps at 24 hours.

If the user wants to abandon a deprovision, the user can add the "hive.openshift.io/abandon-deprovision" annotation to the ClusterDeployment. When this annotation is present with a true value, the clusterdeployment controller will remove the deprovision finalizer from the ClusterDeployment without waiting for the ClusterDeprovision to complete.

https://issues.redhat.com/browse/CO-943
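To make the backoff arithmetic in the description concrete, here is a small sketch that computes the wait before each retry. It is plain arithmetic under the stated numbers (5-minute start, doubling, 24-hour cap), not the PR's actual code; the names `initialDelay`, `maxDelay`, `nextDelay`, and `schedule` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the PR description; the names are hypothetical.
const (
	initialDelay = 5 * time.Minute
	maxDelay     = 24 * time.Hour
)

// nextDelay doubles the delay after a failed attempt and clamps it to maxDelay.
func nextDelay(d time.Duration) time.Duration {
	d *= 2
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// schedule returns the waits used before the first n retries.
func schedule(n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := initialDelay
	for i := 0; i < n; i++ {
		delays = append(delays, d)
		d = nextDelay(d)
	}
	return delays
}

func main() {
	for i, d := range schedule(11) {
		fmt.Printf("retry %d: wait %v\n", i+1, d)
	}
}
```

Under these numbers the ninth wait is 21h20m (5 minutes times 256) and every wait from the tenth onward is clamped to 24 hours.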
abc3652 to
e3bef6c
Compare
| Steps: 1 << 8, // large enough to make cap the effective bound | ||
| Cap: 24 * time.Hour, | ||
| }, | ||
| func() (done bool, returnErr error) { |
nit: named returns are not great, as they allow an empty return in the function body, which makes it difficult to see what is being returned and when it was set.
| namespace, _, err := kubeconfig.Namespace() | ||
| if err != nil { | ||
| o.logger.WithError(err).Error("could not get the namespace") | ||
| return | ||
| } |
Does this use the context set up in the kubeconfig to get the namespace? How many users would use something like that? I personally haven't. So should we also allow setting the namespace directly?
| // Stop waiting for deprovision if the abandon-deprovision annotation is true | ||
| if value, ok := cd.Annotations[constants.AbandonDeprovisionAnnotation]; ok { | ||
| logger := cdLog.WithField(constants.AbandonDeprovisionAnnotation, value) | ||
| if abandon, err := strconv.ParseBool(value); abandon && err == nil { |
| if abandon, err := strconv.ParseBool(value); abandon && err == nil { | |
| if abandon, err := strconv.ParseBool(value); err == nil && abandon { |
wouldn't that be more appropriate?
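For reference, a self-contained sketch of the annotation check with the error tested first. `strconv.ParseBool` returns false alongside any error, so both orderings behave the same; checking `err` first is simply the more conventional reading. The helper name `shouldAbandon` is hypothetical, and the annotation key is copied from the PR description.

```go
package main

import (
	"fmt"
	"strconv"
)

// abandonAnnotation is the annotation key named in the PR description.
const abandonAnnotation = "hive.openshift.io/abandon-deprovision"

// shouldAbandon is a hypothetical helper: it reports whether the annotation
// is present and parses to true. Unparseable values count as "do not abandon".
func shouldAbandon(annotations map[string]string) bool {
	value, ok := annotations[abandonAnnotation]
	if !ok {
		return false
	}
	abandon, err := strconv.ParseBool(value)
	return err == nil && abandon
}

func main() {
	for _, v := range []string{"true", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, shouldAbandon(map[string]string{abandonAnnotation: v}))
	}
}
```

Note that `strconv.ParseBool` accepts only forms such as `1`, `t`, `true`, `0`, `f`, and `false` (with a few case variants), so a value like `yes` is rejected and the deprovision is not abandoned.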
| logger.Warn("adandoning deprovision") | ||
| err = r.removeClusterDeploymentFinalizer(cd, cdLog) | ||
| if err != nil { | ||
| cdLog.WithError(err).Log(controllerutils.LogLevel(err), "error removing finalizer") |
We will return deprovisioned = true even when there was an error removing the finalizer. Shouldn't we return false here?
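A sketch of the behavior being asked for, assuming the finalizer removal can be modeled as a function that returns an error. `abandonDeprovision`, `finalizerRemover`, and `errRemove` are hypothetical stand-ins, not the controller's real signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizerRemover stands in for the controller's finalizer removal call
// (hypothetical type, for illustration only).
type finalizerRemover func() error

// errRemove is a sample failure used in the demo below.
var errRemove = errors.New("conflict updating ClusterDeployment")

// abandonDeprovision reports deprovisioned = true only when the finalizer
// was actually removed; on failure it returns false and the wrapped error
// so that the caller can retry.
func abandonDeprovision(remove finalizerRemover) (bool, error) {
	if err := remove(); err != nil {
		return false, fmt.Errorf("error removing finalizer: %w", err)
	}
	return true, nil
}

func main() {
	done, err := abandonDeprovision(func() error { return errRemove })
	fmt.Println(done, err)
	done, err = abandonDeprovision(func() error { return nil })
	fmt.Println(done, err)
}
```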
/remove-lifecycle stale |
|
/close |
|
@dgoodwin: Closed this PR.