
Conversation

@eparis (Member) commented Feb 18, 2019

Today we find instance profiles that we need to delete through either the
associated role or the associated instance. We find those objects via AWS
tags. If those objects have been deleted, we are unable to find the instance
profiles, because instance profiles are not tagged.

Since we now embed both the cluster name and the cluster ID inside the profile name, this PR adds the ability to find those profiles and to delete them even if the roles and instances have been deleted.
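A minimal sketch of that discovery-by-name approach, assuming the AWS SDK for Go v1; the helper name findInstanceProfilesByPrefix and the example prefix are hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/iam"
)

// findInstanceProfilesByPrefix lists every instance profile in the
// account and keeps the ones whose name starts with the given prefix.
// Instance profiles cannot be tagged, so once the tagged role and
// instance are gone, matching on the embedded name is the only way
// left to find them.
func findInstanceProfilesByPrefix(client *iam.IAM, prefix string) ([]*iam.InstanceProfile, error) {
	var matches []*iam.InstanceProfile
	err := client.ListInstanceProfilesPages(&iam.ListInstanceProfilesInput{},
		func(page *iam.ListInstanceProfilesOutput, lastPage bool) bool {
			for _, profile := range page.InstanceProfiles {
				if strings.HasPrefix(*profile.InstanceProfileName, prefix) {
					matches = append(matches, profile)
				}
			}
			return true // keep paging
		})
	return matches, err
}

func main() {
	sess := session.Must(session.NewSession())
	// "mycluster-1a2b3c" is a hypothetical cluster-ID prefix.
	profiles, err := findInstanceProfilesByPrefix(iam.New(sess), "mycluster-1a2b3c")
	if err != nil {
		panic(err)
	}
	for _, profile := range profiles {
		fmt.Println(*profile.InstanceProfileName)
	}
}
```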

@openshift-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Feb 18, 2019
@eparis (Member, Author) commented Feb 18, 2019

@wking get ready to throw up when you see what I did to your lovely destroy code :)

@wking (Member) commented Feb 18, 2019

With cluster IDs in the profile names, new clusters won't be bothered by any previous instance profiles leaked by buggy reapers (the #1174 issue). So I'm in favor of adding the cluster ID here (and maybe removing the cluster name if we are concerned about name-length limits?), but I don't think we need to change the destroy code.

Inline review comment (Contributor):
We use the term clusterName instead of clusterID for this elsewhere in the installer code.

Inline review comment (Contributor):
My preference would be to have a deleteIAMInstanceProfileByName function that is passed the name of the profile rather than making up an ARN with a fake account ID that the caller is trusting won't actually be used.
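A sketch of the suggested helper (hypothetical, not the PR's actual implementation; assumes the AWS SDK for Go v1 iam client imported as in the earlier sketch). Roles have to be detached before AWS will delete a profile:

```go
// deleteIAMInstanceProfileByName deletes a profile by name, so callers
// never need to fabricate an ARN with a fake account ID. Roles are
// detached first because AWS refuses to delete an instance profile
// that still has roles attached.
func deleteIAMInstanceProfileByName(client *iam.IAM, name string) error {
	out, err := client.GetInstanceProfile(&iam.GetInstanceProfileInput{
		InstanceProfileName: &name,
	})
	if err != nil {
		return err
	}
	for _, role := range out.InstanceProfile.Roles {
		if _, err := client.RemoveRoleFromInstanceProfile(&iam.RemoveRoleFromInstanceProfileInput{
			InstanceProfileName: &name,
			RoleName:            role.RoleName,
		}); err != nil {
			return err
		}
	}
	_, err = client.DeleteInstanceProfile(&iam.DeleteInstanceProfileInput{
		InstanceProfileName: &name,
	})
	return err
}
```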

@eparis force-pushed the instance-profile-with-id branch 2 times, most recently from dad35dd to fc40883, on February 19, 2019 16:49
@eparis (Member, Author) commented Feb 19, 2019

Updated to address review comments.

I've also changed my mind; I don't think this is a 'hack'. It is cleaning up what we created. We know we created it, so we should clean it up.

@wking (Member) commented Feb 19, 2019

I've also changed my mind; I don't think this is a 'hack'. It is cleaning up what we created. We know we created it, so we should clean it up.

I think once another reaper leans in and deletes our instances and IAM roles, it's assuming responsibility for reaping instance profiles too. I don't think we need to support step 3 in:

  1. User neglects to call destroy cluster
  2. Reaper deletes most resources.
  3. User reassembles metadata.json and calls destroy cluster.

Are users who skip step 1 likely to bother with step 3, now that creating a new cluster with the same name no longer causes resource conflicts (because of your appended cluster ID)?

@eparis (Member, Author) commented Feb 19, 2019

I'm not sure why step 3 is 'user reassembles'. All the user needs to do in step 3 is call destroy cluster for this PR to be helpful.

While I agree that if something else starts to clean up a cluster it should probably clean up the entire cluster, I say that if the installer creates something, the installer should be able to destroy it. A customer could (less likely, admittedly) call destroy cluster and have an AWS API malfunction at just the wrong time, leaving them in the same state. Running destroy cluster a second time can, and I say should, continue to finish the cleanup.

@wking (Member) commented Feb 19, 2019

A customer could (less likely, admittedly) call destroy cluster and have an AWS API malfunction at just the wrong time, leaving them in the same state.

We currently order removal so that roles and instances are not removed before the instance profile. So "network hiccup" is not sufficient; you'd need "AWS API returns no associated instance profiles when that instance profile actually did exist" to get into trouble here.

@eparis (Member, Author) commented Feb 20, 2019

I just need to remove the role (instance profile removal will fail), remove the instance, hiccup, stuck. No?

@wking (Member) commented Feb 20, 2019

I just need to remove the role (instance profile removal will fail), remove the instance, hiccup, stuck. No?

We only attempt to remove roles after successfully removing associated instance profiles. Same for instances. So how do you get the role and instance removed first, except via a buggy external reaper?
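For reference, the ordering described here amounts to roughly the following sketch; deleteProfileThenRole is a made-up name, reusing the hypothetical deleteIAMInstanceProfileByName and iam client from the earlier sketches:

```go
// Roughly the ordering wking describes: the instance profile goes
// first, and the role is only deleted once that succeeds. A transient
// failure therefore leaves the role in place, so the profile stays
// discoverable through the tagged role on the next destroy run.
func deleteProfileThenRole(client *iam.IAM, profileName, roleName string) error {
	if err := deleteIAMInstanceProfileByName(client, profileName); err != nil {
		return err // stop here: the role survives, keeping the profile findable later
	}
	_, err := client.DeleteRole(&iam.DeleteRoleInput{RoleName: &roleName})
	return err
}
```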

@eparis (Member, Author) commented Feb 20, 2019

/retest

@eparis changed the title from "pkg/destroy: data/aws: horrible hack to make instance profiles discoverable" to "pkg/destroy: data/aws: make instance profiles discoverable and delete them even if they are detached" on Feb 20, 2019
@eparis (Member, Author) commented Feb 20, 2019

/retest

@eparis (Member, Author) commented Feb 21, 2019

/retest

@eparis (Member, Author) commented Feb 21, 2019

/retest

@eparis force-pushed the instance-profile-with-id branch from ec88a6a to d9f8f73 on February 23, 2019 01:19
@eparis changed the title from "pkg/destroy: data/aws: make instance profiles discoverable and delete them even if they are detached" to "pkg/destroy: data/aws: delete instance profiles even if they are detached" on Feb 25, 2019
Inline review comment (Contributor):

This should use metadata.InfraID now.

@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Mar 10, 2019
@eparis force-pushed the instance-profile-with-id branch from d9f8f73 to 64b7938 on March 12, 2019 16:47
@openshift-ci-robot removed the needs-rebase label on Mar 12, 2019
Commit message:

Today we find instance profiles that we need to delete through either the
associated role or the associated instance. We find those objects via AWS
tags. If those objects have been deleted, we are unable to find the instance
profiles, because instance profiles are not tagged.

Since we embed the cluster ID inside the name, we can find the instance
profiles we created by name and destroy them that way.
@eparis force-pushed the instance-profile-with-id branch from 64b7938 to 6ae7598 on March 12, 2019 16:50
@abhinavdahiya (Contributor) commented:

ping @wking, can you take another look at the PR?

Diff context:

```go
Filters:   filters,
Region:    metadata.ClusterPlatformMetadata.AWS.Region,
Logger:    logger,
ClusterID: metadata.InfraID,
```
Inline review comment (Member):
This will cause a trivial conflict with #1365, which is a higher-priority bugfix.

Diff context:

```go
	}
	return err
}
logger.WithField("name", name).Info("Deleted")
```
Inline review comment (Member):
nit: I'd rather have the caller set this context, like we do here, etc. The caller can choose which of the information available to it should be logged. Functions can add additional logging context as new information becomes available, but they shouldn't write their arguments directly into the logging context.
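A minimal sketch of this convention, assuming logrus (which the installer uses); deleteByName and the profile name are hypothetical:

```go
package main

import "github.com/sirupsen/logrus"

// deleteByName performs the deletion without logging its arguments;
// the caller already knows them and owns that logging context.
func deleteByName(name string) error {
	// ... actual deletion elided in this sketch ...
	return nil
}

func main() {
	logger := logrus.New()
	name := "mycluster-1a2b3c-worker-profile" // hypothetical profile name

	// The caller, not deleteByName, chooses to attach "name" to the entry.
	entry := logger.WithField("name", name)
	if err := deleteByName(name); err != nil {
		entry.WithError(err).Error("delete failed")
		return
	}
	entry.Info("Deleted")
}
```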

@wking (Member) commented Mar 12, 2019

I left two minor nits inline. Otherwise, this looks fine to me, and I'm ok with it landing (with or without the nits getting addressed), although I still don't feel like we need it ;).

@abhinavdahiya (Contributor) commented:

I left two minor nits inline. Otherwise, this looks fine to me, and I'm ok with it landing (with or without the nits getting addressed), although I still don't feel like we need it ;).

The nits don't look like blockers, and I think the destroyer should try its best to clean up the cluster's resources from AWS, so I think this is a useful change.

/lgtm

PS: It's sad that instance profiles cannot be tagged :(

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Mar 12, 2019
@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, eparis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 12, 2019
@openshift-bot (Contributor) commented:

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 507b62e into openshift:master on Mar 13, 2019
@eparis deleted the instance-profile-with-id branch on March 15, 2019 01:38
@wking added a commit to wking/openshift-installer that referenced this pull request on Oct 6, 2019:
This reverts commit 6ae7598, openshift#1268.

That was a workaround to recover from openshift-dev clusters where an
in-house pruner is removing instances but not their associated
instance profiles.  Folks using the installer's destroy code won't
need it, and while the risk of accidental name collision is low, I
don't think it's worth taking that risk.  With this commit, folks
using external reapers are responsible for ensuring that they reap
instance profiles when they reap instances, and we get deletion logic
that is easier to explain to folks mixing multiple clusters in the
same account.