@sadasu sadasu commented Dec 5, 2022

In the AWS cluster destroy/uninstall path, the Installer tries to find all the IAM Roles in the cluster and then attempts to delete the resources with the tag kubernetes.io/cluster/<cluster-name>. In the case of STS clusters, all IAM Roles are cleared outside the cluster (not by the Installer) and even trying to find them in the cluster results in errors because the Installer does not have the privileges to do that, let alone deleting them.

The goal of the fix is to avoid attempting to delete IAM Roles that the Installer does not have access to.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Dec 5, 2022
@openshift-ci-robot

@sadasu: This pull request references Jira Issue OCPBUGS-1769, which is invalid:

  • expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

In the AWS cluster destroy/uninstall path, the Installer tries to find all the IAM Roles in the cluster and then attempts to delete the resources with the tag kubernetes.io/cluster/<cluster-name>. In the case of STS clusters, all IAM Roles are cleared outside the cluster (not by the Installer) and even trying to find them in the cluster results in errors because the Installer does not have the privileges to do that, let alone deleting them.

The goal of the fix is to not attempt to delete the IAM Roles created in the STS environment, and also to not try to find them in the first place. We achieve this by detecting whether it is an STS cluster, identified by CredentialsMode == Manual. When that is the case, we skip finding and deleting the IAM Roles in that cluster.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu

sadasu commented Dec 5, 2022

/jira refresh

@openshift-ci-robot

@sadasu: This pull request references Jira Issue OCPBUGS-1769, which is invalid:

  • expected the bug to target the "4.13.0" version, but it targets "4.13" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


@openshift-ci openshift-ci bot requested review from mtulio and r4f4 December 5, 2022 23:32
@sadasu

sadasu commented Dec 5, 2022

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 5, 2022
@openshift-ci-robot

@sadasu: This pull request references Jira Issue OCPBUGS-1769, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (yuwan@redhat.com), skipping review request.


CredentialsMode can be set to Manual without implying STS use: https://docs.openshift.com/container-platform/4.11/installing/installing_aws/manually-creating-iam.html#manually-creating-iam-aws. Do we also want to skip deleting IAM roles in that case?

Instead of skipping the finding of IAM roles entirely, we could also check the error for AccessDenied and ignore it in that case, either here or inside findIAMRoles since it's the GetRoleWithContext call that is failing.

@patrickdillon patrickdillon Dec 7, 2022

Also, the installer can still create IAM roles for the control plane nodes when it is in manual mode: https://github.com/openshift/installer/blob/master/data/data/aws/cluster/master/main.tf#L18

Manual mode specifically disables IAM role creation by the cloud credential operator. So if we gate based on manual mode we will be leaking the IAM roles created by the installer.

@openshift-ci-robot

@sadasu: This pull request references Jira Issue OCPBUGS-1769, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (yuwan@redhat.com), skipping review request.


@openshift-ci-robot

@sadasu: This pull request references Jira Issue OCPBUGS-1769, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (yuwan@redhat.com), skipping review request.

Details

In response to this:

In the AWS cluster destroy/uninstall path, the Installer tries to find all the IAM Roles in the cluster and then attempts to delete the resources with the tag kubernetes.io/cluster/<cluster-name>. In the case of STS clusters, all IAM Roles are cleared outside the cluster (not by the Installer) and even trying to find them in the cluster results in errors because the Installer does not have the privileges to do that, let alone deleting them.

The goal of the fix is to avoid attempting to delete IAM Roles that the Installer does not have access to. The current fix modifies a common method used by both findIAMRoles() and findIAMUsers().


Comment on lines 44 to 50
It looks a bit weird to import a quota package to check if an error is AccessDenied. Either IsUnauthorized should be moved out of the quota package so it can be shared with multiple packages, or we should just have our own isUnauthorized function in this file. Actually, I think this whole block would look better with a switch

Suggested change
-	if quotaaws.IsUnauthorized(err) {
-		// Installer does not have access to this IAM role
-		// Ignore this IAM Role and donot report this error via
-		// lastError
-		search.unmatched[*role.Arn] = exists
-		continue
-	}
+	var awsErr awserr.Error
+	if errors.As(err, &awsErr) {
+		switch awsErr.Code() {
+		case "AccessDeniedException":
+			// Installer does not have access to this IAM role
+			// Ignore this IAM Role and do not report this error via
+			// lastError
+			search.logger.Debugf("AccessDenied to role %s. Expected if this is an STS install", *role.Arn)
+			fallthrough
+		case iam.ErrCodeNoSuchEntityException:
+			search.unmatched[*role.Arn] = exists
+			continue
+		default:
+		}
+	}
+	if lastError != nil {
+		search.logger.Debug(lastError)
+	}
+	lastError = errors.Wrapf(err, "get tags for %s", *role.Arn)
+}

Contributor Author

It would be nice to have one helper function that identifies unauthorized access from any AWS API's return code; that was the reason for using the pre-existing method (unfortunately it lives in quota/aws).
Yes, this lends itself to a switch statement.
I was concerned that logging the AccessDenied case would make the logs noisy, so I stayed away from it. Happy to add it if it provides value to the customer (in non-STS cases, maybe?).

@sadasu

sadasu commented Dec 23, 2022

/retest-required

sadasu commented Jan 4, 2023

/retest-required

@patrickdillon

/approve

Overall approach looks good. I agree that we could simplify the error checking a bit: we can just string match on the unauthorized error without bringing in another package.

@openshift-ci

openshift-ci bot commented Jan 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 10, 2023
@sadasu sadasu force-pushed the aws-sts-uninstall branch from 6d903af to bf8531a Compare January 10, 2023 17:35
@sadasu sadasu force-pushed the aws-sts-uninstall branch from bf8531a to 54a7f13 Compare January 10, 2023 22:14
@r4f4 r4f4 left a comment

Just one nitpicking comment but not worth blocking the merge on it.
/lgtm

)

const (
ErrCodeAccessDeniedException = "AccessDeniedException"
Nit: probably doesn't need to be exported outside the package

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2023
@openshift-merge-robot openshift-merge-robot merged commit 14878f7 into openshift:master Jan 11, 2023
@openshift-ci-robot

@sadasu: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-1769 has been moved to the MODIFIED state.


@openshift-ci

openshift-ci bot commented Jan 11, 2023

@sadasu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-ibmcloud-ovn | d958698c6d7f4b2f9521a22af1443f18b76a5a50 | link | false | /test e2e-ibmcloud-ovn |
| ci/prow/okd-scos-e2e-aws-upgrade | d958698c6d7f4b2f9521a22af1443f18b76a5a50 | link | false | /test okd-scos-e2e-aws-upgrade |
| ci/prow/e2e-libvirt | d958698c6d7f4b2f9521a22af1443f18b76a5a50 | link | false | /test e2e-libvirt |
| ci/prow/okd-e2e-aws-ovn-upgrade | d958698c6d7f4b2f9521a22af1443f18b76a5a50 | link | false | /test okd-e2e-aws-ovn-upgrade |
| ci/prow/e2e-vsphere-ovn | 6d903af68a314db8477631144cebd002653a0493 | link | true | /test e2e-vsphere-ovn |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 6d903af68a314db8477631144cebd002653a0493 | link | true | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-openstack-ovn | 6d903af68a314db8477631144cebd002653a0493 | link | true | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-disruptive | 54a7f13 | link | false | /test e2e-aws-ovn-disruptive |
| ci/prow/e2e-aws-ovn-proxy | 54a7f13 | link | false | /test e2e-aws-ovn-proxy |


@patrickdillon

/cherry-pick release-4.12

@openshift-cherrypick-robot

@patrickdillon: new pull request created: #6847

