-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Bug 1817201: Fix intermittent deprovision loop on NoSuchHostedZone error #3359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1817201: Fix intermittent deprovision loop on NoSuchHostedZone error #3359
Conversation
|
@dgoodwin: This pull request references Bugzilla bug 1817201, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
This is showing up quite frequently in Hive deployments so we would like to get a fix out. I cannot reproduce the problem so this is only an attempt to fix the issue. We pull from master so no backports are needed on our behalf. |
|
/cc @staebler |
staebler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
pkg/destroy/aws/aws.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return nil? I would think that if there is no sharedZone then the resources that are deleted after this would also not exist, but here:
we proceed even if no sharedZone exists...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The not-found error here is for the case where the private zone does not exist. If the private zone exists, but there is no public zone, then getSharedHostedZone will return an empty string and a nil error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying.
As mentioned, this seems like an AWS problem so it is beyond our control to fix.
/lgtm
/approve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really unsure, sharedZoneID is used again on line 1761 so that's going to probably fail, line 1770 maybe? I don't know this code or route53 very well, it feels like it's game over if we can't get the shared zone ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops was looking at an old load of the tab. :)
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: patrickdillon, staebler The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
6 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
pkg/destroy/aws/aws.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you drop this warning, this will be repeated again and again on user stdout.
none of the other notfound handling code adds logging,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
…ror. It appears in some situations, AWS may return a route53 hosted zone when querying for tagged resources, despite the fact that the zone no longer exists. Update deprovision code to catch this error and skip route53 cleanup when present.
5df5607 to
81bf83d
Compare
|
Updated. |
|
I reproduced the problem this morning and was able to test this fix successfully through Hive. @abhinavdahiya can we get this hold lifted? Interestingly it appears to be related to an initial failure to install. Sample log from the first failure ended with: |
|
/hold cancel /lgtm |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
5 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@dgoodwin: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@dgoodwin: All pull requests linked via external trackers have merged: openshift/installer#3359. Bugzilla bug 1817201 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…ror.
It appears in some situations, AWS may return a route53 hosted zone when
querying for tagged resources, despite the fact that the zone no longer
exists.
Update deprovision code to catch this error and skip route53 cleanup
when present.