🐛 Handle errors returned by GetInstanceStatusByName in machine controller#2086
Until now these errors were ignored, which in rare circumstances could lead the controller to attempt to recreate an instance that already exists.
|
mdbooth
left a comment
This is an excellent robustification, thanks!
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mdbooth |
|
/ok-to-test |
|
/cherry-pick release-0.10 |
|
@mdbooth: once the present PR merges, I will cherry-pick it on top of release-0.10 in a new PR and assign it to you. |
|
/lgtm Awesome fix 🙏 |
|
@mdbooth: new pull request created: #2087 |
        openStackMachine.SetFailure(capierrors.UpdateMachineError, errors.New("virtual machine no longer exists"))
        return nil, nil
    }
    instanceStatus, err = computeService.GetInstanceStatusByName(openStackMachine, openStackMachine.Name)
One thing I thought you would fix too is:
I don't like that we don't return an error when more than one server with the same name is found. This is error-prone and can lead to cluster issues.
Good point, and we have an existing pattern for it.
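To illustrate the duplicate-name concern raised above, here is a minimal sketch of a lookup that refuses to pick one of several servers sharing a name. It is not the project's existing pattern, which this thread does not show; the `Server` type and the `singleServerByName` helper are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Server is a hypothetical, simplified stand-in for a compute server record.
type Server struct {
	ID   string
	Name string
}

// singleServerByName (hypothetical helper) returns the only server with the
// given name, (nil, nil) when there is none, and an error when the name is
// ambiguous, instead of silently picking one of the matches.
func singleServerByName(servers []Server, name string) (*Server, error) {
	var matches []Server
	for _, s := range servers {
		if s.Name == name {
			matches = append(matches, s)
		}
	}
	switch len(matches) {
	case 0:
		return nil, nil
	case 1:
		return &matches[0], nil
	default:
		return nil, fmt.Errorf("found %d servers named %q, expected at most one", len(matches), name)
	}
}

func main() {
	servers := []Server{
		{ID: "a", Name: "worker-0"},
		{ID: "b", Name: "worker-0"},
		{ID: "c", Name: "worker-1"},
	}
	_, err := singleServerByName(servers, "worker-0")
	fmt.Println("duplicate lookup error:", err)
}
```

Returning an error here lets the caller retry or surface the problem rather than acting on an arbitrary match.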
What this PR does / why we need it:
Until now, errors returned by GetInstanceStatusByName were ignored, which could lead the controller to attempt to recreate a server under specific circumstances (for example, if the Nova API always returns 404 errors; see the detailed explanation in the related issue).
These errors must not be ignored; otherwise we may conclude that the server does not exist when in fact the Nova API is misbehaving.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #2085
TODOs: