✨WIP: Refactor CreateInstance and CreateBastion #1153
Conversation
✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mdbooth. The full list of commands accepted by this bot can be found here. The pull request process is described here.
This looks very, very promising. Great work. Probably it would be good for me to review this a second time, just to understand the changes even better ;)
@seanschneeweiss Thanks for an excellent review! It's going to take me a while to go through it.
	return reconcile.Result{}, err
}
if !deleted {
	return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
Wouldn't it make sense here to use an exponential backoff (return reconcile.Result{}, nil) so as not to "DDoS" the API?
I've updated everything to use the controller's rate limiter. I've also customised the rate limiter to have a higher base delay. Without writing my own exponential backoff rate limiter (which I wasn't keen on doing) we're stuck with an exponent of 2, so I've set the base delay at 2 seconds. We will back off at intervals of 2s, 4s, 8s... up to a max of 1000s.
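For reference, a minimal sketch of such a rate limiter, assuming the standard client-go workqueue helpers (not necessarily the exact code in this PR):

```go
// Sketch only: a rate limiter like workqueue.DefaultControllerRateLimiter,
// but with a 2s base delay so per-item retries back off at 2s, 4s, 8s, ...
// capped at 1000s.
package example

import (
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

func customRateLimiter() workqueue.RateLimiter {
	return workqueue.NewMaxOfRateLimiter(
		// Per-item exponential backoff: base delay 2s, maximum 1000s.
		workqueue.NewItemExponentialFailureRateLimiter(2*time.Second, 1000*time.Second),
		// Overall limit of 10 requeues/s with a burst of 100, as in the default.
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
}
```

It could then be wired into each controller via controller.Options, which exposes a RateLimiter field in recent controller-runtime versions.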
if !openStackCluster.Status.Ready {
	openStackCluster.Status.Ready = true

	// If we're setting Ready, return early to update status and
That's a great move :-)
if instanceStatus == nil {
	handleUpdateMachineError(logger, openStackMachine, errors.New("OpenStack instance cannot be found"))
-	return ctrl.Result{}, nil
+	return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
Maybe, for exponential backoff:
-	return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
+	return ctrl.Result{}, nil
Great improvements :-)
@chrischdi On exponential backoff, yes absolutely! I'd added that as a placeholder for something cleverer and didn't change it. Must not commit this series without addressing that, so adding a /hold. In the meantime, if controller-runtime does exponential backoff it's probably good enough to just use that.
The default backoff behaviour of controller-runtime is set in its controller defaults and defined in client-go's workqueue package. To summarise, each individual object has a 5ms backoff growing with an exponent of 2 up to a maximum of 1000 seconds. There is also a global limit of 10 reconciles per second, so retries back off at 5ms, 10ms, 20ms, 40ms, and so on.

I don't think this is great for us in practice, as most backoffs will be resource creation waits; we would likely almost always do the first 6 retries before success. We should probably customise it, e.g. a 10s backoff with an exponent of 1.5 up to 1 hour. I'd also be inclined to retain the global 10 reconciles/s limit.

However, for now I'm happy to use the default. We can revisit this.
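For illustration only (not code from the PR), the two per-item schedules being compared can be printed like this:

```go
// Illustrative only: print the per-item backoff schedules discussed above.
// The default is 5ms doubling up to a 1000s cap; the suggested alternative
// is 10s growing by 1.5x up to an hour.
package main

import (
	"fmt"
	"time"
)

func schedule(base time.Duration, factor float64, max time.Duration, steps int) []time.Duration {
	out := make([]time.Duration, 0, steps)
	d := base
	for i := 0; i < steps; i++ {
		if d > max {
			d = max
		}
		out = append(out, d)
		d = time.Duration(float64(d) * factor)
	}
	return out
}

func main() {
	fmt.Println(schedule(5*time.Millisecond, 2, 1000*time.Second, 10))
	fmt.Println(schedule(10*time.Second, 1.5, time.Hour, 10))
}
```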
Force-pushed from a5a8964 to ee6ed57
seanschneeweiss left a comment
How would you feel about setting openStackMachine.Status.Ready = false in the following cases:
case infrav1.InstanceStateError:
default:
main.go (Outdated)
}

func setupReconcilers(ctx context.Context, mgr ctrl.Manager) {
	// Based on workqueue.DefaultControllerRateLimiter with a higher baseDelay
Maybe to discuss this again here:
We not only have failures due to OpenStack API calls, and I don't know if we really want to customise that.
When thinking about the "happy path" for CAPO I agree with you: it does not make sense.
But if we consider other controllers acting on the same resources as CAPO, the default is reasonable for retrying updates to the CR objects.
E.g. if the reconcile fails at the end (when CAPO wants to apply its patch) because someone updated the CR in the meantime and thereby bumped metadata.resourceVersion (perhaps only by adding a label), we would use the increased backoff instead of the default.
Also, in the use case of the PR / the discussion:
if !deleted {
	return reconcile.Result{Requeue: true}, nil
}
Would this even make use of the rate limiting, or would it result in an immediate retry (making the custom rate limiter obsolete for this use case) because it is not returning an error?
Maybe better suited to post that on the PR 😄
I did not find any other cloud provider using something other than the default for their controllers: https://cs.k8s.io/?q=RateLimiter&i=nope&files=&excludeFiles=vendor%2F.*&repos=kubernetes-sigs/cluster-api,kubernetes-sigs/cluster-api-operator,kubernetes-sigs/cluster-api-provider-aws,kubernetes-sigs/cluster-api-provider-azure,kubernetes-sigs/cluster-api-provider-digitalocean,kubernetes-sigs/cluster-api-provider-gcp,kubernetes-sigs/cluster-api-provider-ibmcloud,kubernetes-sigs/cluster-api-provider-kubemark,kubernetes-sigs/cluster-api-provider-kubevirt,kubernetes-sigs/cluster-api-provider-nested,kubernetes-sigs/cluster-api-provider-openstack,kubernetes-sigs/cluster-api-provider-packet,kubernetes-sigs/cluster-api-provider-vsphere
Note: I'm ok with the change (50:50), just wanted to raise some awareness about that :-)
Couple of things.
Firstly I agree that we should consider this carefully. I need to rebase this PR anyway, and when I do I'm going to remove the rate limiting. I'll propose it again as a separate PR and we can examine it in more detail there. I think this will work acceptably well without modifying the rate limiter, although with a bit more load on the OpenStack API than is necessary.
Secondly, the controller-runtime documentation on this feature sucks, so I resorted to RTFS 🙄 AFAICT the rate limiter is used in exactly 2 use cases:
- Reconcile returns error
- Reconcile returns Requeue: true
Specifically, AFAICT watches are not added to the queue with the rate limiting interface, which means they will be enqueued immediately. That is, if a controller touches a machine object, the machine controller will reconcile that machine object immediately without rate limiting. We could actually verify this manually by setting it to something ridiculously large and checking that Reconcile is called immediately, but let's do that on the new PR.
Also, if Reconcile returns RequeueAfter, that also doesn't use the rate limiter: it uses the value passed in.
So the rate limiter is only ever used when invoked by code in our controller returning one of the 2 values which uses it.
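A sketch of the three return patterns being discussed, with the queueing behaviour described above recorded as comments; the reconciler and helper are placeholders, not CAPO code:

```go
// Sketch of the three return patterns discussed above. The reconciler and
// helper are hypothetical; only the comments on each return matter here.
package example

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

type exampleReconciler struct{}

// doSomething stands in for real reconciliation work.
func doSomething() (done bool, err error) { return true, nil }

func (r *exampleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	done, err := doSomething()
	if err != nil {
		// 1. Returning an error: the request is re-queued via the rate limiter
		//    (per-item exponential backoff).
		return ctrl.Result{}, err
	}
	if !done {
		// 2. Requeue: true with no error: this also goes through the rate limiter.
		return ctrl.Result{Requeue: true}, nil
	}
	// 3. RequeueAfter: the rate limiter is bypassed and the request is re-queued
	//    after exactly this delay. Watch events are also enqueued without the
	//    rate limiter, as discussed above.
	return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
}
```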
Will this change mean that a cluster that (for whatever reason) can't reconcile can still reconcile its bastion?
I thought Ready was a one-way gate, i.e. we're not allowed to unset it once set? If that's not correct, though, I'm very interested in that. I actually have a draft doc which I need to clean up and push somewhere with my thoughts on the failed state. I do think we need a status marker for non-terminal failure. These would be examples of non-terminal failure. If we could use Ready that way it would be very interesting, but without knowing for sure I suspect it would break assumptions. I think this is a CAPI discussion.
No. It won't start reconciling the bastion until the cluster is up. If the cluster never comes up it will never create the bastion. There's no fundamental reason we couldn't do this, but it would take a fair amount of refactoring. This is essentially the same direction I'm trying to take the reconciliation of instances, i.e. independent resources can be reconciled concurrently.
I'm referring to another scenario though, where a cluster has already been created successfully (but configured without a bastion). Then one might decide to add a bastion (for easier troubleshooting). But if, at that point, for whatever reason, the cluster can't reconcile, would the bastion still reconcile successfully on its own (so it's added and we can troubleshoot the cluster)?
Sounds promising. 🙏
No, this isn't going to help in that case. That said, the bastion requires most of the cluster infrastructure to be up already, so I'm not sure how early we could usefully move it in the cluster reconciliation process anyway. What's the use case, btw? I'd have thought that the bastion was most useful for debugging machines rather than the cluster? If we're failing to create networks/routers/security groups/load balancers that's surely all going to be debugged via object status/events/OpenStack API. Does the bastion help with any of that?
I tested this on my env and it seems to work just fine.
openStackCluster.Status.BastionSecurityGroup = nil

-	return nil
+	return true, nil
Looks like deleted is as simple as if err == nil? I guess you just want to be explicit?
Currently yes, but for the same reason as the reconcile instance change I want to be able to return an incomplete result, i.e. there was no error, but the delete is not finished yet.
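A minimal sketch of the pattern being described, using hypothetical helper names rather than the PR's actual functions:

```go
// Minimal sketch: the delete helper reports "fully deleted?" separately from
// errors, and the controller requeues when the delete has not finished yet.
package example

import (
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// deleteBastion returns true once the bastion is completely gone.
func deleteBastion() (bool, error) {
	// ... issue the delete and/or check whether the server still exists ...
	return true, nil
}

func reconcileDelete() (reconcile.Result, error) {
	deleted, err := deleteBastion()
	if err != nil {
		return reconcile.Result{}, err
	}
	if !deleted {
		// No error, but the delete has not completed yet: try again later.
		return reconcile.Result{Requeue: true}, nil
	}
	return reconcile.Result{}, nil
}
```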
Common network and security group handling between CreateBastion() and CreateInstance(). A principal advantage of this refactor is that it makes the marshalling of OpenStackMachineSpec and Instance respectively into an InstanceSpec a cheap operation which makes no API calls.
Allow CreateBastion and CreateInstance to be called as reconcilers. Specifically, they become idempotent and can additionally return a 'not yet complete' status by returning a nil InstanceStatus. This new status is handled in the controller by rescheduling reconciliation. To reflect this change we rename the methods to ReconcileBastion and ReconcileInstance.

In making this change we also make some opportunistic changes to reconciliation of the Bastion:
* Bastion reconciliation errors no longer put the cluster in a failed state
* We mark the cluster Ready before creating the Bastion
* We handle the case where the Bastion floating IP is already associated

Apart from the Bastion changes, this is almost entirely code motion, as can be seen in the unit tests. While we permit the Reconcile methods to return an incomplete state, nothing yet returns it. The only change in the unit tests is due to moving the GetInstanceStatusByName check, which is common to the bastion and machines, into reconcileInstance.
Refactor instance creation in machine controller and cluster controller (for the bastion) to call compute.ReconcileInstance() with an InstanceSpec.
@mdbooth: PR needs rebase.
@mdbooth I had a look at CAPZ, CAPA, and CAPIBM to do so too. From the CRD and cluster-api docs I can't really tell whether to treat this as a one-way gate or not. Definitely a question for CAPI. Personally I'd like to use it for non-terminal failure.
I'll create a new PR when I'm working on this again.
What this PR does / why we need it:
I have moved the following to #1191
The primary purpose of this PR is to clean up the interface of compute.CreateInstance and make the separation of concerns between the machine controller (for Machines), the cluster controller (for the Bastion), and compute (for actual server creation) better defined.

The Bastion host is represented in the cluster spec as an Instance. An OpenStackMachine is represented by context in OpenStackMachine as well as OpenStackCluster. So while they both create a server in the same way, they source the server's parameters in slightly different ways using different source objects.

At some point in history we also used Instance as the intermediate representation for an OpenStackMachine. That is, we combined parameters from an OpenStackMachine and an OpenStackCluster into an Instance, then passed that to CreateInstance. Instance is not ideal for this purpose as it is both Spec and Status: it contains fields which cannot be used as input parameters to CreateInstance. Therefore we refactored this into the internal-only InstanceSpec, which contains only spec fields.

This refactor takes this further. Firstly, we ensure that the code previously contained directly in CreateBastion and CreateInstance is strictly data transformation and therefore very cheap; anything expensive moves into the new ReconcileInstance. Secondly, we move this code to the controller where it is relevant, and add new unit tests covering the transformation. This also reduces the scope of the former CreateInstance unit tests, making them simpler. A sketch of the resulting shape follows.
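Self-contained sketch of that shape, using hypothetical local types rather than the provider's real InstanceSpec/InstanceStatus:

```go
// Sketch only: building the spec is pure data transformation in the
// controller, and only the reconcile step would talk to the OpenStack API.
package example

// instanceSpec holds only the input parameters for creating a server.
type instanceSpec struct {
	Name           string
	Flavor         string
	SecurityGroups []string
}

// instanceStatus holds only observed state.
type instanceStatus struct {
	ID    string
	State string
}

// machineToInstanceSpec runs in the controller and makes no API calls.
func machineToInstanceSpec(name, flavor string, groups []string) *instanceSpec {
	return &instanceSpec{Name: name, Flavor: flavor, SecurityGroups: groups}
}

// reconcileInstance is idempotent: it creates the server if needed and would
// return (nil, nil) while creation is still in progress, so the caller requeues.
func reconcileInstance(spec *instanceSpec) (*instanceStatus, error) {
	// ... look up an existing server by spec.Name, create it if missing ...
	return &instanceStatus{ID: "example-id", State: "ACTIVE"}, nil
}
```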
The remainder in this PR which will need to be reworked is:
In refactoring bastion creation, we also take the opportunity to make all bastion errors non-fatal to the cluster. We also ensure we update the cluster status to Ready before creating the Bastion, which means that control plane creation can run in parallel with bastion creation.
Special notes for your reviewer:
I don't intend to squash these commits. They are intended to be independent logical steps. You may find it easier to review this PR by commit rather than as a single change.
/hold