
Conversation

@mdbooth
Contributor

@mdbooth mdbooth commented Feb 24, 2022

What this PR does / why we need it:

I have moved the following to #1191

The primary purpose of this PR is to clean up the interface of compute.CreateInstance and to better define the separation of concerns between the machine controller (for Machines), the cluster controller (for the Bastion), and compute (for actual server creation).

The Bastion host is represented in the cluster spec as an Instance. An OpenStackMachine is defined by context from both the OpenStackMachine and the OpenStackCluster. So while both create a server in the same way, they source the server's parameters in slightly different ways from different source objects.

At some point in history we also used Instance as the intermediate representation for an OpenStackMachine. That is, we combined parameters from an OpenStackMachine and an OpenStackCluster into an Instance, then passed that to CreateInstance. Instance is not ideal for this purpose as it is both Spec and Status. It contains fields which cannot be used as input parameters to CreateInstance. Therefore we refactored this into the internal-only InstanceSpec, which contains only spec fields.

This refactor takes this further. Firstly we ensure that the code previously contained directly in CreateBastion and CreateInstance is strictly data transformation and therefore very cheap. Anything expensive moves into the new ReconcileInstance. Secondly we move this code to the controller where it is relevant, and add new unit tests covering the transformation. This also reduces the scope of the former CreateInstance unit tests, making them simpler.
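For readers less familiar with the code, here is a rough, purely hypothetical sketch of the split being described. The type, field, and function names below are illustrative only and are not the actual CAPO code:

package compute

// InstanceSpec holds only the inputs needed to create a server. Building it
// from an OpenStackMachine (machine controller) or from the bastion definition
// (cluster controller) is pure data transformation and makes no API calls.
type InstanceSpec struct {
	Name           string
	Image          string
	Flavor         string
	SSHKeyName     string
	SecurityGroups []string
	Metadata       map[string]string
}

// InstanceStatus is what the reconciler reports back about an existing server.
type InstanceStatus struct {
	ID    string
	State string
}

// Service stands in for the compute service that owns the OpenStack clients.
type Service struct{}

// ReconcileInstance does the expensive, idempotent work against the OpenStack
// API. Returning (nil, nil) would signal "not yet complete": the controller
// should requeue and try again later.
func (s *Service) ReconcileInstance(spec *InstanceSpec) (*InstanceStatus, error) {
	// Look up the server by name and create it if it does not exist yet.
	// Omitted here; the real implementation calls the OpenStack APIs.
	return nil, nil
}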

The remainder of this PR, which will need to be reworked, is:

In refactoring bastion creation, we also take the opportunity to make all bastion errors non-fatal to the cluster. We also ensure we update the cluster status to Ready before creating the Bastion, which means that control plane creation can run in parallel with bastion creation.

Special notes for your reviewer:
I don't intend to squash these commits. They are intended to be independent logical steps. You may find it easier to review this PR by commit rather than as a single change.

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 24, 2022
@netlify

netlify bot commented Feb 24, 2022

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Latest commit: 34cbf11
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/624450c36c71500009be6186
Deploy Preview: https://deploy-preview-1153--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 24, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 24, 2022
Contributor

@seanschneeweiss seanschneeweiss left a comment


This looks very, very promising. Great work. It would probably be good for me to review this a second time, just to understand the changes even better ;)

@mdbooth
Contributor Author

mdbooth commented Mar 3, 2022

@seanschneeweiss Thanks for an excellent review! It's going to take me a while to go through it.

return reconcile.Result{}, err
}
if !deleted {
return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
Member


Wouldn't it make sense here to use an exponential backoff (return reconcile.Result{}, nil) so as not to "DDoS" the API?

Contributor Author


I've updated everything to use the controller's rate limiter. I've also customised the rate limiter to have a higher base delay. Without writing my own exponential backoff rate limiter (which I wasn't keen on doing) we're stuck with an exponent of 2, so I've set the base delay at 2 seconds. We will back off at intervals of 2s, 4s, 8s... up to a max of 1000s.
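A minimal sketch of that kind of customisation, assuming it is wired into the controller builder via controller.Options{RateLimiter: ...}; this is illustrative only, not the exact code in the PR:

package controllers

import (
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

// customRateLimiter mirrors workqueue.DefaultControllerRateLimiter but with a
// higher per-item base delay: 2s, 4s, 8s, ... capped at 1000s, plus an overall
// bucket limit of 10 requeues per second.
func customRateLimiter() workqueue.RateLimiter {
	return workqueue.NewMaxOfRateLimiter(
		workqueue.NewItemExponentialFailureRateLimiter(2*time.Second, 1000*time.Second),
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
}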

if !openStackCluster.Status.Ready {
openStackCluster.Status.Ready = true

// If we're setting Ready, return early to update status and
Member


That's a great move :-)

if instanceStatus == nil {
handleUpdateMachineError(logger, openStackMachine, errors.New("OpenStack instance cannot be found"))
return ctrl.Result{}, nil
return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
Member


Maybe, for exponential backoff:

Suggested change
return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
return ctrl.Result{}, nil

@chrischdi
Member

Great improvements :-)

@mdbooth
Contributor Author

mdbooth commented Mar 3, 2022

@chrischdi On exponential backoff, yes, absolutely! I'd added that as a placeholder for something cleverer and didn't change it. I must not commit this series without addressing that, so I'm adding a

/hold

In the meantime, if controller-runtime does exponential backoff it's probably good enough to just use that.

@mdbooth
Contributor Author

mdbooth commented Mar 3, 2022

The default backoff behaviour of controller-runtime is set here:

https://github.com/kubernetes-sigs/controller-runtime/blob/eb292e5d9bd6c59663fc6777ae2b99f901f386e4/pkg/controller/controller.go#L120-L122

and defined here:

https://github.com/kubernetes/client-go/blob/eb103e0abf6218751302bf984d01ac45aae2ac6b/util/workqueue/default_rate_limiters.go#L39-L45

To summarise, each individual object has a 5ms backoff growing with an exponent of 2 up to a maximum of 1000 seconds. There is also a global limit of 10 reconciles per second. So retries will happen with a backoff of:

0.005s
0.025s
0.125s
0.625s
3.125s
15.625s
78.125s
390.625s
1000s
...

I don't think this is great for us in practice, as most backoffs will be resource creation waits. We would likely almost always do the first 6 retries before success. We should probably customise it. E.g. a 10s backoff with an exponent of 1.5 up to 1 hour would look like:

10s
15s
22.5s
33.75s
50.625s
75.9375s
113.90625s
170.859375s
256.2890625s
384.4335938s
576.6503906s
864.9755859s
1297.463379s
1946.195068s
2919.292603s
3600s

I'd also be inclined to retain the global 10 reconciles/s limit.

However, for now I'm happy to use the default. We can revisit this.
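Since client-go's ItemExponentialFailureRateLimiter is fixed at an exponent of 2, the gentler 10s / factor-1.5 / 1-hour schedule listed above would need a small custom workqueue.RateLimiter. A hypothetical sketch, not part of this PR (the type name and defaults are illustrative):

package controllers

import (
	"math"
	"sync"
	"time"

	"k8s.io/client-go/util/workqueue"
)

// gentleBackoffLimiter sketches the "10s base, factor 1.5, capped at 1 hour"
// idea above. It is illustrative only; the PR itself keeps the defaults.
type gentleBackoffLimiter struct {
	mu       sync.Mutex
	failures map[interface{}]int

	baseDelay time.Duration
	factor    float64
	maxDelay  time.Duration
}

func newGentleBackoffLimiter() workqueue.RateLimiter {
	return &gentleBackoffLimiter{
		failures:  map[interface{}]int{},
		baseDelay: 10 * time.Second,
		factor:    1.5,
		maxDelay:  time.Hour,
	}
}

// When returns 10s, 15s, 22.5s, ... for successive failures of the same item.
func (l *gentleBackoffLimiter) When(item interface{}) time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()

	exp := l.failures[item]
	l.failures[item]++

	delay := time.Duration(float64(l.baseDelay) * math.Pow(l.factor, float64(exp)))
	if delay <= 0 || delay > l.maxDelay {
		return l.maxDelay
	}
	return delay
}

func (l *gentleBackoffLimiter) NumRequeues(item interface{}) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.failures[item]
}

// Forget resets the backoff for an item once it reconciles successfully.
func (l *gentleBackoffLimiter) Forget(item interface{}) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.failures, item)
}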

@mdbooth mdbooth force-pushed the bastion branch 2 times, most recently from a5a8964 to ee6ed57 Compare March 11, 2022 10:37
Contributor

@seanschneeweiss seanschneeweiss left a comment


How would you feel about setting openStackMachine.Status.Ready = false after the following positions:


@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2022
main.go Outdated
}

func setupReconcilers(ctx context.Context, mgr ctrl.Manager) {
// Based on workqueue.DefaultControllerRateLimiter with a higher baseDelay
Member


Maybe to discuss this again here:

We don't only have failures due to OpenStack API calls, so I don't know if we really want to customize that.

When thinking about the "Happy Path" for CAPO I agree with you: it does not make sense.

But if we consider the existence of other controllers acting on the same resources as CAPO, the default is reasonable for doing retries to update the CR objects.

E.g. if the reconcile fails at the end (when CAPO wants to apply its patch) because someone updated the CR in the meantime and thereby increased the metadata.resourceVersion (maybe even only by adding a label), this would result in using the increased backoff instead of the default.

Also in the use-case of the PR / the discussion:

	if !deleted {
		return reconcile.Result{Requeue: true}, nil
	}

Would this even make use of the rate limiting, or would it result in an immediate retry (making the custom rate limiter obsolete for this use case), because it is not returning an error?

Maybe better suited to post that on the PR 😄

I did not find any other cloud provider using something other than the default for their controllers: https://cs.k8s.io/?q=RateLimiter&i=nope&files=&excludeFiles=vendor%2F.*&repos=kubernetes-sigs/cluster-api,kubernetes-sigs/cluster-api-operator,kubernetes-sigs/cluster-api-provider-aws,kubernetes-sigs/cluster-api-provider-azure,kubernetes-sigs/cluster-api-provider-digitalocean,kubernetes-sigs/cluster-api-provider-gcp,kubernetes-sigs/cluster-api-provider-ibmcloud,kubernetes-sigs/cluster-api-provider-kubemark,kubernetes-sigs/cluster-api-provider-kubevirt,kubernetes-sigs/cluster-api-provider-nested,kubernetes-sigs/cluster-api-provider-openstack,kubernetes-sigs/cluster-api-provider-packet,kubernetes-sigs/cluster-api-provider-vsphere

Note: I'm ok with the change (50:50), just wanted to raise some awareness about that :-)

Contributor Author


Couple of things.

Firstly I agree that we should consider this carefully. I need to rebase this PR anyway, and when I do I'm going to remove the rate limiting. I'll propose it again as a separate PR and we can examine it in more detail there. I think this will work acceptably well without modifying the rate limiter, although with a bit more load on the OpenStack API than is necessary.

Secondly, the controller-runtime documentation on this feature sucks, so I resorted to RTFS 🙄. AFAICT the rate limiter is used in exactly two cases:

  • Reconcile returns error
  • Reconcile returns Requeue: true

Specifically, AFAICT watches are not added to the queue with the rate limiting interface, which means they will be enqueued immediately. That is, if a controller touches a machine object, the machine controller will reconcile that machine object immediately without rate limiting. We could actually verify this manually by setting it to something ridiculously large and checking that Reconcile is called immediately, but let's do that on the new PR.

Also, if Reconcile returns RequeueAfter, that also doesn't use the rate limiter: it uses the value passed in.

So the rate limiter is only ever used when code in our controller returns one of the two values which use it.
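A hedged sketch of the three return shapes and how controller-runtime appears to treat them, per the reading above; the reconciler type and the helper predicates are hypothetical:

package controllers

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// ExampleReconciler is a stand-in used only to illustrate the return shapes;
// it is not part of this PR. Note that, per the reading above, watch events
// are enqueued without rate limiting at all.
type ExampleReconciler struct{}

func (r *ExampleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	err := doSomethingAgainstOpenStack()
	switch {
	case err != nil:
		// Returning an error re-adds the request through the rate limiter,
		// so retries back off and count towards the global limit.
		return ctrl.Result{}, err
	case stillWaitingForResource():
		// Requeue: true with a nil error also goes through the rate limiter.
		return ctrl.Result{Requeue: true}, nil
	case wantFixedDelay():
		// RequeueAfter bypasses the rate limiter: the request is requeued
		// after exactly this delay, with no backoff.
		return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
	default:
		// Plain success: the item is "forgotten", which resets its backoff.
		return ctrl.Result{}, nil
	}
}

// Placeholders so the sketch compiles; they stand in for real checks.
func doSomethingAgainstOpenStack() error { return nil }
func stillWaitingForResource() bool      { return false }
func wantFixedDelay() bool               { return false }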

@MPV

MPV commented Mar 14, 2022

Will this change mean that a cluster that (for whatever reason) can't reconcile, can still reconcile its bastion?
Imagine in a case where you have a cluster created without a bastion, then the cluster fails, and you want CAPO to add a bastion so you can reach the cluster for troubleshooting. Curious about your thoughts on this.

@mdbooth
Contributor Author

mdbooth commented Mar 14, 2022

How would you feel about setting openStackMachine.Status.Ready = false after the following positions:

I thought Ready was a one-way gate, i.e. we're not allowed to unset it once set? If that's not correct, though, I'm very interested in that.

I actually have a draft doc which I need to clean up and push somewhere with my thoughts on the failed state. I do think we need a status marker for non-terminal failure. These would be examples of non-terminal failure. If we could use Ready that way it would be very interesting, but without knowing for sure I suspect it would break assumptions. I think this is a CAPI discussion.

@mdbooth
Contributor Author

mdbooth commented Mar 14, 2022

Will this change mean that a cluster that (for whatever reason) can't reconcile, can still reconcile its bastion? Imagine in a case where you have a cluster created without a bastion, then the cluster fails, and you want CAPO to add a bastion so you can reach the cluster for troubleshooting. Curious about your thoughts on this.

No. It won't start reconciling the bastion until the cluster is up. If the cluster never comes up it will never create the bastion.

There's no fundamental reason we couldn't do this, though, but it would take a fair amount of refactoring. This is essentially the same direction I'm trying to take the reconciliation of instances, i.e. independent resources can be reconciled concurrently.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2022
@MPV

MPV commented Mar 16, 2022

Will this change mean that a cluster that (for whatever reason) can't reconcile, can still reconcile its bastion? Imagine in a case where you have a cluster created without a bastion, then the cluster fails, and you want CAPO to add a bastion so you can reach the cluster for troubleshooting. Curious about your thoughts on this.

No. It won't start reconciling the bastion until the cluster is up. If the cluster never comes up it will never create the bastion.

I'm referring to another scenario though, where a cluster has already been created successfully (but configured without a bastion). Then one might decide to add a bastion (for easier troubleshooting). But if, at that point, for whatever reason, the cluster can't reconcile, would the bastion still reconcile successfully on its own (so it's added and we can troubleshoot the cluster)?

There's no fundamental reason we couldn't do this, though, but it would take a fair amount of refactoring. This is essentially the same direction I'm trying to take the reconciliation of instances, i.e. independent resources can be reconciled concurrently.

Sounds promising. 🙏

@mdbooth
Contributor Author

mdbooth commented Mar 16, 2022

Will this change mean that a cluster that (for whatever reason) can't reconcile, can still reconcile its bastion? Imagine in a case where you have a cluster created without a bastion, then the cluster fails, and you want CAPO to add a bastion so you can reach the cluster for troubleshooting. Curious about your thoughts on this.

No. It won't start reconciling the bastion until the cluster is up. If the cluster never comes up it will never create the bastion.

I'm referring to another scenario though, where a cluster has already been created successfully (but configured without a bastion). Then one might decide to add a bastion (for easier troubleshooting). But if, at that point, for whatever reason, the cluster can't reconcile, would the bastion still reconcile successfully on its own (so it's added and we can troubleshoot the cluster)?

No, this isn't going to help in that case. That said, the bastion requires most of the cluster infrastructure to be up already, so I'm not sure how early we could usefully move it in the cluster reconciliation process anyway.

What's the use case, btw? I'd have thought that the bastion was most useful for debugging machines rather than the cluster? If we're failing to create networks/routers/security groups/load balancers that's surely all going to be debugged via object status/events/OpenStack API. Does the bastion help with any of that?

@dulek
Contributor

dulek commented Mar 24, 2022

I tested this on my env and it seems to work just fine.

openStackCluster.Status.BastionSecurityGroup = nil

return nil
return true, nil
Contributor


Looks like deleted is as simple as if err == nil? I guess you just want to be explicit?

Contributor Author


Currently yes, but for the same reason as in reconcile instance I want to be able to return an incomplete result, i.e. there was no error, but the delete is not finished yet.
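A hedged sketch of the (deleted, err) contract being described here, with hypothetical helper and type names rather than the actual CAPO code:

package compute

// Illustrative only: three distinct outcomes rather than deleted simply
// mirroring err == nil.
func deleteBastionServer() (deleted bool, err error) {
	srv, err := getServerByName("bastion")
	if err != nil {
		return false, err // something went wrong; retry with backoff
	}
	if srv == nil {
		return true, nil // already gone: delete is complete
	}
	// Delete has been requested but the server still exists: no error, but
	// not finished yet, so the caller requeues and checks again later.
	return false, requestServerDelete(srv)
}

// Hypothetical helpers so the sketch is self-contained.
type server struct{ ID string }

func getServerByName(name string) (*server, error) { return nil, nil }
func requestServerDelete(s *server) error          { return nil }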

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2022
mdbooth added 5 commits March 30, 2022 13:44
Common network and security group handling between CreateBastion() and
CreateInstance().

A principal advantage of this refactor is that it makes the marshalling
of OpenStackMachineSpec and Instance respectively into an InstanceSpec a
cheap operation which makes no API calls.
Allow CreateBastion and CreateInstance to be called as reconcilers.
Specifically they become idempotent and can additionally return a 'not
yet complete' status by returning a nil InstanceStatus. This new status
is handled in the controller by rescheduling reconciliation.

To reflect this change we rename the methods to ReconcileBastion and
ReconcileInstance.

In making this change we also make some opportunistic changes to
reconciliation of the Bastion:

* Bastion reconciliation errors no longer put the cluster in a failed
  state
* We mark the cluster Ready before creating the Bastion
* We handle the case where the Bastion floating IP is already associated

Apart from the Bastion changes, this is almost entirely code motion as
can be seen in the unit tests. While we permit the Reconcile methods to
return an incomplete state, nothing yet returns it. The only change in
the unit tests is due to moving the GetInstanceStatusByName check which
is common to the bastion and machines into reconcileInstance.
Refactor instance creation in machine controller and cluster
controller (for the bastion) to call compute.ReconcileInstance() with an
InstanceSpec.
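A hedged sketch of how a controller might consume the new "not yet complete" result these commit messages describe; the names and the requeue mechanism are illustrative, not the actual CAPO code:

package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

// instanceStatus and reconcileInstance are illustrative stand-ins: a nil
// status with a nil error models the new "not yet complete" result.
type instanceStatus struct{ ID string }

func reconcileInstance() (*instanceStatus, error) { return nil, nil }

func reconcileNormal() (ctrl.Result, error) {
	status, err := reconcileInstance()
	if err != nil {
		// Real errors go back through the controller's rate limiter.
		return ctrl.Result{}, err
	}
	if status == nil {
		// Not complete yet: reschedule reconciliation instead of treating
		// this as a failure.
		return ctrl.Result{Requeue: true}, nil
	}
	// The instance exists; carry on with the rest of the reconcile.
	_ = status.ID
	return ctrl.Result{}, nil
}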
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2022
@mdbooth mdbooth marked this pull request as draft April 1, 2022 05:35
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2022
@mdbooth mdbooth changed the title from "✨ Refactor CreateInstance and CreateBastion" to "✨ WIP: Refactor CreateInstance and CreateBastion" Apr 1, 2022
@k8s-ci-robot
Contributor

@mdbooth: PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2022
@seanschneeweiss
Contributor

I thought Ready was a one-way gate, i.e. we're not allowed to unset it once set? If that's not correct, though, I'm very interested in that.

I actually have a draft doc which I need to clean up and push somewhere with my thoughts on the failed state. I do think we need a status marker for non-terminal failure. These would be examples of non-terminal failure. If we could use Ready that way it would be very interesting, but without knowing for sure I suspect it would break assumptions. I think this is a CAPI discussion.

@mdbooth I had a look at CAPZ, CAPA, and CAPIBM, which do so too.
CAPG doesn't change the ready status.

From the CRD and cluster-api docs I can't really tell whether to see this as a one-way gate or not. Definitely a question for CAPI. Personally I'd like to use it for non-terminal failure.

@mdbooth
Contributor Author

mdbooth commented Jun 21, 2022

I'll create a new PR when I'm working on this again.

@mdbooth mdbooth closed this Jun 21, 2022
@mdbooth mdbooth deleted the bastion branch December 13, 2022 10:01