Enable coalescing reconciler for more controllers #1691

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main from CecileRobertMichon:coalescing-reconcilers
Sep 22, 2021
Conversation

@CecileRobertMichon CecileRobertMichon commented Sep 16, 2021

Enable coalescing reconciler for AzureCluster, AzureMachine, AzureManagedControlPlane, AzureManagedCluster, and AzureManagedMachinePool

What type of PR is this?
/kind feature

What this PR does / why we need it: #1332 (devigned@b6b38b0) added a coalescing reconciler to debounce reconciles (in other words, to make sure we don't run too many successful reconcile loops in a short amount of time). At the time, it was only enabled for the AzureMachinePool and AzureMachinePoolMachine controllers. This PR enables it for more controllers, specifically all the ones that reconcile Azure resources (AzureCluster, AzureMachine, AzureManagedControlPlane, AzureManagedCluster, and AzureManagedMachinePool), in preparation for #1541.

Also fixes some duplicate code in main.go.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1688

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Enable coalescing reconciler for more controllers

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 16, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Sep 16, 2021
registerControllers(ctx, mgr)
// +kubebuilder:scaffold:builder

if err := mgr.AddReadyzCheck("webhook", mgr.GetWebhookServer().StartedChecker()); err != nil {
CecileRobertMichon (Contributor, Author):

@reviewers, please pay close attention here. I believe we were duplicating some code so I removed it, but let me know if there is a good reason for this being called twice (here and line 520)

same thing with mgr.Start (line 304 and line 514)

devigned (Contributor):

Nope. There is no reason to call this twice.

@CecileRobertMichon
Copy link
Contributor Author

/assign @devigned

main.go Outdated
os.Exit(1)
}

clusterCache, err := coalescing.NewRequestCache(5 * time.Second)
CecileRobertMichon (Contributor, Author):

went with 5 seconds since I don't want the reconciles to be too slow but let me know if you think that's too aggressive

@devigned devigned left a comment:

Looks good. Just a couple comments.


main.go Outdated
}

func registerControllers(ctx context.Context, mgr manager.Manager) {
machineCache, err := coalescing.NewRequestCache(5 * time.Second)
devigned (Contributor):

Did you consider making this configurable as a cmdline arg?

CecileRobertMichon (Contributor, Author):

I didn't, that would be a good one to be able to make configurable. Do you have any thoughts on whether it should be configurable per controller or just a single value?

devigned (Contributor):

I had made it configurable for both AMP and AMPM. If it's not overkill for cmdline args, I'd vote for per controller.

CecileRobertMichon (Contributor, Author):

AMP and AMPM are both hardcoded right now, I don't think they're configurable.

Per controller makes sense, but it might be overwhelming to the user to be able to configure all of them (or to have to configure each one separately to change all the values). What do you think about one common flag for now, and potentially make it more granular later if the use case arises?

devigned (Contributor):

You are right. I lied. I think I was thinking about doing that, but must have forgotten or thought that I already had.

CecileRobertMichon (Contributor, Author):

wishful thinking :)

okay let me know what you think of this:

  1. using a common var across all the controllers: this is slightly less granular, but I think it's a better, easier-to-understand configuration from a user's perspective (since they're not supposed to know how the code of each controller works), with 10 seconds as a middle-ground default value.
  2. the flag name "debouncing-timer" and its description: I tried to make these as developer-friendly as possible, describing what the flag is useful for (i.e. what it does) rather than how it does it (i.e. "cache").

@CecileRobertMichon (Contributor, Author):

private cluster test had remaining resources after delete

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2021
Cecile Robert-Michon added 2 commits September 20, 2021 10:54
Enable coalescing reconciler for AzureCluster, AzureMachine, AzureManagedControlPlane, AzureManagedCluster, and AzureManagedMachinePool
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2021
@devigned (Contributor):

/retest

@CecileRobertMichon (Contributor, Author):

/assign @shysank @devigned

@devigned devigned left a comment:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2021

shysank commented Sep 22, 2021

How will this work when there is more than one controller instance running? AFAICT, the cache appears to be local to an instance. Does the controller manager guarantee that an object's reconciliation request is always sent to the same instance?

@devigned (Contributor):

> How will this work when there is more than one controller instance running? AFAICT, the cache appears to be local to an instance. Does the controller manager guarantee that an object's reconciliation request is always sent to the same instance?

Are you implying a scenario where more than one controller instance is watching and reconciling the same resources?


shysank commented Sep 22, 2021

> Are you implying a scenario where more than one controller instance is watching and reconciling the same resources?

Yeah, as in, just scale my capz controller deployment to 2 (or more).

@devigned (Contributor):

> Yeah, as in, just scale my capz controller deployment to 2 (or more).

The controller should have only 1 leader elected based on our manager configuration. I don't know that we should support 2 controllers reconciling the same resources. If 2 controllers are run side by side, I would imagine that each would be responsible for reconciling their own exclusive set of resources.

^ is that assumption incorrect?
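The single-leader behavior described here comes from controller-runtime's leader election, configured on the manager. A minimal fragment, assuming controller-runtime's manager.Options field names (the election ID string is illustrative, not CAPZ's actual value):

```go
// With leader election enabled, only one replica of the deployment actively
// runs the reconcilers at a time, so a per-instance debounce cache is safe
// without being shared across replicas.
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	LeaderElection:   true,                               // only the elected leader reconciles
	LeaderElectionID: "controller-leader-election-capz", // illustrative ID
})
```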



shysank commented Sep 22, 2021

@devigned I think it's a fair assumption. Thanks for the explanation! The only edge case I can think of is when a new leader gets elected, but that's going to be rare, and even if it happens, the worst thing that could happen is that the cache will get invalidated, which is fine.
/lgtm

@CecileRobertMichon (Contributor, Author):

Thanks for bringing this up @shysank, definitely a good scenario to think about.

/approve

@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2021
@k8s-ci-robot k8s-ci-robot merged commit 0ea1c6a into kubernetes-sigs:main Sep 22, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.5 milestone Sep 22, 2021
@CecileRobertMichon CecileRobertMichon deleted the coalescing-reconcilers branch February 17, 2023 23:24

Labels

  • approved — Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/provider/azure — Issues or PRs related to azure provider.
  • cncf-cla: yes — Indicates the PR's author has signed the CNCF CLA.
  • lgtm — "Looks good to me", indicates that a PR is ready to be merged.
  • release-note — Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/cluster-lifecycle — Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
  • size/L — Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable coalescing reconciler for all controllers

4 participants