Enable coalescing reconciler for more controllers #1691

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main from CecileRobertMichon:coalescing-reconcilers
Sep 22, 2021
Conversation

@CecileRobertMichon CecileRobertMichon commented Sep 16, 2021

Enable coalescing reconciler for AzureCluster, AzureMachine, AzureManagedControlPlane, AzureManagedCluster, and AzureManagedMachinePool

What type of PR is this?
/kind feature

What this PR does / why we need it: #1332 (devigned@b6b38b0) added a coalescing reconciler to debounce reconciles (in other words, to make sure we don't run too many successful reconcile loops in a short amount of time). At the time, it was only enabled for the AzureMachinePool and AzureMachinePoolMachine controllers. This PR enables it for more controllers, specifically all the ones that reconcile Azure resources (AzureCluster, AzureMachine, AzureManagedControlPlane, AzureManagedCluster, and AzureManagedMachinePool), in preparation for #1541.

Also fixes some duplicate code in main.go.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1688

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Enable coalescing reconciler for more controllers

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 16, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Sep 16, 2021
registerControllers(ctx, mgr)
// +kubebuilder:scaffold:builder

if err := mgr.AddReadyzCheck("webhook", mgr.GetWebhookServer().StartedChecker()); err != nil {
CecileRobertMichon (Contributor, Author):

@reviewers, please pay close attention here. I believe we were duplicating some code so I removed it, but let me know if there is a good reason for this being called twice (here and line 520)

same thing with mgr.Start (line 304 and line 514)

devigned (Contributor):

Nope. There is no reason to call this twice.

@CecileRobertMichon
Copy link
Contributor Author

/assign @devigned

main.go Outdated
os.Exit(1)
}

clusterCache, err := coalescing.NewRequestCache(5 * time.Second)
CecileRobertMichon (Contributor, Author):

went with 5 seconds since I don't want the reconciles to be too slow but let me know if you think that's too aggressive

@devigned devigned left a comment:

Looks good. Just a couple comments.


main.go Outdated
}

func registerControllers(ctx context.Context, mgr manager.Manager) {
machineCache, err := coalescing.NewRequestCache(5 * time.Second)
devigned (Contributor):

Did you consider making this configurable as a cmdline arg?

CecileRobertMichon (Contributor, Author):

I didn't, that would be a good one to be able to make configurable. Do you have any thoughts on whether it should be configurable per controller or just a single value?

devigned (Contributor):

I had made it configurable for both AMP and AMPM. If it's not overkill for cmdline args, I'd vote for per controller.

CecileRobertMichon (Contributor, Author):

AMP and AMPM are both hardcoded right now, I don't think they're configurable.

Per controller makes sense, but it might be overwhelming to the user to be able to configure all of them (or to have to configure each one separately to change all the values). What do you think about one common flag for now, and potentially make it more granular later if the use case arises?

devigned (Contributor):

You are right. I lied. I think I was thinking about doing that, but must have forgotten or thought that I already had.

CecileRobertMichon (Contributor, Author):

wishful thinking :)

okay let me know what you think of this:

  1. using a common var across all the controllers: this is slightly less granular, but I think it's a better, easier-to-understand configuration from a user's perspective (since they're not supposed to know how the code of each controller works), with 10 seconds as a middle-ground default value.
  2. the flag name "debouncing-timer" and its description: I tried to make these as developer-friendly as possible, describing what the flag is useful for (i.e. what it does) rather than how it does it (i.e. "cache").

@CecileRobertMichon (Contributor, Author):

private cluster test had remaining resources after delete

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2021
Cecile Robert-Michon added 2 commits September 20, 2021 10:54
Enable coalescing reconciler for AzureCluster, AzureMachine, AzureManagedControlPlane, AzureManagedCluster, and AzureManagedMachinePool
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2021
@devigned (Contributor):

/retest

@CecileRobertMichon (Contributor, Author):

/assign @shysank @devigned

@devigned devigned left a comment:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2021

shysank commented Sep 22, 2021

How will this work when there is more than one controller instance running? AFAICT, the cache appears to be local to an instance. Does the controller manager guarantee that an object's reconciliation request is always sent to the same instance?

@devigned (Contributor):

> How will this work when there is more than one controller instance running? AFAICT, the cache appears to be local to an instance. Does the controller manager guarantee that an object's reconciliation request is always sent to the same instance?

Are you implying a scenario where more than one controller instance is watching and reconciling the same resources?


shysank commented Sep 22, 2021

> Are you implying a scenario where more than one controller instance is watching and reconciling the same resources?

Yeah, as in, just scale my capz controller deployment to 2 (or more).

@devigned (Contributor):

> Yeah, as in, just scale my capz controller deployment to 2 (or more).

The controller should have only 1 leader elected based on our manager configuration. I don't know that we should support 2 controllers reconciling the same resources. If 2 controllers are run side by side, I would imagine that each would be responsible for reconciling their own exclusive set of resources.

^ is that assumption incorrect?
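The single-leader behavior described here comes from controller-runtime's leader election, configured on the manager. A minimal fragment, assuming controller-runtime's manager.Options field names (the election ID string is illustrative, not CAPZ's actual value):

```go
// With leader election enabled, only one replica of the deployment actively
// runs the reconcilers at a time, so a per-instance debounce cache is safe
// without being shared across replicas.
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	LeaderElection:   true,                               // only the elected leader reconciles
	LeaderElectionID: "controller-leader-election-capz", // illustrative ID
})
```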



shysank commented Sep 22, 2021

@devigned I think it's a fair assumption. Thanks for the explanation! The only edge case I can think of is when a new leader gets elected, but that's going to be rare, and even if it happens, the worst thing that could happen is that the cache will get invalidated, which is fine.
/lgtm

@CecileRobertMichon (Contributor, Author):

Thanks for bringing this up @shysank, definitely a good scenario to think about.

/approve

@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2021
@k8s-ci-robot k8s-ci-robot merged commit 0ea1c6a into kubernetes-sigs:main Sep 22, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.5 milestone Sep 22, 2021
@CecileRobertMichon CecileRobertMichon deleted the coalescing-reconcilers branch February 17, 2023 23:24

Labels

  • approved — Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/provider/azure — Issues or PRs related to azure provider.
  • cncf-cla: yes — Indicates the PR's author has signed the CNCF CLA.
  • lgtm — "Looks good to me", indicates that a PR is ready to be merged.
  • release-note — Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/cluster-lifecycle — Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
  • size/L — Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable coalescing reconciler for all controllers

4 participants