OCPCLOUD-492: Run all machine api Controllers using leader election #122
JoelSpeed
left a comment
Added some comments
cmd/manager/main.go
Outdated
    flag.StringVar(
        &watchNamespace,
        "namespace",
        "",
        "Namespace that the controller watches to reconcile machine-api objects. If unspecified, the controller watches for machine-api objects across all namespaces.",
    )

    flag.StringVar(
        &leaderElectResourceNamespace,
        "leader-elect-resource-namespace",
        "",
        "The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for machine-api objects across all namespaces.",
    )

    flag.BoolVar(
        &leaderElect,
        "leader-elect",
        true,
        "Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.",
    )

    flag.Int64Var(
        &leaderElectLeaseDuration,
        "leader-elect-lease-duration",
        15,
        "The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled.",
    )
Any reason not to declare these in the way the watchNamespace was?
cmd/manager/main.go
Outdated
    flag.BoolVar(
        &leaderElect,
        "leader-elect",
        true,
Given there's no default namespace, and this would (technically) be a breaking change, I think this should default to false.
Yes, I saw this happening locally. Will adjust the value.
cmd/manager/main.go
Outdated
        &leaderElectResourceNamespace,
        "leader-elect-resource-namespace",
        "",
        "The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for machine-api objects across all namespaces.",
The second half of this sentence doesn't make sense in this context. Can you verify the behaviour when the flag is unset? Does controller-runtime allow it to be empty?
cmd/manager/main.go
Outdated
        "Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.",
    )

    flag.Int64Var(
Could use an actual Duration flag?
cmd/manager/main.go
Outdated
    leaderElectLeaseDuration := flag.Duration(
        "leader-elect-lease-duration",
        15,
Probably still want a default of 15 seconds 😉
    -   15,
    +   15 * time.Second,
cmd/manager/main.go
Outdated
    leaderElectResourceNamespace := flag.String(
        "leader-elect-resource-namespace",
        "",
        "The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for the namespace currently in-use in the cluster",
    -   "The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for the namespace currently in-use in the cluster",
    +   "The namespace of resource object that is used for locking during leader election. If unspecified and running in cluster, defaults to the service account namespace for the controller. Required for leader-election outside of a cluster.",
cmd/manager/main.go
Outdated
    cfg := config.GetConfigOrDie()

    opts := manager.Options{}
    leaseDuration := time.Second * *leaderElectLeaseDuration
The duration will already be in seconds if we set the default to seconds. Otherwise the user will specify something on the command line like 10s, 1m, or 1h, depending on their preference.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: JoelSpeed
/retest
alexander-demicev
left a comment
/lgtm
Can we please use the imperative mood in the commit message instead of the past tense?
Are there counterpart PRs for the other providers and in MAO that use these flags, so we can reference them all in one place?
/retest Please review the full test history for this PR and help us cut down flakes.
Using leader election by default adds stronger guarantees than we have today that only one controller is running at a time. This protects against edge cases where the deployment replica count is increased, or where an upgrade runs with a permissive maxSurge.
Relevant provider PRs:
- openshift/cluster-api-provider-gcp#85
- openshift/cluster-api-provider-aws#315
- openshift/cluster-api-provider-azure#122
- openshift/cluster-api-provider-openstack#108
- openshift/cluster-api-provider-baremetal#81
- openshift/cluster-api-provider-ovirt#55
- openshift#571
What this PR does / why we need it:
Implemented leader election for the Azure provider.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): OCPCLOUD-492
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note:
Adds a couple of new CLI arguments for configuring leader election.