Skip to content

Conversation

@Danil-Grigorev
Copy link

What this PR does / why we need it:
Implemented leader election for azure provider

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

OCPCLOUD-492

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

Couple of new cli arguments for configuring leader election

  -leader-elect
    	Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability. (default true)
  -leader-elect-lease-duration int
    	The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15)
  -leader-elect-resource-namespace string
    	The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for machine-api objects across all namespaces.

Copy link

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments

Comment on lines 46 to 69
flag.StringVar(
&watchNamespace,
"namespace",
"",
"Namespace that the controller watches to reconcile machine-api objects. If unspecified, the controller watches for machine-api objects across all namespaces.",
)

flag.StringVar(
&leaderElectResourceNamespace,
"leader-elect-resource-namespace",
"",
"The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for machine-api objects across all namespaces.",
)

flag.BoolVar(
&leaderElect,
"leader-elect",
true,
"Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.",
)

flag.Int64Var(
&leaderElectLeaseDuration,
"leader-elect-lease-duration",
15,
"The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled.",
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to declare these in the way the watchNamespace was?

flag.BoolVar(
&leaderElect,
"leader-elect",
true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given there's no default namespace, and this would be a breaking change (technically), I think this should default to false

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw this happening locally. Will adjust the value.

&leaderElectResourceNamespace,
"leader-elect-resource-namespace",
"",
"The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for machine-api objects across all namespaces.",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second half of this sentence doesn't make any sense in this context, can you verify the behaviour if it is unset? Does controller-runtime allow this to be empty?

"Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.",
)

flag.Int64Var(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use an actual Duration flag?


leaderElectLeaseDuration := flag.Duration(
"leader-elect-lease-duration",
15,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably still want a default of 15 seconds 😉

Suggested change
15,
15 * time.Second,

leaderElectResourceNamespace := flag.String(
"leader-elect-resource-namespace",
"",
"The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for the namespace currently in-use in the cluster",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The namespace of resource object that is used for locking during leader election. If unspecified, the controller watches for the namespace currently in-use in the cluster",
"The namespace of resource object that is used for locking during leader election. If unspecified and running in cluster, defaults to the service account namespace for the controller. Required for leader-election outside of a cluster.",

cfg := config.GetConfigOrDie()

opts := manager.Options{}
leaseDuration := time.Second * *leaderElectLeaseDuration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duration will already be in seconds if we set the default to seconds, that or the user will specify something on the command line like 10s, 1m, 1h depending on their preference

@JoelSpeed
Copy link

/approve

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2020
@Danil-Grigorev
Copy link
Author

/retest

Copy link

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020
@enxebre
Copy link
Member

enxebre commented Jun 10, 2020

can we please use imperative mood in the commit instead of past?
We follow https://chris.beams.io/posts/git-commit/#imperative

@enxebre
Copy link
Member

enxebre commented Jun 10, 2020

Is there counter part PRs for other providers and in MAO to use the flags that we can reference in one single place?

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

14 similar comments
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

23 similar comments
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 7fe5391 into openshift:master Jul 4, 2020
Danil-Grigorev pushed a commit to Danil-Grigorev/machine-api-operator that referenced this pull request Jul 13, 2020
Using leader election by default will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge.

Relevant provider PRs:

- openshift/cluster-api-provider-gcp#85
- openshift/cluster-api-provider-aws#315
- openshift/cluster-api-provider-azure#122
- openshift/cluster-api-provider-openstack#108
- openshift/cluster-api-provider-baremetal#81
- openshift/cluster-api-provider-ovirt#55
- openshift#571
elmiko pushed a commit to elmiko/machine-api-operator that referenced this pull request Aug 3, 2020
Using leader election by default will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge.

Relevant provider PRs:

- openshift/cluster-api-provider-gcp#85
- openshift/cluster-api-provider-aws#315
- openshift/cluster-api-provider-azure#122
- openshift/cluster-api-provider-openstack#108
- openshift/cluster-api-provider-baremetal#81
- openshift/cluster-api-provider-ovirt#55
- openshift#571
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release/4.6

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants