Skip to content

Conversation

@Danil-Grigorev
Copy link

Fix leader-election cli option for machine controller, allowing to eliminate edge cases when multiple controllers could be running. Allows integration with MAO PR: openshift/machine-api-operator#571

Copy link

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these all need to be the same across providers and the set of flags is defined by the MAO, is there a package in the MAO code that we could import to get them?

The change looks fine as it is. Maybe we could centralize the definition as a follow-up.

@dhellmann
Copy link

/test e2e-metal-ipi

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2020
leaderElect := flag.Bool(
"leader-elect",
false,
"Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leader election is recommended also for:

  • upgrade safety, to ensure only one of the old and new controller is running at a time
  • recovery safety, to ensure that when a singleton is on a Node that's disappeared, there's a reasonable guarantee that the old process has stopped trying to be the leader before a new process starts being the leader.

For a controller, if you want to give advice on when it should use leader election, I would just say that leader election is always recommended.

@Danil-Grigorev
Copy link
Author

@dhellmann I don't think we have it like that, each provider has its own set of flags, they just happen to be similar now with recent additions and restructuring. Only some of them should be similar, but the leader-election is enabled by default in https://github.com/openshift/machine-api-operator/pull/571/files#diff-fa45321336db7ad1cedc28bf643a4f97R371 For reference you could look into GCP provider: https://github.com/openshift/cluster-api-provider-gcp/blob/master/cmd/manager/main.go#L32-L60

@dhellmann
Copy link

@dhellmann I don't think we have it like that, each provider has its own set of flags, they just happen to be similar now with recent additions and restructuring.

So different providers have different flags related to leader elections?

Danil-Grigorev pushed a commit to Danil-Grigorev/machine-api-operator that referenced this pull request Jul 13, 2020
Using leader election by default will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge.

Relevant provider PRs:

- openshift/cluster-api-provider-gcp#85
- openshift/cluster-api-provider-aws#315
- openshift/cluster-api-provider-azure#122
- openshift/cluster-api-provider-openstack#108
- openshift/cluster-api-provider-baremetal#81
- openshift/cluster-api-provider-ovirt#55
- openshift#571
@stbenjam
Copy link
Member

/test e2e-metal-ipi

@zaneb
Copy link
Member

zaneb commented Jul 14, 2020

/lgtm

@stbenjam
Copy link
Member

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 14, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Danil-Grigorev, dhellmann, mhrivnak, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stbenjam
Copy link
Member

/test e2e-metal-ipi

@stbenjam
Copy link
Member

I guess things can't really get much worse, let's just land it so it makes it ASAP into new builds.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 14, 2020
@openshift-merge-robot openshift-merge-robot merged commit dab72d7 into openshift:master Jul 14, 2020
elmiko pushed a commit to elmiko/machine-api-operator that referenced this pull request Aug 3, 2020
Using leader election by default will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge.

Relevant provider PRs:

- openshift/cluster-api-provider-gcp#85
- openshift/cluster-api-provider-aws#315
- openshift/cluster-api-provider-azure#122
- openshift/cluster-api-provider-openstack#108
- openshift/cluster-api-provider-baremetal#81
- openshift/cluster-api-provider-ovirt#55
- openshift#571
honza pushed a commit to honza/cluster-api-provider-baremetal that referenced this pull request Feb 7, 2022
✨ Add ChecksumType field in image
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants