Conversation

@inesqyx (Contributor) commented Apr 2, 2024

  • What I did

Mimicking the way leader election is set up in the Machine Config Controller and the Machine Config Operator, we set up leader election in the Machine OS Builder (MOB) as well. Doing so ensures that only a single Machine OS Builder pod is running at any given time.

  • How to verify it
  1. Deploy an OpenShift cluster.
  2. Opt into on-cluster builds.
  3. Retrieve the logs for the Machine OS Builder pod and verify that it pauses for leader election and eventually starts.
  4. Delete the pod and wait for the Deployment to start a replacement pod.
  5. Retrieve the replacement pod logs and verify that it pauses for leader election and eventually starts.
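
For reference, below is a minimal sketch of the pattern this description refers to, using client-go's leaderelection package. The lock name and namespace mirror the ones visible in the QE log later in this thread; the POD_NAME identity source, callback bodies, and timing values are illustrative assumptions rather than the PR's exact code.

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	// In-cluster client config, matching the "Using in-cluster kube client
	// config" line in the pod log below.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Lease lock with the name/namespace seen in the log:
	// openshift-machine-config-operator/machine-os-builder.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "machine-os-builder",
			Namespace: "openshift-machine-config-operator",
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			// Hypothetical identity source; any per-pod unique value works.
			Identity: os.Getenv("POD_NAME"),
		},
	}

	// Blocks until this pod wins the election, then runs the callbacks.
	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock: lock,
		// Illustrative timings only; the MCO derives its real values from
		// shared defaults (the log below reports the resulting tolerances).
		LeaseDuration: 137 * time.Second,
		RenewDeadline: 107 * time.Second,
		RetryPeriod:   26 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the leader starts the build controllers.
				klog.Info("became leader, starting Machine OS Builder")
			},
			OnStoppedLeading: func() {
				klog.Fatal("leader election lost")
			},
		},
	})
}
```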

openshift-ci-robot added the jira/valid-reference label on Apr 2, 2024
@openshift-ci-robot (Contributor) commented Apr 2, 2024

@inesqyx: This pull request references MCO-790, which is a valid Jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


openshift-ci bot added the do-not-merge/work-in-progress label on Apr 2, 2024
@openshift-ci bot commented Apr 2, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@inesqyx (Contributor, Author) commented Apr 2, 2024

/test all

@openshift-ci bot commented Apr 2, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: inesqyx
Once this PR has been reviewed and has the lgtm label, please assign dkhater-redhat for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

inesqyx marked this pull request as ready for review on April 2, 2024 20:48
openshift-ci bot removed the do-not-merge/work-in-progress label on Apr 2, 2024
openshift-ci bot requested review from dkhater-redhat and jkyros on April 2, 2024 20:50
@openshift-ci bot commented Apr 2, 2024

@inesqyx: The following tests failed; say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                     Commit    Required  Rerun command
ci/prow/okd-scos-e2e-aws-ovn                  dd4e9b9   false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-azure-ovn-upgrade-out-of-change   dd4e9b9   false     /test e2e-azure-ovn-upgrade-out-of-change


@inesqyx (Contributor, Author) commented Apr 4, 2024

/jira refresh

@openshift-ci-robot (Contributor) commented Apr 4, 2024

@inesqyx: This pull request references MCO-790, which is a valid Jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


@sergiordlr (Contributor) commented Apr 15, 2024

When we configure a new imageBuilderType, the machine-os-builder pod is restarted, but the outgoing pod does not release its lease, so when the new machine-os-builder pod starts it cannot acquire the lease and reports failures like this:

$ oc logs machine-os-builder-fb856c6f4-nvrvf 
I0415 15:59:33.756112       1 start.go:89] Options parsed: {kubeconfig:}
I0415 15:59:33.756133       1 start.go:92] Version: machine-config-daemon-4.6.0-202006240615.p0-2682-g200c5f24-dirty (200c5f24043dee744f5b1680eb09cffcaa7d7a8f)
I0415 15:59:33.756143       1 builder.go:93] Using in-cluster kube client config
I0415 15:59:33.756369       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I0415 15:59:33.770262       1 leaderelection.go:250] attempting to acquire leader lease openshift-machine-config-operator/machine-os-builder...
I0415 15:59:33.775174       1 leaderelection.go:354] lock is held by machine-os-builder-fb856c6f4-nvrvf_42ca8477-a939-4883-89f4-643b5dcfa7b0 and has not yet expired
I0415 15:59:33.775195       1 leaderelection.go:255] failed to acquire lease openshift-machine-config-operator/machine-os-builder

It keeps failing for about two and a half minutes, matching the 2m43s worst-case non-graceful lease acquisition reported in the log, and then takes the lease ungracefully.

To reproduce it, enable the on-cluster build functionality and reconfigure the imageBuilderType, for example with this command:

$ oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "custom-pod-builder"}}'

Since reconfiguring the imageBuilderType is a controlled situation, the lease should be released and reacquired gracefully, shouldn't it?
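
For context, client-go has a knob for exactly this: with ReleaseOnCancel set, the outgoing leader clears the Lease holder when its context is cancelled, so a successor acquires the lock within roughly one retry period instead of waiting out the full lease. A sketch of how MOB could wire that to pod shutdown follows; runWithGracefulRelease, runBuilder, and the SIGTERM wiring are assumptions for illustration, not the PR's actual code (imports as in the earlier sketch, plus os/signal and syscall):

```go
// runWithGracefulRelease is a sketch: lock is the same *resourcelock.LeaseLock
// built in the earlier example, and runBuilder is a hypothetical stand-in for
// the real build controllers.
func runWithGracefulRelease(lock *resourcelock.LeaseLock, runBuilder func(context.Context)) {
	// Cancel the context on SIGTERM so the election loop gets a chance to
	// clear the Lease holder before the old pod exits.
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer cancel()

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock: lock,
		// With ReleaseOnCancel, the outgoing leader empties the Lease on
		// shutdown, so the successor never hits the "lock is held ... and
		// has not yet expired" wait seen in the log above.
		ReleaseOnCancel: true,
		LeaseDuration:   137 * time.Second,
		RenewDeadline:   107 * time.Second,
		RetryPeriod:     26 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: runBuilder,
			OnStoppedLeading: func() {
				klog.Info("lease released, shutting down")
			},
		},
	})
}
```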

A pre-merge Jira ticket has been created to track this behaviour: https://issues.redhat.com/browse/OCPBUGS-32271

@sergiordlr (Contributor) commented

With the new MachineOSConfig resource there is only one image builder type, so the ticket that we opened regarding this PR no longer applies.

We are adding the qe-approved label.

/label qe-approved

openshift-ci bot added the qe-approved label on Apr 24, 2024
@openshift-ci-robot (Contributor) commented Apr 24, 2024

@inesqyx: This pull request references MCO-790, which is a valid Jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


@openshift-bot (Contributor) commented

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label on Jul 24, 2024
openshift-merge-robot added the needs-rebase label on Jul 24, 2024
@openshift-merge-robot (Contributor) commented

PR needs rebase.


@inesqyx (Contributor, Author) commented Jul 24, 2024

Closing the PR; merged in #4327.

inesqyx closed this on Jul 24, 2024

Labels

jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
qe-approved: Signifies that QE has signed off on this PR.
