controller: Add a 5s delay before rendering MCs #303

cgwalters · 2019-01-15T14:51:59Z

To reduce churn if MCs are being created rapidly - both on general
principle, and also to reduce our exposure to the current bug
that a booting node may fail to find a GC'd MachineConfig:
#301

ashcrow · 2019-01-15T15:12:08Z

TF failures

/retest

jlebon · 2019-01-15T15:52:50Z

Hmm, if I'm reading this right, even if we translate three MC creation events by 5s, we're still regenerating three times, right? Though I guess they'll hash to the same name now at least, so we won't get a generated MC quickly appearing then getting deleted.

LGTM though will defer to folks more familiar with the workqueue API.

/approve

cgwalters · 2019-01-15T15:59:07Z

> we're still regenerating three times, right?

See https://godoc.org/k8s.io/client-go/util/workqueue, specifically:

> and if an item is added multiple times before it can be processed, it will only be processed once.

ashcrow · 2019-01-15T16:02:20Z

TF failures again... looks like something's up. Let's retest shortly.

jlebon · 2019-01-15T18:21:53Z

OK, after reading up some more on the workqueue API, I'm more confident this works now. I've also just tested it!

/lgtm

Re.

> and if an item is added multiple times before it can be processed, it will only be processed once.

Right, I see that at https://github.com/kubernetes/client-go/blob/b831b8de7155117e51afaffeb647007a756ddc92/util/workqueue/queue.go#L114. But this happens at `Add()` time, and `AddAfter()` just delays the `Add()` call: https://github.com/kubernetes/client-go/blob/b831b8de7155117e51afaffeb647007a756ddc92/util/workqueue/delaying_queue.go#L162. So `Add()` is still called e.g. three times. So if the previous work item has already been processed, we'll just sync again. One test I did to verify this was `oc create -f hello1.yaml && sleep 3.5 && oc create -f hello2.yaml`.

Anyway, this still mitigates the issue fine since the MCs we're concerned about definitely happen within a 5s window. So even on the earliest event fired, we've already got all the MCs. Though of course closing the race window completely will still require some work.

openshift-ci-robot · 2019-01-15T18:22:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,jlebon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ashcrow · 2019-01-15T19:14:30Z

/test e2e-aws

ashcrow · 2019-01-15T19:15:05Z

/retest

ashcrow · 2019-01-15T21:22:06Z

/test e2e-aws

This is like openshift#303 but for the node controller. We really don't need to react *instantly* to start updating and rebooting machines, and having a small delay will help avoid races when MCs are created rapidly.

controller: Add a 5s delay before rendering MCs

d902c1f

openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 15, 2019

openshift-ci-robot requested review from abhinavdahiya and smarterclayton January 15, 2019 14:52

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 15, 2019

cgwalters mentioned this pull request Jan 15, 2019

WIP: Add RHCOS oscontainer into payload, render to 00-$role-osimageurl MC #273

Closed

2 tasks

openshift-ci-robot assigned jlebon Jan 15, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2019

openshift-merge-robot merged commit d919918 into openshift:master Jan 16, 2019

jlebon mentioned this pull request Jan 16, 2019

MachineConfigs can be garbage collected while a node is still booting #301

Closed

This was referenced Jan 22, 2019

controller/node: Also add a 5s delay here responding to pool changes #337

Merged

fix initial race between template and render sub-controllers #338

Closed

osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021

Merge pull request openshift#303 from gabemontero/api-svr-err-not-cle…

09de303

…ared Bug 1854857: initial create errors should map to SamplesExists instead of ImageChangesInProgress
