Skip to content

Conversation

@cgwalters
Copy link
Member

To reduce churn if MCs are being created rapidly - both on general
principle, and also to reduce our exposure to the current bug
that a booting node may fail to find a GC'd MachineConfig:
#301

To reduce churn if MCs are being created rapidly - both on general
principle, and also to reduce our exposure to the current bug
that a booting node may fail to find a GC'd MachineConfig:
openshift#301
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 15, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 15, 2019
@ashcrow
Copy link
Member

ashcrow commented Jan 15, 2019

TF failures

/retest

@jlebon
Copy link
Member

jlebon commented Jan 15, 2019

Hmm, if I'm reading this right, even if we translate three MC creation events by 5s, we're still regenerating three times, right? Though I guess they'll hash to the same name now at least, so we won't get a generated MC quickly appearing then getting deleted.

LGTM though will defer to folks more familiar with the workqueue API.

/approve

@cgwalters
Copy link
Member Author

we're still regenerating three times, right?

See https://godoc.org/k8s.io/client-go/util/workqueue
specifically:

and if an item is added multiple times before it can be processed, it will only be processed once.

@ashcrow
Copy link
Member

ashcrow commented Jan 15, 2019

tf failures again... looks like somethings up. Let's retest shortly.

@jlebon
Copy link
Member

jlebon commented Jan 15, 2019

OK, after reading up some more on the workqueue API, I'm more confident this works now. I've also just tested it!

/lgtm

Re.

and if an item is added multiple times before it can be processed, it will only be processed once.

Right, I see that at https://github.com/kubernetes/client-go/blob/b831b8de7155117e51afaffeb647007a756ddc92/util/workqueue/queue.go#L114. But this happens at Add() time, and AddAfter() just delays the Add() call: https://github.com/kubernetes/client-go/blob/b831b8de7155117e51afaffeb647007a756ddc92/util/workqueue/delaying_queue.go#L162. So Add() is still called e.g. three times. So if the previous work item has already been processed, we'll just sync again. One test I did to verify this was oc create -f hello1.yaml && sleep 3.5 && oc create -f hello2.yaml.

Anyway, this still mitigates the issue fine since the MCs we're concerned about definitely happen within a 5s window. So even on the earliest event fired, we've already got all the MCs. Though of course closing the race window completely will still require some work.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ashcrow
Copy link
Member

ashcrow commented Jan 15, 2019

/test e2e-aws

@ashcrow
Copy link
Member

ashcrow commented Jan 15, 2019

/retest

@ashcrow
Copy link
Member

ashcrow commented Jan 15, 2019

/test e2e-aws

@openshift-merge-robot openshift-merge-robot merged commit d919918 into openshift:master Jan 16, 2019
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Jan 22, 2019
This is like
openshift#303
but for the node controller.

We really don't need to react *instantly* to start updating and
rebooting machines, and having a small delay will help avoid
races when MCs are created rapidly.
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
…ared

Bug 1854857: initial create errors should map to SamplesExists instead of ImageChangesInProgress
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants