@runcom runcom commented Feb 23, 2019

This patch does various things, all related:

Closes: #338
Closes: #385

Signed-off-by: Antonio Murdaca [email protected]

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 23, 2019

runcom commented Feb 23, 2019

/retest

@runcom runcom force-pushed the start-informeres-first branch from 3194d28 to 21563df Compare February 24, 2019 22:55
@cgwalters
Member

I think with stuff like this it'd help me a lot to have a "reference operator" codebase. There's the kube sample controller but our code is different enough I'm not sure if it's doing the same thing.

runcom commented Mar 6, 2019

> I think with stuff like this it'd help me a lot to have a "reference operator" codebase. There's the kube sample controller but our code is different enough I'm not sure if it's doing the same thing.

cool, and from that example operator it looks like informers are indeed started before the controller itself https://github.com/kubernetes/sample-controller/blob/master/main.go#L64-L71 - I've rebased this as well

/assign @LorbusChris
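
For readers without the sample controller open, here is a minimal sketch of the ordering being discussed: start the informer, wait for its cache to sync, and only then let the controller do work. It is stdlib-only so it runs anywhere; `fakeInformer`, `waitForCacheSync`, and `runController` are hypothetical stand-ins for illustration, not the client-go or MCO types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeInformer is a hypothetical stand-in for a shared informer: Start fills
// a local cache in the background and HasSynced reports when that is done.
type fakeInformer struct {
	mu     sync.Mutex
	synced bool
	cache  []string
}

// Start launches the simulated initial list and returns immediately.
func (i *fakeInformer) Start(stopCh <-chan struct{}) {
	go func() {
		select {
		case <-time.After(10 * time.Millisecond):
			i.mu.Lock()
			i.cache = []string{"master", "worker"}
			i.synced = true
			i.mu.Unlock()
		case <-stopCh:
		}
	}()
}

// HasSynced reports whether the initial list has been stored in the cache.
func (i *fakeInformer) HasSynced() bool {
	i.mu.Lock()
	defer i.mu.Unlock()
	return i.synced
}

// List returns a copy of whatever is currently cached.
func (i *fakeInformer) List() []string {
	i.mu.Lock()
	defer i.mu.Unlock()
	return append([]string(nil), i.cache...)
}

// waitForCacheSync polls until the informer has synced or stopCh is closed.
func waitForCacheSync(stopCh <-chan struct{}, inf *fakeInformer) bool {
	for !inf.HasSynced() {
		select {
		case <-stopCh:
			return false
		case <-time.After(time.Millisecond):
		}
	}
	return true
}

// runController does no work until the cache is populated.
func runController(inf *fakeInformer, stopCh <-chan struct{}) ([]string, error) {
	if !waitForCacheSync(stopCh, inf) {
		return nil, errors.New("failed to wait for caches to sync")
	}
	return inf.List(), nil
}

func main() {
	stopCh := make(chan struct{})
	defer close(stopCh)

	inf := &fakeInformer{}
	inf.Start(stopCh)                        // informers first
	pools, err := runController(inf, stopCh) // then the controller
	if err != nil {
		panic(err)
	}
	fmt.Println(pools)
}
```

In this sketch the controller blocks in `waitForCacheSync`, so calling `runController` before `Start` would hang until the stop channel closes. That is the kind of ordering problem that starting informers first is meant to avoid.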

@runcom runcom force-pushed the start-informeres-first branch 2 times, most recently from a12e170 to 4b7aaf4 Compare March 6, 2019 12:05
@runcom runcom force-pushed the start-informeres-first branch 2 times, most recently from 08f1f4a to 33edd4d Compare March 17, 2019 13:37
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 17, 2019
@runcom runcom changed the title controller_context: start informers first controllers: refactor code and start informers first Mar 17, 2019

runcom commented Mar 17, 2019

/retest

runcom commented Mar 17, 2019

This has been updated (re-read the summary in the first comment) and does fix #338 as well (by waiting on the template controller 🎉 ):

I0317 16:27:21.495296       1 node_controller.go:383] Pool master is unconfigured, pausing 5s for renderer to initialize
I0317 16:27:21.495462       1 node_controller.go:383] Pool worker is unconfigured, pausing 5s for renderer to initialize
I0317 16:27:21.495537       1 render_controller.go:380] Error syncing machineconfigpool master: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.495584       1 render_controller.go:380] Error syncing machineconfigpool worker: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.501140       1 render_controller.go:380] Error syncing machineconfigpool master: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.501180       1 render_controller.go:380] Error syncing machineconfigpool worker: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.511368       1 render_controller.go:380] Error syncing machineconfigpool master: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.511407       1 render_controller.go:380] Error syncing machineconfigpool worker: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.531622       1 render_controller.go:380] Error syncing machineconfigpool master: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.531647       1 render_controller.go:380] Error syncing machineconfigpool worker: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.571897       1 render_controller.go:380] Error syncing machineconfigpool worker: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.571897       1 render_controller.go:380] Error syncing machineconfigpool master: ControllerConfig has not completed: completed(false) running(false) failing(true)
I0317 16:27:21.671461       1 render_controller.go:500] Generated machineconfig worker-b058168e9ec42a1704343e84bb2dc7eb from 4 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  00-worker-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  }]
I0317 16:27:21.684235       1 render_controller.go:500] Generated machineconfig master-34908aca2b6075843cc77cd59449a0df from 4 configs: [{MachineConfig  00-master  machineconfiguration.openshift.io/v1  } {MachineConfig  00-master-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  01-master-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-master-kubelet  machineconfiguration.openshift.io/v1  }]

@runcom runcom changed the title controllers: refactor code and start informers first controllers: refactor code and start informers first, fix template/render race Mar 17, 2019
@runcom runcom force-pushed the start-informeres-first branch from c3bb896 to a2ad129 Compare March 17, 2019 22:33
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 17, 2019
This patch does various things, all related:

- start the informers for controllers before running them, following what
https://github.com/kubernetes/sample-controller/blob/master/main.go#L64-L71
does, to avoid races already spotted in unit tests
(openshift#457)
- move the clients builder code from cmd/common to a new package just for that
under the new internal/ folder so nobody but us can use it
- move common controller code under pkg/controller/common so it can be reused
- create an e2e-only clientset so we don't have to type the client version
every time (e.g. client.CoreV1()...). This is needed because if the API
changes in the future, we won't spend a week grepping to replace old APIs

Signed-off-by: Antonio Murdaca <[email protected]>
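
To illustrate the last bullet, here is a rough sketch of what an e2e-only clientset could look like: a struct that embeds the versioned client interfaces, so tests call methods directly instead of going through an accessor such as `client.CoreV1()`. Everything here (`CoreV1Interface`, `MachineconfigurationV1Interface`, their methods, and the fake implementations) is a simplified, hypothetical stand-in so the example runs with the standard library only; the real interfaces come from generated client packages.

```go
package main

import "fmt"

// CoreV1Interface is a hypothetical, trimmed-down stand-in for a versioned
// typed client.
type CoreV1Interface interface {
	ListNodes() []string
}

// MachineconfigurationV1Interface is likewise a hypothetical stand-in.
type MachineconfigurationV1Interface interface {
	ListMachineConfigPools() []string
}

// Fake implementations so the sketch runs without a cluster.
type fakeCoreV1 struct{}

func (fakeCoreV1) ListNodes() []string { return []string{"node-0"} }

type fakeMachineconfigurationV1 struct{}

func (fakeMachineconfigurationV1) ListMachineConfigPools() []string {
	return []string{"master", "worker"}
}

// ClientSet embeds the versioned interfaces. Their methods are promoted, so
// the API version is named once here rather than at every call site.
type ClientSet struct {
	CoreV1Interface
	MachineconfigurationV1Interface
}

// NewClientSet wires up the embedded clients (fakes in this sketch).
func NewClientSet() *ClientSet {
	return &ClientSet{
		CoreV1Interface:                 fakeCoreV1{},
		MachineconfigurationV1Interface: fakeMachineconfigurationV1{},
	}
}

func main() {
	cs := NewClientSet()
	// No cs.CoreV1() hop: the method is promoted from the embedded interface.
	fmt.Println(cs.ListNodes(), cs.ListMachineConfigPools())
}
```

With this shape, moving to a new API version would mean changing the embedded field type in one place, assuming the method set stays compatible.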
runcom added 2 commits March 18, 2019 00:59
The template controller is responsible for generating the initial MCs,
but the render controller can kick in before the template controller is
done. Add a sync mechanism that looks at the controller config status,
since that's the source of truth for whether a sync is done, including
when the controller config changes and the template controller runs again.

Signed-off-by: Antonio Murdaca <[email protected]>
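
A minimal sketch of the check described in this commit message, assuming a flattened status shape: `controllerConfig` and `checkControllerConfigCompleted` are hypothetical names for illustration, while the two error strings mirror the ones visible in the log output and the review snippet in this thread. The idea is that the render side returns an error, and is retried later, until the template side reports a completed sync for the current generation.

```go
package main

import "fmt"

// controllerConfig is a hypothetical, flattened view of the fields the check
// needs; the real object keeps them in metadata and status.
type controllerConfig struct {
	Name               string
	Generation         int64
	ObservedGeneration int64
	Completed          bool
	Running            bool
	Failing            bool
}

// checkControllerConfigCompleted returns nil only when the status was written
// for the current generation and reports a finished, healthy sync.
func checkControllerConfigCompleted(cc controllerConfig) error {
	if cc.Generation != cc.ObservedGeneration {
		return fmt.Errorf("status for ControllerConfig %s is being reported for %d, expecting it for %d",
			cc.Name, cc.ObservedGeneration, cc.Generation)
	}
	if cc.Completed && !cc.Running && !cc.Failing {
		return nil
	}
	return fmt.Errorf("ControllerConfig has not completed: completed(%v) running(%v) failing(%v)",
		cc.Completed, cc.Running, cc.Failing)
}

func main() {
	// Template controller has not finished yet: the render sync should back off.
	pending := controllerConfig{Name: "example", Generation: 1, ObservedGeneration: 1, Failing: true}
	fmt.Println(checkControllerConfigCompleted(pending))

	// Template controller is done for this generation: rendering may proceed.
	done := controllerConfig{Name: "example", Generation: 1, ObservedGeneration: 1, Completed: true}
	fmt.Println(checkControllerConfigCompleted(done))
}
```

Comparing the generation first matters for the "controller config changes" case: a status left over from an older generation says nothing about whether the template controller has processed the latest spec.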
@runcom runcom force-pushed the start-informeres-first branch from a2ad129 to 1e381d2 Compare March 17, 2019 23:59

runcom commented Mar 18, 2019

/retest

@kikisdeliveryservice
Contributor

LGTM! Really like some of the changes here, will leave it to @cgwalters / @LorbusChris to give it one more set of 👀

@cgwalters cgwalters left a comment

/lgtm

}

if cur.Generation != cur.Status.ObservedGeneration {
return fmt.Errorf("status for ControllerConfig %s is being reported for %d, expecting it for %d", cc[0].GetName(), cur.Status.ObservedGeneration, cur.Generation)

Feels like this isn't an error but eh.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 18, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 782e8e7 into openshift:master Mar 19, 2019

Successfully merging this pull request may close these issues.

fix initial race between template and render sub-controllers