
@cgwalters (Member)

Rather than poll all of the daemons, add a helper that waits
for a pool to complete a config.

One of our tests walks over the MCDs (machine-config daemons); change it
to just assert on all of the nodes.

The SSH test can also just wait for a pool and then `rsh` to
each node.

@openshift-ci-robot added the size/L (changes 100-499 lines, ignoring generated files) and approved labels May 16, 2019
Member

Isn't this racy if we jump here before the pool starts rolling? I believe that's why I went node by node, confirming that the current config is there and is the new pool config.

Member

OK, you added a check below, but could this flake then?

Member Author

Just above that we do `waitForRenderedConfig`, right?

@cgwalters (Member Author) May 16, 2019

For the record, though, I didn't test this much locally; I created a PR to have CI do it.

Member

Oh yeah, I had an old master on my laptop and didn't notice that. Cool, so no flake and no race 🎉

@cgwalters (Member Author)

Hmm, so that run failed; I added some more debug logs to help me figure it out. Offhand, I think there is a race here: the tests were relying on checking for config=X and status=Updated, but we don't update those two fields atomically. Really, it feels like we should have `Spec.Configuration` and `Status.Configuration` just like other objects, right?
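To make the race concrete, here is a small Go simulation under assumed, simplified types (`poolSnapshot`, `naiveDone`, `firstDone` and the `rollout` sequence are all hypothetical). If the config name and the `Updated` status are written in separate steps, a test that checks `config == X && Updated` can observe the new name next to a stale `Updated` left over from the previous rollout, and return before any node has moved.

```go
package main

// poolSnapshot is a hypothetical, simplified view of what a test sees when it
// reads the pool at one instant. NodesOnTarget is ground truth the test
// cannot observe; it is included only to show when the naive check is wrong.
type poolSnapshot struct {
	Configuration string
	Updated       bool
	NodesOnTarget bool
}

// naiveDone is the "config=X and status=Updated" check the tests relied on.
func naiveDone(s poolSnapshot, target string) bool {
	return s.Configuration == target && s.Updated
}

// firstDone returns the index of the first snapshot for which naiveDone
// holds, or -1 if it never does.
func firstDone(seq []poolSnapshot, target string) int {
	for i, s := range seq {
		if naiveDone(s, target) {
			return i
		}
	}
	return -1
}

// rollout models one possible interleaving when the two fields are written
// separately: the config name changes before Updated is cleared.
var rollout = []poolSnapshot{
	{Configuration: "rendered-a", Updated: true, NodesOnTarget: false},  // previous rollout finished
	{Configuration: "rendered-b", Updated: true, NodesOnTarget: false},  // new config written, Updated is stale
	{Configuration: "rendered-b", Updated: false, NodesOnTarget: false}, // pool marked as updating
	{Configuration: "rendered-b", Updated: true, NodesOnTarget: true},   // all nodes really on rendered-b
}
```

Here `firstDone(rollout, "rendered-b")` fires at the second snapshot, while no node is on `rendered-b` yet; a fixed sleep only hides this by making it unlikely that the test reads the pool inside that window.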

@kikisdeliveryservice (Contributor) commented May 16, 2019

Would checking that the state is Done before checking that the configs match clarify what's happening (i.e., switch the two asserts)?

assert.Equal(t, constants.MachineConfigDaemonStateDone, node.Annotations[constants.MachineConfigDaemonStateAnnotationKey])
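A sketch of the reordered check, written as a plain Go helper rather than testify asserts so it stays self-contained. The annotation keys and the `Done` value are assumed to match the MCO constants referenced above (`MachineConfigDaemonStateAnnotationKey`, `MachineConfigDaemonStateDone`, and the current-config annotation); `checkNodeAt` is a hypothetical name. Checking the daemon state first means a node that is still mid-update fails with a message about its state instead of a less informative config mismatch.

```go
package main

import "fmt"

// Assumed values of the MCO node annotation keys and the daemon "Done"
// state; verify against the constants package before relying on them.
const (
	currentConfigKey = "machineconfiguration.openshift.io/currentConfig"
	stateKey         = "machineconfiguration.openshift.io/state"
	stateDone        = "Done"
)

// checkNodeAt reports whether a node's annotations say its daemon is Done
// and its current config is target. The state is checked first, so a node
// that is still working yields a state error rather than a config mismatch.
func checkNodeAt(annotations map[string]string, target string) error {
	if st := annotations[stateKey]; st != stateDone {
		return fmt.Errorf("daemon state is %q, want %q", st, stateDone)
	}
	if cur := annotations[currentConfigKey]; cur != target {
		return fmt.Errorf("current config is %q, want %q", cur, target)
	}
	return nil
}
```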

@cgwalters (Member Author)

OK, this test passes "locally" now; I added a `time.Sleep(13*time.Second)` as a temporary hack. I'm pretty sure the race is the one in #765 (comment); I'm going to take a quick look at a PR for that.

@cgwalters (Member Author)

Hooray, e2e-aws-op passed!

#773 should help us avoid the race.

@kikisdeliveryservice (Contributor)

I've seen these weird e2e-aws-upgrade failures before and am checking them out elsewhere. Let's try again:
/test e2e-aws-upgrade

@cgwalters (Member Author)

Rebased 🏄‍♂️ and CI is good; can I get an lgtm?

@runcom (Member) commented May 17, 2019

/lgtm

@openshift-ci-robot added the lgtm label May 17, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom


@openshift-merge-robot merged commit e861ccb into openshift:master May 17, 2019
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request May 22, 2019
See openshift#765 (comment)

MachineConfigPool needs a `Spec.Configuration` and `Status.Configuration`
[just like other objects][1] so that we can properly detect state.
Currently there's a race because the render controller may set `Status.Configuration`
while the pool's `Status` still has `Updated`, so one can't reliably check whether the
pool is at a given config.

With this, ownership is clear: the render controller sets the spec, and the node controller
updates the status.

[1] https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
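A minimal sketch of the proposed ownership split, using hypothetical simplified types that are not the real MCO API: only the render controller writes `Spec.Configuration`, only the node controller writes `Status`, and it copies the spec's config into status once every node has reached it. A waiter then requires spec and status to agree on the target, so a stale `Updated` no longer produces a false positive.

```go
package main

// Hypothetical simplified types mirroring the proposed spec/status split.
type ConfigRef struct{ Name string }

type PoolSpec struct{ Configuration ConfigRef }

type PoolStatus struct {
	Configuration ConfigRef
	Updated       bool
}

type Pool struct {
	Spec   PoolSpec
	Status PoolStatus
}

// renderSetsSpec models the render controller: it only ever writes the spec.
func renderSetsSpec(p *Pool, rendered string) {
	p.Spec.Configuration.Name = rendered
}

// nodeControllerSync models the node controller: it only writes status, and
// only reports the spec's config once every node is running it.
func nodeControllerSync(p *Pool, nodeConfigs []string) {
	for _, c := range nodeConfigs {
		if c != p.Spec.Configuration.Name {
			p.Status.Updated = false
			return
		}
	}
	p.Status.Configuration = p.Spec.Configuration
	p.Status.Updated = true
}

// poolAt reports whether the pool has fully converged on target.
func poolAt(p Pool, target string) bool {
	return p.Spec.Configuration.Name == target &&
		p.Status.Configuration.Name == target &&
		p.Status.Updated
}
```

Right after `renderSetsSpec`, `Status.Updated` may still be true from the previous rollout, but `Status.Configuration` still names the old config, so `poolAt` correctly stays false until the node controller catches up.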
cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Jun 14, 2019