
Conversation

@yuqi-zhang
Contributor

Change existing watch workflow to an informer instead, as watches
are prone to breaking and restarting the MCD. The Run() function
within the MCD is changed from an infinite loop to a blocking call,
and updates are handled by callbacks via the informer.
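
In sketch form (illustrative names, not the exact MCD code), the new shape of Run() is roughly:

  package daemon

  import (
      "fmt"
      "time"

      "k8s.io/client-go/informers"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/cache"
  )

  // Run registers callbacks on a node informer, syncs the cache, then blocks
  // on the stop channel instead of looping over a raw watch.
  func Run(client kubernetes.Interface, stopCh <-chan struct{}) error {
      factory := informers.NewSharedInformerFactory(client, 30*time.Second)
      nodeInformer := factory.Core().V1().Nodes()

      nodeInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
          UpdateFunc: func(old, new interface{}) {
              // handleUpdate(old, new) // hypothetical update handler
          },
      })

      factory.Start(stopCh)
      if !cache.WaitForCacheSync(stopCh, nodeInformer.Informer().HasSynced) {
          return fmt.Errorf("failed to sync node informer cache")
      }

      // The informer machinery transparently survives broken watches by
      // re-listing and re-watching, so Run just blocks until told to stop.
      <-stopCh
      return nil
  }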

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2018
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 15, 2018
@yuqi-zhang
Contributor Author

Marking WIP as the newest version isn't fully tested pending other issues.

cc/ @jlebon @sdemos @ashcrow

@ashcrow
Member

ashcrow commented Oct 16, 2018

Makes sense so far. I recommend rebasing though as I think there has been a commit or two in the areas you're updating.

@abhinavdahiya
Contributor

You are using an informer that reacts to changes from all the nodes. Please only watch changes to the node the daemon was scheduled on.

@sdemos
Contributor

sdemos commented Oct 16, 2018

@abhinavdahiya do you have a good example of an informer that only watches one instance of a resource? I wasn't sure if that was possible. Right now the update callback checks whether it's the node it cares about and returns immediately if it's not, which I thought would be enough.

@sdemos sdemos mentioned this pull request Oct 18, 2018
@yuqi-zhang
Contributor Author

yuqi-zhang commented Oct 18, 2018

Rebased.
@abhinavdahiya that's a good point, but I'm not sure what the proper method to do so would be. Am I able to just set up an informer on a single node instead of KubeInformerFactory.Core().V1().Nodes()? Right now the handler immediately checks for the node, so there shouldn't be a significant performance penalty.
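
One possible shape for such a single-node informer, sketched with client-go's filtered factory (assumed names throughout; NewFilteredSharedInformerFactory may not be present in the vendored client-go 8.0.0):

  package daemon

  import (
      "time"

      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/apimachinery/pkg/fields"
      "k8s.io/client-go/informers"
      "k8s.io/client-go/kubernetes"
  )

  // newNodeScopedFactory returns an informer factory whose list/watch calls
  // are restricted to the single node this daemon is scheduled on.
  func newNodeScopedFactory(client kubernetes.Interface, nodeName string) informers.SharedInformerFactory {
      return informers.NewFilteredSharedInformerFactory(
          client,
          30*time.Second,
          metav1.NamespaceAll,
          func(opts *metav1.ListOptions) {
              // Only list/watch the node with this name.
              opts.FieldSelector = fields.OneTermEqualSelector("metadata.name", nodeName).String()
          },
      )
  }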

@ashcrow
Member

ashcrow commented Oct 19, 2018

/retest

@yuqi-zhang yuqi-zhang changed the title from "WIP: daemon: use informers to check for updates" to "daemon: use informers to check for updates" Oct 22, 2018
@yuqi-zhang
Contributor Author

Added some fixes and removed WIP:

  • instead of panicking, reboot the node after setting degraded. This makes it consistent with other failures (the node will sleep post-reboot)
  • add a check to continue sleeping upon updates while we're degraded, to preserve the previous run's logs (with the actual failures)
  • stop checking for node updates and deletions in the informer (these cause race conditions when an update has not fully applied, and we handle those cases elsewhere regardless)

It is possible to filter which nodes to watch, albeit in a roundabout way that adds complexity. I feel that the check in handleUpdate is working and doesn't cause performance issues that would warrant a custom informer. Note that there is the option to use UntilWithSync (https://github.com/kubernetes/client-go/blob/master/tools/watch/until.go#L98) sometime in the future (basically a custom informer that mimics our old workflow without causing issues), but that is in v9.0.0 (we use 8.0.0: https://github.com/openshift/machine-config-operator/blob/master/Gopkg.toml#L60), so I personally think the fix here should work for now.

I've tested with updates to configs, as well as leaving it running for 6+ hours; it hasn't broken or hot-looped so far. I have the latest image built at docker.io/jerzhang/origin-machine-config-daemon:v3.11.0-178-g936c032f if you want to try it yourself.
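
For reference, a rough sketch of the UntilWithSync option mentioned above (client-go v9+; function names are assumptions, and the annotation keys are taken from the logs later in this thread):

  package daemon

  import (
      "context"

      corev1 "k8s.io/api/core/v1"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/apimachinery/pkg/fields"
      "k8s.io/apimachinery/pkg/watch"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/cache"
      watchtools "k8s.io/client-go/tools/watch"
  )

  // waitForConfigMismatch blocks until the node's currentConfig and
  // desiredConfig annotations diverge, mimicking the old watch workflow
  // on top of an informer-backed watcher.
  func waitForConfigMismatch(ctx context.Context, client kubernetes.Interface, nodeName string) error {
      lw := cache.NewListWatchFromClient(
          client.CoreV1().RESTClient(), "nodes", metav1.NamespaceAll,
          fields.OneTermEqualSelector("metadata.name", nodeName),
      )
      _, err := watchtools.UntilWithSync(ctx, lw, &corev1.Node{}, nil,
          func(ev watch.Event) (bool, error) {
              node, ok := ev.Object.(*corev1.Node)
              if !ok {
                  return false, nil
              }
              a := node.Annotations
              // Stop waiting once the annotations no longer match.
              return a["machineconfiguration.openshift.io/currentConfig"] !=
                  a["machineconfiguration.openshift.io/desiredConfig"], nil
          })
      return err
  }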

@yuqi-zhang
Contributor Author

@ashcrow @sdemos PTAL

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2018
Member

No update needed, but I'm curious: was there a specific reason to do the informer setup after the Daemon creation? Or is it more that it reads better to you this way, rather than:

return &Daemon{
        ....
        nodeLister:       nodeInformer.Lister(),
        nodeListerSynced: nodeInformer.Informer().HasSynced,
}, nil

Contributor Author

I noticed that elsewhere we split off the Lister, etc. (e.g. https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/node/node_controller.go#L105). I can change it to your suggested method if you'd like.

Member

No, this is fine. I was just curious 😄

Member

I think a comment here would be helpful. Seeing the channel being closed seems odd unless you've gone through the informer docs.

@ashcrow ashcrow left a comment
Member

One question, and one nit. Neither are blockers.

@ashcrow
Member

ashcrow commented Oct 23, 2018

Will let @sdemos and/or @jlebon and/or @kikisdeliveryservice review and pull the merge lever if they also 👍.

Member

Minor: indentation looks off here.

Contributor Author

will fix

Member

Why change the run() and process() split here? For one thing, the split makes the error handling a bit less painful in all the calls (i.e. we're just marking degraded on error in one spot instead of on all the exit paths).

Contributor Author

My initial thought was that process() was more of a loop processing commands, whereas now it just blocks at <-stop and callbacks happen instead. This is also somewhat consistent with how we use informers elsewhere in this repo. I have no strong feelings either way, so I can change it if you feel that makes more sense.

Member

IMHO I think it's ok that Run and process are not split in this PR since the work in #126 was going to have to change process and Run significantly anyway.

Member

Do I understand correctly that there's nothing preventing this from running at the same time as the initial triggerUpdate()? Or do callbacks not actually occur until after cache.WaitForCacheSync()?

Contributor Author

Good point. I haven't seen this cause an issue/race condition (even with repeated failed updates, when it should have triggered), but let me double-check.

Member

I think I'd rather we stop watching and exit on error conditions in this callback function and instead only have the "sleep on Degraded" logic at the beginning of the daemon.

@yuqi-zhang yuqi-zhang Oct 23, 2018
Contributor Author

Sure, should I maybe instead not start the informer if we are degraded? So we never even get to this place?

@jlebon jlebon Oct 23, 2018
Member

Definitely! (I thought that was already the case -- when are callbacks actually active? Is it at nodeInformer.Informer().AddEventHandler() time? In that case, yeah we should move that to after checking if the node is even degraded.)

But also in this function if the node is degraded, let's just stop watching for updates and exit?

@yuqi-zhang
Contributor Author

Added a fixup to move the state checking before the informer starts, which degrades early (and no longer starts the informer) and prevents race conditions between multiple updates. Will squash if it makes sense. Tested.

@jlebon PTAL
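
In sketch form, the reordering described above amounts to something like this (state names, logging, and the surrounding wiring are assumed, not the actual MCD code):

  // Before wiring up the informer, check the persisted node state:
  if state == stateDegraded { // hypothetical state value from CheckStateOnBoot
      // Degrade early: never start the informer, just sleep forever so the
      // logs from the previous failed run stay available for inspection.
      glog.Info("node is degraded; not starting the informer")
      select {}
  }
  // Only now start the informer and begin receiving update callbacks.
  factory.Start(stopCh)
  cache.WaitForCacheSync(stopCh, nodeInformer.Informer().HasSynced)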

@sdemos
Contributor

sdemos commented Oct 24, 2018

Changes all make sense to me, and the logic looks good. Leaving it to @jlebon to confirm the changes he requested.

Member

Minor: at this point, we might as well move these to runStartCmd before we call CheckStateOnBoot?

Contributor Author

Good point, will fix

Member

IIUC, there's a race condition here if the node annotation is modified after CheckStateOnBoot(), but before cache is synced and we can start receiving callbacks, right?

Here's an idea: we spawn another goroutine that just listens for notifications on a channel before doing triggerUpdate(). Then in handleUpdate we send a notification. But crucially, we also send a notification after WaitForCacheSync() is done if annotations don't match. That way updates are still always serialized, and we close the race condition.
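
Sketched out, the idea could look like this (dn and triggerUpdate are stand-ins for the real daemon code, not actual identifiers):

  // Inside the daemon's Run(), before starting the informer:
  updateCh := make(chan struct{}, 1)

  // The worker goroutine is the only caller of triggerUpdate, so updates
  // stay serialized.
  go func() {
      for range updateCh {
          dn.triggerUpdate() // hypothetical; stands in for the real update path
      }
  }()

  // notify queues at most one pending notification and never blocks.
  notify := func() {
      select {
      case updateCh <- struct{}{}:
      default:
      }
  }

  // In handleUpdate: call notify() instead of triggerUpdate() directly.
  // After cache.WaitForCacheSync(...): if currentConfig != desiredConfig,
  // call notify() once more to cover changes that landed before callbacks
  // were active, closing the startup race.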

Contributor Author

Yes, I thought about this as well, but while testing it seems that the notifications are still delivered when the informer actually starts listening (maybe I'm misunderstanding; I used to have an addNode callback as well, which triggered even though the add happened before the informer started listening).

Also, during testing, I found that the callback fires quite frequently for the node (once every couple of seconds), so updates are generally caught eventually (since we only care about differences in CC vs DC anyway).

@yuqi-zhang
Contributor Author

yuqi-zhang commented Oct 25, 2018

@jlebon pushed a minor fixup

As discussed, it is probably better to handle this via a new goroutine to ensure no updates are lost, but for the time being this is unlikely to cause any race conditions. Instead, I've PR'ed a fix for the other race condition at #145.

We will likely improve this behaviour as follow-up work. If you think this makes sense, I will squash the commits.

Thanks for the reviews!

Edit: will rebase #145 as well when squashing.

@jlebon
Member

jlebon commented Oct 25, 2018

Yup sure, SGTM. To be fair, a race existed even before this PR (i.e. in waitUntilUpdate between comparing dc to cc and actually starting to watch for annotation updates).

@ashcrow
Member

ashcrow commented Oct 25, 2018

/test images

@ashcrow
Member

ashcrow commented Oct 26, 2018

/test e2e-aws

@ashcrow
Member

ashcrow commented Oct 26, 2018

@yuqi-zhang needs a rebase.

Change existing watch workflow to an informer instead, as watches
are prone to breaking and restarting the MCD. The Run() function
within the MCD is changed from an infinite loop to a blocking call,
and updates are handled by callbacks via the informer.

Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
@yuqi-zhang
Contributor Author

Rebased, squashed, and re-tested locally

@jlebon
Member

jlebon commented Oct 26, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlebon, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2018
@openshift-merge-robot openshift-merge-robot merged commit 2d42cab into openshift:master Oct 26, 2018
@yuqi-zhang yuqi-zhang deleted the fix-mcd-node-watch branch August 26, 2019 15:08
wking added a commit to wking/machine-config-operator that referenced this pull request May 14, 2020
AddFunc docs in [1,2].  We've been setting an UpdateFunc on this
informer since we pivoted to node informers in d67e633 (daemon: use
informers to check for updates, 2018-10-26, openshift#130).  But we need to
listen for add events to catch situations where the node's
desiredConfig is altered before the informer's initial List call.
This avoids situations like [3,4]:

  I0514 03:27:54.773485    4242 update.go:1346] Starting to manage node: ci-op-9gpgc-w-d-hdjg2.c.openshift-gce-devel-ci.internal
  ...
  I0514 03:42:37.392240    4242 request.go:1068] Response Body: {"kind":"Node","apiVersion":"v1","metadata":{"name":"ci-op-9gpgc-w-d-hdjg2.c.openshift-gce-devel-ci.internal"..."machineconfiguration.openshift.io/desiredConfig":"rendered-node-log-level-14137156-e501-4d3b-9a93-7fff577b45e5-a048da8ad1c08dce6dc6aa3c1101fb44"...
  ...
  I0514 03:43:18.111048    1897 update.go:1346] Starting to manage node: ci-op-9gpgc-w-d-hdjg2.c.openshift-gce-devel-ci.internal
  ...
  I0514 03:43:21.844234    1897 request.go:1068] Response Body: {"kind":"NodeList","apiVersion":"v1","metadata":{"selfLink":"/api/v1/nodes","resourceVersion":"28107"},"items":[{"metadata":{"name":"ci-op-9gpgc-m-0.c.openshift-gce-devel-ci.internal"..."machineconfiguration.openshift.io/currentConfig":"rendered-master-9f55fba0ad22a8069d04b6ddf87b8ed9","machineconfiguration.openshift.io/desiredConfig":"rendered-master-9f55fba0ad22a8069d04b6ddf87b8ed9"...
  ...
  I0514 03:48:15.568762    1897 daemon.go:1119] Updating Node ci-op-9gpgc-w-d-hdjg2.c.openshift-gce-devel-ci.internal
  ...
  I0514 03:48:16.569041    1897 daemon.go:368] Started syncing node "ci-op-9gpgc-w-d-hdjg2.c.openshift-gce-devel-ci.internal" (2020-05-14 03:48:16.569022688 +0>
  I0514 03:48:16.590279    1897 daemon.go:767] Current config: rendered-worker-a048da8ad1c08dce6dc6aa3c1101fb44
  I0514 03:48:16.590302    1897 daemon.go:768] Desired config: rendered-node-log-level-14137156-e501-4d3b-9a93-7fff577b45e5-a048da8ad1c08dce6dc6aa3c1101fb44

That was:

1. Somebody sets desiredConfig to rendered-node-log-level-... while
   the old MCD (PID 4242) is still running.
2. New MCD (PID 1897) comes up, and fires up its Node informer.
3. Informer does a List while starting up, and sees the node with
   desiredConfig set to rendered-node-log-level-...  Goes to call
   AddFunc, but until this commit we weren't setting it, so MCD does
   nothing.
4. Informer sets up its Watch.
5. Time passes...
6. Something else changes the managed Node (kubelet heartbeat?  Who cares?), Watch trips, UpdateFunc called.
7. MCD notices the desiredConfig and begins applying rendered-node-log-level-...

With this commit, we'll react at step 3 instead of 7, hopefully fixing
[5].

[1]: https://pkg.go.dev/k8s.io/client-go/tools/cache?tab=doc#ResourceEventHandlerFuncs
[2]: https://pkg.go.dev/k8s.io/client-go/tools/cache?tab=doc#ResourceEventHandlerFuncs.OnAdd
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1730/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/2250
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1730/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/2250/artifacts/e2e-gcp-op/pods/openshift-machine-config-operator_machine-config-daemon-pq5wv_machine-config-daemon.log
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1834895
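
In sketch form, the fix registers an AddFunc alongside the existing UpdateFunc (handler names here are illustrative, not the actual MCD identifiers):

  nodeInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
      // AddFunc fires once per object in the informer's initial List, so a
      // desiredConfig set before this daemon started is seen at step 3
      // above instead of waiting for a later Watch event.
      AddFunc: func(obj interface{}) { dn.handleNodeEvent(obj) }, // hypothetical handler
      UpdateFunc: func(old, new interface{}) { dn.handleNodeEvent(new) },
  })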
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
BUG 1697644: operator: use openshift-config/pull-secret to fetch the pull secret