🐛 Return from reconciler after adding finalizer #1464
Conversation
✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mdbooth
Force-pushed from 7387216 to d71d7aa
lentzi90 left a comment:
/lgtm
/hold cancel

Thank you for the fix and even more for the explanation.

Should we backport to release-0.7? I think it would be nice to get this in a release before the next minor 🙂

/cherry-pick release-0.7
@lentzi90: #1464 failed to apply on top of branch "release-0.7".

In response to this: /cherry-pick release-0.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This avoids an extremely surprising write-after-read consistency bug when later returning after making subsequent object changes.
The issue stems from the fact that we don't write to the same place we read from. We read from a cache: a shared informer which is populated from a watch running against the apiserver. However, we write directly to the apiserver rather than writing through the cache.
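As a minimal sketch of that asymmetry (a hypothetical reconciler, using the core Pod type rather than this project's CRDs): the client a controller-runtime manager hands out serves reads from the shared informer cache while sending writes straight to the apiserver.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Reconciler embeds the manager-built client: cached reads, direct writes.
type Reconciler struct {
	client.Client
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	pod := &corev1.Pod{}
	// Get is served from the shared informer cache, not the apiserver.
	if err := r.Get(ctx, req.NamespacedName, pod); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	patch := client.MergeFrom(pod.DeepCopy())
	if pod.Labels == nil {
		pod.Labels = map[string]string{}
	}
	pod.Labels["example"] = "touched"
	// Patch goes directly to the apiserver; our own cache only sees the
	// change later, when it arrives back through the watch.
	return ctrl.Result{}, r.Patch(ctx, pod, patch)
}
```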
If we patch the watched object without returning, then while we continue to execute, the change will be written to the apiserver and will propagate to our own shared informer. This will result in a new reconcile being queued for this version of the object, which now has a Finalizer but no other changes.
Taking the case of the machine controller, we will then go on to create a server, patch the object with a providerID, and return. On return we will write this change to the apiserver, and it will eventually propagate to our shared informer and result in another reconcile. However, our previous patch has already propagated, so we will be *immediately* called with the *old version* of the object. If we've been careful this will hopefully be no worse than inefficient, but it can lead to very hard-to-debug errors.
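Continuing the sketch above, the racy flow looks roughly like this. The finalizer value and `createServer` helper are hypothetical stand-ins, and the core cluster-api `Machine` type stands in for the actual CAPO types; assume the extra imports `clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"` and `"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"`.

```go
// machineFinalizer is a hypothetical finalizer value for this sketch.
const machineFinalizer = "example.infrastructure.cluster.x-k8s.io"

// Anti-pattern: patch the finalizer, keep executing, patch providerID on
// return. The finalizer-only version of the object is already on its way
// back to us through the watch.
func (r *Reconciler) reconcileRacy(ctx context.Context, machine *clusterv1.Machine) (ctrl.Result, error) {
	patch := client.MergeFrom(machine.DeepCopy())
	controllerutil.AddFinalizer(machine, machineFinalizer)
	if err := r.Patch(ctx, machine, patch); err != nil {
		return ctrl.Result{}, err
	}
	// BUG: we keep executing. The finalizer patch propagates to our
	// informer and queues a reconcile for that intermediate version.

	providerID, err := createServer(ctx, machine) // long-running cloud call
	if err != nil {
		return ctrl.Result{}, err
	}

	patch = client.MergeFrom(machine.DeepCopy())
	machine.Spec.ProviderID = &providerID
	// The reconcile queued by the finalizer patch can run before this
	// second patch reaches the cache, handing us the old object.
	return ctrl.Result{}, r.Patch(ctx, machine, patch)
}

// createServer is a hypothetical stand-in for server creation.
func createServer(ctx context.Context, m *clusterv1.Machine) (string, error) {
	return "openstack:///00000000-0000-0000-0000-000000000000", nil
}
```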
The safe way to patch objects when reconcile is called from an informer is to always return after patching an object, since the patch will directly result in another reconcile. We also *must not* set Requeue in the returned Result, as this would result in being called again with the stale object. When patching an object we must return and wait for the informer to see the change and call the reconciler again.
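The same sketch restructured safely, reusing the hypothetical names above: patch, return an empty `Result`, and let the watch event drive the next reconcile.

```go
// Safe pattern: after adding the finalizer, patch and return an empty
// Result. Do NOT set Requeue: that would re-run Reconcile immediately
// with the stale cached object instead of waiting for the watch event.
func (r *Reconciler) reconcileSafe(ctx context.Context, machine *clusterv1.Machine) (ctrl.Result, error) {
	if !controllerutil.ContainsFinalizer(machine, machineFinalizer) {
		patch := client.MergeFrom(machine.DeepCopy())
		controllerutil.AddFinalizer(machine, machineFinalizer)
		if err := r.Patch(ctx, machine, patch); err != nil {
			return ctrl.Result{}, err
		}
		// The patch itself triggers the next reconcile: once the
		// informer sees the new version, we are called again with it.
		return ctrl.Result{}, nil
	}

	// On the next invocation the cached object already carries the
	// finalizer, so server creation and the providerID patch can run
	// as a separate, final step.
	return ctrl.Result{}, nil
}
```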
/hold