Node stuck at deletion due to finalizer - regression #2669
Comments
The issue was also reproduced in v1.19.3+k3s3. Since the issue is pre-existing we don't want it to block our imminent releases. We'll mention this in the release notes. I've bumped the milestone to the next set of patch releases. This will need to get fixed in 1.19, 1.20, and master. The issue seems to happen intermittently and will not always be observed. The release notes will briefly explain the issue and workaround to fix the finalizers so the node(s) can be deleted. |
Note: To delete the node that is stuck during delete
|
@davidnuzik do you know what caused this issue? I'm curious about the root cause. |
etcd node controller |
Oops, where is the etcd node controller? |
@brandond hi, in my testing I have not found a reliable way to reproduce the stuck node for this bug. |
I'm not sure how to reproduce it on demand either. My understanding is that something on the rancher/wrangler side has an issue where onDelete handlers leave stuck finalizers. For this reason, onDelete handlers are to be avoided in general, and the code needs to be rewritten to periodically reconcile the Kubernetes and etcd cluster member lists, instead of relying on watching v1.Node. |
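To make the "periodically reconcile" idea concrete, here is a minimal sketch. It is not the k3s or wrangler implementation: `members_to_remove`, `reconcile_once`, and the three callables are hypothetical names, and it assumes etcd member names can be matched directly against Kubernetes node names.

```python
def members_to_remove(k8s_nodes, etcd_members):
    """Return etcd member names that have no matching Kubernetes node."""
    return sorted(set(etcd_members) - set(k8s_nodes))


def reconcile_once(list_nodes, list_members, remove_member):
    """Run one reconcile pass.

    The three callables are hypothetical hooks supplied by the caller:
    list_nodes() and list_members() return iterables of names, and
    remove_member(name) removes one etcd member.
    """
    stale = members_to_remove(list_nodes(), list_members())
    for name in stale:
        remove_member(name)
    return stale
```

Because each pass compares the two lists from scratch, nothing depends on a delete event being delivered, so no finalizer on the Node object is needed to hold the deletion open.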
Thanks for the clarification. Can't wait for a fix. |
I have been testing IaC code to rebuild rke2 nodes and ran into this issue. From what I can tell, it only affects control-plane nodes, which, as noted above, points to the etcd controller. When recreating the issue I checked whether the etcd member was removed, and it always was, so the controller did not fail to complete that task. One thing I have started checking is whether the node being deleted is the current etcd leader. I have added some logic to call 'move-leader' in that case and have yet to run into the same issue. |
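The leader check described in this comment could look roughly like the sketch below. It only plans the steps and does not talk to etcd. `plan_member_removal` and the step labels are hypothetical, and real `etcdctl move-leader` and `etcdctl member remove` calls take member IDs rather than the names used here.

```python
def plan_member_removal(node, leader, members):
    """Return the ordered steps for removing `node` from the etcd cluster.

    If `node` is the current leader, leadership is handed to another
    member first. Picking the first remaining member is an arbitrary
    choice made for this sketch.
    """
    steps = []
    if node == leader:
        candidates = [m for m in members if m != node]
        if not candidates:
            raise ValueError("no other member can take over leadership")
        steps.append(("move-leader", candidates[0]))
    steps.append(("member-remove", node))
    return steps
```

Whether moving the leader first actually avoids the stuck finalizer is only this commenter's observation so far, not a confirmed root cause.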
Validated on master branch using commitid
|
Thanks so much. I was stuck on the same issue for more than an hour until I saw this. |
it worked, thanks 👍 |
Environment
k3s version v1.19.4+k3s-989c9369 (989c936)
Issue
Deletion of a node is stuck due to the presence of a finalizer. This seems to be a regression.
Steps to reproduce
Deletion of a node is stuck, unless finalizer is removed.
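For illustration, one generic way to remove finalizers from a node is a JSON merge patch that empties `metadata.finalizers`. The sketch below only builds the patch and the `kubectl patch` argument list. The helper names are hypothetical, and this is an assumption about what "removing the finalizer" looks like in practice, not necessarily the exact workaround used in this thread.

```python
import json


def clear_finalizers_patch():
    """Build a JSON merge patch that sets metadata.finalizers to an empty list."""
    return json.dumps({"metadata": {"finalizers": []}})


def kubectl_patch_args(node_name):
    """Build the argument list for patching one node (hypothetical helper)."""
    return [
        "kubectl", "patch", "node", node_name,
        "--type=merge",
        "-p", clear_finalizers_patch(),
    ]
```

The resulting list can be passed to `subprocess.run`. Clearing finalizers skips whatever cleanup they were guarding, so it is worth confirming first that the etcd member has already been removed.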
Related issue rancher/rke2#401