Node stuck at deletion due to finalizer - regression #2669

Closed
ShylajaDevadiga opened this issue Dec 9, 2020 · 13 comments

@ShylajaDevadiga
Contributor

Environment
k3s version v1.19.4+k3s-989c9369 (989c936)
Issue
Deleting a node gets stuck because of the finalizers on it. This appears to be a regression.

Steps to reproduce

  1. Create a 3 node cluster
  2. Verify nodes are Ready and pods are Running
  3. Delete node3 using kubectl delete node
    The deletion hangs unless the node's finalizers are removed.
--
    finalizers:
    - wrangler.cattle.io/managed-etcd-controller
    - wrangler.cattle.io/node
    labels:
--
          f:finalizers:
            .: {}
            v:"wrangler.cattle.io/managed-etcd-controller": {}
            v:"wrangler.cattle.io/node": {}
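To check whether a node is in this state (marked for deletion but still held by finalizers), one can look for a deletionTimestamp alongside a non-empty finalizers list. A minimal sketch, assuming kubectl access to the cluster; node_stuck_deleting is a hypothetical helper name, not part of k3s:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the named node has been marked
# for deletion (deletionTimestamp is set) but still lists one or more finalizers.
node_stuck_deleting() {
  node="$1"
  # jsonpath prints an empty string for fields that are not set.
  ts=$(kubectl get node "$node" -o jsonpath='{.metadata.deletionTimestamp}')
  fin=$(kubectl get node "$node" -o jsonpath='{.metadata.finalizers}')
  [ -n "$ts" ] && [ -n "$fin" ]
}
```

For example: node_stuck_deleting node3 && echo "node3 is waiting on finalizers"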

Related issue rancher/rke2#401

@davidnuzik
Contributor

davidnuzik commented Dec 9, 2020

The issue was also reproduced in v1.19.3+k3s3. Since the issue is pre-existing, we don't want it to block our imminent releases. We'll mention it in the release notes. I've bumped the milestone to the next set of patch releases. This will need to be fixed in 1.19, 1.20, and master.
We'll see if there is a workaround so that this can also be indicated in the release notes.

The issue seems to happen intermittently and will not always be observed. The release notes will briefly explain the issue and the workaround of clearing the finalizers so that the node(s) can be deleted.

@davidnuzik davidnuzik changed the title Node stuck at deletion due to finalizer - regression [master, 1.19, 1.20] Node stuck at deletion due to finalizer - regression Dec 9, 2020
@ShylajaDevadiga
Contributor Author

ShylajaDevadiga commented Dec 9, 2020

Note: To delete a node that is stuck in deletion, clear its finalizers:

kubectl get node -o name <nodename> | xargs -I {} kubectl patch {} -p '{"metadata":{"finalizers":[]}}' --type=merge
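The same workaround can also be written without the xargs pipeline by patching the node object directly. A minimal sketch, wrapped in a hypothetical clear_node_finalizers function that first prints the finalizers it is about to discard. In a JSON merge patch an array is replaced as a whole, so sending an empty list clears every finalizer. Keep in mind that this skips whatever cleanup the controllers were holding the node for; presumably the managed-etcd-controller finalizer is there so the node's etcd member gets removed, so it may be worth confirming that member is gone before forcing the deletion through.

```shell
#!/bin/sh
# Hypothetical helper wrapping the workaround above: show the node's
# current finalizers, then clear them so the pending deletion can complete.
clear_node_finalizers() {
  node="$1"
  echo "finalizers on $node: $(kubectl get node "$node" -o jsonpath='{.metadata.finalizers}')"
  # Merge patch: the empty array replaces the existing finalizers list.
  kubectl patch node "$node" --type=merge -p '{"metadata":{"finalizers":[]}}'
}
```

Usage: clear_node_finalizers node3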

@xiaods
Contributor

xiaods commented Apr 17, 2021

@davidnuzik do you know what caused this issue? I'm curious about the root cause.

@brandond
Member

etcd node controller

@xiaods
Contributor

xiaods commented Apr 21, 2021

Oops, where is the etcd node controller?

@brandond
Member

@xiaods
Contributor

xiaods commented Apr 26, 2021

@brandond hi, in my testing I have not found a way to reliably reproduce the stuck node.
The issue above describes the bug on v1.19.4; I am not sure whether the latest version, 1.19.10, resolves it.

@brandond
Member

I'm not sure how to reproduce it on demand either. My understanding is that something on the rancher/wrangler side has an issue where onDelete handlers leave stuck finalizers. For this reason, onDelete handlers are to be avoided in general, and the code needs to be rewritten to periodically reconcile the Kubernetes and etcd cluster member lists, instead of relying on watching v1.Node.
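The reconcile approach described here comes down to comparing two lists on every pass and acting on the difference, instead of reacting to Node delete events. A minimal sketch of that comparison in shell, assuming the Kubernetes node names and the etcd member names are available one per line in two files (for example gathered from kubectl get nodes and etcdctl member list). It also assumes each member name equals its node name exactly, which may not hold in k3s; stale_etcd_members is a hypothetical name, and the actual controller is written in Go and may match nodes to members differently.

```shell
#!/bin/sh
# Hypothetical sketch of one reconcile pass: given a file of current
# Kubernetes node names and a file of etcd member names (one per line),
# print the etcd members that no longer have a matching node.
# Assumes member names equal node names exactly.
stale_etcd_members() {
  nodes_file="$1"
  members_file="$2"
  # -v: invert match, -x: whole-line match, -F: fixed strings,
  # -f: read the patterns (node names) from a file.
  grep -vxF -f "$nodes_file" "$members_file"
}
```

A controller built this way would run the comparison on a timer and remove each member it prints, so a missed delete event is corrected on the next pass rather than leaving a finalizer behind.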

@xiaods
Contributor

xiaods commented Apr 28, 2021

Thanks for the clarification. Can't wait to see it fixed.

@mitchellmaler

I have been testing IaC code to rebuild rke2 nodes and ran into this issue. From what I can tell it only affects control-plane nodes, which, as noted above, points to the etcd controller. When recreating the issue I checked whether the etcd member had been removed, and it always had been, so the controller did not fail to complete that task. One thing I have started checking is whether the node being deleted is the current etcd leader. I have added some logic to call 'move-leader' when it is, and I have not run into the issue since.
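The mitigation described here (moving etcd leadership off a node before deleting it) can be sketched as a small guard. A minimal sketch, assuming you already know the hex member IDs of the current leader, of the node being deleted, and of another healthy member to hand leadership to, and that etcdctl is configured with TLS certificates and with --endpoints that include the current leader. maybe_move_leader is a hypothetical name, and this is the commenter's workaround, not a confirmed fix for the finalizer bug.

```shell
#!/bin/sh
# Hypothetical guard run before deleting a control-plane node:
# if that node's etcd member is the current leader, transfer leadership
# to another member first. All IDs are etcd member IDs in hex.
maybe_move_leader() {
  leader_id="$1"      # current leader's member ID
  deleting_id="$2"    # member ID of the node about to be deleted
  transferee_id="$3"  # member ID that should become the new leader
  if [ "$leader_id" = "$deleting_id" ]; then
    # etcdctl move-leader takes the transferee's member ID as its argument.
    etcdctl move-leader "$transferee_id"
  else
    echo "member $deleting_id is not the leader; no transfer needed"
  fi
}
```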

@brandond brandond modified the milestones: v1.21 - Backlog, v1.22.0+k3s1 Aug 6, 2021
@brandond brandond changed the title [master, 1.19, 1.20] Node stuck at deletion due to finalizer - regression Node stuck at deletion due to finalizer - regression Aug 6, 2021
@fapatel1 fapatel1 modified the milestones: v1.22.0+k3s1, v1.22.2+k3s1 Aug 23, 2021
@rancher-max
Contributor

Validated on master branch using commitid ad1a40a96c400c17780cf9455f5da330d690194c

Because this issue is intermittent, I spun up multiple clusters and deleted a node from each, and none of them got stuck. This has also been fixed in all of our currently released versions (1.19.14, 1.20.10, and 1.21.4), so users tracking this thread should be able to upgrade to one of those versions to get the fix.

@ozbillwang

ozbillwang commented Jun 10, 2022

Thanks so much. I was stuck on the same issue for more than an hour until I saw the finalizer patch workaround above.

@CihatDinc

The finalizer patch workaround above worked, thanks 👍

@k3s-io k3s-io locked and limited conversation to collaborators Jan 26, 2024