
Antrea CNI not cleaned up after deletion #1659

Closed
jayunit100 opened this issue Dec 11, 2020 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/duplicate Indicates an issue is a duplicate of other open issue.

Comments

@jayunit100 (Contributor)

Describe the bug

The kubelet still seems to want to use Antrea as its CNI provider, even after Antrea is removed:

, error: exit status 1, failed to stop running pod ffa31025e18df8891f96a80b26cf800a371e8db798aff19fa24454ee487f2b6d: output: time="2020-12-11T14:09:00-05:00" level=fatal msg="stopping the pod sandbox \"ffa31025e18df8891f96a80b26cf800a371e8db798aff19fa24454ee487f2b6d\" failed: rpc error: code = Unknown desc = failed to destroy network for sandbox \"ffa31025e18df8891f96a80b26cf800a371e8db798aff19fa24454ee487f2b6d\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/antrea/cni.sock: connect: connection refused\""

To Reproduce

Uninstall Antrea and then attempt to install a different CNI provider. Containers will land on nodes (nodes continue to be marked as schedulable) even though there's no cni.sock.
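The leftover state can be spot-checked on a node with something like the following (a sketch only; the paths are assumptions based on Antrea's default manifest, not an official diagnostic):

```shell
# Sketch: after uninstalling Antrea, the container runtime may still pick up
# Antrea's CNI config from /etc/cni/net.d while the socket that config points
# at is gone. Paths below are assumptions based on Antrea's default manifest.
ls /etc/cni/net.d/ 2>/dev/null      # a leftover 10-antrea.conflist keeps Antrea "active"
if [ -S /var/run/antrea/cni.sock ]; then
    echo "antrea cni.sock present"
else
    echo "antrea cni.sock missing"  # yet the runtime keeps dialing it
fi
```

If the conflist is listed but the socket is missing, the node is in exactly the state described in the error above.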

Expected

After the Antrea CNI is gone, the kubelet should no longer be aware of any Antrea-related sockets.

(NOTE: This might just be a race condition where you need to wait some time before you can say you've successfully uninstalled the CNI; if so, maybe it's more a kubelet issue than an Antrea one.)

@jayunit100 jayunit100 added the kind/bug Categorizes issue or PR as related to a bug. label Dec 11, 2020
@antoninbas (Contributor)

This is a duplicate of #181, so I will close this issue.
We are aware of this. This is an issue which is common to most (all?) CNIs AFAIK. I do not believe there is a good way to automate this cleanup when you delete Antrea K8s resources. However, we could and should provide a mechanism for cleanup, like other CNIs do, e.g. in the form of a K8s Job.
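Until such a cleanup mechanism exists, the per-node work it would have to do might look roughly like this (a sketch only; every path, file name, and command below is an assumption based on Antrea's default manifest, so verify against your deployment before running anything):

```shell
#!/bin/sh
# Hypothetical per-node Antrea cleanup sketch -- NOT an official procedure.
# All paths and names are assumptions; confirm them for your Antrea version.
rm -f /etc/cni/net.d/10-antrea.conflist   # CNI config the kubelet/runtime reads
rm -rf /var/run/antrea                    # removes the stale cni.sock
rm -f /opt/cni/bin/antrea                 # CNI plugin binary (name assumed)
# Delete the OVS bridge, if the OVS tools are installed on the node
command -v ovs-vsctl >/dev/null 2>&1 && ovs-vsctl --if-exists del-br br-int
# Restart the kubelet so it stops dialing the old socket (systemd hosts)
command -v systemctl >/dev/null 2>&1 && systemctl restart kubelet
echo "antrea node cleanup attempted"
```

After something like this runs on each node, a different CNI should be able to take over, assuming it drops its own conflist into /etc/cni/net.d.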

@antoninbas antoninbas added the triage/duplicate Indicates an issue is a duplicate of other open issue. label Dec 11, 2020
@Kaushal28

So what are the steps for complete cleanup?
