[BUG] Race when deleting a pod and subnet with multiple IPs the same subnet bound to the same pod #4898
Comments
The issues you're encountering with race conditions during the deletion of pods and subnets in Kube-OVN are related to the management of finalizers. Finalizers are used to ensure that certain cleanup operations are completed before a resource is fully deleted. In your case, the lingering IPs and subnets that cannot be deleted without manually patching the finalizers indicate that the cleanup process is not being completed as expected.

For Race A, the problem arises when a pod is being created, but its IP allocation isn't complete. If you delete both the pod and the subnet at this stage, the IP address remains because the finalizer prevents its deletion until the IP allocation process is fully resolved.

For Race B, the issue is that even after the pod is deleted, the IP and subnet remain due to the finalizer not being removed. This can happen if the system doesn't recognize that all IPs are no longer in use, preventing the subnet from being deleted.

To address these issues, you might consider the following:
These steps should help mitigate the race conditions you're experiencing and ensure that resources are properly cleaned up when a pod or subnet is deleted.
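As a rough illustration of the finalizer behavior described in this comment, the toy model below shows why an object can linger after a delete request: it is only removed once deletion has been requested and its finalizer list is empty. This is a simplified sketch, not Kube-OVN or Kubernetes code. `Resource`, `Store`, and the finalizer name `example.io/cleanup` are hypothetical, and `deletion_requested` stands in for `metadata.deletionTimestamp`.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    finalizers: list = field(default_factory=list)
    # Stand-in for metadata.deletionTimestamp being set by the API server.
    deletion_requested: bool = False


class Store:
    """Toy object store that mimics the finalizer gate on deletion."""

    def __init__(self):
        self.objects = {}

    def add(self, res):
        self.objects[res.name] = res

    def delete(self, name):
        # A delete request only marks the object; removal is gated below.
        self.objects[name].deletion_requested = True
        self._collect(name)

    def remove_finalizer(self, name, finalizer):
        res = self.objects[name]
        if finalizer in res.finalizers:
            res.finalizers.remove(finalizer)
        self._collect(name)

    def _collect(self, name):
        # The object disappears only when deletion was requested
        # AND no finalizers are left.
        res = self.objects[name]
        if res.deletion_requested and not res.finalizers:
            del self.objects[name]
```

In this model, if the controller that owns the finalizer fails during cleanup (for example because a resource it depends on is already gone), `remove_finalizer` is never called and the object stays in the store, which matches the lingering IPs described in this issue.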
Could you please attach the log? I want to see whether the pod deletion triggers the IP deletion, and whether the IP deletion triggers a subnet update.
For an IP CR bound to a VM, the lifecycle is bound to the VM CR, not the Pod. Does the VM CR still exist?
@oilbeater I'm not using KubeVirt; all VMs are StatefulSets. I confirmed that the pods and StatefulSets are gone.
Does the logical switch port in ovn-nb still exist?
@oilbeater For case B: no, not after patching. I believe it did not exist before patching either, but I can't verify that now.
For case A, the subnet is deleted before the IP is created. As a result, when the pod is deleted, it cannot find the related subnet and fails to delete the IP CR. I believe case B behaves similarly. To completely avoid this issue, a lock is needed for every subnet and related IP operation to maintain the correct order. However, this may slow down pod creation. I will try to decouple pod deletion from the subnet and delete all related IP CRs based solely on pod annotations.
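To make the per-subnet lock idea concrete, here is a minimal sketch of a keyed lock that serializes every operation touching the same subnet while leaving other subnets independent. It only illustrates the general technique and is not the actual Kube-OVN implementation. `SubnetLocks`, `hold`, `run_demo`, and the subnet name `subnet-a` are hypothetical names chosen for the example.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class SubnetLocks:
    """One lock per subnet name (hypothetical helper, not Kube-OVN code)."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # subnet name -> lock

    @contextmanager
    def hold(self, subnet):
        # Look up (or create) the subnet's lock under the guard,
        # then hold only that subnet's lock for the operation.
        with self._guard:
            lock = self._locks[subnet]
        with lock:
            yield


def run_demo(workers=8, rounds=200):
    """Run concurrent 'IP operations' on one subnet and record overlap."""
    locks = SubnetLocks()
    state = {"in_section": 0, "max_seen": 0, "done": 0}

    def ip_operation():
        for _ in range(rounds):
            with locks.hold("subnet-a"):
                state["in_section"] += 1
                state["max_seen"] = max(state["max_seen"], state["in_section"])
                state["done"] += 1
                state["in_section"] -= 1

    threads = [threading.Thread(target=ip_operation) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

With the lock in place, at most one operation per subnet runs at a time, so a subnet deletion could not interleave with an in-flight IP allocation for that subnet. The cost is the one noted above: every pod creation in a busy subnet waits its turn, which is why decoupling pod deletion from the subnet may be the more attractive option.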
Kube-OVN Version
v1.12.28
Kubernetes Version
/
Operation-system/Kernel Version
/
Description
Race A
If you delete both the pod and the subnet while the pod is still being created, before its IP allocation is complete, the following issues occur:
kubectl delete ip
kubectl patch ips --type=merge -p '{"metadata":{"finalizers":[]}}'
Race B
I'm unsure about the triggering conditions, but the final test results are:
This is quite common in my production cluster, with about a 10% probability of recurrence, but I'm still unsure of the underlying cause.
Steps To Reproduce
Subnet YAML: #4822 (comment)
Pod YAML:
Current Behavior
/
Expected Behavior
/