[BUG] Controller crashes if ENABLE_NP is false and not specifying a gateway node #4841
Comments
The issue you're encountering is due to npLister not being initialized when network policies are disabled. When you create a subnet using the Go SDK without specifying a gateway, Kube-OVN automatically modifies the subnet to set the gateway to the first IP in the CIDR block. This triggers two rapid events: the initial creation and a subsequent modification, which leads to the crash because npLister is nil. In contrast, when using kubectl with a YAML file, the crash is not reported. To address this issue, you can implement a nil check for npLister.
When kube-ovn-cni initializes the Pod network, it pings the gateway address to ensure the network is ready. If a NetworkPolicy is present, it must allow traffic to the gateway IP to ensure the ping succeeds. That's why the logical NetworkPolicy list exists.
@oilbeater do you have an idea why not specifying a gateway in a YAML file doesn't trigger that crash, but doing the same through the SDK does? I can implement a simple fix based on a nil pointer check, but I also want to understand why no one has noticed this bug before. Am I calling the SDK wrong?
@SkalaNetworks with this YAML and kubectl, kube-ovn-controller also crashes, so I think this is a common issue. In theory, the crash will happen whenever the gateway IP changes.
Sometimes kube-ovn-controller can recover from the crash, which may be related to the startup order and to how subnets are handled.
Yeah, I can't reproduce the bug very consistently with YAML files, but the SDK fails 100% of the time. This is strange. I pushed a fix.
Kube-OVN Version
v1.13/v1.14
Kubernetes Version
v1.31.0
Operation-system/Kernel Version
/
Description
The controller crashes when submitting a new Subnet to the K8S API through the Go SDK if network policies have been disabled.
The reason is a panic due to a nil pointer found here:
kube-ovn/pkg/controller/subnet.go
Line 72 in 77650e5
We try to use a npLister, which isn't initialized due to network policies being disabled. The only way to not trigger that call is for the above condition to evaluate to false. My example code doesn't trigger that code if I set the gateway manually, but if I don't specify the gateway, it seems Kube-OVN accepts the Subnet as-is, and a millisecond later edits the Gateway to the first IP in the subnet.
This means there are two events in quick succession: the creation of the Subnet, followed by the update that sets the gateway.
And then it crashes. I don't understand how my call from Go code triggers the modification of the gateway, but doing it using kubectl doesn't. If I don't specify the gateway in a YAML file and submit it, no crash happens.
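For reference, the gateway defaulting described above can be sketched in a few lines. This is only an illustration and not Kube-OVN's actual code: the helper name firstIP is hypothetical, and it assumes that "the first IP in the subnet" means the address immediately after the network address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// firstIP returns the address immediately after the network address of cidr.
// Hypothetical helper for illustration only; it is not taken from Kube-OVN.
func firstIP(cidr string) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, err
	}
	// Masked() clears the host bits, so Addr() is the network address.
	first := prefix.Masked().Addr().Next()
	if !first.IsValid() || !prefix.Contains(first) {
		return netip.Addr{}, fmt.Errorf("no usable address in %s", cidr)
	}
	return first, nil
}

func main() {
	gw, err := firstIP("10.16.0.0/16")
	if err != nil {
		panic(err)
	}
	fmt.Println("default gateway:", gw)
}
```

A Subnet submitted with an empty gateway would therefore be followed by an update carrying a value like this one, which is the gateway change that reaches the failing code path.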
I am unsure if my way of creating Subnets using the SDK is the right one considering this bug, but it's the standard K8S way.
I also don't know why NPs need to be re-enqueued if the gateway changes, but adding a condition that checks whether npLister is nil before using it would be a quick fix for this problem.
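A minimal sketch of that quick fix, under stated assumptions: the types below (controller, subnet, policyLister, staticLister) are simplified stand-ins invented for this example and do not mirror the real definitions in pkg/controller/subnet.go. Only the shape of the guard is the point.

```go
package main

import "fmt"

// policyLister is a hypothetical stand-in for the NetworkPolicy lister.
type policyLister interface {
	List() ([]string, error)
}

// staticLister is a fake lister returning a fixed set of policy keys.
type staticLister []string

func (s staticLister) List() ([]string, error) { return s, nil }

// subnet keeps only the fields relevant to this example.
type subnet struct {
	Name    string
	Gateway string
}

// controller mimics the part of the controller involved in the crash.
type controller struct {
	npLister policyLister // left nil when network policies are disabled
	queue    []string     // policy keys waiting to be reconciled
}

// onSubnetUpdate re-enqueues network policies when the gateway changes,
// and skips that step when no lister was initialized.
func (c *controller) onSubnetUpdate(oldSubnet, newSubnet subnet) error {
	if oldSubnet.Gateway == newSubnet.Gateway {
		return nil
	}
	if c.npLister == nil {
		// Network policies are disabled: nothing to re-enqueue.
		return nil
	}
	keys, err := c.npLister.List()
	if err != nil {
		return fmt.Errorf("list network policies: %w", err)
	}
	c.queue = append(c.queue, keys...)
	return nil
}

func main() {
	created := subnet{Name: "demo"}
	defaulted := subnet{Name: "demo", Gateway: "10.16.0.1"}

	disabled := &controller{}
	fmt.Println(disabled.onSubnetUpdate(created, defaulted), len(disabled.queue))

	enabled := &controller{npLister: staticLister{"default/allow-gateway"}}
	fmt.Println(enabled.onSubnetUpdate(created, defaulted), len(enabled.queue))
}
```

Note that the guard relies on the lister field being left unset. In Go, an interface holding a typed nil pointer does not compare equal to nil, so the check only works if the field is never assigned when network policies are disabled.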
Steps To Reproduce
Current Behavior
Controller crashes
Expected Behavior
Controller doesn't crash