NodeRouteController should account for PodCIDR updates #6965
Labels
area/transit/routing
Issues or PRs related to routing.
kind/bug
Categorizes issue or PR as related to a bug.
Describe the bug
The PodCIDR field in NodeSpec is "immutable": once set, it cannot be changed. However, because the NodeRouteController uses a workqueue (with the Node name as the key), it is possible to observe the same Node name in addNodeRoute, but with a different PodCIDR. This is not something we account for today, which means the controller may be buggy.

More precisely, the following sequence of events would trigger a bug, where the datapath uses a stale PodCIDR:
The Delete handler (step 2) and the Add handler (step 3) will both enqueue the Node name ("foo"). If the time between steps 2 and 3 is small enough, the events will be "merged" in the workqueue, and syncNodeRoute will be called only once. At that point, syncNodeRoute will get Node Y from the lister and addNodeRoute will be called. There will still be an entry in the installedNodes indexer (corresponding to Node X) and because we do not check that the PodCIDRs match, we will skip updating the datapath:

antrea/pkg/agent/controller/noderoute/node_route_controller.go
Lines 524 to 528 in e4aedec
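To illustrate why the Delete and Add events can collapse into a single sync, here is a minimal sketch of a queue that stores each pending key at most once. The dedupQueue type and its methods are hypothetical stand-ins written for this issue; they are not the client-go workqueue and not Antrea code, and they only model the "same key is enqueued once while pending" behavior.

```go
package main

import "fmt"

// dedupQueue is a hypothetical, simplified stand-in for a keyed workqueue:
// a key that is already pending is not enqueued a second time.
type dedupQueue struct {
	pending map[string]struct{}
	order   []string
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: make(map[string]struct{})}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *dedupQueue) Add(key string) {
	if _, ok := q.pending[key]; ok {
		return // merged with the pending item
	}
	q.pending[key] = struct{}{}
	q.order = append(q.order, key)
}

// Get pops the oldest pending key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len returns the number of pending keys.
func (q *dedupQueue) Len() int { return len(q.order) }

func main() {
	q := newDedupQueue()
	q.Add("foo") // Delete handler for the old Node named "foo"
	q.Add("foo") // Add handler for the new Node named "foo", before any worker ran
	fmt.Println(q.Len()) // prints 1: the worker will sync "foo" only once
}
```

With only one sync for "foo", the worker never sees the intermediate "Node deleted" state, so nothing removes the previously installed entry before the new Node is processed.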
Note that because we do check that the Node IPs match, Node Y must have the same IPs as Node X in order for the issue to be observed (i.e., same name and same IPs). This "constraint", coupled with timing considerations, means that this issue is quite unlikely.
To Reproduce
The following unit test can be used (add it to pkg/agent/controller/noderoute/node_route_controller_test.go):

Versions:
v2.2