Enhance Felix route table for elevated priority programming#11904
Conversation
JIRA ticket: CORE-12272 On its own, this PR does not yet change how Felix programs local routes. `SetRoutes` callers do not currently set the new `Priority` field, so programmed `Priority` will be 0 for IPv4 routes and 1024 for IPv6 routes, as was the case before this PR. Upcoming, but separate, live migration work will: - add Felix configuration fields for "normal" and "elevated" priority values - change route programming outside of live migration to use the "normal" priority value - program routes with the "elevated" priority value during a live migration, as part of ensuring the best possible handover
There was a problem hiding this comment.
Pull request overview
This PR extends Felix’s route table abstraction to support programming routes with an explicit kernel route priority (metric), enabling future work to program “normal” vs “elevated” priority routes during live migration while preserving existing default behavior when unset.
Changes:
- Add a
Priorityfield toroutetable.Target. - Include priority in the kernel route key and propagate it through read/program/delete paths (including the IPv6 “priority 0 means default 1024” handling).
- Update RouteTable tests to set/assert explicit priorities and exercise deletion behavior for routes with differing priorities.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| felix/routetable/defs.go | Adds Priority to Target. |
| felix/routetable/route_table.go | Threads priority into kernel route key and route programming/reading logic. |
| felix/routetable/route_table_test.go | Updates tests to include/verify route priority behavior. |
| func (r *RouteTable) recalculateDesiredKernelRoute(cidr ip.CIDR, target Target) { | ||
| defer r.updateGauges() | ||
| kernKey := r.routeKeyForCIDR(cidr) | ||
| kernKey := r.routeKeyForCIDR(cidr, target) | ||
| oldDesiredRoute, _ := r.kernelRoutes.Desired().Get(kernKey) |
There was a problem hiding this comment.
recalculateDesiredKernelRoute() computes kernKey from the caller-supplied target before determining bestTarget. If different candidates for the same CIDR can have different Priority values (or if a route’s priority changes over time), this will program the chosen route under the wrong key/metric and can also leave stale desired entries/routes behind (old key is never deleted). Consider deriving kernKey from the selected bestTarget (after conflict resolution) and explicitly deleting any prior desired entries for the same CIDR/TOS with a different priority to avoid leaking routes when priority changes.
| routesByCIDR[target.CIDR] = target | ||
| r.addOwningIface(routeClass, ifaceName, target.CIDR) | ||
| r.addOwningIface(routeClass, ifaceName, target.CIDR, target) | ||
| r.updatePermanentARP(ifaceName, target.CIDR.Addr(), target.DestMAC) |
There was a problem hiding this comment.
RouteUpdate() overwrites the existing target for a CIDR and then recalculates using only the new target. With Priority now part of the kernel route key, updating an existing CIDR’s priority via RouteUpdate() will not remove the old desired/dataplane entry keyed by the previous priority, potentially leaving an orphaned route behind. Consider capturing the previous target (if any) and cleaning up any old key-affecting state (e.g., delete the old desired key / queue deletion of the old (CIDR, oldPriority) route) before setting the new target.
Relatedly: - Change `RouteRemove` to take a `Target` (like `RouteUpdate` and `SetRoutes`) instead of just CIDR. (Only the key fields are significant in `RouteRemove` calls.) Todo: - Test adding/removing routes with same CIDR and different priorities at the same time. - Do we need to handle different priorities in `conntrackTracker` calls?
- Rename kernelRouteKey to RouteKey and make it part of the API. - In Target use an embedded RouteKey instead of equivalent individual fields. - Update RouteRemove to take RouteKey instead of Target.
RouteUpdate and SetRoutes normalize IPv6 Priority 0 to 1024 (since the kernel treats 0 as "use default" which is 1024 for IPv6), but RouteRemove did not. This meant RouteRemove with Priority 0 would fail to find and remove IPv6 routes that were stored with the normalized key Priority 1024. Extract normalizeRouteKey from routeKeyForTarget and call it in RouteRemove as well. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…CIDR Include Priority in mock netlink KeyForRoute so routes with the same CIDR but different priorities get distinct keys in the mock dataplane. Update all existing route key assertions accordingly. Add 11 new tests covering multi-priority route scenarios: adding and removing routes at different priorities, resync behavior, and stale route cleanup. Add a test for IPv6 RouteRemove Priority normalization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test the two key live migration subcases where Felix-managed routes
coexist with external BIRD routes at different priorities for the same
VM IP:
(a) Source host: Felix local route at normal priority, BIRD remote route
appears at elevated priority, Felix removes its route, BIRD reverts
to normal priority.
(b) Destination host: BIRD remote route at normal priority, Felix
programs local route at elevated priority, BIRD route removed, Felix
reverts to normal priority.
Verifies that resync never disturbs the external BIRD routes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a detailed comment explaining the interaction between conntrack cleanup and multiple route priorities. The ConntrackCleanupManager is keyed on CIDR (one owner per CIDR) while routes are keyed on RouteKey (CIDR + Priority). For the live migration use case this is safe: Felix only manages one route per CIDR at a time, with the coexisting BIRD route being external to the tracker. On the source host, the conntrack flush when Felix removes its local route is correct: the VM is leaving, so stale conntrack entries should be flushed to force policy re-evaluation (the return path may now traverse different HostEndpoints on a different host). On the destination host, no flush is triggered because BIRD's pre-existing route was never tracked. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| for _, t := range targets { | ||
| delete(oldTargetsToCleanUp, t.CIDR) | ||
| newTargets[t.CIDR] = t | ||
| routeKey := r.normalizeRouteKey(t.RouteKey) |
There was a problem hiding this comment.
Just pondering if there's a way to force normalization to avoid forgetting to do it on some new code path in future. It was forced before because we only stored CIDR and then the conversion to kernel route did the trick.
We could index the internal fields with a private normalizedRouteKey type, which is the same as RouteKey but the constructor normalizes. Probably overkill!
| handlemgr.WithSocketTimeout(netlinkTimeout), | ||
| ) | ||
|
|
||
| // Interaction between conntrack cleanup and multiple route priorities |
There was a problem hiding this comment.
Comment is welcome, but maybe a bit big?
There was a problem hiding this comment.
Updated now, PTAL when you have time.
There was a problem hiding this comment.
I'm not sure I made it a lot shorter; but definitely more relevant I hope.
JIRA ticket: CORE-12272
On its own, this PR does not yet change how Felix programs local routes.
SetRoutescallers do not currently set the newPriorityfield, so programmedPrioritywill be 0 for IPv4 routes and 1024 for IPv6 routes, as was the case before this PR.Upcoming, but separate, live migration work will: