Skip to content

Enhance Felix route table for elevated priority programming#11904

Merged
nelljerram merged 10 commits into
projectcalico:masterfrom
nelljerram:routing-priority
Mar 2, 2026
Merged

Enhance Felix route table for elevated priority programming#11904
nelljerram merged 10 commits into
projectcalico:masterfrom
nelljerram:routing-priority

Conversation

@nelljerram
Copy link
Copy Markdown
Member

JIRA ticket: CORE-12272

On its own, this PR does not yet change how Felix programs local routes. SetRoutes callers do not currently set the new Priority field, so programmed Priority will be 0 for IPv4 routes and 1024 for IPv6 routes, as was the case before this PR.

Upcoming, but separate, live migration work will:

  • add Felix configuration fields for "normal" and "elevated" priority values
  • change route programming outside of live migration to use the "normal" priority value
  • program routes with the "elevated" priority value during a live migration, as part of ensuring the best possible handover

JIRA ticket: CORE-12272

On its own, this PR does not yet change how Felix programs local routes.  `SetRoutes` callers do not
currently set the new `Priority` field, so programmed `Priority` will be 0 for IPv4 routes and 1024
for IPv6 routes, as was the case before this PR.

Upcoming, but separate, live migration work will:
- add Felix configuration fields for "normal" and "elevated" priority values
- change route programming outside of live migration to use the "normal" priority value
- program routes with the "elevated" priority value during a live migration, as part of ensuring the best possible handover
@nelljerram nelljerram requested a review from a team as a code owner February 23, 2026 16:02
Copilot AI review requested due to automatic review settings February 23, 2026 16:03
@marvin-tigera marvin-tigera added this to the Calico v3.32.0 milestone Feb 23, 2026
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Feb 23, 2026
@nelljerram nelljerram added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact and removed release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Feb 23, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends Felix’s route table abstraction to support programming routes with an explicit kernel route priority (metric), enabling future work to program “normal” vs “elevated” priority routes during live migration while preserving existing default behavior when unset.

Changes:

  • Add a Priority field to routetable.Target.
  • Include priority in the kernel route key and propagate it through read/program/delete paths (including the IPv6 “priority 0 means default 1024” handling).
  • Update RouteTable tests to set/assert explicit priorities and exercise deletion behavior for routes with differing priorities.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
felix/routetable/defs.go Adds Priority to Target.
felix/routetable/route_table.go Threads priority into kernel route key and route programming/reading logic.
felix/routetable/route_table_test.go Updates tests to include/verify route priority behavior.

Comment thread felix/routetable/route_table.go Outdated
Comment on lines 632 to 635
func (r *RouteTable) recalculateDesiredKernelRoute(cidr ip.CIDR, target Target) {
defer r.updateGauges()
kernKey := r.routeKeyForCIDR(cidr)
kernKey := r.routeKeyForCIDR(cidr, target)
oldDesiredRoute, _ := r.kernelRoutes.Desired().Get(kernKey)
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recalculateDesiredKernelRoute() computes kernKey from the caller-supplied target before determining bestTarget. If different candidates for the same CIDR can have different Priority values (or if a route’s priority changes over time), this will program the chosen route under the wrong key/metric and can also leave stale desired entries/routes behind (old key is never deleted). Consider deriving kernKey from the selected bestTarget (after conflict resolution) and explicitly deleting any prior desired entries for the same CIDR/TOS with a different priority to avoid leaking routes when priority changes.

Copilot uses AI. Check for mistakes.
Comment thread felix/routetable/route_table.go
Comment thread felix/routetable/route_table.go Outdated
Comment on lines 511 to 518
routesByCIDR[target.CIDR] = target
r.addOwningIface(routeClass, ifaceName, target.CIDR)
r.addOwningIface(routeClass, ifaceName, target.CIDR, target)
r.updatePermanentARP(ifaceName, target.CIDR.Addr(), target.DestMAC)
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RouteUpdate() overwrites the existing target for a CIDR and then recalculates using only the new target. With Priority now part of the kernel route key, updating an existing CIDR’s priority via RouteUpdate() will not remove the old desired/dataplane entry keyed by the previous priority, potentially leaving an orphaned route behind. Consider capturing the previous target (if any) and cleaning up any old key-affecting state (e.g., delete the old desired key / queue deletion of the old (CIDR, oldPriority) route) before setting the new target.

Copilot uses AI. Check for mistakes.
nelljerram and others added 7 commits February 26, 2026 12:52
Relatedly:
- Change `RouteRemove` to take a `Target` (like `RouteUpdate` and `SetRoutes`) instead of just
CIDR.  (Only the key fields are significant in `RouteRemove` calls.)

Todo:
- Test adding/removing routes with same CIDR and different priorities at the same time.
- Do we need to handle different priorities in `conntrackTracker` calls?
- Rename kernelRouteKey to RouteKey and make it part of the API.
- In Target use an embedded RouteKey instead of equivalent individual fields.
- Update RouteRemove to take RouteKey instead of Target.
RouteUpdate and SetRoutes normalize IPv6 Priority 0 to 1024 (since the
kernel treats 0 as "use default" which is 1024 for IPv6), but RouteRemove
did not. This meant RouteRemove with Priority 0 would fail to find and
remove IPv6 routes that were stored with the normalized key Priority 1024.

Extract normalizeRouteKey from routeKeyForTarget and call it in
RouteRemove as well.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…CIDR

Include Priority in mock netlink KeyForRoute so routes with the same CIDR
but different priorities get distinct keys in the mock dataplane. Update
all existing route key assertions accordingly.

Add 11 new tests covering multi-priority route scenarios: adding and
removing routes at different priorities, resync behavior, and stale route
cleanup. Add a test for IPv6 RouteRemove Priority normalization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test the two key live migration subcases where Felix-managed routes
coexist with external BIRD routes at different priorities for the same
VM IP:

(a) Source host: Felix local route at normal priority, BIRD remote route
    appears at elevated priority, Felix removes its route, BIRD reverts
    to normal priority.

(b) Destination host: BIRD remote route at normal priority, Felix
    programs local route at elevated priority, BIRD route removed, Felix
    reverts to normal priority.

Verifies that resync never disturbs the external BIRD routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a detailed comment explaining the interaction between conntrack
cleanup and multiple route priorities.  The ConntrackCleanupManager is
keyed on CIDR (one owner per CIDR) while routes are keyed on RouteKey
(CIDR + Priority).  For the live migration use case this is safe: Felix
only manages one route per CIDR at a time, with the coexisting BIRD
route being external to the tracker.

On the source host, the conntrack flush when Felix removes its local
route is correct: the VM is leaving, so stale conntrack entries should
be flushed to force policy re-evaluation (the return path may now
traverse different HostEndpoints on a different host).

On the destination host, no flush is triggered because BIRD's
pre-existing route was never tracked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
for _, t := range targets {
delete(oldTargetsToCleanUp, t.CIDR)
newTargets[t.CIDR] = t
routeKey := r.normalizeRouteKey(t.RouteKey)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pondering if there's a way to force normalization to avoid forgetting to do it on some new code path in future. It was forced before because we only stored CIDR and then the conversion to kernel route did the trick.

We could index the internal fields with a private normalizedRouteKey type, which is the same as RouteKey but the constructor normalizes. Probably overkill!

Comment thread felix/routetable/route_table.go Outdated
handlemgr.WithSocketTimeout(netlinkTimeout),
)

// Interaction between conntrack cleanup and multiple route priorities
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is welcome, but maybe a bit big?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated now, PTAL when you have time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I made it a lot shorter; but definitely more relevant I hope.

@nelljerram nelljerram merged commit 9ca264c into projectcalico:master Mar 2, 2026
3 checks passed
@nelljerram nelljerram deleted the routing-priority branch March 2, 2026 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants