Skip to content

[BPF] Re-wire SetTriggerFn on syncer swap in proxy.SetSyncer#12701

Merged
tomastigera merged 1 commit into
projectcalico:masterfrom
tomastigera:tomas/bpf-proxy-set-trigger-fn
May 11, 2026
Merged

[BPF] Re-wire SetTriggerFn on syncer swap in proxy.SetSyncer#12701
tomastigera merged 1 commit into
projectcalico:masterfrom
tomastigera:tomas/bpf-proxy-set-trigger-fn

Conversation

@tomastigera
Copy link
Copy Markdown
Contributor

Description

proxy.New() wires the syncer's expand-NodePort fixup callback to the proxy's runner via dp.SetTriggerFn(p.runner.Run) (proxy.go:180). proxy.SetSyncer() swaps in a fresh syncer (e.g. on a host-IP change inside KubeProxy.run()) but did not re-wire that callback.

Effect

The expand-NodePort fixup goroutine handles ExternalTrafficPolicy=Local NodePort services with backends on remote nodes whose routes were not yet programmed when the syncer first computed dataplane state. It re-runs expandNodePorts on each route-table change and, when a previously-missed route now resolves, calls s.triggerFn() to schedule an Apply (syncer.go:1216).

After any host-IP change, the new syncer that KubeProxy.run() hands to SetSyncer had triggerFn = nil, so the fixup's resolution was silently lost until some unrelated service/endpoint update piggybacked an Apply. In a quiet cluster, NodePort entries to backends on certain remote nodes could remain stale for an arbitrary period.

Origin

The asymmetry is pre-existing — kp.run() always created a fresh syncer and called SetSyncer on it before #12667. But #12667 made it visible in the diff: the first call now also goes through this path's logical sibling (proxy.New(), which does set the trigger), making the missing wiring on SetSyncer plain to read. Caught on the v3.31 backport review (#12693).

Fix

Have proxy.SetSyncer call s.SetTriggerFn(p.runner.Run) on the new syncer, under the same lock as the dpSyncer swap, mirroring what proxy.New() does for the initial syncer.

Test

New regression test should re-arm the NodePort externalTrafficPolicy=Local route-fixup trigger after a syncer swap in proxy_test.go swaps a syncer via SetSyncer, verifies the new syncer received a non-nil trigger callback, and confirms invoking it schedules an Apply.

The test fails on master before this change with:

SetSyncer must wire the fixup callback on the new syncer
Expected <func()>: nil not to be nil

Related issues/PRs

Follow-up to #12667. Surfaced during v3.31 backport review (#12693, comment thread on kube-proxy.go:161).

Release Note

ebpf - Fix kube-proxy losing the NodePort externalTrafficPolicy=Local route-fixup trigger after a syncer swap, which could cause stale NAT entries on remote backends.

Reminder for the reviewer

proxy.New() wires the syncer's expand-NodePort fixup callback to the
proxy's runner via dp.SetTriggerFn(p.runner.Run). proxy.SetSyncer()
swaps in a fresh syncer (e.g. on a host-IP change inside
KubeProxy.run()) but did not re-wire that callback. Result: after any
host-IP change the new syncer's expand-NodePort fixup goroutine
resolves previously-missed routes for ExternalTrafficPolicy=Local
NodePort backends, but cannot schedule the dataplane Apply that would
program them — the fix-up is silently lost until something else
piggybacks an Apply.

Pointed out by review on the v3.31 backport (projectcalico#12693) of projectcalico#12667; the
asymmetry is pre-existing (it was equally broken before projectcalico#12667
restructured run()) but became plain to read after the restructure.

Make SetSyncer call s.SetTriggerFn(p.runner.Run) on the new syncer
under the same lock as the dpSyncer swap, mirroring what proxy.New()
does for the initial syncer.

Add a regression test that swaps a syncer via SetSyncer and verifies
the new syncer received a non-nil trigger callback that schedules an
Apply when invoked. The test fails on master before this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 6, 2026 20:06
@tomastigera tomastigera requested a review from a team as a code owner May 6, 2026 20:06
@tomastigera tomastigera added cherry-pick-candidate docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small) labels May 6, 2026
@marvin-tigera marvin-tigera added this to the Calico v3.33.0 milestone May 6, 2026
@marvin-tigera marvin-tigera added docs-pr-required Change is not yet documented and removed docs-not-required Docs not required for this change cherry-pick-candidate labels May 6, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in Felix’s eBPF kube-proxy where swapping in a new DPSyncer via proxy.SetSyncer() failed to re-wire the syncer’s expand-NodePort route-fixup trigger callback, causing ExternalTrafficPolicy=Local NodePort updates to be missed after a syncer swap (e.g., host IP change).

Changes:

  • Re-wire DPSyncer.SetTriggerFn(p.runner.Run) inside proxy.SetSyncer() when replacing the dataplane syncer.
  • Add a regression test that swaps syncers, asserts the new syncer received a non-nil trigger callback, and verifies invoking it schedules an Apply.
  • Extend the test mockSyncer to record the trigger function passed via SetTriggerFn.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
felix/bpf/proxy/proxy.go Ensures SetSyncer() re-attaches the runner trigger to the newly installed syncer.
felix/bpf/proxy/proxy_test.go Adds regression coverage for trigger re-wiring on syncer swap; updates mock syncer to capture triggerFn.

@tomastigera tomastigera merged commit 747e26c into projectcalico:master May 11, 2026
12 checks passed
@tomastigera tomastigera deleted the tomas/bpf-proxy-set-trigger-fn branch May 11, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-pr-required Change is not yet documented release-note-required Change has user-facing impact (no matter how small)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants