[BPF] Re-wire SetTriggerFn on syncer swap in proxy.SetSyncer #12701
Merged
tomastigera merged 1 commit on May 11, 2026
proxy.New() wires the syncer's expand-NodePort fixup callback to the proxy's runner via dp.SetTriggerFn(p.runner.Run). proxy.SetSyncer() swaps in a fresh syncer (e.g. on a host-IP change inside KubeProxy.run()) but did not re-wire that callback.

Result: after any host-IP change the new syncer's expand-NodePort fixup goroutine resolves previously-missed routes for ExternalTrafficPolicy=Local NodePort backends, but cannot schedule the dataplane Apply that would program them. The fix-up is silently lost until something else piggybacks an Apply.

Pointed out by review on the v3.31 backport (projectcalico#12693) of projectcalico#12667; the asymmetry is pre-existing (it was equally broken before projectcalico#12667 restructured run()) but became plain to read after the restructure.

Make SetSyncer call s.SetTriggerFn(p.runner.Run) on the new syncer under the same lock as the dpSyncer swap, mirroring what proxy.New() does for the initial syncer.

Add a regression test that swaps a syncer via SetSyncer and verifies the new syncer received a non-nil trigger callback that schedules an Apply when invoked. The test fails on master before this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
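The wiring asymmetry described above can be illustrated with a minimal, self-contained Go sketch. It is not the Felix code: the `syncer`, `runner` and `proxy` types, and the `setSyncerOld` / `setSyncer` / `resolveRoute` helpers, are hypothetical stand-ins invented for illustration. Only the idea of calling `SetTriggerFn` with the runner's `Run` under the same lock as the swap comes from the change itself.

```go
package main

import (
	"fmt"
	"sync"
)

// syncer is a hypothetical stand-in for the dataplane syncer.
type syncer struct {
	triggerFn func()
}

func (s *syncer) SetTriggerFn(f func()) { s.triggerFn = f }

// resolveRoute mimics the fixup path: a previously-missed route resolves and
// the syncer asks for an Apply. It reports whether the request could be made.
func (s *syncer) resolveRoute() bool {
	if s.triggerFn == nil {
		return false // no callback wired: the fix-up is silently lost
	}
	s.triggerFn()
	return true
}

// runner is a hypothetical stand-in for the proxy's runner; Run only counts
// how many times an Apply was requested.
type runner struct {
	mu   sync.Mutex
	runs int
}

func (r *runner) Run() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.runs++
}

func (r *runner) Runs() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.runs
}

// proxy is a hypothetical stand-in for the BPF kube-proxy front end.
type proxy struct {
	mu       sync.Mutex
	runner   *runner
	dpSyncer *syncer
}

// newProxy wires the initial syncer's trigger to the runner.
func newProxy(s *syncer) *proxy {
	p := &proxy{runner: &runner{}, dpSyncer: s}
	s.SetTriggerFn(p.runner.Run)
	return p
}

// setSyncerOld models the pre-fix behaviour: swap only, no re-wiring.
func (p *proxy) setSyncerOld(s *syncer) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.dpSyncer = s
}

// setSyncer models the fixed behaviour: re-wire the trigger under the same
// lock that guards the swap.
func (p *proxy) setSyncer(s *syncer) {
	p.mu.Lock()
	defer p.mu.Unlock()
	s.SetTriggerFn(p.runner.Run)
	p.dpSyncer = s
}

func main() {
	p := newProxy(&syncer{})

	stale := &syncer{}
	p.setSyncerOld(stale)
	fmt.Println(stale.resolveRoute(), p.runner.Runs()) // false 0

	fresh := &syncer{}
	p.setSyncer(fresh)
	fmt.Println(fresh.resolveRoute(), p.runner.Runs()) // true 1
}
```

In this sketch the runner is a plain counter so that the effect is observable synchronously; a real runner would presumably schedule the Apply asynchronously.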
Pull request overview
This PR fixes a bug in Felix’s eBPF kube-proxy where swapping in a new DPSyncer via proxy.SetSyncer() failed to re-wire the syncer’s expand-NodePort route-fixup trigger callback, causing ExternalTrafficPolicy=Local NodePort updates to be missed after a syncer swap (e.g., host IP change).
Changes:
- Re-wire `DPSyncer.SetTriggerFn(p.runner.Run)` inside `proxy.SetSyncer()` when replacing the dataplane syncer.
- Add a regression test that swaps syncers, asserts the new syncer received a non-nil trigger callback, and verifies invoking it schedules an `Apply`.
- Extend the test `mockSyncer` to record the trigger function passed via `SetTriggerFn`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| felix/bpf/proxy/proxy.go | Ensures SetSyncer() re-attaches the runner trigger to the newly installed syncer. |
| felix/bpf/proxy/proxy_test.go | Adds regression coverage for trigger re-wiring on syncer swap; updates mock syncer to capture triggerFn. |
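The shape of the regression check summarized above (a mock syncer that records its trigger, followed by two assertions) might look roughly like the sketch below. It is a plain-Go approximation under stated assumptions, not the test in `proxy_test.go`: `fakeProxy`, its `rewire` switch, `scheduleApply` and `checkTriggerRewired` are hypothetical names, and the real test may use a different framework and an asynchronous runner.

```go
package main

import (
	"errors"
	"fmt"
)

// mockSyncer is a hypothetical test double that records the callback passed
// to SetTriggerFn so a test can inspect and invoke it.
type mockSyncer struct {
	triggerFn func()
}

func (m *mockSyncer) SetTriggerFn(f func()) { m.triggerFn = f }

// fakeProxy is a hypothetical object under test. The rewire flag lets the
// sketch model both the pre-fix and the post-fix SetSyncer behaviour.
type fakeProxy struct {
	rewire  bool
	applies int
	syncer  *mockSyncer
}

// scheduleApply stands in for the runner's Run method.
func (p *fakeProxy) scheduleApply() { p.applies++ }

func (p *fakeProxy) SetSyncer(s *mockSyncer) {
	if p.rewire {
		s.SetTriggerFn(p.scheduleApply)
	}
	p.syncer = s
}

// checkTriggerRewired swaps in a fresh mock syncer and performs the two
// checks: the trigger is non-nil, and invoking it schedules an Apply.
func checkTriggerRewired(p *fakeProxy) error {
	s := &mockSyncer{}
	p.SetSyncer(s)
	if s.triggerFn == nil {
		return errors.New("new syncer received no trigger callback")
	}
	before := p.applies
	s.triggerFn()
	if p.applies != before+1 {
		return errors.New("invoking the trigger did not schedule an Apply")
	}
	return nil
}

func main() {
	fmt.Println("without re-wiring:", checkTriggerRewired(&fakeProxy{rewire: false}))
	fmt.Println("with re-wiring:", checkTriggerRewired(&fakeProxy{rewire: true}))
}
```

Recording the callback on the mock rather than only checking a side effect keeps the two failure modes distinguishable: a nil trigger versus a trigger that is wired to the wrong target.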
aaaaaaaalex approved these changes on May 11, 2026
Description
`proxy.New()` wires the syncer's expand-NodePort fixup callback to the proxy's runner via `dp.SetTriggerFn(p.runner.Run)` (`proxy.go:180`). `proxy.SetSyncer()` swaps in a fresh syncer (e.g. on a host-IP change inside `KubeProxy.run()`) but did not re-wire that callback.
Effect
The expand-NodePort fixup goroutine handles `ExternalTrafficPolicy=Local` NodePort services with backends on remote nodes whose routes were not yet programmed when the syncer first computed dataplane state. It re-runs `expandNodePorts` on each route-table change and, when a previously-missed route now resolves, calls `s.triggerFn()` to schedule an Apply (`syncer.go:1216`).
After any host-IP change, the new syncer that `KubeProxy.run()` hands to `SetSyncer` had `triggerFn = nil`, so the fixup's resolution was silently lost until some unrelated service/endpoint update piggybacked an Apply. In a quiet cluster, NodePort entries to backends on certain remote nodes could remain stale for an arbitrary period.
Origin
The asymmetry is pre-existing; `kp.run()` always created a fresh syncer and called `SetSyncer` on it before #12667. However, #12667 made it visible in the diff: the first call now also goes through this path's logical sibling (`proxy.New()`, which does set the trigger), making the missing wiring on `SetSyncer` plain to read. Caught on the v3.31 backport review (#12693).
Fix
Have `proxy.SetSyncer` call `s.SetTriggerFn(p.runner.Run)` on the new syncer, under the same lock as the dpSyncer swap, mirroring what `proxy.New()` does for the initial syncer.
Test
New regression test `should re-arm the NodePort externalTrafficPolicy=Local route-fixup trigger after a syncer swap` in `proxy_test.go` swaps a syncer via `SetSyncer`, verifies the new syncer received a non-nil trigger callback, and confirms invoking it schedules an Apply.
The test fails on master before this change with:
Related issues/PRs
Follow-up to #12667. Surfaced during v3.31 backport review (#12693, comment thread on `kube-proxy.go:161`).
Release Note
Reminder for the reviewer
- `release-note-required`
- `docs-not-required` (no user-facing config change; bug fix)
- `cherry-pick-candidate` (bug fix; should be backported alongside [v3.31] [BPF] Defer kube-proxy construction until first hostIPs #12693 / [v3.32] [BPF] Defer kube-proxy construction until first hostIPs and in-sync #12694)