
[v3.32] [BPF] Defer kube-proxy construction until first hostIPs and in-sync #12694

Merged
tomastigera merged 1 commit into projectcalico:release-v3.32 from tomastigera:auto-pick-of-#12667-upstream-release-v3.32
May 6, 2026

@tomastigera

Contributor

Description

Cherry-pick of #12667 to release-v3.32. Felix restart on a node receiving NodePort traffic was breaking new TCP connections for ~500 ms — the bootstrap window between kube-proxy starting up and receiving the first hostIPs.

Root cause

In that window, kube-proxy.go:start() constructed the proxy with a stub Syncer (only podNPIP, no real host IPs). proxy.New() spins up the k8s informer goroutines synchronously; once they sync, an Apply runs against the stub Syncer. The cachingmap computes desired state without any (realHostIP, nodePort) FE entry and erases pre-existing real-host-IP NodePort FE entries left by the previous Felix run.

New external TCP connections to the NodePort during the gap miss the FE map, ascend to the host stack with no listener, and get RST by the kernel. The wildcard FE entry does not cover external→HEP NodePort traffic (nat_lookup.h:60-71 returns NULL on CALI_F_FROM_HEP without from_tun), so it provides no safety net.
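The erase step can be illustrated with a minimal sketch of a desired-state reconcile. This is not the real cachingmap or Syncer API: `feKey`, `reconcile`, `desiredFor`, and the `"stub-np-ip"` placeholder are hypothetical names, and the frontend key is reduced to an IP string and a port.

```go
package main

import "fmt"

// feKey is a hypothetical, simplified stand-in for a NAT frontend (FE) key.
type feKey struct {
	IP   string
	Port uint16
}

// reconcile makes dataplane match desired: it deletes every key that is not
// in desired, adds every key that is, and returns the keys it deleted.
func reconcile(dataplane, desired map[feKey]bool) []feKey {
	var deleted []feKey
	for k := range dataplane {
		if !desired[k] {
			delete(dataplane, k) // deleting during range is safe in Go
			deleted = append(deleted, k)
		}
	}
	for k := range desired {
		dataplane[k] = true
	}
	return deleted
}

// desiredFor builds the NodePort frontends for the given node IPs.
func desiredFor(nodeIPs []string, nodePort uint16) map[feKey]bool {
	d := make(map[feKey]bool)
	for _, ip := range nodeIPs {
		d[feKey{IP: ip, Port: nodePort}] = true
	}
	return d
}

func main() {
	// Entry left in the map by the previous Felix run.
	dataplane := map[feKey]bool{{IP: "10.0.0.5", Port: 30080}: true}

	// Apply against a stub syncer that only knows a placeholder IP.
	deleted := reconcile(dataplane, desiredFor([]string{"stub-np-ip"}, 30080))
	fmt.Println("deleted by stub apply:", deleted)

	// Apply with the real host IP restores the entry, but only after the gap.
	reconcile(dataplane, desiredFor([]string{"stub-np-ip", "10.0.0.5"}, 30080))
	fmt.Println("real entry present:", dataplane[feKey{IP: "10.0.0.5", Port: 30080}])
}
```

In this model the first apply removes the `10.0.0.5:30080` entry because the stub desired state does not contain it, which corresponds to the window in which external NodePort connections are reset.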

Fix

Defer proxy construction until both the first hostIPUpdates and the first hostMetadataUpdates have arrived, then construct the Syncer (with real host IPs) and the proxy in one shot via run(). The KubeProxy.Conntrack* callbacks already nil-check kp.syncer, so in-flight conntrack scans during the wait remain safe.
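The two-gate wait can be sketched in isolation as below. This is an illustration rather than the code in kube-proxy.go: `waitFirst` and `startDeferred` are hypothetical helpers, host IPs are plain strings, and the in-sync signal is modelled as an empty struct. It assumes Go 1.18+ for generics.

```go
package main

import "fmt"

// waitFirst blocks until the first value arrives on updates or until exiting
// is closed. ok is false if updates was closed or the exit signal fired.
func waitFirst[T any](updates <-chan T, exiting <-chan struct{}) (T, bool) {
	var zero T
	select {
	case v, ok := <-updates:
		if !ok {
			return zero, false
		}
		return v, true
	case <-exiting:
		return zero, false
	}
}

// startDeferred waits for both gates and only then calls build, once, with
// the real host IPs. It reports whether build was called.
func startDeferred(hostIPs <-chan []string, inSync <-chan struct{},
	exiting <-chan struct{}, build func(ips []string)) bool {
	ips, ok := waitFirst(hostIPs, exiting)
	if !ok {
		return false
	}
	if _, ok := waitFirst(inSync, exiting); !ok {
		return false
	}
	build(ips)
	return true
}

func main() {
	hostIPs := make(chan []string, 1)
	inSync := make(chan struct{}, 1)
	exiting := make(chan struct{})

	hostIPs <- []string{"10.0.0.5"}
	inSync <- struct{}{}

	built := startDeferred(hostIPs, inSync, exiting, func(ips []string) {
		fmt.Println("constructing syncer and proxy with host IPs:", ips)
	})
	fmt.Println("built:", built)
}
```

Selecting on an exit channel in both waits means a shutdown during the bootstrap window returns cleanly instead of blocking forever on a channel that will never deliver.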

Test

Regression test in kube-proxy_test.go pre-populates the front map with a real-host-IP NodePort FE entry, fires only the host-metadata gate (mimicking the bootstrap window), and asserts via Consistently over 500 ms that the entry survives. The test fails on the buggy code (Consistently trips at ~125 ms) and passes with the fix.
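For readers unfamiliar with the assertion style, the sketch below shows the idea behind a Consistently-style check using only the standard library. It is not Gomega and not the test in kube-proxy_test.go: `consistently` and `simulate` are hypothetical helpers, and the 20 ms erase delay and 100 ms window are arbitrary illustration values. It assumes Go 1.19+ for `atomic.Bool`.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// consistently polls check every interval for the whole duration. It returns
// false, with the elapsed time, as soon as check fails; otherwise true.
func consistently(check func() bool, duration, interval time.Duration) (time.Duration, bool) {
	start := time.Now()
	for {
		if !check() {
			return time.Since(start), false
		}
		if time.Since(start) >= duration {
			return duration, true
		}
		time.Sleep(interval)
	}
}

// simulate models the map entry as a flag. When eraseEntry is true, a timer
// clears the flag part-way through the window, like the buggy early Apply.
func simulate(eraseEntry bool) bool {
	var present atomic.Bool
	present.Store(true)
	if eraseEntry {
		timer := time.AfterFunc(20*time.Millisecond, func() { present.Store(false) })
		defer timer.Stop()
	}
	_, ok := consistently(present.Load, 100*time.Millisecond, 5*time.Millisecond)
	return ok
}

func main() {
	fmt.Println("entry survives with early apply:", simulate(true))
	fmt.Println("entry survives with deferred construction:", simulate(false))
}
```

A check of this kind fails at the first poll after the entry disappears, which is why the failure on the buggy code shows up well before the end of the observation window.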

End-to-end validated on a 3.31 BPF cluster with the equivalent fix; cherry-picks to 3.32 cleanly (3.32 already has #11817's host-metadata gate plumbing).

Related issues/PRs

Closes #12192. Cherry-pick of #12667.

Release Note

ebpf - Fix transient NodePort connection failures when Felix restarts on a node receiving external NodePort traffic.

Reminder for the reviewer

  • release-note-required
  • docs-not-required (no user-facing config change; bug fix)

Felix restart on a node receiving NodePort traffic was breaking new
TCP connections for ~500ms — the bootstrap window between kube-proxy
starting up and receiving the first hostIPs.

In that window, kube-proxy.go:start() constructed the proxy with a
stub Syncer (only podNPIP, no real host IPs). proxy.New() spins up
the k8s informer goroutines synchronously; once they sync, an Apply
runs against the stub Syncer. The cachingmap computes desired state
without any (realHostIP, nodePort) FE entry and erases pre-existing
real-host-IP NodePort FE entries left by the previous Felix run. New
external TCP connections to the NodePort during the gap miss the FE
map, ascend to the host stack with no listener, and get RST by the
kernel. The wildcard FE entry does not cover external→HEP NodePort
traffic (nat_lookup.h:60-71) so it provides no safety net.

Defer proxy construction until both the first hostIPUpdates and the
first hostMetadataUpdates have arrived, then construct the Syncer
(with real host IPs) and the proxy in one shot via run(). The
KubeProxy.Conntrack* callbacks already nil-check kp.syncer, so
in-flight conntrack scans during the wait remain safe.

Add a regression test in kube-proxy_test.go that pre-populates the
front map with a real-host-IP NodePort FE entry, fires only the
host-metadata gate (mimicking the bootstrap window), and asserts via
Consistently that the entry survives.

Closes projectcalico#12192
Copilot AI review requested due to automatic review settings May 5, 2026 21:53
@tomastigera tomastigera requested a review from a team as a code owner May 5, 2026 21:53
@marvin-tigera marvin-tigera added this to the Calico v3.32.1 milestone May 5, 2026
@marvin-tigera marvin-tigera added the release-note-required (Change has user-facing impact, no matter how small) and docs-pr-required (Change is not yet documented) labels May 5, 2026
@tomastigera tomastigera added the docs-not-required (Docs not required for this change) and release-note-required (Change has user-facing impact, no matter how small) labels May 5, 2026
@marvin-tigera marvin-tigera removed the docs-pr-required (Change is not yet documented) label May 5, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR backports the eBPF kube-proxy bootstrap fix to release-v3.32, preventing transient NodePort connection failures during Felix restart by deferring proxy construction until both (a) the first host IPs update and (b) the host-metadata “in-sync” signal have been received.

Changes:

  • Delay initial proxy construction until initial hostIPUpdates and hostMetadataUpdates are available, to avoid an early Apply against a syncer without real host IPs.
  • Make Stop() tolerant of being called before kp.proxy is constructed.
  • Add a regression test that pre-populates a real-host-IP NodePort FE entry and asserts it is not erased during the bootstrap window.
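Because construction is now deferred, Stop() can run while kp.proxy is still nil. A minimal sketch of such a guard follows; the `kubeProxy` and `proxy` types, the `stopOnce` field, and the exact shutdown order are assumptions for illustration, not the actual Felix code.

```go
package main

import (
	"fmt"
	"sync"
)

// proxy is a hypothetical stand-in for the constructed proxy.
type proxy struct{ stopped bool }

func (p *proxy) Stop() { p.stopped = true }

// kubeProxy is a simplified stand-in for KubeProxy.
type kubeProxy struct {
	lock     sync.Mutex
	proxy    *proxy // nil until both bootstrap gates have fired
	exiting  chan struct{}
	stopOnce sync.Once
}

// Stop signals exit and stops the proxy only if it has been constructed.
// It is safe to call more than once and before construction.
func (kp *kubeProxy) Stop() {
	kp.stopOnce.Do(func() {
		close(kp.exiting) // unblocks any wait in the bootstrap window
		kp.lock.Lock()
		defer kp.lock.Unlock()
		if kp.proxy != nil {
			kp.proxy.Stop()
		}
	})
}

func main() {
	// Stop during the bootstrap window: no proxy yet, must not panic.
	early := &kubeProxy{exiting: make(chan struct{})}
	early.Stop()
	fmt.Println("stopped before construction without panic")

	// Stop after construction: the proxy is stopped as well.
	p := &proxy{}
	late := &kubeProxy{exiting: make(chan struct{}), proxy: p}
	late.Stop()
	fmt.Println("proxy stopped:", p.stopped)
}
```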

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| felix/bpf/proxy/kube-proxy.go | Defers initial proxy construction until initial host IP + host-metadata updates; adds nil-guard in Stop(). |
| felix/bpf/proxy/kube-proxy_test.go | Adds regression test for NodePort FE preservation during the bootstrap window; imports NAT helpers. |

Comment on lines 199 to 235
 func (kp *KubeProxy) start() error {
-	var withLocalNP []net.IP
-	if kp.ipFamily == 4 {
-		withLocalNP = append(withLocalNP, podNPIP)
-	} else {
-		withLocalNP = append(withLocalNP, podNPIPV6)
-	}
-
-	syncer, err := NewSyncer(kp.ipFamily, withLocalNP, kp.frontendMap, kp.backendMap, kp.MaglevMap, kp.affinityMap, kp.rt, kp.excludedCIDRs, kp.maglevLUTSize)
-	if err != nil {
-		return errors.WithMessage(err, "new bpf syncer")
-	}
-
-	proxy, err := New(kp.k8s, syncer, kp.hostname, kp.opts...)
-	if err != nil {
-		return errors.WithMessage(err, "new proxy")
+	// Block until we have the first batch of host IPs AND the
+	// host-metadata in-sync signal (sent by CompleteDeferredWork after
+	// the int_dataplane has finished its first datastore-in-sync
+	// apply). Only then construct the proxy, via run(). proxy.New()
+	// kicks off the k8s informer goroutines synchronously; once those
+	// sync, they trigger Apply on the syncer. Constructing the proxy
+	// before we have real host IPs lets that Apply run against a
+	// syncer whose desired state lacks every (realHostIP, nodePort)
+	// FE entry, which then erases pre-existing entries left by the
+	// previous Felix run and breaks external NodePort traffic during
+	// the kube-proxy bootstrap window. See projectcalico/calico#12192.
+	var hostIPs []net.IP
+	select {
+	case ips, ok := <-kp.hostIPUpdates:
+		if !ok {
+			return nil
+		}
+		hostIPs = ips
+	case <-kp.exiting:
+		return nil
 	}
 
-	kp.lock.Lock()
-	kp.proxy = proxy
-	kp.syncer = syncer
-	kp.lock.Unlock()
-
-	// Wait for the initial update.
-	hostIPs := <-kp.hostIPUpdates
-
 	hostMetadata := make(map[string]*proto.HostMetadataV4V6Update)
 	// Block until we go in-sync and get the first batch of hostmetadata
 	// updates, to avoid a flap after a Felix restart. In practice, this
 	// recv should happen very soon after receiving the host IPs above.
-	hostMetadataUpdates := <-kp.hostMetadataUpdates
-	mergeHostMetadataV4V6Updates(hostMetadata, hostMetadataUpdates)
+	select {
+	case updates, ok := <-kp.hostMetadataUpdates:
+		if !ok {
+			return nil
+		}
+		mergeHostMetadataV4V6Updates(hostMetadata, updates)
+	case <-kp.exiting:
+		return nil
+	}
 
-	err = kp.run(hostIPs, hostMetadata)
-	if err != nil {
+	if err := kp.run(hostIPs, hostMetadata); err != nil {
 		return err
 	}
@tomastigera tomastigera merged commit d84f660 into projectcalico:release-v3.32 May 6, 2026
11 checks passed

Labels

docs-not-required (Docs not required for this change), release-note-required (Change has user-facing impact, no matter how small)

3 participants