Fix stale EIP assignments during failover and controller restart#5606
Conversation
WalkthroughTracks per-EgressIP per-IP assigned nodes in the controller cache, seeds pod egress status entries as a pending marker during cache rebuild, prunes stale per-EIP statuses during add/delete assignment flows, and adds an end-to-end test validating SNAT/LRP nexthop reconciliation during simultaneous EIP failover and controller restart. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant K8s as Kubernetes API
participant Ctrl as EgressIP Controller
participant Cache as egressIPCache
participant OVN as OVN NB DB
rect rgb(250,250,255)
note over Ctrl,Cache: Startup / cache rebuild
Ctrl->>K8s: List EgressIP objects
Ctrl->>Cache: generateCacheForEgressIP()
Cache-->>Ctrl: egressIPToAssignedNodes (EIP -> IP -> Node)
Ctrl->>Cache: syncPodAssignmentCache()
note right of Cache: Seed podState.egressStatuses = egressStatusStatePending
end
rect rgb(245,255,245)
note over Ctrl,OVN: Reconciliation / failover flows
Ctrl->>Ctrl: addPodEgressIPAssignments()\n- consult statusMap, avoid overwriting pending seeds
Ctrl->>Ctrl: deleteEgressIPAssignments()\n- detect & prune stale per-EIP statuses via hasStaleEIPStatus
Ctrl->>OVN: Update SNATs / LRP nexthops per assigned node
OVN-->>Ctrl: NAT / route / QoS state responses
end
alt Failover detected
Ctrl->>OVN: Adjust nexthops to new node(s)
else Controller restart with seeded pending statuses
Ctrl->>Cache: Pending seeds trigger reconciliation and cleanup
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (7)📓 Common learnings📚 Learning: 2025-10-20T15:07:49.533ZApplied to files:
📚 Learning: 2025-10-23T14:10:26.595ZApplied to files:
📚 Learning: 2025-08-19T10:19:13.298ZApplied to files:
📚 Learning: 2025-09-03T09:38:27.723ZApplied to files:
📚 Learning: 2025-10-09T12:23:01.462ZApplied to files:
📚 Learning: 2025-08-08T10:03:01.147ZApplied to files:
🧬 Code graph analysis (1)go-controller/pkg/ovn/egressip_test.go (7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (7)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.39.6)go-controller/pkg/ovn/egressip_test.goThanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip_test.go (3)
11307-11319: Strengthen the “restart + failover” simulation by asserting pre-state before reconcileRight now we jump straight to reconcileEgressIP(nil, &eIP). To prove we’re actually transitioning from “assigned to node1” to “assigned to node2”, consider first asserting the pre-condition DB state (SNAT on node1, LRP nexthop to 100.64.0.2) before flipping status. That makes the test deterministic and validates both removal and reprogramming paths. You can keep the nil oldObj to model restart, but add a short pre-state HaveData assertion before Line 11307.
11353-11359: Avoid magic literals for nexthops; use the existing constantUse node2LogicalRouterIPv4 instead of a hardcoded "100.64.0.3" to avoid drift if join IPs change.
- Nexthops: []string{"100.64.0.3"}, + Nexthops: node2LogicalRouterIPv4,
11376-11386: Prefer getEIPSNAT helper for NAT constructionFor consistency with nearby tests and to cut duplication, build the NAT via getEIPSNAT and only adjust UUID/logical port as needed.
- &nbdb.NAT{ - UUID: "egressip-nat-UUID2", - LogicalIP: podV4IP, - ExternalIP: egressIP, - ExternalIDs: getEgressIPNATDbIDs(egressIPName, egressPod.Namespace, egressPod.Name, IPFamilyValueV4, fakeOvn.controller.controllerName).GetExternalIDs(), - Type: nbdb.NATTypeSNAT, - LogicalPort: &expectedNatLogicalPort2, - Options: map[string]string{ - "stateless": "false", - }, - }, + func() *nbdb.NAT { + n := getEIPSNAT(podV4IP, egressPod.Namespace, egressPod.Name, egressIP, expectedNatLogicalPort2, DefaultNetworkControllerName) + n.UUID = "egressip-nat-UUID2" + return n + }(),
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
go-controller/pkg/ovn/egressip_test.go(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-19T10:19:13.298Z
Learnt from: tssurya
PR: ovn-kubernetes/ovn-kubernetes#5276
File: go-controller/pkg/node/bridgeconfig/bridgeflows.go:817-827
Timestamp: 2025-08-19T10:19:13.298Z
Learning: In ovn-kubernetes go-controller/pkg/node/bridgeconfig/bridgeflows.go, MEG (Multiple External Gateways, controlled by disableSNATMultipleGWs) and EgressIP are independent features that should not be coupled in flow logic. The priority 104 flow condition should use "disableSNATMultipleGWs || isNetworkAdvertised" instead of "(disableSNATMultipleGWs && config.OVNKubernetesFeature.EnableEgressIP) || isNetworkAdvertised" to allow MEG to function independently of EgressIP enablement.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-29T13:52:53.191Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.191Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (10)
go-controller/pkg/config/config.go (1)
Gateway(147-164)go-controller/pkg/util/node_annotations.go (1)
OVNNodeHostCIDRs(98-98)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/pkg/libovsdb/ops/options.go (1)
RouterPort(14-14)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build-PR
- GitHub Check: Lint
40cf6f6 to
faea77f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip_test.go (3)
11307-11319: Simulating restart via reconcile(new) only can diverge from real sequence; patch status and pass old/new for robustness.Right now you mutate a local eIP copy and call reconcileEgressIP(nil, &eIP). The informer store still holds the old status (node1), so the watcher and your direct reconcile may race and produce non‑deterministic transitions. To better mirror a restart+failover and reduce flakes: fetch the stored object as old, update status in the fake client (patchReplaceEgressIPStatus), then call reconcileEgressIP(old, new).
Example:
- eIP.Status = egressipv1.EgressIPStatus{ Items: []egressipv1.EgressIPStatusItem{{Node: node2.Name, EgressIP: egressIP}} } - fakeOvn.controller.eIPC.nodeName = node1Name - err = fakeOvn.controller.eIPC.reconcileEgressIP(nil, &eIP) + oldEIP, err := fakeOvn.fakeClient.EgressIPClient.K8sV1().EgressIPs().Get(context.TODO(), eIP.Name, metav1.GetOptions{}) + gomega.Expect(err).NotTo(gomega.HaveOccurred()) + newEIP := oldEIP.DeepCopy() + newEIP.Status = egressipv1.EgressIPStatus{Items: []egressipv1.EgressIPStatusItem{{Node: node2.Name, EgressIP: egressIP}}} + gomega.Expect(fakeOvn.controller.eIPC.patchReplaceEgressIPStatus(newEIP.Name, newEIP.Status.Items)).To(gomega.Succeed()) + fakeOvn.controller.eIPC.nodeName = node1Name + err = fakeOvn.controller.eIPC.reconcileEgressIP(oldEIP, newEIP)This keeps the fake client, queue, and reconcile paths aligned and makes the test intent clearer.
11357-11360: Prefer the existing constant for join-switch nexthop.Use node2LogicalRouterIPv4 rather than a string literal "100.64.0.3" to avoid drift if constants change.
- Nexthops: []string{"100.64.0.3"}, + Nexthops: node2LogicalRouterIPv4,
11444-11444: Add inspectTimeout to Eventually to curb CI flakiness.Other tests use inspectTimeout; apply it here too.
- gomega.Eventually(fakeOvn.nbClient).Should(libovsdbtest.HaveData(expectedDatabaseState)) + gomega.Eventually(fakeOvn.nbClient, inspectTimeout).Should(libovsdbtest.HaveData(expectedDatabaseState))
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
go-controller/cmd/ovnkube/ovnkube.go(1 hunks)go-controller/pkg/controllermanager/controller_manager.go(4 hunks)go-controller/pkg/ovn/egressip.go(4 hunks)go-controller/pkg/ovn/egressip_test.go(1 hunks)go-controller/pkg/ovn/ovn_test.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-29T13:52:53.191Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.191Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (2)
go-controller/pkg/controllermanager/controller_manager.go (2)
go-controller/pkg/ovn/address_set/address_set.go (1)
NewOvnAddressSetFactory(84-90)go-controller/pkg/ovn/default_network_controller.go (1)
DefaultNetworkControllerName(49-49)
go-controller/pkg/ovn/egressip_test.go (9)
go-controller/pkg/config/config.go (1)
Gateway(147-164)go-controller/pkg/util/node_annotations.go (1)
OVNNodeHostCIDRs(98-98)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint
- GitHub Check: Build-PR
🔇 Additional comments (2)
go-controller/pkg/ovn/ovn_test.go (1)
233-246: Constructor wiring stays backward-compatible. Empty nodeName preserves the FakeOVN controller semantics while satisfying the updated signature.go-controller/cmd/ovnkube/ovnkube.go (1)
510-518: Correctly plumbs node identity into the manager. ForwardingrunMode.identitykeeps the controller manager aligned with the process identity already used for leader election.
faea77f to
73ece4e
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip_test.go (1)
1148-11428: Test may flake: reconcile called with old=nil while watchers are runningWatchers first process the EIP in status=node1 (creating NAT/LRP on node1). Then you call reconcileEgressIP(nil, &eIP) after changing status to node2. Without an “old” object, reconcile may add node2 config but not remove node1’s, leaving stale NAT/LRP. This can make the test racy/ordering-dependent.
Recommend one of:
- Simulate a real UPDATE: pass old and new to reconcileEgressIP so removal/addition are both handled deterministically; or
- Patch the status in the fake client (patchReplaceEgressIPStatus) and let the EIP watcher drive the update; or
- Don’t start the EIP watcher before calling reconcile to avoid concurrent adds, or add a barrier asserting the initial node1 config exists before triggering the failover, then assert it’s replaced.
Proposed minimal change (simulate UPDATE with old/new):
- // To simulate an ovnkube-controller restart, update the EIP object with the newly assigned node. - // Then invoke reconcileEgressIP using only the updated EIP object to trigger the EgressIP add event. - eIP.Status = egressipv1.EgressIPStatus{ + // Simulate controller restart + failover by invoking reconcile as an UPDATE + eIPOld := eIP.DeepCopy() + eIP.Status = egressipv1.EgressIPStatus{ Items: []egressipv1.EgressIPStatusItem{ { Node: node2.Name, EgressIP: egressIP, }, }, } - err = fakeOvn.controller.eIPC.reconcileEgressIP(nil, &eIP) + err = fakeOvn.controller.eIPC.reconcileEgressIP(eIPOld, &eIP) gomega.Expect(err).NotTo(gomega.HaveOccurred())This avoids leaving stale node1 NAT/LRP due to missing diff context and reduces non-determinism from the concurrently running watchers.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(6 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip.gogo-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (9)
go-controller/pkg/config/config.go (1)
Gateway(147-164)go-controller/pkg/util/node_annotations.go (1)
OVNNodeHostCIDRs(98-98)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
🔇 Additional comments (8)
go-controller/pkg/ovn/egressip_test.go (3)
7976-7976: OK: status map size updatedLen(3) expectation aligns with the new sync-populated statuses.
7989-7991: OK: status map values set to "sync"Using explicit "sync" values matches the revised initialization behavior.
8016-8018: OK: reassert "sync" stateConsistent with the updated cache-populate semantics.
go-controller/pkg/ovn/egressip.go (5)
1162-1163: LGTM! Well-structured per-EIP node tracking.The new
egressIPToAssignedNodesfield provides a nested map structure (EgressIP name → EgressIP IP → node name) that complements the existing flategressIPIPToNodeCache. This enables efficient lookup of node assignments per EgressIP during cache synchronization.
846-846: LGTM! Sync marker mechanism enables proper restart reconciliation.The modified condition
!exists || value == "sync"correctly implements the sync marker pattern:
- Statuses marked as
"sync"during cache rebuild (line 1606) are re-processed- Statuses with
""(empty string, set at lines 882/890) are skipped as already processed- This ensures controller restarts can reconcile existing database state with the cache
925-927: LGTM! Defensive cleanup of stale status entries.This guard correctly prunes stale status entries from pods that are no longer managed by this EgressIP object. The logic is safe because:
- It only runs when
podStatus.egressIPName != name(pod managed by different EIP)- The
contains()check prevents errors if status doesn't exist- Executes under
podAssignmentlock (line 919)This cleanup handles edge cases where a pod's EIP assignment changes but stale cache entries remain.
1603-1607: LGTM! Core sync mechanism correctly seeds pod status cache.This code seeds
podState.egressStatuseswith"sync"values for each discovered EgressIP assignment during cache rebuild. The flow is:
- During sync: discovered statuses are marked with
"sync"(here)- During add: statuses with
"sync"are re-processed (line 846 check)- After processing:
"sync"is replaced with""(lines 882, 890)This ensures proper reconciliation after controller restarts by tracking which database entries need re-processing.
1966-1968: LGTM! Proper initialization and population of node assignment map.The
egressIPToAssignedNodesmap is correctly initialized and populated:
- Line 1967-1968: Declare and initialize in cache structure
- Line 1982: Create inner map per EgressIP name
- Line 1990: Populate with EgressIP IP → node mappings from
Status.ItemsThis data structure is then used in
syncPodAssignmentCache(lines 1604-1607) to seed pod status with "sync" markers during cache rebuilds.Also applies to: 1982-1982, 1990-1990
73ece4e to
150b874
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip.go (3)
1171-1172: Document the purpose of the newegressIPToAssignedNodesfield.Add a comment explaining that this map tracks per-EIP per-IP node assignments (e.g.,
egressIP name -> egressIP IP -> assigned node name) to support cache seeding during controller restart and EIP failover scenarios.+ // egressIP name -> egress IP -> assigned node name + // Used during cache rebuild to seed pod assignment state with current EIP-to-node mappings egressIPToAssignedNodes map[string]map[string]string
1612-1616: Clarify the purpose of the "sync" marker in comments.The code populates
podState.egressStatuseswith "sync" values during cache rebuild, but the purpose and lifecycle of this marker could be clearer. Consider adding a comment explaining that "sync" indicates an entry found during cache rebuild that needs reconciliation, and will be replaced with "" once setup completes.+ // Populate podState.egressStatuses with assigned nodes for each egressIP IP. + // Mark entries as "sync" to indicate they need reconciliation (will be cleared to "" once setup completes). for egressIPIP, nodeName := range egressIPCache.egressIPToAssignedNodes[egressIPName] { podState.egressStatuses.statusMap[egressipv1.EgressIPStatusItem{ EgressIP: egressIPIP, Node: nodeName}] = "sync" }
2270-2280: Consider optimizinghasStaleEIPStatusor adding documentation.The function iterates through all entries in
statusMapto find a stale entry. While this is acceptable for small maps, consider:
- Document the expected behavior when multiple stale entries exist (currently returns the first one found due to
break)- Consider whether the function should return all stale entries or just the first one
Add documentation:
+// hasStaleEIPStatus checks if there's an existing status entry with the same EgressIP +// but a different Node than potentialStatus. This indicates a failover scenario where +// the EgressIP moved from one node to another. Returns the first stale entry found, or nil. func (e egressStatuses) hasStaleEIPStatus(potentialStatus egressipv1.EgressIPStatusItem) *egressipv1.EgressIPStatusItem {
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(7 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip_test.gogo-controller/pkg/ovn/egressip.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (8)
go-controller/pkg/config/config.go (1)
Gateway(147-164)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build-PR
- GitHub Check: Lint
🔇 Additional comments (2)
go-controller/pkg/ovn/egressip.go (2)
846-858: Verify the stale detection logic handles all edge cases correctly.The code checks for stale EIP status and deletes it inline during the loop. Consider these edge cases:
- What if
deletePodEgressIPAssignmentsfails? The stale entry is deleted from the map (line 856) even if the OVN cleanup fails.- What if multiple new statuses in
statusAssignmentscorrespond to the same EgressIP but different nodes? The first iteration will clean up the stale entry, but subsequent iterations won't find it.The current implementation appears correct for case 2 (subsequent iterations won't find stale entries after first cleanup). However, for case 1, consider whether the delete at line 856 should only occur if the cleanup succeeds.
Review the error handling:
if staleStatus := podState.egressStatuses.hasStaleEIPStatus(status); staleStatus != nil { err = e.deletePodEgressIPAssignments(ni, name, []egressipv1.EgressIPStatusItem{*staleStatus}, pod) if err != nil { return fmt.Errorf("failed to delete stale EgressIP %s, status %v for pod %s", name, podKey, *staleStatus) } delete(podState.egressStatuses.statusMap, *staleStatus) }If
deletePodEgressIPAssignmentsreturns an error, we return immediately without deleting from the map, which is correct. So the current logic appears safe.
934-937: Verify the cleanup logic for standby EgressIP references.When a pod is managed by a different EgressIP object, the code removes any stale reference to the current EgressIP from
podStatus.egressStatuses. This makes sense for cleanup. However, verify that this doesn't cause issues if the pod later switches to be managed by this EgressIP (e.g., if the current managing EgressIP is deleted).The cleanup appears correct: if a pod is managed by EgressIP-A, and we're deleting EgressIP-B's assignments, we remove any stale references to EgressIP-B from the pod's state. If EgressIP-A is later deleted and EgressIP-B becomes active, the normal add flow will re-establish the assignments.
| }, | ||
| &nbdb.LogicalSwitch{ | ||
| UUID: types.ExternalSwitchPrefix + node2Name + "-UUID", | ||
| Name: types.ExternalSwitchPrefix + node2Name, | ||
| Ports: []string{types.EXTSwitchToGWRouterPrefix + types.GWRouterPrefix + node2Name + "-UUID"}, | ||
| }, | ||
| node1Switch, | ||
| node2Switch, | ||
| }, | ||
| }, | ||
| &egressipv1.EgressIPList{ | ||
| Items: []egressipv1.EgressIP{eIP}, | ||
| }, | ||
| &corev1.NodeList{ | ||
| Items: []corev1.Node{node1, node2}, | ||
| }, | ||
| &corev1.NamespaceList{ | ||
| Items: []corev1.Namespace{*egressNamespace}, | ||
| }, | ||
| &corev1.PodList{ | ||
| Items: []corev1.Pod{egressPod}, | ||
| }, | ||
| ) | ||
|
|
||
| i, n, _ := net.ParseCIDR(podV4IP + "/23") | ||
| n.IP = i | ||
| fakeOvn.controller.logicalPortCache.add(&egressPod, "", types.DefaultNetworkName, "", nil, []*net.IPNet{n}) | ||
|
|
||
| err := fakeOvn.controller.WatchEgressIPNamespaces() | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| err = fakeOvn.controller.WatchEgressIPPods() | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| err = fakeOvn.controller.WatchEgressNodes() | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| err = fakeOvn.controller.WatchEgressIP() | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
|
|
||
| // To simulate an ovnkube-controller restart, update the EIP object with the newly assigned node. | ||
| // Then invoke reconcileEgressIP using only the updated EIP object to trigger the EgressIP add event. | ||
| eIP.Status = egressipv1.EgressIPStatus{ | ||
| Items: []egressipv1.EgressIPStatusItem{ | ||
| { | ||
| Node: node2.Name, | ||
| EgressIP: egressIP, | ||
| }, | ||
| }, | ||
| } | ||
| err = fakeOvn.controller.eIPC.reconcileEgressIP(nil, &eIP) | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
|
|
||
| egressSVCServedPodsASv4, _ := buildEgressServiceAddressSets(nil) | ||
| egressIPServedPodsASv4, _ := buildEgressIPServedPodsAddressSets([]string{podV4IP}, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName) | ||
| egressNodeIPsASv4, _ := buildEgressIPNodeAddressSets([]string{node1IPv4, node2IPv4}) | ||
|
|
||
| node1Switch.QOSRules = []string{"default-QoS-UUID"} | ||
| node2Switch.QOSRules = []string{"default-QoS-UUID"} | ||
| expectedNatLogicalPort2 := "k8s-node2" | ||
| expectedDatabaseState := []libovsdbtest.TestData{ | ||
| &nbdb.LogicalRouterPolicy{ | ||
| Priority: types.DefaultNoRereoutePriority, | ||
| Match: fmt.Sprintf("(ip4.src == $%s || ip4.src == $%s) && ip4.dst == $%s", | ||
| egressIPServedPodsASv4.Name, egressSVCServedPodsASv4.Name, egressNodeIPsASv4.Name), | ||
| Action: nbdb.LogicalRouterPolicyActionAllow, | ||
| UUID: "default-no-reroute-node-UUID", | ||
| Options: map[string]string{"pkt_mark": types.EgressIPNodeConnectionMark}, | ||
| ExternalIDs: getEgressIPLRPNoReRoutePodToNodeDbIDs(IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(), | ||
| }, | ||
| getNoReRouteReplyTrafficPolicy(types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName), | ||
| &nbdb.LogicalRouterPolicy{ | ||
| Priority: types.DefaultNoRereoutePriority, | ||
| Match: "ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14", | ||
| Action: nbdb.LogicalRouterPolicyActionAllow, | ||
| UUID: "no-reroute-UUID", | ||
| ExternalIDs: getEgressIPLRPNoReRoutePodToPodDbIDs(IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(), | ||
| }, | ||
| &nbdb.LogicalRouterPolicy{ | ||
| Priority: types.DefaultNoRereoutePriority, | ||
| Match: fmt.Sprintf("ip4.src == 10.128.0.0/14 && ip4.dst == %s", config.Gateway.V4JoinSubnet), | ||
| Action: nbdb.LogicalRouterPolicyActionAllow, | ||
| UUID: "no-reroute-service-UUID", | ||
| ExternalIDs: getEgressIPLRPNoReRoutePodToJoinDbIDs(IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(), | ||
| }, | ||
| &nbdb.LogicalRouterPolicy{ | ||
| Priority: types.EgressIPReroutePriority, | ||
| Match: fmt.Sprintf("ip4.src == %s", egressPod.Status.PodIP), | ||
| Action: nbdb.LogicalRouterPolicyActionReroute, | ||
| Nexthops: []string{"100.64.0.3"}, | ||
| ExternalIDs: getEgressIPLRPReRouteDbIDs(eIP.Name, egressPod.Namespace, egressPod.Name, IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(), | ||
| UUID: "reroute-UUID1", | ||
| }, | ||
| &nbdb.LogicalRouter{ | ||
| Name: types.OVNClusterRouter, | ||
| UUID: types.OVNClusterRouter + "-UUID", | ||
| Policies: []string{"no-reroute-UUID", "no-reroute-service-UUID", "default-no-reroute-node-UUID", "default-no-reroute-reply-traffic", "reroute-UUID1"}, | ||
| }, | ||
| &nbdb.LogicalRouter{ | ||
| Name: types.GWRouterPrefix + node1.Name, | ||
| UUID: types.GWRouterPrefix + node1.Name + "-UUID", | ||
| Ports: []string{types.GWRouterToJoinSwitchPrefix + types.GWRouterPrefix + node1.Name + "-UUID"}, | ||
| Nat: []string{"pod-node-nat-UUID1"}, |
There was a problem hiding this comment.
Goroutines need GinkgoRecover
Each goroutine calling Gomega assertions must defer ginkgo.GinkgoRecover() to surface panics correctly. Please add this at the top of both goroutines. Based on learnings
+ go func() {
+ defer ginkgo.GinkgoRecover()
defer ginkgo.GinkgoRecover()
defer wg.Done()
i, n, _ := net.ParseCIDR(podV4IP + "/23")
n.IP = i
fakeOvn.controller.logicalPortCache.add(&egressPod, "", types.DefaultNetworkName, "", nil, []*net.IPNet{n})
_, err := fakeOvn.fakeClient.KubeClient.CoreV1().Pods(egressPod.Namespace).Create(context.TODO(), &egressPod, metav1.CreateOptions{})
gomega.Expect(err).NotTo(gomega.HaveOccurred())
}()
go func() {
+ defer ginkgo.GinkgoRecover()
defer ginkgo.GinkgoRecover()
defer wg.Done()
i, n, _ := net.ParseCIDR(podV4IP2 + "/23")
n.IP = i
fakeOvn.controller.logicalPortCache.add(&egressPod2, "", types.DefaultNetworkName, "", nil, []*net.IPNet{n})Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
go-controller/pkg/ovn/egressip_test.go around lines 11261-11360: the test spawns
goroutines that use Gomega assertions but do not recover panics, so add defer
ginkgo.GinkgoRecover() as the first statement in each goroutine function (i.e.,
at the top of both goroutine bodies) to ensure panics inside the goroutines are
reported to Ginkgo; no other behavioral changes are needed.
There was a problem hiding this comment.
this suggestion is not for changes in the PR.
There was a problem hiding this comment.
@pperiyasamy You’re right — that suggestion isn’t applicable to the changes in this PR. The new test here doesn’t introduce goroutines, so no GinkgoRecover update is needed. I’ll limit this check to diffs that add or modify goroutines going forward.
🧠 Learnings used
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
150b874 to
a69e7ce
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
go-controller/pkg/ovn/egressip_test.go (1)
11623-11696: Fix the invalid loop syntax (for range 10)
for range 10is not valid Go syntax, so the test won't compile. Use a standard counted loop instead.- for range 10 { + for i := 0; i < 10; i++ {
🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip.go (1)
846-853: Consider deduplicating stale assignments to avoid redundant operations.The current logic calls
hasStaleEIPStatusfor each new status in the loop. If multiple new status items exist for the same EgressIP name (e.g., after a multi-IP failover), the same stale entry could be added tostaleAssignmentsmultiple times.While the subsequent deletions are idempotent (lines 875-882), this causes unnecessary duplicate delete operations and log messages.
Consider deduplicating
staleAssignmentsbefore processing, or tracking already-detected stale entries within the loop:for _, status := range statusAssignments { if value, exists := podState.egressStatuses.statusMap[status]; !exists || value == egressStatusStateSynced { remainingAssignments = append(remainingAssignments, status) } // Detect stale EIP status entries (same EgressIP reassigned to a different node) // and queue the outdated entry for cleanup. if staleStatus := podState.egressStatuses.hasStaleEIPStatus(status); staleStatus != nil { - staleAssignments = append(staleAssignments, *staleStatus) + // Avoid duplicates by checking if already queued + isDuplicate := false + for _, existing := range staleAssignments { + if existing.EgressIP == staleStatus.EgressIP && existing.Node == staleStatus.Node { + isDuplicate = true + break + } + } + if !isDuplicate { + staleAssignments = append(staleAssignments, *staleStatus) + } } }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(10 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (6)
go-controller/pkg/config/config.go (1)
Gateway(147-164)go-controller/pkg/util/node_annotations.go (1)
OVNNodeHostCIDRs(98-98)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_policy.go (1)
LogicalRouterPolicy(22-34)go-controller/pkg/nbdb/nat.go (1)
NAT(21-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint
- GitHub Check: Build-PR
🔇 Additional comments (6)
go-controller/pkg/ovn/egressip.go (6)
875-882: LGTM! Stale assignment cleanup logic is correct.The code properly:
- Deletes stale OVN configuration via
deletePodEgressIPAssignments- Removes the entry from the in-memory status map
- Fails fast if cleanup encounters errors, preventing inconsistent state
The locking is also correct—
deletePodEgressIPAssignmentsis called while holding thepodAssignmentlock as required by the function contract (line 1048).
938-940: LGTM! Essential cleanup for standby EIP tracking.This ensures that when a pod is no longer managed by this EIP (checked at line 937), any associated status entries are properly removed from the per-pod status map. This prevents stale entries from accumulating when EIPs are deleted or pod selectors change.
1616-1621: LGTM! Core fix for the failover race condition.This change seeds the per-pod status map during controller sync by iterating through the per-EIP per-IP node assignments (
egressIPToAssignedNodes). TheegressStatusStateSyncedmarker ensures these entries are re-applied during the next reconciliation cycle.This addresses the race condition described in the PR objectives: when the controller restarts during an EIP failover, the informer cache is rebuilt from the current EgressIP status, enabling detection and cleanup of stale SNAT/LRP configurations for previously assigned nodes.
1175-1176: LGTM! Essential tracking for per-EIP per-IP node assignments.The new
egressIPToAssignedNodesfield (typemap[string]map[string]string) tracks which node each specific EIP IP is assigned to:
- Outer key: EgressIP name
- Inner key: EgressIP IP address
- Value: Node name
This granular tracking enables:
- Accurate cache population during controller sync (used at line 1617)
- Detection of stale assignments when an EIP is reassigned to a different node
The initialization and population logic in
generateCacheForEgressIPis correct.Also applies to: 1979-1981, 1995-1995, 2003-2003
2260-2262: LGTM! Clear documentation of the sync state mechanism.The constant definition and updated
statusMapdocumentation clearly explain the two-state system:
- Empty string
"": Status is applied and reconciledegressStatusStateSynced("sync"): Status was seeded during controller sync and must be (re)appliedThis documentation improves code maintainability and makes the synchronization logic easier to understand.
Also applies to: 2267-2272
2283-2293: LGTM! Correct stale entry detection for failover scenarios.The
hasStaleEIPStatushelper correctly identifies when the same EgressIP (IP address) is assigned to a different node, indicating a failover event that requires cleanup of the old node's configuration.The function returns after finding the first stale entry, which is appropriate because:
- Multiple stale entries for the same EIP IP are rare
- The caller removes detected stale entries from the map (line 881)
- Subsequent reconciliations would catch any remaining stale entries
a69e7ce to
16f22a7
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
go-controller/pkg/ovn/egressip.go (2)
828-836: Verify stale detection handles all edge cases.The logic checks for stale status entries where the same EgressIP is assigned to a different node. However,
hasStaleEIPStatusreturns only the first stale entry found. If multiple stale entries exist for the same EgressIP (e.g., Node A and Node B both stale, new assignment to Node C), only one will be cleaned up per reconciliation cycle. While subsequent reconciliations will eventually clean up remaining stale entries, consider whether this gradual cleanup is acceptable or if all stale entries should be removed in a single pass for more predictable behavior.Consider modifying
hasStaleEIPStatusto return all stale entries:-func (e egressStatuses) hasStaleEIPStatus(potentialStatus egressipv1.EgressIPStatusItem) *egressipv1.EgressIPStatusItem { - var staleStatus *egressipv1.EgressIPStatusItem +func (e egressStatuses) hasStaleEIPStatus(potentialStatus egressipv1.EgressIPStatusItem) []egressipv1.EgressIPStatusItem { + var staleStatuses []egressipv1.EgressIPStatusItem for status := range e.statusMap { if status.EgressIP == potentialStatus.EgressIP && status.Node != potentialStatus.Node { - staleStatus = &egressipv1.EgressIPStatusItem{EgressIP: status.EgressIP, Node: status.Node} - break + staleStatuses = append(staleStatuses, egressipv1.EgressIPStatusItem{EgressIP: status.EgressIP, Node: status.Node}) } } - return staleStatus + return staleStatuses }And update the calling code to handle a slice:
- if staleStatus := podState.egressStatuses.hasStaleEIPStatus(status); staleStatus != nil { - staleAssignments = append(staleAssignments, *staleStatus) + if staleStatuses := podState.egressStatuses.hasStaleEIPStatus(status); len(staleStatuses) > 0 { + staleAssignments = append(staleAssignments, staleStatuses...) }
870-879: Simplify local zone node check.The code locks each node just to check if it's in the local zone via
Load, butsyncmap.Loadis already thread-safe and doesn't require external locking. TheLockKey/UnlockKeycalls add unnecessary overhead.Remove the locking since
Loadis thread-safe:proceed := false for _, status := range statusAssignments { - e.nodeZoneState.LockKey(status.Node) isLocalZoneEgressNode, loadedEgressNode := e.nodeZoneState.Load(status.Node) if loadedEgressNode && isLocalZoneEgressNode { proceed = true - e.nodeZoneState.UnlockKey(status.Node) break } - e.nodeZoneState.UnlockKey(status.Node) }go-controller/pkg/ovn/egressip_test.go (2)
7976-7976: Prefer order‑independent map assertionsIndexing into Status.Items can be brittle if ordering changes. Assert by key/value presence instead:
for each item in eip1Obj.Status.Items, Expect(statusMap).To(HaveKeyWithValue(item, egressStatusStateSynced)).- gomega.Expect(pas.egressStatuses.statusMap).To(gomega.HaveLen(3)) - gomega.Expect(pas.egressStatuses.statusMap[eip1Obj.Status.Items[0]]).To(gomega.Equal(egressStatusStateSynced)) - gomega.Expect(pas.egressStatuses.statusMap[eip1Obj.Status.Items[1]]).To(gomega.Equal(egressStatusStateSynced)) + gomega.Expect(pas.egressStatuses.statusMap).To(gomega.HaveLen(3)) + for _, it := range eip1Obj.Status.Items { + gomega.Expect(pas.egressStatuses.statusMap). + To(gomega.HaveKeyWithValue(it, egressStatusStateSynced)) + }Also applies to: 7989-7991, 8016-8018
11148-11497: New failover+restart reconciliation test: solid coverage; consider explicit timeout to reduce flakesScenario and expectations look correct (LRP nexthop to transit IP, nodeIP SNAT on old node, address sets/QoS). To harden CI runs, wrap the final DB check with an explicit timeout (e.g., Eventually(..., inspectTimeout)).
- gomega.Eventually(fakeOvn.nbClient).Should(libovsdbtest.HaveData(expectedDatabaseState)) + gomega.Eventually(fakeOvn.nbClient, inspectTimeout). + Should(libovsdbtest.HaveData(expectedDatabaseState))
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(10 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (8)
go-controller/pkg/config/config.go (1)
Gateway(147-164)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build-PR
- GitHub Check: Lint
🔇 Additional comments (3)
go-controller/pkg/ovn/egressip.go (3)
1616-1620: Verify seeding handles controller restart race correctly.The seeding logic populates
podState.egressStatuseswith entries fromegressIPToAssignedNodesduringsyncPodAssignmentCache, marking them asegressStatusStateSynced. This enables detection of stale entries when the controller restarts. However, if the EgressIP'sStatus.Itemsis already updated to the new node (Node B) beforesyncPodAssignmentCacheruns, and the old node (Node A) has already been removed fromStatus.Items, then Node A's stale configuration won't be detected for cleanup.Verify the timing guarantees: does
syncPodAssignmentCachealways run with a snapshot ofStatus.Itemsthat includes the old node assignment, or could there be a race where the old assignment is already removed from the API server by the time the controller starts? If the latter is possible, stale SNAT and LRP configuration might remain on Node A.Consider adding a test case that verifies cleanup when:
- Controller is down
- EIP fails over from Node A to Node B
Status.Itemsis updated to only show Node B- Controller starts (cache rebuild)
- Only Add event is delivered (not Update)
- Verify Node A's configuration is cleaned up
2283-2293: LGTM! Helper correctly identifies stale entries.The
hasStaleEIPStatushelper correctly identifies stale entries by matching on EgressIP while checking for a different Node. The early return withbreakis appropriate since only the first stale entry needs to be returned (subsequent ones will be cleaned up in later reconciliation cycles).
938-940: LGTM! Cleanup ensures consistency.The cleanup of stale status entries when
podState.egressIPNamedoesn't match ensures the cache remains consistent even when pods are managed by different EgressIP objects over time.
16f22a7 to
4fdd7a4
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
go-controller/pkg/ovn/egressip_test.go (2)
7976-7976: Reduce duplication in egress status assertionsThe repeated checks against pas.egressStatuses.statusMap (length and egressStatusStateSynced value) are correct but duplicated in multiple places. Consider extracting a tiny helper (e.g., assertStatusesSynced(t, pas, wantLen)) to keep tests terse and easier to update if state semantics change.
Also applies to: 7989-7990, 8016-8017, 8082-8082
11148-11502: Stabilize the new failover+restart test and optionally harden cleanup verification
- Use the existing inspectTimeout with Eventually to avoid CI flakes on large expected DB states:
- gomega.Eventually(fakeOvn.nbClient).Should(libovsdbtest.HaveData(expectedDatabaseState)) + gomega.Eventually(fakeOvn.nbClient, inspectTimeout).Should(libovsdbtest.HaveData(expectedDatabaseState))
- Optional: to explicitly validate stale cleanup, pre-seed NBData with a stale EIP SNAT on node1 (LogicalPort "k8s-node1" + ExternalIP = egressIP) and a reroute LRP with node1 join IP as nexthop, then assert they are absent from expectedDatabaseState. This makes the regression being fixed observable in the test outcome, not just inferred.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(10 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.
Applied to files:
go-controller/pkg/ovn/egressip_test.gogo-controller/pkg/ovn/egressip.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (8)
go-controller/pkg/config/config.go (2)
Gateway(170-187)OVNKubernetesFeature(158-161)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build-PR
- GitHub Check: Lint
🔇 Additional comments (5)
go-controller/pkg/ovn/egressip.go (5)
938-940: LGTM! Proper cleanup of stale statuses.This correctly handles the cleanup of stale statuses when an EgressIP object no longer manages a pod. The logic properly checks if the pod is managed by a different EgressIP (line 937) but still has this statusToRemove in its cache, and removes it accordingly.
1175-1176: Well-structured per-EIP node tracking.The new
egressIPToAssignedNodesmap correctly tracks the assigned node for each EgressIP IP within each EgressIP object. The structure (egressIP name → egressIP IP → node name) properly supports the failover scenario where an EgressIP can be reassigned to a different node.The initialization at line 1980 and population at lines 1995-2003 look correct, properly keying by both egressIP name and the parsed IP string.
Also applies to: 1979-2003
1616-1620: Synced status seeding enables proper reconciliation.The seeding of pod egress statuses during cache synchronization correctly marks entries with
egressStatusStateSynced. This marker is checked at line 828 inaddPodEgressIPAssignments, ensuring that synced entries will be re-applied during the next reconciliation cycle. This is the key mechanism for fixing stale assignments after controller restart.The loop properly creates status items for each EgressIP IP using the assigned node from
egressIPToAssignedNodes[egressIPName].
2260-2293: Clear state tracking with proper stale detection.The
egressStatusStateSyncedconstant and updatedegressStatusesdocumentation clearly define the three states an entry can be in. ThehasStaleEIPStatusmethod correctly identifies stale entries by finding statuses with the same EgressIP but different node.The implementation properly creates a new
EgressIPStatusItemstruct at line 2288 rather than returning a pointer to the loop variable, avoiding a common Go pitfall.
792-882: No code changes needed.The concern about multiple stale entries is technically valid but the architecture correctly handles it through eventual consistency. The code works as designed:
hasStaleEIPStatusreturns the first stale entry found (same EgressIP, different Node) per invocation- If multiple stale entries exist for the same EgressIP, they are cleaned up across successive calls to
addPodEgressIPAssignments- Error handling is idempotent: deletion failures trigger retries where the next stale entry is detected
- The
podAssignmentlock serializes operations, preventing race conditions during cleanupThis eventually-consistent approach is acceptable and avoids unnecessary complexity of returning all stale entries at once.
Pull Request Test Coverage Report for Build 18655908164Details
💛 - Coveralls |
4fdd7a4 to
9250e66
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip.go (1)
856-881: Deadlock fix confirmed; consider minor optimization.The removal of the outer
nodeZoneStatelock before callingdeletePodEgressIPAssignmentscorrectly addresses the deadlock issue from previous reviews. The current implementation is safe because:
podAssignmentlock is held by the caller (lines 767-768)deletePodEgressIPAssignmentsacquiresnodeZoneStatelocks internally (line 1064)- Lock ordering is consistent: podAssignment (outer) → nodeZoneState (inner)
However, the zone-check logic (lines 868-881) could be slightly optimized by checking
isPodScheduledinLocalZonefirst before iterating through status nodes, since the pod zone check is cheaper and the condition at line 879 uses||logic:proceed := false + if e.isPodScheduledinLocalZone(pod) { + proceed = true + } - for _, status := range statusAssignments { + if !proceed { + for _, status := range statusAssignments { e.nodeZoneState.LockKey(status.Node) isLocalZoneEgressNode, loadedEgressNode := e.nodeZoneState.Load(status.Node) if loadedEgressNode && isLocalZoneEgressNode { proceed = true e.nodeZoneState.UnlockKey(status.Node) break } e.nodeZoneState.UnlockKey(status.Node) + } } - if !proceed && !e.isPodScheduledinLocalZone(pod) { + if !proceed { return nil }go-controller/pkg/ovn/egressip_test.go (2)
11325-11330: Consider checking errors from helper functions.While ignoring errors with
_is common in test code with known-good inputs, explicitly handling or at least acknowledging potential failures improves test robustness and debuggability.Consider one of these approaches:
Option 1: Use
gomega.Expectto verify the operations succeed:- i, n, _ := net.ParseCIDR(podV4IP + "/23") + i, n, err := net.ParseCIDR(podV4IP + "/23") + gomega.Expect(err).NotTo(gomega.HaveOccurred()) n.IP = iOption 2: Use a helper function like the existing
types.ParseCIDR(from vendor/github.com/containernetworking/cni/pkg/types/types.go):- i, n, _ := net.ParseCIDR(podV4IP + "/23") - n.IP = i + n, err := types.ParseCIDR(podV4IP + "/23") + gomega.Expect(err).NotTo(gomega.HaveOccurred())Apply similar changes to lines 11328-11330 and 11358-11360.
Also applies to: 11358-11360
11148-11356: Add comments to clarify the test scenario.This test simulates a complex race condition scenario (EIP failover concurrent with controller restart). Adding a structured comment block explaining the initial state, the sequence of events, and what's being verified would improve maintainability and help future developers understand the test's purpose.
Consider adding a comment like this at the beginning of the test:
ginkgo.It("should update SNAT and LRP nexthops during simultaneous EIP failover and ovnkube-controller restart", func() { + // This test reproduces the race condition described in the PR: + // Initial state: EIP-1 assigned to node-1, pod1 on node-1, pod2 on node-3 + // During failover: EIP-1 reassigned to node-2, controller restarts + // Race condition: Pod add events processed with stale cache create entries pointing to node-1 + // Expected result: After reconciliation, all LRPs should point to node-2 (the new EIP node) + // and stale SNAT/LRP entries referring to node-1 should be removed app.Action = func(*cli.Context) error {
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(10 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-09T12:23:01.462Z
Learnt from: npinaeva
PR: ovn-kubernetes/ovn-kubernetes#5561
File: go-controller/pkg/ovn/egressip.go:3256-3304
Timestamp: 2025-10-09T12:23:01.462Z
Learning: In go-controller/pkg/ovn/egressip.go, EgressIP reroute policies (priority types.EgressIPReroutePriority) are created via createReroutePolicyOps() using getEgressIPLRPReRouteDbIDs(..., controller = e.controllerName). Therefore, predicates updating these LRPs should match ExternalIDs[OwnerControllerKey] against e.controllerName (not a network-scoped controller name).
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (6)
go-controller/pkg/config/config.go (2)
Gateway(170-187)OVNKubernetesFeature(158-161)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(43-43)GWRouterPrefix(48-48)GWRouterToJoinSwitchPrefix(55-55)EXTSwitchToGWRouterPrefix(58-58)GWRouterToExtSwitchPrefix(59-59)ExternalSwitchPrefix(47-47)DefaultNetworkName(7-7)DefaultNoRereoutePriority(120-120)EgressIPNodeConnectionMark(144-144)EgressIPReroutePriority(122-122)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (1)
LogicalRouterPolicy(22-34)go-controller/pkg/nbdb/nat.go (1)
NAT(21-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build-PR
- GitHub Check: Lint
🔇 Additional comments (3)
go-controller/pkg/ovn/egressip.go (2)
2282-2292: LGTM: Stale EIP detection logic is correct.The
hasStaleEIPStatusmethod correctly identifies when the same EgressIP is reassigned to a different node by comparingEgressIPfields while checking for differingNodefields. This is the core logic needed to detect and clean up stale SNAT/LRP entries during failover scenarios.
1615-1619: Verify seeded statuses are applied exactly once per reconciliation cycle.The seeding logic populates
podState.egressStatuseswithegressStatusStateSyncedfor all EIP assignments. Line 827 treats these as needing reapplication (value == egressStatusStateSynced). Confirm that after the first application, these entries are updated to""to prevent redundant reapplication on subsequent reconciliation passes within the same controller session.Looking at line 894 and 902, the code sets
podState.egressStatuses.statusMap[status] = ""after successful application, which confirms the marker is cleared. This should prevent redundant work.go-controller/pkg/ovn/egressip_test.go (1)
7976-7976: LGTM! Status tracking updates are consistent.The updates to use
HaveLenchecks and expectegressStatusStateSyncedinstead of empty strings correctly reflect the new per-EIP synchronization mechanism introduced in this PR.Also applies to: 7989-7990, 8016-8017, 8082-8082
|
I am confused with the reproducer:
Why is step 4 an EIP add? It should be an update? in which case we are supposed to cleanup the old status from the configured pods an not leave any stale entries behind. |
|
/retest-failed |
So you are saying that we ran the sync with the EIP using the old object but then got the ADD with the new one? That sounds weird as I would expect the synthetic ADDs and the initial syncs to use the same objects to avoid exactly this type of issues. |
That's right.
Okay, but that hasn’t always been the case. The initial sync sometimes used an outdated EIP object, possibly because the watch factory informer cache wasn’t fully up to date when syncing existing objects. This issue occurs intermittently when EIP failover attempts combined with ovnkube-controller restarts. |
9250e66 to
b7d87e2
Compare
@kyrtapz Thanks for the offline discussion, you correctly pointed out EIP initial sync and add happens with new EIP status, only pod add event saw older EIP status from the informer cache which led to stale SNATs/LRP nexthops. so adjusted the PR only for this scenario. PTAL. |
There was a problem hiding this comment.
Thanks @pperiyasamy! This version looks good to me.
It was tricky to understand the exact order of events that this PR addresses so here it is for others that might be confused:
We are addressing a very specific startup race where the stale entries cleanup runs before the egressIP status got updated and the egressIP controller starts after. That ends up with us missing one update and just configuring the new status without removing the old one.
With this:
ovn-kubernetes/go-controller/pkg/ovn/default_network_controller.go
Lines 462 to 485 in a44f6c1
The order is:
- syncEgressIPs is called from
WatchEgressIPNamespaceswith old EgressIP status - no stale entries found. - EgressIP status gets updated.
WatchEgressIPNamespaces/WatchEgressIPPod/WatchEgressIPcalls synthetic ADD using the new status - no cleanup of old status.
|
After talking to @pperiyasamy I think there still might be a possibility for a race:
Maybe we need to populate the eipStatus in |
b7d87e2 to
188b23d
Compare
yes @kyrtapz , restored |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip.go (1)
859-866: Consider logging stale status deletion as info level for better visibility.The deletion of stale EIP status is logged at V(2) level, which may not be visible in production. Since this addresses the race condition mentioned in the PR, consider using
klog.Infofinstead ofklog.V(2).Infofto ensure these cleanup operations are visible in logs for debugging and verification.- klog.V(2).Infof("Deleting stale pod egress IP status: %v for EgressIP: %s and pod: %s/%s/%v", staleStatus, name, pod.Namespace, pod.Name, podIPNets) + klog.Infof("Deleting stale pod egress IP status: %v for EgressIP: %s and pod: %s/%s/%v", staleStatus, name, pod.Namespace, pod.Name, podIPNets)
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(10 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- go-controller/pkg/ovn/egressip_test.go
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint
- GitHub Check: Build-PR
🔇 Additional comments (2)
go-controller/pkg/ovn/egressip.go (2)
1613-1617: LGTM! Proper seeding of egressStatuses during sync.This code correctly populates the
egressStatusescache with assigned node information from the EIP sync, marking entries withegressStatusStateSynced. This enables subsequent event handlers to detect and clean up stale SNAT/LRP entries when EIP failover occurs during controller restart.
867-884: Verify locking pattern doesn't introduce performance issues at scale.The new local zone eligibility check acquires and releases
nodeZoneStatelocks for each status instatusAssignments. While the locks are properly ordered (acquired and released one at a time), this could introduce latency in deployments with many EIP status assignments.Consider monitoring this in scale tests. If performance issues are observed, an alternative approach could pre-gather local zone node information before the loop to avoid repeated locking.
kyrtapz
left a comment
There was a problem hiding this comment.
LGTM
Left some non-blocking comments.
188b23d to
73e7057
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
go-controller/pkg/ovn/egressip.go (1)
792-866: Well-structured stale entry cleanup with acceptable tradeoffs.The refactored flow correctly separates remaining assignments from stale ones, deletes stale entries before adding new ones (avoiding conflicts), and maintains proper cache consistency. The warning-and-continue approach for deletion failures (line 863) is a reasonable tradeoff—sync functions will eventually catch any orphaned database entries, and avoiding perpetual retry failures is valuable.
One minor consideration: if stale deletions consistently fail for a specific pod, the cache entry is cleared (line 865) but the database entry persists until the next sync. This window is acceptable given the reconciliation mechanisms in place, but be aware that brief periods of stale SNATs/LRPs may occur.
If you want to be more defensive, you could consider:
- Tracking failed stale deletions separately (e.g., a
failedDeletionsset) and retrying them before adding new entries- Adding a metric to monitor stale deletion failures
However, this adds complexity and may not be necessary given the sync functions already provide eventual consistency.
go-controller/pkg/ovn/egressip_test.go (1)
11148-11501: Consider helper functions to improve test maintainability.This test is comprehensive (350+ lines) but quite lengthy. The setup includes repetitive patterns for creating routers, ports, and switches. Consider extracting helper functions for common object creation patterns to improve readability and maintainability.
For example, helper functions for:
- Creating gateway routers with standard options
- Creating logical router ports with transit switch IPs
- Creating external switch ports with router options
This is optional and can be deferred to future refactoring.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(9 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
📚 Learning: 2025-10-20T15:07:49.533Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-23T14:10:26.595Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5493
File: go-controller/pkg/ovn/egressip_test.go:13927-13945
Timestamp: 2025-10-23T14:10:26.595Z
Learning: In ovn-kubernetes/go-controller/pkg/ovn/egressip_test.go unit tests (e.g., the "Sync/remove invalid next hop from LRP" cases), it is acceptable to use the same mask value for both IPv4 and IPv6 in annotations/fixtures; do not require family-correct masks (e.g., /64 for v6) in these tests.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-09T12:23:01.462Z
Learnt from: npinaeva
PR: ovn-kubernetes/ovn-kubernetes#5561
File: go-controller/pkg/ovn/egressip.go:3256-3304
Timestamp: 2025-10-09T12:23:01.462Z
Learning: In go-controller/pkg/ovn/egressip.go, EgressIP reroute policies (priority types.EgressIPReroutePriority) are created via createReroutePolicyOps() using getEgressIPLRPReRouteDbIDs(..., controller = e.controllerName). Therefore, predicates updating these LRPs should match ExternalIDs[OwnerControllerKey] against e.controllerName (not a network-scoped controller name).
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-08-08T10:03:01.147Z
Learnt from: ricky-rav
PR: ovn-kubernetes/ovn-kubernetes#5387
File: test/e2e/route_advertisements.go:677-678
Timestamp: 2025-08-08T10:03:01.147Z
Learning: In ovn-kubernetes test/e2e/route_advertisements.go (Go, e2e tests), maintainers (per ricky-rav on PR #5387) prefer not to refactor existing variable reuse (e.g., reusing `pod`/`svc` for multiple pods/services) or add node-pinning in unrelated PRs. Suggestions about such refactors should be deferred to a follow-up issue rather than requested in the current feature PR.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (9)
go-controller/pkg/config/config.go (2)
Gateway(170-187)OVNKubernetesFeature(158-161)go-controller/pkg/testing/libovsdb/libovsdb.go (2)
TestSetup(41-50)TestData(52-52)go-controller/pkg/types/const.go (10)
OVNClusterRouter(39-39)GWRouterPrefix(44-44)GWRouterToJoinSwitchPrefix(49-49)EXTSwitchToGWRouterPrefix(50-50)GWRouterToExtSwitchPrefix(51-51)ExternalSwitchPrefix(43-43)DefaultNetworkName(7-7)DefaultNoRereoutePriority(115-115)EgressIPNodeConnectionMark(139-139)EgressIPReroutePriority(117-117)go-controller/pkg/nbdb/logical_router_port.go (1)
LogicalRouterPort(11-26)go-controller/pkg/nbdb/logical_switch_port.go (1)
LogicalSwitchPort(11-30)go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
ParseCIDR(30-38)IPNet(26-26)go-controller/pkg/nbdb/logical_router_policy.go (3)
LogicalRouterPolicy(22-34)LogicalRouterPolicyActionAllow(15-15)LogicalRouterPolicyActionReroute(17-17)go-controller/pkg/ovn/egressip.go (1)
IPFamilyValueV4(62-62)go-controller/pkg/nbdb/nat.go (2)
NAT(21-36)NATTypeSNAT(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint
- GitHub Check: Build-PR
🔇 Additional comments (9)
go-controller/pkg/ovn/egressip.go (5)
1174-1175: LGTM: EgressIP-to-node assignment tracking properly implemented.The new
egressIPToAssignedNodescache field correctly tracks the mapping of EgressIP name → EgressIP IP → assigned node. The nested map structure is populated during cache generation and used during sync to seed pending status markers, enabling detection of stale assignments after failover.Also applies to: 1975-1999
2256-2258: Clear documentation of pending state semantics.The
egressStatusStatePendingconstant and its usage in thestatusMapare well-documented. The distinction between "" (applied/reconciled) and "pending" (requires reconciliation) is clear and will help future maintainers understand the sync-and-reconcile pattern.Also applies to: 2263-2268
2279-2292: Stale status detection logic is sound.The
hasStaleEIPStatusmethod correctly identifies scenarios where the same EgressIP address has been reassigned to a different node (failover case). Returning the first match is appropriate since multiple stale entries for the same EIP would indicate a data inconsistency that should be cleaned up anyway.
867-884: Zone locality optimization is correctly implemented.The proceed check efficiently short-circuits work when neither the egress nodes nor the pod are relevant to this zone. The logic correctly implements: proceed if (any egress node is local) OR (pod is local to zone). The early break (line 879) once a local node is found is a good optimization.
Note that this is a coarse-grained check—individual assignment filtering happens later in
addPodEgressIPAssignmentbased on fine-grained node/pod locality, which is the right approach.
1610-1616: Pending status population enables proper reconciliation.Seeding the
podState.egressStatuseswithegressStatusStatePendingduring sync is the key mechanism that allows subsequent pod add events to detect entries that were populated during sync and reconcile them against current state. This correctly addresses the race condition described in the PR objectives where pod events use stale informer cache.go-controller/pkg/ovn/egressip_test.go (4)
7976-7976: LGTM: Expectation updated to reflect pending state tracking.The change from checking for a specific length to
HaveLen(2)correctly reflects that thestatusMapnow tracks pending states for both EgressIP status items.
7989-7990: LGTM: Expectations updated for pending state semantics.The changes correctly update the test expectations to use
egressStatusStatePendinginstead of empty strings, aligning with the new egress status tracking behavior in the podAssignment cache.Also applies to: 8016-8017, 8082-8082
11420-11429: Clarify the SNAT entry on GR_node1 after failover.The expected database state includes a SNAT entry on
GR_node1(lines 11420-11429) that translatespodV4IPtonode1IPv4. This appears after the EgressIP has failed over from node1 to node2.
- If this SNAT is stale (left over from before the failover), shouldn't the fix remove it?
- If this SNAT serves a different purpose (e.g., pod default SNAT when
DisableSNATMultipleGWs=true), please add a comment explaining why it remains after failover.The PR description states the fix "leverages egressStatuses stored in the podAssignment cache to reconcile pod assignments and remove stale SNAT and LRP entries created by the race." Can you confirm whether this SNAT entry is expected to remain?
11345-11356: Verify test adequately simulates the race condition.The PR objectives describe a race where "pod add events were processed using an older EIP status from the informer cache, which caused stale SNATs/LRP nexthops." However, in this test:
- Pods are created and added to the logical port cache before the simulated restart (lines 11320-11330)
- The simulated restart only calls
reconcileEgressIPwith the updated EIP status (line 11355)Does this test adequately reproduce the race where pod add events arrive during/after controller restart but use stale EIP status from the cache? Consider adding a comment explaining how this test setup simulates the described race condition, or adjust the test to more explicitly demonstrate concurrent pod events using stale cache state.
Scenario: - Nodes: node-1, node-2, node-3 - Egress IPs: EIP-1 - Pods: pod1 on node-1, pod2 on node-3 (pods are created via deployment replicas) - Egress-assignable nodes: node-1, node-2 - EIP-1 assigned to node-1 During a simultaneous reboot of node-1 and node-2, EIP-1 failed over to node-2 and ovnkube-controller restarted at nearly the same time: 1) EIP-1 was reassigned to node-2 by the cluster manager. 2) The sync EIP happened for EIP1 with stale status, though it cleaned SNATs/LRPs referring to node-1 due to outdated pod IPs (this is because pods will be recreated due to node reboots). 3) pod1/pod2 Add events arrived while the informer cache still had the old EIP status, so new SNATs/LRPs were created pointing to node-1. 4) The EIP-1 Add event arrived with the new status; entries for node-2 were added/updated. 5) Result: stale SNATs and LRPs with stale nexthops for node-1 remained. Fix: - Populate pod EIP status during EgressIP sync so podAssignment has accurate egressStatuses. - Reconcile stale assignments using podAssignment (egressStatuses) when the informer cache is not up to date, ensuring SNAT/LRP for the previously assigned node are corrected. - Remove stale EIP SNAT entries for remote-zone pods accordingly. - Add coverage for simultaneous EIP failover and controller restart. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
f1275b0 to
86c6930
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip_test.go (1)
11149-11502: Consider adding explanatory comments for the interconnect scenario.The test comprehensively validates the EIP failover race condition fix and correctly handles the interconnect multi-zone scenario. However, the complex setup and expected state could benefit from brief inline comments clarifying:
- Why
egressPod2's LRP is intentionally absent from the expected DB state (it's on node3 in the remote zone, so its reroute LRP belongs to the remote zone's NB DB).- The purpose of the SNAT entry (lines 11421-11430) with
node1IPv4as the external IP—it appears to be a default pod SNAT rather than an EgressIP SNAT, which would use the EIP address and be programmed in the remote zone DB where node2 resides.These comments would help future maintainers quickly understand the interconnect architecture being tested.
Example addition near line 11389:
+ // LRP for egressPod reroutes to node2's transit switch IP (new EIP assignment). + // Note: egressPod2's LRP is not included here because node3 is in a remote zone; + // its LRP is programmed only in that remote zone's NB DB. &nbdb.LogicalRouterPolicy{And near line 11421:
+ // Default pod SNAT for egressPod on its local node (node1). + // The EgressIP SNAT (Pod->EIP) is programmed on node2 in the remote zone DB (not visible here). &nbdb.NAT{
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
go-controller/pkg/ovn/egressip.go(9 hunks)go-controller/pkg/ovn/egressip_test.go(5 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
📚 Learning: 2025-10-23T14:10:26.595Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5493
File: go-controller/pkg/ovn/egressip_test.go:13927-13945
Timestamp: 2025-10-23T14:10:26.595Z
Learning: In ovn-kubernetes/go-controller/pkg/ovn/egressip_test.go unit tests (e.g., the "Sync/remove invalid next hop from LRP" cases), it is acceptable to use the same mask value for both IPv4 and IPv6 in annotations/fixtures; do not require family-correct masks (e.g., /64 for v6) in these tests.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-20T15:07:49.533Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-09T12:23:01.462Z
Learnt from: npinaeva
PR: ovn-kubernetes/ovn-kubernetes#5561
File: go-controller/pkg/ovn/egressip.go:3256-3304
Timestamp: 2025-10-09T12:23:01.462Z
Learning: In go-controller/pkg/ovn/egressip.go, EgressIP reroute policies (priority types.EgressIPReroutePriority) are created via createReroutePolicyOps() using getEgressIPLRPReRouteDbIDs(..., controller = e.controllerName). Therefore, predicates updating these LRPs should match ExternalIDs[OwnerControllerKey] against e.controllerName (not a network-scoped controller name).
Applied to files:
go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-08-08T10:03:01.147Z
Learnt from: ricky-rav
PR: ovn-kubernetes/ovn-kubernetes#5387
File: test/e2e/route_advertisements.go:677-678
Timestamp: 2025-08-08T10:03:01.147Z
Learning: In ovn-kubernetes test/e2e/route_advertisements.go (Go, e2e tests), maintainers (per ricky-rav on PR #5387) prefer not to refactor existing variable reuse (e.g., reusing `pod`/`svc` for multiple pods/services) or add node-pinning in unrelated PRs. Suggestions about such refactors should be deferred to a follow-up issue rather than requested in the current feature PR.
Applied to files:
go-controller/pkg/ovn/egressip_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build-PR
- GitHub Check: Lint
🔇 Additional comments (1)
go-controller/pkg/ovn/egressip_test.go (1)
7977-7977: LGTM: Consistent pending state tracking.The updated assertions correctly validate the new behavior where
egressStatuses.statusMapentries are explicitly tracked withegressStatusStatePendingvalues instead of empty strings during cache operations.Also applies to: 7990-7991, 8017-8018, 8083-8083
|
saw a flake with EgressFirewall E2E test, unrelated to this PR changes. created an issue #5693. |
|
Fix included in accepted release 4.21.0-0.nightly-2025-11-03-191704 |
When ovnkube-controller restart and an EgressIP (EIP) failover occur at the same time, this causes a race condition between event handling and informer cache update for EgressIP status. so while handling pod add event, controller sees older EIP status it fails to remove the SNAT and LRP configuration for the previously assigned node.
This PR leverages the
egressStatusesstored in thepodAssignmentcache to reconcile and remove those stale entries correctly. Have also added an unit test to replicate above mentioned scenario.Summary by CodeRabbit
New Features
Bug Fixes
Tests