
Fix stale EIP assignments during failover and controller restart #5606

Merged
jcaamano merged 1 commit into ovn-kubernetes:master from pperiyasamy:eip_failover_ovnkubenode_restart
Oct 28, 2025

pperiyasamy commented Sep 30, 2025

Contributor

When an ovnkube-controller restart and an EgressIP (EIP) failover occur at the same time, a race arises between event handling and the informer cache update for the EgressIP status. While handling a pod add event, the controller sees the older EIP status, so it fails to remove the SNAT and LRP configuration for the previously assigned node.

Nodes: node-1, node-2 and node-3
Egress IPs: EIP-1
Pods: pod1 (placed on node-1), pod2 (placed on node-3)
Egress Assignable Nodes: node-1 and node-2

Scenario (seen from the reproducer setup)
-------------------------------------------

EIP-1 is assigned to node-1. node-1 and node-2 were rebooted at the same time, so the EIP failover and the ovnkube-controller container restart happened almost simultaneously.

1. The cluster manager reassigns EIP-1 to node-2.
2. The EIP controller synchronizes the EIP-1 object with the new object. Because the pod IP addresses are stale (the pods were recreated), it cleans up the SNATs and LRPs referring to node-1.
3. At the same time, the pod1 and pod2 add events are triggered. The EIP controller's watch factory still sees the older EIP status in the informer cache, so SNATs and LRPs referring to node-1 are created for the new pod IPs.
4. The EIP-1 add event is triggered with the new EIP status. The EIP controller adds new SNAT entries and updates the LRP nexthop for node-2.
5. Stale SNATs and LRPs with nexthops pointing to node-1 are left behind.

This PR uses the egressStatuses stored in the podAssignment cache to reconcile and correctly remove those stale entries. A unit test that reproduces the scenario above has also been added.
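The following toy model illustrates why the cached per-pod statuses matter. It is a simplified sketch, not the controller's code. `nbDB`, `podState`, and `program` are hypothetical stand-ins for the NB DB, the podAssignment cache entry, and SNAT/LRP programming. The first call models step 3 (the pod add event sees node-1). The second call models step 4 (the EIP add event sees node-2). Without consulting the cache, the node-1 entry survives.

```go
package main

import "sort"

// eipStatus mirrors one EgressIP status item: an egress IP and its node.
type eipStatus struct {
	EgressIP string
	Node     string
}

// nbDB is a hypothetical stand-in for the OVN NB DB. Each key "podIP@node"
// represents the SNAT plus reroute LRP programmed for a pod via a node.
type nbDB map[string]bool

// podState is a hypothetical, simplified podAssignment cache entry that
// remembers which statuses have been programmed for the pod.
type podState struct {
	podIP          string
	egressStatuses map[eipStatus]string
}

func entryKey(podIP, node string) string { return podIP + "@" + node }

// program installs entries for st. When useCache is true, it first removes
// entries for any cached status that has the same egress IP but a different
// node, which is the stale case left behind by the failover race.
func program(db nbDB, ps *podState, st eipStatus, useCache bool) {
	if useCache {
		for cached := range ps.egressStatuses {
			if cached.EgressIP == st.EgressIP && cached.Node != st.Node {
				delete(db, entryKey(ps.podIP, cached.Node))
				// Deleting map entries while ranging over the map is allowed in Go.
				delete(ps.egressStatuses, cached)
			}
		}
	}
	db[entryKey(ps.podIP, st.Node)] = true
	ps.egressStatuses[st] = ""
}

// entries returns the programmed NB entries in sorted order.
func entries(db nbDB) []string {
	out := make([]string, 0, len(db))
	for k := range db {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}
```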

Summary by CodeRabbit

  • New Features

    • Per-EIP/per-IP assignment tracking to improve visibility and reconciliation after topology changes.
    • Seeded "pending" status markers during sync to preserve assignment info while reconciliation runs.
  • Bug Fixes

    • Cleanup of stale per-pod and per-EIP status entries to prevent misreporting.
    • More reliable EgressIP failover so SNAT and router nexthops update correctly during simultaneous failover and controller restart.
  • Tests

    • Added large multi-node end-to-end failover test simulating controller restart; updated assertions and removed ad-hoc cache resets.

coderabbitai Bot commented Sep 30, 2025

Walkthrough

Tracks per-EgressIP per-IP assigned nodes in the controller cache, seeds pod egress status entries as a pending marker during cache rebuild, prunes stale per-EIP statuses during add/delete assignment flows, and adds an end-to-end test validating SNAT/LRP nexthop reconciliation during simultaneous EIP failover and controller restart.

Changes

Cohort / File(s) Summary
EgressIP controller logic
go-controller/pkg/ovn/egressip.go
Added egressIPToAssignedNodes map[string]map[string]string in cache; introduced egressStatusStatePending; populated per-EIP per-IP assigned-node mappings during cache generation and sync; seed podState.egressStatuses with pending statuses; added egressStatuses.hasStaleEIPStatus(...) and pruning of stale statuses; updated add/delete assignment flows to consult/handle pending and stale entries.
EgressIP tests
go-controller/pkg/ovn/egressip_test.go
Added an end-to-end test exercising simultaneous EIP failover and ovnkube-controller restart that verifies SNAT and LRP nexthop updates across IPv4/IPv6 and interconnect contexts; adjusted test assertions to use egressStatusStatePending and updated statusMap length expectations; removed manual cache-resetting steps in test sequences.
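The sketch below shows the shape of the `egressIPToAssignedNodes` map described above (`map[string]map[string]string`: EIP name to egress IP to node) and how it could be built from EgressIP status items. The trimmed `egressIP` and `statusItem` types and the `buildAssignedNodes` helper are hypothetical simplifications, not the real egressipv1 API types.

```go
package main

// statusItem is a simplified EgressIP status item.
type statusItem struct {
	Node     string
	EgressIP string
}

// egressIP is a hypothetical, trimmed-down EgressIP object that keeps only
// the fields needed here.
type egressIP struct {
	Name   string
	Status []statusItem
}

// buildAssignedNodes returns EIP name -> egress IP -> assigned node.
func buildAssignedNodes(eips []egressIP) map[string]map[string]string {
	out := make(map[string]map[string]string, len(eips))
	for _, eip := range eips {
		perIP := make(map[string]string, len(eip.Status))
		for _, item := range eip.Status {
			perIP[item.EgressIP] = item.Node
		}
		out[eip.Name] = perIP
	}
	return out
}
```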

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant K8s as Kubernetes API
  participant Ctrl as EgressIP Controller
  participant Cache as egressIPCache
  participant OVN as OVN NB DB

  rect rgb(250,250,255)
  note over Ctrl,Cache: Startup / cache rebuild
  Ctrl->>K8s: List EgressIP objects
  Ctrl->>Cache: generateCacheForEgressIP()
  Cache-->>Ctrl: egressIPToAssignedNodes (EIP -> IP -> Node)
  Ctrl->>Cache: syncPodAssignmentCache()
  note right of Cache: Seed podState.egressStatuses = egressStatusStatePending
  end

  rect rgb(245,255,245)
  note over Ctrl,OVN: Reconciliation / failover flows
  Ctrl->>Ctrl: addPodEgressIPAssignments()\n- consult statusMap, avoid overwriting pending seeds
  Ctrl->>Ctrl: deleteEgressIPAssignments()\n- detect & prune stale per-EIP statuses via hasStaleEIPStatus
  Ctrl->>OVN: Update SNATs / LRP nexthops per assigned node
  OVN-->>Ctrl: NAT / route / QoS state responses
  end

  alt Failover detected
    Ctrl->>OVN: Adjust nexthops to new node(s)
  else Controller restart with seeded pending statuses
    Ctrl->>Cache: Pending seeds trigger reconciliation and cleanup
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to focus: correctness of egressIPToAssignedNodes population/usage, hasStaleEIPStatus detection and pruning, interplay between seeded egressStatusStatePending and assignment flows, and the new end-to-end test setup/expectations.

Possibly related PRs

Suggested labels

area/e2e-testing, component/ovnkube-controller

Suggested reviewers

  • tssurya
  • jcaamano
  • martinkennelly

Poem

I twitch my whiskers, map each EIP,
Seeds pending status where puzzles lie—
When controllers hop and nexthops flee,
I prune stale paths and stitch the sky.
Little rabbit hums—SNATs dance by. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Fix stale EIP assignments during failover and controller restart" directly aligns with the main objective of the pull request. The changes implement stale-entry detection and reconciliation logic in the egressip.go file, along with a comprehensive test that reproduces the concurrent EIP failover and controller restart scenario. The title is concise, specific, and clearly identifies both the problem being addressed (stale EIP assignments) and the conditions under which it occurs (failover and controller restart). A reviewer scanning the commit history would immediately understand the primary purpose of this changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f1275b0 and 86c6930.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (9 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
📚 Learning: 2025-10-23T14:10:26.595Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5493
File: go-controller/pkg/ovn/egressip_test.go:13927-13945
Timestamp: 2025-10-23T14:10:26.595Z
Learning: In ovn-kubernetes/go-controller/pkg/ovn/egressip_test.go unit tests (e.g., the "Sync/remove invalid next hop from LRP" cases), it is acceptable to use the same mask value for both IPv4 and IPv6 in annotations/fixtures; do not require family-correct masks (e.g., /64 for v6) in these tests.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-08-19T10:19:13.298Z
Learnt from: tssurya
PR: ovn-kubernetes/ovn-kubernetes#5276
File: go-controller/pkg/node/bridgeconfig/bridgeflows.go:817-827
Timestamp: 2025-08-19T10:19:13.298Z
Learning: In ovn-kubernetes go-controller/pkg/node/bridgeconfig/bridgeflows.go, MEG (Multiple External Gateways, controlled by disableSNATMultipleGWs) and EgressIP are independent features that should not be coupled in flow logic. The priority 104 flow condition should use "disableSNATMultipleGWs || isNetworkAdvertised" instead of "(disableSNATMultipleGWs && config.OVNKubernetesFeature.EnableEgressIP) || isNetworkAdvertised" to allow MEG to function independently of EgressIP enablement.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-09T12:23:01.462Z
Learnt from: npinaeva
PR: ovn-kubernetes/ovn-kubernetes#5561
File: go-controller/pkg/ovn/egressip.go:3256-3304
Timestamp: 2025-10-09T12:23:01.462Z
Learning: In go-controller/pkg/ovn/egressip.go, EgressIP reroute policies (priority types.EgressIPReroutePriority) are created via createReroutePolicyOps() using getEgressIPLRPReRouteDbIDs(..., controller = e.controllerName). Therefore, predicates updating these LRPs should match ExternalIDs[OwnerControllerKey] against e.controllerName (not a network-scoped controller name).

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-08-08T10:03:01.147Z
Learnt from: ricky-rav
PR: ovn-kubernetes/ovn-kubernetes#5387
File: test/e2e/route_advertisements.go:677-678
Timestamp: 2025-08-08T10:03:01.147Z
Learning: In ovn-kubernetes test/e2e/route_advertisements.go (Go, e2e tests), maintainers (per ricky-rav on PR #5387) prefer not to refactor existing variable reuse (e.g., reusing `pod`/`svc` for multiple pods/services) or add node-pinning in unrelated PRs. Suggestions about such refactors should be deferred to a follow-up issue rather than requested in the current feature PR.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (7)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (39-39)
  • GWRouterPrefix (44-44)
  • GWRouterToJoinSwitchPrefix (49-49)
  • EXTSwitchToGWRouterPrefix (50-50)
  • GWRouterToExtSwitchPrefix (51-51)
  • ExternalSwitchPrefix (43-43)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (115-115)
  • EgressIPNodeConnectionMark (139-139)
  • EgressIPReroutePriority (117-117)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build-PR
🔇 Additional comments (7)
go-controller/pkg/ovn/egressip_test.go (3)

7977-7977: LGTM: Assertion updates align with pending status tracking.

The changes correctly update assertions to check for egressStatusStatePending instead of empty strings, reflecting the PR's introduction of per-EIP-per-IP assignment status tracking. The HaveLen(2) assertion properly validates the two entries in the statusMap corresponding to the two EgressIP status items.

Also applies to: 7990-7991, 8017-8018, 8083-8083


11149-11502: Test structure and scenario setup look solid.

The test properly exercises the EIP failover + controller restart scenario described in the PR objectives:

  • Multi-zone interconnect setup with global and remote nodes is correctly configured
  • Simulates controller restart by updating EIP status and calling reconcileEgressIP
  • Validates expected LRP nexthop updates (line 11393 correctly points to node2's transit switch IP)
  • Correctly omits remote-zone entries (egressPod2's LRP, node2's EgressIP SNAT) per interconnect semantics

The test design effectively validates the reconciliation logic for the race condition scenario.


11421-11430: The NAT entry is correct - it's a default node SNAT, not an EgressIP-managed SNAT.

The NAT with ExternalIDs: nil is a default node-level SNAT, not an EgressIP SNAT. EgressIP SNATs are identified by having ExternalIDs[ObjectNameKey] set, which this NAT lacks. This default SNAT for the pod on node1 (LogicalIP=podV4IP, ExternalIP=node1IPv4) is unrelated to the EgressIP failover and should persist in the expected state. The test expectations are accurate.
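The distinction the review draws can be expressed as a predicate on a NAT's ExternalIDs. This is a minimal sketch. `nat` is a trimmed stand-in for `nbdb.NAT`, and the literal value of `objectNameKey` is an assumption standing in for the `ObjectNameKey` constant named in the review.

```go
package main

// nat is a hypothetical, trimmed-down stand-in for nbdb.NAT.
type nat struct {
	LogicalIP   string
	ExternalIP  string
	ExternalIDs map[string]string
}

// objectNameKey stands in for ObjectNameKey; the literal value is assumed.
const objectNameKey = "k8s.ovn.org/name"

// isEgressIPSNAT reports whether n carries the object-name external ID,
// the marker the review uses to tell EgressIP SNATs apart from default node
// SNATs. Reading from a nil ExternalIDs map is safe and yields false.
func isEgressIPSNAT(n nat) bool {
	_, ok := n.ExternalIDs[objectNameKey]
	return ok
}

// egressIPSNATs filters a NAT list down to EgressIP-managed entries.
func egressIPSNATs(nats []nat) []nat {
	var out []nat
	for _, n := range nats {
		if isEgressIPSNAT(n) {
			out = append(out, n)
		}
	}
	return out
}
```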

go-controller/pkg/ovn/egressip.go (4)

792-866: LGTM: Stale EIP status detection and cleanup logic correctly addresses the failover race.

The implementation properly handles the race condition described in the PR:

  1. Lines 828-837 detect both pending entries (seeded during sync) and stale entries (same EgressIP reassigned to a different node)
  2. Lines 859-866 clean up stale SNAT/LRP entries before adding new assignments
  3. Proper ordering ensures stale cleanup happens before the new assignment loop (lines 885-912)
  4. Error handling logs warnings without blocking the reconciliation

This ensures that when a pod add event is processed with a stale EIP status in the informer cache, the old entries pointing to the previous node are removed before creating new ones pointing to the current node.
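The ordering described above (stale cleanup before new assignments, with pending seeds re-validated) can be sketched as a pure planning function. This is a rough, hypothetical model, not the body of addPodEgressIPAssignments. `planPodAssignment` and the string value of `statePending` (standing in for egressStatusStatePending) are assumptions. In this sketch, a pending entry that matches the desired status is simply re-programmed and confirmed.

```go
package main

// status is a simplified EgressIP status item.
type status struct {
	EgressIP string
	Node     string
}

// statePending stands in for egressStatusStatePending; the value is assumed.
const statePending = "pending"

// planPodAssignment returns the operations needed to move a pod's cached
// statuses to the desired ones. All deletions of stale entries (same egress
// IP, different node) are emitted before any additions.
func planPodAssignment(cached map[status]string, desired []status) []string {
	var deletes, adds []string
	for _, want := range desired {
		for have := range cached {
			if have.EgressIP == want.EgressIP && have.Node != want.Node {
				deletes = append(deletes, "delete "+have.EgressIP+" via "+have.Node)
				delete(cached, have)
			}
		}
		// Missing entries, or entries only seeded during sync, are
		// (re)programmed and then marked as confirmed.
		if state, ok := cached[want]; !ok || state == statePending {
			adds = append(adds, "add "+want.EgressIP+" via "+want.Node)
			cached[want] = ""
		}
	}
	return append(deletes, adds...)
}
```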


867-884: LGTM: Zone locality check optimizes multi-zone processing.

The logic correctly determines whether any work needs to be done in the local zone by checking if either:

  1. Any status node (egress node) is local to this zone, or
  2. The pod itself is scheduled in the local zone

Proper node locking ensures thread-safe locality checks, and the early return avoids unnecessary processing when neither condition is met.
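A minimal sketch of that locality check follows. `localityCache` and `needsLocalWork` are hypothetical names. The single `sync.RWMutex` stands in for the controller's node locking and is not a claim about how the real code locks.

```go
package main

import "sync"

// localityCache is a hypothetical record of which nodes belong to the local zone.
type localityCache struct {
	mu    sync.RWMutex
	local map[string]bool
}

// isLocal reports whether node is in the local zone, under a read lock.
func (c *localityCache) isLocal(node string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.local[node]
}

// needsLocalWork reports whether this zone has anything to program: some
// egress (status) node is local, or the pod itself runs on a local node.
// When it returns false, the caller can return early.
func needsLocalWork(c *localityCache, statusNodes []string, podNode string) bool {
	for _, n := range statusNodes {
		if c.isLocal(n) {
			return true
		}
	}
	return c.isLocal(podNode)
}
```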


1610-1616: LGTM: Sync seeding with pending entries enables proper reconciliation.

This is the core mechanism for handling the race condition:

During syncPodAssignmentCache, the code seeds podState.egressStatuses with per-IP assignments marked as egressStatusStatePending. When subsequent pod add events are processed (lines 828-830), these pending entries are detected and reconciled, ensuring:

  1. Sync entries are validated against actual pod events
  2. Stale entries from before the sync are detected and cleaned up
  3. The cache eventually reflects the current EgressIP assignments

This design elegantly handles the scenario where sync uses an old EIP status while concurrent pod events use a newer status.
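The seeding step might look like the following sketch. `seedPending` and the `statePending` value (standing in for egressStatusStatePending) are assumptions. This version only adds entries that are absent, so it never overwrites an entry that a pod event has already confirmed.

```go
package main

// status is a simplified EgressIP status item.
type status struct {
	EgressIP string
	Node     string
}

// statePending stands in for egressStatusStatePending; the value is assumed.
const statePending = "pending"

// seedPending fills a pod's cached statuses from the per-IP assignments
// collected during sync (egress IP -> node). Each new entry is marked
// pending so that the next pod add event re-validates it.
func seedPending(egressStatuses map[status]string, assigned map[string]string) {
	for ip, node := range assigned {
		st := status{EgressIP: ip, Node: node}
		if _, ok := egressStatuses[st]; !ok {
			egressStatuses[st] = statePending
		}
	}
}
```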


1999-2001: LGTM: Cache tracking infrastructure properly supports stale detection.

The additions provide the necessary infrastructure for detecting and cleaning up stale entries:

  1. egressIPToAssignedNodes (lines 1999-2001, 2015, 2023): Tracks per-EIP per-IP node assignments, enabling cache seeding during sync
  2. egressStatusStatePending constant (lines 2280-2282): Well-documented marker for entries needing reconciliation
  3. hasStaleEIPStatus method (lines 2303-2316): Correctly identifies entries with the same EgressIP but different node, indicating a failover occurred; returns a safe new allocation

The updated documentation (lines 2287-2292) clearly explains the state semantics, making the code maintainable.

Also applies to: 2015-2015, 2023-2023, 2280-2282, 2303-2316


@github-actions github-actions Bot added feature/egress-ip Issues related to EgressIP feature area/unit-testing Issues related to adding/updating unit tests labels Sep 30, 2025

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip_test.go (3)

11307-11319: Strengthen the “restart + failover” simulation by asserting pre-state before reconcile

Right now we jump straight to reconcileEgressIP(nil, &eIP). To prove we’re actually transitioning from “assigned to node1” to “assigned to node2”, consider first asserting the pre-condition DB state (SNAT on node1, LRP nexthop to 100.64.0.2) before flipping status. That makes the test deterministic and validates both removal and reprogramming paths. You can keep the nil oldObj to model restart, but add a short pre-state HaveData assertion before Line 11307.


11353-11359: Avoid magic literals for nexthops; use the existing constant

Use node2LogicalRouterIPv4 instead of a hardcoded "100.64.0.3" to avoid drift if join IPs change.

-                        Nexthops:    []string{"100.64.0.3"},
+                        Nexthops:    node2LogicalRouterIPv4,

11376-11386: Prefer getEIPSNAT helper for NAT construction

For consistency with nearby tests and to cut duplication, build the NAT via getEIPSNAT and only adjust UUID/logical port as needed.

-                    &nbdb.NAT{
-                        UUID:        "egressip-nat-UUID2",
-                        LogicalIP:   podV4IP,
-                        ExternalIP:  egressIP,
-                        ExternalIDs: getEgressIPNATDbIDs(egressIPName, egressPod.Namespace, egressPod.Name, IPFamilyValueV4, fakeOvn.controller.controllerName).GetExternalIDs(),
-                        Type:        nbdb.NATTypeSNAT,
-                        LogicalPort: &expectedNatLogicalPort2,
-                        Options: map[string]string{
-                            "stateless": "false",
-                        },
-                    },
+                    func() *nbdb.NAT {
+                        n := getEIPSNAT(podV4IP, egressPod.Namespace, egressPod.Name, egressIP, expectedNatLogicalPort2, DefaultNetworkControllerName)
+                        n.UUID = "egressip-nat-UUID2"
+                        return n
+                    }(),
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b20fb84 and 40cf6f6.

📒 Files selected for processing (1)
  • go-controller/pkg/ovn/egressip_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-19T10:19:13.298Z
Learnt from: tssurya
PR: ovn-kubernetes/ovn-kubernetes#5276
File: go-controller/pkg/node/bridgeconfig/bridgeflows.go:817-827
Timestamp: 2025-08-19T10:19:13.298Z
Learning: In ovn-kubernetes go-controller/pkg/node/bridgeconfig/bridgeflows.go, MEG (Multiple External Gateways, controlled by disableSNATMultipleGWs) and EgressIP are independent features that should not be coupled in flow logic. The priority 104 flow condition should use "disableSNATMultipleGWs || isNetworkAdvertised" instead of "(disableSNATMultipleGWs && config.OVNKubernetesFeature.EnableEgressIP) || isNetworkAdvertised" to allow MEG to function independently of EgressIP enablement.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-29T13:52:53.191Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.191Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (10)
go-controller/pkg/config/config.go (1)
  • Gateway (147-164)
go-controller/pkg/util/node_annotations.go (1)
  • OVNNodeHostCIDRs (98-98)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/pkg/libovsdb/ops/options.go (1)
  • RouterPort (14-14)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build-PR
  • GitHub Check: Lint

@pperiyasamy pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from 40cf6f6 to faea77f Compare October 1, 2025 09:01

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip_test.go (3)

11307-11319: Simulating a restart via reconcile(new) alone can diverge from the real sequence; patch the status and pass old/new for robustness.

Right now you mutate a local eIP copy and call reconcileEgressIP(nil, &eIP). The informer store still holds the old status (node1), so the watcher and your direct reconcile may race and produce non‑deterministic transitions. To better mirror a restart+failover and reduce flakes: fetch the stored object as old, update status in the fake client (patchReplaceEgressIPStatus), then call reconcileEgressIP(old, new).

Example:

- eIP.Status = egressipv1.EgressIPStatus{ Items: []egressipv1.EgressIPStatusItem{{Node: node2.Name, EgressIP: egressIP}} }
- fakeOvn.controller.eIPC.nodeName = node1Name
- err = fakeOvn.controller.eIPC.reconcileEgressIP(nil, &eIP)
+ oldEIP, err := fakeOvn.fakeClient.EgressIPClient.K8sV1().EgressIPs().Get(context.TODO(), eIP.Name, metav1.GetOptions{})
+ gomega.Expect(err).NotTo(gomega.HaveOccurred())
+ newEIP := oldEIP.DeepCopy()
+ newEIP.Status = egressipv1.EgressIPStatus{Items: []egressipv1.EgressIPStatusItem{{Node: node2.Name, EgressIP: egressIP}}}
+ gomega.Expect(fakeOvn.controller.eIPC.patchReplaceEgressIPStatus(newEIP.Name, newEIP.Status.Items)).To(gomega.Succeed())
+ fakeOvn.controller.eIPC.nodeName = node1Name
+ err = fakeOvn.controller.eIPC.reconcileEgressIP(oldEIP, newEIP)

This keeps the fake client, queue, and reconcile paths aligned and makes the test intent clearer.
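The reasoning behind passing both objects can be illustrated with a plain set difference over status items. This is a simplified sketch. `diffStatuses` is a hypothetical helper and not the logic inside reconcileEgressIP. With a nil old slice, as in reconcileEgressIP(nil, new), there is nothing to remove. The PR description says it relies on the egressStatuses in the podAssignment cache to cover that gap.

```go
package main

// statusItem is a simplified EgressIP status item.
type statusItem struct {
	Node     string
	EgressIP string
}

// diffStatuses returns the items present only in oldItems (to remove) and
// the items present only in newItems (to add).
func diffStatuses(oldItems, newItems []statusItem) (remove, add []statusItem) {
	oldSet := make(map[statusItem]bool, len(oldItems))
	for _, s := range oldItems {
		oldSet[s] = true
	}
	newSet := make(map[statusItem]bool, len(newItems))
	for _, s := range newItems {
		newSet[s] = true
	}
	for _, s := range oldItems {
		if !newSet[s] {
			remove = append(remove, s)
		}
	}
	for _, s := range newItems {
		if !oldSet[s] {
			add = append(add, s)
		}
	}
	return remove, add
}
```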


11357-11360: Prefer the existing constant for join-switch nexthop.

Use node2LogicalRouterIPv4 rather than a string literal "100.64.0.3" to avoid drift if constants change.

- Nexthops:    []string{"100.64.0.3"},
+ Nexthops:    node2LogicalRouterIPv4,

11444-11444: Add inspectTimeout to Eventually to curb CI flakiness.

Other tests use inspectTimeout; apply it here too.

- gomega.Eventually(fakeOvn.nbClient).Should(libovsdbtest.HaveData(expectedDatabaseState))
+ gomega.Eventually(fakeOvn.nbClient, inspectTimeout).Should(libovsdbtest.HaveData(expectedDatabaseState))
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40cf6f6 and faea77f.

📒 Files selected for processing (5)
  • go-controller/cmd/ovnkube/ovnkube.go (1 hunks)
  • go-controller/pkg/controllermanager/controller_manager.go (4 hunks)
  • go-controller/pkg/ovn/egressip.go (4 hunks)
  • go-controller/pkg/ovn/egressip_test.go (1 hunks)
  • go-controller/pkg/ovn/ovn_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-29T13:52:53.191Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.191Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (2)
go-controller/pkg/controllermanager/controller_manager.go (2)
go-controller/pkg/ovn/address_set/address_set.go (1)
  • NewOvnAddressSetFactory (84-90)
go-controller/pkg/ovn/default_network_controller.go (1)
  • DefaultNetworkControllerName (49-49)
go-controller/pkg/ovn/egressip_test.go (9)
go-controller/pkg/config/config.go (1)
  • Gateway (147-164)
go-controller/pkg/util/node_annotations.go (1)
  • OVNNodeHostCIDRs (98-98)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build-PR
🔇 Additional comments (2)
go-controller/pkg/ovn/ovn_test.go (1)

233-246: Constructor wiring stays backward-compatible. Empty nodeName preserves the FakeOVN controller semantics while satisfying the updated signature.

go-controller/cmd/ovnkube/ovnkube.go (1)

510-518: Correctly plumbs node identity into the manager. Forwarding runMode.identity keeps the controller manager aligned with the process identity already used for leader election.

Comment thread go-controller/pkg/ovn/egressip.go Outdated
@pperiyasamy pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from faea77f to 73ece4e Compare October 6, 2025 11:48

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip_test.go (1)

1148-11428: Test may flake: reconcile called with old=nil while watchers are running

Watchers first process the EIP in status=node1 (creating NAT/LRP on node1). Then you call reconcileEgressIP(nil, &eIP) after changing status to node2. Without an “old” object, reconcile may add node2 config but not remove node1’s, leaving stale NAT/LRP. This can make the test racy/ordering-dependent.

Recommend one of:

  • Simulate a real UPDATE: pass old and new to reconcileEgressIP so removal/addition are both handled deterministically; or
  • Patch the status in the fake client (patchReplaceEgressIPStatus) and let the EIP watcher drive the update; or
  • Don’t start the EIP watcher before calling reconcile to avoid concurrent adds, or add a barrier asserting the initial node1 config exists before triggering the failover, then assert it’s replaced.

Proposed minimal change (simulate UPDATE with old/new):

-                // To simulate an ovnkube-controller restart, update the EIP object with the newly assigned node.
-                // Then invoke reconcileEgressIP using only the updated EIP object to trigger the EgressIP add event.
-                eIP.Status = egressipv1.EgressIPStatus{
+                // Simulate controller restart + failover by invoking reconcile as an UPDATE
+                eIPOld := eIP.DeepCopy()
+                eIP.Status = egressipv1.EgressIPStatus{
                     Items: []egressipv1.EgressIPStatusItem{
                         {
                             Node:     node2.Name,
                             EgressIP: egressIP,
                         },
                     },
                 }
-                err = fakeOvn.controller.eIPC.reconcileEgressIP(nil, &eIP)
+                err = fakeOvn.controller.eIPC.reconcileEgressIP(eIPOld, &eIP)
                 gomega.Expect(err).NotTo(gomega.HaveOccurred())

This avoids leaving stale node1 NAT/LRP due to missing diff context and reduces non-determinism from the concurrently running watchers.
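The sketch below makes the reviewer's point concrete: a reconcile that sees only the new object cannot remove node1's config. For illustration only, it assumes that reconciliation reduces to a set difference between the old and new status items. `StatusItem` and `diffStatuses` are hypothetical names, and the real `reconcileEgressIP` does considerably more.

```go
package main

// StatusItem is a hypothetical stand-in for egressipv1.EgressIPStatusItem:
// one egress IP address bound to one node.
type StatusItem struct {
	Node     string
	EgressIP string
}

// diffStatuses returns the items to tear down (present only in oldItems) and
// the items to program (present only in newItems). With oldItems == nil,
// nothing is ever reported as removed, so node1's NAT/LRP would be left behind.
func diffStatuses(oldItems, newItems []StatusItem) (removed, added []StatusItem) {
	oldSet := make(map[StatusItem]bool, len(oldItems))
	for _, s := range oldItems {
		oldSet[s] = true
	}
	newSet := make(map[StatusItem]bool, len(newItems))
	for _, s := range newItems {
		newSet[s] = true
		if !oldSet[s] {
			added = append(added, s)
		}
	}
	for _, s := range oldItems {
		if !newSet[s] {
			removed = append(removed, s)
		}
	}
	return removed, added
}
```

Passing the DeepCopy of the pre-failover object as `oldItems` is what lets the removal side of the diff see node1.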

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faea77f and 73ece4e.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (6 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.

Applied to files:

  • go-controller/pkg/ovn/egressip.go
  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (9)
go-controller/pkg/config/config.go (1)
  • Gateway (147-164)
go-controller/pkg/util/node_annotations.go (1)
  • OVNNodeHostCIDRs (98-98)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
🔇 Additional comments (8)
go-controller/pkg/ovn/egressip_test.go (3)

7976-7976: OK: status map size updated

Len(3) expectation aligns with the new sync-populated statuses.


7989-7991: OK: status map values set to "sync"

Using explicit "sync" values matches the revised initialization behavior.


8016-8018: OK: reassert "sync" state

Consistent with the updated cache-populate semantics.

go-controller/pkg/ovn/egressip.go (5)

1162-1163: LGTM! Well-structured per-EIP node tracking.

The new egressIPToAssignedNodes field provides a nested map structure (EgressIP name → EgressIP IP → node name) that complements the existing flat egressIPIPToNodeCache. This enables efficient lookup of node assignments per EgressIP during cache synchronization.


846-846: LGTM! Sync marker mechanism enables proper restart reconciliation.

The modified condition !exists || value == "sync" correctly implements the sync marker pattern:

  • Statuses marked as "sync" during cache rebuild (line 1606) are re-processed
  • Statuses with "" (empty string, set at lines 882/890) are skipped as already processed
  • This ensures controller restarts can reconcile existing database state with the cache
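Here is a minimal sketch of the marker lifecycle described above. It uses a plain map keyed by a hypothetical `statusItem` struct in place of `egressipv1.EgressIPStatusItem`. `needsProcessing` and `markApplied` are illustrative helpers, not functions from egressip.go.

```go
package main

// statusItem is a hypothetical stand-in for egressipv1.EgressIPStatusItem.
type statusItem struct {
	Node     string
	EgressIP string
}

// Per-status values, mirroring the two states in the review:
// "" = already programmed; "sync" = seeded during a restart sync, must be (re)applied.
const (
	stateApplied = ""
	stateSynced  = "sync"
)

// needsProcessing implements the `!exists || value == "sync"` check.
func needsProcessing(statusMap map[statusItem]string, s statusItem) bool {
	v, exists := statusMap[s]
	return !exists || v == stateSynced
}

// markApplied records that the OVN config for s has been programmed.
func markApplied(statusMap map[statusItem]string, s statusItem) {
	statusMap[s] = stateApplied
}
```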

925-927: LGTM! Defensive cleanup of stale status entries.

This guard correctly prunes stale status entries from pods that are no longer managed by this EgressIP object. The logic is safe because:

  • It only runs when podStatus.egressIPName != name (pod managed by different EIP)
  • The contains() check prevents errors if status doesn't exist
  • Executes under podAssignment lock (line 919)

This cleanup handles edge cases where a pod's EIP assignment changes but stale cache entries remain.


1603-1607: LGTM! Core sync mechanism correctly seeds pod status cache.

This code seeds podState.egressStatuses with "sync" values for each discovered EgressIP assignment during cache rebuild. The flow is:

  1. During sync: discovered statuses are marked with "sync" (here)
  2. During add: statuses with "sync" are re-processed (line 846 check)
  3. After processing: "sync" is replaced with "" (lines 882, 890)

This ensures proper reconciliation after controller restarts by tracking which database entries need re-processing.


1966-1968: LGTM! Proper initialization and population of node assignment map.

The egressIPToAssignedNodes map is correctly initialized and populated:

  • Line 1967-1968: Declare and initialize in cache structure
  • Line 1982: Create inner map per EgressIP name
  • Line 1990: Populate with EgressIP IP → node mappings from Status.Items

This data structure is then used in syncPodAssignmentCache (lines 1604-1607) to seed pod status with "sync" markers during cache rebuilds.

Also applies to: 1982-1982, 1990-1990
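The nested map and the seeding step can be sketched as follows. `egressIPObj`, `buildAssignedNodes`, and `seedPodStatuses` are hypothetical simplifications of what the review attributes to generateCacheForEgressIP and syncPodAssignmentCache. Only the map shape (EgressIP name -> egress IP -> node name) and the "sync" value come from the review.

```go
package main

type statusItem struct{ Node, EgressIP string }

// egressIPObj is a hypothetical, trimmed-down EgressIP: name plus status items.
type egressIPObj struct {
	Name   string
	Status []statusItem
}

// buildAssignedNodes builds EgressIP name -> egress IP -> assigned node name.
func buildAssignedNodes(eips []egressIPObj) map[string]map[string]string {
	out := make(map[string]map[string]string, len(eips))
	for _, eip := range eips {
		inner := make(map[string]string, len(eip.Status))
		for _, item := range eip.Status {
			inner[item.EgressIP] = item.Node
		}
		out[eip.Name] = inner
	}
	return out
}

// seedPodStatuses marks every current assignment of eipName as "sync" in a
// pod's status map. Ranging over a missing (nil) inner map is a no-op.
func seedPodStatuses(statusMap map[statusItem]string, assigned map[string]map[string]string, eipName string) {
	for ip, node := range assigned[eipName] {
		statusMap[statusItem{Node: node, EgressIP: ip}] = "sync"
	}
}
```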

@pperiyasamy pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from 73ece4e to 150b874 Compare October 6, 2025 16:00

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip.go (3)

1171-1172: Document the purpose of the new egressIPToAssignedNodes field.

Add a comment explaining that this map tracks per-EIP per-IP node assignments (e.g., egressIP name -> egressIP IP -> assigned node name) to support cache seeding during controller restart and EIP failover scenarios.

+	// egressIP name -> egress IP -> assigned node name
+	// Used during cache rebuild to seed pod assignment state with current EIP-to-node mappings
 	egressIPToAssignedNodes map[string]map[string]string

1612-1616: Clarify the purpose of the "sync" marker in comments.

The code populates podState.egressStatuses with "sync" values during cache rebuild, but the purpose and lifecycle of this marker could be clearer. Consider adding a comment explaining that "sync" indicates an entry found during cache rebuild that needs reconciliation, and will be replaced with "" once setup completes.

+	// Populate podState.egressStatuses with assigned nodes for each egressIP IP.
+	// Mark entries as "sync" to indicate they need reconciliation (will be cleared to "" once setup completes).
 	for egressIPIP, nodeName := range egressIPCache.egressIPToAssignedNodes[egressIPName] {
 		podState.egressStatuses.statusMap[egressipv1.EgressIPStatusItem{
 			EgressIP: egressIPIP, Node: nodeName}] = "sync"
 	}

2270-2280: Consider optimizing hasStaleEIPStatus or adding documentation.

The function iterates through all entries in statusMap to find a stale entry. While this is acceptable for small maps, consider:

  1. Document the expected behavior when multiple stale entries exist (currently returns the first one found due to break)
  2. Consider whether the function should return all stale entries or just the first one

Add documentation:

+// hasStaleEIPStatus checks if there's an existing status entry with the same EgressIP
+// but a different Node than potentialStatus. This indicates a failover scenario where
+// the EgressIP moved from one node to another. Returns the first stale entry found, or nil.
 func (e egressStatuses) hasStaleEIPStatus(potentialStatus egressipv1.EgressIPStatusItem) *egressipv1.EgressIPStatusItem {
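Here is a sketch of the semantics the proposed doc comment describes, using a hypothetical `statusItem` type and a plain map. One detail is worth documenting. Go map iteration order is unspecified, so when several stale entries exist, "the first stale entry found" is effectively an arbitrary one.

```go
package main

type statusItem struct{ Node, EgressIP string }

// hasStaleStatus returns an existing entry with the same egress IP as
// candidate but a different node, or nil if there is none. If several match,
// which one is returned depends on (random) map iteration order.
func hasStaleStatus(statusMap map[statusItem]string, candidate statusItem) *statusItem {
	for s := range statusMap {
		if s.EgressIP == candidate.EgressIP && s.Node != candidate.Node {
			stale := s
			return &stale
		}
	}
	return nil
}
```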
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73ece4e and 150b874.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (7 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (8)
go-controller/pkg/config/config.go (1)
  • Gateway (147-164)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build-PR
  • GitHub Check: Lint
🔇 Additional comments (2)
go-controller/pkg/ovn/egressip.go (2)

846-858: Verify the stale detection logic handles all edge cases correctly.

The code checks for stale EIP status and deletes it inline during the loop. Consider these edge cases:

  1. What if deletePodEgressIPAssignments fails? The stale entry is deleted from the map (line 856) even if the OVN cleanup fails.
  2. What if multiple new statuses in statusAssignments correspond to the same EgressIP but different nodes? The first iteration will clean up the stale entry, but subsequent iterations won't find it.

The current implementation appears correct for case 2 (subsequent iterations won't find stale entries after first cleanup). However, for case 1, consider whether the delete at line 856 should only occur if the cleanup succeeds.

Review the error handling:

if staleStatus := podState.egressStatuses.hasStaleEIPStatus(status); staleStatus != nil {
    err = e.deletePodEgressIPAssignments(ni, name, []egressipv1.EgressIPStatusItem{*staleStatus}, pod)
    if err != nil {
        return fmt.Errorf("failed to delete stale EgressIP %s, status %v for pod %s: %w", name, *staleStatus, podKey, err)
    }
    delete(podState.egressStatuses.statusMap, *staleStatus)
}

If deletePodEgressIPAssignments returns an error, we return immediately without deleting from the map, which is correct. So the current logic appears safe.


934-937: Verify the cleanup logic for standby EgressIP references.

When a pod is managed by a different EgressIP object, the code removes any stale reference to the current EgressIP from podStatus.egressStatuses. This makes sense for cleanup. However, verify that this doesn't cause issues if the pod later switches to be managed by this EgressIP (e.g., if the current managing EgressIP is deleted).

The cleanup appears correct: if a pod is managed by EgressIP-A, and we're deleting EgressIP-B's assignments, we remove any stale references to EgressIP-B from the pod's state. If EgressIP-A is later deleted and EgressIP-B becomes active, the normal add flow will re-establish the assignments.

Comment thread go-controller/pkg/ovn/egressip_test.go Outdated
Comment on lines +11261 to +11360
},
&nbdb.LogicalSwitch{
UUID: types.ExternalSwitchPrefix + node2Name + "-UUID",
Name: types.ExternalSwitchPrefix + node2Name,
Ports: []string{types.EXTSwitchToGWRouterPrefix + types.GWRouterPrefix + node2Name + "-UUID"},
},
node1Switch,
node2Switch,
},
},
&egressipv1.EgressIPList{
Items: []egressipv1.EgressIP{eIP},
},
&corev1.NodeList{
Items: []corev1.Node{node1, node2},
},
&corev1.NamespaceList{
Items: []corev1.Namespace{*egressNamespace},
},
&corev1.PodList{
Items: []corev1.Pod{egressPod},
},
)

i, n, _ := net.ParseCIDR(podV4IP + "/23")
n.IP = i
fakeOvn.controller.logicalPortCache.add(&egressPod, "", types.DefaultNetworkName, "", nil, []*net.IPNet{n})

err := fakeOvn.controller.WatchEgressIPNamespaces()
gomega.Expect(err).NotTo(gomega.HaveOccurred())
err = fakeOvn.controller.WatchEgressIPPods()
gomega.Expect(err).NotTo(gomega.HaveOccurred())
err = fakeOvn.controller.WatchEgressNodes()
gomega.Expect(err).NotTo(gomega.HaveOccurred())
err = fakeOvn.controller.WatchEgressIP()
gomega.Expect(err).NotTo(gomega.HaveOccurred())

// To simulate an ovnkube-controller restart, update the EIP object with the newly assigned node.
// Then invoke reconcileEgressIP using only the updated EIP object to trigger the EgressIP add event.
eIP.Status = egressipv1.EgressIPStatus{
Items: []egressipv1.EgressIPStatusItem{
{
Node: node2.Name,
EgressIP: egressIP,
},
},
}
err = fakeOvn.controller.eIPC.reconcileEgressIP(nil, &eIP)
gomega.Expect(err).NotTo(gomega.HaveOccurred())

egressSVCServedPodsASv4, _ := buildEgressServiceAddressSets(nil)
egressIPServedPodsASv4, _ := buildEgressIPServedPodsAddressSets([]string{podV4IP}, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName)
egressNodeIPsASv4, _ := buildEgressIPNodeAddressSets([]string{node1IPv4, node2IPv4})

node1Switch.QOSRules = []string{"default-QoS-UUID"}
node2Switch.QOSRules = []string{"default-QoS-UUID"}
expectedNatLogicalPort2 := "k8s-node2"
expectedDatabaseState := []libovsdbtest.TestData{
&nbdb.LogicalRouterPolicy{
Priority: types.DefaultNoRereoutePriority,
Match: fmt.Sprintf("(ip4.src == $%s || ip4.src == $%s) && ip4.dst == $%s",
egressIPServedPodsASv4.Name, egressSVCServedPodsASv4.Name, egressNodeIPsASv4.Name),
Action: nbdb.LogicalRouterPolicyActionAllow,
UUID: "default-no-reroute-node-UUID",
Options: map[string]string{"pkt_mark": types.EgressIPNodeConnectionMark},
ExternalIDs: getEgressIPLRPNoReRoutePodToNodeDbIDs(IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(),
},
getNoReRouteReplyTrafficPolicy(types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName),
&nbdb.LogicalRouterPolicy{
Priority: types.DefaultNoRereoutePriority,
Match: "ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14",
Action: nbdb.LogicalRouterPolicyActionAllow,
UUID: "no-reroute-UUID",
ExternalIDs: getEgressIPLRPNoReRoutePodToPodDbIDs(IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(),
},
&nbdb.LogicalRouterPolicy{
Priority: types.DefaultNoRereoutePriority,
Match: fmt.Sprintf("ip4.src == 10.128.0.0/14 && ip4.dst == %s", config.Gateway.V4JoinSubnet),
Action: nbdb.LogicalRouterPolicyActionAllow,
UUID: "no-reroute-service-UUID",
ExternalIDs: getEgressIPLRPNoReRoutePodToJoinDbIDs(IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(),
},
&nbdb.LogicalRouterPolicy{
Priority: types.EgressIPReroutePriority,
Match: fmt.Sprintf("ip4.src == %s", egressPod.Status.PodIP),
Action: nbdb.LogicalRouterPolicyActionReroute,
Nexthops: []string{"100.64.0.3"},
ExternalIDs: getEgressIPLRPReRouteDbIDs(eIP.Name, egressPod.Namespace, egressPod.Name, IPFamilyValueV4, types.DefaultNetworkName, fakeOvn.controller.eIPC.controllerName).GetExternalIDs(),
UUID: "reroute-UUID1",
},
&nbdb.LogicalRouter{
Name: types.OVNClusterRouter,
UUID: types.OVNClusterRouter + "-UUID",
Policies: []string{"no-reroute-UUID", "no-reroute-service-UUID", "default-no-reroute-node-UUID", "default-no-reroute-reply-traffic", "reroute-UUID1"},
},
&nbdb.LogicalRouter{
Name: types.GWRouterPrefix + node1.Name,
UUID: types.GWRouterPrefix + node1.Name + "-UUID",
Ports: []string{types.GWRouterToJoinSwitchPrefix + types.GWRouterPrefix + node1.Name + "-UUID"},
Nat: []string{"pod-node-nat-UUID1"},

@coderabbitai coderabbitai Bot Oct 6, 2025


⚠️ Potential issue | 🔴 Critical

Goroutines need GinkgoRecover

Each goroutine calling Gomega assertions must defer ginkgo.GinkgoRecover() to surface panics correctly. Please add this at the top of both goroutines (based on learnings).

+				go func() {
+					defer ginkgo.GinkgoRecover()
 					defer ginkgo.GinkgoRecover()
 					defer wg.Done()
 					i, n, _ := net.ParseCIDR(podV4IP + "/23")
 					n.IP = i
 					fakeOvn.controller.logicalPortCache.add(&egressPod, "", types.DefaultNetworkName, "", nil, []*net.IPNet{n})
 					_, err := fakeOvn.fakeClient.KubeClient.CoreV1().Pods(egressPod.Namespace).Create(context.TODO(), &egressPod, metav1.CreateOptions{})
 					gomega.Expect(err).NotTo(gomega.HaveOccurred())
 				}()
 				go func() {
+					defer ginkgo.GinkgoRecover()
 					defer ginkgo.GinkgoRecover()
 					defer wg.Done()
 					i, n, _ := net.ParseCIDR(podV4IP2 + "/23")
 					n.IP = i
 					fakeOvn.controller.logicalPortCache.add(&egressPod2, "", types.DefaultNetworkName, "", nil, []*net.IPNet{n})

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
go-controller/pkg/ovn/egressip_test.go around lines 11261-11360: the test spawns
goroutines that use Gomega assertions but do not recover panics, so add defer
ginkgo.GinkgoRecover() as the first statement in each goroutine function (i.e.,
at the top of both goroutine bodies) to ensure panics inside the goroutines are
reported to Ginkgo; no other behavioral changes are needed.

Contributor Author


This suggestion is not for changes in the PR.


@pperiyasamy You’re right — that suggestion isn’t applicable to the changes in this PR. The new test here doesn’t introduce goroutines, so no GinkgoRecover update is needed. I’ll limit this check to diffs that add or modify goroutines going forward.



@pperiyasamy pperiyasamy changed the title [DNM] Add test for simultaneous EIP failover and ovnkube-controller restart Fix stale EIP node assignments during simultaneous failover Oct 6, 2025
@pperiyasamy pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from 150b874 to a69e7ce Compare October 7, 2025 07:32

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
go-controller/pkg/ovn/egressip_test.go (1)

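11623-11696: Check the Go version for the range-over-int loop (for range 10)

for range 10 (ranging over an integer) is valid Go only from Go 1.22 onward, so the test won't compile if the module targets an older Go version. In that case, use a standard counted loop instead.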

-				for range 10 {
+				for i := 0; i < 10; i++ {
🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip.go (1)

846-853: Consider deduplicating stale assignments to avoid redundant operations.

The current logic calls hasStaleEIPStatus for each new status in the loop. If multiple new status items exist for the same EgressIP name (e.g., after a multi-IP failover), the same stale entry could be added to staleAssignments multiple times.

While the subsequent deletions are idempotent (lines 875-882), this causes unnecessary duplicate delete operations and log messages.

Consider deduplicating staleAssignments before processing, or tracking already-detected stale entries within the loop:

 	for _, status := range statusAssignments {
 		if value, exists := podState.egressStatuses.statusMap[status]; !exists || value == egressStatusStateSynced {
 			remainingAssignments = append(remainingAssignments, status)
 		}
 		// Detect stale EIP status entries (same EgressIP reassigned to a different node)
 		// and queue the outdated entry for cleanup.
 		if staleStatus := podState.egressStatuses.hasStaleEIPStatus(status); staleStatus != nil {
-			staleAssignments = append(staleAssignments, *staleStatus)
+			// Avoid duplicates by checking if already queued
+			isDuplicate := false
+			for _, existing := range staleAssignments {
+				if existing.EgressIP == staleStatus.EgressIP && existing.Node == staleStatus.Node {
+					isDuplicate = true
+					break
+				}
+			}
+			if !isDuplicate {
+				staleAssignments = append(staleAssignments, *staleStatus)
+			}
 		}
 	}
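A set keyed by the status item achieves the same deduplication more simply, because a struct of two strings is comparable in Go and can be a map key. `staleQueue` is a hypothetical helper that sketches the idea. It is not code from the PR, and it replaces the suggestion's linear rescan of the slice with an O(1) membership check.

```go
package main

type statusItem struct{ Node, EgressIP string }

// staleQueue collects stale status items, each at most once.
type staleQueue struct {
	items []statusItem
	seen  map[statusItem]struct{}
}

func newStaleQueue() *staleQueue {
	return &staleQueue{seen: make(map[statusItem]struct{})}
}

// add queues s unless it is already queued and reports whether it was added.
func (q *staleQueue) add(s statusItem) bool {
	if _, dup := q.seen[s]; dup {
		return false
	}
	q.seen[s] = struct{}{}
	q.items = append(q.items, s)
	return true
}
```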
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 150b874 and a69e7ce.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (10 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (6)
go-controller/pkg/config/config.go (1)
  • Gateway (147-164)
go-controller/pkg/util/node_annotations.go (1)
  • OVNNodeHostCIDRs (98-98)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_policy.go (1)
  • LogicalRouterPolicy (22-34)
go-controller/pkg/nbdb/nat.go (1)
  • NAT (21-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build-PR
🔇 Additional comments (6)
go-controller/pkg/ovn/egressip.go (6)

875-882: LGTM! Stale assignment cleanup logic is correct.

The code properly:

  1. Deletes stale OVN configuration via deletePodEgressIPAssignments
  2. Removes the entry from the in-memory status map
  3. Fails fast if cleanup encounters errors, preventing inconsistent state

The locking is also correct—deletePodEgressIPAssignments is called while holding the podAssignment lock as required by the function contract (line 1048).


938-940: LGTM! Essential cleanup for standby EIP tracking.

This ensures that when a pod is no longer managed by this EIP (checked at line 937), any associated status entries are properly removed from the per-pod status map. This prevents stale entries from accumulating when EIPs are deleted or pod selectors change.


1616-1621: LGTM! Core fix for the failover race condition.

This change seeds the per-pod status map during controller sync by iterating through the per-EIP per-IP node assignments (egressIPToAssignedNodes). The egressStatusStateSynced marker ensures these entries are re-applied during the next reconciliation cycle.

This addresses the race condition described in the PR objectives: when the controller restarts during an EIP failover, the informer cache is rebuilt from the current EgressIP status, enabling detection and cleanup of stale SNAT/LRP configurations for previously assigned nodes.


1175-1176: LGTM! Essential tracking for per-EIP per-IP node assignments.

The new egressIPToAssignedNodes field (type map[string]map[string]string) tracks which node each specific EIP IP is assigned to:

  • Outer key: EgressIP name
  • Inner key: EgressIP IP address
  • Value: Node name

This granular tracking enables:

  1. Accurate cache population during controller sync (used at line 1617)
  2. Detection of stale assignments when an EIP is reassigned to a different node

The initialization and population logic in generateCacheForEgressIP is correct.

Also applies to: 1979-1981, 1995-1995, 2003-2003


2260-2262: LGTM! Clear documentation of the sync state mechanism.

The constant definition and updated statusMap documentation clearly explain the two-state system:

  • Empty string "": Status is applied and reconciled
  • egressStatusStateSynced ("sync"): Status was seeded during controller sync and must be (re)applied

This documentation improves code maintainability and makes the synchronization logic easier to understand.

Also applies to: 2267-2272


2283-2293: LGTM! Correct stale entry detection for failover scenarios.

The hasStaleEIPStatus helper correctly identifies when the same EgressIP (IP address) is assigned to a different node, indicating a failover event that requires cleanup of the old node's configuration.

The function returns after finding the first stale entry, which is appropriate because:

  1. Multiple stale entries for the same EIP IP are rare
  2. The caller removes detected stale entries from the map (line 881)
  3. Subsequent reconciliations would catch any remaining stale entries

@pperiyasamy pperiyasamy requested review from kyrtapz and tssurya and removed request for crnithya October 7, 2025 10:23
@pperiyasamy pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from a69e7ce to 16f22a7 Compare October 10, 2025 11:30
@pperiyasamy pperiyasamy changed the title Fix stale EIP node assignments during simultaneous failover Fix stale EIP assignments during failover and controller restart Oct 10, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
go-controller/pkg/ovn/egressip.go (2)

828-836: Verify stale detection handles all edge cases.

The logic checks for stale status entries where the same EgressIP is assigned to a different node. However, hasStaleEIPStatus returns only the first stale entry found. If multiple stale entries exist for the same EgressIP (e.g., Node A and Node B both stale, new assignment to Node C), only one will be cleaned up per reconciliation cycle. While subsequent reconciliations will eventually clean up remaining stale entries, consider whether this gradual cleanup is acceptable or if all stale entries should be removed in a single pass for more predictable behavior.

Consider modifying hasStaleEIPStatus to return all stale entries:

-func (e egressStatuses) hasStaleEIPStatus(potentialStatus egressipv1.EgressIPStatusItem) *egressipv1.EgressIPStatusItem {
-	var staleStatus *egressipv1.EgressIPStatusItem
+func (e egressStatuses) hasStaleEIPStatus(potentialStatus egressipv1.EgressIPStatusItem) []egressipv1.EgressIPStatusItem {
+	var staleStatuses []egressipv1.EgressIPStatusItem
 	for status := range e.statusMap {
 		if status.EgressIP == potentialStatus.EgressIP &&
 			status.Node != potentialStatus.Node {
-			staleStatus = &egressipv1.EgressIPStatusItem{EgressIP: status.EgressIP, Node: status.Node}
-			break
+			staleStatuses = append(staleStatuses, egressipv1.EgressIPStatusItem{EgressIP: status.EgressIP, Node: status.Node})
 		}
 	}
-	return staleStatus
+	return staleStatuses
 }

And update the calling code to handle a slice:

-		if staleStatus := podState.egressStatuses.hasStaleEIPStatus(status); staleStatus != nil {
-			staleAssignments = append(staleAssignments, *staleStatus)
+		if staleStatuses := podState.egressStatuses.hasStaleEIPStatus(status); len(staleStatuses) > 0 {
+			staleAssignments = append(staleAssignments, staleStatuses...)
 		}
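Below is a self-contained sketch of the slice-returning variant, exercising the Node A/Node B/Node C case from the comment. The names are hypothetical. The sort is an addition of this sketch, not part of the suggestion, and makes the result deterministic despite random map iteration order.

```go
package main

import "sort"

type statusItem struct{ Node, EgressIP string }

// staleStatuses returns every entry that has candidate's egress IP but a
// different node, so one reconcile pass can clean up all of them.
func staleStatuses(statusMap map[statusItem]string, candidate statusItem) []statusItem {
	var out []statusItem
	for s := range statusMap {
		if s.EgressIP == candidate.EgressIP && s.Node != candidate.Node {
			out = append(out, s)
		}
	}
	// Sort by node name for deterministic logs and tests.
	sort.Slice(out, func(i, j int) bool { return out[i].Node < out[j].Node })
	return out
}
```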

870-879: Simplify local zone node check.

The code locks each node just to check if it's in the local zone via Load, but syncmap.Load is already thread-safe and doesn't require external locking. The LockKey/UnlockKey calls add unnecessary overhead.

Remove the locking since Load is thread-safe:

 	proceed := false
 	for _, status := range statusAssignments {
-		e.nodeZoneState.LockKey(status.Node)
 		isLocalZoneEgressNode, loadedEgressNode := e.nodeZoneState.Load(status.Node)
 		if loadedEgressNode && isLocalZoneEgressNode {
 			proceed = true
-			e.nodeZoneState.UnlockKey(status.Node)
 			break
 		}
-		e.nodeZoneState.UnlockKey(status.Node)
 	}
go-controller/pkg/ovn/egressip_test.go (2)

7976-7976: Prefer order‑independent map assertions

Indexing into Status.Items can be brittle if ordering changes. Assert by key/value presence instead:
for each item in eip1Obj.Status.Items, Expect(statusMap).To(HaveKeyWithValue(item, egressStatusStateSynced)).

- gomega.Expect(pas.egressStatuses.statusMap).To(gomega.HaveLen(3))
- gomega.Expect(pas.egressStatuses.statusMap[eip1Obj.Status.Items[0]]).To(gomega.Equal(egressStatusStateSynced))
- gomega.Expect(pas.egressStatuses.statusMap[eip1Obj.Status.Items[1]]).To(gomega.Equal(egressStatusStateSynced))
+ gomega.Expect(pas.egressStatuses.statusMap).To(gomega.HaveLen(3))
+ for _, it := range eip1Obj.Status.Items {
+   gomega.Expect(pas.egressStatuses.statusMap).
+     To(gomega.HaveKeyWithValue(it, egressStatusStateSynced))
+ }

Also applies to: 7989-7991, 8016-8018
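Outside Gomega, the same order-independent check comes down to a length check plus one lookup per expected key. This is a hypothetical plain-Go sketch. `statusItem`, `allSynced`, and the `"synced"` value given to `egressStatusStateSynced` are illustrative assumptions, not the real test helpers.

```go
package main

import "fmt"

type statusItem struct {
	Node     string
	EgressIP string
}

// Assumed marker value; the real egressStatusStateSynced constant may differ.
const egressStatusStateSynced = "synced"

// allSynced reports whether statusMap has exactly wantLen entries and every
// expected item is present with the synced marker, regardless of item order.
func allSynced(statusMap map[statusItem]string, items []statusItem, wantLen int) error {
	if len(statusMap) != wantLen {
		return fmt.Errorf("expected %d entries, got %d", wantLen, len(statusMap))
	}
	for _, it := range items {
		v, ok := statusMap[it]
		if !ok {
			return fmt.Errorf("missing entry %v", it)
		}
		if v != egressStatusStateSynced {
			return fmt.Errorf("entry %v has state %q, want %q", it, v, egressStatusStateSynced)
		}
	}
	return nil
}

func main() {
	m := map[statusItem]string{
		{"node-1", "192.168.126.101"}: egressStatusStateSynced,
		{"node-2", "192.168.126.102"}: egressStatusStateSynced,
	}
	// Items listed in a different order from the map literal: the check still passes.
	items := []statusItem{{"node-2", "192.168.126.102"}, {"node-1", "192.168.126.101"}}
	fmt.Println(allSynced(m, items, 2)) // prints <nil>
}
```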


11148-11497: New failover+restart reconciliation test: solid coverage; consider explicit timeout to reduce flakes

Scenario and expectations look correct (LRP nexthop to transit IP, nodeIP SNAT on old node, address sets/QoS). To harden CI runs, wrap the final DB check with an explicit timeout (e.g., Eventually(..., inspectTimeout)).

- gomega.Eventually(fakeOvn.nbClient).Should(libovsdbtest.HaveData(expectedDatabaseState))
+ gomega.Eventually(fakeOvn.nbClient, inspectTimeout).
+   Should(libovsdbtest.HaveData(expectedDatabaseState))
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a69e7ce and 16f22a7.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (10 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (8)
go-controller/pkg/config/config.go (1)
  • Gateway (147-164)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build-PR
  • GitHub Check: Lint
🔇 Additional comments (3)
go-controller/pkg/ovn/egressip.go (3)

1616-1620: Verify seeding handles controller restart race correctly.

The seeding logic populates podState.egressStatuses with entries from egressIPToAssignedNodes during syncPodAssignmentCache, marking them as egressStatusStateSynced. This enables detection of stale entries when the controller restarts. However, if the EgressIP's Status.Items is already updated to the new node (Node B) before syncPodAssignmentCache runs, and the old node (Node A) has already been removed from Status.Items, then Node A's stale configuration won't be detected for cleanup.

Verify the timing guarantees: does syncPodAssignmentCache always run with a snapshot of Status.Items that includes the old node assignment, or could there be a race where the old assignment is already removed from the API server by the time the controller starts? If the latter is possible, stale SNAT and LRP configuration might remain on Node A.

Consider adding a test case that verifies cleanup when:

  1. Controller is down
  2. EIP fails over from Node A to Node B
  3. Status.Items is updated to only show Node B
  4. Controller starts (cache rebuild)
  5. Only Add event is delivered (not Update)
  6. Verify Node A's configuration is cleaned up

2283-2293: LGTM! Helper correctly identifies stale entries.

The hasStaleEIPStatus helper correctly identifies stale entries by matching on EgressIP while checking for a different Node. Breaking out of the loop after the first match is appropriate, since only the first stale entry needs to be returned (subsequent ones will be cleaned up in later reconciliation cycles).


938-940: LGTM! Cleanup ensures consistency.

The cleanup of stale status entries when podState.egressIPName doesn't match ensures the cache remains consistent even when pods are managed by different EgressIP objects over time.

@pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from 16f22a7 to 4fdd7a4 on October 17, 2025 08:37

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
go-controller/pkg/ovn/egressip_test.go (2)

7976-7976: Reduce duplication in egress status assertions

The repeated checks against pas.egressStatuses.statusMap (length and egressStatusStateSynced value) are correct but duplicated in multiple places. Consider extracting a tiny helper (e.g., assertStatusesSynced(t, pas, wantLen)) to keep tests terse and easier to update if state semantics change.

Also applies to: 7989-7990, 8016-8017, 8082-8082


11148-11502: Stabilize the new failover+restart test and optionally harden cleanup verification

  • Use the existing inspectTimeout with Eventually to avoid CI flakes on large expected DB states:
- gomega.Eventually(fakeOvn.nbClient).Should(libovsdbtest.HaveData(expectedDatabaseState))
+ gomega.Eventually(fakeOvn.nbClient, inspectTimeout).Should(libovsdbtest.HaveData(expectedDatabaseState))
  • Optional: to explicitly validate stale cleanup, pre-seed NBData with a stale EIP SNAT on node1 (LogicalPort "k8s-node1" + ExternalIP = egressIP) and a reroute LRP with node1 join IP as nexthop, then assert they are absent from expectedDatabaseState. This makes the regression being fixed observable in the test outcome, not just inferred.
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16f22a7 and 4fdd7a4.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (10 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-29T13:52:53.205Z
Learnt from: trozet
PR: ovn-kubernetes/ovn-kubernetes#5587
File: go-controller/pkg/networkmanager/egressip_tracker.go:396-443
Timestamp: 2025-09-29T13:52:53.205Z
Learning: In ovn-kubernetes go-controller/pkg/networkmanager/egressip_tracker.go, the onNetworkRefChange callback must remain synchronous because it relies on ordering of events with the Active signal. Using goroutines would create race conditions where DELETE events could be processed before ADD events, breaking the ordering guarantees that downstream components depend on.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
  • go-controller/pkg/ovn/egressip.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (8)
go-controller/pkg/config/config.go (2)
  • Gateway (170-187)
  • OVNKubernetesFeature (158-161)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build-PR
  • GitHub Check: Lint
🔇 Additional comments (5)
go-controller/pkg/ovn/egressip.go (5)

938-940: LGTM! Proper cleanup of stale statuses.

This correctly handles the cleanup of stale statuses when an EgressIP object no longer manages a pod. The logic properly checks if the pod is managed by a different EgressIP (line 937) but still has this statusToRemove in its cache, and removes it accordingly.


1175-1176: Well-structured per-EIP node tracking.

The new egressIPToAssignedNodes map correctly tracks the assigned node for each EgressIP IP within each EgressIP object. The structure (egressIP name → egressIP IP → node name) properly supports the failover scenario where an EgressIP can be reassigned to a different node.

The initialization at line 1980 and population at lines 1995-2003 look correct, properly keying by both egressIP name and the parsed IP string.

Also applies to: 1979-2003
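The nested map described above (EgressIP name → egress IP → node name) might be built roughly as follows. This is a sketch under assumptions. `egressIP`, `statusItem`, and `buildAssignedNodes` are simplified stand-ins, and skipping unparsable IPs is a choice made here rather than something the review states. Keying by `net.ParseIP(...).String()` means differently written forms of the same IPv6 address land on one key.

```go
package main

import (
	"fmt"
	"net"
)

// statusItem is a simplified stand-in for an EgressIP status entry.
type statusItem struct {
	Node     string
	EgressIP string
}

// egressIP is a simplified stand-in for an EgressIP object.
type egressIP struct {
	Name   string
	Status []statusItem
}

// buildAssignedNodes returns egressIP name -> normalized egress IP -> assigned node.
// Entries whose IP does not parse are skipped (an assumption of this sketch).
func buildAssignedNodes(eips []egressIP) map[string]map[string]string {
	out := make(map[string]map[string]string, len(eips))
	for _, eip := range eips {
		byIP := make(map[string]string, len(eip.Status))
		for _, st := range eip.Status {
			ip := net.ParseIP(st.EgressIP)
			if ip == nil {
				continue
			}
			byIP[ip.String()] = st.Node
		}
		out[eip.Name] = byIP
	}
	return out
}

func main() {
	nodes := buildAssignedNodes([]egressIP{{
		Name: "egressip",
		Status: []statusItem{
			{Node: "node-2", EgressIP: "192.168.126.101"},
			{Node: "node-2", EgressIP: "2001:db8:0::0001"},
		},
	}})
	// The IPv6 address is stored under its canonical form.
	fmt.Println(nodes["egressip"]["2001:db8::1"]) // prints node-2
}
```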


1616-1620: Synced status seeding enables proper reconciliation.

The seeding of pod egress statuses during cache synchronization correctly marks entries with egressStatusStateSynced. This marker is checked at line 828 in addPodEgressIPAssignments, ensuring that synced entries will be re-applied during the next reconciliation cycle. This is the key mechanism for fixing stale assignments after controller restart.

The loop properly creates status items for each EgressIP IP using the assigned node from egressIPToAssignedNodes[egressIPName].
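Here is a minimal model of the seeding step, using simplified types and an assumed marker value. Assignments known at sync time are written into the pod's status cache as synced entries. When a later event reports a new node for the same egress IP, the old node's entry is still present and can be recognised as stale.

```go
package main

import "fmt"

type statusItem struct {
	Node     string
	EgressIP string
}

// Assumed marker value for illustration only.
const egressStatusStateSynced = "synced"

// seedPodStatuses records every (egress IP, node) assignment known at sync time
// in the pod's status cache, marked as synced.
func seedPodStatuses(statusMap map[statusItem]string, assignedNodes map[string]string) {
	for ip, node := range assignedNodes {
		statusMap[statusItem{Node: node, EgressIP: ip}] = egressStatusStateSynced
	}
}

func main() {
	podStatuses := map[statusItem]string{}
	// At sync time the informer still reports node-1 for this egress IP.
	seedPodStatuses(podStatuses, map[string]string{"192.168.126.101": "node-1"})

	// A later event reports node-2; the seeded node-1 entry for the same IP
	// is what identifies the stale SNAT/LRP configuration.
	newStatus := statusItem{Node: "node-2", EgressIP: "192.168.126.101"}
	for st := range podStatuses {
		if st.EgressIP == newStatus.EgressIP && st.Node != newStatus.Node {
			fmt.Println("stale:", st.Node) // prints stale: node-1
		}
	}
}
```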


2260-2293: Clear state tracking with proper stale detection.

The egressStatusStateSynced constant and updated egressStatuses documentation clearly define the three states an entry can be in. The hasStaleEIPStatus method correctly identifies stale entries by finding statuses with the same EgressIP but different node.

The implementation properly creates a new EgressIPStatusItem struct at line 2288 rather than returning a pointer to the loop variable, avoiding a common Go pitfall.


792-882: No code changes needed.

The concern about multiple stale entries is technically valid but the architecture correctly handles it through eventual consistency. The code works as designed:

  • hasStaleEIPStatus returns the first stale entry found (same EgressIP, different Node) per invocation
  • If multiple stale entries exist for the same EgressIP, they are cleaned up across successive calls to addPodEgressIPAssignments
  • Error handling is idempotent: deletion failures trigger retries where the next stale entry is detected
  • The podAssignment lock serializes operations, preventing race conditions during cleanup

This eventually-consistent approach is acceptable and avoids unnecessary complexity of returning all stale entries at once.
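A small, hypothetical model makes the eventual-consistency argument concrete. When each pass removes only the first stale entry it finds, N stale entries for one egress IP need N passes to clear. For illustration, `passesToConverge` runs the passes in a loop; in the controller, each pass would be a separate call to addPodEgressIPAssignments.

```go
package main

import "fmt"

type statusItem struct {
	Node     string
	EgressIP string
}

// firstStale returns one cached entry with the same egress IP but a different node, if any.
func firstStale(cache map[statusItem]bool, current statusItem) (statusItem, bool) {
	for st := range cache {
		if st.EgressIP == current.EgressIP && st.Node != current.Node {
			return st, true
		}
	}
	return statusItem{}, false
}

// passesToConverge removes one stale entry per pass (modelling one reconciliation
// per pass) and returns how many passes removed something.
func passesToConverge(cache map[statusItem]bool, current statusItem) int {
	passes := 0
	for {
		st, ok := firstStale(cache, current)
		if !ok {
			return passes
		}
		delete(cache, st) // stands in for deleting the stale SNAT/LRP config
		passes++
	}
}

func main() {
	cache := map[statusItem]bool{
		{"node-1", "192.168.126.101"}: true,
		{"node-3", "192.168.126.101"}: true,
		{"node-2", "192.168.126.101"}: true,
	}
	fmt.Println(passesToConverge(cache, statusItem{"node-2", "192.168.126.101"})) // prints 2
}
```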

@coveralls

coveralls commented Oct 17, 2025

Pull Request Test Coverage Report for Build 18655908164

Details

  • 42 of 44 (95.45%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 54.639%

Changes Missing Coverage
  • go-controller/pkg/ovn/egressip.go: 42 of 44 changed/added lines covered (95.45%)
Files with Coverage Reduction
  • go-controller/pkg/ovn/base_network_controller.go: 2 new missed lines (69.13%)
Totals
  • Change from base Build 18651596674: 0.04%
  • Covered Lines: 38941
  • Relevant Lines: 71270

💛 - Coveralls

@pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from 4fdd7a4 to 9250e66 on October 20, 2025 14:51

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
go-controller/pkg/ovn/egressip.go (1)

856-881: Deadlock fix confirmed; consider minor optimization.

The removal of the outer nodeZoneState lock before calling deletePodEgressIPAssignments correctly addresses the deadlock issue from previous reviews. The current implementation is safe because:

  • podAssignment lock is held by the caller (lines 767-768)
  • deletePodEgressIPAssignments acquires nodeZoneState locks internally (line 1064)
  • Lock ordering is consistent: podAssignment (outer) → nodeZoneState (inner)

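However, the zone-check logic (lines 868-881) could be slightly optimized by calling isPodScheduledinLocalZone before iterating through the status nodes. The pod zone check is cheaper, and because the condition at line 879 (!proceed && !e.isPodScheduledinLocalZone(pod)) returns early only when both checks fail, either check succeeding is enough to proceed: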

 	proceed := false
+	if e.isPodScheduledinLocalZone(pod) {
+		proceed = true
+	}
-	for _, status := range statusAssignments {
+	if !proceed {
+		for _, status := range statusAssignments {
 		e.nodeZoneState.LockKey(status.Node)
 		isLocalZoneEgressNode, loadedEgressNode := e.nodeZoneState.Load(status.Node)
 		if loadedEgressNode && isLocalZoneEgressNode {
 			proceed = true
 			e.nodeZoneState.UnlockKey(status.Node)
 			break
 		}
 		e.nodeZoneState.UnlockKey(status.Node)
+		}
 	}
-	if !proceed && !e.isPodScheduledinLocalZone(pod) {
+	if !proceed {
 		return nil
 	}
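The suggested reordering is an ordinary short-circuit: run the cheap pod-zone check first, and scan the egress nodes only when it fails. In this sketch, `shouldProceed` and its two callbacks are hypothetical stand-ins for isPodScheduledinLocalZone and the nodeZoneState lookups. The counter in the usage shows that no per-node lookups happen when the pod is local.

```go
package main

import "fmt"

// shouldProceed returns true if the pod is in the local zone or any assigned
// egress node is local. The pod check runs first, so the per-node lookups
// (each wrapped in a per-key lock in the current code) are skipped when it succeeds.
func shouldProceed(isPodLocal func() bool, nodes []string, isNodeLocal func(string) bool) bool {
	if isPodLocal() {
		return true
	}
	for _, n := range nodes {
		if isNodeLocal(n) {
			return true
		}
	}
	return false
}

func main() {
	lookups := 0
	local := map[string]bool{"node-2": true}
	isNodeLocal := func(n string) bool {
		lookups++
		return local[n]
	}
	nodes := []string{"node-1", "node-2"}

	ok := shouldProceed(func() bool { return true }, nodes, isNodeLocal)
	fmt.Println(ok, lookups) // prints true 0

	ok = shouldProceed(func() bool { return false }, nodes, isNodeLocal)
	fmt.Println(ok, lookups) // prints true 2
}
```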
go-controller/pkg/ovn/egressip_test.go (2)

11325-11330: Consider checking errors from helper functions.

While ignoring errors with _ is common in test code with known-good inputs, explicitly handling or at least acknowledging potential failures improves test robustness and debuggability.

Consider one of these approaches:

Option 1: Use gomega.Expect to verify the operations succeed:

-				i, n, _ := net.ParseCIDR(podV4IP + "/23")
+				i, n, err := net.ParseCIDR(podV4IP + "/23")
+				gomega.Expect(err).NotTo(gomega.HaveOccurred())
 				n.IP = i

Option 2: Use a helper function like the existing types.ParseCIDR (from vendor/github.com/containernetworking/cni/pkg/types/types.go):

-				i, n, _ := net.ParseCIDR(podV4IP + "/23")
-				n.IP = i
+				n, err := types.ParseCIDR(podV4IP + "/23")
+				gomega.Expect(err).NotTo(gomega.HaveOccurred())

Apply similar changes to lines 11328-11330 and 11358-11360.

Also applies to: 11358-11360


11148-11356: Add comments to clarify the test scenario.

This test simulates a complex race condition scenario (EIP failover concurrent with controller restart). Adding a structured comment block explaining the initial state, the sequence of events, and what's being verified would improve maintainability and help future developers understand the test's purpose.

Consider adding a comment like this at the beginning of the test:

 		ginkgo.It("should update SNAT and LRP nexthops during simultaneous EIP failover and ovnkube-controller restart", func() {
+			// This test reproduces the race condition described in the PR:
+			// Initial state: EIP-1 assigned to node-1, pod1 on node-1, pod2 on node-3
+			// During failover: EIP-1 reassigned to node-2, controller restarts
+			// Race condition: Pod add events processed with stale cache create entries pointing to node-1
+			// Expected result: After reconciliation, all LRPs should point to node-2 (the new EIP node)
+			//                  and stale SNAT/LRP entries referring to node-1 should be removed
 			app.Action = func(*cli.Context) error {
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fdd7a4 and 9250e66.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (10 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-09T12:23:01.462Z
Learnt from: npinaeva
PR: ovn-kubernetes/ovn-kubernetes#5561
File: go-controller/pkg/ovn/egressip.go:3256-3304
Timestamp: 2025-10-09T12:23:01.462Z
Learning: In go-controller/pkg/ovn/egressip.go, EgressIP reroute policies (priority types.EgressIPReroutePriority) are created via createReroutePolicyOps() using getEgressIPLRPReRouteDbIDs(..., controller = e.controllerName). Therefore, predicates updating these LRPs should match ExternalIDs[OwnerControllerKey] against e.controllerName (not a network-scoped controller name).

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (6)
go-controller/pkg/config/config.go (2)
  • Gateway (170-187)
  • OVNKubernetesFeature (158-161)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (43-43)
  • GWRouterPrefix (48-48)
  • GWRouterToJoinSwitchPrefix (55-55)
  • EXTSwitchToGWRouterPrefix (58-58)
  • GWRouterToExtSwitchPrefix (59-59)
  • ExternalSwitchPrefix (47-47)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (120-120)
  • EgressIPNodeConnectionMark (144-144)
  • EgressIPReroutePriority (122-122)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (1)
  • LogicalRouterPolicy (22-34)
go-controller/pkg/nbdb/nat.go (1)
  • NAT (21-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build-PR
  • GitHub Check: Lint
🔇 Additional comments (3)
go-controller/pkg/ovn/egressip.go (2)

2282-2292: LGTM: Stale EIP detection logic is correct.

The hasStaleEIPStatus method correctly identifies when the same EgressIP is reassigned to a different node by comparing EgressIP fields while checking for differing Node fields. This is the core logic needed to detect and clean up stale SNAT/LRP entries during failover scenarios.


1615-1619: Verify seeded statuses are applied exactly once per reconciliation cycle.

The seeding logic populates podState.egressStatuses with egressStatusStateSynced for all EIP assignments. Line 827 treats these as needing reapplication (value == egressStatusStateSynced). Confirm that after the first application, these entries are updated to "" to prevent redundant reapplication on subsequent reconciliation passes within the same controller session.

Looking at lines 894 and 902, the code sets podState.egressStatuses.statusMap[status] = "" after successful application, which confirms that the marker is cleared. This should prevent redundant work.

go-controller/pkg/ovn/egressip_test.go (1)

7976-7976: LGTM! Status tracking updates are consistent.

The updates to use HaveLen checks and expect egressStatusStateSynced instead of empty strings correctly reflect the new per-EIP synchronization mechanism introduced in this PR.

Also applies to: 7989-7990, 8016-8017, 8082-8082

Comment thread go-controller/pkg/ovn/egressip_test.go
@kyrtapz

kyrtapz commented Oct 21, 2025

Contributor

I am confused by the reproducer:

EIP-1 assigned to node-1. node-1 and node-2 rebooted at the same time.
This caused the EIP failover and the ovnkube-controller container restart to happen almost at the same time.

  1. EIP-1 is reassigned to node-2 by cluster manager.
  2. EIP controller synchronizes existing EIP objects with older EIP status, so it cleans up SNATs and LRPs referring to node-1 due to stale pod IP addresses.
  3. At the same time, pod1 and pod2 add events are triggered, but the EIP controller's watch factory sees the older EIP status from the informer cache, so SNATs and LRPs are created referring to node-1 for the new pod IPs.
  4. EIP-1 add event is triggered with new EIP status, EIP controller adds new SNAT entries and updates LRP nexthop for node-2.
  5. Stale SNATs and LRPs with stale nexthops remain for node-1.

Why is step 4 an EIP add? Shouldn't it be an update, in which case we are supposed to clean up the old status from the configured pods and not leave any stale entries behind?

@pperiyasamy

Contributor Author

/retest-failed

@kyrtapz

kyrtapz commented Oct 21, 2025

Contributor

Why is step 4 an EIP add? Shouldn't it be an update, in which case we are supposed to clean up the old status from the configured pods and not leave any stale entries behind?

@kyrtapz This happens when the ovnkube-node pod restarts (due to the node reboot), so an EIP add is received. Isn't that expected?

So you are saying that we ran the sync with the EIP using the old object but then got the ADD with the new one? That sounds weird, as I would expect the synthetic ADDs and the initial syncs to use the same objects to avoid exactly this type of issue.

@pperiyasamy

Contributor Author

So you are saying that we ran the sync with the EIP using the old object but then got the ADD with the new one?

That's right.

I would expect the synthetic ADDs and the initial syncs to use the same objects to avoid exactly this type of issue.

Okay, but that hasn’t always been the case. The initial sync sometimes used an outdated EIP object, possibly because the watch factory informer cache wasn’t fully up to date when syncing existing objects. This issue occurs intermittently when an EIP failover coincides with an ovnkube-controller restart.

@pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from 9250e66 to b7d87e2 on October 22, 2025 17:08
@pperiyasamy

Contributor Author

@kyrtapz Thanks for the offline discussion. You correctly pointed out that the EIP initial sync and add both happen with the new EIP status; only the pod add event saw the older EIP status from the informer cache, which led to stale SNATs/LRP nexthops. So I adjusted the PR to cover only this scenario. PTAL.

kyrtapz previously approved these changes Oct 23, 2025

@kyrtapz kyrtapz left a comment

Contributor

Thanks @pperiyasamy! This version looks good to me.

It was tricky to understand the exact order of events that this PR addresses, so here it is for others who might be confused:
We are addressing a very specific startup race where the stale-entry cleanup runs before the EgressIP status is updated, and the EgressIP controller starts afterwards. As a result, we miss one update and configure only the new status without removing the old one.

With this:

if config.OVNKubernetesFeature.EnableEgressIP {
	// This is probably the best starting order for all egress IP handlers.
	// WatchEgressIPNamespaces and WatchEgressIPPods only use the informer
	// cache to retrieve the egress IPs when determining if namespace/pods
	// match. It is thus better if we initialize them first and allow
	// WatchEgressNodes / WatchEgressIP to initialize after. Those handlers
	// might change the assignments of the existing objects. If we do the
	// inverse and start WatchEgressIPNamespaces / WatchEgressIPPod last, we
	// risk performing a bunch of modifications on the EgressIP objects when
	// we restart and then have these handlers act on stale data when they
	// sync.
	if err := WithSyncDurationMetric("egress ip namespace", oc.WatchEgressIPNamespaces); err != nil {
		return err
	}
	if err := WithSyncDurationMetric("egress ip pod", oc.WatchEgressIPPods); err != nil {
		return err
	}
	if err := WithSyncDurationMetric("egress node", oc.WatchEgressNodes); err != nil {
		return err
	}
	if err := WithSyncDurationMetric("egress ip", oc.WatchEgressIP); err != nil {
		return err
	}
}

The order is:

  1. syncEgressIPs is called from WatchEgressIPNamespaces with the old EgressIP status; no stale entries are found.
  2. The EgressIP status gets updated.
  3. WatchEgressIPNamespaces/WatchEgressIPPods/WatchEgressIP call the synthetic ADD using the new status; the old status is never cleaned up.
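Replaying the three steps above in a toy model shows why the synthetic ADD alone cannot clean up. It only sees the new status, so the old node's config survives unless the earlier assignment was recorded during the sync. Everything here (`controller`, `onPodAdd`, `run`) is a hypothetical simplification, not ovn-kubernetes code.

```go
package main

import "fmt"

type statusItem struct {
	Node     string
	EgressIP string
}

// controller models the NB config programmed for one pod (keyed by node) and
// the per-pod status cache used to find stale entries.
type controller struct {
	programmed map[string]bool     // nodes with SNAT/LRP config for the pod
	cache      map[statusItem]bool // statuses the controller believes are applied
}

// onPodAdd removes any cached entry for the same egress IP on a different node,
// then programs the given status.
func (c *controller) onPodAdd(st statusItem) {
	for old := range c.cache {
		if old.EgressIP == st.EgressIP && old.Node != st.Node {
			delete(c.programmed, old.Node)
			delete(c.cache, old)
		}
	}
	c.programmed[st.Node] = true
	c.cache[st] = true
}

func run(seedAtSync bool) map[string]bool {
	// Before the restart, node-1 config already exists in the NB DB.
	c := &controller{programmed: map[string]bool{"node-1": true}, cache: map[statusItem]bool{}}
	oldStatus := statusItem{Node: "node-1", EgressIP: "192.168.126.101"}
	newStatus := statusItem{Node: "node-2", EgressIP: "192.168.126.101"}

	// 1. Initial sync runs with the old status (optionally recording it).
	if seedAtSync {
		c.cache[oldStatus] = true
	}
	// 2. The EgressIP status is updated to node-2.
	// 3. The synthetic ADD sees only the new status.
	c.onPodAdd(newStatus)
	return c.programmed
}

func main() {
	fmt.Println(run(false)) // prints map[node-1:true node-2:true]
	fmt.Println(run(true))  // prints map[node-2:true]
}
```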

Comment thread go-controller/pkg/ovn/egressip.go
Comment thread go-controller/pkg/ovn/egressip.go Outdated
@kyrtapz

kyrtapz commented Oct 23, 2025

Contributor

After talking to @pperiyasamy, I think there might still be a possibility of a race:

  1. syncEgressIPs is called from WatchEgressIPNamespaces with the old EgressIP status.
  2. The EgressIP status gets updated.
  3. The synthetic EgressIPNamespace ADD is called and populates the cache with the new value, so there is no way to identify the stale one.

Maybe we need to populate the eipStatus in syncEgressIPs.

@pperiyasamy

Contributor Author

Yes @kyrtapz, I restored the syncEgressIPs changes. PTAL.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip.go (1)

859-866: Consider logging stale status deletion as info level for better visibility.

The deletion of stale EIP status is logged at V(2) level, which may not be visible in production. Since this addresses the race condition mentioned in the PR, consider using klog.Infof instead of klog.V(2).Infof to ensure these cleanup operations are visible in logs for debugging and verification.

-		klog.V(2).Infof("Deleting stale pod egress IP status: %v for EgressIP: %s and pod: %s/%s/%v", staleStatus, name, pod.Namespace, pod.Name, podIPNets)
+		klog.Infof("Deleting stale pod egress IP status: %v for EgressIP: %s and pod: %s/%s/%v", staleStatus, name, pod.Namespace, pod.Name, podIPNets)
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9250e66 and 188b23d.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (10 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go-controller/pkg/ovn/egressip_test.go
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build-PR
🔇 Additional comments (2)
go-controller/pkg/ovn/egressip.go (2)

1613-1617: LGTM! Proper seeding of egressStatuses during sync.

This code correctly populates the egressStatuses cache with assigned node information from the EIP sync, marking entries with egressStatusStateSynced. This enables subsequent event handlers to detect and clean up stale SNAT/LRP entries when EIP failover occurs during controller restart.


867-884: Verify locking pattern doesn't introduce performance issues at scale.

The new local zone eligibility check acquires and releases nodeZoneState locks for each status in statusAssignments. While the locks are properly ordered (acquired and released one at a time), this could introduce latency in deployments with many EIP status assignments.

Consider monitoring this in scale tests. If performance issues are observed, an alternative approach could pre-gather local zone node information before the loop to avoid repeated locking.
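One possible reading of the "pre-gather" alternative is to take the lock once, collect which of the status nodes are local, and then decide from that snapshot. This hypothetical sketch uses a single sync.RWMutex-guarded map as a stand-in for nodeZoneState, whereas the real syncmap locks per key. A snapshot can also go stale if zones change while it is in use.

```go
package main

import (
	"fmt"
	"sync"
)

// zoneState is a simplified stand-in for a concurrent node -> isLocalZone map.
type zoneState struct {
	mu    sync.RWMutex
	local map[string]bool
}

// localNodes returns the subset of nodes that are in the local zone, taking the
// read lock once for the whole batch instead of once per node.
func (z *zoneState) localNodes(nodes []string) map[string]bool {
	z.mu.RLock()
	defer z.mu.RUnlock()
	out := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		if z.local[n] {
			out[n] = true
		}
	}
	return out
}

func main() {
	z := &zoneState{local: map[string]bool{"node-1": false, "node-2": true}}
	snapshot := z.localNodes([]string{"node-1", "node-2", "node-3"})
	fmt.Println(snapshot) // prints map[node-2:true]
}
```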

kyrtapz
kyrtapz previously approved these changes Oct 24, 2025

@kyrtapz kyrtapz left a comment

Contributor

LGTM

Left some non-blocking comments.

Comment thread go-controller/pkg/ovn/egressip.go Outdated
Comment thread go-controller/pkg/ovn/egressip.go Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
go-controller/pkg/ovn/egressip.go (1)

792-866: Well-structured stale entry cleanup with acceptable tradeoffs.

The refactored flow correctly separates remaining assignments from stale ones, deletes stale entries before adding new ones (avoiding conflicts), and maintains proper cache consistency. The warning-and-continue approach for deletion failures (line 863) is a reasonable tradeoff—sync functions will eventually catch any orphaned database entries, and avoiding perpetual retry failures is valuable.

One minor consideration: if stale deletions consistently fail for a specific pod, the cache entry is cleared (line 865) but the database entry persists until the next sync. This window is acceptable given the reconciliation mechanisms in place, but be aware that brief periods of stale SNATs/LRPs may occur.

If you want to be more defensive, you could consider:

  • Tracking failed stale deletions separately (e.g., a failedDeletions set) and retrying them before adding new entries
  • Adding a metric to monitor stale deletion failures

However, this adds complexity and may not be necessary given the sync functions already provide eventual consistency.

go-controller/pkg/ovn/egressip_test.go (1)

11148-11501: Consider helper functions to improve test maintainability.

This test is comprehensive (350+ lines) but quite lengthy. The setup includes repetitive patterns for creating routers, ports, and switches. Consider extracting helper functions for common object creation patterns to improve readability and maintainability.

For example, helper functions for:

  • Creating gateway routers with standard options
  • Creating logical router ports with transit switch IPs
  • Creating external switch ports with router options

This is optional and can be deferred to future refactoring.
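The suggested helpers might look like the sketch below. It uses hypothetical, simplified mirrors of the NB DB rows because the real `nbdb.LogicalRouter` and `nbdb.LogicalRouterPort` types have more fields. The `"GR_"` prefix matches names such as GR_node1 in this test. The `-UUID` naming, the port name, and the address are illustrative only.

```go
package main

import "fmt"

// Hypothetical, simplified mirrors of the rows the test builds.
type logicalRouter struct {
	UUID    string
	Name    string
	Ports   []string
	Options map[string]string
}

type logicalRouterPort struct {
	UUID     string
	Name     string
	Networks []string
}

// gwRouterPrefix mirrors the "GR_" prefix seen in names like GR_node1.
const gwRouterPrefix = "GR_"

// newGatewayRouter is a hypothetical helper that builds a node's gateway
// router with the given options and port UUIDs, so test cases share one constructor.
func newGatewayRouter(node string, options map[string]string, portUUIDs ...string) *logicalRouter {
	name := gwRouterPrefix + node
	return &logicalRouter{UUID: name + "-UUID", Name: name, Ports: portUUIDs, Options: options}
}

// newRouterPort is a hypothetical helper for a router port with one network (CIDR).
func newRouterPort(name, cidr string) *logicalRouterPort {
	return &logicalRouterPort{UUID: name + "-UUID", Name: name, Networks: []string{cidr}}
}

func main() {
	lrp := newRouterPort("example-lrp-node1", "100.88.0.2/16")
	gr := newGatewayRouter("node1", nil, lrp.UUID)
	fmt.Println(gr.Name, gr.Ports)
}
```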

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 188b23d and 73e7057.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (9 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".
📚 Learning: 2025-10-20T15:07:49.533Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5606
File: go-controller/pkg/ovn/egressip_test.go:11388-11395
Timestamp: 2025-10-20T15:07:49.533Z
Learning: In interconnect (IC) multi-zone scenarios in ovn-kubernetes, EgressIP reroute LRPs (priority types.EgressIPReroutePriority) and related SNATs for pods scheduled on remote-zone nodes are programmed only in that remote zone’s OVN NB database. The global/local zone NB DB should not expect those remote LRPs/SNATs. Applies to go-controller/pkg/ovn/egressip_test.go tests that mark nodes with "global" vs "remote".

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-23T14:10:26.595Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5493
File: go-controller/pkg/ovn/egressip_test.go:13927-13945
Timestamp: 2025-10-23T14:10:26.595Z
Learning: In ovn-kubernetes/go-controller/pkg/ovn/egressip_test.go unit tests (e.g., the "Sync/remove invalid next hop from LRP" cases), it is acceptable to use the same mask value for both IPv4 and IPv6 in annotations/fixtures; do not require family-correct masks (e.g., /64 for v6) in these tests.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-09-03T09:38:27.723Z
Learnt from: pperiyasamy
PR: ovn-kubernetes/ovn-kubernetes#5555
File: go-controller/pkg/ovn/egressip_test.go:11335-11351
Timestamp: 2025-09-03T09:38:27.723Z
Learning: In ovn-kubernetes Go tests (e.g., go-controller/pkg/ovn/egressip_test.go), any goroutine that uses Gomega assertions should call defer ginkgo.GinkgoRecover() at the top so assertion panics are captured by Ginkgo.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-10-09T12:23:01.462Z
Learnt from: npinaeva
PR: ovn-kubernetes/ovn-kubernetes#5561
File: go-controller/pkg/ovn/egressip.go:3256-3304
Timestamp: 2025-10-09T12:23:01.462Z
Learning: In go-controller/pkg/ovn/egressip.go, EgressIP reroute policies (priority types.EgressIPReroutePriority) are created via createReroutePolicyOps() using getEgressIPLRPReRouteDbIDs(..., controller = e.controllerName). Therefore, predicates updating these LRPs should match ExternalIDs[OwnerControllerKey] against e.controllerName (not a network-scoped controller name).

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
📚 Learning: 2025-08-08T10:03:01.147Z
Learnt from: ricky-rav
PR: ovn-kubernetes/ovn-kubernetes#5387
File: test/e2e/route_advertisements.go:677-678
Timestamp: 2025-08-08T10:03:01.147Z
Learning: In ovn-kubernetes test/e2e/route_advertisements.go (Go, e2e tests), maintainers (per ricky-rav on PR #5387) prefer not to refactor existing variable reuse (e.g., reusing `pod`/`svc` for multiple pods/services) or add node-pinning in unrelated PRs. Suggestions about such refactors should be deferred to a follow-up issue rather than requested in the current feature PR.

Applied to files:

  • go-controller/pkg/ovn/egressip_test.go
🧬 Code graph analysis (1)
go-controller/pkg/ovn/egressip_test.go (9)
go-controller/pkg/config/config.go (2)
  • Gateway (170-187)
  • OVNKubernetesFeature (158-161)
go-controller/pkg/testing/libovsdb/libovsdb.go (2)
  • TestSetup (41-50)
  • TestData (52-52)
go-controller/pkg/types/const.go (10)
  • OVNClusterRouter (39-39)
  • GWRouterPrefix (44-44)
  • GWRouterToJoinSwitchPrefix (49-49)
  • EXTSwitchToGWRouterPrefix (50-50)
  • GWRouterToExtSwitchPrefix (51-51)
  • ExternalSwitchPrefix (43-43)
  • DefaultNetworkName (7-7)
  • DefaultNoRereoutePriority (115-115)
  • EgressIPNodeConnectionMark (139-139)
  • EgressIPReroutePriority (117-117)
go-controller/pkg/nbdb/logical_router_port.go (1)
  • LogicalRouterPort (11-26)
go-controller/pkg/nbdb/logical_switch_port.go (1)
  • LogicalSwitchPort (11-30)
go-controller/vendor/github.com/containernetworking/cni/pkg/types/types.go (2)
  • ParseCIDR (30-38)
  • IPNet (26-26)
go-controller/pkg/nbdb/logical_router_policy.go (3)
  • LogicalRouterPolicy (22-34)
  • LogicalRouterPolicyActionAllow (15-15)
  • LogicalRouterPolicyActionReroute (17-17)
go-controller/pkg/ovn/egressip.go (1)
  • IPFamilyValueV4 (62-62)
go-controller/pkg/nbdb/nat.go (2)
  • NAT (21-36)
  • NATTypeSNAT (16-16)
🔇 Additional comments (9)
go-controller/pkg/ovn/egressip.go (5)

1174-1175: LGTM: EgressIP-to-node assignment tracking properly implemented.

The new egressIPToAssignedNodes cache field correctly tracks the mapping of EgressIP name → EgressIP IP → assigned node. The nested map structure is populated during cache generation and used during sync to seed pending status markers, enabling detection of stale assignments after failover.

Also applies to: 1975-1999
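A small, hypothetical builder illustrates the nested cache shape (EgressIP name → EgressIP IP → assigned node). `assignment` and `buildAssignedNodes` are not from the PR. The input records are assumed to describe assignments observed while generating the cache.

```go
package main

import "fmt"

// assignment is a hypothetical record of one EgressIP address found
// assigned to a node while building the cache.
type assignment struct {
	EIPName string
	EIP     string
	Node    string
}

// buildAssignedNodes builds EgressIP name -> EgressIP IP -> node.
// A later record for the same name and IP overwrites an earlier one.
func buildAssignedNodes(records []assignment) map[string]map[string]string {
	out := make(map[string]map[string]string)
	for _, r := range records {
		byIP, ok := out[r.EIPName]
		if !ok {
			byIP = make(map[string]string)
			out[r.EIPName] = byIP
		}
		byIP[r.EIP] = r.Node
	}
	return out
}

func main() {
	m := buildAssignedNodes([]assignment{
		{EIPName: "eip-1", EIP: "192.0.2.10", Node: "node-1"},
		{EIPName: "eip-1", EIP: "2001:db8::10", Node: "node-1"},
	})
	fmt.Println(m["eip-1"]["192.0.2.10"])
}
```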


2256-2258: Clear documentation of pending state semantics.

The egressStatusStatePending constant and its usage in the statusMap are well-documented. The distinction between "" (applied/reconciled) and "pending" (requires reconciliation) is clear and will help future maintainers understand the sync-and-reconcile pattern.

Also applies to: 2263-2268


2279-2292: Stale status detection logic is sound.

The hasStaleEIPStatus method correctly identifies scenarios where the same EgressIP address has been reassigned to a different node (failover case). Returning the first match is appropriate since multiple stale entries for the same EIP would indicate a data inconsistency that should be cleaned up anyway.
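The hypothetical re-creation below shows the comparison the stale check performs. The signature and the `eipStatus` type are assumptions, not the PR's actual method. Go randomizes map iteration order, so with several stale entries "first match" really means an arbitrary matching entry.

```go
package main

import "fmt"

// eipStatus is a hypothetical (EgressIP address, node) pair.
type eipStatus struct {
	EgressIP string
	Node     string
}

// hasStaleEIPStatus reports a cached entry for the same EgressIP address
// that is assigned to a different node, i.e. a leftover from before failover.
func hasStaleEIPStatus(cached map[eipStatus]string, current eipStatus) (eipStatus, bool) {
	for s := range cached {
		if s.EgressIP == current.EgressIP && s.Node != current.Node {
			return s, true
		}
	}
	return eipStatus{}, false
}

func main() {
	cached := map[eipStatus]string{{EgressIP: "192.0.2.10", Node: "node-1"}: "pending"}
	stale, found := hasStaleEIPStatus(cached, eipStatus{EgressIP: "192.0.2.10", Node: "node-2"})
	fmt.Println(stale, found)
}
```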


867-884: Zone locality optimization is correctly implemented.

The proceed check efficiently short-circuits work when neither the egress nodes nor the pod are relevant to this zone. The logic correctly implements: proceed if (any egress node is local) OR (pod is local to zone). The early break (line 879) once a local node is found is a good optimization.

Note that this is a coarse-grained check—individual assignment filtering happens later in addPodEgressIPAssignment based on fine-grained node/pod locality, which is the right approach.
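As a rough sketch, the coarse predicate reduces to the function below. `shouldProceed` and the `isLocal` callback are hypothetical names; the real check reads nodeZoneState under a lock instead of calling a function value.

```go
package main

import "fmt"

// shouldProceed handles the pod if any node in the EgressIP status is local
// to this zone, or if the pod itself runs on a local node.
func shouldProceed(statusNodes []string, podNode string, isLocal func(string) bool) bool {
	for _, n := range statusNodes {
		if isLocal(n) {
			return true // stop as soon as one local egress node is found
		}
	}
	return isLocal(podNode)
}

func main() {
	local := map[string]bool{"node-1": true}
	isLocal := func(n string) bool { return local[n] }
	fmt.Println(shouldProceed([]string{"node-2"}, "node-3", isLocal))
}
```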


1610-1616: Pending status population enables proper reconciliation.

Seeding the podState.egressStatuses with egressStatusStatePending during sync is the key mechanism that allows subsequent pod add events to detect entries that were populated during sync and reconcile them against current state. This correctly addresses the race condition described in the PR objectives where pod events use stale informer cache.

go-controller/pkg/ovn/egressip_test.go (4)

7976-7976: LGTM: Expectation updated to reflect pending state tracking.

The updated HaveLen(2) assertion correctly reflects that the statusMap now tracks pending states for both EgressIP status items.


7989-7990: LGTM: Expectations updated for pending state semantics.

The changes correctly update the test expectations to use egressStatusStatePending instead of empty strings, aligning with the new egress status tracking behavior in the podAssignment cache.

Also applies to: 8016-8017, 8082-8082


11420-11429: Clarify the SNAT entry on GR_node1 after failover.

The expected database state includes a SNAT entry on GR_node1 (lines 11420-11429) that translates podV4IP to node1IPv4. This appears after the EgressIP has failed over from node1 to node2.

  • If this SNAT is stale (left over from before the failover), shouldn't the fix remove it?
  • If this SNAT serves a different purpose (e.g., pod default SNAT when DisableSNATMultipleGWs=true), please add a comment explaining why it remains after failover.

The PR description states the fix "leverages egressStatuses stored in the podAssignment cache to reconcile pod assignments and remove stale SNAT and LRP entries created by the race." Can you confirm whether this SNAT entry is expected to remain?


11345-11356: Verify test adequately simulates the race condition.

The PR objectives describe a race where "pod add events were processed using an older EIP status from the informer cache, which caused stale SNATs/LRP nexthops." However, in this test:

  1. Pods are created and added to the logical port cache before the simulated restart (lines 11320-11330)
  2. The simulated restart only calls reconcileEgressIP with the updated EIP status (line 11355)

Does this test adequately reproduce the race where pod add events arrive during/after controller restart but use stale EIP status from the cache? Consider adding a comment explaining how this test setup simulates the described race condition, or adjust the test to more explicitly demonstrate concurrent pod events using stale cache state.

@pperiyasamy pperiyasamy requested a review from jcaamano October 27, 2025 10:43
Scenario:
- Nodes: node-1, node-2, node-3
- Egress IPs: EIP-1
- Pods: pod1 on node-1, pod2 on node-3 (pods are created via deployment replicas)
- Egress-assignable nodes: node-1, node-2
- EIP-1 assigned to node-1

During a simultaneous reboot of node-1 and node-2, EIP-1 failed over to node-2 and
ovnkube-controller restarted at nearly the same time:

1) EIP-1 was reassigned to node-2 by the cluster manager.
2) EgressIP sync ran for EIP-1 with the stale status. It still cleaned up the
   SNATs/LRPs referring to node-1, but only because their pod IPs were outdated
   (the pods were recreated after the node reboots).
3) pod1/pod2 Add events arrived while the informer cache still had the
   old EIP status, so new SNATs/LRPs were created pointing to node-1.
4) The EIP-1 Add event arrived with the new status; entries for node-2
   were added/updated.
5) Result: stale SNATs and LRPs with stale nexthops for node-1 remained.

Fix:
- Populate pod EIP status during EgressIP sync so podAssignment has
  accurate egressStatuses.
- Reconcile stale assignments using podAssignment (egressStatuses) when
  the informer cache is not up to date, ensuring SNAT/LRP for the
  previously assigned node are corrected.
- Remove stale EIP SNAT entries for remote-zone pods accordingly.
- Add coverage for simultaneous EIP failover and controller restart.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
@pperiyasamy pperiyasamy force-pushed the eip_failover_ovnkubenode_restart branch from f1275b0 to 86c6930 October 27, 2025 12:37

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
go-controller/pkg/ovn/egressip_test.go (1)

11149-11502: Consider adding explanatory comments for the interconnect scenario.

The test comprehensively validates the EIP failover race condition fix and correctly handles the interconnect multi-zone scenario. However, the complex setup and expected state could benefit from brief inline comments clarifying:

  1. Why egressPod2's LRP is intentionally absent from the expected DB state (it's on node3 in the remote zone, so its reroute LRP belongs to the remote zone's NB DB).
  2. The purpose of the SNAT entry (lines 11421-11430) with node1IPv4 as the external IP—it appears to be a default pod SNAT rather than an EgressIP SNAT, which would use the EIP address and be programmed in the remote zone DB where node2 resides.

These comments would help future maintainers quickly understand the interconnect architecture being tested.

Example addition near line 11389:

+					// LRP for egressPod reroutes to node2's transit switch IP (new EIP assignment).
+					// Note: egressPod2's LRP is not included here because node3 is in a remote zone;
+					// its LRP is programmed only in that remote zone's NB DB.
 					&nbdb.LogicalRouterPolicy{

And near line 11421:

+					// Default pod SNAT for egressPod on its local node (node1).
+					// The EgressIP SNAT (Pod->EIP) is programmed on node2 in the remote zone DB (not visible here).
 					&nbdb.NAT{
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73e7057 and f1275b0.

📒 Files selected for processing (2)
  • go-controller/pkg/ovn/egressip.go (9 hunks)
  • go-controller/pkg/ovn/egressip_test.go (5 hunks)
🔇 Additional comments (1)
go-controller/pkg/ovn/egressip_test.go (1)

7977-7977: LGTM: Consistent pending state tracking.

The updated assertions correctly validate the new behavior where egressStatuses.statusMap entries are explicitly tracked with egressStatusStatePending values instead of empty strings during cache operations.

Also applies to: 7990-7991, 8017-8018, 8083-8083

@pperiyasamy

Contributor Author

Saw a flake in an EgressFirewall E2E test that is unrelated to this PR's changes; created issue #5693.

@jcaamano jcaamano merged commit 1667a51 into ovn-kubernetes:master Oct 28, 2025
96 of 97 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in OVN Kubernetes Project Oct 28, 2025
@openshift-merge-robot


Fix included in accepted release 4.21.0-0.nightly-2025-11-03-191704


Labels

area/unit-testing Issues related to adding/updating unit tests feature/egress-ip Issues related to EgressIP feature

5 participants