DNM/TEST: Rebased d/s merge 02 12 2026#3001

Closed
jluhrsen wants to merge 66 commits into openshift:master from jluhrsen:rebased-d/s-merge-02-12-2026

Conversation

@jluhrsen
Contributor

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

jechen0648 and others added 30 commits February 24, 2026 18:08
Signed-off-by: Jean Chen <jechen@redhat.com>
Remove the temporary migration code that was added in 2023 to support
the transition to OVN Interconnect (IC) architecture. This HACK code
tracked whether remote zone nodes had completed migration using the
"k8s.ovn.org/remote-zone-migrated" annotation.

This code is no longer needed.

Changes:
- Remove OvnNodeMigratedZoneName constant and helper functions
  (SetNodeZoneMigrated, HasNodeMigratedZone, NodeMigratedZoneAnnotationChanged)
- Remove migrated field from nodeInfo struct in node_tracker.go
- Simplify isLocalZoneNode() in base_network_controller.go and egressip.go
- Remove HACK helper functions (checkOVNSBNodeLRSR, fetchLBNames, lbExists,
  portExists) and migration sync flow from default_node_network_controller.go
- Remove remote-zone-migrated annotation from webhook allowed annotations
- Update tests to remove references to the migration annotation

Assisted by Claude Opus 4.5 <noreply@anthropic.com>

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
The layer2 UDN cleanup tests for IC clusters were failing because of a
zone mismatch between the controller and the test node:
- Controller zone: read from NBGlobal.Name ("global")
- Node zone: set via annotation ("test" when IC enabled)

This mismatch was previously masked in two spots:

1. The HACK in isLocalZoneNode() (removed by commit 7d408c1):
   When the controller's zone was "global" (the default), the HACK
   bypassed the zone comparison entirely and instead checked whether
   the node had a migration annotation. Since the test node had no
   migration annotation, it was treated as local despite the zone
   mismatch.

2. Unconditional gateway cleanup in deleteNodeEvent (changed by
   commit 8725a93 to only cleanup nodes tracked in localZoneNodes)

With both items above removed/changed, the test correctly fails because
the node is treated as remote (zones don't match), so it's not added to
localZoneNodes, and cleanup is skipped.

Fix the test by:
- using setupConfig() to set config.Default.Zone to testICZone when IC
  is enabled
- setting NBGlobal.Name to config.Default.Zone (which setupConfig()
  already configured correctly)

This ensures the controller and node are in the same zone, so the node
is correctly treated as local and its gateway entities are cleaned up.

🤖 Assisted by [Claude Code](https://claude.com/claude-code)

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
When an EndpointSlice for a UDP NodePort or LoadBalancer-type service
is updated, stale conntrack entries for removed endpoints must be
flushed. The existing logic failed to do this correctly when the backend
pod was on a different node. This patch fixes the issue by flushing
conntrack entries filtered by nodePort when the node is not hosting the
backend pod.

When the backend pod is on the same node as the service, this issue
does not occur, since deletePodConntrack removes all of the old pod's
entries from the node when the pod is deleted.
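The flush decision boils down to a set difference over endpoint addresses. A minimal Go sketch of that computation (the function and names are illustrative, not the actual ovn-kubernetes helpers):

```go
package main

import "fmt"

// staleEndpoints returns the addresses present in the old EndpointSlice view
// but absent from the updated one; conntrack entries for these endpoints are
// stale and must be flushed. When the node does not host the backend pod,
// the flush additionally has to filter by the service nodePort, because the
// per-pod deletePodConntrack cleanup never runs on that node.
func staleEndpoints(old, updated []string) []string {
	current := make(map[string]bool, len(updated))
	for _, ep := range updated {
		current[ep] = true
	}
	var stale []string
	for _, ep := range old {
		if !current[ep] {
			stale = append(stale, ep)
		}
	}
	return stale
}

func main() {
	old := []string{"10.244.1.5", "10.244.2.7"}
	cur := []string{"10.244.2.7", "10.244.3.9"}
	fmt.Println(staleEndpoints(old, cur)) // [10.244.1.5]
}
```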

Signed-off-by: Peng Liu <pliu@redhat.com>
The cluster should preserve UDP traffic when the server pod cycles for a
NodePort service accessed via a different node.

Signed-off-by: Peng Liu <pliu@redhat.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Even though kind-helm.sh was building ovn-kubernetes, it was pointing to
an upstream image and never using the built image unless overridden.
Align with kind.sh, which uses the built image by default. Move the
image functions from kind.sh to kind-common.sh so a single set of
functions is used by both.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Would have modified an existing lane, but kind-helm doesn't support IPv6
yet. Will consolidate later.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Fix spelling error in function name.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
A lot of variables were duplicated between the scripts. Move them to
kind-common.sh.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
The get_image/tag methods take an argument, but never actually pass one
in their usage. They are only used in one place and amount to a single
operation, so just remove these unnecessary methods.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Cluster manager RBAC was missing the permission to get FRR
Configurations.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
kind-helm had its own version; let's just use the one from kind.sh.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Multihoming was already skipped entirely for IPv6. Skip only the IPv6
and dual-stack tests on IPv4 runs.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Move the labeling and taint removal into a common function used by kind
and kind-helm. Ensure the HA labeling is only done when OVN_HA is true.
Taint removal for scheduling is checked regardless.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Extend the SpecGetter interface with GetTransport() and
GetEVPNConfiguration() methods to access EVPN fields from
ClusterUserDefinedNetwork specs.

Add renderEVPNConfig() to translate EVPN configuration from
the CUDN API to the CNI NetConf format.
Signed-off-by: Matteo Dallaglio <mdallagl@redhat.com>
Implement VID (VLAN ID) allocation for EVPN networks to enable the
Linux bridge to map traffic to the correct VNI on the Single VXLAN
Device (SVD) architecture.

Changes:
- Add vidAllocator to UDN controller for cluster-wide VID allocation
- Allocate one VID per VRF (MAC-VRF and IP-VRF can have different VIDs)
- Add VID field to VRFConfig in CNI types
- Implement VID recovery on controller restart from existing NADs
- Release VIDs when CUDN is deleted
- Expose VID via NetInfo interface (EVPNMACVRFVID, EVPNIPVRFVID)

VIDs are allocated in range 1-4094 and stored in the NAD config.

========================================================================
EVPN VID (VLAN ID) Lifecycle
========================================================================
VIDs are cluster-wide unique identifiers allocated to EVPN networks for
use as VXLAN Network Identifiers in the data plane. Each VRF (MAC-VRF or
IP-VRF) in an EVPN network requires its own VID, so a symmetric IRB
network uses 2 VIDs.

Allocation Keys:
  - MAC-VRF: "{networkName}/macvrf"
  - IP-VRF:  "{networkName}/ipvrf"

Lifecycle:
 1. ALLOCATION: When a CUDN/UDN with EVPN transport is reconciled,
    VIDs are allocated via allocateEVPNVIDsIfNeeded() and stored in
    the NAD's JSON config. The id.Allocator is idempotent - calling
    AllocateID with the same key returns the previously allocated VID.
 2. PERSISTENCE: VIDs are persisted in the NAD spec.config JSON field.
    The in-memory allocator is not persistent across controller restarts.
 3. RECOVERY: On controller startup, recoverEVPNVIDs() re-reserves VIDs
    in the allocator using NetworkManager's cached NetInfo (which has
    already parsed all NADs). This ensures VID consistency after restarts.
 4. RELEASE: When a CUDN/UDN is deleted, releaseVIDForNetwork() frees
    both the MAC-VRF and IP-VRF VIDs (if allocated) back to the pool.

Design Decision - Why VID persisted in NAD spec.config over annotations/labels:
  - Annotations were considered for faster recovery but rejected:
    1. CNI plugin on nodes needs VID in spec.config anyway
    2. Two copies (annotation + NetConf) creates sync/drift risk
    3. Recovery uses NetworkManager's cache (already parsed), so no
       startup parsing overhead to optimize away
  - CUDN status was rejected: users have copied objects with status
    populated causing conflicts; VID isn't user-facing info

Recovery Failure Handling:
  - If VID recovery fails for a CUDN (e.g., NAD not in NetworkManager
    cache, VID conflict), the error is logged and the CUDN is enqueued
    for reconciliation - startup does NOT fail.
  - This prevents DoS: a malicious/corrupted NAD cannot crash the
    entire cluster-manager.
  - During reconciliation, if the NAD exists with a valid VID, the
    allocator's idempotency ensures the same VID is re-allocated.

Thread Safety:
  - The id.Allocator uses per-key locking, making concurrent
    allocations safe.
  - Controllers use Threadiness:1, so reconciliations for the same
    resource are serialized.
========================================================================
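The lifecycle above depends on the allocator being idempotent per key. A simplified Go sketch of such a named allocator (the real id.Allocator differs in locking granularity and API details; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

const maxVID = 4094

// vidAllocator hands out cluster-wide unique VIDs in [1, maxVID],
// keyed by "{networkName}/macvrf" or "{networkName}/ipvrf".
type vidAllocator struct {
	mu    sync.Mutex
	byKey map[string]int
	used  map[int]bool
}

func newVIDAllocator() *vidAllocator {
	return &vidAllocator{byKey: map[string]int{}, used: map[int]bool{}}
}

// AllocateID is idempotent: the same key always gets the same VID back.
func (a *vidAllocator) AllocateID(key string) (int, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if vid, ok := a.byKey[key]; ok {
		return vid, nil
	}
	for vid := 1; vid <= maxVID; vid++ {
		if !a.used[vid] {
			a.byKey[key] = vid
			a.used[vid] = true
			return vid, nil
		}
	}
	return 0, fmt.Errorf("VID pool exhausted")
}

// ReserveID re-reserves a VID persisted in a NAD on restart (recovery path).
func (a *vidAllocator) ReserveID(key string, vid int) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if owner, ok := a.byKey[key]; ok && owner != vid {
		return fmt.Errorf("key %s already bound to VID %d", key, owner)
	}
	if a.used[vid] && a.byKey[key] != vid {
		return fmt.Errorf("VID %d already taken", vid)
	}
	a.byKey[key] = vid
	a.used[vid] = true
	return nil
}

// ReleaseID returns the VID to the pool when the network is deleted.
func (a *vidAllocator) ReleaseID(key string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if vid, ok := a.byKey[key]; ok {
		delete(a.byKey, key)
		delete(a.used, vid)
	}
}

func main() {
	a := newVIDAllocator()
	mac, _ := a.AllocateID("net1/macvrf")
	ip, _ := a.AllocateID("net1/ipvrf")
	again, _ := a.AllocateID("net1/macvrf") // idempotent repeat
	fmt.Println(mac, ip, again)             // symmetric IRB uses two VIDs
}
```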

Signed-off-by: Matteo Dallaglio <mdallagl@redhat.com>
This commit adds a VTEPNotifier and VTEP validation logic for EVPN CUDNs:

- Add VTEPNotifier to watch VTEP CRs and trigger CUDN reconciliation
- Add validateEVPNVTEP to check VTEP existence during CUDN sync
- Report VTEPNotFound status when referenced VTEP doesn't exist
- Add ReconcileVTEP to handle VTEP create/delete events
- Add RBAC permissions for vteps resource
- Add vtepInformer creation in factory when Network Segmentation is enabled

Signed-off-by: Matteo Dallaglio <mdallagl@redhat.com>
Signed-off-by: Matteo Dallaglio <mdallagl@redhat.com>
Add missing RLock to prevent concurrent map access panic when
reading namespaces while AddNADs/DeleteNADs modify the map.

Signed-off-by: Patryk Diak <pdiak@redhat.com>
Generate FRR configuration for networks with EVPN configured:
- Activate EVPN on global router with advertise-all-vni
- MAC-VRF: configure route targets if specified
- IP-VRF: create VRF-to-VNI mappings and per-VRF BGP routers with
  unicast advertisement and route targets
- Create routers for IP-VRF networks if not already selected

Uses raw FRR config until frr-k8s provides typed EVPN API.
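A sketch of what the raw-config generation might look like (the function, ASN, and VRF name are illustrative and not taken from the real code; only the stanza shapes follow standard FRR EVPN syntax):

```go
package main

import "fmt"

// renderEVPNFRRConfig emits FRR stanzas as plain text: EVPN is activated
// on the global BGP router with advertise-all-vni, and the IP-VRF gets a
// VRF-to-VNI mapping. Until frr-k8s provides a typed EVPN API, raw config
// like this is the only way to express these settings.
func renderEVPNFRRConfig(asn int, ipVRF string, vni int) string {
	return fmt.Sprintf(
		"router bgp %d\n"+
			" address-family l2vpn evpn\n"+
			"  advertise-all-vni\n"+
			" exit-address-family\n"+
			"vrf %s\n"+
			" vni %d\n"+
			"exit-vrf\n", asn, ipVRF, vni)
}

func main() {
	fmt.Print(renderEVPNFRRConfig(64512, "red", 1001))
}
```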

Signed-off-by: Patryk Diak <pdiak@redhat.com>
Unify the error messages for better UX.

Signed-off-by: Patryk Diak <pdiak@redhat.com>
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Yun Zhou <yunz@nvidia.com>
pliurh and others added 19 commits February 24, 2026 18:08
Implements no-overlay mode for the default network, using BGP-based
routing instead of overlay encapsulation for pod-to-pod traffic.

Key changes:
- Refactor Cleanup() to be comprehensive and idempotent - it now calls
  CleanupStaleNodes(nil) internally before deleting the transit switch
- In syncNodes: Call Cleanup() to remove all interconnect resources
  when in no-overlay mode (supports overlay->no-overlay migration)
- Split SyncNodes into EnsureTransitSwitch and CleanupStaleNodes
- Import BGP pod subnet routes in no-overlay mode (ignored in overlay)
- Skip interconnect upgrade hack and adjust MTU validation for no-overlay

Signed-off-by: Peng Liu <pliu@redhat.com>
In no-overlay mode, the cluster needs to accept incoming routes to the
pod subnets from BGP neighbors. This change enhances the
RouteAdvertisements controller to automatically populate the 'ToReceive'
field of BGP neighbors with the network's pod subnets when no-overlay
transport is used.

- Add unit tests for the new functionality.

Signed-off-by: Peng Liu <pliu@redhat.com>
This commit adds end-to-end test coverage for no-overlay mode by
introducing new test lanes in the GitHub Actions workflow matrix.

Changes include:
- Add 4 new test lanes covering no-overlay mode with different
  configurations (shard-conformance and dedicated no-overlay target)
- Configure no-overlay tests with extended timeout (190 minutes)
- Add ENABLE_NO_OVERLAY environment variable for test configuration
- Skip tests for features not applicable in no-overlay mode:
  multicast, egress IP, EgressService, and overlay MTU behavior

The test skips are necessary because no-overlay mode uses the
underlying network infrastructure directly without overlay
encapsulation, making certain overlay-specific features and MTU
reduction behavior not applicable.

Signed-off-by: Peng Liu <pliu@redhat.com>
Add preferred_lft 0 when adding test IP addresses to worker nodes,
ensuring they are marked as deprecated. This prevents the test IP
from being selected as the node's primary gateway IP while still
allowing verification of pod-to-host connectivity.

Without the fix, the additional IP may mistakenly be selected as the
node IP and set on the GR, failing the test 'test e2e pod
connectivity to host addresses [It] Should validate connectivity
from a pod to a non-node host address on same node'.

Signed-off-by: Peng Liu <pliu@redhat.com>
When reconciling AdminNetworkPolicy or BaselineAdminNetworkPolicy,
the controller updates the status condition on every successful sync,
even when the status is already in the desired state. At scale (500-1000
nodes), this causes unnecessary API server load spikes since each zone's
controller repeatedly patches the same CRD.

Add a doesStatusNeedAnUpdate() utility function that compares the existing
condition with the new condition. If Status, Reason, and Message are all
unchanged, skip the ApplyStatus call entirely. This optimization applies
to both ANP and BANP status update paths.

Also move the "Patched status" log messages into the functions that
perform the actual API call, so they only log when an update actually
happens (not when skipped due to no-op).
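The no-op check can be sketched as a plain field comparison. The struct below is a stand-in for the real ANP/BANP status condition type; names and values are illustrative:

```go
package main

import "fmt"

// condition mirrors the fields the no-op check compares; the real code
// operates on the ANP/BANP status conditions from the policy API.
type condition struct {
	Type, Status, Reason, Message string
}

// doesStatusNeedAnUpdate reports whether an ApplyStatus call is required.
// If Status, Reason, and Message are all unchanged, the patch is skipped,
// avoiding repeated API-server writes from every zone's controller.
func doesStatusNeedAnUpdate(existing *condition, desired condition) bool {
	if existing == nil {
		return true // no condition recorded yet
	}
	return existing.Status != desired.Status ||
		existing.Reason != desired.Reason ||
		existing.Message != desired.Message
}

func main() {
	cur := &condition{Type: "Ready-In-Zone-z1", Status: "True",
		Reason: "SetupSucceeded", Message: "Setup applied"}
	same := condition{Type: "Ready-In-Zone-z1", Status: "True",
		Reason: "SetupSucceeded", Message: "Setup applied"}
	fmt.Println(doesStatusNeedAnUpdate(cur, same)) // false: skip the patch
}
```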

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Fixes #5900

Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
Assisted-by: opus (claude-opus-4-5-20251101)
- Add CUDN/UDN workload configs
- Fix workload naming
- Add UDN pod workload

Signed-off-by: jtalerico <joe.talerico@gmail.com>
When the hybrid overlay feature is enabled (specifically when hybrid overlay
cluster subnets are configured), the HandleDeleteNode function would return
early after releasing the hybrid overlay subnet. This caused the regular
cluster subnets allocated to the node to never be released, leading to a
subnet leak that eventually exhausts the cluster CIDR pool.

This commit fixes the issue by removing the early return, ensuring that
both the hybrid overlay subnets and the standard node subnets are
properly released upon node deletion.

A new test case TestNodeAllocator_HandleDeleteNode is added to verify
that both types of subnets are correctly released.
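The shape of the fix can be sketched as follows (names simplified and hypothetical; the real HandleDeleteNode works against the hybrid overlay and cluster subnet allocators):

```go
package main

import "fmt"

// releaseNodeSubnets sketches the fixed HandleDeleteNode flow: releasing
// the hybrid overlay subnet no longer short-circuits the function, so the
// node's regular cluster subnets are always released as well.
func releaseNodeSubnets(hybridOverlayEnabled bool) []string {
	var released []string
	if hybridOverlayEnabled {
		released = append(released, "hybrid-overlay-subnet")
		// BUG (before the fix): an early `return released` here leaked
		// the node's regular cluster subnets on every node deletion.
	}
	released = append(released, "cluster-subnet")
	return released
}

func main() {
	fmt.Println(releaseNodeSubnets(true)) // both subnet types released
}
```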

Signed-off-by: Aswin Suryanarayanan <asuryan@redhat.com>
…NS names

Signed-off-by: arkadeepsen <arsen@redhat.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
Signed-off-by: nithyar <nithyar@nvidia.com>
We set options for "requested_chassis" on ports, which take either the
SBDB chassis.hostname or the SBDB chassis.name. The chassis.name is
derived from the node's OVS system-id and is annotated on the node as
the chassis-id annotation. chassis.hostname is set locally by
OVN-Controller when it creates the local SBDB chassis entry at startup.
The hostname for the local node is derived from an external_id in OVS.
In OVN-Kubernetes we add remote chassis (looked up by their chassis-id
annotation) and then set their hostname to their KAPI node name.
The hostname is not used by anything in OVS or OVN; it is a string
value, and not the primary index for the chassis.

The end result is inconsistency, where some objects like NBDB
GatewayChassis rely on chassis-id, while requested_chassis is using
hostname. Using hostname creates further issues where OVS may be running
in DPU and not on the KAPI node itself.

This commit moves everything to use chassis-id. Changes include:

 - Simplify syncChassis to track everything by chassis-id, remove
   checking for hostname.
 - Make requested_chassis usage use chassis-id.
 - Add validation function in libovsdb to make sure no one in the future
   tries to use hostname.
 - Fix unit tests and create consistent fake chassis id from helper
   function.
 - When creating a chassis do not set hostname for local nodes.
   OVN-Controller manages this. Set hostname to be KAPI node name for
   remote nodes.

Some other notes:
1. During chassis update, we do not modify hostname as part of
   libovsdb.
2. When we upgrade, there shouldn't be traffic outage as ports are
   updated for their requested_chassis to be chassis-id, since the
   chassis-id already exists in the SBDB entry.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
With the nad controller tests, syncNAD is called, then the test waits
for an Eventually clause to be satisfied that checks if the expected
number of networks started, stopped, etc. Afterwards, it then checks
these values are consistent. The fake network controller uses a
controller tcm lock to store started and deleted networks on reconcile.

A race exists because the network controller is an asynchronous
reconcile loop, so although the syncNAD calls in the test can execute,
it does not mean the network controller has processed anything. This can
lead to false positives on pass criteria where this kind of event loop
happens:

1. Sync NAD (create network), sync NAD (delete network), expect running
   (0 networks).
2. Before the network controller runs, the Eventually code gets hit and
   takes the tcm lock. Checks that started networks (empty) contains
   expected running networks (empty). Passes the Eventually checks.
3. The create-network event processes now that the tcm lock is released.
   Now tcm has 1 started network, 0 stopped networks.
4. The Eventually loop processes. Started networks (1) contains expected
   running networks (empty). Passes due to contains semantics. The
   stopped network check expects 1, but fails. Test flake.
5. Delete never got a chance to run.

To fix this, synchronize syncNetwork calls when syncNAD happens. It's
the only way to truly ensure the actions are completed by the time we
want to test pass/fail criteria.
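The fix can be illustrated with a toy controller whose reconcile step runs inline with the sync call, so test assertions only run after the work completes (a sketch under that assumption, not the actual nad controller code):

```go
package main

import (
	"fmt"
	"sync"
)

// fakeController records started networks; in the real tests this state
// is guarded by the controller's tcm lock and was previously mutated by
// an asynchronous reconcile loop.
type fakeController struct {
	mu      sync.Mutex
	started []string
}

// syncNetwork is the reconcile step. Calling it synchronously from the
// syncNAD path guarantees the create/delete has fully processed before
// the test checks its pass/fail criteria, closing the race window.
func (c *fakeController) syncNetwork(name string, deleted bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if deleted {
		for i, n := range c.started {
			if n == name {
				c.started = append(c.started[:i], c.started[i+1:]...)
				return
			}
		}
		return
	}
	c.started = append(c.started, name)
}

func main() {
	c := &fakeController{}
	// syncNAD(create) then syncNAD(delete), now executed inline:
	c.syncNetwork("net1", false)
	c.syncNetwork("net1", true)
	fmt.Println(len(c.started)) // 0: the delete is guaranteed to have run
}
```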

Fixes: #5948

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Changes-Include:

- Fix `kind` network interface lookup to work with BusyBox and iproute2:
  - Switch from `ip -br -4/-6 a sh` to `ip -o -f inet|inet6 addr show` parsing.
  - Keep interface validation via `ip link show`.
  - Correct IPv6 log message text (`IPv6` instead of `IPv4`).

- Make flow collector startup deterministic per protocol:
  - Start `goflow` with protocol-specific flags so only the target collector is enabled.
  - Avoid mixed startup log lines from unrelated collectors.

- Improve sflow test robustness:
  - Wait for `br-int` sflow row to appear per `ovnkube-node` pod before tuning.
  - Tune `sampling=1 polling=1` after row exists (best-effort if set fails).

- Fix flow log assertion flake:
  - Search all collector log lines for expected keyword instead of only the last line.

- Fix stale/incorrect error handling:
  - Use the actual `ExecWithOptions` returned error for ovs-vsctl checks.
  - Fix variable scoping bug in sflow tuning block (`setErr/setStderr`).

Not 100% sure this will solve the issue, but I ran this over 300 times
in a row and did not see a failure in my setup.

Fixes: #4221

Signed-off-by: Tim Rozet <trozet@nvidia.com>
@openshift-ci
Contributor

openshift-ci Bot commented Feb 25, 2026

@jluhrsen: This PR was included in a payload test run from openshift/origin#30560
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/36ee9f60-11ef-11f1-982c-974ebc5d3f9e-0

@openshift-ci
Contributor

openshift-ci Bot commented Feb 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
using initial 49 NetSeg tests as our set of "blocking" test cases

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@openshift-ci
Contributor

openshift-ci Bot commented Feb 25, 2026

@jluhrsen: This PR was included in a payload test run from openshift/origin#30560
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a17400f0-1294-11f1-8edb-a41b2e738def-0

@openshift-ci
Contributor

openshift-ci Bot commented Feb 26, 2026

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp fb17f34 link true /test e2e-metal-ipi-ovn-dualstack-bgp
ci/prow/e2e-aws-ovn-edge-zones fb17f34 link true /test e2e-aws-ovn-edge-zones
ci/prow/e2e-metal-ipi-ovn-ipv6 fb17f34 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/qe-perfscale-payload-control-plane-6nodes fb17f34 link true /test qe-perfscale-payload-control-plane-6nodes
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw fb17f34 link true /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw
ci/prow/security fb17f34 link false /test security

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
