
TEST ANNOTATIONS FAILED! NO-JIRA: DownStream Merge [02-25-2026]#3003

Closed
openshift-pr-manager[bot] wants to merge 166 commits into master from d/s-merge-02-25-2026

Conversation

@openshift-pr-manager

Automated merge of upstream/master → master.

danwinship and others added 30 commits January 19, 2026 15:18
Signed-off-by: Dan Winship <danwinship@redhat.com>
Remove the temporary migration code that was added in 2023 to support
the transition to OVN Interconnect (IC) architecture. This HACK code
tracked whether remote zone nodes had completed migration using the
"k8s.ovn.org/remote-zone-migrated" annotation.

This code is no longer needed.

Changes:
- Remove OvnNodeMigratedZoneName constant and helper functions
  (SetNodeZoneMigrated, HasNodeMigratedZone, NodeMigratedZoneAnnotationChanged)
- Remove migrated field from nodeInfo struct in node_tracker.go
- Simplify isLocalZoneNode() in base_network_controller.go and egressip.go
- Remove HACK helper functions (checkOVNSBNodeLRSR, fetchLBNames, lbExists,
  portExists) and migration sync flow from default_node_network_controller.go
- Remove remote-zone-migrated annotation from webhook allowed annotations
- Update tests to remove references to the migration annotation

Assisted by Claude Opus 4.5 <noreply@anthropic.com>

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Ignore whitespace differences.
Sort the output back into the "correct" order.

Signed-off-by: Dan Winship <danwinship@redhat.com>
Signed-off-by: Soli0222 <github@str08.net>
Signed-off-by: Soli0222 <github@str08.net>
Signed-off-by: Soli0222 <github@str08.net>
Signed-off-by: Jean Chen <jechen@redhat.com>
Replace the custom HTTP server in StartMetricsServer with MetricServer.

Signed-off-by: Lei Huang <leih@nvidia.com>
Add PodSecurity compliance to util.go
Remove IC zone migration HACK code
The layer2 UDN cleanup tests for IC clusters were failing because of a
zone mismatch between the controller and the test node:
- Controller zone: read from NBGlobal.Name ("global")
- Node zone: set via annotation ("test" when IC enabled)

This mismatch was previously masked in two spots:

1. The HACK in isLocalZoneNode() (removed by commit 7d408c1):
   When the controller's zone was "global" (the default), the HACK
   bypassed the zone comparison entirely and instead checked whether
   the node had a migration annotation. Since the test node had no
   migration annotation, it was treated as local despite the zone
   mismatch.

2. Unconditional gateway cleanup in deleteNodeEvent (changed by
   commit 8725a93 to only cleanup nodes tracked in localZoneNodes)

With both items above removed/changed, the test correctly fails because
the node is treated as remote (zones don't match), so it's not added to
localZoneNodes, and cleanup is skipped.

Fix the test by:
- using setupConfig() to set config.Default.Zone to testICZone when IC
  is enabled
- setting NBGlobal.Name to config.Default.Zone (which setupConfig()
  already configured correctly)

This ensures the controller and node are in the same zone, so the node
is correctly treated as local and its gateway entities are cleaned up.

🤖 Assisted by [Claude Code](https://claude.com/claude-code)

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Fix IC cluster cleanup tests zone configuration
When an EndpointSlice for a UDP NodePort or LoadBalancer service is
updated, stale conntrack entries for removed endpoints must be
flushed. The existing logic failed to do this correctly when the
backend pod was on a different node. This patch fixes the issue by
flushing conntrack entries filtered by the nodePort when the node is
not hosting the backend pod.

If the backend pod was on the same node as the service, this issue
does not occur, because deletePodConntrack already removes all of the
old pod's entries from the node when the pod is deleted.

Signed-off-by: Peng Liu <pliu@redhat.com>
It should preserve UDP traffic when the server pod cycles for a
NodePort service reached via a different node.

Signed-off-by: Peng Liu <pliu@redhat.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Even though kind-helm.sh was building ovn-kubernetes, it pointed to
an upstream image and never used the locally built image unless
overridden. Align it with kind.sh, which uses the built image by
default. Move the image functions from kind.sh to kind-common so a
single set of functions is used by both.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Would have modified an existing lane, but kind-helm doesn't support IPv6
yet. Will consolidate later.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Fix spelling error in function name.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Many variables were duplicated between the scripts. Move them to
kind-common.sh.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
The get_image/tag methods take an argument but never actually pass
one in their usage. They are used in only one place and amount to a
single operation, so just remove these unnecessary methods.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Cluster manager RBAC was missing this permission to Get FRR
Configurations.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
kind-helm had its own version; let's just use the one from kind.sh.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Multihoming was already being skipped entirely for IPv6. For IPv4,
skip only the IPv6 and dual-stack tests.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Move the labeling and taint removal into a common function used by
kind and kind-helm. Ensure the HA labeling is only done when OVN_HA
is true. Check whether taint removal is needed for scheduling
regardless of the HA setting.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Extend the SpecGetter interface with GetTransport() and
GetEVPNConfiguration() methods to access EVPN fields from
ClusterUserDefinedNetwork specs.

Add renderEVPNConfig() to translate EVPN configuration from
the CUDN API to the CNI NetConf format.

Signed-off-by: Matteo Dallaglio <mdallagl@redhat.com>
Implement VID (VLAN ID) allocation for EVPN networks to enable the
Linux bridge to map traffic to the correct VNI on the Single VXLAN
Device (SVD) architecture.

Changes:
- Add vidAllocator to UDN controller for cluster-wide VID allocation
- Allocate one VID per VRF (MAC-VRF and IP-VRF can have different VIDs)
- Add VID field to VRFConfig in CNI types
- Implement VID recovery on controller restart from existing NADs
- Release VIDs when CUDN is deleted
- Expose VID via NetInfo interface (EVPNMACVRFVID, EVPNIPVRFVID)

VIDs are allocated in range 1-4094 and stored in the NAD config.

========================================================================
EVPN VID (VLAN ID) Lifecycle
========================================================================
VIDs are cluster-wide unique identifiers allocated to EVPN networks for
use as VXLAN Network Identifiers in the data plane. Each VRF (MAC-VRF or
IP-VRF) in an EVPN network requires its own VID, so a symmetric IRB
network uses 2 VIDs.

Allocation Keys:
  - MAC-VRF: "{networkName}/macvrf"
  - IP-VRF:  "{networkName}/ipvrf"

Lifecycle:
 1. ALLOCATION: When a CUDN/UDN with EVPN transport is reconciled,
    VIDs are allocated via allocateEVPNVIDsIfNeeded() and stored in
    the NAD's JSON config. The id.Allocator is idempotent - calling
    AllocateID with the same key returns the previously allocated VID.
 2. PERSISTENCE: VIDs are persisted in the NAD spec.config JSON field.
    The in-memory allocator is not persistent across controller restarts.
 3. RECOVERY: On controller startup, recoverEVPNVIDs() re-reserves VIDs
    in the allocator using NetworkManager's cached NetInfo (which has
    already parsed all NADs). This ensures VID consistency after restarts.
 4. RELEASE: When a CUDN/UDN is deleted, releaseVIDForNetwork() frees
    both the MAC-VRF and IP-VRF VIDs (if allocated) back to the pool.

Design Decision - Why VID persisted in NAD spec.config over annotations/labels:
  - Annotations were considered for faster recovery but rejected:
    1. CNI plugin on nodes needs VID in spec.config anyway
    2. Two copies (annotation + NetConf) creates sync/drift risk
    3. Recovery uses NetworkManager's cache (already parsed), so no
       startup parsing overhead to optimize away
  - CUDN status was rejected: users have copied objects with status
    populated causing conflicts; VID isn't user-facing info

Recovery Failure Handling:
  - If VID recovery fails for a CUDN (e.g., NAD not in NetworkManager
    cache, VID conflict), the error is logged and the CUDN is enqueued
    for reconciliation - startup does NOT fail.
  - This prevents DoS: a malicious/corrupted NAD cannot crash the
    entire cluster-manager.
  - During reconciliation, if the NAD exists with a valid VID, the
    allocator's idempotency ensures the same VID is re-allocated.

Thread Safety:
  - The id.Allocator uses per-key locking, making concurrent
    allocations safe.
  - Controllers use Threadiness:1, so reconciliations for the same
    resource are serialized.
========================================================================

Signed-off-by: Matteo Dallaglio <mdallagl@redhat.com>
booxter and others added 20 commits February 20, 2026 16:52
Assisted-by: opus (claude-opus-4-5-20251101)
Signed-off-by: Ihar Hrachyshka <ihrachyshka@nvidia.com>
docs: Add section on how to debug coredumps from non-go binaries
When --metrics-bind-address and --ovn-metrics-bind-address are the
same, emit both ovnkube and OVN/OVS metrics from a single endpoint.

Signed-off-by: Lei Huang <leih@nvidia.com>
Allow emitting metrics on a single endpoint
Update OVN observability documentation
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Signed-off-by: Tim Rozet <trozet@nvidia.com>
Configures stale ICMP allow ACLs, then starts up and verifies that,
with the config knob off, the ACLs are removed.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Nutanix (Builds Flow CNI on OVN-Kubernetes, integrated with Nutanix Flow and VPC networking)

Signed-off-by: Amin Aflatoonian <8513427+Aminiok@users.noreply.github.com>
Add Nutanix to OVN-Kubernetes adopters page
In UDNs, goroutines are started for some controllers like NetworkQoS,
where a waitgroup is used to add a reference to the goroutine, and
stopChan is passed as the mechanism to shut down the NetworkQoS
controller.

If the UDN controller starts and then shuts down very quickly, the
stopChan is closed and reset to nil. It is set to nil as a pattern we
use to guard against multiple Stop calls to the UDN controller (Stop
may be called multiple times). However, if the NetworkQoS goroutine
does not finish starting before the stopChan is closed and reset to
nil, then by the time NetworkQoS gets to read stopChan, it will hang
forever, causing the UDN controller waitgroup to wait forever.

This deadlocks the entire network manager, preventing it from
starting or stopping any more UDN controllers!

We can see this behavior in CI here:
I0223 04:37:24.677192      77 network_controller.go:415] [zone-nad-controller network controller]: sync network wpnhc_tenant-blue
I0223 04:37:24.677203      77 localnet_user_defined_network_controller.go:311] Stoping controller for UDN wpnhc_tenant-blue
I0223 04:37:24.677209      77 base_secondary_layer2_network_controller.go:39] Stop secondary localnet network controller of network wpnhc_tenant-blue
I0223 04:37:24.677241      77 obj_retry.go:473] Stop channel got triggered: will stop retrying failed objects of type *v1.Namespace
I0223 04:37:24.677250      77 network_qos_controller.go:215] Starting controller wpnhc_tenant-blue-network-controller
I0223 04:37:24.677256      77 network_qos_controller.go:218] Waiting for informer caches (networkqos,namespace,pod,node) to sync
I0223 04:37:24.677263      77 obj_retry.go:473] Stop channel got triggered: will stop retrying failed objects of type *v1beta2.MultiNetworkPolicy
I0223 04:37:24.677270      77 shared_informer.go:349] "Waiting for caches to sync" controller="wpnhc_tenant-blue-network-controller"
I0223 04:37:24.677339      77 shared_informer.go:356] "Caches are synced" controller="wpnhc_tenant-blue-network-controller"

There is never a "finished syncing network wpnhc_tenant-blue" log again
after this for zone-nad-controller, nor any other networks for that
matter after this point in the log. However, there are logs for
node-nad-controller as it did not hit this race.

To fix this, pass a copy of oc.stopChan to the goroutines. A channel
value is a reference, so closing oc.stopChan still closes the channel
seen through the copy, while oc.stopChan itself can still be set to
nil as a Stop guard.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
Fixes:
- #6014
- ovn-kubernetes/ovn-kubernetes#6014

Signed-off-by: Andrés Hernández <tonejito@comunidad.unam.mx>
Fixes:
- #6014
- ovn-kubernetes/ovn-kubernetes#6014

Signed-off-by: Andrés Hernández <tonejito@comunidad.unam.mx>
Fix UDN network controller deadlock due to stopChan nil race
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Documentation: UserDefinedNetwork Markdown does not render properly
Allow ICMP and ICMPv6 regardless of network policy
@openshift-pr-manager openshift-pr-manager Bot changed the title NO-JIRA: DownStream Merge [02-25-2026] TEST ANNOTATIONS FAILED! NO-JIRA: DownStream Merge [02-25-2026] Feb 25, 2026
@openshift-pr-manager
Author

/hold
Run: go mod vendor && ./openshift/hack/update-tests-annotation.sh

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 25, 2026
@openshift-ci-robot
Contributor

@openshift-pr-manager[bot]: This pull request explicitly references no jira issue.

Details

In response to this:

Automated merge of upstream/master → master.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 25, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Feb 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: openshift-pr-manager[bot]
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Feb 25, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-pr-manager openshift-pr-manager Bot deleted the d/s-merge-02-25-2026 branch February 25, 2026 20:34