OCPBUGS-62859, OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] by kyrtapz · Pull Request #2768 · openshift/ovn-kubernetes

kyrtapz · 2025-09-29T12:31:43Z

This is a sync from 4.19 up to #2704 with a manually cherry-picked commit to fix the GCP issue.

Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>

Fix node update check for network cluster controller

During node update events, the local and remote addOrUpdate functions are called. A series of sync checks determines what to configure. However, in some cases log messages were printed unconditionally, and hybrid overlay was processed on every node event. This cleans things up so that hybrid overlay is only synced when necessary, and logs are only printed when work is actually being done to add the local or remote node.

It also removes an old hybrid overlay test case in which the node-subnets annotation of a node was removed. It was first introduced here: ovn-kubernetes/ovn-kubernetes@aef135c#diff-9ab180ea9a39f81dc8334a00ca8ea5e4cd04f9491c27dcfd910b07929c9ddbb5R193 It's not entirely clear what the purpose of this test was, but we do not support clearing OVN configuration when OVNK-assigned annotations are removed by the user. The node-subnets annotation should not be removed, and if it is removed, cluster-manager should configure it back onto the node.

Signed-off-by: Tim Rozet <trozet@redhat.com>

When remote nodes are added (as new UDNs are created), the first remote add always fails. This is because the controller is waiting for the subnets annotation to be updated for the network. However, it only partially fails: it fails when the routes are being added, but by then the logical switch port logic and some other parsing have already run. Rather than execute this work twice, just bail early if the node does not have all of the annotations yet. This way the majority of the work is executed only once. With this change, you will only see "Creating interconnect resources for remote zone node" once all annotations are present.

Signed-off-by: Tim Rozet <trozet@redhat.com>
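A minimal Go sketch of the early-bail idea, under stated assumptions: `missingAnnotations` and `readyForRemoteZoneSetup` are hypothetical names, not ovn-kubernetes functions, and apart from `k8s.ovn.org/node-subnets` the annotation key is a placeholder. It only illustrates the gate: nothing expensive runs until every required annotation is present, so the work is done once instead of being partially done and then retried.

```go
package main

import "fmt"

// missingAnnotations returns the required keys that are absent or empty,
// in the order they were requested.
func missingAnnotations(annotations map[string]string, required []string) []string {
	var missing []string
	for _, key := range required {
		if annotations[key] == "" {
			missing = append(missing, key)
		}
	}
	return missing
}

// readyForRemoteZoneSetup is a hypothetical gate: interconnect setup for a
// remote node is attempted only once every required annotation is present.
func readyForRemoteZoneSetup(annotations map[string]string, required []string) bool {
	return len(missingAnnotations(annotations, required)) == 0
}

func main() {
	// Placeholder key set; the second key is hypothetical.
	required := []string{
		"k8s.ovn.org/node-subnets",
		"example.com/hypothetical-network-annotation",
	}
	// Illustrative value only; the real annotation format is not modeled here.
	node := map[string]string{
		"k8s.ovn.org/node-subnets": `{"default":["10.244.1.0/24"]}`,
	}
	if !readyForRemoteZoneSetup(node, required) {
		// Bail early; assumed: a later node update event triggers another attempt.
		fmt.Println("skipping remote zone node, missing:", missingAnnotations(node, required))
		return
	}
	fmt.Println("Creating interconnect resources for remote zone node")
}
```

Treating an empty value as missing is a design choice of this sketch; the commit only says the controller waits until the node has all of its annotations.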

Just execute the 2 route adds in the same txn

Signed-off-by: Tim Rozet <trozet@redhat.com>

When a CUDN/UDN is created with the joinSubnets field configured, it should generate the net-attach-def with the `joinSubnet` field, but the code was using `joinSubnets`, which is not understood by ovn-kubernetes.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
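To see why the wrong key can go unnoticed, here is a minimal Go sketch assuming the consumer decodes the NAD config with `encoding/json` into a struct tagged `joinSubnet`. The `netConf` type is hypothetical and trimmed to the one relevant field, and the value is treated as a single string purely for illustration. An unknown key such as `joinSubnets` is not an error for `encoding/json`; it is silently ignored, so the field simply stays empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// netConf is a hypothetical, trimmed-down view of a NAD config.
type netConf struct {
	Name       string `json:"name"`
	JoinSubnet string `json:"joinSubnet,omitempty"`
}

// decode unmarshals raw JSON into netConf; unknown keys are dropped.
func decode(raw string) (netConf, error) {
	var c netConf
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	for _, raw := range []string{
		`{"name":"tenant","joinSubnets":"100.65.0.0/16"}`, // key the buggy template emitted
		`{"name":"tenant","joinSubnet":"100.65.0.0/16"}`,  // key the fix emits
	} {
		c, err := decode(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> JoinSubnet=%q\n", raw, c.JoinSubnet)
	}
}
```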

Configures ephemeral port range for OVN SNAT'ing

udn: Fix NAD template for join subnets field

… module

Signed-off-by: Alin Gabriel Serdean <aserdean@nvidia.com>

workflow: Add fix missing and apt update before trying to install VRF…

So that ginkgo times out first and we get useful output.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

We have a flow [1] to prevent leaking traffic towards a ClusterIP. However, we also have a flow to prevent EIP traffic from egressing before being SNATed, and an additional flow to actually allow the traffic to egress in ICNI/BGP scenarios for pods on the node's subnet [2]. The higher priority of flow [2] prevents flow [1] from taking effect. Bump the priority of flow [1], since there is no case where we should leak traffic towards ClusterIPs.

[1] cookie=0xdeff105, duration=492.235s, table=0, n_packets=0, n_bytes=0, priority=105,ipv6,in_port="patch-breth0_ov",ipv6_dst=fd00:10:96::/112 actions=drop

[2] cookie=0xdeff105, duration=2308.615s, table=0, n_packets=4, n_bytes=376, priority=109,ipv6,in_port="patch-breth0_ov",dl_src=96:b0:34:18:12:7c,ipv6_src=fd00:10:244:1::/64 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:eth0
    cookie=0xdeff105, duration=1991.854s, table=0, n_packets=0, n_bytes=0, priority=104,ipv6,in_port="patch-breth0_ov",ipv6_src=fd00:10:244::/48 actions=drop

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
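The priority interaction can be modeled with a toy highest-priority-wins lookup. This Go sketch is a simplification, not OVS code: it matches only the IPv6 source and destination prefixes from the flows above and ignores `in_port` and `dl_src`. The commit does not state the new priority for flow [1], so 110 (just above 109) is a hypothetical value. A pod on the local subnet sending to a ClusterIP hits the priority-109 allow flow while the drop sits at 105; once the drop is above 109, it wins.

```go
package main

import (
	"fmt"
	"net/netip"
)

// flow is a simplified table-0 entry; a zero-value (invalid) prefix means "any".
type flow struct {
	priority int
	src, dst netip.Prefix
	action   string
}

func (f flow) matches(src, dst netip.Addr) bool {
	if f.src.IsValid() && !f.src.Contains(src) {
		return false
	}
	if f.dst.IsValid() && !f.dst.Contains(dst) {
		return false
	}
	return true
}

// lookup returns the action of the highest-priority matching flow.
func lookup(flows []flow, src, dst netip.Addr) string {
	best, action := -1, "no match"
	for _, f := range flows {
		if f.priority > best && f.matches(src, dst) {
			best, action = f.priority, f.action
		}
	}
	return action
}

// table models flow [1] (at a configurable priority) and the two flows under [2].
func table(clusterIPDropPriority int) []flow {
	return []flow{
		{priority: clusterIPDropPriority, dst: netip.MustParsePrefix("fd00:10:96::/112"), action: "drop"},
		{priority: 109, src: netip.MustParsePrefix("fd00:10:244:1::/64"), action: "output:eth0"},
		{priority: 104, src: netip.MustParsePrefix("fd00:10:244::/48"), action: "drop"},
	}
}

// decide parses the addresses and runs the lookup (panics on bad input).
func decide(clusterIPDropPriority int, src, dst string) string {
	return lookup(table(clusterIPDropPriority), netip.MustParseAddr(src), netip.MustParseAddr(dst))
}

func main() {
	src, dst := "fd00:10:244:1::5", "fd00:10:96::a" // local pod -> ClusterIP
	fmt.Println("ClusterIP drop at 105:", decide(105, src, dst))
	fmt.Println("ClusterIP drop at 110 (hypothetical):", decide(110, src, dst))
}
```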

Change configuration in preparation for running all control plane tests:
* Make both lanes dualstack; there is not much value in testing IPv4 single stack
* Make one of the lanes noSnatGW to get signal from that as well
* Enable multicast and empty LB events
* Configure host to be able to route to networks from the external world
* Ensure frr container is not able to route through the host/runner

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Skip those tests that wouldn't be supported or would otherwise require additional work.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Run all control plane tests for bgp lane

This PR adds rules to prevent SNAT if the source IP belongs to the mgmtport-no-snat-subnets-v4 or mgmtport-no-snat-subnets-v6 sets, which store IPv4 and IPv6 subnets respectively.

Signed-off-by: Yossi Boaron <yboaron@redhat.com>

Currently, traffic gets SNATed at ovn-k8s-mp0 within the mgmtport-snat chain. Since OVNK has transitioned to nftables, this behavior can no longer be overridden. Previously, with iptables, SNAT could be avoided by adding a higher-priority rule in the POSTROUTING chain. However, with nftables, all rules are evaluated before a final decision is made, making it impossible to skip SNAT. Some applications, like Submariner, need to preserve the source IP when traffic reaches the destination pod, because certain use cases depend on it. This PR updates the mgmtport-no-snat-subnets-v4 and mgmtport-no-snat-subnets-v6 nftables sets based on the node's annotation values.

Signed-off-by: Yossi Boaron <yboaron@redhat.com>
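The decision the two sets encode can be sketched in Go. This is a hypothetical model of set membership only, not the nftables rules the PR adds: `noSNATSets`, `shouldSNAT`, and `wouldSNAT` are invented names, and the subnets are assumed to arrive as CIDR strings (the annotation's actual format is not modeled). Each prefix is filed into the IPv4 or IPv6 set, mirroring the -v4/-v6 split, and a source address is SNATed only if it falls in neither.

```go
package main

import (
	"fmt"
	"net/netip"
)

// noSNATSets mirrors mgmtport-no-snat-subnets-v4 and -v6 as Go slices.
type noSNATSets struct {
	v4, v6 []netip.Prefix
}

// newNoSNATSets parses CIDR strings and files each prefix by address family.
func newNoSNATSets(cidrs []string) (noSNATSets, error) {
	var s noSNATSets
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return noSNATSets{}, fmt.Errorf("invalid subnet %q: %w", c, err)
		}
		p = p.Masked()
		if p.Addr().Is4() {
			s.v4 = append(s.v4, p)
		} else {
			s.v6 = append(s.v6, p)
		}
	}
	return s, nil
}

// shouldSNAT reports whether traffic from src would still be SNATed.
func (s noSNATSets) shouldSNAT(src netip.Addr) bool {
	src = src.Unmap() // treat ::ffff:a.b.c.d as IPv4
	set := s.v6
	if src.Is4() {
		set = s.v4
	}
	for _, p := range set {
		if p.Contains(src) {
			return false
		}
	}
	return true
}

// wouldSNAT is a string-based convenience wrapper.
func wouldSNAT(cidrs []string, src string) (bool, error) {
	s, err := newNoSNATSets(cidrs)
	if err != nil {
		return false, err
	}
	addr, err := netip.ParseAddr(src)
	if err != nil {
		return false, err
	}
	return s.shouldSNAT(addr), nil
}

func main() {
	cidrs := []string{"10.132.0.0/14", "fd00:132::/64"} // illustrative subnets
	for _, src := range []string{"10.132.5.7", "10.128.0.9", "fd00:132::5"} {
		snat, err := wouldSNAT(cidrs, src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: SNAT=%v\n", src, snat)
	}
}
```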

Signed-off-by: Yossi Boaron <yboaron@redhat.com>

Signed-off-by: thisisobate <obasiuche62@gmail.com>

Some quality of life improvements for layer 3 controllers node handling

Every time the node updates, it triggers addEgressNode, which performs a route-add libovsdb transaction for the default network and every UDN, initiated from the default controller's egress node logic. It now runs only when needed.

Signed-off-by: Tim Rozet <trozet@redhat.com>
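The commit does not say which node fields decide whether addEgressNode must run. As a hedged illustration of the "only run when needed" pattern, this Go sketch assumes the decision hinges on a handful of annotations; `relevantChanged` and the `example.com/...` keys are hypothetical. An update that touches only unrelated keys is skipped, so no route-add transaction is issued.

```go
package main

import "fmt"

// relevantChanged reports whether any watched key was added, removed,
// or modified between the old and new annotation maps.
func relevantChanged(oldAnn, newAnn map[string]string, watched []string) bool {
	for _, k := range watched {
		oldV, oldOK := oldAnn[k]
		newV, newOK := newAnn[k]
		if oldOK != newOK || oldV != newV {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical keys standing in for whatever the egress node logic depends on.
	watched := []string{"example.com/egress-node-config", "example.com/gateway-ip"}
	oldAnn := map[string]string{"example.com/gateway-ip": "192.0.2.10", "unrelated": "a"}
	newAnn := map[string]string{"example.com/gateway-ip": "192.0.2.10", "unrelated": "b"}
	if relevantChanged(oldAnn, newAnn, watched) {
		fmt.Println("re-running egress node setup")
	} else {
		fmt.Println("no relevant change; skipping route add txn")
	}
}
```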

This is unnecessary because there is another UDN path that will call this code: secondary_layer2/3_controller -> addUpdateLocalNodeEvent -> ensureRouterPoliciesForNetwork -> CreateDefaultRouteToExternal

Signed-off-by: Tim Rozet <trozet@redhat.com>

Signed-off-by: Tim Rozet <trozet@redhat.com>

This function is called from many different threads. Relying on NBDB for the GR IP is not safe here, because the GR IP could be changing due to a k8s event, and the route would then be configured with a stale IP still present in OVN NBDB.

Signed-off-by: Tim Rozet <trozet@redhat.com>

chore: update footer with new LF trademark disclaimer

Optimize egress ip performance with UDNs

Enable SNAT bypass in mgmtport-snat chain for specified subnets

OCPBUGS-55098: DownStream Merge [06-04-2025]

tssurya · 2025-10-02T16:36:49Z

/label backport-risk-assessed
what could go wrong with only 260 commits?? :)
@jluhrsen: next time no matter what we can't let it grow this much :(
brace for impact - let's hope we get two green nightly payloads and CI looking good - that's the signal I am basing my risk assessment on. Also, this code has had enough soak time in 4.19 and 4.20 for months now. Should be good.

openshift-ci · 2025-10-02T17:10:31Z

@kyrtapz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-hypershift-kubevirt	`dc24c60`	link	false	`/test e2e-aws-ovn-hypershift-kubevirt`
ci/prow/security	`dc24c60`	link	false	`/test security`
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview	`dc24c60`	link	false	`/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview`
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-techpreview	`dc24c60`	link	false	`/test e2e-metal-ipi-ovn-dualstack-bgp-techpreview`
ci/prow/e2e-openstack-ovn	`dc24c60`	link	false	`/test e2e-openstack-ovn`


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jluhrsen · 2025-10-02T21:42:01Z

/retitle OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025]

openshift-ci-robot · 2025-10-02T21:42:11Z

@kyrtapz: This pull request references Jira Issue OCPBUGS-59680, which is valid.

7 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.18.z) matches configured target version for branch (4.18.z)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
release note type set to "Release Note Not Required"
dependent bug Jira Issue OCPBUGS-59350 is in the state Closed (Done-Errata), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
dependent Jira Issue OCPBUGS-59350 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0, 4.19.z
bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.


In response to this:

This is a sync from 4.19 up to #2704 with a manually cherry-picked commit to fix the GCP issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jluhrsen · 2025-10-02T21:45:34Z

> @jluhrsen: next time no matter what we can't let it grow this much :(

there was a bug exposed with the small merge that we did not want to introduce into 4.18. The fix had to go upstream first, then d/s merge, then to 4.19, and finally to 4.18. That is what did this to us. I don't see a way around that in the future. Hopefully this is a rare circumstance.

> brace for impact - let's hope we get two green nightly payloads and CI looking good - that's the signal I am basing my risk assessment on. Also, this code has had enough soak time in 4.19 and 4.20 for months now. Should be good.

I will track the nightlies

> @jluhrsen does the PR look good to you? Would you mind linking the appropriate bugs if you are aware of which ones are in?

honestly the only one I really know is the GCP bug and I added that to the title

jluhrsen · 2025-10-02T21:45:51Z

/verified

openshift-ci-robot · 2025-10-02T21:45:55Z

@jluhrsen: The /verified command must be used with one of the following actions: by, later, remove, or bypass. See https://docs.ci.openshift.org/docs/architecture/jira/#premerge-verification for more information.


jluhrsen · 2025-10-02T21:47:14Z

/verified later @jluhrsen

not sure what I'm doing here, but seeing if this works.

openshift-ci-robot · 2025-10-02T21:47:27Z

@jluhrsen: This PR has been marked to be verified later by @jluhrsen.


openshift-ci-robot · 2025-10-02T21:52:43Z

@kyrtapz: Jira Issue OCPBUGS-59680: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

openshift/ovn-kubernetes#2663 is open

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-59680 has not been moved to the MODIFIED state.

This PR is marked as verified-later. Jira issue(s) in the title of this PR will require post-merge verification. After testing, it must be manually moved to the VERIFIED state.


jluhrsen · 2025-10-08T20:48:39Z

/retitle OCPBUGS-62859, OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025]

openshift-ci-robot · 2025-10-08T20:48:49Z

@kyrtapz: Jira Issue OCPBUGS-62859 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.

Jira Issue OCPBUGS-59680 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.


jitseklomp and others added 30 commits June 3, 2025 12:59

Add custom_fences config to mkdocs.yml

07973c3

Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>

Merge pull request #5275 from trozet/fix_node_update

b31c67a

Fix node update check for network cluster controller

Minor improvement to route add for remote zone nodes

98518ea

Just execute the 2 route adds in the same txn

Signed-off-by: Tim Rozet <trozet@redhat.com>

Merge pull request #5265 from trozet/specify_ovn_ephemeral_port_range

2017ede

Configures ephemeral port range for OVN SNAT'ing

Merge remote-tracking branch 'ovn-org/master' into d/s-merge-06-04-2025

4699b44

Merge pull request #5279 from qinqon/udn-fix-join-subnet-typo

06acc8d

udn: Fix NAD template for join subnets field

workflow: Add fix missing and apt update before trying to install VRF…

399915a

… module

Signed-off-by: Alin Gabriel Serdean <aserdean@nvidia.com>

Merge pull request #5287 from aserdean/try_fix_ci

c7a47d1

workflow: Add fix missing and apt update before trying to install VRF…

Align e2e test timeouts

575f3c0

So that ginkgo times out first and we get useful output.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Run almost all control plane tests in BGP lanes

90b88fa

Skip those tests that wouldn't be supported or would otherwise require additional work.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Fix HO test flake

f84d3f3

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Merge pull request #5211 from jcaamano/all-control-plane-tests-with-bgp

15a2c63

Run all control plane tests for bgp lane

Add dontSNAT subnets rules to mgmtport-snat

9554ba6

This PR adds rules to prevent SNAT if the source IP belongs to the mgmtport-no-snat-subnets-v4 or mgmtport-no-snat-subnets-v6 sets, which store IPv4 and IPv6 subnets respectively.

Signed-off-by: Yossi Boaron <yboaron@redhat.com>

Unit tests for node ingress snat exclude annotation

182ba9c

Signed-off-by: Yossi Boaron <yboaron@redhat.com>

chore: update footer with new LF trademark disclaimer

cb32656

Signed-off-by: thisisobate <obasiuche62@gmail.com>

Merge pull request #5278 from trozet/node_update_improvements

d9ba339

Some quality of life improvements for layer 3 controllers node handling

L2 and L3 UDN should reconfigure reroute policies when join IP changes

a008345

Signed-off-by: Tim Rozet <trozet@redhat.com>

Merge pull request #5294 from thisisobate/trademark-update-2

171e40b

chore: update footer with new LF trademark disclaimer

Merge pull request #5286 from trozet/optimize_egress_ip

8223f18

Optimize egress ip performance with UDNs

Merge pull request #5113 from yboaron/dont_snat_marked_traffic

b92bff1

Enable SNAT bypass in mgmtport-snat chain for specified subnets

Merge pull request openshift#2618 from jluhrsen/d/s-merge-06-04-2025

36e5a1d

OCPBUGS-55098: DownStream Merge [06-04-2025]

openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Oct 2, 2025

openshift-ci bot assigned anuragthehatter, asood-rh, barbora137, huiran0826, jechen0648, Meina-rh, mffiedler, qiowang721 and rbbratta Oct 2, 2025

openshift-ci bot changed the title ~~[release-4.18] DownStream Merge Sync from 4.19 [09-29-2025]~~ OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] Oct 2, 2025

openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Oct 2, 2025

openshift-ci bot requested a review from anuragthehatter October 2, 2025 21:42

openshift-ci-robot added verified-later verified Signifies that the PR passed pre-merge verification criteria labels Oct 2, 2025

openshift-merge-bot bot merged commit 82307af into openshift:release-4.18 Oct 2, 2025
41 of 46 checks passed

openshift-ci bot changed the title ~~OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025]~~ OCPBUGS-62859, OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] Oct 8, 2025


20 participants
