[release-4.18] OCPBUGS-59680, OCPBUGS-59371,OCPBUGS-48710: DownStream Merge Sync from 4.19 [09-03-2025]#2745
Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>
Fix node update check for network cluster controller
During node update events, the local and remote addOrUpdate functions are called. A series of sync checks determines what to configure. However, in some cases log messages were printed unconditionally, and hybrid overlay was processed on every node event. This cleans things up so that hybrid overlay is only synced when necessary, and logs are only printed when work is being done to add the local or remote node.
Also removes an old test case for hybrid overlay where the node-subnets annotation of a node was being removed. First introduced here: ovn-kubernetes/ovn-kubernetes@aef135c#diff-9ab180ea9a39f81dc8334a00ca8ea5e4cd04f9491c27dcfd910b07929c9ddbb5R193
It's not totally clear what the purpose of this test was, but we do not support clearing OVN configuration when OVNK-assigned annotations are removed by the user. The node-subnets annotation should not be removed, and if it is removed, it should be configured back onto the node by cluster-manager.
Signed-off-by: Tim Rozet <trozet@redhat.com>
When remote nodes are added (as new UDNs are created) the first remote add always fails. This is because the controller is waiting for the subnets annotation to be updated for the network. However, it only partially fails. It fails when the routes are attempting to be added, but this is after the logical switch port logic and some other parsing has already been done. Rather than execute this work twice, just bail early if the node does not have all of the annotations yet. This way we can execute the majority of the work only one time. With this change, only once all annotations are present will you see: "Creating interconnect resources for remote zone node" Signed-off-by: Tim Rozet <trozet@redhat.com>
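The early-bail idea in this commit message can be sketched roughly as follows. This is a minimal illustration, not the controller's code: the annotation keys, `missingAnnotations`, and `addRemoteNode` are hypothetical names chosen for the example, and the real controller checks its own set of annotations.

```go
package main

import (
	"errors"
	"fmt"
)

// errAnnotationsPending tells the caller the node is not ready yet and the
// event should simply be retried later.
var errAnnotationsPending = errors.New("node annotations not yet present")

// requiredAnnotations is a hypothetical list used only for this sketch;
// the real controller uses its own annotation keys.
var requiredAnnotations = []string{
	"example.org/node-subnets",
	"example.org/node-id",
}

// missingAnnotations returns the required keys that are absent from the node.
func missingAnnotations(annotations map[string]string) []string {
	var missing []string
	for _, key := range requiredAnnotations {
		if _, ok := annotations[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

// addRemoteNode bails out before any expensive work when annotations are
// missing, so the bulk of the work runs only once per node.
func addRemoteNode(name string, annotations map[string]string, work func() error) error {
	if missing := missingAnnotations(annotations); len(missing) > 0 {
		return fmt.Errorf("%w: node %s is missing %v", errAnnotationsPending, name, missing)
	}
	fmt.Printf("Creating interconnect resources for remote zone node %s\n", name)
	return work()
}

func main() {
	calls := 0
	work := func() error { calls++; return nil }

	// First event: the subnets annotation has not been written yet.
	err := addRemoteNode("node-a", map[string]string{"example.org/node-id": "3"}, work)
	fmt.Println("first attempt:", err, "work calls:", calls)

	// Later event: all annotations are present, so the work runs once.
	err = addRemoteNode("node-a", map[string]string{
		"example.org/node-id":      "3",
		"example.org/node-subnets": `{"blue":"10.128.1.0/24"}`,
	}, work)
	fmt.Println("second attempt:", err, "work calls:", calls)
}
```

Returning a sentinel error wrapped with `%w` lets a retry loop distinguish "not ready yet" from a genuine failure via `errors.Is`.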
Just execute the 2 route adds in the same txn Signed-off-by: Tim Rozet <trozet@redhat.com>
When a CUDN/UDN is created with the joinSubnets field configured, it should generate the net-attach-def with the `joinSubnet` field. The code was using `joinSubnets`, which is not understood by ovn-kubernetes. Signed-off-by: Enrique Llorente <ellorent@redhat.com>
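To illustrate the key-name mismatch, here is a small sketch of rendering a NAD config with the singular `joinSubnet` key. The `netConf` struct, `renderNADConfig`, and `hasKey` are hypothetical stand-ins for this example, and joining multiple subnets with a comma is an assumption rather than something the commit message states.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// netConf is a hypothetical, trimmed-down stand-in for the network config
// embedded in a net-attach-def. Only the JSON key name matters here.
type netConf struct {
	Name       string `json:"name"`
	Topology   string `json:"topology"`
	JoinSubnet string `json:"joinSubnet,omitempty"` // singular key
}

// renderNADConfig maps the UDN-side joinSubnets list onto the singular key.
// Comma-joining the subnets is an assumption made for this sketch.
func renderNADConfig(name, topology string, joinSubnets []string) (string, error) {
	conf := netConf{
		Name:       name,
		Topology:   topology,
		JoinSubnet: strings.Join(joinSubnets, ","),
	}
	raw, err := json.Marshal(conf)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

// hasKey reports whether the rendered JSON object contains a top-level key.
func hasKey(rawJSON, key string) bool {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal([]byte(rawJSON), &fields); err != nil {
		return false
	}
	_, ok := fields[key]
	return ok
}

func main() {
	out, err := renderNADConfig("blue", "layer3", []string{"100.65.0.0/16", "fd99::/64"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("has joinSubnet:", hasKey(out, "joinSubnet"))
	fmt.Println("has joinSubnets:", hasKey(out, "joinSubnets"))
}
```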
Configures ephemeral port range for OVN SNAT'ing
udn: Fix NAD template for join subnets field
… module Signed-off-by: Alin Gabriel Serdean <aserdean@nvidia.com>
workflow: Add fix missing and apt update before trying to install VRF…
So that ginkgo times out first and we get useful output. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
We have a flow [1] to prevent leaking traffic towards a ClusterIP. However, we also have a flow to prevent EIP traffic from egressing before being SNATed, and an additional flow to actually allow the traffic to egress in ICNI/BGP scenarios for pods on the node's subnet [2]. The higher priority of flow [2] prevents flow [1] from taking effect. Bump the priority of flow [1], since there is no case where we should leak traffic towards ClusterIPs.
[1] cookie=0xdeff105, duration=492.235s, table=0, n_packets=0, n_bytes=0, priority=105,ipv6,in_port="patch-breth0_ov",ipv6_dst=fd00:10:96::/112 actions=drop
[2] cookie=0xdeff105, duration=2308.615s, table=0, n_packets=4, n_bytes=376, priority=109,ipv6,in_port="patch-breth0_ov",dl_src=96:b0:34:18:12:7c,ipv6_src=fd00:10:244:1::/64 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:eth0
cookie=0xdeff105, duration=1991.854s, table=0, n_packets=0, n_bytes=0, priority=104,ipv6,in_port="patch-breth0_ov",ipv6_src=fd00:10:244::/48 actions=drop
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
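The priority interaction described above can be modelled with a toy flow table in which the highest-priority matching flow wins. This is only a sketch of the reasoning: the `packet` and `flow` types and `lookup` are hypothetical, and the bumped priority of 110 is an illustrative value, since the commit message does not state the new priority.

```go
package main

import "fmt"

// packet carries only the two properties relevant to the flows discussed.
type packet struct {
	srcInNodeSubnet bool // source is a pod on the node's subnet
	dstIsClusterIP  bool // destination falls in the ClusterIP range
}

// flow is a toy model of a flow table entry.
type flow struct {
	name     string
	priority int
	match    func(packet) bool
	action   string
}

// lookup returns the action of the highest-priority flow matching the packet.
func lookup(table []flow, p packet) string {
	best := -1
	action := "no-match"
	for _, f := range table {
		if f.match(p) && f.priority > best {
			best = f.priority
			action = f.action
		}
	}
	return action
}

// buildTable creates the two competing flows; dropPriority is the priority
// of the ClusterIP drop flow [1].
func buildTable(dropPriority int) []flow {
	return []flow{
		{"drop-clusterip", dropPriority, func(p packet) bool { return p.dstIsClusterIP }, "drop"},
		{"allow-node-subnet", 109, func(p packet) bool { return p.srcInNodeSubnet }, "output"},
	}
}

func main() {
	// A pod on the node subnet sending to a ClusterIP matches both flows.
	p := packet{srcInNodeSubnet: true, dstIsClusterIP: true}

	fmt.Println("drop flow at 105:", lookup(buildTable(105), p))
	// 110 is a hypothetical bumped value used only for illustration.
	fmt.Println("drop flow at 110:", lookup(buildTable(110), p))
}
```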
Change configuration in preparation for running all control plane tests:
* Make both dualstack, not much value testing IPv4 single stack
* Make one of the lanes noSnatGW to get signal from that as well
* Enable multicast and empty LB events
* Configure host to be able to route to networks from the external world
* Ensure frr container is not able to route through the host/runner
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Skip those tests that wouldn't be supported or would otherwise require additional work. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Run all control plane tests for bgp lane
This PR adds rules to prevent SNAT if source IP belongs to the mgmtport-no-snat-subnets-v4 or mgmtport-no-snat-subnets-v6 sets, which store IPv4 and IPv6 subnets, respectively. Signed-off-by: Yossi Boaron <yboaron@redhat.com>
Currently, traffic gets SNATed at ovn-k8s-mp0 within the mgmtport-snat chain. Since OVNK has transitioned to nftables, this behavior can no longer be overridden. Previously, with iptables, SNAT could be avoided by adding a higher-priority rule in the POSTROUTING chain. However, with nftables, all rules are evaluated before making a final decision, making it impossible to skip SNAT.
Some applications, like Submariner, need to preserve the source IP when traffic reaches the destination pod, as certain use cases depend on it.
This PR updates the mgmtport-no-snat-subnets-v4 and mgmtport-no-snat-subnets-v6 nftables sets based on the node's annotation values.
Signed-off-by: Yossi Boaron <yboaron@redhat.com>
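The decision these sets encode can be sketched in Go as a per-family membership check: a source address inside any configured subnet skips SNAT. This is an illustration of the logic only, not the nftables rules or the OVNK code; `noSNATSets`, `parseNoSNATSubnets`, and `shouldSNAT` are hypothetical names, and the input list stands in for whatever the node annotation actually contains.

```go
package main

import (
	"fmt"
	"net/netip"
)

// noSNATSets mirrors the two sets by address family
// (mgmtport-no-snat-subnets-v4 and mgmtport-no-snat-subnets-v6).
type noSNATSets struct {
	v4 []netip.Prefix
	v6 []netip.Prefix
}

// parseNoSNATSubnets splits a list of CIDRs into IPv4 and IPv6 sets.
// The input format is an assumption made for this sketch.
func parseNoSNATSubnets(cidrs []string) (noSNATSets, error) {
	var sets noSNATSets
	for _, cidr := range cidrs {
		prefix, err := netip.ParsePrefix(cidr)
		if err != nil {
			return noSNATSets{}, fmt.Errorf("invalid subnet %q: %w", cidr, err)
		}
		prefix = prefix.Masked() // normalise host bits
		if prefix.Addr().Is4() {
			sets.v4 = append(sets.v4, prefix)
		} else {
			sets.v6 = append(sets.v6, prefix)
		}
	}
	return sets, nil
}

// shouldSNAT reports whether traffic from src would still be SNATed, that is,
// whether src falls outside every subnet in the matching family's set.
func (s noSNATSets) shouldSNAT(src string) (bool, error) {
	addr, err := netip.ParseAddr(src)
	if err != nil {
		return false, fmt.Errorf("invalid source address %q: %w", src, err)
	}
	set := s.v6
	if addr.Is4() {
		set = s.v4
	}
	for _, prefix := range set {
		if prefix.Contains(addr) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	sets, err := parseNoSNATSubnets([]string{"10.128.0.0/14", "fd01::/48"})
	if err != nil {
		panic(err)
	}
	for _, src := range []string{"10.129.3.7", "192.168.1.5", "fd01::1234"} {
		snat, err := sets.shouldSNAT(src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> SNAT: %v\n", src, snat)
	}
}
```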
Signed-off-by: Yossi Boaron <yboaron@redhat.com>
Signed-off-by: thisisobate <obasiuche62@gmail.com>
Some quality of life improvements for layer 3 controllers node handling
Every time the node updates, it triggers addEgressNode, which performs a route-add libovsdb transaction for the default network and every UDN, initiated from the default controller's egress node logic. It now runs only when needed. Signed-off-by: Tim Rozet <trozet@redhat.com>
This is unnecessary because there is another UDN path that will call this code: secondary_layer2/3_controller -> addUpdateLocalNodeEvent -> ensureRouterPoliciesForNetwork -> CreateDefaultRouteToExternal Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
This function is called from many different threads. Relying on nbdb for the GR IP is not safe here, as the GR IP could be changing due to a k8s event, and the route will be wrongly configured with an old IP still in OVN NBDB. Signed-off-by: Tim Rozet <trozet@redhat.com>
chore: update footer with new LF trademark disclaimer
Optimize egress ip performance with UDNs
Enable SNAT bypass in mgmtport-snat chain for specified subnets
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
UDN Isolation with BGP: Remove support for receiving advertised routes from remote nodes
chore: Update libovsdb bindings to ovn 25.03
Signed-off-by: nithyar <nithyar@nvidia.com>
Bump ubuntu version to 25.04
The NodeNetworkUnavailable condition can be set after ovn-k has processed the node successfully, so we cannot exit early without checking for it. Order of events:
1. Node is added without the NodeNetworkUnavailable condition
2. OVN-Kubernetes reconciles the node
3. Condition is added by an external entity
4. We never remove it because we exit early
Hence this commit adds a NodeNetworkUnavailable condition check for the node update event and ensures the h.clearInitialNodeNetworkUnavailableCondition method is called at least once to clear this condition.
Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
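A rough sketch of the extra check follows. It uses a local `condition` type instead of the Kubernetes API types to stay self-contained; `networkUnavailable` and `shouldReconcile` are hypothetical helpers, and only the condition type string `NetworkUnavailable` is taken from the Kubernetes API.

```go
package main

import "fmt"

// condition is a simplified stand-in for a Kubernetes node condition.
type condition struct {
	Type   string
	Status string
}

// nodeNetworkUnavailable is the condition type string used by Kubernetes.
const nodeNetworkUnavailable = "NetworkUnavailable"

// networkUnavailable reports whether the condition is present and true.
func networkUnavailable(conds []condition) bool {
	for _, c := range conds {
		if c.Type == nodeNetworkUnavailable && c.Status == "True" {
			return true
		}
	}
	return false
}

// shouldReconcile decides whether a node update may be skipped. Even when
// nothing else changed, a set condition must still be cleared, so the early
// exit is only allowed when the condition is not set.
func shouldReconcile(otherChanges bool, conds []condition) bool {
	return otherChanges || networkUnavailable(conds)
}

func main() {
	// Step 3 of the sequence above: an external entity adds the condition
	// after the node was already reconciled.
	late := []condition{{Type: nodeNetworkUnavailable, Status: "True"}}

	fmt.Println("no changes, condition set:", shouldReconcile(false, late))
	fmt.Println("no changes, no condition:", shouldReconcile(false, nil))
}
```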
…BUGS-59680-09-03-2025
one OCP hack was removed in the last d/s merge process [0] the current d/s merge is using 'git merge -X theirs' to ensure we get exactly what is upstream and will have to be re-worked to prevent this in the future. the change that was made upstream that caused this was a refactor for gw init and DPU host handling [1] that came in recently. this commit adds the OCP hack back as well as keeping the changes introduced upstream with [1] [0] https://github.com/openshift/ovn-kubernetes/pull/2693/files#diff-d09b4698b05e3cc5ad6d020187ffb80247f0ed6f784d61a93ee4e28742e3f827 [1] ovn-kubernetes/ovn-kubernetes@5b5bc06#diff-d09b4698b05e3cc5ad6d020187ffb80247f0ed6f784d61a93ee4e28742e3f827 Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@jluhrsen: This pull request references Jira Issue OCPBUGS-59680, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-59371, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-48710, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jluhrsen
/payload-job 4.18 ci blocking
@jluhrsen: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
/hold
obviously this PR is not good. clusters won't even install. CLBO
/test e2e-aws-ovn-fdp-qe
@jluhrsen: The following tests failed, say
/close
I think we are going to try to go with the full sync. no time to really figure out these CLBO install issues.
@jluhrsen: Closed this PR.
@jluhrsen: This pull request references Jira Issue OCPBUGS-59680. The bug has been updated to no longer refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-59371. The bug has been updated to no longer refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-48710. The bug has been updated to no longer refer to the pull request using the external bug tracker.
/close no time to figure out the CLBO installs. going to try to get the big one in |
This is not a full sync of 4.19 down to 4.18. The full sync was deemed to be too large. This is only up to the commit which fixes the 4.18 GCP techpreview failure. (bug / upstream PR)
This PR will likely introduce the ovn alert issue regression we dealt with and was only recently fixed upstream. (bug / upstream PR)
also, we probably need to get an origin fix cherry-picked to 4.18 to prevent some BGP job failures from starting. auto cherry-pick failed so we'll have to do a manual pick.
This PR also includes the OCP hack commit we had to add in 4.19 when it was accidentally removed with a `git merge -X theirs`.