

OCPBUGS-62859, OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] #2768

Merged
openshift-merge-bot[bot] merged 259 commits into openshift:release-4.18 from kyrtapz:4.18-sync-from-4.19-09-29-2025 on Oct 2, 2025

@kyrtapz commented Sep 29, 2025

This is a sync from 4.19 up to #2704 with a manually cherry-picked commit to fix the GCP issue.

jitseklomp and others added 30 commits June 3, 2025 12:59
Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>
Fix node update check for network cluster controller
During node update events, the local and remote addOrUpdate functions are called.
A series of sync checks determines what needs to be configured.
However, in some cases log messages were printed unconditionally,
and hybrid overlay was processed on every node event.

This cleans things up so that hybrid overlay is only synced when
necessary, and logs are only printed when work is actually being done to
add the local or remote node.

This also removes an old hybrid overlay test case in which the node-subnets
annotation of a node was removed. It was first introduced here:

ovn-kubernetes/ovn-kubernetes@aef135c#diff-9ab180ea9a39f81dc8334a00ca8ea5e4cd04f9491c27dcfd910b07929c9ddbb5R193

It's not totally clear what the purpose of this test was, but we do not
support clearing OVN configuration when OVNK-assigned annotations are
removed by the user. The node-subnets annotation should not be removed,
and if it is removed, cluster-manager should configure it back onto the
node.

Signed-off-by: Tim Rozet <trozet@redhat.com>
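
The gating described above can be sketched in a few lines. This is a minimal Python model, not the ovn-kubernetes Go code. The `Node` type, the annotation keys, and the action names are all hypothetical. Only the idea comes from the commit message: diff the old and new node, then do (and log) work only when a relevant field changed.

```python
from dataclasses import dataclass, field

# Hypothetical annotation keys, used only for this sketch.
SUBNETS_KEY = "example.io/node-subnets"
HYBRID_OVERLAY_KEY = "example.io/hybrid-overlay"


@dataclass
class Node:
    name: str
    annotations: dict = field(default_factory=dict)


def changed(old, new, key):
    """True when annotation `key` differs between the old and new node."""
    return old.annotations.get(key) != new.annotations.get(key)


def plan_node_update(old, new, log):
    """Return the sync steps a node event needs; log only if there is work."""
    is_add = old is None
    actions = []
    if is_add or changed(old, new, SUBNETS_KEY):
        actions.append("sync-node")
    if is_add or changed(old, new, HYBRID_OVERLAY_KEY):
        actions.append("sync-hybrid-overlay")
    if actions:
        log.append(f"node {new.name}: {', '.join(actions)}")
    return actions
```

A node update that changes nothing relevant returns an empty plan and leaves the log untouched, which is the behavior the commit is after.
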
When remote nodes are added (as new UDNs are created), the first remote
add always fails, because the controller is waiting for the subnets
annotation to be updated for the network. The failure is only partial,
however: it happens when the routes are being added, which is after the
logical switch port logic and some other parsing have already been done.

Rather than execute this work twice, just bail out early if the node does
not have all of the annotations yet. This way, the majority of the work
executes only once.

With this change, only once all annotations are present will you see:

"Creating interconnect resources for remote zone node"

Signed-off-by: Tim Rozet <trozet@redhat.com>
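
A minimal sketch of the early-bail pattern follows. The commit does not list the required annotations, so the keys below are hypothetical. A plain list stands in for the work the controller performs.

```python
# Hypothetical annotation keys: the commit only says "all of the annotations".
REQUIRED_ANNOTATIONS = (
    "example.io/node-subnets",
    "example.io/node-transit-switch-port-ifaddr",
    "example.io/node-chassis-id",
)


def missing_annotations(annotations, required=REQUIRED_ANNOTATIONS):
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in required if not annotations.get(key)]


def add_remote_zone_node(name, annotations, work_log):
    """Create interconnect resources only once every annotation is present.

    Returns False so the caller can retry on a later node event, instead of
    doing a partial add that fails at the route step and must be redone.
    """
    if missing_annotations(annotations):
        return False  # bail early: nothing has been created yet
    work_log.append(f"Creating interconnect resources for remote zone node {name}")
    work_log.append(f"add logical switch port for {name}")
    work_log.append(f"add routes for {name}")
    return True
```
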
Execute the two route adds in the same transaction.

Signed-off-by: Tim Rozet <trozet@redhat.com>
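
The same-transaction change follows the usual OVSDB pattern: collect the operations first, then commit them in one call. Both routes land together, and the database is contacted once. In the sketch below, a hypothetical in-memory `FakeNBClient` stands in for a real libovsdb client. Only the batching idea is taken from the commit.

```python
class FakeNBClient:
    """Hypothetical in-memory stand-in for an OVSDB client."""

    def __init__(self):
        self.routes = []
        self.transactions = 0

    def transact(self, ops):
        """Apply all ops atomically: either every op is committed or none is."""
        self.transactions += 1
        staged = list(self.routes)
        for op in ops:
            if op.get("op") != "insert-route":
                raise ValueError(f"unsupported op: {op.get('op')}")
            staged.append((op["prefix"], op["nexthop"]))
        self.routes = staged  # commit only after every op succeeded


def route_op(prefix, nexthop):
    return {"op": "insert-route", "prefix": prefix, "nexthop": nexthop}


def add_routes_in_one_txn(client, routes):
    """Build one op per route, then commit them with a single transact call."""
    client.transact([route_op(prefix, nexthop) for prefix, nexthop in routes])
```
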
When a CUDN/UDN is created with the joinSubnets field configured, it should
generate the net-attach-def with the `joinSubnet` field; the code was using
`joinSubnets`, which is not understood by ovn-kubernetes.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
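
The sketch below illustrates the key-name mismatch by rendering a CNI config as JSON. The `joinSubnet` key comes from the commit. Everything else is an illustrative assumption rather than a copy of the real template: the `cniVersion`, `type`, and other field values, and the comma-joined format.

```python
import json


def render_nad_config(network_name, topology, subnets, join_subnets=None):
    """Render a minimal, illustrative net-attach-def CNI config for a (C)UDN.

    The point of the fix: the CNI config key is "joinSubnet" (singular),
    even though the CUDN/UDN API field is "joinSubnets". Joining several
    CIDRs with "," is an assumption of this sketch.
    """
    conf = {
        "cniVersion": "1.0.0",
        "type": "ovn-k8s-cni-overlay",
        "name": network_name,
        "topology": topology,
        "subnets": ",".join(subnets),
    }
    if join_subnets:
        conf["joinSubnet"] = ",".join(join_subnets)  # not "joinSubnets"
    return json.dumps(conf, sort_keys=True)
```
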
Configures ephemeral port range for OVN SNAT'ing
udn: Fix NAD template for join subnets field
… module

Signed-off-by: Alin Gabriel Serdean <aserdean@nvidia.com>
workflow: Add fix missing and apt update before trying to install VRF…
So that ginkgo times out first and we get useful output.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
We have a flow [1] to prevent leaking traffic towards a ClusterIP.
However, we also have a flow that prevents EIP traffic from egressing before
being SNATed, and an additional flow that actually allows the traffic to
egress in ICNI/BGP scenarios for pods on the node's subnet [2]. The
higher priority of flow [2] prevents flow [1] from taking effect.

Bump the priority of flow [1], since there is no case where we should leak
traffic towards ClusterIPs.

[1]
cookie=0xdeff105, duration=492.235s, table=0, n_packets=0, n_bytes=0, priority=105,ipv6,in_port="patch-breth0_ov",ipv6_dst=fd00:10:96::/112 actions=drop

[2]
cookie=0xdeff105, duration=2308.615s, table=0, n_packets=4, n_bytes=376, priority=109,ipv6,in_port="patch-breth0_ov",dl_src=96:b0:34:18:12:7c,ipv6_src=fd00:10:244:1::/64 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:eth0
cookie=0xdeff105, duration=1991.854s, table=0, n_packets=0, n_bytes=0, priority=104,ipv6,in_port="patch-breth0_ov",ipv6_src=fd00:10:244::/48 actions=drop

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
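
The interaction between these flows follows from the OpenFlow rule that the highest-priority matching flow wins. The sketch keeps only the IPv6 source/destination matches from the flows quoted above and ignores in_port, dl_src, and the conntrack actions. It uses 110 as a hypothetical new priority for flow [1]; the commit does not state the value chosen.

```python
import ipaddress


def make_flows(clusterip_drop_priority):
    """Simplified model of the table=0 flows quoted above."""
    return [
        # [1] never leak traffic towards the ClusterIP range
        {"priority": clusterip_drop_priority, "src": None,
         "dst": ipaddress.ip_network("fd00:10:96::/112"), "action": "drop"},
        # [2] allow pods on this node's subnet to egress (ICNI/BGP)
        {"priority": 109, "dst": None,
         "src": ipaddress.ip_network("fd00:10:244:1::/64"), "action": "output:eth0"},
        # [2] drop other pod traffic that has not been SNATed yet
        {"priority": 104, "dst": None,
         "src": ipaddress.ip_network("fd00:10:244::/48"), "action": "drop"},
    ]


def matches(flow, src, dst):
    return ((flow["src"] is None or src in flow["src"]) and
            (flow["dst"] is None or dst in flow["dst"]))


def classify(flows, src, dst):
    """Return the action of the highest-priority matching flow."""
    candidates = [f for f in flows if matches(f, src, dst)]
    if not candidates:
        return "no-match"
    return max(candidates, key=lambda f: f["priority"])["action"]
```

With the drop at 105, a pod on the node subnet reaching a ClusterIP still hits the priority-109 output flow. Once flow [1] sits above 109, the same packet is dropped, and traffic to non-ClusterIP destinations is unaffected.
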
Change configuration in preparation for running all control plane tests:
* Make both lanes dual-stack; there is not much value in testing IPv4 single stack
* Make one of the lanes noSnatGW to get signal from that as well
* Enable multicast and empty LB events
* Configure host to be able to route to networks from the external world
* Ensure frr container is not able to route through the host/runner

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Skip the tests that aren't supported or that would otherwise require
additional work.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
This PR adds rules to prevent SNAT if the source IP belongs to the
mgmtport-no-snat-subnets-v4 or mgmtport-no-snat-subnets-v6 set; these
sets store IPv4 and IPv6 subnets, respectively.

Signed-off-by: Yossi Boaron <yboaron@redhat.com>
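
The decision the new rules encode can be modeled as a set-membership check on the source address. In this sketch, two Python lists stand in for the mgmtport-no-snat-subnets-v4/-v6 nftables sets. It shows only the verdict, not the nftables rule syntax.

```python
import ipaddress


def should_snat(src_ip, no_snat_v4, no_snat_v6):
    """Return False when the source falls inside a no-SNAT subnet.

    The lists stand in for the mgmtport-no-snat-subnets-v4/-v6 sets; the
    lookup here is a linear scan, unlike an nftables interval set.
    """
    addr = ipaddress.ip_address(src_ip)
    subnets = no_snat_v4 if addr.version == 4 else no_snat_v6
    return not any(addr in net for net in subnets)
```
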
Currently, traffic gets SNATed at ovn-k8s-mp0 within the mgmtport-snat chain.
Since OVNK has transitioned to nftables, this behavior can
no longer be overridden.

Previously, with iptables, SNAT could be avoided by adding a
higher-priority rule in the POSTROUTING chain. With nftables, however,
all rules are evaluated before a final decision is made, which makes it
impossible to skip SNAT that way.

Some applications, like Submariner, need to preserve the source IP
when traffic reaches the destination pod, because certain use cases depend on it.

This PR updates the mgmtport-no-snat-subnets-v4 and mgmtport-no-snat-subnets-v6
nftables sets based on the node's annotation values.

Signed-off-by: Yossi Boaron <yboaron@redhat.com>
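
The commit does not name the node annotation or its format. The sketch therefore assumes a hypothetical key holding a JSON list of CIDRs, and shows only how such a value could be split into the elements of the v4 and v6 sets.

```python
import ipaddress
import json

# Hypothetical annotation key and format (JSON list of CIDR strings).
NO_SNAT_ANNOTATION = "example.io/mgmtport-no-snat-subnets"


def no_snat_sets_from_node(annotations):
    """Split the annotated subnets into (v4, v6) element lists for the two sets."""
    raw = annotations.get(NO_SNAT_ANNOTATION)
    if not raw:
        return [], []  # empty sets: all traffic keeps being SNATed
    v4, v6 = set(), set()
    for cidr in json.loads(raw):
        # strict=False normalizes host bits, e.g. 10.1.0.5/16 -> 10.1.0.0/16
        net = ipaddress.ip_network(cidr, strict=False)
        (v4 if net.version == 4 else v6).add(net)
    # Sorting gives a deterministic element order, handy for diffing updates.
    return [str(n) for n in sorted(v4)], [str(n) for n in sorted(v6)]
```
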
Signed-off-by: Yossi Boaron <yboaron@redhat.com>
Signed-off-by: thisisobate <obasiuche62@gmail.com>
Some quality of life improvements for layer 3 controllers node handling
Every time the node was updated, it triggered addEgressNode, which performs a
route-add libovsdb transaction for the default network and every UDN,
initiated from the default controller's egress node logic. It now runs only
when needed.

Signed-off-by: Tim Rozet <trozet@redhat.com>
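
"Only when needed" usually comes down to comparing just the fields the handler consumes. The fields below are hypothetical; the commit does not list which node changes should trigger addEgressNode. `resource_version` is included to show that an update bumping only that field no longer triggers work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeView:
    """Hypothetical subset of node state used by the egress-node logic."""
    name: str
    egress_assignable: bool
    gateway_ips: tuple
    resource_version: str  # changes on every update, relevant or not


def needs_egress_node_update(old, new):
    """True only when a field the egress-node logic depends on has changed."""
    if old is None:
        return new.egress_assignable
    return (old.egress_assignable != new.egress_assignable or
            old.gateway_ips != new.gateway_ips)
```
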
This is unnecessary because there is another UDN path that will call
this code:

secondary_layer2/3_controller -> addUpdateLocalNodeEvent ->
ensureRouterPoliciesForNetwork -> CreateDefaultRouteToExternal

Signed-off-by: Tim Rozet <trozet@redhat.com>
This function is called from many different threads. Relying on NBDB for
the GR IP is not safe here: the GR IP could be changing due to a k8s
event, and the route would then be configured with an old IP that is
still in OVN NBDB.

Signed-off-by: Tim Rozet <trozet@redhat.com>
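
The hazard is a stale read. The following sketch uses a plain dict as a hypothetical stand-in for the gateway router (GR) IP stored in NBDB. It contrasts reading that value with using the GR IP the caller derived from the Kubernetes event it is handling. The function names are illustrative, not ovn-kubernetes APIs.

```python
# Hypothetical stand-in for NBDB state: the last GR IP written, which can
# lag behind the node object a Kubernetes event has just delivered.
nbdb = {"gr_ip": "100.64.0.2"}


def route_using_nbdb(prefix):
    """Racy variant: uses whatever GR IP NBDB currently holds."""
    return (prefix, nbdb["gr_ip"])


def route_using_caller_ip(prefix, gr_ip):
    """Safer variant: the caller passes the GR IP from the event it handles."""
    return (prefix, gr_ip)


# A node event reports a new GR IP before NBDB has been updated.
new_gr_ip_from_event = "100.64.0.5"
stale = route_using_nbdb("0.0.0.0/0")
fresh = route_using_caller_ip("0.0.0.0/0", new_gr_ip_from_event)
```
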
chore: update footer with new LF trademark disclaimer
Optimize egress ip performance with UDNs
Enable SNAT bypass in mgmtport-snat chain for specified subnets
OCPBUGS-55098: DownStream Merge [06-04-2025]

@tssurya commented Oct 2, 2025

/label backport-risk-assessed
what could go wrong with only 260 commits?? :)
@jluhrsen: next time no matter what we can't let it grow to this much :(
brace for impact - let's hope we get two green nightly payloads and CI looking good - that's the signal I am basing my risk assessment on. Also, this code has been soaking in 4.19 and 4.20 for months now, so it should be good.

@openshift-ci bot added the backport-risk-assessed label Oct 2, 2025
@openshift-ci bot commented Oct 2, 2025

@kyrtapz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | dc24c60 | false | /test e2e-aws-ovn-hypershift-kubevirt |
| ci/prow/security | dc24c60 | false | /test security |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview | dc24c60 | false | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-techpreview | dc24c60 | false | /test e2e-metal-ipi-ovn-dualstack-bgp-techpreview |
| ci/prow/e2e-openstack-ovn | dc24c60 | false | /test e2e-openstack-ovn |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jluhrsen commented Oct 2, 2025

/retitle OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025]

@openshift-ci bot changed the title [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] Oct 2, 2025
@openshift-ci-robot added the jira/valid-reference and jira/valid-bug labels Oct 2, 2025
@openshift-ci-robot

@kyrtapz: This pull request references Jira Issue OCPBUGS-59680, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.z) matches configured target version for branch (4.18.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-59350 is in the state Closed (Done-Errata), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-59350 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0, 4.19.z
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci bot requested a review from anuragthehatter October 2, 2025 21:42
@jluhrsen commented Oct 2, 2025

> @jluhrsen: next time no matter what we can't let it grow to this much :(

There was a bug exposed with the small merge that we did not want to introduce into 4.18. The fix had to go upstream first, then into a downstream merge, then to 4.19, and finally to 4.18. That is what did this to us. I don't see a way around that in the future; hopefully this is a rare circumstance.

> brace for impact - let's hope we get two green nightly payloads and CI looking good - that's the signal I am basing my risk assessment on. Also, this code has been soaking in 4.19 and 4.20 for months now, so it should be good.

I will track the nightlies

> @jluhrsen does the PR look good to you? Would you mind linking the appropriate bugs if you are aware of which ones are in?

Honestly, the only one I really know about is the GCP bug, and I added that to the title.

@jluhrsen commented Oct 2, 2025

/verified

@openshift-ci-robot

@jluhrsen: The /verified command must be used with one of the following actions: by, later, remove, or bypass. See https://docs.ci.openshift.org/docs/architecture/jira/#premerge-verification for more information.

@jluhrsen commented Oct 2, 2025

/verified later @jluhrsen

Not sure what I'm doing here, but let's see if this works.

@openshift-ci-robot added the verified-later and verified labels Oct 2, 2025
@openshift-ci-robot

@jluhrsen: This PR has been marked to be verified later by @jluhrsen.

@openshift-merge-bot bot merged commit 82307af into openshift:release-4.18 Oct 2, 2025
41 of 46 checks passed
@openshift-ci-robot

@kyrtapz: Jira Issue OCPBUGS-59680: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-59680 has not been moved to the MODIFIED state.

This PR is marked as verified-later. Jira issue(s) in the title of this PR will require post-merge verification. After testing, it must be manually moved to the VERIFIED state.

@jluhrsen commented Oct 8, 2025

/retitle OCPBUGS-62859, OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025]

@openshift-ci bot changed the title OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] OCPBUGS-62859, OCPBUGS-59680: [release-4.18] DownStream Merge Sync from 4.19 [09-29-2025] Oct 8, 2025
@openshift-ci-robot

@kyrtapz: Jira Issue OCPBUGS-62859 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.

Jira Issue OCPBUGS-59680 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
verified: Signifies that the PR passed pre-merge verification criteria.
verified-later: ¯\_(ツ)_/¯
