OCPBUGS-57179, OCPBUGS-49824: DownStream Merge [07-09-2025] by martinkennelly · Pull Request #2659 · openshift/ovn-kubernetes

martinkennelly · 2025-07-09T13:02:19Z

u/s to d/s merge to main

The FDB lookup is only used for non-destined shared MAC traffic. When OVN or the host sends a packet that hits a NORMAL action, it will initiate MAC learning and can drive up the CPU usage of OVS. We still need the NORMAL action to account for sending to unknown ports like localnet ports, but we do not want to learn the shared MAC. Therefore, create a static entry binding it to the LOCAL port. Signed-off-by: Tim Rozet <trozet@redhat.com>

Commit f978967 caused a performance regression. As the issue below describes, egress traffic from OVN now uses the NORMAL action, which causes an FDB lookup and then a FLOOD if no entry is found. This always ends up being the case because the reply ARP packet from the physical port is flooded to the patch port and the LOCAL port. This causes an increase in CPU usage and unnecessary flooding of packets. We need layer 2 packets destined to the shared gateway MAC to go to both the host and OVN, so that both can receive ARP replies, etc. However, we also need the FDB entry in OVS to get updated for our new functionality that uses the NORMAL action. To fix this, add a static FDB entry for LOCAL, then modify the layer 2 flooding flow actions from "output:patch,LOCAL" to "output:patch,NORMAL". Since the FDB entry is bound in the table to LOCAL, this effectively forwards the packets the same as before, but with the added bonus of FDB learning on ingress. Fixes: #5318 Signed-off-by: Tim Rozet <trozet@redhat.com>
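The forwarding behaviour described in this commit message can be illustrated with a toy model. This is only a sketch in Go, not OVS code: the `bridge` type, the port names, the MAC addresses, and the `normal` method are hypothetical stand-ins for an OVS bridge, its ports, and the NORMAL action. It assumes NORMAL learns the source MAC on the ingress port unless a static entry exists, then forwards to the port found in the FDB, or floods when there is no entry.

```go
package main

import (
	"fmt"
	"sort"
)

// fdbEntry is one row of a toy forwarding database (MAC -> port).
type fdbEntry struct {
	port   string
	static bool // static entries are never overwritten by learning
}

// bridge is a hypothetical, heavily simplified stand-in for an OVS bridge.
type bridge struct {
	ports []string
	fdb   map[string]fdbEntry
}

func newBridge(ports ...string) *bridge {
	return &bridge{ports: ports, fdb: map[string]fdbEntry{}}
}

// addStatic pins a MAC to a port, as the commit does for the shared MAC and LOCAL.
func (b *bridge) addStatic(mac, port string) {
	b.fdb[mac] = fdbEntry{port: port, static: true}
}

// normal models the NORMAL action: learn the source MAC, then forward to the
// known port for the destination MAC, or flood to every other port on a miss.
func (b *bridge) normal(inPort, srcMAC, dstMAC string) []string {
	if e, ok := b.fdb[srcMAC]; !ok || !e.static {
		b.fdb[srcMAC] = fdbEntry{port: inPort}
	}
	if e, ok := b.fdb[dstMAC]; ok {
		if e.port == inPort {
			return nil // never send a frame back out of its ingress port
		}
		return []string{e.port}
	}
	var out []string
	for _, p := range b.ports {
		if p != inPort {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	const sharedMAC, peerMAC = "0a:00:00:00:00:01", "0a:00:00:00:00:02"
	br := newBridge("LOCAL", "patch", "eth0")

	// Without a static entry, egress from OVN to an unlearned MAC is flooded.
	fmt.Println(br.normal("patch", sharedMAC, peerMAC))

	// With the static entry, a frame for the shared MAC arriving on eth0
	// resolves to LOCAL only, and the peer's MAC is learned on eth0.
	br.addStatic(sharedMAC, "LOCAL")
	fmt.Println(br.normal("eth0", peerMAC, sharedMAC))
	fmt.Println(br.fdb[peerMAC].port)
}
```

In this model, a flow with actions "output:patch,NORMAL" would deliver the frame to the patch port plus whatever `normal` returns, which for the shared MAC is LOCAL once the static entry is in place.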

This allows a localnet VM ARP reply to go to OVN, rather than a lookup that only hits the LOCAL port in the FDB table. Signed-off-by: Tim Rozet <trozet@redhat.com>

When using Docker, the image push command fails because the quoted push_args variable expands to an empty string. Docker receives it as an extra, empty argument and fails with the following error:

$ docker push '' localhost:5000/ovn-daemonset-fedora:latest
docker: 'docker push' requires 1 argument

Remove the quotes wrapping push_args. Signed-off-by: Or Mergi <ormergi@redhat.com>

Since CanServeNamespace filters out namespace events for namespaces not known to be served by this primary network, we need to reconcile namespaces once the network is reconfigured to serve a namespace. Hence this commit reconciles those namespaces and also reconciles each network policy if it contains only a peer namespace selector. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

This commit exports FilterFunc from handler and uses it while reconciling network policy for UDN peer namespaces. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

This commit makes the network reconciliation loop sync only namespace objects, while network policies are synced from the namespace reconciliation loop. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

The diff between v0.7.0 and v0.8.0 is simply a rename from ovn-org/libovsdb to ovn-kubernetes/libovsdb. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>

kind: Rm push_args variable quotes

Initial implementations erroneously assumed a CIDR for a NAT's logicalIP. Also, the EIP controller expects all OVN constructs that support EIP to have this metadata, so if we cannot build this metadata, add dummy data so it's cleaned up later by the EIP controller. This was not caught by unit tests because the unit test also contained the assumption of only a logical IP with no mask. It was not caught by upstream CI because we have no reboot tests. Signed-off-by: Martin Kennelly <mkennell@redhat.com>
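One way to avoid baking in either assumption is to accept a logical IP both with and without a mask. The following is a minimal sketch only, not the code from this commit: `parseIPOrCIDR` is a hypothetical helper name and the addresses are made up for illustration. It uses only `net.ParseIP` and `net.ParseCIDR` from the Go standard library.

```go
package main

import (
	"fmt"
	"net"
)

// parseIPOrCIDR is a hypothetical helper: it accepts either a bare IP
// ("10.128.0.5") or an IP with a mask ("10.128.0.5/32") and returns the IP.
func parseIPOrCIDR(s string) (net.IP, error) {
	// Try the bare-IP form first.
	if ip := net.ParseIP(s); ip != nil {
		return ip, nil
	}
	// Fall back to the masked form; ParseCIDR returns the address as written.
	ip, _, err := net.ParseCIDR(s)
	if err != nil {
		return nil, fmt.Errorf("%q is neither an IP nor a CIDR: %w", s, err)
	}
	return ip, nil
}

func main() {
	// Illustrative inputs only: bare IPv4, masked IPv4, masked IPv6, and garbage.
	for _, s := range []string{"10.128.0.5", "10.128.0.5/32", "fd00::5/128", "bogus"} {
		ip, err := parseIPOrCIDR(s)
		fmt.Println(ip, err)
	}
}
```

A unit test for such a helper would want cases for both forms, since the commit message notes that the original test shared the implementation's assumption.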

The startup syncer was removing OVN constructs due to logic bugs introduced when the EIP code was refactored for UDN. They are added again when the EIP controller syncs, but this causes an interruption.
1. Due to poor naming, poor enforcement of types, and programmer error, we were mixing up variables between a pod IP and an EIP IP. See: nodeName, ok := cache.egressIPIPToNodeCache[parsedLogicalIP.String()] where parsedLogicalIP is a pod IP and not an EIP IP.
2. When iterating over the existing config for an EIP, we should delete config for LRPs where an EIP doesn't exist.
3. Remove LRPs when a network isn't found.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
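The first point, mixing up a pod IP and an EIP IP as a map key, is the kind of mistake that distinct named types can turn into a compile error. The following is a sketch of that general technique, not the fix applied in this commit: `podIP`, `egressIP`, `eipToNodeCache`, and `nodeFor` are hypothetical names, and the addresses are made up.

```go
package main

import "fmt"

// podIP and egressIP are hypothetical named types. Both wrap a string, but Go
// will not let one be used where the other is expected without a conversion.
type podIP string
type egressIP string

// eipToNodeCache maps an EgressIP address to the node hosting it.
type eipToNodeCache map[egressIP]string

// nodeFor only accepts an egressIP key, so passing a podIP is a compile error.
func (c eipToNodeCache) nodeFor(ip egressIP) (string, bool) {
	node, ok := c[ip]
	return node, ok
}

func main() {
	cache := eipToNodeCache{egressIP("192.0.2.10"): "node-a"}
	pod := podIP("10.128.0.5")
	eip := egressIP("192.0.2.10")

	// cache.nodeFor(pod) would not compile: podIP is not egressIP.
	_ = pod

	node, ok := cache.nodeFor(eip)
	fmt.Println(node, ok)
}
```

The trade-off is extra conversions at the boundaries where addresses are first parsed, in exchange for the compiler rejecting lookups with the wrong kind of IP.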

…readability No func changes. Check if obj is nil post parsing IP. Improve logging of stale OVN config. Signed-off-by: Martin Kennelly <mkennell@redhat.com>

Removes config for deleted nodes/pods while controller was down and ensures ovn config is removed while preserving valid config. Signed-off-by: Martin Kennelly <mkennell@redhat.com>

Fixes FDB learning and usage of NORMAL action

chore: bump libovsdb to v0.8.0

EgressIP: fix startup syncer

openshift-ci-robot · 2025-07-09T13:02:34Z

@martinkennelly: This pull request references Jira Issue OCPBUGS-57179, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.20.0) matches configured target version for branch (4.20.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jechen@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-49824, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.20.0) matches configured target version for branch (4.20.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @asood-rh

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

u/s to d/s merge to main

cc @jcaamano @pperiyasamy @trozet

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

martinkennelly · 2025-07-09T13:04:08Z

/payload 4.20 nightly blocking

openshift-ci · 2025-07-09T13:04:14Z

@martinkennelly: trigger 11 job(s) of type blocking for the nightly release of OCP 4.20

periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-upgrade-fips
periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-nightly-4.20-fips-payload-scan
periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/33b15de0-5cc5-11f0-9598-ed265b50f5d0-0

jluhrsen · 2025-07-09T17:02:51Z

/payload 4.20 ci blocking

openshift-ci · 2025-07-09T17:02:56Z

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.20

periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-upgrade
periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8cefe1d0-5ce6-11f0-84cf-82dbf5953384-0

jluhrsen · 2025-07-09T17:02:57Z

/retest

jluhrsen · 2025-07-09T17:42:36Z

/retitle OCPBUGS-57179, OCPBUGS-49824: DownStream Merge [07-09-2025]

jluhrsen · 2025-07-13T18:57:02Z

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

TBH, this job is NOT healthy right now. I don't think we should care if it passes or not

martinkennelly · 2025-07-14T12:52:12Z

Looks good to me now @jluhrsen. I checked the single node job and its failure is unrelated.

martinkennelly · 2025-07-14T12:53:11Z

/test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade-ipsec

martinkennelly · 2025-07-14T12:53:43Z

/test e2e-aws-ovn-hypershift-kubevirt

martinkennelly · 2025-07-14T12:54:01Z

/test e2e-aws-ovn-fdp-qe

jluhrsen · 2025-07-14T21:07:35Z

just need some labels here please

tssurya · 2025-07-15T11:08:36Z

perf scale is failing do we know why?

jluhrsen · 2025-07-15T16:12:50Z

perf scale is failing do we know why?

did it just time out? I see this:

time="2025-07-12 18:57:26" level=error msg="4h0m0s timeout reached" file="helpers.go:85"
time="2025-07-12 18:57:26" level=info msg="👋 kube-burner run completed with rc 2 for UUID fc1f9b7b-65a6-457f-873b-b3333563b72a" file="helpers.go:87"
+ exit_code=2

jluhrsen · 2025-07-15T18:59:54Z

/lgtm

tssurya · 2025-07-16T06:13:48Z

The perf/scale lane is failing because of known bug: https://redhat-internal.slack.com/archives/GQ0CU2623/p1752603854333399?thread_ts=1752571219.301049&cid=GQ0CU2623

tssurya · 2025-07-16T06:14:14Z

hypershift-kubevirt has such a low pass rate, ugh, and upgrade-ipsec is perma failing. cc @pperiyasamy

tssurya · 2025-07-16T06:14:46Z

/approve

returning favor to Jaime who took care of the 4.19 one last time

tssurya · 2025-07-16T06:15:05Z

fdp-qe job didn't run - failures unrelated

openshift-ci · 2025-07-16T06:17:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jluhrsen, martinkennelly, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [tssurya]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2025-07-16T09:00:39Z

/retest-required

Remaining retests: 0 against base HEAD d5f06a9 and 2 for PR HEAD 28f55a4 in total

openshift-ci-robot · 2025-07-16T11:30:48Z

/retest-required

Remaining retests: 0 against base HEAD d5f06a9 and 2 for PR HEAD 28f55a4 in total

martinkennelly · 2025-07-16T16:24:32Z

/retest

Unrelated failures

jluhrsen · 2025-07-16T21:14:59Z

/test e2e-aws-ovn-serial

jluhrsen · 2025-07-16T21:16:08Z

/test e2e-aws-ovn-serial

although this job looks to have taken a turn for the worse recently. we may not get lucky here and will need to override

openshift-ci-robot · 2025-07-17T01:28:10Z

/retest-required

Remaining retests: 0 against base HEAD d5f06a9 and 2 for PR HEAD 28f55a4 in total

openshift-ci · 2025-07-17T05:45:39Z

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-hypershift-kubevirt	`28f55a4`	link	false	`/test e2e-aws-ovn-hypershift-kubevirt`
ci/prow/security	`28f55a4`	link	false	`/test security`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot · 2025-07-17T05:46:41Z

@martinkennelly: Jira Issue OCPBUGS-57179: All pull requests linked via external trackers have merged:

openshift/ovn-kubernetes#2659

Jira Issue OCPBUGS-57179 has been moved to the MODIFIED state.

Jira Issue OCPBUGS-49824: All pull requests linked via external trackers have merged:

openshift/ovn-kubernetes#2659

Jira Issue OCPBUGS-49824 has been moved to the MODIFIED state.

Reamer · 2025-07-17T07:25:10Z

Do we get a downstream merge to versions 4.20 and 4.19 to fix OCPBUGS-57179 there as well?

jluhrsen · 2025-07-17T21:55:35Z

Do we get a downstream merge to versions 4.20 and 4.19 to fix OCPBUGS-57179 there as well?

Yes, we will continuously do these kinds of downstream merges from upstream ovnk into openshift master, then sync those changes to 4.19 and down to 4.18 in time. There will likely just be a continuous flow of these merge PRs, and as soon as one gets in, a new one with new changes will be opened.

currently we have:

d/s merge to master
4.20->4.19 sync
4.19->4.18 sync

jechen0648 · 2025-07-18T12:10:16Z

/jira refresh

openshift-ci-robot · 2025-07-18T12:10:20Z

@jechen0648: Jira Issue OCPBUGS-57179 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.

Jira Issue OCPBUGS-49824 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

trozet and others added 17 commits June 26, 2025 17:58

Remove physical port from l2 flow

3735ec2

This allows a localnet VM ARP reply to go to OVN, rather than a lookup that only hits the LOCAL port in the FDB table. Signed-off-by: Tim Rozet <trozet@redhat.com>

Use Handler FilterFunc to filter out np peer namespace

96db6fd

This commit exports FilterFunc from handler and uses it while reconciling network policy for UDN peer namespaces. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

Use namespace reconcilation loop for syncing network policies

f792af5

This commit makes the network reconciliation loop sync only namespace objects, while network policies are synced from the namespace reconciliation loop. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

chore: bump libovsdb to v0.8.0

0b513c6

The diff between v0.7.0 and v0.8.0 is simply a rename from ovn-org/libovsdb to ovn-kubernetes/libovsdb. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>

Merge pull request #5349 from ormergi/fix-kind-push-cmd

f25f775

kind: Rm push_args variable quotes

EIP OVN controller: remove possibility of crash, improve logging and …

41a9151

…readability No func changes. Check if obj is nil post parsing IP. Improve logging of stale OVN config. Signed-off-by: Martin Kennelly <mkennell@redhat.com>

OVN EIP startup syncer: add UTs for pod / node deleted

053585e

Removes config for deleted nodes/pods while controller was down and ensures ovn config is removed while preserving valid config. Signed-off-by: Martin Kennelly <mkennell@redhat.com>

Merge pull request #5334 from trozet/fix_fdb_learning

256b38c

Fixes FDB learning and usage of NORMAL action

Merge pull request #5352 from dave-tucker/libovsdb-up

6bc337b

chore: bump libovsdb to v0.8.0

Merge pull request #5295 from martinkennelly/fix-eip-syncer

b179c50

EgressIP: fix startup syncer

Merge remote-tracking branch 'origin/master' into merge-09-jul-25

28f55a4

openshift-ci bot requested review from abhat, asood-rh and tssurya July 9, 2025 13:02

openshift-ci bot assigned jluhrsen Jul 15, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2025

openshift-merge-bot bot merged commit 272896f into openshift:master Jul 17, 2025
45 of 47 checks passed

11 participants