OCPBUGS-57179, OCPBUGS-49824: DownStream Merge [07-09-2025]#2659

Merged
openshift-merge-bot[bot] merged 17 commits into openshift:master from martinkennelly:merge-09-jul-25
Jul 17, 2025
@martinkennelly
Contributor

u/s to d/s merge to main

cc @jcaamano @pperiyasamy @trozet

trozet and others added 17 commits June 26, 2025 17:58
The FDB lookup is only used for non-destined shared MAC traffic. When
OVN or the host sends a packet that hits a NORMAL action, it will initiate
MAC learning and can drive up the CPU usage of OVS. We still need the NORMAL
action to account for sending to unknown ports like localnet ports, but
we do not want to learn the shared MAC. Therefore, create a static entry
binding it to the LOCAL port.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Commit f978967 caused a regression in performance. As the below issue
describes, the egress traffic from OVN will now use NORMAL action, which
will cause an FDB lookup and then FLOOD if not found. This always ends
up being the case because the reply ARP packet from the physical port is
flooded to the patch port and the LOCAL port. This increases CPU usage
and floods packets unnecessarily.

We need layer 2 packets destined to the shared gateway mac to go to both
the host and OVN. This is so both can receive ARP replies, etc. However,
we also need the FDB entry in OVS to get updated, for our new
functionality with using the NORMAL action.

To fix this, add a static FDB entry for LOCAL, then modify the layer 2
flooding flow actions from "output:patch,LOCAL" to
"output:patch,NORMAL". Since the FDB entry is bound in the table to
LOCAL, it is effectively forwarding the packets the same as before, but
with the added bonus of FDB learning on ingress.

Fixes: #5318

Signed-off-by: Tim Rozet <trozet@redhat.com>
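
To illustrate the forwarding behavior this commit message describes, here is a toy Go model of a learning bridge. It is only a sketch and not OVS code: the `bridge` type, its methods, the port names, and the MAC addresses are all made up for illustration, and it assumes that a static FDB entry is never overwritten by learning.

```go
package main

import "fmt"

// fdbEntry is one row of a toy forwarding database (hypothetical type).
type fdbEntry struct {
	port   string
	static bool
}

// bridge is a toy learning switch, not a model of the real OVS implementation.
type bridge struct {
	ports []string
	fdb   map[string]fdbEntry // keyed by MAC address
}

// addStatic pins a MAC to a port; learning in normal() never overwrites it.
func (b *bridge) addStatic(mac, port string) {
	b.fdb[mac] = fdbEntry{port: port, static: true}
}

// normal mimics a NORMAL-style action: learn the source MAC on the ingress
// port, then forward to the port known for the destination MAC, or flood to
// every other port when the destination is unknown.
func (b *bridge) normal(inPort, srcMAC, dstMAC string) []string {
	if e, ok := b.fdb[srcMAC]; !ok || !e.static {
		b.fdb[srcMAC] = fdbEntry{port: inPort}
	}
	if e, ok := b.fdb[dstMAC]; ok {
		if e.port == inPort {
			return nil // destination lives on the ingress port, nothing to send
		}
		return []string{e.port}
	}
	var out []string
	for _, p := range b.ports {
		if p != inPort {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	const sharedMAC = "02:00:00:00:00:01" // made-up shared gateway MAC
	const remoteMAC = "02:00:00:00:00:99" // made-up external host MAC
	b := &bridge{ports: []string{"eth0", "patch", "LOCAL"}, fdb: map[string]fdbEntry{}}
	b.addStatic(sharedMAC, "LOCAL")

	// A reply from the physical port to the shared MAC resolves to LOCAL
	// instead of being flooded, and the remote MAC is learned on eth0.
	fmt.Println(b.normal("eth0", remoteMAC, sharedMAC))

	// Traffic sourced from the shared MAC on the patch port does not move
	// the static entry away from LOCAL.
	b.normal("patch", sharedMAC, remoteMAC)
	fmt.Println(b.fdb[sharedMAC].port)
}
```

In this model, "output:patch,NORMAL" would correspond to sending one copy to the patch port and letting `normal` resolve the second copy to LOCAL through the static entry, while still learning the sender's MAC on ingress.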
This allows a localnet VM ARP reply to go to OVN, rather than a lookup
that only hits the LOCAL port in the FDB table.

Signed-off-by: Tim Rozet <trozet@redhat.com>
When using Docker, the push image command fails because the quoted
push_args variable expands to an empty string. Docker rejects the
empty argument and fails with the following error:
  $ docker push '' localhost:5000/ovn-daemonset-fedora:latest
  docker: 'docker push' requires 1 argument

Remove the push_args wrapping quotes.

Signed-off-by: Or Mergi <ormergi@redhat.com>
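
The quoting pitfall behind this fix can be sketched in Go by building the argument list both ways. This is an illustration only: `quotedArgs` and `unquotedArgs` are hypothetical helpers, and `strings.Fields` only approximates shell word splitting (it ignores globbing and custom IFS values).

```go
package main

import (
	"fmt"
	"strings"
)

// quotedArgs mimics `docker push "$push_args" IMAGE`: the variable always
// becomes exactly one argument, even when it is empty.
func quotedArgs(pushArgs, image string) []string {
	return []string{"push", pushArgs, image}
}

// unquotedArgs mimics `docker push $push_args IMAGE`: the variable is split
// on whitespace, so an empty value contributes no arguments at all.
func unquotedArgs(pushArgs, image string) []string {
	args := []string{"push"}
	args = append(args, strings.Fields(pushArgs)...)
	return append(args, image)
}

func main() {
	image := "localhost:5000/ovn-daemonset-fedora:latest"
	// With an empty push_args, the quoted form carries a stray "" argument.
	fmt.Printf("%q\n", quotedArgs("", image))
	// The unquoted form passes only the image name.
	fmt.Printf("%q\n", unquotedArgs("", image))
}
```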
Since CanServeNamespace filters out namespace events for namespaces not
known to be served by this primary network, we need to reconcile namespaces once
the network is reconfigured to serve a namespace.
Hence this commit reconciles those namespaces and also reconciles each network
policy if it contains only a peer namespace selector.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
This commit exports FilterFunc from handler and uses it while
reconciling network policy for UDN peer namespaces.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
This commit makes the network reconciliation loop sync only the namespace
object; the network policy sync now happens from the namespace reconciliation
loop.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
The diff between v0.7.0 and v0.8.0 is simply a rename from
ovn-org/libovsdb to ovn-kubernetes/libovsdb.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Initial implementations erroneously assumed a CIDR for a NAT's
logicalIP.

Also, the EIP controller expects all OVN constructs that support
EIP to have this metadata, so if we cannot build this metadata,
add dummy data so it is cleaned up later by the EIP controller.

This was not caught by unit tests because the unit test also
contained the assumption of only logical IP with no mask.

It was not caught by upstream CI because we have no reboot tests.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
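
One way to avoid baking in either assumption about the logical IP format is to accept both forms. The following Go sketch uses only the standard `net` package; `parseLogicalIP` is a hypothetical helper written for illustration and is not the function used in the actual fix.

```go
package main

import (
	"fmt"
	"net"
)

// parseLogicalIP accepts either a bare IP ("10.128.0.5") or CIDR notation
// ("10.128.0.5/32") and returns the IP part in both cases.
func parseLogicalIP(s string) (net.IP, error) {
	if ip, _, err := net.ParseCIDR(s); err == nil {
		return ip, nil
	}
	if ip := net.ParseIP(s); ip != nil {
		return ip, nil
	}
	return nil, fmt.Errorf("logical IP %q is neither an IP nor a CIDR", s)
}

func main() {
	// The sample addresses below are made up.
	for _, s := range []string{"10.128.0.5", "10.128.0.5/32", "fd00::5/128", "bogus"} {
		ip, err := parseLogicalIP(s)
		fmt.Println(ip, err)
	}
}
```

A unit test covering both the masked and unmasked form would guard against the gap described above, where the test shared the code's assumption.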
The startup syncer was removing OVN constructs due to logic bugs
introduced when the EIP code was refactored for UDN. They are added again
when the EIP controller syncs, but this causes an interruption.

1. Due to poor naming, weak enforcement of types, and programmer error,
we were mixing up variables between a pod IP and an EIP IP.
See:
nodeName, ok := cache.egressIPIPToNodeCache[parsedLogicalIP.String()]

parsedLogicalIP is a pod IP and not an EIP IP.

2. When iterating over the existing config for an EIP, we should
delete config for LRPs where an EIP doesn't exist.

3. Remove LRPs when a network isn't found.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
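
The pod IP versus EIP IP mix-up in point 1 is the kind of bug that distinct types can turn into a compile error. The sketch below is one possible approach, not what the commit does: `PodIP`, `EgressIP`, and `eipToNode` are hypothetical names, and the addresses and node name are made up.

```go
package main

import "fmt"

// PodIP and EgressIP are distinct string types, so one cannot be passed
// where the other is expected without an explicit conversion.
type PodIP string
type EgressIP string

// eipToNode maps an egress IP to the node hosting it. Keying the map by
// EgressIP means a PodIP cannot be used for a lookup by accident.
type eipToNode map[EgressIP]string

// nodeFor returns the node assigned to the given egress IP, if any.
func (c eipToNode) nodeFor(eip EgressIP) (string, bool) {
	node, ok := c[eip]
	return node, ok
}

func main() {
	cache := eipToNode{"192.0.2.10": "worker-1"}
	pod := PodIP("10.128.0.5")
	eip := EgressIP("192.0.2.10")

	// cache.nodeFor(pod) would not compile: PodIP is not EgressIP.
	_ = pod

	node, ok := cache.nodeFor(eip)
	fmt.Println(node, ok)
}
```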
…readability

No func changes.
Check if obj is nil post parsing IP.
Improve logging of stale OVN config.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Removes config for nodes/pods that were deleted while the controller
was down and ensures stale OVN config is removed while preserving
valid config.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Fixes FDB learning and usage of NORMAL action
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jul 9, 2025
@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-57179, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jechen@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-49824, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @asood-rh

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot requested review from abhat, asood-rh and tssurya July 9, 2025 13:02
@martinkennelly
Contributor Author

/payload 4.20 nightly blocking

@openshift-ci
Contributor

openshift-ci bot commented Jul 9, 2025

@martinkennelly: trigger 11 job(s) of type blocking for the nightly release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.20-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/33b15de0-5cc5-11f0-9598-ed265b50f5d0-0

@jluhrsen
Contributor

jluhrsen commented Jul 9, 2025

/payload 4.20 ci blocking

@openshift-ci
Contributor

openshift-ci bot commented Jul 9, 2025

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8cefe1d0-5ce6-11f0-84cf-82dbf5953384-0

@jluhrsen
Contributor

jluhrsen commented Jul 9, 2025

/retest

@jluhrsen
Contributor

jluhrsen commented Jul 9, 2025

/retitle OCPBUGS-57179, OCPBUGS-49824: DownStream Merge [07-09-2025]

@jluhrsen
Contributor

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

TBH, this job is NOT healthy right now. I don't think we should care if it passes or not

@martinkennelly
Contributor Author

Looks good to me now @jluhrsen. I checked the single node job and its failure is unrelated.

@martinkennelly
Contributor Author

/test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade-ipsec

@martinkennelly
Contributor Author

/test e2e-aws-ovn-hypershift-kubevirt

@martinkennelly
Contributor Author

/test e2e-aws-ovn-fdp-qe

@jluhrsen
Contributor

just need some labels here please

@tssurya
Contributor

tssurya commented Jul 15, 2025

perf scale is failing do we know why?

@jluhrsen
Contributor

> perf scale is failing do we know why?

did it just time out? I see this:

time="2025-07-12 18:57:26" level=error msg="4h0m0s timeout reached" file="helpers.go:85"
time="2025-07-12 18:57:26" level=info msg="👋 kube-burner run completed with rc 2 for UUID fc1f9b7b-65a6-457f-873b-b3333563b72a" file="helpers.go:87"
+ exit_code=2

@jluhrsen
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2025
@tssurya
Contributor

tssurya commented Jul 16, 2025

hypershift-kubevirt has such a low pass rate, ugh, and upgrade-ipsec is perma-failing. cc @pperiyasamy

@tssurya
Contributor

tssurya commented Jul 16, 2025

/approve

returning favor to Jaime who took care of the 4.19 one last time

@tssurya
Contributor

tssurya commented Jul 16, 2025

fdp-qe job didn't run - failures unrelated

@openshift-ci
Contributor

openshift-ci bot commented Jul 16, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jluhrsen, martinkennelly, tssurya

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2025
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d5f06a9 and 2 for PR HEAD 28f55a4 in total

@martinkennelly
Contributor Author

/retest

Unrelated failures

@jluhrsen
Contributor

/test e2e-aws-ovn-serial

@jluhrsen
Contributor

/test e2e-aws-ovn-serial

although this job looks to have taken a turn for the worse recently. we may not get lucky here and will need to override

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d5f06a9 and 2 for PR HEAD 28f55a4 in total

@openshift-ci
Contributor

openshift-ci bot commented Jul 17, 2025

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-aws-ovn-hypershift-kubevirt | 28f55a4 | link | false | /test e2e-aws-ovn-hypershift-kubevirt
ci/prow/security | 28f55a4 | link | false | /test security

@openshift-merge-bot openshift-merge-bot bot merged commit 272896f into openshift:master Jul 17, 2025
45 of 47 checks passed
@openshift-ci-robot
Contributor

@martinkennelly: Jira Issue OCPBUGS-57179: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-57179 has been moved to the MODIFIED state.

Jira Issue OCPBUGS-49824: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-49824 has been moved to the MODIFIED state.

@Reamer
Contributor

Reamer commented Jul 17, 2025

Do we get a downstream merge to versions 4.20 and 4.19 to fix OCPBUGS-57179 there as well?

@jluhrsen
Contributor

> Do we get a downstream merge to versions 4.20 and 4.19 to fix OCPBUGS-57179 there as well?

Yes, we will continuously do these kinds of downstream merges from upstream ovnk into openshift master, then sync those changes to 4.19 and down to 4.18 in time. There will likely just be a continuous flow of these merge PRs, and as soon as one gets in, a new one with new changes will be opened.

currently we have:

d/s merge to master
4.20->4.19 sync
4.19->4.18 sync

@jechen0648
Contributor

/jira refresh

@openshift-ci-robot
Contributor

@jechen0648: Jira Issue OCPBUGS-57179 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.

Jira Issue OCPBUGS-49824 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.
