OCPBUGS-55098: DownStream Merge [06-04-2025] by jluhrsen · Pull Request #2618 · openshift/ovn-kubernetes

jluhrsen · 2025-06-04T22:11:11Z

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

My code requires changes to the documentation
if so, I have updated the documentation as required
My code requires tests
if so, I have added and/or updated the tests as required
All the tests have passed in the CI

How to verify it

Related to investigating the root cause for: #5260. This commit stops adding unscheduled pods to the retry framework; when a pod is scheduled, the controller will receive an event. Additionally, the functions that add pods were using the kubeclient instead of the informer cache. That means every time a UDN was added we would issue a kubeclient command to get all pods, which is really bad for performance.
Signed-off-by: Tim Rozet <trozet@redhat.com>
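As a rough illustration of the idea in this commit message, the sketch below filters a cached pod list so that only scheduled pods are queued for retry. The `pod` type and `scheduledPods` function are hypothetical stand-ins and are not taken from the ovn-kubernetes code; the real controller works with Kubernetes Pod objects served from its informer cache.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the few Pod fields this sketch needs.
type pod struct {
	Namespace string
	Name      string
	NodeName  string // empty until the scheduler assigns a node
}

// scheduledPods returns only the pods that already have a node assigned.
// Unscheduled pods are skipped: the controller receives an event for them
// once they are scheduled, so queuing them for retry is unnecessary.
func scheduledPods(cached []pod) []pod {
	var out []pod
	for _, p := range cached {
		if p.NodeName == "" {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	// cache stands in for a list served by an informer cache (assumption),
	// rather than a fresh API call for every added UDN.
	cache := []pod{
		{Namespace: "ns1", Name: "web", NodeName: "node-a"},
		{Namespace: "ns1", Name: "pending", NodeName: ""},
	}
	for _, p := range scheduledPods(cache) {
		fmt.Printf("retry %s/%s on %s\n", p.Namespace, p.Name, p.NodeName)
	}
}
```

Reading from a local cache keeps the cost of adding a UDN independent of API server round trips, which is the performance concern the commit message raises.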

There was a previous bug where, when an egress packet was SNAT'ed to the node IP using a nodeport source port, reply traffic would get DNAT'ed to the nodeport load balancer. This happened because the egress connections were not conntracked correctly. This was fixed via:
https://issues.redhat.com/browse/OCPBUGS-25889
https://issues.redhat.com/browse/FDP-291
However, that fix was not hardware offloadable. The ideal fix here would be to always commit to conntrack and have it be HW offloadable. Until we have a better solution, we can configure the port range for OVN to use on its SNAT. This applies to all SNATs for traffic that enters the local host or leaves the host. The new config option --ephemeral-port-range "<minPort>-<maxPort>" can be used to specify the port range to use with OVN. If not provided, this value will be automatically derived from the ephemeral port range in /proc/sys/net/ipv4/ip_local_port_range, which is typically set already to avoid nodeport range conflicts.
Signed-off-by: Tim Rozet <trozet@redhat.com>
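The two sources of the port range described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the upstream implementation: `parsePortRange`, `parseKernelPortRange`, and `validatePorts` are hypothetical names. The only details taken from the commit message are the `<minPort>-<maxPort>` option format and the `/proc/sys/net/ipv4/ip_local_port_range` path; that file holds two whitespace-separated integers on Linux.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePortRange parses "<minPort>-<maxPort>", the format the commit message
// gives for --ephemeral-port-range. Hypothetical helper for illustration.
func parsePortRange(s string) (int, int, error) {
	lo, hi, ok := strings.Cut(strings.TrimSpace(s), "-")
	if !ok {
		return 0, 0, fmt.Errorf("invalid port range %q: want <minPort>-<maxPort>", s)
	}
	return validatePorts(lo, hi)
}

// parseKernelPortRange parses the content of ip_local_port_range, which is
// two whitespace-separated integers, for example "32768\t60999\n".
func parseKernelPortRange(content string) (int, int, error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("unexpected kernel port range %q", content)
	}
	return validatePorts(fields[0], fields[1])
}

// validatePorts converts both bounds and checks 1 <= min <= max <= 65535.
func validatePorts(lo, hi string) (int, int, error) {
	minPort, err := strconv.Atoi(strings.TrimSpace(lo))
	if err != nil {
		return 0, 0, fmt.Errorf("invalid min port %q: %w", lo, err)
	}
	maxPort, err := strconv.Atoi(strings.TrimSpace(hi))
	if err != nil {
		return 0, 0, fmt.Errorf("invalid max port %q: %w", hi, err)
	}
	if minPort < 1 || maxPort > 65535 || minPort > maxPort {
		return 0, 0, fmt.Errorf("port range %d-%d is out of bounds", minPort, maxPort)
	}
	return minPort, maxPort, nil
}

func main() {
	// No explicit option given here, so fall back to the kernel setting.
	data, err := os.ReadFile("/proc/sys/net/ipv4/ip_local_port_range")
	if err != nil {
		fmt.Println("kernel range unavailable:", err)
		return
	}
	minPort, maxPort, err := parseKernelPortRange(string(data))
	if err != nil {
		fmt.Println("cannot derive range:", err)
		return
	}
	fmt.Printf("SNAT port range: %d-%d\n", minPort, maxPort)
}
```

Validating both inputs through one function keeps the explicit option and the derived default subject to the same bounds checks.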

Signed-off-by: Tim Rozet <trozet@redhat.com>

Kubeclient gets for nodes and pods were being used in other places in the code. Removed all of their uses except for specific cases like the ovn db manager and windows, where we do not have full informer setups. Transitioning to the factory created a cyclical dependency between the metrics and factory libraries, due to the configuration duration recorder. Split the configuration duration recorder into its own sub-package under metrics/recorders.
Signed-off-by: Tim Rozet <trozet@redhat.com>

Introduced in 836ec36. This would just cause node updates to fire HandleAddUpdateNodeEvent every time, as the code prior to the aforementioned commit would have.
Signed-off-by: Tim Rozet <trozet@redhat.com>

We have unit tests that check to see if only certain annotations were removed, rather than an all-or-nothing approach. Additionally, this function was added as a failsafe in case a user did modify the annotations, or some other unforeseen event occurred where the annotations are now missing. Change the function to check each annotation (if it applies to the allocator).
Signed-off-by: Tim Rozet <trozet@redhat.com>

Retry all pods smarter

Fix node update check for network cluster controller

Configures ephemeral port range for OVN SNAT'ing

asood-rh · 2025-06-04T22:14:14Z

/test e2e-aws-ovn-fdp-qe

martinkennelly · 2025-06-05T08:07:26Z

/retitle OCPBUGS-55098: DownStream Merge [06-04-2025]

openshift-ci-robot · 2025-06-05T08:07:35Z

@jluhrsen: This pull request references Jira Issue OCPBUGS-55098, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.20.0) matches configured target version for branch (4.20.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

martinkennelly · 2025-06-05T08:08:01Z

cc @trozet

jluhrsen · 2025-06-05T20:43:16Z

/retest

asood-rh · 2025-06-06T01:31:44Z

/test e2e-aws-ovn-fdp-qe

asood-rh · 2025-06-06T11:58:50Z

/test e2e-aws-ovn-fdp-qe

openshift-ci · 2025-06-06T15:24:26Z

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | `4699b44` | link | false | `/test security` |
| ci/prow/e2e-aws-ovn-fdp-qe | `4699b44` | link | false | `/test e2e-aws-ovn-fdp-qe` |
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | `4699b44` | link | false | `/test e2e-aws-ovn-hypershift-kubevirt` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

asood-rh · 2025-06-06T19:46:05Z

@jechen0648 looking at OCP-70667 failure

d31d171 commit adds the port range. I believe this is causing the failure.

jluhrsen · 2025-06-07T17:56:01Z

> @jechen0648 looking at OCP-70667 failure
>
> d31d171 commit adds the port range. I believe this is causing the failure.

/hold

@asood-rh do we need to get something fixed u/s and update this d/s merge?

asood-rh · 2025-06-09T12:11:11Z

> > @jechen0648 looking at OCP-70667 failure
> > d31d171 commit adds the port range. I believe this is causing the failure.
>
> /hold
>
> @asood-rh do we need to get something fixed u/s and update this d/s merge?

@jluhrsen It is just automation script that needs to be fixed so that it passes. It is not a product bug.

jluhrsen · 2025-06-09T20:24:08Z

> > > @jechen0648 looking at OCP-70667 failure
> > > d31d171 commit adds the port range. I believe this is causing the failure.
> >
> > /hold
> > @asood-rh do we need to get something fixed u/s and update this d/s merge?
>
> @jluhrsen It is just automation script that needs to be fixed so that it passes. It is not a product bug.

thank you
/hold cancel

trozet · 2025-06-10T19:13:05Z

/lgtm

openshift-ci · 2025-06-10T19:14:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jluhrsen, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [trozet]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2025-06-10T19:19:09Z

@jluhrsen: Jira Issue OCPBUGS-55098: All pull requests linked via external trackers have merged:

openshift/ovn-kubernetes#2618

Jira Issue OCPBUGS-55098 has been moved to the MODIFIED state.

openshift-bot · 2025-06-10T22:50:46Z

[ART PR BUILD NOTIFIER]

Distgit: ovn-kubernetes-base
This PR has been included in build ose-ovn-kubernetes-base-container-v4.20.0-202506102141.p0.g36e5a1d.assembly.stream.el9.
All builds following this will include this PR.

openshift-bot · 2025-06-10T23:50:54Z

[ART PR BUILD NOTIFIER]

Distgit: ovn-kubernetes-microshift
This PR has been included in build ovn-kubernetes-microshift-container-v4.20.0-202506102141.p0.g36e5a1d.assembly.stream.el9.
All builds following this will include this PR.

openshift-bot · 2025-06-11T00:05:55Z

[ART PR BUILD NOTIFIER]

Distgit: ose-ovn-kubernetes
This PR has been included in build ose-ovn-kubernetes-container-v4.20.0-202506102141.p0.g36e5a1d.assembly.stream.el9.
All builds following this will include this PR.

trozet and others added 10 commits May 20, 2025 16:06

Use watchFactory instead of kclient for gateway snat cleanup

7a30735

Signed-off-by: Tim Rozet <trozet@redhat.com>

Fix node update check for network cluster controller

81ab595

Introduced in 836ec36. This would just cause node updates to fire HandleAddUpdateNodeEvent every time, as the code prior to the aforementioned commit would have.
Signed-off-by: Tim Rozet <trozet@redhat.com>

Merge pull request #5261 from trozet/fix_enqueue_pods

c01e2ec

Retry all pods smarter

Merge pull request #5275 from trozet/fix_node_update

b31c67a

Fix node update check for network cluster controller

Merge pull request #5265 from trozet/specify_ovn_ephemeral_port_range

2017ede

Configures ephemeral port range for OVN SNAT'ing

Merge remote-tracking branch 'ovn-org/master' into d/s-merge-06-04-2025

4699b44

openshift-ci bot requested review from JacobTanenbaum and abhat June 4, 2025 22:12

openshift-ci bot changed the title ~~DownStream Merge [06-04-2025]~~ OCPBUGS-55098: DownStream Merge [06-04-2025] Jun 5, 2025

openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/valid-bug (indicates that a referenced Jira bug is valid for the branch this PR is targeting) labels Jun 5, 2025

openshift-ci bot requested a review from anuragthehatter June 5, 2025 08:07

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jun 7, 2025

openshift-ci bot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jun 9, 2025

openshift-ci bot assigned trozet Jun 10, 2025

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Jun 10, 2025

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jun 10, 2025

openshift-merge-bot bot merged commit 36e5a1d into openshift:master Jun 10, 2025
40 of 43 checks passed
