OCPBUGS-61454: [4.19] allow default network -> localnet on the same node for any localnet subnet #2753

Merged: sdodson merged 14 commits into openshift:release-4.19 from ricky-rav:OCPBUGS-59657_419 on Oct 8, 2025

@ricky-rav
Contributor

@ricky-rav ricky-rav commented Sep 9, 2025

Upstream: ovn-kubernetes/ovn-kubernetes#5480
Downstream master: #2750
4.20: #2751

For the 4.18 backport, we'll need to wait for #2745 #2663 to merge first.

trozet and others added 14 commits September 9, 2025 16:41
Fixes regression from 1448d5a

The previous commit dropped matching on in_port so that localnet ports
would also use table 1. This allows reply packets from a localnet pod
towards the shared OVN/LOCAL IP to be sent to the correct port.

However, a regression was introduced where traffic coming from these
localnet ports to any destination would be sent to table 1. Egress
traffic from the localnet ports is not committed to conntrack, so by
sending to table=1 via CT we were getting a miss.

This is especially bad for hardware offload where a localnet port is
being used as the Geneve encap port. In this case all geneve traffic
misses in CT lookup and is not offloaded.

Table 1 is intended to be for handling IP traffic destined to the shared
Gateway IP/MAC that both the Host and OVN use. It is also used to handle
reply traffic for Egress IP. To fix this problem, we can add dl_dst
match criteria to this flow, ensuring that only traffic destined to the
Host/OVN goes to table 1.

Even after this fix, localnet -> host/OVN egress traffic still enters
table 1 and misses in CT. This could potentially be fixed by always
committing egress traffic, but that might carry a performance penalty,
so that fix is deferred to a later date.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
(cherry picked from commit 318f8ce)
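
The dl_dst restriction described in the commit message above can be sketched as a small flow-string builder. This is only an illustration, not the ovn-kubernetes implementation (which is written in Go): the function name, the priority value, the conntrack zone number, and the MAC address are all assumptions made for the example.

```python
def table1_dispatch_flow(bridge_mac: str, ct_zone: int, ipv6: bool = False) -> str:
    """Return an ovs-ofctl style flow that sends IP traffic through conntrack
    to table 1 only when it is addressed to the shared host/OVN MAC.

    Hypothetical helper: priority 50 and the action layout are assumptions.
    """
    proto = "ipv6" if ipv6 else "ip"
    # Without dl_dst, every IP packet (including localnet egress that was
    # never committed to conntrack) would be sent to table 1 and miss in CT.
    match = f"table=0,priority=50,{proto},dl_dst={bridge_mac}"
    action = f"actions=ct(zone={ct_zone},nat,table=1)"
    return f"{match},{action}"


if __name__ == "__main__":
    # Placeholder MAC and zone; real values come from the node's bridge setup.
    print(table1_dispatch_flow("02:42:ac:12:00:02", 64000))
```

Because the match now includes the bridge MAC, frames addressed to any other next hop never reach this flow and fall through to lower-priority flows.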
We did this for IPv4 in 1448d5a, but forgot about IPv6.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 66d8f14)
Add dl_dst=$breth0 to table=0, prio=50 for IPv6

We want to match in table=1 only conntrack'ed reply traffic whose next hop is either OVN or the host. As a consequence, localnet traffic whose next hop is an external router (and that might or might not be destined to OVN/host) should bypass table=1 and just hit the NORMAL flow in table=0.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit ef1aa99)
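
To make the intended behavior concrete, here is a toy model of the lookup in table 0. It is a sketch under simplifying assumptions: only two flows are modeled, the MAC addresses are made up, and the action strings are labels rather than real OpenFlow actions.

```python
BRIDGE_MAC = "02:00:00:00:00:01"  # hypothetical breth0 MAC shared by host and OVN
ROUTER_MAC = "02:00:00:00:00:fe"  # hypothetical external router MAC

# Simplified table 0: only the two flows relevant to this commit.
TABLE0 = [
    {"priority": 50, "match": {"eth_type": "ipv6", "dl_dst": BRIDGE_MAC},
     "action": "ct,table=1"},
    {"priority": 0, "match": {}, "action": "NORMAL"},
]


def lookup(packet: dict, flows: list = TABLE0) -> str:
    """Return the action of the highest-priority flow whose match fields
    all equal the packet's fields (a toy version of OpenFlow matching)."""
    for flow in sorted(flows, key=lambda f: f["priority"], reverse=True):
        if all(packet.get(key) == value for key, value in flow["match"].items()):
            return flow["action"]
    return "drop"


if __name__ == "__main__":
    # Reply traffic whose next hop is OVN or the host goes to table 1.
    print(lookup({"eth_type": "ipv6", "dl_dst": BRIDGE_MAC}))
    # Localnet traffic whose next hop is an external router bypasses table 1.
    print(lookup({"eth_type": "ipv6", "dl_dst": ROUTER_MAC}))
```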
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 4ce92a9)
We already tested localnet -> host, let's also cover connections initiated from the host.
The localnet uses IPs in the same subnet as the host network.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit a5029f8)
We have two non-InterConnect CI lanes for multihoming, but only one with IC enabled (and local gw). We need coverage with IC enabled for both gateway modes, so let's convert an existing non-IC lane to IC-enabled, dualstack, and gateway=shared for better coverage.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit bf6f9c1)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 6de44ef)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit c4cc25a)
This is needed because we will need to generate IPs from different subnets than just the host subnet.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit eb5f3c1)
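
Generating test IPs from an arbitrary subnet, rather than only the host subnet, might look roughly like the following. The helper name and the example subnets are hypothetical; the actual e2e helpers are written in Go and may work differently.

```python
import ipaddress


def nth_ip(subnet: str, offset: int) -> str:
    """Return the address at `offset` inside `subnet` (IPv4 or IPv6).

    Hypothetical helper for illustration; offset 0 (the network address)
    and the last address in the subnet are rejected.
    """
    network = ipaddress.ip_network(subnet)
    if not 0 < offset < network.num_addresses - 1:
        raise ValueError(f"offset {offset} is outside the usable range of {subnet}")
    return str(network[offset])


if __name__ == "__main__":
    print(nth_ip("192.168.100.0/24", 10))
    print(nth_ip("fd00:10::/64", 10))
```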
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit f82e101)
The localnet is on a different subnet than the host subnet, the corresponding NAD is configured with a VLAN ID, and the localnet pod uses an external router to communicate with cluster pods.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 69ec569)
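
A localnet NAD configuration of the kind this test describes could be assembled as below. This is a sketch: the field names are my recollection of the OVN-Kubernetes multi-homing configuration and should be checked against its documentation, and the network name, namespace, subnet, and VLAN ID are invented for the example.

```python
import json


def localnet_nad_config(name: str, namespace: str, subnet: str, vlan_id: int) -> str:
    """Build the CNI config JSON that would go into a NAD's spec.config.

    Field names are assumed from the OVN-Kubernetes multi-homing docs;
    all values passed in by the caller are placeholders.
    """
    return json.dumps({
        "cniVersion": "0.3.1",
        "name": name,
        "type": "ovn-k8s-cni-overlay",
        "topology": "localnet",
        "subnets": subnet,      # deliberately not the host subnet
        "vlanID": vlan_id,      # traffic leaves the node tagged with this VLAN
        "netAttachDefName": f"{namespace}/{name}",
    }, indent=2)


if __name__ == "__main__":
    print(localnet_nad_config("vlan-net", "default", "172.31.0.0/24", 10))
```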
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 51eae7a)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit dea42b4)
In testing we saw how an invalid conntrack state would drop all echo requests after the first one. Let's send three pings in each test then.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit b004ed0)
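
The reasoning behind sending three pings can be sketched as follows: a single echo request would succeed even if every later request were dropped, so a check needs to send several and confirm that all of them were answered. The helper names are hypothetical, and the parsing assumes the summary line printed by iputils ping; the real e2e tests may verify connectivity differently.

```python
import ipaddress
import re


def ping_command(target: str, count: int = 3) -> list:
    """Build an iputils-style ping argv sending `count` echo requests."""
    family_flag = "-6" if ipaddress.ip_address(target).version == 6 else "-4"
    return ["ping", family_flag, "-c", str(count), target]


def all_replies_received(ping_output: str, count: int = 3) -> bool:
    """Return True only if the summary line reports `count` packets sent
    and `count` received; a single reply is not enough."""
    summary = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", ping_output
    )
    if summary is None:
        return False
    return int(summary.group(1)) == count and int(summary.group(2)) == count


if __name__ == "__main__":
    print(ping_command("fd00:10::a"))
    print(all_replies_received("3 packets transmitted, 1 received, 66% packet loss"))
```

Checking the received count explicitly matters because ping run with only `-c` exits successfully as long as at least one reply arrives.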
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 9, 2025
@openshift-ci-robot
Contributor

@ricky-rav: This pull request references Jira Issue OCPBUGS-61454, which is invalid:

  • expected dependent Jira Issue OCPBUGS-61453 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ricky-rav
Contributor Author

/retest

(followed by 2 similar /retest comments from @ricky-rav)

@openshift-ci
Contributor

openshift-ci bot commented Sep 26, 2025

@ricky-rav: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview | fe444a7 | link | false | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
| ci/prow/e2e-aws-ovn-serial-ipsec | fe444a7 | link | false | /test e2e-aws-ovn-serial-ipsec |
| ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview | fe444a7 | link | false | /test e2e-aws-ovn-hypershift-conformance-techpreview |
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | fe444a7 | link | false | /test e2e-aws-ovn-hypershift-kubevirt |
| ci/prow/security | fe444a7 | link | false | /test security |
| ci/prow/okd-scos-e2e-aws-ovn | fe444a7 | link | false | /test okd-scos-e2e-aws-ovn |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ricky-rav
Contributor Author

/retest
/payload 4.19 ci blocking
/payload 4.19 nightly blocking

@openshift-ci
Contributor

openshift-ci bot commented Oct 2, 2025

@ricky-rav: trigger 5 job(s) of type blocking for the ci release of OCP 4.19

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aks
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/377ee090-9f74-11f0-8df4-85b9fb56a36c-0

trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/377ee090-9f74-11f0-8df4-85b9fb56a36c-1

@asood-rh
Contributor

asood-rh commented Oct 2, 2025

/verified by @asood-rh

Details in https://issues.redhat.com/browse/OCPBUGS-61454

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 2, 2025
@openshift-ci-robot
Contributor

@asood-rh: This PR has been marked as verified by @asood-rh.

@ricky-rav
Contributor Author

4.19 ci blocking
All jobs passed except for periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aks, which never really took off -> To retest

4.19 nightly blocking
3 failing jobs:

  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
    • build11 error after 30m for all runs, I will rerun this
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
    • build11 error after 30m for all runs, I will rerun this
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
    • failed for lack of resources (leases) on Azure => I will rerun this too
INFO[2025-10-02T10:23:50Z] Acquiring leases for test e2e-azure-ovn-upgrade: [azure-2-quota-slice] 
ERRO[2025-10-02T12:23:50Z] error: Failed to acquire resource, current capacity: 0 free, 57 leased 
INFO[2025-10-02T12:23:50Z] Ran for 2h40m8s                              
ERRO[2025-10-02T12:23:50Z] Some steps failed:                           
ERRO[2025-10-02T12:23:50Z] 
  * could not run steps: step e2e-azure-ovn-upgrade failed: failed to acquire lease for "azure-2-quota-slice": resources not found 

/payload-aggregate periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aks 5
/payload-aggregate periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance 5
/payload-aggregate periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance 5
/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade 5

@openshift-ci
Contributor

openshift-ci bot commented Oct 3, 2025

@ricky-rav: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aks
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5abadb00-a06d-11f0-95dc-f3e7f32af62c-0

@tssurya
Contributor

tssurya commented Oct 7, 2025

/override ci/prow/lint

@openshift-ci
Contributor

openshift-ci bot commented Oct 7, 2025

@tssurya: Overrode contexts on behalf of tssurya: ci/prow/lint

@tssurya
Contributor

tssurya commented Oct 8, 2025

/lgtm

@tssurya
Contributor

tssurya commented Oct 8, 2025

Next time let's use -x from 4.20 for the commits, not master

@tssurya
Contributor

tssurya commented Oct 8, 2025

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Oct 8, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 8, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ricky-rav, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2025
@sdodson
Member

sdodson commented Oct 8, 2025

/label jira/valid-bug
/unlabel jira/invalid-bug
Upstream PR is marked as verified so will move to verified as soon as an accepted payload lands

@openshift-ci openshift-ci bot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 8, 2025
@sdodson sdodson merged commit fa9e097 into openshift:release-4.19 Oct 8, 2025
48 of 58 checks passed
@openshift-ci-robot
Contributor

@ricky-rav: Jira Issue OCPBUGS-61454: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-61454 has not been moved to the MODIFIED state.

This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.

@sdodson sdodson removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 8, 2025
@ricky-rav
Contributor Author

/jira refresh

@openshift-ci-robot
Contributor

@ricky-rav: Jira Issue Verification Checks: Jira Issue OCPBUGS-61454
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-61454 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

@openshift-merge-robot
Contributor

Fix included in accepted release 4.19.0-0.nightly-2025-10-09-001851

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria
