@kyrtapz kyrtapz commented Aug 5, 2025

Bump OVN to 25.03.0-73.el9fdp for OCP and 25.03.1-36.el9s for OKD.
Using a different version for OKD as it is currently the only one available.

This is a list of relevant bug fixes and new core OVN features picked up by the bump:

Bug fixes:
==========
- pinctrl: Fix missing garp.
- binding: Avoid 100% CPU when postponing claims.
- northd: Sample_Collector.set_ids can actually be 32-bit values.

New Features:
=============
- Added support to choose selection methods - dp_hash or
  hash (with specified hash fields) for ECMP routes
  while choosing nexthop.
- Added support for Spine-Leaf topology of logical switches by adding
  a new LSP type 'switch' that can directly connect two logical switches.
  Supported for both distributed and transit switches.
- SSL/TLS:
  * TLSv1 and TLSv1.1 protocols are deprecated and disabled by default
    on OpenFlow and database connections.  Use --ssl-protocols to turn
    them back on.  Support will be fully removed in the next release.
  * OpenSSL 1.1.1 or newer is now required for SSL/TLS support.
  * The protocol list in --ssl-protocols or corresponding database column
    now supports specifying simple protocol ranges like:
      - "TLSv1-TLSv1.2" to enable all protocols between TLSv1 and TLSv1.2.
      - "TLSv1.2+" to enable protocol TLSv1.2 and later.
    The value must be a list of protocols or exactly one protocol range.
  * Added explicit support for TLSv1.3.  It can now be enabled via
    --ssl-protocols (TLSv1.3 was supported in earlier versions only when
    this option was not set).  TLS ciphersuites for TLSv1.3 and later can
    be configured via --ssl-ciphersuites (--ssl-ciphers only applies to
    TLSv1.2 and earlier).
- Add "arp-nd-max-timeout-sec" config option to vswitchd external-ids to
  configure the interval (in seconds) between ovn-controller originated
  ARP/ND packets used for tracking ECMP next hop MAC addresses.
- Auto flush ECMP symmetric reply connection states when an ECMP route is
  removed by the CMS.  This behavior is controlled by the
  "ecmp_nexthop_monitor_enable" config option in the NB_Global table.
  Disabled by default.
- Improved handling of IPv6 traffic by enabling address prefix tracking
  in OVS for both IPv4 and IPv6 addresses, whenever possible, reducing
  the amount of IPv6 datapath flows.
- Add concept of Transit Routers, users are now allowed to specify
  options:requested-chassis for router ports; if the chassis is remote
  then the router port will behave as a remote port.
- Added a new ACL option "persist-established" that allows for
  established connections to bypass ACL matching. This way, if an ACL
  match changes, traffic on the established connection can still pass.
- Logical router policies can now be arranged in chains. Using the new
  "jump" action, combined with new "chain" and "jump_chain" columns,
  allows for policies to be chained together.
- Dynamic Routing support (FRR BGP integration for unicast routing)
- Add "options:ct-commit-all" to LR, that enables commit of all traffic
  to DNAT and SNAT zone when LR is stateful.
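
The `--ssl-protocols` range rules quoted in the list above can be illustrated with a short Python sketch. It is not the OVS/OVN parser: the function name `expand_ssl_protocols`, the comma-separated list form, and the assumption that TLSv1.3 is the newest known protocol are all illustrative choices made here.

```python
# Illustrative sketch only: mirrors the rules quoted in the release notes,
# it is NOT the actual OVS/OVN option parser.
KNOWN = ["TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"]  # ordered oldest to newest


def _index(name: str) -> int:
    """Position of a protocol in KNOWN, or ValueError if it is not known."""
    name = name.strip()
    if name not in KNOWN:
        raise ValueError(f"unknown protocol: {name!r}")
    return KNOWN.index(name)


def expand_ssl_protocols(value: str) -> list[str]:
    """Return the protocols enabled by a --ssl-protocols style value.

    The value is either a list of protocols (assumed comma-separated here)
    or exactly one range; mixing the two forms is rejected.
    """
    value = value.strip()
    if value.endswith("+"):
        # Open-ended range: "TLSv1.2+" means TLSv1.2 and later.
        return KNOWN[_index(value[:-1]):]
    if "-" in value:
        # Closed range: "TLSv1-TLSv1.2" means everything between both ends.
        low, high = value.split("-", 1)
        low_i, high_i = _index(low), _index(high)
        if low_i > high_i:
            raise ValueError(f"empty range: {value!r}")
        return KNOWN[low_i:high_i + 1]
    # Plain list of individual protocols; validate each, then normalize order.
    items = [item.strip() for item in value.split(",")]
    for item in items:
        _index(item)
    return [proto for proto in KNOWN if proto in items]


print(expand_ssl_protocols("TLSv1-TLSv1.2"))  # ['TLSv1', 'TLSv1.1', 'TLSv1.2']
print(expand_ssl_protocols("TLSv1.2+"))       # ['TLSv1.2', 'TLSv1.3']
```

A value that mixes a range with other entries, such as `TLSv1.2+,TLSv1`, raises `ValueError` in this sketch, matching the "exactly one protocol range" wording.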

There is a slight error in the commit message as pointed out here. Due to time constraints we've decided to go with it to avoid re-running CI.

@kyrtapz kyrtapz changed the title from "Bump OVN to 25.03" to "NO-JIRA: Bump OVN to 25.03" Aug 5, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Aug 5, 2025
@openshift-ci-robot

@kyrtapz: This pull request explicitly references no jira issue.

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Aug 5, 2025

kyrtapz commented Aug 5, 2025

/hold
Let's wait for the CentOS release of OVN 25.03.

@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 5, 2025

kyrtapz commented Aug 5, 2025

/cc @tssurya @almusil @dceara

@openshift-ci openshift-ci bot requested review from almusil, dceara and tssurya August 5, 2025 17:14

kyrtapz commented Aug 6, 2025

/retest

dceara commented Aug 6, 2025

Hi @kyrtapz,

This is a list of relevant bug fixes and new core OVN features picked up by the bump:

Bug fixes:
==========
- logical-fields: Fix IPv6 dp flow explosion caused by ip6.mcast_rsvd. (#FDP-1557)
  https://issues.redhat.com/browse/FDP-1557
- controller: Slightly optimize the runtime_data handler for sb_ro.
- Revert "northd: Don't skip the unSNAT stage for traffic towards VIPs."
  - fixes HWOL for node port traffic with NVidia NICs
- controller: Install QoS rules even on 'system' ports. (#FDP-1472)
  https://issues.redhat.com/browse/FDP-1472
- controller: Make sure we run engine_cleanup after thread destroy.
- northd: Sample_Collector.set_ids can actually be 32-bit values.

New Features:
=============
- Added support to choose selection methods - dp_hash or
  hash (with specified hash fields) for ECMP routes
  while choosing nexthop.
- Added support for Spine-Leaf topology of logical switches by adding
  a new LSP type 'switch' that can directly connect two logical switches.
  Supported for both distributed and transit switches.
- SSL/TLS:
  * TLSv1 and TLSv1.1 protocols are deprecated and disabled by default
    on OpenFlow and database connections.  Use --ssl-protocols to turn
    them back on.  Support will be fully removed in the next release.
  * OpenSSL 1.1.1 or newer is now required for SSL/TLS support.
  * The protocol list in --ssl-protocols or corresponding database column
    now supports specifying simple protocol ranges like:
      - "TLSv1-TLSv1.2" to enable all protocols between TLSv1 and TLSv1.2.
      - "TLSv1.2+" to enable protocol TLSv1.2 and later.
    The value must be a list of protocols or exactly one protocol range.
  * Added explicit support for TLSv1.3.  It can now be enabled via
    --ssl-protocols (TLSv1.3 was supported in earlier versions only when
    this option was not set).  TLS ciphersuites for TLSv1.3 and later can
    be configured via --ssl-ciphersuites (--ssl-ciphers only applies to
    TLSv1.2 and earlier).
- Add "arp-nd-max-timeout-sec" config option to vswitchd external-ids to
  configure the interval (in seconds) between ovn-controller originated
  ARP/ND packets used for tracking ECMP next hop MAC addresses.
- Auto flush ECMP symmetric reply connection states when an ECMP route is
  removed by the CMS.  This behavior is controlled by the
  "ecmp_nexthop_monitor_enable" config option in the NB_Global table.
  Disabled by default.
- Improved handling of IPv6 traffic by enabling address prefix tracking
  in OVS for both IPv4 and IPv6 addresses, whenever possible, reducing
  the amount of IPv6 datapath flows.
- Add concept of Transit Routers, users are now allowed to specify
  options:requested-chassis for router ports; if the chassis is remote
  then the router port will behave as a remote port.
- Added a new ACL option "persist-established" that allows for
  established connections to bypass ACL matching. This way, if an ACL
  match changes, traffic on the established connection can still pass.
- Logical router policies can now be arranged in chains. Using the new
  "jump" action, combined with new "chain" and "jump_chain" columns,
  allows for policies to be chained together.
- Dynamic Routing support (FRR BGP integration for unicast routing)
- Add "options:ct-commit-all" to LR, that enables commit of all traffic
  to DNAT and SNAT zone when LR is stateful.

It would be great if you could add that to the commit message.

Thanks,
Dumitru

A list of relevant bug fixes and new core OVN features picked up by the bump:

Bug fixes:
==========
- logical-fields: Fix IPv6 dp flow explosion caused by ip6.mcast_rsvd. (#FDP-1557)
  https://issues.redhat.com/browse/FDP-1557
- controller: Slightly optimize the runtime_data handler for sb_ro.
- Revert "northd: Don't skip the unSNAT stage for traffic towards VIPs."
  - fixes HWOL for node port traffic with NVidia NICs
- controller: Install QoS rules even on 'system' ports. (#FDP-1472)
  https://issues.redhat.com/browse/FDP-1472
- controller: Make sure we run engine_cleanup after thread destroy.
- northd: Sample_Collector.set_ids can actually be 32-bit values.

New Features:
=============
- Added support to choose selection methods - dp_hash or
  hash (with specified hash fields) for ECMP routes
  while choosing nexthop.
- Added support for Spine-Leaf topology of logical switches by adding
  a new LSP type 'switch' that can directly connect two logical switches.
  Supported for both distributed and transit switches.
- SSL/TLS:
  * TLSv1 and TLSv1.1 protocols are deprecated and disabled by default
    on OpenFlow and database connections.  Use --ssl-protocols to turn
    them back on.  Support will be fully removed in the next release.
  * OpenSSL 1.1.1 or newer is now required for SSL/TLS support.
  * The protocol list in --ssl-protocols or corresponding database column
    now supports specifying simple protocol ranges like:
      - "TLSv1-TLSv1.2" to enable all protocols between TLSv1 and TLSv1.2.
      - "TLSv1.2+" to enable protocol TLSv1.2 and later.
    The value must be a list of protocols or exactly one protocol range.
  * Added explicit support for TLSv1.3.  It can now be enabled via
    --ssl-protocols (TLSv1.3 was supported in earlier versions only when
    this option was not set).  TLS ciphersuites for TLSv1.3 and later can
    be configured via --ssl-ciphersuites (--ssl-ciphers only applies to
    TLSv1.2 and earlier).
- Add "arp-nd-max-timeout-sec" config option to vswitchd external-ids to
  configure the interval (in seconds) between ovn-controller originated
  ARP/ND packets used for tracking ECMP next hop MAC addresses.
- Auto flush ECMP symmetric reply connection states when an ECMP route is
  removed by the CMS.  This behavior is controlled by the
  "ecmp_nexthop_monitor_enable" config option in the NB_Global table.
  Disabled by default.
- Improved handling of IPv6 traffic by enabling address prefix tracking
  in OVS for both IPv4 and IPv6 addresses, whenever possible, reducing
  the amount of IPv6 datapath flows.
- Add concept of Transit Routers, users are now allowed to specify
  options:requested-chassis for router ports; if the chassis is remote
  then the router port will behave as a remote port.
- Added a new ACL option "persist-established" that allows for
  established connections to bypass ACL matching. This way, if an ACL
  match changes, traffic on the established connection can still pass.
- Logical router policies can now be arranged in chains. Using the new
  "jump" action, combined with new "chain" and "jump_chain" columns,
  allows for policies to be chained together.
- Dynamic Routing support (FRR BGP integration for unicast routing)
- Add "options:ct-commit-all" to LR, that enables commit of all traffic
  to DNAT and SNAT zone when LR is stateful.

Co-authored-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Patryk Diak <pdiak@redhat.com>

dceara commented Aug 6, 2025

/lgtm

Thanks!

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Aug 6, 2025

igsilya commented Aug 6, 2025

I assume this list is compiled from the newer build. At least the fix above is not available in the 25.03.0-73.el9fdp. It is in the 25.03.1-36.el9s OKD version though, but I'd guess the el9 build is more important here.
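
To make the difference between the two builds concrete, here is a minimal Python sketch that splits the version strings from this PR into an upstream version, a build number, and a distribution tag. The `Build` type and the `parse_build` helper are hypothetical names introduced only for this illustration, and the `version-release.dist` layout is an assumption based on how the two strings look.

```python
import re
from typing import NamedTuple


class Build(NamedTuple):
    version: tuple[int, ...]  # upstream version, e.g. (25, 3, 0)
    release: int              # downstream build number
    dist: str                 # distribution tag, e.g. "el9fdp"


def parse_build(text: str) -> Build:
    """Parse a string shaped like '25.03.0-73.el9fdp' (assumed layout)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)\.(\w+)", text)
    if match is None:
        raise ValueError(f"unrecognized build string: {text!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return Build(version, int(match.group(2)), match.group(3))


ocp = parse_build("25.03.0-73.el9fdp")
okd = parse_build("25.03.1-36.el9s")

# Only the upstream version is compared: the build numbers (73 vs 36) come
# with different distribution tags and are not assumed to be comparable.
print(ocp.version < okd.version)  # True
```

Under these assumptions the OKD build carries a newer upstream point release (25.03.1) than the OCP build (25.03.0), which is consistent with a fix being present in one and missing from the other.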

dceara commented Aug 6, 2025

> I assume this list is compiled from the newer build. At least the fix above is not available in the 25.03.0-73.el9fdp. It is in the 25.03.1-36.el9s OKD version though, but I'd guess the el9 build is more important here.

Thanks for pointing that out @igsilya! I had incorrectly compiled the list of bug fixes. It should actually be:

Bug fixes:
==========
- pinctrl: Fix missing garp.
- binding: Avoid 100% CPU when postponing claims.
- northd: Sample_Collector.set_ids can actually be 32-bit values.

Sorry, @kyrtapz, can you please update the commit message again? The features section should be correct.

Thanks,
Dumitru

tssurya commented Aug 7, 2025

/retest

tssurya commented Aug 7, 2025

@kyrtapz are you going to update the commit message again? I'm OK if you put the corrected info in the PR description instead, so that the PR description also shows the correct list of things. I'm more interested in CI.
DS Merge is ready and waiting for this

kyrtapz commented Aug 7, 2025

/test okd-scos-e2e-aws-ovn

kyrtapz commented Aug 7, 2025

Some failures are caused by openshift/kubernetes#2382.

kyrtapz commented Aug 7, 2025

/test okd-scos-e2e-aws-ovn

asood-rh commented Aug 7, 2025

/test e2e-aws-ovn-fdp-qe

kyrtapz commented Aug 7, 2025

Trying a NOOP run here: #2524 (comment)

Same failures seen on a noop PR: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2524/pull-ci-openshift-ovn-kubernetes-master-okd-scos-e2e-aws-ovn/1953477150501769216
So it is very unlikely that the OVN bump in this PR is causing it.
As a followup I started a thread with the OKD team: https://redhat-internal.slack.com/archives/CLKF3H5RS/p1754590575604929

kyrtapz commented Aug 7, 2025

> Some failures are caused by openshift/kubernetes#2382.

The fix merged.
/retest

asood-rh commented Aug 7, 2025

/test e2e-aws-ovn-fdp-qe

asood-rh commented Aug 7, 2025

/test e2e-aws-ovn-fdp-qe
An Egress IP test leaves an IP assigned to the node:

 error running /tmp/home/kubectl --server=https://api.ci-op-05mhig6h-96186.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/kubeconfig-2728899560 --namespace=e2e-test-networking-adminnetworkpolicy-2tnbf exec test-pod-73454-0 -- /bin/sh -x -c curl -I --connect-timeout 5 -s 10.0.109.68 10.0.83.191:30003:\nCommand stdout:\n\nstderr:\n+ curl -I --connect-timeout 5 -s 10.0.109.68 10.0.83.191:30003\ncommand terminated with exit code 28\n\nerror:\nexit status 28"

Will create an OCPQE ticket to find out which test does not clean up.

asood-rh commented Aug 8, 2025

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved label (Signifies that QE has signed off on this PR) Aug 8, 2025

asood-rh commented Aug 8, 2025

/test e2e-aws-ovn-fdp-qe
An Egress IP test leaves an IP assigned to the node:

error running /tmp/home/kubectl --server=https://api.ci-op-05mhig6h-96186.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/kubeconfig-2728899560 --namespace=e2e-test-networking-adminnetworkpolicy-2tnbf exec test-pod-73454-0 -- /bin/sh -x -c curl -I --connect-timeout 5 -s 10.0.109.68 10.0.83.191:30003:\nCommand stdout:\n\nstderr:\n+ curl -I --connect-timeout 5 -s 10.0.109.68 10.0.83.191:30003\ncommand terminated with exit code 28\n\nerror:\nexit status 28"
Will create an OCPQE ticket to find out which test does not clean up.

https://issues.redhat.com/browse/OCPQE-30442

kyrtapz commented Aug 8, 2025

/retest

tssurya commented Aug 8, 2025

/retest

kyrtapz commented Aug 8, 2025

ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview - This is failing on other PRs for the same reason, unlikely related to the OVN bump.
ci/prow/e2e-aws-ovn-hypershift-kubevirt - never passes anywhere :(
ci/prow/e2e-azure-ovn-upgrade - keeps failing for different reasons, I don't think it is related to this PR.
ci/prow/e2e-gcp-ovn-techpreview - hard to get it running; the last meaningful run (linked) shows a failure that is unlikely to be caused by this PR. The job is not doing great in general.

kyrtapz commented Aug 8, 2025

/test e2e-azure-ovn
/test e2e-azure-ovn-upgrade
/test e2e-gcp-ovn-techpreview

tssurya commented Aug 8, 2025

/test qe-perfscale-aws-ovn-small-udn-density-churn-l3

kyrtapz commented Aug 8, 2025

qe-perfscale-aws-ovn-small-udn-density-churn-l3 is failing due to a known issue: https://issues.redhat.com/browse/OCPBUGS-59738

@tssurya tssurya left a comment

/lgtm

CI is looking good. To the best of our knowledge, nothing is broken here. In any case, we will have 1.5 sprints of soak time for the OVN bump in 4.20 before we GA, which is good.
This is blocking code from entering 4.19 since downstream merges are blocked. Let's get this in!
The required Azure jobs should hopefully pass over the weekend; the reasons they fail are not related to us!

openshift-ci bot commented Aug 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dceara, kyrtapz, tssurya

kyrtapz commented Aug 8, 2025

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 8, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD dca1e7d and 2 for PR HEAD 0a387dc in total

kyrtapz commented Aug 10, 2025

/retest-required

tssurya commented Aug 10, 2025

/retest-required

azure-upgrade seems adamant!

tssurya commented Aug 10, 2025

/tide refresh

tssurya commented Aug 10, 2025

Oh no @kyrtapz, I think this PR was opened before the most recent DS merge? So we might have merge pool churn; Tide tells me it's planning to retest 26 jobs :(

openshift-ci bot commented Aug 10, 2025

@kyrtapz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/qe-perfscale-aws-ovn-small-udn-density-churn-l3 | 0a387dc | link | false | /test qe-perfscale-aws-ovn-small-udn-density-churn-l3 |
| ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview | 0a387dc | link | false | /test e2e-aws-ovn-hypershift-conformance-techpreview |
| ci/prow/okd-scos-e2e-aws-ovn | 0a387dc | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | 0a387dc | link | false | /test e2e-aws-ovn-hypershift-kubevirt |
| ci/prow/security | 0a387dc | link | false | /test security |

kyrtapz commented Aug 11, 2025

/override ci/prow/lint

openshift-ci bot commented Aug 11, 2025

@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/lint

@openshift-merge-bot openshift-merge-bot bot merged commit c1ecb1a into openshift:master Aug 11, 2025
48 of 53 checks passed
@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ovn-kubernetes-base
This PR has been included in build ose-ovn-kubernetes-base-container-v4.20.0-202508110914.p0.gc1ecb1a.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ovn-kubernetes-microshift
This PR has been included in build ovn-kubernetes-microshift-container-v4.20.0-202508110914.p0.gc1ecb1a.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-ovn-kubernetes
This PR has been included in build ose-ovn-kubernetes-container-v4.20.0-202508110914.p0.gc1ecb1a.assembly.stream.el9.
All builds following this will include this PR.
