
Conversation

@jluhrsen
Contributor

@jluhrsen commented Dec 4, 2025

This is identical to the automated branch sync PR here, except for two one-line changes to fix the lint job. See that commit message for more details.

This PR is actually what enables linting, so we never noticed the issue before, although the issue was a harmless no-op.

With this PR we are in sync with 4.19 as we want to be, with the exception of having to re-pin libreswan because of OCPBUGS-55453.

igsilya and others added 30 commits October 16, 2025 07:59
OVN-Kubernetes is always lagging behind on the version of OVN it pins.
This is causing a lot of trouble with keeping up with bug fixes and
especially CVE fixes on older branches, resulting in scanners flagging
this image with poor security grades and much longer time for bug
fixes to be delivered to customers as the PR backporting process can
take weeks or even months.

Removing the pin, so every time the new build is released in FDP, it
automatically gets into versions of OpenShift that use it. There is
a pre-release testing process in place between FDP and OCP QE that
ensures the required test coverage before the new build is released
through FDP.

Keeping OKD versions separate since sometimes new major versions are
not released at the same time in FDP/RHEL and CentOS, so we may need
them to differ at some point in time.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When multiple networks support was first added, all controllers that
were added used the label "Secondary" to indicate they were not
"Default". When UDN was added, it allowed "Secondary" networks to
function as the primary network for a pod, creating terminology
confusion. We now treat non-default networks all as "User-Defined
Networks". This commit changes all naming to conform to the latter.

The only place "secondary" is used now is to distinguish whether a UDN
is acting as the primary or secondary network for a pod (its role).

The only exception to this is udn-isolation. I did not touch this
because it relies on dbIDs, which would impact functionality for
upgrade.

There is no functional change in this commit.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
(cherry picked from commit bbca874)
The k8s e2e utility functions AddOrUpdateLabelOnNode/RemoveLabelOffNode
don't work for labels without a value. The incorrect handling of these
labels caused a different node migration sequence than what the tests
intended.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 434b48f)
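As a rough illustration of the pitfall (a minimal sketch; setValuelessNodeLabel is a hypothetical helper, not the repo's actual fix), a value-less label can be applied reliably by patching the node object directly instead of going through the string-based e2e helpers:

```go
package e2elabels

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// setValuelessNodeLabel applies a label whose value is the empty string.
// The e2e helpers build "key=value" strings and mishandle empty values;
// a merge patch on the node object carries the empty value through intact.
func setValuelessNodeLabel(ctx context.Context, c kubernetes.Interface, nodeName, key string) error {
	patch, err := json.Marshal(map[string]any{
		"metadata": map[string]any{
			"labels": map[string]string{key: ""},
		},
	})
	if err != nil {
		return fmt.Errorf("building patch: %w", err)
	}
	_, err = c.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```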
There are two circumstances in which IP release was handled incorrectly:

* when a live migratable pod completed with no migration ongoing, its IPs
  were not being released, due to IsMigratedSourcePodStale outright
  assuming that a completed pod was stale.
* when a live migratable pod completed on a different node than the VM's
  original as part of a migration, its IPs were being released when they
  shouldn't have been; we were simply not checking whether it was a migration.

It also improves the tests to check for IP release.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 4c34982)
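A minimal sketch of the corrected decision, with the two checks reduced to booleans (the real code consults the pod phase and the kubevirt migration state; all names here are illustrative only):

```go
package main

import "fmt"

// releaseIPs sketches the two cases above for a live-migratable pod.
func releaseIPs(podCompleted, partOfMigration bool) bool {
	if !podCompleted {
		return false // a running pod keeps its IPs
	}
	if !partOfMigration {
		// Completed with no migration ongoing: the VM is gone, so the
		// IPs must be released (previously skipped because the pod was
		// unconditionally treated as a stale migration source).
		return true
	}
	// Completed on another node as part of a migration: the VM carried
	// its IPs over to the target pod, so do not release them.
	return false
}

func main() {
	fmt.Println(releaseIPs(true, false)) // true: release
	fmt.Println(releaseIPs(true, true))  // false: keep them for the VM
}
```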
Don't attempt to release IPs that are not managed by the local zone,
which can happen with live migratable pods; otherwise we would get
distracting error logs on release.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 7a155cc)
ConditionalIPRelease would always return false when checking IPs not
tracked in the local zone, so in that case we were not correctly
checking for colliding pods.

This was hidden by the fact that, until a very recent fix,
IsMigratedSourcePodStale was used just before instead of
AllVMPodsAreCompleted, and that would always return false for a
completed live migratable pod.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 0dc8f27)
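A heavily hedged sketch of the intent (all names illustrative, not the repo's actual signatures): the colliding-pod check should run even for IPs tracked by another zone, instead of short-circuiting to false:

```go
package kubevirt

// canReleaseIPs sketches the fix: previously the function effectively did
// "if !trackedLocally { return false }", skipping the colliding-pod check
// entirely for IPs tracked by another zone.
func canReleaseIPs(collidingPodExists func() (bool, error)) (bool, error) {
	colliding, err := collidingPodExists()
	if err != nil {
		return false, err
	}
	// Safe to release only when no other pod still uses these IPs.
	return !colliding, nil
}
```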
Or completion of a failed target pod

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit c1b02b5)
As it is the most complex scenario and a superset of testing without it

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit ef92f78)
I accidentally removed the check in a recent PR [1], which could have
performance consequences, as checking against other pods has a cost.
Reintroduce the check with a hopefully useful comment to prevent this
from happening again.

[1] ovn-kubernetes/ovn-kubernetes#5626

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 76f6439)
When processing pods during an EgressIP status update, the controller used to stop
iterating as soon as it encountered a pod in Pending state (in my case, pod IPs are
not found while the pod is Pending with a ContainerCreating status).
This caused any subsequent Running pods to be skipped, leaving their SNAT entries
unprogrammed on the egress node.

With this change, only Pending pods are skipped, while iteration continues for the
rest. This ensures that Running pods are properly processed and their SNAT entries
are programmed.

This change also skips pods that are unscheduled or use host networking.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
(cherry picked from commit 2afbaf6)
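A minimal sketch of the loop change, assuming simplified types (the real controller's pod handling is more involved): skip pods that cannot be programmed yet instead of aborting the whole loop.

```go
package egressip

import corev1 "k8s.io/api/core/v1"

// processPodsForEgressIP programs SNAT for each eligible pod. Pods that
// cannot be programmed yet are skipped with continue; previously the
// first Pending pod aborted the loop, leaving SNAT entries for later
// Running pods unprogrammed on the egress node.
func processPodsForEgressIP(pods []*corev1.Pod, program func(*corev1.Pod) error) error {
	for _, pod := range pods {
		if pod.Spec.HostNetwork || // host-networked pods are not SNATed
			pod.Spec.NodeName == "" || // unscheduled: no node to program
			pod.Status.Phase == corev1.PodPending { // no pod IPs yet
			continue
		}
		if err := program(pod); err != nil {
			return err
		}
	}
	return nil
}
```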
…d_4.20

[release-4.20] OCPBUGS-63631: Skip Pending pods in EgressIP status updates
…rry-pick-2721-to-release-4.20

OCPBUGS-63577: [release-4.20] CORENET-6055: Dockerfile: Unpin OVN and consume the latest from FDP.
…rry-pick-2831-to-release-4.19

[release-4.19] OCPBUGS-63660: Skip Pending pods in EgressIP status updates
Configure IP/VRF rules only in full/dpu-host mode, and configure
OpenFlow rules only in full/dpu mode.

Signed-off-by: Yun Zhou <yunz@nvidia.com>
(cherry picked from commit a996442)
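A small sketch of the mode gating described above (the mode names follow ovn-kubernetes' dpu modes, but the function shape is illustrative, not the repo's actual code):

```go
package gateway

type Mode string

const (
	ModeFull    Mode = "full"
	ModeDPU     Mode = "dpu"
	ModeDPUHost Mode = "dpu-host"
)

// configure applies each rule set only where it belongs: IP/VRF rules
// live in the host network stack (full and dpu-host modes), while
// OpenFlow rules are programmed where OVS runs (full and dpu modes).
func configure(mode Mode, addIPVRFRules, addOpenFlowRules func() error) error {
	if mode == ModeFull || mode == ModeDPUHost {
		if err := addIPVRFRules(); err != nil {
			return err
		}
	}
	if mode == ModeFull || mode == ModeDPU {
		return addOpenFlowRules()
	}
	return nil
}
```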
Signed-off-by: Yun Zhou <yunz@nvidia.com>
(cherry picked from commit 60404e5)
OCPBUGS-63007: kubevirt: fix bad release of IPs of live migratable pods
…n are updated

Signed-off-by: arkadeepsen <arsen@redhat.com>
…tations are updated

Signed-off-by: arkadeepsen <arsen@redhat.com>
Addresses incorrect DNAT rules with <proto>/0 target port when using
services with externalTrafficPolicy: Local and named ports.

The issue occurred when allocateLoadBalancerNodePorts was false and
services referenced pod named ports. The previous implementation
used svcPort.TargetPort.IntValue(), which returns 0 for named ports,
causing invalid DNAT rules.

This refactoring introduces/uses structured endpoint types that
properly handle port mapping from endpoint slices, ensuring the
actual pod port numbers are used instead of attempting to convert
named ports to integers.

This change unifies endpoint processing logic by having both the
services controller and nodePortWatcher use the same
GetEndpointsForService function. This ensures consistent endpoint
resolution and port mapping behavior across all service-related
components, preventing divergence in logic and similar unnoticed
port handling issues in the future.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 0651593)
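A minimal sketch of the pitfall and the endpoint-slice based resolution (resolveTargetPort is a hypothetical stand-in for the repo's GetEndpointsForService logic):

```go
package services

import (
	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// resolveTargetPort returns the concrete pod port for a service port.
// For named ports, svcPort.TargetPort.IntValue() returns 0 (the string
// fails integer conversion), which is what produced <proto>/0 DNAT
// rules. EndpointSlices already carry the resolved numeric port.
func resolveTargetPort(svcPort corev1.ServicePort, slices []*discoveryv1.EndpointSlice) int32 {
	if svcPort.TargetPort.Type == intstr.Int {
		return int32(svcPort.TargetPort.IntValue())
	}
	for _, slice := range slices {
		for _, p := range slice.Ports {
			// EndpointSlice port names match the service port name.
			if p.Name != nil && *p.Name == svcPort.Name && p.Port != nil {
				return *p.Port
			}
		}
	}
	return 0 // no ready endpoints yet; caller should skip rule creation
}
```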
Adds tests for loadBalancer services with named ports and
AllocateLoadBalancerNodePorts=False. Adds new test cases in
Test_getEndpointsForService.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 282b01e)
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 651759c)
@openshift-ci-robot
Contributor

@ricky-rav: This PR has been marked as verified by ci.


In response to this:

/verified by ci


@kyrtapz
Contributor

kyrtapz commented Dec 15, 2025

@jluhrsen
Contributor Author

@ricky-rav @jluhrsen Isn't the windows job supposed to be fixed by now? https://redhat-internal.slack.com/archives/CDCP2LA9L/p1765794568326359?thread_ts=1762987569.851699&cid=CDCP2LA9L

@kyrtapz, I don't really know anymore. The bug we filed is still assigned to the windows team bot, with no update to it. I'll comment on it to see if someone from that team can take a look. I would override it for now.

Looks like the master job is passing a little bit now.

But 4.17, 4.18, and 4.19 have not seen a single pass since mid-November.

4.16 passes sometimes, though.

@ricky-rav
Contributor

/retest-required

@jluhrsen
Contributor Author

/retest-required

@ricky-rav, the e2e-aws-ovn-windows job is not going to pass. I got a reply on the bug here, and it looks like we need some backports to land before we can expect this job to pass.

@ricky-rav
Contributor

/override e2e-aws-ovn-windows

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

@ricky-rav: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • e2e-aws-ovn-windows

Only the following failed contexts/checkruns were expected:

  • ci/prow/4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade
  • ci/prow/4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade
  • ci/prow/4.18-upgrade-from-stable-4.17-images
  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-aws-ovn-edge-zones
  • ci/prow/e2e-aws-ovn-fdp-qe
  • ci/prow/e2e-aws-ovn-hypershift
  • ci/prow/e2e-aws-ovn-local-gateway
  • ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration
  • ci/prow/e2e-aws-ovn-serial
  • ci/prow/e2e-aws-ovn-shared-to-local-gateway-mode-migration
  • ci/prow/e2e-aws-ovn-upgrade
  • ci/prow/e2e-aws-ovn-upgrade-local-gateway
  • ci/prow/e2e-aws-ovn-windows
  • ci/prow/e2e-azure-ovn-upgrade
  • ci/prow/e2e-gcp-ovn
  • ci/prow/e2e-gcp-ovn-techpreview
  • ci/prow/e2e-metal-ipi-ovn-dualstack
  • ci/prow/e2e-metal-ipi-ovn-ipv6
  • ci/prow/gofmt
  • ci/prow/images
  • ci/prow/lint
  • ci/prow/okd-scos-images
  • ci/prow/security
  • ci/prow/unit
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-edge-zones
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-fdp-qe
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-hypershift
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-gateway
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-serial
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-shared-to-local-gateway-mode-migration
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade-local-gateway
  • pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows
  • pull-ci-openshift-ovn-kubernetes-master-e2e-azure-ovn-upgrade
  • pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn
  • pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn-techpreview
  • pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack
  • pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-ovn-kubernetes-master-gofmt
  • pull-ci-openshift-ovn-kubernetes-master-images
  • pull-ci-openshift-ovn-kubernetes-master-lint
  • pull-ci-openshift-ovn-kubernetes-master-okd-scos-images
  • pull-ci-openshift-ovn-kubernetes-master-security
  • pull-ci-openshift-ovn-kubernetes-master-unit
  • pull-ci-openshift-ovn-kubernetes-release-4.18-4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade
  • pull-ci-openshift-ovn-kubernetes-release-4.18-4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade
  • pull-ci-openshift-ovn-kubernetes-release-4.18-4.18-upgrade-from-stable-4.17-images
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.


@ricky-rav
Contributor

/retest-required

@ricky-rav, the e2e-aws-ovn-windows job is not going to pass. I got a reply on the bug here, and it looks like we need some backports to land before we can expect this job to pass.

@kyrtapz could you add the magic label? :) Thanks

@ricky-rav
Contributor

/override ci/prow/e2e-aws-ovn-windows

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

@ricky-rav: Overrode contexts on behalf of ricky-rav: ci/prow/e2e-aws-ovn-windows


@jluhrsen
Contributor Author

Just need BPRA from @ricky-rav, but for kicks I see the 4.18 backport to make the windows job pass has merged. Let's see.

/test e2e-aws-ovn-windows

@ricky-rav
Contributor

/label backport-risk-assessed

The openshift-ci bot added the backport-risk-assessed label (indicates a PR to a release branch has been evaluated and considered safe to accept) on Dec 19, 2025.
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 02a1ed0 and 2 for PR HEAD c9b592e in total

@ricky-rav
Contributor

/retest-required

1 similar comment
@ricky-rav
Contributor

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Dec 22, 2025

@jluhrsen: all tests passed!

Full PR test history. Your PR dashboard.


The openshift-merge-bot merged commit 8036759 into openshift:release-4.18 on Dec 22, 2025.
25 of 26 checks passed
@openshift-ci-robot
Contributor

@jluhrsen: Jira Issue Verification Checks: Jira Issue OCPBUGS-66428
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-66428 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Jira Issue Verification Checks: Jira Issue OCPBUGS-66360
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-66360 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Jira Issue OCPBUGS-48710: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-48710 has been moved to the MODIFIED state.

Jira Issue Verification Checks: Jira Issue OCPBUGS-66381
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-66381 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@openshift-merge-robot
Contributor

Fix included in accepted release 4.18.0-0.nightly-2025-12-24-222251


Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
verified: Signifies that the PR passed pre-merge verification criteria.
