OCPBUGS-81526: Branch Sync release-4.20 to release-4.19 [03-03-2026]#3034
Conversation
On breth0, the flow we generated for the serviceCIDR (https://github.com/ovn-kubernetes/ovn-kubernetes/blob/6082160282cf263efd162c0d04ca81ae3c9ecda7/go-controller/pkg/node/gateway_shared_intf.go#L427) was pinned to the `ip` protocol match, so IPv6 single-stack environments were not created correctly. We would end up with the following error:

    E0708 08:42:22.397822 12930 openflow_manager.go:133] Failed to add flows, error: exit status 1, stderr, ovs-ofctl: -:2: fd02::1: invalid IP address, flows: map[DEFAULT:[
    cookie=0xdeff105, priority=205, in_port=1, dl_dst=00:35:01:a6:81:fe, udp6, udp_dst=6081, actions=output:LOCAL
    cookie=0xdeff105, priority=200, in_port=1, udp6, udp_dst=6081, actions=NORMAL
    cookie=0xdeff105, priority=200, in_port=LOCAL, udp6, udp_dst=6081, actions=output:1
    cookie=0xdeff105, priority=500, in_port=2, ipv6, ipv6_dst=fd69::2, ipv6_src=fd2e:6f44:5dd8:c956::19,actions=ct(commit,zone=64001,nat(dst=fd2e:6f44:5dd8:c956::19),table=4)
    cookie=0xdeff105, priority=500, in_port=3, ipv6, ipv6_dst=fd69::2, ipv6_src=fd2e:6f44:5dd8:c956::19,actions=ct(commit,zone=64001,nat(dst=fd2e:6f44:5dd8:c956::19),table=4)
    cookie=0xdeff105, priority=500, in_port=2, ipv6, ipv6_dst=fd00:1101::1bcd:4ef3:764:ec61, ipv6_src=fd2e:6f44:5dd8:c956::19,actions=ct(commit,zone=64001,table=4)
    cookie=0xdeff105, priority=500, in_port=3, ipv6, ipv6_dst=fd00:1101::1bcd:4ef3:764:ec61, ipv6_src=fd2e:6f44:5dd8:c956::19,actions=ct(commit,zone=64001,table=4)
    cookie=0xdeff105, priority=500, in_port=2, ipv6, ipv6_dst=fd2e:6f44:5dd8:ca56::19, ipv6_src=fd2e:6f44:5dd8:c956::19,actions=ct(commit,zone=64001,table=4)
    cookie=0xdeff105, priority=500, in_port=3, ipv6, ipv6_dst=fd2e:6f44:5dd8:ca56::19, ipv6_src=fd2e:6f44:5dd8:c956::19,actions=ct(commit,zone=64001,table=4)
    cookie=0xdeff105, priority=500, in_port=LOCAL, ipv6, ipv6_dst=fd69::1,actions=ct(zone=64002,nat,table=5)
    cookie=0xdeff105, priority=500, in_port=LOCAL, ipv6, ipv6_dst=fd02::/112, actions=ct(commit,zone=64001,nat(src=fd69::2),table=2)
    cookie=0xdeff105, priority=550, in_port=LOCAL, ipv6, ipv6_src=fd69::/112, ipv6_dst=fd02::/112, actions=ct(commit,zone=64001,table=2)
    cookie=0xdeff105, priority=550, in_port=LOCAL, ipv6, ipv6_src=fdc4:1042:13::/56, ipv6_dst=fd02::/112, actions=ct(commit,zone=64001,table=2)
    cookie=0xdeff105, priority=500, in_port=2, ipv6, ipv6_src=fd02::/112, ipv6_dst=fd69::/112,actions=ct(zone=64001,nat,table=3)
    cookie=0xdeff105, priority=105, in_port=2, ipv6, ipv6_dst=fd02::/112,actions=drop
    cookie=0xdeff105, priority=500, in_port=3, ipv6, ipv6_src=fd02::/112, ipv6_dst=fd69::/112,actions=ct(zone=64001,nat,table=3)
    cookie=0xdeff105, priority=105, in_port=3, ipv6, ipv6_dst=fd02::/112,actions=drop
    cookie=0xdeff105, priority=110, table=0, in_port=1, ipv6, nw_frag=yes, actions=ct(table=0,zone=64004)
    cookie=0xdeff105, priority=100, table=1, ipv6, ct_state=+trk+est, ct_mark=0x1, actions=output:2
    cookie=0xdeff105, priority=100, table=1, ipv6, ct_state=+trk+rel, ct_mark=0x1, actions=output:2
    cookie=0xdeff105, priority=100, table=1, ipv6, ct_state=+trk+est, ct_mark=0x4, actions=output:3
    cookie=0xdeff105, priority=100, table=1, ipv6, ct_state=+trk+rel, ct_mark=0x4, actions=output:3
    cookie=0xdeff105, priority=100, table=1, ip6, ct_state=+trk+est, ct_mark=0x2, actions=output:LOCAL
    cookie=0xdeff105, priority=100, table=1, ip6, ct_state=+trk+rel, ct_mark=0x2, actions=output:LOCAL
    cookie=0xdeff105, priority=10, table=1, dl_dst=00:35:01:a6:81:fe, actions=output:LOCAL
    cookie=0xdeff105, priority=100, table=2, actions=set_field:00:35:01:a6:81:fe->eth_dst,output:2
    cookie=0xdeff105, priority=200, table=2, ip6, ipv6_src=fdc4:1042:13::/56, actions=set_field:00:35:01:a6:81:fe->eth_dst,output:3
    cookie=0xdeff105, priority=200, table=2, ip6, pkt_mark=0x1001, actions=set_field:00:35:01:a6:81:fe->eth_dst,output:3
    cookie=0xdeff105, table=3, actions=move:NXM_OF_ETH_DST[]->NXM_OF_ETH_SRC[],set_field:00:35:01:a6:81:fe->eth_dst,output:LOCAL
    cookie=0xdeff105, table=4,ipv6, actions=ct(commit,zone=64002,nat(src=fd69::1),table=3)
    cookie=0xdeff105, table=5, ipv6, actions=ct(commit,zone=64001,nat,table=2)
    cookie=0xdeff105, priority=10, table=0, in_port=1, dl_dst=00:35:01:a6:81:fe, actions=output:2,output:3,output:LOCAL
    cookie=0xdeff105, priority=10, table=0, in_port=3, dl_src=00:35:01:a6:81:fe, actions=output:NORMAL
    cookie=0xdeff105, priority=9, table=0, in_port=3, actions=drop
    cookie=0xdeff105, priority=10, table=0, in_port=2, dl_src=00:35:01:a6:81:fe, actions=output:NORMAL
    cookie=0xdeff105, priority=9, table=0, in_port=2, actions=drop
    cookie=0xdeff105, priority=105, in_port=2, dl_src=00:35:01:a6:81:fe, ipv6, pkt_mark=0x3f0 actions=ct(commit, zone=64000, nat(src=fd2e:6f44:5dd8:c956::19), exec(set_field:0x1->ct_mark)),output:1
    cookie=0xdeff105, priority=100, in_port=2, dl_src=00:35:01:a6:81:fe, ipv6, actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:1
    cookie=0xdeff105, priority=102, in_port=2, dl_src=00:35:01:a6:81:fe, ipv6, ipv6_dst=fd00:1101::1bcd:4ef3:764:ec61/128, actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:NORMAL
    cookie=0xdeff105, priority=102, in_port=2, dl_src=00:35:01:a6:81:fe, ipv6, ipv6_dst=fd2e:6f44:5dd8:c956::19/128, actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:NORMAL
    cookie=0xdeff105, priority=102, in_port=2, dl_src=00:35:01:a6:81:fe, ipv6, ipv6_dst=fd2e:6f44:5dd8:ca56::19/128, actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:NORMAL
    cookie=0xdeff105, priority=102, in_port=2, dl_src=00:35:01:a6:81:fe, icmp6, icmpv6_type=135, actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:NORMAL
    cookie=0xdeff105, priority=102, in_port=2, dl_src=00:35:01:a6:81:fe, icmp6, icmpv6_type=136, actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:NORMAL
    cookie=0xdeff105, priority=105, in_port=3, dl_src=00:35:01:a6:81:fe, ipv6, pkt_mark=0x3f0 actions=ct(commit, zone=64000, nat(src=fd2e:6f44:5dd8:c956::19), exec(set_field:0x4->ct_mark)),output:1
    cookie=0xdeff105, priority=100, in_port=3, dl_src=00:35:01:a6:81:fe, ipv6, ipv6_src=fd69::b, actions=ct(commit, zone=64000, nat(src=fd2e:6f44:5dd8:c956::19), exec(set_field:0x4->ct_mark)), output:1
    cookie=0xdeff105, priority=100, in_port=LOCAL, ipv6, actions=ct(commit, zone=64000, exec(set_field:0x2->ct_mark)), output:1
    cookie=0xdeff105, priority=50, in_port=1, ipv6, actions=ct(zone=64000, nat, table=1)
    cookie=0xdeff105, priority=104, in_port=2, ipv6, ipv6_src=fd01::/48, actions=drop
    cookie=0xdeff105, priority=109, in_port=2, dl_src=00:35:01:a6:81:fe, ipv6, ipv6_src=fd01:0:0:6::/64 actions=ct(commit, zone=64000, exec(set_field:0x1->ct_mark)), output:1
    cookie=0xdeff105, priority=15, table=1, ipv6, ipv6_dst=fd01::/48, actions=output:2
    cookie=0xdeff105, priority=16, table=1, ipv6, ipv6_dst=fd01:0:0:6::2, actions=output:LOCAL
    cookie=0xdeff105, priority=15, table=1, ipv6, ipv6_dst=fdc4:1042:13::/56, actions=output:3
    cookie=0xdeff105, priority=16, table=1, ipv6, ipv6_dst=fdc4:1042:13:3::2, actions=output:LOCAL
    cookie=0xdeff105, priority=10, table=1, dl_dst=00:35:01:a6:81:fe, actions=output:LOCAL
    cookie=0xdeff105, priority=14, table=1,icmp6,icmpv6_type=134 actions=FLOOD
    cookie=0xdeff105, priority=14, table=1,icmp6,icmpv6_type=136 actions=FLOOD
    cookie=0xdeff105, priority=13, table=1, in_port=1, udp6, tp_dst=3784, actions=output:2,output:LOCAL
    cookie=0xdeff105, priority=0, table=1, actions=output:NORMAL]
    NORMAL:[table=0,priority=0,actions=NORMAL]]

This commit fixes that, and also fixes the unit tests, which were already dualstack aware, to take the IPv6 serviceCIDR family; by default only the IPv4 address was used, so the v6 service CIDR rules were not getting installed correctly. This commit also changes a few spots to consistently use the `ipv6` protocol prefix rather than `ip6`, so that the pattern matching used in the unit tests works correctly without needing to account for both types of matches.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
(cherry picked from commit 38935ee)
Conflicts: go-controller/pkg/node/bridgeconfig/bridgeflows.go, because f8ad956 was already backported to 4.20
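For context, the family-aware protocol selection described by this fix boils down to something like the sketch below. `flowProtoForCIDR` is a hypothetical helper, not the actual ovn-kubernetes function; it only illustrates why pinning the match to `ip` breaks IPv6 single-stack:

```go
package main

import (
	"fmt"
	"net"
)

// flowProtoForCIDR picks the ovs-ofctl protocol match for a service CIDR:
// "ip" for IPv4 and "ipv6" for IPv6. Hard-coding "ip" is the bug described
// above: an IPv6 serviceCIDR such as fd02::/112 gets rendered into a flow
// that ovs-ofctl rejects with "invalid IP address".
func flowProtoForCIDR(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	if ipnet.IP.To4() != nil {
		return "ip", nil
	}
	return "ipv6", nil
}

func main() {
	for _, cidr := range []string{"10.96.0.0/16", "fd02::/112"} {
		proto, err := flowProtoForCIDR(cidr)
		if err != nil {
			panic(err)
		}
		// e.g. "cookie=0xdeff105, priority=500, ipv6, ipv6_dst=fd02::/112, ..."
		fmt.Printf("cookie=0xdeff105, priority=500, %s, ...dst=%s, actions=...\n", proto, cidr)
	}
}
```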
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit 1bbb7f3)
The kapi service can never be dualstack. To test the IPv6 service CIDR for the kapi server, we need a single-stack IPv6 lane. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit 8000cfd)
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit b9ecb33)
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit d268c01)
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit e717e42)
When ovnkube-node restarts, it runs syncPodsForUserDefinedNetwork, which calls allocatePodIPs. For IPAM-less localnet networks (switches with no subnets), IsNonHostSubnetSwitch returns true, causing allocatePodIPs to return an empty string. This prevents the pod from being added to the expectedLogicalPorts map, causing deleteStaleLogicalSwitchPorts to delete the LSP. This change adds an explicit flag to the subnet allocator to denote that the allocator was created for a no-host-subnet switch, so the code explicitly differentiates between IPAM-less localnet networks and no-host-subnet switches: both lack a subnet, but no-host-subnet switches do not even have LSPs. Signed-off-by: Enrique Llorente <ellorent@redhat.com> (cherry picked from commit d1c55f1) (cherry picked from commit 440dab6)
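A minimal sketch of the allocator-side distinction the commit describes, with hypothetical names (the real subnet allocator in ovn-kubernetes is more involved):

```go
package main

import "fmt"

// subnetAllocator is a pared-down stand-in for the per-network subnet
// allocator. The new flag records *why* there are no subnets.
type subnetAllocator struct {
	subnets []string
	// noHostSubnet marks an allocator created for a "no host subnet"
	// switch. Both IPAM-less localnet networks and no-host-subnet switches
	// have no subnets, but only the latter has no LSPs at all, so only the
	// latter should make pod sync skip the port.
	noHostSubnet bool
}

// isIPAMlessLocalnet is true when the network legitimately has LSPs without
// IPAM; such pods must still land in expectedLogicalPorts so their LSPs
// survive deleteStaleLogicalSwitchPorts.
func (a *subnetAllocator) isIPAMlessLocalnet() bool {
	return len(a.subnets) == 0 && !a.noHostSubnet
}

func main() {
	ipamless := &subnetAllocator{}
	noHost := &subnetAllocator{noHostSubnet: true}
	fmt.Println(ipamless.isIPAMlessLocalnet(), noHost.isIPAMlessLocalnet()) // true false
}
```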
OCPBUGS-74268: release-4.20 fix(localnet, ipamless): Prevent LSP deletion on sync
[release-4.20] OCPBUGS-73788: Fix service flows for BGP on IPV6
Remove the temporary migration code that was added in 2023 to support the transition to the OVN Interconnect (IC) architecture. This HACK code tracked whether remote zone nodes had completed migration using the "k8s.ovn.org/remote-zone-migrated" annotation. This code is no longer needed. Changes:
- Remove the OvnNodeMigratedZoneName constant and helper functions (SetNodeZoneMigrated, HasNodeMigratedZone, NodeMigratedZoneAnnotationChanged)
- Remove the migrated field from the nodeInfo struct in node_tracker.go
- Simplify isLocalZoneNode() in base_network_controller.go and egressip.go
- Remove the HACK helper functions (checkOVNSBNodeLRSR, fetchLBNames, lbExists, portExists) and the migration sync flow from default_node_network_controller.go
- Remove the remote-zone-migrated annotation from the webhook's allowed annotations
- Update tests to remove references to the migration annotation
Assisted by Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 7d408c1)
(cherry picked from commit 83de58c)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
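With the HACK gone, isLocalZoneNode() presumably reduces to a plain zone comparison, along these lines. This is a self-contained sketch with stand-in types, not the real ovn-kubernetes code, though the "k8s.ovn.org/zone-name" annotation key and the "global" default do match its conventions:

```go
package main

import "fmt"

// node is a pared-down stand-in for corev1.Node; only the zone annotation
// matters for this sketch.
type node struct {
	annotations map[string]string
}

// zoneOf mimics ovn-kubernetes' zone lookup: read the zone-name annotation,
// defaulting to "global" when unset.
func zoneOf(n node) string {
	if z, ok := n.annotations["k8s.ovn.org/zone-name"]; ok {
		return z
	}
	return "global"
}

// isLocalZoneNode is the simplified check described above: with the
// remote-zone-migrated HACK removed, it is a plain string comparison.
func isLocalZoneNode(n node, controllerZone string) bool {
	return zoneOf(n) == controllerZone
}

func main() {
	n := node{annotations: map[string]string{"k8s.ovn.org/zone-name": "test"}}
	fmt.Println(isLocalZoneNode(n, "global")) // false: zones differ
	fmt.Println(isLocalZoneNode(n, "test"))   // true
}
```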
The layer2 UDN cleanup tests for IC clusters were failing because of a
zone mismatch between the controller and the test node:
- Controller zone: read from NBGlobal.Name ("global")
- Node zone: set via annotation ("test" when IC enabled)
This mismatch was previously masked in two spots:
1. The HACK in isLocalZoneNode() (removed by commit 7d408c1):
When the controller's zone was "global" (the default), the HACK
bypassed the zone comparison entirely and instead checked whether
the node had a migration annotation. Since the test node had no
migration annotation, it was treated as local despite the zone
mismatch.
2. Unconditional gateway cleanup in deleteNodeEvent (changed by
commit 8725a93 to only clean up nodes tracked in localZoneNodes)
With both items above removed/changed, the test correctly fails because
the node is treated as remote (zones don't match), so it's not added to
localZoneNodes, and cleanup is skipped.
Fix the test by:
- using setupConfig() to set config.Default.Zone to testICZone when IC
is enabled
- setting NBGlobal.Name to config.Default.Zone (which setupConfig()
already configured correctly)
This ensures the controller and node are in the same zone, so the node
is correctly treated as local and its gateway entities are cleaned up.
🤖 Assisted by [Claude Code](https://claude.com/claude-code)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit acb088c)
(cherry picked from commit e950ad5)
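A hedged sketch of the test alignment described above. The stand-in types below mimic the commit message's setupConfig()/testICZone/NBGlobal names; the actual test code in ovn-kubernetes differs:

```go
package main

import "fmt"

// Stand-ins for the ovn-kubernetes config and NB schema types used by the
// test; the real ones live in pkg/config and pkg/nbdb.
type defaultConfig struct{ Zone string }
type nbGlobal struct{ Name string }

const testICZone = "test" // zone the IC-enabled test node is annotated with

func main() {
	icEnabled := true

	cfg := defaultConfig{Zone: "global"}
	if icEnabled {
		// What setupConfig() now does per the commit message: align the
		// controller's zone with the test node's zone annotation.
		cfg.Zone = testICZone
	}

	// NBGlobal.Name is where the controller reads its zone from, so the
	// fake database must carry the same value.
	nb := nbGlobal{Name: cfg.Zone}

	fmt.Println(nb.Name == cfg.Zone) // true: the node is treated as local
}
```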
When the hybrid overlay feature is enabled (specifically when hybrid overlay cluster subnets are configured), the HandleDeleteNode function would return early after releasing the hybrid overlay subnet. This caused the regular cluster subnets allocated to the node to never be released, leading to a subnet leak that eventually exhausts the cluster CIDR pool. This commit fixes the issue by removing the early return, ensuring that both the hybrid overlay subnets and the standard node subnets are properly released upon node deletion. A new test case TestNodeAllocator_HandleDeleteNode is added to verify that both types of subnets are correctly released. Signed-off-by: Aswin Suryanarayanan <asuryan@redhat.com> (cherry picked from commit c44cbbf) (cherry picked from commit 7826344)
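A sketch of the control-flow fix described above; the function and field names are illustrative, not the actual ovn-kubernetes code:

```go
package main

import "fmt"

// nodeAllocator is a stand-in for the node subnet allocator.
type nodeAllocator struct {
	hybridOverlayEnabled bool
}

func (na *nodeAllocator) releaseHybridOverlaySubnet(node string) {
	fmt.Println("released hybrid overlay subnet for", node)
}

func (na *nodeAllocator) releaseNodeSubnets(node string) {
	fmt.Println("released cluster subnets for", node)
}

// handleDeleteNode previously returned right after the hybrid overlay
// release, leaking the node's regular cluster subnets. The fix is simply
// to fall through so both releases run.
func (na *nodeAllocator) handleDeleteNode(node string) {
	if na.hybridOverlayEnabled {
		na.releaseHybridOverlaySubnet(node)
		// BUG (before the fix): a `return` here skipped the release below,
		// eventually exhausting the cluster CIDR pool.
	}
	na.releaseNodeSubnets(node)
}

func main() {
	na := &nodeAllocator{hybridOverlayEnabled: true}
	na.handleDeleteNode("worker-0")
}
```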
OCPBUGS-77081,OCPBUGS-77094: [release-4.20] combined backport PR for 2 escalations
/ok-to-test
@openshift-pr-manager[bot]: This pull request explicitly references no jira issue.
@openshift-pr-manager[bot]: trigger 5 job(s) of type blocking for the ci release of OCP 4.19. See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0c27bf00-172b-11f1-988a-2b32b02fd983-0
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19. See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0c27bf00-172b-11f1-988a-2b32b02fd983-1
/payload-job periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
/test e2e-aws-ovn-edge-zones
@jluhrsen: trigger 4 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command. See details on https://pr-payload-tests.ci.openshift.org/runs/ci/085cedd0-174b-11f1-9a78-b4b414b1dc03-0
/test 4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
/payload-job periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
@jluhrsen: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command. See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4d0ae4f0-1778-11f1-9f19-16819d782f77-0
/test 4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
/retitle OCPBUGS-48709: Branch Sync release-4.20 to release-4.19 [03-03-2026]
@openshift-pr-manager[bot]: This pull request references Jira Issue OCPBUGS-48709, which is valid. 7 validation(s) were run on this bug. Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
@tssurya, from the below it looks like there are 4 merge commits, 8 new commits and 4 duplicates:
/verified by ci
@jluhrsen: This PR has been marked as verified by ci.
/hold |
I don't know which BGP bug it is. I retitled it with our dummy bug for 4.19 syncs. I think all labels are here now as well. However, I added a /hold.
#2934 is being automatically brought in |
/jira cherry-pick OCPBUGS-73788
/retitle OCPBUGS-81526: Branch Sync release-4.20 to release-4.19 [03-03-2026]
@openshift-pr-manager[bot]: This pull request references Jira Issue OCPBUGS-81526, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug. No GitHub users were found matching the public email listed for the QA contact in Jira (jechen@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.
that looks like it's for OCPBUGS-73788. I tried the
https://redhat.atlassian.net/browse/OCPBUGS-73788
@jechen0648: This PR has been marked as verified by
/hold cancel |
@openshift-pr-manager: all tests passed! Full PR test history. Your PR dashboard.
@openshift-pr-manager[bot]: Jira Issue Verification Checks: Jira Issue OCPBUGS-81526 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
Automated branch sync: release-4.20 to release-4.19.