OCPBUGS-77046,OCPBUGS-77086: [release-4.21] combined backport PR for 2 escalations #2984
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77046, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira cherry-pick OCPBUGS-77046 |
|
@ricky-rav: Jira Issue OCPBUGS-77046 has been cloned as Jira Issue OCPBUGS-77047. Will retitle bug to link to clone. |
Remove the temporary migration code that was added in 2023 to support the
transition to OVN Interconnect (IC) architecture. This HACK code tracked
whether remote zone nodes had completed migration using the
"k8s.ovn.org/remote-zone-migrated" annotation. This code is no longer needed.
Changes:
- Remove OvnNodeMigratedZoneName constant and helper functions
(SetNodeZoneMigrated, HasNodeMigratedZone, NodeMigratedZoneAnnotationChanged)
- Remove migrated field from nodeInfo struct in node_tracker.go
- Simplify isLocalZoneNode() in base_network_controller.go and egressip.go
- Remove HACK helper functions (checkOVNSBNodeLRSR, fetchLBNames, lbExists,
portExists) and migration sync flow from default_node_network_controller.go
- Remove remote-zone-migrated annotation from webhook allowed annotations
- Update tests to remove references to the migration annotation
Assisted by Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 7d408c1)
The layer2 UDN cleanup tests for IC clusters were failing because of a
zone mismatch between the controller and the test node:
- Controller zone: read from NBGlobal.Name ("global")
- Node zone: set via annotation ("test" when IC enabled)
This mismatch was previously masked in two spots:
1. The HACK in isLocalZoneNode() (removed by commit 7d408c1):
When the controller's zone was "global" (the default), the HACK
bypassed the zone comparison entirely and instead checked whether
the node had a migration annotation. Since the test node had no
migration annotation, it was treated as local despite the zone
mismatch.
2. Unconditional gateway cleanup in deleteNodeEvent (changed by
commit 8725a93 to only cleanup nodes tracked in localZoneNodes)
With both items above removed/changed, the test correctly fails because
the node is treated as remote (zones don't match), so it's not added to
localZoneNodes, and cleanup is skipped.
Fix the test by:
- using setupConfig() to set config.Default.Zone to testICZone when IC
is enabled
- setting NBGlobal.Name to config.Default.Zone (which setupConfig()
already configured correctly)
This ensures the controller and node are in the same zone, so the node
is correctly treated as local and its gateway entities are cleaned up.
🤖 Assisted by [Claude Code](https://claude.com/claude-code)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit acb088c)
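The zone check described in the two commit messages above can be modeled with a small, self-contained Go sketch. This is not the ovn-kubernetes code: the `node` type, the `nodeZone` helper, the function names, and the zone annotation key are simplified stand-ins assumed for illustration. Only the "k8s.ovn.org/remote-zone-migrated" annotation and the "global" and "test" zone names come from the commit messages.

```go
package main

import "fmt"

const (
	// "global" is the default zone name mentioned in the commit message.
	defaultZone = "global"
	// Assumed annotation key for the node's zone (illustrative only).
	zoneAnnotation = "k8s.ovn.org/zone-name"
	// Migration annotation named in the commit message.
	migratedAnnotation = "k8s.ovn.org/remote-zone-migrated"
)

// node is a hypothetical, minimal stand-in for a Kubernetes node object.
type node struct {
	name        string
	annotations map[string]string
}

// nodeZone returns the node's zone annotation, or the default zone if unset.
func nodeZone(n node) string {
	if z, ok := n.annotations[zoneAnnotation]; ok {
		return z
	}
	return defaultZone
}

// isLocalZoneNodeWithHack models the removed behavior: when the controller is
// in the default zone, the zone comparison is skipped and only the migration
// annotation is consulted.
func isLocalZoneNodeWithHack(controllerZone string, n node) bool {
	if controllerZone == defaultZone {
		_, migrated := n.annotations[migratedAnnotation]
		return !migrated
	}
	return nodeZone(n) == controllerZone
}

// isLocalZoneNode models the simplified check: a plain zone comparison.
func isLocalZoneNode(controllerZone string, n node) bool {
	return nodeZone(n) == controllerZone
}

func main() {
	// Test node as in the failing test: zone "test", no migration annotation.
	testNode := node{
		name:        "test-node",
		annotations: map[string]string{zoneAnnotation: "test"},
	}

	fmt.Println("HACK, controller zone global:", isLocalZoneNodeWithHack("global", testNode))
	fmt.Println("simplified, controller zone global:", isLocalZoneNode("global", testNode))
	fmt.Println("simplified, controller zone test:", isLocalZoneNode("test", testNode))
}
```

In this model the mismatched node counts as local only under the HACK. Once the controller zone is set to "test", which is what the test fix does through setupConfig() and NBGlobal.Name, the simplified check treats the node as local again.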
6f6b1ae to e950ad5
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77047, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker. |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77046, which is invalid. |
|
/retest-required |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77046, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-77147, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-77086, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker. |
|
/ok-to-test |
|
@ricky-rav: trigger 5 job(s) of type blocking for the ci release of OCP 4.21
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7bf115d0-1171-11f1-8c80-0cd1d044b3fb-0
trigger 14 job(s) of type blocking for the nightly release of OCP 4.21
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7bf115d0-1171-11f1-8c80-0cd1d044b3fb-1 |
|
/test e2e-aws-ovn-edge-zones |
|
/jira refresh |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77046, which is invalid.
This pull request references Jira Issue OCPBUGS-77086, which is invalid. |
We've decided during the team meeting today to remove Peri's commit from the backports, since it doesn't fix the customer's issue. /verified by QE |
|
@ricky-rav: This PR has been marked as verified by |
|
/test e2e-aws-ovn-upgrade-local-gateway |
|
/test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kyrtapz, ricky-rav The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77046, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-77086, which is invalid. |
|
@ricky-rav: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Already reported here: https://redhat-internal.slack.com/archives/GQ0CU2623/p1771857612299149?thread_ts=1771338827.722999&cid=GQ0CU2623
|
|
/jira refresh |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77046, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-77086, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jechen@redhat.com), skipping review request. |
|
/test e2e-metal-ipi-ovn-dualstack-bgp Let's retest these two lanes, since supposedly |
|
/test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
We are in a serious time crunch, and the failures are unrelated to the changes we are bringing in, as per Riccardo's analysis. |
|
/override ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
|
/override ci/prow/e2e-aws-ovn-edge-zones |
|
/override ci/prow/e2e-aws-ovn-edge-zones |
|
@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/e2e-aws-ovn-edge-zones, ci/prow/e2e-metal-ipi-ovn-dualstack-bgp, ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw, ci/prow/qe-perfscale-payload-control-plane-6nodes |
|
@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/e2e-aws-ovn-edge-zones |
81c9d5c into openshift:release-4.21
|
@ricky-rav: Jira Issue Verification Checks: Jira Issue OCPBUGS-77046
Jira Issue OCPBUGS-77046 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
Jira Issue Verification Checks: Jira Issue OCPBUGS-77086
Jira Issue OCPBUGS-77086 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 |
3 commits, 2 escalated bugs
OCPBUGS-77046: Remove IC zone migration HACK code
Some minor conflicts in go-controller/pkg/node/default_node_network_controller.go
Upstream PRs:
Downstream:
OCPBUGS-77086: [release-4.21] nodeallocator: fix subnet leak when hybrid overlay is enabled
Originally posted here: #2990
Clean backport
Upstream:
ovn-kubernetes/ovn-kubernetes#5798
Downstream:
4.22: #2980
nodeallocator: fix subnet leak when hybrid overlay is enabled
When the hybrid overlay feature is enabled (specifically when hybrid overlay
cluster subnets are configured), the HandleDeleteNode function would return
early after releasing the hybrid overlay subnet. This caused the regular
cluster subnets allocated to the node to never be released, leading to a
subnet leak that eventually exhausts the cluster CIDR pool.
This commit fixes the issue by removing the early return, ensuring that
both the hybrid overlay subnets and the standard node subnets are
properly released upon node deletion.
A new test case TestNodeAllocator_HandleDeleteNode is added to verify
that both types of subnets are correctly released.
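
The early-return leak described above can be illustrated with a toy Go model. This is a sketch under stated assumptions, not the ovn-kubernetes node allocator: the `allocator` and `nodeAllocator` types, their methods, and the example CIDRs are hypothetical. Only the shape of the bug (returning right after releasing the hybrid overlay subnet) and the shape of the fix (dropping that return) follow the commit message.

```go
package main

import "fmt"

// allocator is a hypothetical subnet pool keyed by node name.
type allocator struct {
	owned map[string][]string
}

func newAllocator() *allocator {
	return &allocator{owned: map[string][]string{}}
}

// allocate records a subnet as owned by the node.
func (a *allocator) allocate(node, subnet string) {
	a.owned[node] = append(a.owned[node], subnet)
}

// release frees every subnet owned by the node.
func (a *allocator) release(node string) {
	delete(a.owned, node)
}

// inUse returns the number of subnets currently allocated.
func (a *allocator) inUse() int {
	n := 0
	for _, subnets := range a.owned {
		n += len(subnets)
	}
	return n
}

// nodeAllocator is a simplified model with two pools: hybrid overlay subnets
// and regular cluster subnets.
type nodeAllocator struct {
	hybridOverlayEnabled bool
	hybrid               *allocator
	cluster              *allocator
}

// handleDeleteNodeBuggy models the leak: the early return skips the release
// of the regular cluster subnets.
func (na *nodeAllocator) handleDeleteNodeBuggy(node string) {
	if na.hybridOverlayEnabled {
		na.hybrid.release(node)
		return // cluster subnets are never released
	}
	na.cluster.release(node)
}

// handleDeleteNodeFixed models the fix: no early return, so both pools are
// released on node deletion.
func (na *nodeAllocator) handleDeleteNodeFixed(node string) {
	if na.hybridOverlayEnabled {
		na.hybrid.release(node)
	}
	na.cluster.release(node)
}

func main() {
	na := &nodeAllocator{hybridOverlayEnabled: true, hybrid: newAllocator(), cluster: newAllocator()}
	na.hybrid.allocate("node-1", "10.132.0.0/24")  // illustrative CIDR
	na.cluster.allocate("node-1", "10.128.0.0/24") // illustrative CIDR

	na.handleDeleteNodeBuggy("node-1")
	fmt.Println("cluster subnets in use after buggy delete:", na.cluster.inUse())

	na.handleDeleteNodeFixed("node-1")
	fmt.Println("cluster subnets in use after fixed delete:", na.cluster.inUse())
}
```

In the buggy path each deleted node leaves its cluster subnet allocated, so repeated node churn would eventually exhaust the pool, which matches the CIDR exhaustion the commit message describes.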