OCPBUGS-77082,OCPBUGS-77096: [release-4.19] combined backport PR for 2 escalations#2986
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77082, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77082, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-77150, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-77096, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. |
Remove the temporary migration code that was added in 2023 to support
the transition to OVN Interconnect (IC) architecture. This HACK code
tracked whether remote zone nodes had completed migration using the
"k8s.ovn.org/remote-zone-migrated" annotation. This code is no longer
needed.
Changes:
- Remove OvnNodeMigratedZoneName constant and helper functions
  (SetNodeZoneMigrated, HasNodeMigratedZone,
  NodeMigratedZoneAnnotationChanged)
- Remove migrated field from nodeInfo struct in node_tracker.go
- Simplify isLocalZoneNode() in base_network_controller.go and
  egressip.go
- Remove HACK helper functions (checkOVNSBNodeLRSR, fetchLBNames,
  lbExists, portExists) and migration sync flow from
  default_node_network_controller.go
- Remove remote-zone-migrated annotation from webhook allowed
  annotations
- Update tests to remove references to the migration annotation
Assisted by Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit 7d408c1)
(cherry picked from commit 83de58c)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit e61ba4b)
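For readers unfamiliar with the code, the effect of simplifying isLocalZoneNode() can be sketched roughly as follows. This is an illustrative, standalone approximation and not the code from the repository: the `node` struct, the `nodeZone` helper, the function signatures, and the `k8s.ovn.org/zone-name` annotation key are assumptions made for the example. Only the "k8s.ovn.org/remote-zone-migrated" annotation and the "global" default zone come from the commit messages in this PR.

```go
package main

import "fmt"

const (
	defaultZone = "global"
	// Assumed annotation key for the node's zone (hypothetical here).
	zoneAnnotation = "k8s.ovn.org/zone-name"
	// Migration annotation named in the commit message.
	migratedAnnotation = "k8s.ovn.org/remote-zone-migrated"
)

// node is a simplified stand-in for a Kubernetes Node object.
type node struct {
	name        string
	annotations map[string]string
}

// nodeZone returns the zone from the annotation, or the default zone.
func nodeZone(n node) string {
	if z, ok := n.annotations[zoneAnnotation]; ok && z != "" {
		return z
	}
	return defaultZone
}

// isLocalZoneNodeWithHack approximates the old behavior: with the
// controller in the default zone, the zone comparison is bypassed and
// only the migration annotation is consulted.
func isLocalZoneNodeWithHack(controllerZone string, n node) bool {
	if controllerZone == defaultZone {
		_, migrated := n.annotations[migratedAnnotation]
		return !migrated
	}
	return nodeZone(n) == controllerZone
}

// isLocalZoneNode approximates the simplified check: a plain comparison.
func isLocalZoneNode(controllerZone string, n node) bool {
	return nodeZone(n) == controllerZone
}

func main() {
	testNode := node{
		name:        "test-node",
		annotations: map[string]string{zoneAnnotation: "test"},
	}
	fmt.Println(isLocalZoneNodeWithHack("global", testNode)) // true
	fmt.Println(isLocalZoneNode("global", testNode))         // false
	fmt.Println(isLocalZoneNode("test", testNode))           // true
}
```

With the HACK in place, a node annotated with zone "test" is still considered local by a controller in zone "global"; with the plain comparison it is not.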
The layer2 UDN cleanup tests for IC clusters were failing because of a
zone mismatch between the controller and the test node:
- Controller zone: read from NBGlobal.Name ("global")
- Node zone: set via annotation ("test" when IC enabled)
This mismatch was previously masked in two spots:
1. The HACK in isLocalZoneNode() (removed by commit 7d408c1):
When the controller's zone was "global" (the default), the HACK
bypassed the zone comparison entirely and instead checked whether
the node had a migration annotation. Since the test node had no
migration annotation, it was treated as local despite the zone
mismatch.
2. Unconditional gateway cleanup in deleteNodeEvent (changed by
commit 8725a93 to only clean up nodes tracked in localZoneNodes)
With both items above removed/changed, the test correctly fails because
the node is treated as remote (zones don't match), so it's not added to
localZoneNodes, and cleanup is skipped.
Fix the test by:
- using setupConfig() to set config.Default.Zone to testICZone when IC
is enabled
- setting NBGlobal.Name to config.Default.Zone (which setupConfig()
already configured correctly)
This ensures the controller and node are in the same zone, so the node
is correctly treated as local and its gateway entities are cleaned up.
🤖 Assisted by [Claude Code](https://claude.com/claude-code)
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
(cherry picked from commit acb088c)
(cherry picked from commit e950ad5)
(cherry picked from commit c8ada1f)
6bb6628 to 860e4bf
|
/ok-to-test |
|
@ricky-rav: trigger 5 job(s) of type blocking for the ci release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/99433050-1171-11f1-87d3-a036434d754c-0
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/99433050-1171-11f1-87d3-a036434d754c-1 |
|
/retest-required |
|
/payload-job periodic-ci-openshift-release-main-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade |
|
@ricky-rav: trigger 6 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/591e6500-11ab-11f1-8301-3008e31f5e41-0 |
|
/retest-required |
|
/payload-job periodic-ci-openshift-release-main-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade |
|
@ricky-rav: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7d514c90-1236-11f1-91e4-255a2a821e35-0 |
|
/payload-job periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn |
|
@ricky-rav: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7cc28800-1260-11f1-972c-93119393ddc2-0 |
|
OCPBUGS-77096 is pre-merge verified with this PR |
|
/payload-job periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77082, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-77096, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77082, which is invalid:
This pull request references Jira Issue OCPBUGS-77096, which is invalid: |
|
/test e2e-aws-ovn-hypershift |
|
/test e2e-gcp-ovn |
|
https://issues.redhat.com/browse/OCPBUGS-77082 was pre-merge tested by @SachinNinganure. Adding the verified label based on the above results. |
|
@asood-rh: This PR has been marked as verified by |
|
/retest-required |
|
/test e2e-gcp-ovn |
|
/test e2e-aws-ovn-edge-zones |
|
|
@ricky-rav: The following test failed, say
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ricky-rav, tssurya |
|
/label backport-risk-assessed |
|
/jira refresh |
|
@tssurya: This pull request references Jira Issue OCPBUGS-77082, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-77096, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. |
|
/jira refresh |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-77082, which is valid. The bug has been moved to the POST state.
7 validation(s) were run on this bug
Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-77096, which is valid. The bug has been moved to the POST state.
7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jechen@redhat.com), skipping review request. |
|
@tssurya: Overrode contexts on behalf of tssurya: ci/prow/e2e-aws-ovn-edge-zones |
8adb10a into openshift:release-4.19
|
@ricky-rav: Jira Issue Verification Checks: Jira Issue OCPBUGS-77082
Jira Issue OCPBUGS-77082 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
Jira Issue Verification Checks: Jira Issue OCPBUGS-77096
Jira Issue OCPBUGS-77096 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 |
3 commits, 2 escalated bugs
OCPBUGS-77082: [release-4.19] Remove IC zone migration HACK code
Clean cherry pick from 4.20 backport (#2985)
Upstream PRs:
Downstream:
OCPBUGS-77096: [release-4.19] nodeallocator: fix subnet leak when hybrid overlay is enabled
Clean backport of #2992
When the hybrid overlay feature is enabled (specifically when hybrid overlay cluster subnets are configured), the HandleDeleteNode function would return early after releasing the hybrid overlay subnet. This caused the regular cluster subnets allocated to the node to never be released, leading to a subnet leak that eventually exhausts the cluster CIDR pool.
This commit fixes the issue by removing the early return, ensuring that both the hybrid overlay subnets and the standard node subnets are properly released upon node deletion.
A new test case TestNodeAllocator_HandleDeleteNode is added to verify that both types of subnets are correctly released.
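The shape of the subnet leak and its fix can be sketched as below. This is a simplified illustration and not the nodeallocator code itself: the `allocator` type, its map-based bookkeeping, the example subnets, and the two method names are assumptions made for the example. The only behavior taken from the description is the early return after releasing the hybrid overlay subnet, and its removal.

```go
package main

import "fmt"

// allocator is a hypothetical stand-in for the node subnet allocator.
// Each map records the subnet currently held by a node name.
type allocator struct {
	hybridOverlayEnabled bool
	hybridSubnets        map[string]string
	clusterSubnets       map[string]string
}

func newAllocator() *allocator {
	return &allocator{
		hybridOverlayEnabled: true,
		hybridSubnets:        map[string]string{"node1": "10.132.0.0/24"},
		clusterSubnets:       map[string]string{"node1": "10.128.0.0/24"},
	}
}

// handleDeleteNodeBuggy mirrors the described bug: it returns right
// after releasing the hybrid overlay subnet, so the regular cluster
// subnet is never released.
func (a *allocator) handleDeleteNodeBuggy(node string) {
	if a.hybridOverlayEnabled {
		delete(a.hybridSubnets, node)
		return
	}
	delete(a.clusterSubnets, node)
}

// handleDeleteNode mirrors the described fix: no early return, so both
// kinds of subnets are released.
func (a *allocator) handleDeleteNode(node string) {
	if a.hybridOverlayEnabled {
		delete(a.hybridSubnets, node)
	}
	delete(a.clusterSubnets, node)
}

func main() {
	buggy := newAllocator()
	buggy.handleDeleteNodeBuggy("node1")
	fmt.Println(len(buggy.hybridSubnets), len(buggy.clusterSubnets)) // 0 1

	fixed := newAllocator()
	fixed.handleDeleteNode("node1")
	fmt.Println(len(fixed.hybridSubnets), len(fixed.clusterSubnets)) // 0 0
}
```

In the buggy variant one cluster subnet stays allocated per deleted node, which is how repeated node churn would eventually exhaust the cluster CIDR pool.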