OCPBUGS-64697: [release-4.20] Referencing pod named ports within a service results in bad DNAT rules containing tcp/0 target port #2844
Addresses incorrect DNAT rules with <proto>/0 target port when using services with externalTrafficPolicy: Local and named ports. The issue occurred when allocateLoadBalancerNodePorts was false and services referenced pod named ports.

The previous implementation used svcPort.TargetPort.IntValue(), which returns 0 for named ports, causing invalid DNAT rules. This refactoring introduces/uses structured endpoint types that properly handle port mapping from endpoint slices, ensuring the actual pod port numbers are used instead of attempting to convert named ports to integers.

This change unifies endpoint processing logic by having both the services controller and nodePortWatcher use the same GetEndpointsForService function. This ensures consistent endpoint resolution and port mapping behavior across all service-related components, preventing divergence in logic and similar unnoticed port handling issues in the future.

Signed-off-by: Andreas Karis <ak.karis@gmail.com> (cherry picked from commit 0651593)
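The failure mode described in this commit message can be reproduced in isolation. The sketch below is not the code from this PR: intOrString is a simplified stand-in for the Kubernetes intstr.IntOrString type (so the example needs only the standard library), and servicePort, endpointPort, and resolveTargetPort are hypothetical names chosen for illustration. It assumes that an endpoint slice port entry carries the service port name together with the already resolved pod port number.

```go
package main

import (
	"fmt"
	"strconv"
)

// intOrString is a simplified stand-in for Kubernetes' intstr.IntOrString.
// It is an assumption made so this example compiles without k8s modules.
type intOrString struct {
	isString bool
	intVal   int32
	strVal   string
}

// IntValue mirrors the upstream behaviour: a string value is parsed with
// Atoi and the error is ignored, so a port name such as "http" yields 0.
func (v intOrString) IntValue() int {
	if v.isString {
		i, _ := strconv.Atoi(v.strVal)
		return i
	}
	return int(v.intVal)
}

// servicePort is a hypothetical, reduced view of a Service port.
type servicePort struct {
	name       string
	targetPort intOrString
}

// endpointPort is a hypothetical, reduced view of an endpoint slice port:
// the service port name plus the numeric port the pods actually listen on.
type endpointPort struct {
	name string
	port int32
}

// resolveTargetPort looks up the numeric pod port by service port name
// instead of converting the (possibly named) targetPort to an integer.
func resolveTargetPort(sp servicePort, epPorts []endpointPort) (int32, bool) {
	for _, ep := range epPorts {
		if ep.name == sp.name {
			return ep.port, true
		}
	}
	return 0, false
}

func main() {
	sp := servicePort{
		name:       "http",
		targetPort: intOrString{isString: true, strVal: "http"},
	}
	eps := []endpointPort{{name: "http", port: 8080}, {name: "udp", port: 9090}}

	// Converting a named port directly gives the bogus value that ended
	// up in the DNAT rule.
	fmt.Println("IntValue on named port:", sp.targetPort.IntValue())

	// Looking the port up in the endpoint data gives the real pod port.
	if p, ok := resolveTargetPort(sp, eps); ok {
		fmt.Println("resolved from endpoint slice:", p)
	}
}
```

The lookup returns a boolean so that a caller can skip rule generation when no matching endpoint port exists, rather than falling back to port 0.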
Adds tests for loadBalancer services with named ports and AllocateLoadBalancerNodePorts=False. Add new test cases in Test_getEndpointsForService. Signed-off-by: Andreas Karis <ak.karis@gmail.com> (cherry picked from commit 282b01e)
Signed-off-by: Andreas Karis <ak.karis@gmail.com> (cherry picked from commit 651759c)
E2E test "Allow connection to an external IP using a source port that is equal to a node port" might flake if a service is already created with the same nodePort number. Give it a chance to recover by selecting a different port. Signed-off-by: Andreas Karis <ak.karis@gmail.com> (cherry picked from commit 2fff366)
The test validates LoadBalancer services with:
- Named targetPorts (http/udp) instead of numeric ports
- AllocateLoadBalancerNodePorts=false configuration
- ExternalTrafficPolicy=Local behavior
Signed-off-by: Andreas Karis <ak.karis@gmail.com> (cherry picked from commit cd70830)
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-64697, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@ricky-rav: This pull request references Jira Issue OCPBUGS-64697, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact: |
|
/retest-required |
|
/retest-required |
|
/retest-required |
|
/verified by @Meina-rh |
|
@Meina-rh: This PR has been marked as verified by |
|
/retest-required |
|
/retest-required |
|
/payload 4.20 ci blocking |
|
@ricky-rav: trigger 5 job(s) of type blocking for the ci release of OCP 4.20
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6bacbc60-bc30-11f0-9dfe-136ed4410777-0
trigger 13 job(s) of type blocking for the nightly release of OCP 4.20
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6bacbc60-bc30-11f0-9dfe-136ed4410777-1 |
|
/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade |
|
@jluhrsen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ec37c500-be7e-11f0-9d6a-220802a21fa5-0 |
|
/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-upgrade |
|
@ricky-rav: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3d10dac0-bfa1-11f0-84b9-2440655d8703-0 |
|
/assign @jcaamano |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jcaamano, ricky-rav
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest-required |
|
@ricky-rav: The following test failed, say
Full PR test history. Your PR dashboard. |
|
/retest-required |
Merged 798404b into openshift:release-4.20
|
@ricky-rav: Jira Issue Verification Checks: Jira Issue OCPBUGS-64697
Jira Issue OCPBUGS-64697 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 |
|
Fix included in accepted release 4.20.0-0.nightly-2025-11-15-175712 |
Relax linter CI job by adding the missing bits of cherry-pick conflicts introduced by openshift#2844 Signed-off-by: Or Mergi <ormergi@redhat.com>
OCPBUGS-65951: [release-4.20]: Fix linter issues, add missing cherry-pick bits of #2844
Two conflicts:
Conflict 1
In downstream ovn-k, makeNodeSwitchTargetIPs takes service *corev1.Service as its first input argument. This is a downstream hack for OpenShift.
Conflict 2
Three functions in test/e2e/service.go needed to be added manually: WaitForServingAndReadyServiceEndpointsNum, countServingAndReadyEndpointsSlicesNum, getServingAndReadyEndpointSliceAddresses.