OCPBUGS-81192: Fix race condition in internal-to-external LB migration test#1407
Conversation
📝 WalkthroughWalkthroughThis change updates test logic in test/e2e/unmanaged_dns_test.go: adds comments documenting dnsManagementPolicy transition behavior for external ingress controllers; replaces a wait.PollImmediate loop with wait.PollUntilContextTimeout to wait for load balancer Service status changes and ensure LoadBalancer.Ingress is populated; captures the load balancer address (IP or Hostname) and uses it for the final external verification instead of the wildcard record target. No exported/public declarations were modified. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@gcs278: This pull request references Jira Issue OCPBUGS-81192, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
When migrating from internal to external scope, use the Service LB address directly instead of DNSRecord target to avoid race condition where IngressController reports DNSReady before DNSRecord reconciles. Also add header comments to DNS migration tests for consistency. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2a2730d to
2cdc405
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/e2e/unmanaged_dns_test.go (1)
290-293: Use poll callback context for API reads.Inside
wait.PollUntilContextTimeout,kclient.Get(context.TODO(), ...)ignores the poll context; usectxso cancellation/timeouts propagate cleanly.Proposed change
- if err := kclient.Get(context.TODO(), controller.LoadBalancerServiceName(ic), lbService); err != nil { + if err := kclient.Get(ctx, controller.LoadBalancerServiceName(ic), lbService); err != nil {As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/unmanaged_dns_test.go` around lines 290 - 293, The poll callback is using context.TODO() for the API read which ignores the poll's cancellation/timeout; inside the callback passed to wait.PollUntilContextTimeout, replace the kclient.Get call's context.TODO() with the provided ctx so that kclient.Get(ctx, controller.LoadBalancerServiceName(ic), lbService) correctly observes cancellation/timeouts from the poll.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/unmanaged_dns_test.go`:
- Around line 302-310: The polling callback that reads
lbService.Status.LoadBalancer.Ingress[0] must validate that the chosen address
is non-empty before returning success: after extracting lbAddress from
lbService.Status.LoadBalancer.Ingress[0].IP or .Hostname, ensure lbAddress != ""
and if it is empty return false, nil so the poll continues; update the logic
around lbService, lbAddress in the polling loop (the snippet that currently
returns true,nil) so verifyExternalIngressController() never receives an empty
lbAddress.
---
Nitpick comments:
In `@test/e2e/unmanaged_dns_test.go`:
- Around line 290-293: The poll callback is using context.TODO() for the API
read which ignores the poll's cancellation/timeout; inside the callback passed
to wait.PollUntilContextTimeout, replace the kclient.Get call's context.TODO()
with the provided ctx so that kclient.Get(ctx,
controller.LoadBalancerServiceName(ic), lbService) correctly observes
cancellation/timeouts from the poll.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 2c92fca6-44a4-46ab-852b-df3243b934cc
📒 Files selected for processing (1)
test/e2e/unmanaged_dns_test.go
| if len(lbService.Status.LoadBalancer.Ingress) == 0 { | ||
| t.Logf("service %s has no load balancer ingress, retrying...", lbService.Name) | ||
| return false, nil | ||
| } | ||
| lbAddress = lbService.Status.LoadBalancer.Ingress[0].IP | ||
| if lbAddress == "" { | ||
| lbAddress = lbService.Status.LoadBalancer.Ingress[0].Hostname | ||
| } | ||
| return true, nil |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
head -n 330 test/e2e/unmanaged_dns_test.go | tail -n +290Repository: openshift/cluster-ingress-operator
Length of output: 2160
🏁 Script executed:
# Get more context on how lbAddress is used in the test
rg "lbAddress" test/e2e/unmanaged_dns_test.go -A 3 -B 3Repository: openshift/cluster-ingress-operator
Length of output: 1511
Guard against empty LB ingress address before returning success.
The polling loop at lines 306-309 can return success even when both IP and Hostname are empty strings. This causes lbAddress to be empty when passed to the subsequent verifyExternalIngressController() call, resulting in flaky connectivity checks. Add a validation check to ensure lbAddress is not empty before the polling loop succeeds:
Proposed change
lbAddress = lbService.Status.LoadBalancer.Ingress[0].IP
if lbAddress == "" {
lbAddress = lbService.Status.LoadBalancer.Ingress[0].Hostname
}
+if lbAddress == "" {
+ t.Logf("service %s has ingress entry without IP/hostname, retrying...", lbService.Name)
+ return false, nil
+}
return true, nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if len(lbService.Status.LoadBalancer.Ingress) == 0 { | |
| t.Logf("service %s has no load balancer ingress, retrying...", lbService.Name) | |
| return false, nil | |
| } | |
| lbAddress = lbService.Status.LoadBalancer.Ingress[0].IP | |
| if lbAddress == "" { | |
| lbAddress = lbService.Status.LoadBalancer.Ingress[0].Hostname | |
| } | |
| return true, nil | |
| if len(lbService.Status.LoadBalancer.Ingress) == 0 { | |
| t.Logf("service %s has no load balancer ingress, retrying...", lbService.Name) | |
| return false, nil | |
| } | |
| lbAddress = lbService.Status.LoadBalancer.Ingress[0].IP | |
| if lbAddress == "" { | |
| lbAddress = lbService.Status.LoadBalancer.Ingress[0].Hostname | |
| } | |
| if lbAddress == "" { | |
| t.Logf("service %s has ingress entry without IP/hostname, retrying...", lbService.Name) | |
| return false, nil | |
| } | |
| return true, nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/unmanaged_dns_test.go` around lines 302 - 310, The polling callback
that reads lbService.Status.LoadBalancer.Ingress[0] must validate that the
chosen address is non-empty before returning success: after extracting lbAddress
from lbService.Status.LoadBalancer.Ingress[0].IP or .Hostname, ensure lbAddress
!= "" and if it is empty return false, nil so the poll continues; update the
logic around lbService, lbAddress in the polling loop (the snippet that
currently returns true,nil) so verifyExternalIngressController() never receives
an empty lbAddress.
|
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Fixes a race condition in
TestUnmanagedDNSToManagedDNSInternalIngressControllerthat causes the test to fail with connection timeouts when migrating from internal to external load balancer scope.Problem
When the test migrates the IngressController from:
The IngressController may report
DNSReady=Truebefore the ingress-operator reconciles the DNSRecord with the new external IP. This causes the test to use the old internal IP (fromwildcardRecord.Spec.Targets[0]) for connectivity verification, resulting in timeouts.Solution
Use the Service's load balancer address directly (
lbAddress) instead of waiting for the DNSRecord target to sync. The Service LB address is the authoritative source and is immediately available after the cloud provider provisions the external LB.Additional Changes
PollUntilContextTimeoutinstead of deprecatedPollImmediateTest Plan
TestUnmanagedDNSToManagedDNSInternalIngressControllerpasses on GCP🤖 Generated with Claude Code