OCPBUGS-2493: Fix TestUnmanagedDNSToManagedDNSInternal E2E test race conditions #845
|
@gcs278: This pull request references Jira Issue OCPBUGS-2493, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test e2e-azure-operator |
|
e2e-aws-operator failed because We might just need a longer timeout on the polling loop; the scope change is known to take as long as ~6 minutes on AWS: https://bugzilla.redhat.com/show_bug.cgi?id=2034795 |
|
e2e-azure-operator failed too, with a similar failure: Searching for Then it is updated to managed with a public IP address, again as expected: So it isn't obvious to me what is going wrong. |
test/e2e/unmanaged_dns_test.go
Outdated
	return false, nil
} else if ingresscontroller.IsServiceInternal(lbService) {
	// The service got recreated, but is not external
	t.Fatalf("load balancer %s is internal but should be external", lbService.Name)
Shouldn't this just return an error? We catch and report any error after the call to PollImmediate.
This means the service was successfully deleted and recreated, but still is not what we expect. There isn't another process that would recreate it again as far as I know, so we are dead in the water and might as well stop the test rather than keep going.
Andy's point is that if you returned an error here from the polling loop, then the if err != nil { t.Fatalf(...) } immediately after the loop would still suffice to terminate the test.
Ah sorry I thought you meant t.Errorf vs. t.Fatalf.
Will fix in next push.
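For readers following the thread, here is a minimal sketch of the pattern being suggested: the condition function returns an error when the service was recreated with the wrong scope, and the single error check after the loop terminates the test. Everything here is illustrative and not the repository's code: `service`, `pollImmediate` (a simplified stand-in for `wait.PollImmediate`), `waitForExternalService`, and the UID comparison used to detect recreation are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// service is a hypothetical stand-in for the load balancer Service object.
type service struct {
	Name     string
	UID      string
	Internal bool
}

// pollImmediate is a simplified stand-in for wait.PollImmediate: it calls cond
// right away, then once per interval, until cond returns true, returns an
// error, or the timeout expires.
func pollImmediate(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the condition")
		}
		time.Sleep(interval)
	}
}

// waitForExternalService polls until the service has been recreated (assumed
// here to mean its UID differs from oldUID) and is external.
func waitForExternalService(get func() service, oldUID string) error {
	return pollImmediate(time.Millisecond, 50*time.Millisecond, func() (bool, error) {
		svc := get()
		if svc.UID == oldUID {
			// Not recreated yet; keep polling.
			return false, nil
		}
		if svc.Internal {
			// Recreated with the wrong scope: stop polling by returning an error
			// instead of calling t.Fatalf inside the condition.
			return false, fmt.Errorf("load balancer %s is internal but should be external", svc.Name)
		}
		return true, nil
	})
}

func main() {
	recreatedInternal := func() service {
		return service{Name: "router-test", UID: "new", Internal: true}
	}
	if err := waitForExternalService(recreatedInternal, "old"); err != nil {
		// In a real test, this is the one place that would call t.Fatalf.
		fmt.Println("fatal:", err)
	}
}
```

Returning the error keeps the failure reporting in one place, and the loop still stops on the first unrecoverable state rather than running until the timeout.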
78e844f to e45dae7
|
@Miciah must be another problem. Increased I've never seen |
|
/test e2e-azure-operator |
cec3bfb to a237518
|
/test e2e-azure-operator |
|
I added |
|
Whoops I broke something..hang on |
a237518 to 8060eda
The ingress-operator logs in the e2e-azure-operator job indicated that DNS had been updated. I didn't check the ingress-operator logs in the e2e-aws-operator job. However, my guess would be that it's a delay in the LB initializing. |
|
/test e2e-azure-operator |
|
e2e-aws-operator failed, and the ingress clusteroperator is reporting the following: |
|
Also, |
8060eda to ec4a671
Yeah, sorry, I thought I fixed it in the last push but forgot to do a |
|
/test e2e-azure-operator |
ec4a671 to 2e51367
|
/test e2e-azure-operator |
|
Successful e2e-aws-operator run |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Miciah. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…er E2E flake
`test/e2e/unmanaged_dns_test.go`:
- Fixed service deletion race condition by ensuring loadbalancer service changed
- Fixed check for IsServiceInternal, which was actually checking for internal
when it should have been checking for external
- Fixed IsServiceInternal using outdated service object
`test/e2e/util_test.go`:
- Added HTTP response body close in waitForHTTPClientCondition (the main issue)
- Increased timeout to 10 minutes for verifyInternalIngressController and verifyExternalIngressController
- Added curl pod restart in case of failure to verifyInternalIngressController
- Added wait for DNS Resolution in waitForHTTPClientCondition for debuggability
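
As an illustration of the response-body fix listed above, the following is a minimal sketch of an HTTP polling loop that drains and closes the body on every attempt. It is not the actual `waitForHTTPClientCondition`: `pollHTTP`, `flakyTransport`, `trackedBody`, `runDemo`, and the URL are hypothetical names, and a fake `http.RoundTripper` replaces the network so the number of closed bodies can be observed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// trackedBody counts Close calls so the demo can show every body was closed.
type trackedBody struct {
	io.Reader
	closed *int
}

func (b trackedBody) Close() error { *b.closed++; return nil }

// flakyTransport is a fake http.RoundTripper: it answers 503 for the first
// `failures` requests and 200 afterwards, without touching the network.
type flakyTransport struct {
	failures, requests, closed int
}

func (f *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	f.requests++
	status := http.StatusOK
	if f.requests <= f.failures {
		status = http.StatusServiceUnavailable
	}
	return &http.Response{
		StatusCode: status,
		Header:     http.Header{},
		Body:       trackedBody{Reader: strings.NewReader("body"), closed: &f.closed},
		Request:    req,
	}, nil
}

// pollHTTP retries GET requests until ok reports true or attempts run out.
func pollHTTP(client *http.Client, url string, interval time.Duration, attempts int, ok func(*http.Response) bool) error {
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			matched := ok(resp)
			// Drain and close the body on every iteration, including failed
			// ones, so connections are released instead of leaking.
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			if matched {
				return nil
			}
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("condition not met for %s after %d attempts", url, attempts)
}

// runDemo polls against the fake transport and reports how many bodies were closed.
func runDemo(failures, attempts int) (int, error) {
	tr := &flakyTransport{failures: failures}
	client := &http.Client{Transport: tr}
	err := pollHTTP(client, "http://example.test/", time.Millisecond, attempts, func(r *http.Response) bool {
		return r.StatusCode == http.StatusOK
	})
	return tr.closed, err
}

func main() {
	closed, err := runDemo(2, 10)
	fmt.Println("bodies closed:", closed, "error:", err)
}
```

The close happens inside the loop rather than through `defer`, because a deferred close in a long-running polling function would only run when the function returns.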
49c5983 to fd2e991
|
/test e2e-azure-operator |
|
e2e-gcp-operator failed because |
|
e2e-aws-operator has successful e2e but failed on |
|
e2e-aws-operator failed because etcd failed to come up. e2e-gcp-ovn-serial failed because the disruption/ingress-to-oauth-server connection/new and ingress-to-console connection/new disruption tests failed. |
|
e2e-gcp-operator failed on both The test output shows that
The kube-controller-manager logs show that k-c-m ensured the LB at 03:14:34.646931. Strangely, I cannot find the create for that particular service in the kube-apiserver access logs, even though I can see various get and patch requests for that service and creates for other services that the operator created. My guess is that kube-apiserver or k-c-m was overloaded, and the create in kube-apiserver, the watch in k-c-m, or the provisioning of the LB in GCP was delayed, causing |
|
Ack, seems like e2e-azure-operator failed on e2e-aws-operator appears to have passed E2E tests, but looks like it's going to fail on other stuff. |
|
/test e2e-aws-operator |
|
/retest-required |
|
e2e-aws-operator: /retest-required |
|
e2e-gcp-ovn-serial failed on known disruption issues |
|
/override e2e-gcp-ovn-serial |
|
@Miciah: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. |
|
/override ci/prow/e2e-gcp-ovn-serial |
|
@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-gcp-ovn-serial |
|
Looks like e2e-aws-operator is going to fail because the e2e-aws-operator-gather-must-gather step failed and because |
|
@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-aws-operator |
|
@gcs278: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-2493 has been moved to the MODIFIED state. |
|
@gcs278: The following test failed, say
Full PR test history. Your PR dashboard. |
|
/label qe-approved |
OCPBUGS-2493: Fix TestUnmanagedDNSToManagedDNSInternalIngressController E2E flake