OCPBUGS-60620: e2e: Deflake tests by using ReplicaSet for test workload#1262
Conversation
bbc76c4 to
b574b3f
Compare
7bb7c34 to
5647a21
Compare
|
/hold Holding this PR to better split between different flakes (e.g. #1265). |
|
All operator tests finished green. Let's retry them by pushing an amended commit with no real changed. |
62a2d3b to
71e2a03
Compare
|
No operator test failures, second time in a row. Let's try again... |
71e2a03 to
e9e5d82
Compare
|
@alebedev87: This pull request references Jira Issue OCPBUGS-60620, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
e9e5d82 to
d4fe23d
Compare
|
/unhold |
|
@alebedev87: This pull request references Jira Issue OCPBUGS-60620, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/assign |
test/e2e/idle_connection_test.go
Outdated
| } | ||
|
|
||
| return pod, nil | ||
| if err := kclient.Create(ctx, rs); err != nil { |
There was a problem hiding this comment.
do you wanna retry here for the sake of network problems? (as we've been doing on other PRs)
There was a problem hiding this comment.
Added createWithRetryOnConflict helper function (similar to existing update*WithRetryOnConflict helpers) and introduced it here.
test/e2e/util_gatewayapi_test.go
Outdated
| return nil, fmt.Errorf("failed to create pod %s/%s: %v", namespace, echoPod.Name, err) | ||
| // buildEchoReplicaSet builds a replicaset which creates a pod that listens on port 8080. | ||
| echoRs := buildEchoReplicaSet(backendRefname, namespace) | ||
| if err := kclient.Create(context.TODO(), echoRs); err != nil { |
There was a problem hiding this comment.
same comment as above on the create and retry. I am wondering if we care on actually creating some helper function for these "create" operations, with something like "create or retry" to avoid network issues.
There was a problem hiding this comment.
Added createWithRetryOnConflict helper function and introduced it here.
d4fe23d to
75a0e3c
Compare
test/e2e/util_test.go
Outdated
|
|
||
| // createWithRetryOnConflict creates the given object. If there is a conflict error on create | ||
| // then the create is retried until the timeout is reached. | ||
| func createWithRetryOnConflict[T client.Object](ctx context.Context, obj T, timeout time.Duration) error { |
There was a problem hiding this comment.
I don't think you need generics/type parameters here, you can just pass client.Object as the argument of object. kclient already accepts a client.Object
There was a problem hiding this comment.
Converted to non generic one.
test/e2e/util_test.go
Outdated
| func createWithRetryOnConflict[T client.Object](ctx context.Context, obj T, timeout time.Duration) error { | ||
| return wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) { | ||
| if err := kclient.Create(ctx, obj); err != nil && !errors.IsAlreadyExists(err) { | ||
| if errors.IsConflict(err) { |
There was a problem hiding this comment.
I don't know where this can happen on a kclient.Create operation, at least for me the API will answer with Conflict only when you are doing patch/update operation. In case Create, I can think of:
- Already exists - you cover on the if branch
- Does not exist - So you create it
- Another error (network, permission, etc) - you retry
Maybe I am missing something on the Conflict situation here
There was a problem hiding this comment.
Maybe I am missing something on the Conflict situation here
No, you are right. I copied the function from an existing update*OnConflict function. Updated it to retry on any error but AlreadyExists.
Switch the test HTTPRoute's backend workload to a ReplicaSet to better tolerate pod evictions or deletions during runs.
Switch the test route's backend workload to a ReplicaSet to better tolerate pod evictions or deletions during runs. Update test logic to expect a response prefix, since the echo pod names returned in the response are now suffixed with a random hash by the ReplicaSet controller.
75a0e3c to
edfab23
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
The tests failing are not required. @Miciah this test may need an override |
|
@rikatz : just for info: we can use |
1 similar comment
|
This test is failing and has been reported to the Cloud Platform team. Resolution depends at least on merge of openshift/cloud-provider-gcp#86. /override ci/prow/e2e-gcp-operator |
|
@candita: Overrode contexts on behalf of candita: ci/prow/e2e-gcp-operator DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@alebedev87: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
2f21242
into
openshift:master
|
@alebedev87: Jira Issue OCPBUGS-60620: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-60620 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This PR aims at trying to de-flake the e2e tests which were flaking during the CI testing of #1257: