@gcs278
Contributor

@gcs278 gcs278 commented May 30, 2023

WIP DO NOT MERGE

CI testing for an increased timeout in TestAWSELBConnectionIdleTimeout.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 30, 2023
@openshift-ci openshift-ci bot requested review from alebedev87 and candita May 30, 2023 20:50
@openshift-ci
Contributor

openshift-ci bot commented May 30, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gcs278. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gcs278
Contributor Author

gcs278 commented May 31, 2023

/test e2e-aws-operator

@gcs278
Contributor Author

gcs278 commented Jun 1, 2023

@gcs278
Contributor Author

gcs278 commented Jun 1, 2023

Actually, I take that back. It looks like it actually made progress, but it's super messy.

  1. It took just over 5 minutes to get past the net.LookupIP(route.Spec.Host) polling loop.
  2. Then I see:
=== NAME  TestAll/parallel/TestAWSELBConnectionIdleTimeout
    operator_test.go:2737: found expected annotation service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout=120

which means it moved on to the last polling loop where it does a client.Do(request):

response, err := client.Do(request)

  3. It then switches to these EOF errors (seems like progress):

=== NAME  TestAll/parallel/TestAWSELBConnectionIdleTimeout
    operator_test.go:2765: got unexpected error after elapsed time 2.589120429s: Get "http://idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-f53br6ii-43abb.origin-ci-int-aws.dev.rhcloud.com": EOF

     This continues for a little bit.
  4. Then it switches back to no such host:

    operator_test.go:2765: got unexpected error after elapsed time 2.071273ms: Get "http://idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-f53br6ii-43abb.origin-ci-int-aws.dev.rhcloud.com": dial tcp: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-f53br6ii-43abb.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host

I'm adding a larger timeout to the last polling loop. I wonder if the update to the LB Service needs another rollout/update of the ELB. Things might be getting slower with AWS.

@gcs278 gcs278 force-pushed the OCPBUGS-13810-timeout branch from 9958602 to 28041b9 Compare June 1, 2023 13:56
@gcs278
Contributor Author

gcs278 commented Jun 1, 2023

@gcs278
Contributor Author

gcs278 commented Jun 1, 2023

2nd try: it worked, but the run failed on other things.
/test e2e-aws-operator

@gcs278
Contributor Author

gcs278 commented Jun 1, 2023

3rd try - TestRouterCompressionOperation failed...
/test e2e-aws-operator

@gcs278
Contributor Author

gcs278 commented Jun 5, 2023

The last run seemed to work okay; I didn't see the "bouncing".
/test e2e-aws-operator

@gcs278 gcs278 force-pushed the OCPBUGS-13810-timeout branch from ab6a30b to f798341 Compare June 7, 2023 14:22
@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2023

@gcs278: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn | f798341 | link | false | /test e2e-azure-ovn |
| ci/prow/e2e-hypershift | f798341 | link | true | /test e2e-hypershift |
| ci/prow/e2e-gcp-ovn | f798341 | link | false | /test e2e-gcp-ovn |
| ci/prow/e2e-aws-operator | f798341 | link | true | /test e2e-aws-operator |
| ci/prow/e2e-aws-ovn-upgrade | f798341 | link | true | /test e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 19, 2023
@openshift-merge-robot
Contributor

PR needs rebase.

@gcs278
Contributor Author

gcs278 commented Jul 19, 2023

I think I got out of this what I needed to solve the bug.
https://issues.redhat.com/browse/OCPBUGS-14966 has now been opened and I have a reproducer that is not in CI.

@gcs278
Contributor Author

gcs278 commented Jul 19, 2023

/close

@openshift-ci openshift-ci bot closed this Jul 19, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jul 19, 2023

@gcs278: Closed this PR.

Details

In response to this:

/close
