soltysh (Contributor) commented Feb 9, 2023

Found in https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27694/pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade/1623489514854944768; see the

disruption_tests: [sig-network-edge] Verify DNS availability during and after upgrade success

failure specifically. I've noticed that we don't wait for the test DaemonSet (DS) to become ready, which can result in the following scenario:

STEP: Creating a DaemonSet to verify DNS availability 02/09/23 02:20:13.567
...
Feb  9 02:20:13.998: FAIL: too many pods were waiting: ns/e2e-check-for-dns-availability-3377 pod/dns-test-b0e2bd0a-19b2-4c42-bbed-842415bd67ad-92dw9,ns/e2e-check-for-dns-availability-3377 pod/dns-test-b0e2bd0a-19b2-4c42-bbed-842415bd67ad-lgj2g,ns/e2e-check-for-dns-availability-3377 pod/dns-test-b0e2bd0a-19b2-4c42-bbed-842415bd67ad-qvcdm

As the log above shows, only about 0.43s passes between creating the DS (02:20:13.567) and checking whether its pods are running (02:20:13.998), which leads to spurious failures like the one in the linked job.

This PR adds a wait after the DS creation to ensure that we have the DS running before continuing with the rest of the test.

@openshift-ci openshift-ci bot requested review from bparees and deads2k February 9, 2023 15:08
@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Feb 9, 2023
@soltysh soltysh changed the title from "Wait for DNS DS pods to be ready" to "OCPBUGS-6902: Wait for DNS DS pods to be ready" Feb 9, 2023
@openshift-ci-robot openshift-ci-robot added labels Feb 9, 2023:
  • jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.)
  • jira/valid-bug (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.)
  • bugzilla/valid-bug (Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.)
@openshift-ci-robot

@soltysh: This pull request references Jira Issue OCPBUGS-6902, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @droslean

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from droslean February 9, 2023 16:01
sanchezl (Contributor) commented Feb 9, 2023

/retest

soltysh (Contributor, Author) commented Feb 10, 2023

/retest

deads2k (Contributor) commented Feb 14, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Feb 14, 2023
openshift-ci bot (Contributor) commented Feb 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 6277a1f and 2 for PR HEAD 4defbf3 in total

openshift-ci bot (Contributor) commented Feb 14, 2023

@soltysh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-single-node | 4defbf3 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 4defbf3 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-azure-ovn-etcd-scaling | 4defbf3 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-single-node-serial | 4defbf3 | link | false | /test e2e-aws-ovn-single-node-serial |

Full PR test history. Your PR dashboard.


soltysh (Contributor, Author) commented Feb 14, 2023

/retest-required

@openshift-merge-robot openshift-merge-robot merged commit 9812f2e into openshift:master Feb 14, 2023
@openshift-ci-robot

@soltysh: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-6902 has been moved to the MODIFIED state.


@soltysh soltysh deleted the wait_dns branch February 15, 2023 09:44
Miciah (Contributor) commented Jul 24, 2023

@openshift-cherrypick-robot

@Miciah: new pull request created: #28083

In response to this:

I'm seeing similar failures for 4.12: https://search.ci.openshift.org/?search=FAIL%3A+too+many+pods+were+waiting%3A+ns%2Fe2e-check-for-dns-availability-&maxAge=168h&context=1&type=build-log&name=release-4.12&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

I hope it isn't too presumptuous of me to initiate a backport.
/cherry-pick release-4.12
