Skip to content

OCPBUGSM-35025: reenable unidling ci tests#27538

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
msherif1234:unidling
Nov 11, 2022
Merged

OCPBUGSM-35025: reenable unidling ci tests#27538
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
msherif1234:unidling

Conversation

@msherif1234
Copy link
Copy Markdown
Contributor

Signed-off-by: Mohamed Mahmoud mmahmoud@redhat.com

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 9, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2022
@openshift-ci openshift-ci bot requested review from deads2k and mfojtik November 9, 2022 15:05
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2022
@msherif1234 msherif1234 force-pushed the unidling branch 2 times, most recently from 69e9fc0 to 6900443 Compare November 9, 2022 23:42
@msherif1234 msherif1234 changed the title WIP: reenable unidling test to debug ci failures OCPBUGSM-35025: reenable unidling test to debug ci failures Nov 9, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 9, 2022
@msherif1234 msherif1234 force-pushed the unidling branch 2 times, most recently from 0d85a7d to 85557e5 Compare November 10, 2022 12:48
@msherif1234 msherif1234 requested review from oribon and removed request for deads2k and mfojtik November 10, 2022 13:13
@oribon
Copy link
Copy Markdown

oribon commented Nov 10, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 10, 2022
@msherif1234
Copy link
Copy Markdown
Contributor Author

/retest

@msherif1234
Copy link
Copy Markdown
Contributor Author

/assign @bparees

@bparees
Copy link
Copy Markdown
Contributor

bparees commented Nov 10, 2022

@dgoodwin what does TRT want to see before this test gets re-enabled?

@bparees
Copy link
Copy Markdown
Contributor

bparees commented Nov 10, 2022

The fact that this is titled "to debug CI failures" and not "re-enable it because we've fixed it" makes me think this should not merge right now. I'd expect you to either debug the results using CI jobs run against this PR (not merging it until you're satisfied you've fix it), or update the test to always pass but dump out the debugging info you need, if you need more CI runs than you can reasonably get from a single PR, in order to recreate the issue. (The latter should only be done w/ careful tracking of a bug to return the test to properly fail, as well as clear comments in place explaining the situation)

@msherif1234 msherif1234 changed the title OCPBUGSM-35025: reenable unidling test to debug ci failures OCPBUGSM-35025: reenable unidling ci tests Nov 10, 2022
@msherif1234
Copy link
Copy Markdown
Contributor Author

The fact that this is titled "to debug CI failures" and not "re-enable it because we've fixed it" makes me think this should not merge right now. I'd expect you to either debug the results using CI jobs run against this PR (not merging it until you're satisfied you've fix it), or update the test to always pass but dump out the debugging info you need, if you need more CI runs than you can reasonably get from a single PR, in order to recreate the issue. (The latter should only be done w/ careful tracking of a bug to return the test to properly fail, as well as clear comments in place explaining the situation)

sorry forget to update the title, I was able to repro the issues manually and was able to verify the fix, we hit the issue with ovn ci runs.
the following is list of all idling/unidling test cases that I ran manually to repro and confirm the fix

./openshift-tests run-test "[sig-network-edge][Feature:Idling] Unidling should handle many TCP connections by possibly dropping those over a certain bound [Serial] [Suite:openshift/conformance/serial]"
./openshift-tests run-test "[sig-network-edge][Feature:Idling] Unidling should handle many UDP senders (by continuing to drop all packets on the floor) [Serial] [Suite:openshift/conformance/serial]"
./openshift-tests run-test "[sig-network-edge][Feature:Idling] Unidling should work with TCP (when fully idled) [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-network-edge][Feature:Idling] Unidling should work with TCP (while idling) [Disabled:Broken]"
./openshift-tests run-test "[sig-network-edge][Feature:Idling] Unidling should work with UDP [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-network-edge][Feature:Idling] Idling with a single service and DeploymentConfig should idle the service and DeploymentConfig properly [apigroup:apps.openshift.io] [Disabled:Broken]"
./openshift-tests run-test "[sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController should idle the service and ReplicationController properly [Suite:openshift/conformance/parallel]"

@dgoodwin
Copy link
Copy Markdown
Contributor

Given it looks like this was originally blocking payloads, we could use the payload command to verify but unfortunately things are not good in master right now for 4.13.

With the above details it sounds like we can merge and we will keep an eye out for breakage, if it does we can just revert and go from there. I'll let the team know to watch out for idling test problems on ovn. Naturally we'll want to check all job results and make sure it's not popping up anywhere, including the non-required jobs here. Looks decent from the jobs that completed so far.

@bparees
Copy link
Copy Markdown
Contributor

bparees commented Nov 10, 2022

sorry forget to update the title, I was able to repro the issues manually and was able to verify the fix, we hit the issue with ovn ci runs.

got it, thanks. It looks good to me, but i'm still going to ask @dgoodwin to weigh in in case TRT has any particular concerns or requests before we re-enable. They can remove the hold when they're satisfied.

/approve
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2022
@bparees
Copy link
Copy Markdown
Contributor

bparees commented Nov 10, 2022

whoops, race condition w/ Devan.

/hold cancel

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Nov 10, 2022
@msherif1234
Copy link
Copy Markdown
Contributor Author

/retest-required

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 11, 2022
@msherif1234
Copy link
Copy Markdown
Contributor Author

/retest-required

- idling tests were failing because controller group
  wasn't matching against the right group.
- undiling TCP tests were failing because of curl
 exit with (7) failed to connect to host error
 Curling a service with no endpoints has different results:
 openshift-sdn "hangs" while ovn-k rejects immediately.
 Because ovk-k refuses the connection curl's retry mechanism
 doesn't trigger unless --retry-connrefused is specified.
 Adding it allows the initial curl(s) to fail until the endpoints
 actually come back from unidling.

Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
@msherif1234 msherif1234 requested review from jcaamano and removed request for oribon November 11, 2022 12:59
@jcaamano
Copy link
Copy Markdown
Contributor

/lgtm

This test is still disabled both for ovn-k and openshift-sdn:

"[sig-network-edge][Feature:Idling] Unidling [apigroup:apps.openshift.io][apigroup:route.openshift.io] should work with TCP (while idling)": " [Disabled:Broken]"

I think we want it enabled at least for ovn-k but @msherif1234 prefers to do that on a different PR.

Also, I have seen a flake once but what seems an unrelated reason:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27538/pull-ci-openshift-origin-master-e2e-aws-ovn-fips/1591052931736539136

Nov 11 14:21:29.047: INFO: At 2022-11-11 14:20:29 +0000 UTC - event for idling-echo-1-deploy: {kubelet ip-10-0-178-181.us-east-2.compute.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_idling-echo-1-deploy_e2e-test-cli-idling-mt4gt_18f885ac-ac81-42b7-b2d0-b2683ae44544_0(848e978df89ccebf974468650a829b1903b862fcd7679836a141bc6ee6bb3322): error adding pod e2e-test-cli-idling-mt4gt_idling-echo-1-deploy to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [e2e-test-cli-idling-mt4gt/idling-echo-1-deploy/18f885ac-ac81-42b7-b2d0-b2683ae44544:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-test-cli-idling-mt4gt/idling-echo-1-deploy 848e978df89ccebf974468650a829b1903b862fcd7679836a141bc6ee6bb3322] [e2e-test-cli-idling-mt4gt/idling-echo-1-deploy 848e978df89ccebf974468650a829b1903b862fcd7679836a141bc6ee6bb3322] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded

Maybe we should follow up anyway.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 11, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Nov 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, jcaamano, msherif1234, oribon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 394aaa3 into openshift:master Nov 11, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Nov 11, 2022

@msherif1234: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-single-node b42ae04 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-agnostic-ovn-cmd b42ae04 link false /test e2e-agnostic-ovn-cmd
ci/prow/e2e-aws-ovn-single-node-upgrade b42ae04 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-openstack-ovn b42ae04 link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node-serial b42ae04 link false /test e2e-aws-ovn-single-node-serial

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@msherif1234
Copy link
Copy Markdown
Contributor Author

msherif1234 commented Nov 11, 2022

Disabled:Broken
opened this for ovnk but very low priority
https://bugzilla.redhat.com/show_bug.cgi?id=2004076

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants