Test Timeout: test/cmd/observe.sh:22: executing 'oc observe services --once --all-namespaces' #12930

stevekuznetsov · 2017-02-12T15:08:20Z

Looks like this:

00:47:51.777 Running test/cmd/observe.sh:22: executing 'oc observe services --once --all-namespaces' expecting success and text 'default kubernetes'...
05:54:44.989 Connection to 172.18.7.33 closed by remote host.
05:54:44.993 Build step 'Execute shell' marked build as failure

We should add a client timeout


juanvallejo · 2017-02-13T21:27:49Z

@stevekuznetsov would the --exit-after flag not work here?

cc @smarterclayton

stevekuznetsov · 2017-02-13T21:48:04Z

No, I don't think so -- that exits with a successful result code, and we want the test to fail if it times out.

juanvallejo · 2017-02-13T22:08:54Z

Since changing --exit-after to time out with a non-zero exit code would break scripts and change the flag's behavior unexpectedly, I can open a PR that adds a second flag, --timeout-after, which would exit with 1 after the specified duration. If a second timeout flag is not ideal, we could instead introduce a new option, --signal, which would override the 0 exit code of --exit-after.

smarterclayton · 2017-02-14T19:57:28Z

I just don't know that this is part of the observe use case. It runs forever, which is the whole point. The test is timing out, and we just lack the test infrastructure in test/cmd to properly detect a test timeout and clean up the child process.

stevekuznetsov · 2017-02-14T20:01:19Z

@juanvallejo this is a particularly nasty test failure ... not sure I understand why the priority downgrade happened.

@smarterclayton I suggested that @juanvallejo just change this to os::cmd ... "timeout oc observe ... " -- I understand what you mean about the failure cause here, but we cannot have this run for five hours. Give it a ten-minute window and then give up.
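
A minimal sketch of the wrapper idea in the comment above, assuming GNU coreutils `timeout` is available on the test host. `run_with_deadline` is a hypothetical helper name, not something in test/cmd, and `sleep 5` stands in for a hung `oc observe` call so the example runs without a cluster. GNU `timeout` exits with status 124 when it has to stop the command, so a hang becomes a test failure rather than a success.

```shell
#!/usr/bin/env bash
# run_with_deadline is a hypothetical helper, not part of test/cmd.
# Usage: run_with_deadline <duration> <command> [args...]
run_with_deadline() {
  local limit="$1"; shift
  local rc=0
  # Run the command under GNU timeout and capture its exit status.
  timeout "${limit}" "$@" || rc=$?
  if [ "${rc}" -eq 124 ]; then
    # GNU timeout exits 124 when the deadline passed and it stopped the command.
    echo "timed out after ${limit}: $*" >&2
  fi
  return "${rc}"
}

# `sleep 5` stands in for a hung `oc observe` invocation.
run_with_deadline 1 sleep 5 || echo "exit status: $?"
```

In a real test the duration would be something like `10m`, matching the ten-minute window suggested above. This only bounds how long the run takes; it does not explain why the command hung.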

smarterclayton · 2017-02-14T22:42:11Z

There is literally no scenario under which it's reasonable for this to hang unless we have a very serious bug. Adding specific changes to each test is the wrong solution. A top level timeout is the right solution, coupled with the fix. Understanding why this is hanging is p0.

stevekuznetsov · 2017-02-15T00:10:22Z

Right, but 6h long merge queue execution intervals don't bode well for the merge queue leading up to feature complete. If you think we should live with the pain while we figure out the flake, sure. I'm not in agreement but at least if it's painful it will be higher priority to look at.

I also saw this in:
https://ci.openshift.redhat.com/jenkins/job/test_job/3/console

smarterclayton · 2017-02-15T01:27:49Z

Put a timeout on "make check".

stevekuznetsov · 2017-02-15T13:34:07Z

Then we get corrupted test output and no XML. Do we want that?

smarterclayton · 2017-02-15T13:57:38Z

Timeout won't tell you what the actual problem is - if that's the point here, add something to panic child processes. Whatever issue this is isn't going to be found by granular timeouts.
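
One possible reading of "panic child processes", sketched under assumptions: if the hung child is a Go binary, the Go runtime by default responds to SIGQUIT by printing goroutine stack traces and exiting, so having the deadline deliver SIGQUIT instead of the default SIGTERM can show where the process was stuck. The sketch below uses GNU `timeout --signal` for that. The `hung_child` script is a hypothetical stand-in that traps SIGQUIT and prints a message, so the example runs without `oc`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a hung child: loops forever and, on SIGQUIT,
# reports its state before exiting, loosely mimicking a Go stack dump.
hung_child='trap "echo child: state dump on SIGQUIT >&2; exit 2" QUIT; while :; do sleep 0.1; done'

# Send SIGQUIT (instead of the default SIGTERM) once the 1s deadline passes.
# --kill-after is a backstop that sends SIGKILL if the child is still
# running 5s after the first signal.
rc=0
timeout --signal=QUIT --kill-after=5 1 bash -c "${hung_child}" || rc=$?
echo "timeout exit status: ${rc}"
```

The dump goes to the child's stderr, so the test harness would need to capture stderr for it to be useful afterwards.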

juanvallejo · 2017-02-20T14:54:24Z

closing via #12980

stevekuznetsov added component/cli kind/test-flake priority/P1 labels Feb 12, 2017

stevekuznetsov mentioned this issue Feb 12, 2017

remove mongo clustered test: replaced by statefulset example and test #12913

Merged

pweil- assigned juanvallejo Feb 13, 2017

stevekuznetsov closed this as completed Feb 13, 2017

juanvallejo reopened this Feb 13, 2017

juanvallejo added priority/P2 and removed priority/P1 labels Feb 14, 2017

stevekuznetsov added priority/P1 and removed priority/P2 labels Feb 14, 2017

juanvallejo mentioned this issue Feb 14, 2017

add timeout to observe services test #12958

Closed

stevekuznetsov added priority/P0 and removed priority/P1 labels Feb 15, 2017

juanvallejo mentioned this issue Feb 15, 2017

add closure that guarantees mutex unlock in loop #12980

Merged

stevekuznetsov mentioned this issue Feb 16, 2017

Connection closed by remote host #12988

Closed

juanvallejo closed this as completed Feb 20, 2017

soltysh mentioned this issue Apr 13, 2017

Test Timeout: test/end-to-end/core.sh:94: executing 'oc rsh dc/docker-registry cat config.yml' #13757

Closed
