
Conversation

@obastemur
Collaborator

CI / Debug build tests fail due to timeout.
The Array/array_slice.js test actually takes 14 seconds to complete
on an otherwise idle MacBook Pro with quad 2.9 GHz i7 cores.
The previous 60-second timeout may not cover some tests on a CI VM / Debug build.

Fixes #1406

@dilijev
Contributor

dilijev commented Aug 10, 2016

FYI @ianwjhalliday

Does it make sense to change the default timeout, or would it be better to just selectively increase the timeout for the problematic tests?

Since we see this for some VMs internally on Windows builds, if we're going to increase the default timeout, we could increase it across all OSs.

But I'm a bit worried about extending the timeout and then missing tests that do become problematic. The real solution is either to make the tests shorter while accomplishing the same thing, or, if the longer runtime is important to the test, to increase the timeout selectively for those tests and run them in the slow/nightly configurations only.

Other timeouts:
Linux #1417
Windows #1407
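The selective approach described above (raise the timeout only for known-slow tests, keep the default for everything else) could be sketched roughly as follows in a Python harness like runtests.py. The names `DEFAULT_TIMEOUT`, `SLOW_TEST_TIMEOUTS`, and `run_test` are illustrative, not the actual harness code:

```python
import subprocess

DEFAULT_TIMEOUT = 60  # seconds; the default discussed in this thread

# Per-test overrides for tests known to be slow even on a fast machine.
SLOW_TEST_TIMEOUTS = {
    "Array/array_slice.js": 180,  # ~14 s on an idle quad-core i7
}

def run_test(binary, test_path):
    """Run one test, using a per-test timeout override when one exists.

    Returns True if the test exited with status 0 within its timeout.
    """
    timeout = SLOW_TEST_TIMEOUTS.get(test_path, DEFAULT_TIMEOUT)
    try:
        result = subprocess.run(
            [binary, test_path],
            capture_output=True,
            timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # The timeout exists to kill hung tests rather than block a machine.
        return False
```

This keeps fast tests on a tight leash while accommodating the handful of legitimately slow ones, which is the trade-off debated in the rest of the thread.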

@obastemur
Collaborator Author

But I'm a bit worried about extending the timeout and then missing tests that do become problematic.

@dilijev Since we share the same CI / physical machine resources with many other projects, I can't see how precise test timings matter here (going from 60 to 120 seconds).

@dilijev
Contributor

dilijev commented Aug 11, 2016

@obastemur It's an indication that these tests may be badly written or belong in slow tests. Unless there's a reason to have them run for a long time, unit tests should basically finish instantly. It's just an issue of codebase quality (which includes tests).

@obastemur
Collaborator Author

Unless there's a reason to have them run for a long time, unit tests should basically finish instantly.

In that case we shouldn't even have a 60-second timeout. The problem here is pretty straightforward. As written in the PR description, that particular test takes 14 seconds on a very decent machine with no other load. CI can be pretty busy, hence it may need more than 60. IMO there is no practical difference (on a heavily shared CI) between 60 and 120.

@dilijev
Contributor

dilijev commented Aug 11, 2016

@obastemur You may be right about there being no practical difference between 60 and 120. There's definitely a real reason to have some timeout because some tests may not complete at all for one reason or another and block up a test machine, which is part of why we added the timeouts in the first place.

For the set of tests we see failing regularly, it seems 120s is not enough on some VMs anyway, so I've submitted #1421 to increase the timeouts to 5 minutes to solve the immediate problem.

So few tests run beyond even 10s that it seems like overkill to expand the default to the point where it might take twice as long to notice that a test is starting to slow down. We don't hit this problem very often, and we're seeing it now because of new test infra configurations. If we take the tests that fail regularly and fix up their timeouts, the problem gets resolved without the sledgehammer approach.

@ianwjhalliday
Collaborator

In an ideal world where CI machines had enough power that they didn't starve tests of CPU time long enough to make them time out, I would say we should not raise the default timeout. But then in that ideal world I would have to admit that my principles say we should shrink the default timeout to something small, no more than 5 seconds or so.

This is squishy. I feel on the one hand the 60 second timeout is nice to catch longer running tests that starve easily, and get our attention on them to decide whether they are worthwhile or have room for improvement (or should be relegated to the slower tests bucket). Extending it to 120 or higher would still catch infinite loops, but it would allow slower tests to creep in unnoticed.

I think I prefer leaving the default as is and dealing with individual offending tests on a per case basis. If this continues to be a problem only on CI then we could up the default timeout, or maybe we need to push for better CI resources if there is a large disparity between them and our dev machines.

Alternatively, as Doug mentioned elsewhere, we could introduce a warning threshold and then increase the timeout significantly. In which case I would suggest finding a way to have a 5 second warning threshold without it becoming desensitizing noise in CI.
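The warning-threshold idea above could be sketched roughly like this: keep a generous hard timeout that still kills hung tests, but flag anything that crosses a soft threshold so slow tests can't creep in unnoticed. `HARD_TIMEOUT`, `WARN_THRESHOLD`, and `run_with_warning` are hypothetical names, not part of the real harness:

```python
import subprocess
import time

HARD_TIMEOUT = 300   # kill the test after 5 minutes; still catches infinite loops
WARN_THRESHOLD = 5   # soft limit: flag slower tests for triage, don't fail them

def run_with_warning(cmd):
    """Run a test command; return (passed, elapsed_seconds, warnings)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=HARD_TIMEOUT)
        passed = proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False, float(HARD_TIMEOUT), ["timed out"]
    elapsed = time.monotonic() - start
    warnings = []
    if elapsed > WARN_THRESHOLD:
        # Surface the slow test without failing the run.
        warnings.append(f"slow test: {elapsed:.1f}s > {WARN_THRESHOLD}s")
    return passed, elapsed, warnings
```

The open question raised here, how to report these warnings without them becoming desensitizing noise in CI, is a reporting-policy decision and not something the sketch settles.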

@obastemur
Collaborator Author

@dilijev @ianwjhalliday the whole claim sounds like comparing apples to oranges.

We don't have JIT etc. enabled on xplat, and there is no Debug-vs-Release test timeout approach in either RL or runtests.py. This PR targeted runtests.py only. I'm curious whether either of you has measured the tests in interpreter mode on a *nix VM and has a clear expectation for timeouts there.

There would need to be a Release/Debug/JIT/no-JIT timeout approach if (which I don't agree with) we want to keep 60 seconds around. IMO, the timeout we have serves one purpose: kill a test/process that somehow failed to finish within a given maximum time. We share the same CI with many other projects, and it is hard (if possible at all) to measure anything meaningful from the current CI. It just helps us see whether we break anything with a particular PR. If a PR affects performance, this CI is definitely not the place to catch it.

Reminder: I already closed this PR. Since we have another PR merged to master, there is no need to merge this one IMHO.

@ianwjhalliday
Collaborator

That's fair. Note that we do run the test suite in -nonative mode for the interpreted variant (which also collects profile data for the dynapogo variant). So we should hold the *nix version of ChakraCore tests to the same standard.

We did an exercise almost a couple of years ago where we looked at how long each test took on a sample dev machine and examined all the tests that took longer than 5 seconds, either moving them to the Slow bucket or modifying them not to be slow. Perhaps we should repeat this exercise on a lower-end machine. We should have the data from the CI. I'll work with @dilijev to go through it and see what we find.
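The triage exercise described above can be sketched as a small helper: given per-test timing data (for example, exported from CI), list the tests exceeding a threshold so they can be moved to the Slow bucket or rewritten. The 5-second threshold comes from the comment; the data format and function name are hypothetical:

```python
THRESHOLD = 5.0  # seconds; the cutoff used in the earlier triage exercise

def slow_tests(timings, threshold=THRESHOLD):
    """Given (test_name, seconds) pairs, return those slower than the
    threshold, sorted slowest-first so the worst offenders surface on top.
    """
    return sorted(
        ((name, secs) for name, secs in timings if secs > threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Running this periodically over CI timing data would make slow-test creep visible without changing any timeout policy.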

@obastemur obastemur deleted the timeout_py branch September 2, 2016 01:17