
Conversation

@obastemur
Collaborator

CI / Debug build tests fail due to timeout.
The Array/array_slice.js test actually takes 14 seconds to complete
on an otherwise idle MacBook Pro with quad 2.9 GHz i7 cores.
The previous 60-second timeout may not cover some tests on a CI VM / Debug build.

Fixes #1406

@dilijev
Contributor

dilijev commented Aug 10, 2016

FYI @ianwjhalliday

Does it make sense to change the default timeout, or would it be better to just selectively increase the timeout for the problematic tests?

Since we see this for some VMs internally on Windows builds, if we're going to increase the default timeout, we could increase it across all OSs.

But I'm a bit worried about extending the timeout and then missing tests that do become problematic. The real solution is either to make the tests shorter while accomplishing the same thing, or, if the longer runtime is important to the test, to increase the timeout selectively for those tests and run them in the slow/nightly configurations only.

Other timeouts:
Linux #1417
Windows #1407
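The selective approach described above (raise the timeout only for known-slow tests, keep the default for everything else) could be sketched roughly as follows in a Python harness like runtests.py. The names `DEFAULT_TIMEOUT`, `SLOW_TEST_TIMEOUTS`, and `run_test` are illustrative, not the actual harness code:

```python
import subprocess

DEFAULT_TIMEOUT = 60  # seconds; the default discussed in this thread

# Per-test overrides for tests known to be slow even on a fast machine.
SLOW_TEST_TIMEOUTS = {
    "Array/array_slice.js": 180,  # ~14 s on an idle quad-core i7
}

def run_test(binary, test_path):
    """Run one test, using a per-test timeout override when one exists.

    Returns True if the test exited with status 0 within its timeout.
    """
    timeout = SLOW_TEST_TIMEOUTS.get(test_path, DEFAULT_TIMEOUT)
    try:
        result = subprocess.run(
            [binary, test_path],
            capture_output=True,
            timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # The timeout exists to kill hung tests rather than block a machine.
        return False
```

This keeps fast tests on a tight leash while accommodating the handful of legitimately slow ones, which is the trade-off debated in the rest of the thread.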

@obastemur
Collaborator Author

But I'm a bit worried about extending the timeout and then missing tests that do become problematic.

@dilijev Since we share the same CI / physical machine resources with many other projects, I can't see how precise test timings matter here (going from 60 to 120 seconds).

@dilijev
Contributor

dilijev commented Aug 11, 2016

@obastemur It's an indication that these tests may be badly written or belong in slow tests. Unless there's a reason to have them run for a long time, unit tests should basically finish instantly. It's just an issue of codebase quality (which includes tests).

@obastemur
Collaborator Author

Unless there's a reason to have them run for a long time, unit tests should basically finish instantly.

In that case we shouldn't even have a 60-second timeout. The problem here is pretty straightforward. As written in the PR description, that particular test takes 14 seconds on a very decent machine with no other load. CI can be pretty busy, hence it may need more than 60. IMO there is no practical difference (on a heavily shared CI) between 60 and 120.

@dilijev
Contributor

dilijev commented Aug 11, 2016

@obastemur You may be right about there being no practical difference between 60 and 120. There's definitely a real reason to have some timeout because some tests may not complete at all for one reason or another and block up a test machine, which is part of why we added the timeouts in the first place.

For the set of tests we see failing regularly, it seems 120s is not enough on some VMs anyway, so I've submitted #1421 to increase the timeouts to 5 minutes to solve the immediate problem.

So few tests run beyond even 10s that it seems like overkill to expand the default to the point where it might take twice as long to notice that a test is starting to slow down. We don't hit this problem very often, and we're seeing it now because of new test infra configurations. If we take the tests that fail regularly and fix up their timeouts, the problem gets resolved without the sledgehammer approach.

@ianwjhalliday
Collaborator

In an ideal world where CI machines had enough power that they didn't starve tests of CPU time long enough to make them time out, I would say we should not raise the default timeout. But then in that ideal world I would have to admit that my principles say we should shrink the default timeout to something small, no more than 5 seconds or so.

This is squishy. I feel on the one hand the 60 second timeout is nice to catch longer running tests that starve easily, and get our attention on them to decide whether they are worthwhile or have room for improvement (or should be relegated to the slower tests bucket). Extending it to 120 or higher would still catch infinite loops, but it would allow slower tests to creep in unnoticed.

I think I prefer leaving the default as is and dealing with individual offending tests on a per case basis. If this continues to be a problem only on CI then we could up the default timeout, or maybe we need to push for better CI resources if there is a large disparity between them and our dev machines.

Alternatively, as Doug mentioned elsewhere, we could introduce a warning threshold and then increase the timeout significantly. In which case I would suggest finding a way to have a 5 second warning threshold without it becoming desensitizing noise in CI.
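The warning-threshold idea above could be sketched roughly like this: keep a generous hard timeout that still kills hung tests, but flag anything that crosses a soft threshold so slow tests can't creep in unnoticed. `HARD_TIMEOUT`, `WARN_THRESHOLD`, and `run_with_warning` are hypothetical names, not part of the real harness:

```python
import subprocess
import time

HARD_TIMEOUT = 300   # kill the test after 5 minutes; still catches infinite loops
WARN_THRESHOLD = 5   # soft limit: flag slower tests for triage, don't fail them

def run_with_warning(cmd):
    """Run a test command; return (passed, elapsed_seconds, warnings)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=HARD_TIMEOUT)
        passed = proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False, float(HARD_TIMEOUT), ["timed out"]
    elapsed = time.monotonic() - start
    warnings = []
    if elapsed > WARN_THRESHOLD:
        # Surface the slow test without failing the run.
        warnings.append(f"slow test: {elapsed:.1f}s > {WARN_THRESHOLD}s")
    return passed, elapsed, warnings
```

The open question raised here, how to report these warnings without them becoming desensitizing noise in CI, is a reporting-policy decision and not something the sketch settles.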

@obastemur
Collaborator Author

@dilijev @ianwjhalliday the whole claim sounds like comparing apples to oranges.

We don't have JIT etc. enabled on xplat, and there is no Debug-vs-Release test timeout approach in either RL or runtests.py. This PR targeted runtests.py only. I'm curious whether either of you has measured the tests in interpreter mode on a *nix VM and has a clear expectation for timeouts there.

There would need to be a Release/Debug/JIT/no-JIT timeout approach if (which I don't agree with) we want to keep 60 seconds around. IMO, the timeout we have serves one purpose: kill a test/process that somehow failed to finish within a given maximum time. We share the same CI with many other projects, and it is hard (if possible at all) to measure anything meaningful from the current CI. It just helps us see whether we break anything with a particular PR. If a PR affects performance, this CI is definitely not the place to catch it.

Reminder: I already closed this PR. Since we have another PR merged to master, there is no need to merge this one IMHO.

@ianwjhalliday
Collaborator

That's fair. Note that we do run the test suite in -nonative mode for the interpreted variant (which also collects profile data for the dynapogo variant). So we should hold the *nix version of ChakraCore tests to the same standard.

We did an exercise almost a couple of years ago where we looked at how long each test took on a sample dev machine and examined all the tests that took longer than 5 seconds, either moving them to the Slow bucket or modifying them not to be slow. Perhaps we should repeat this exercise on a lower-end machine. We should have the data from the CI. I'll work with @dilijev to go through it and see what we find.
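The triage exercise described above can be sketched as a small helper: given per-test timing data (for example, exported from CI), list the tests exceeding a threshold so they can be moved to the Slow bucket or rewritten. The 5-second threshold comes from the comment; the data format and function name are hypothetical:

```python
THRESHOLD = 5.0  # seconds; the cutoff used in the earlier triage exercise

def slow_tests(timings, threshold=THRESHOLD):
    """Given (test_name, seconds) pairs, return those slower than the
    threshold, sorted slowest-first so the worst offenders surface on top.
    """
    return sorted(
        ((name, secs) for name, secs in timings if secs > threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Running this periodically over CI timing data would make slow-test creep visible without changing any timeout policy.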

@obastemur obastemur deleted the timeout_py branch September 2, 2016 01:17