RFC: run tests with a release+asserts build and 4 workers #11614
Conversation
…cores this will hopefully run faster and use less memory
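As a rough illustration of the kind of change being described (not the actual diff; the use of min, Sys.CPU_CORES, and addprocs in test/runtests.jl is an assumption, using the 0.4-era names):

# Cap the number of test workers at 4 instead of starting one per core, to
# keep peak memory usage down on the CI machines.
n = min(4, Sys.CPU_CORES)
addprocs(n)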
Differences in backtraces, maybe? If this helps (and from the small sample size so far it looks like it does), I say go for it.
At the time #5228 was merged, the Travis builds apparently only took 5 minutes. Crazy how things have ballooned since then. Since we are doing the separate "starts without …
[av skip]
If we can fix #10205 and this PR works out, it will help a lot.
I'm not so sure the builds of C libraries are that big of a contributor to the build time, but if we can find an up-to-date enough PPA or bring them into juliadeps it'll help some. If this speeds things up enough, it might also be worth trying to bring back osx builds. I'm trying that and cleaning some other things up on a branch.
6 workers apparently led to a GC segfault, not the OOM killer, but not a good sign either: https://travis-ci.org/JuliaLang/julia/jobs/65938347
Incidentally, that's the same codepath I mentioned in #11606 (comment)... Edit: and I'm running a collection here just to make sure whether it's an issue or not...
And the stack overflow in dates (https://travis-ci.org/JuliaLang/julia/jobs/65938351) has been happening on the buildbots a bunch, but this might be the first time I've seen it on Travis, so I haven't filed a separate issue for it yet.
I'd be in favor of merging if you rebase out the second commit, or just cherry-picking the first.
So it happens with 4 workers as well? https://travis-ci.org/JuliaLang/julia/jobs/65979278
If I had to guess, I'd say probably shortly after the tuple overhaul. There's a similar-looking stack overflow in Enums that happened at 0cd2677 on this build (http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x86/builds/1366/steps/shell_2/logs/stdio), and one in dates a few days later (http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x86/builds/1423/steps/shell_2/logs/stdio).
Does it make sense to print frame numbers in the backtrace? I find it quite hard to compare backtraces printed with and without symbols stripped...
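For illustration only, here is a rough sketch of what numbered frames could look like; it uses the newer stacktrace() API rather than whatever Base's printing code actually does, and show_numbered_backtrace is a hypothetical helper name:

# Hypothetical helper (not part of Base): print each frame of the current
# stack trace prefixed with its index, so two backtraces are easier to line
# up even when one of them has its symbols stripped.
function show_numbered_backtrace()
    for (i, frame) in enumerate(stacktrace())
        println("[", i, "] ", frame)
    end
end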
I ran the sparse test 300 times last night and got the stack overflow error 3 times, plus this exception from the sparse test once, on worker 1:

ERROR: LoadError: LoadError: assertion failed: |F' \ ones(elty,5) - full(A1pd)' \ ones(5)| <= 1.1641532182693481e-5
F' \ ones(elty,5) = [1.8239389082351972e6,934570.1978360965,8.603310773126363,30.915492846568245,5.86573069208295]
full(A1pd)' \ ones(5) = [1.8239389082799375e6,934570.1978590211,8.603310773126363,30.91549284656825,5.86573069208295]
difference = 4.474027082324028e-5 > 1.1641532182693481e-5
in error at ./error.jl:22
in test_approx_eq at ./test.jl:139
in anonymous at ./no file:382
in include at ./boot.jl:253
in include_from_node1 at ./loading.jl:133
in include at ./boot.jl:253
in runtests at /home/yuyichao/projects/julia/master/test/testdefs.jl:197
in anonymous at ./multi.jl:644
in run_work_thunk at ./multi.jl:605
in remotecall_fetch at ./multi.jl:678
in remotecall_fetch at ./multi.jl:693
in anonymous at ./task.jl:1422
while loading /home/yuyichao/projects/julia/master/test/sparsedir/cholmod.jl, in expression starting on line 318
while loading /home/yuyichao/projects/julia/master/test/sparse.jl, in expression starting on line 6
ERROR: LoadError: LoadError: LoadError: assertion failed: |F' \ ones(elty,5) - full(A1pd)' \ ones(5)| <= 1.1641532182693481e-5
F' \ ones(elty,5) = [1.8239389082351972e6,934570.1978360965,8.603310773126363,30.915492846568245,5.86573069208295]
full(A1pd)' \ ones(5) = [1.8239389082799375e6,934570.1978590211,8.603310773126363,30.91549284656825,5.86573069208295]
difference = 4.474027082324028e-5 > 1.1641532182693481e-5
in error at ./error.jl:22
in test_approx_eq at ./test.jl:139
in anonymous at ./no file:382
in include at ./boot.jl:253
in include_from_node1 at ./loading.jl:133
in include at ./boot.jl:253
in runtests at /home/yuyichao/projects/julia/master/test/testdefs.jl:197
in anonymous at ./multi.jl:644
in run_work_thunk at ./multi.jl:605
in remotecall_fetch at ./multi.jl:678
in remotecall_fetch at ./multi.jl:693
in anonymous at ./task.jl:1422
while loading /home/yuyichao/projects/julia/master/test/sparsedir/cholmod.jl, in expression starting on line 318
while loading /home/yuyichao/projects/julia/master/test/sparse.jl, in expression starting on line 6
while loading /home/yuyichao/projects/julia/master/test/runtests.jl, in expression …

This looks like a normal precision error to me. Does the result make sense for the input, and should we relax the requirement here a little bit?
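If the tolerance does get relaxed, one way to do it (a sketch only; the 1e-4 bound is an arbitrary illustrative value, not a vetted choice) would be to switch the check in cholmod.jl from the default-tolerance macro to the explicit-tolerance one:

# Sketch: use the explicit-tolerance form of the approximate-equality check;
# 1e-4 is only an illustration of a looser bound than the default.
@test_approx_eq_eps F'\ones(elty,5) full(A1pd)'\ones(5) 1e-4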
Probably. cc @andreasnoack
I think it is better to set the seed to make it deterministic. If we relax the tolerance then it will just happen again, only with smaller probability.
Agreed. The seed should be fixed.
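A minimal sketch of what that could look like, assuming the random inputs in test/sparsedir/cholmod.jl come from the global RNG (srand was the API of that era; newer Julia uses Random.seed!, and 1234 is an arbitrary seed):

# Pin the global RNG state at the top of the test file so the randomly
# generated matrices are identical on every run.
srand(1234)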
Hoping this will help #11553 a bit. Seems to take about the same amount of time, but with half as many workers. (The tests are roughly 2x slower in a debug build.)
Were there reasons to run the tests in debug mode other than assertions?