RFC: run tests with a release+asserts build and 4 workers #11614
Conversation
…cores this will hopefully run faster and use less memory
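As a rough illustration of the kind of change being described (not the actual diff; the use of min, Sys.CPU_CORES, and addprocs in test/runtests.jl is an assumption, using the 0.4-era names):

# Cap the number of test workers at 4 instead of starting one per core, to
# keep peak memory usage down on the CI machines.
n = min(4, Sys.CPU_CORES)
addprocs(n)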
Differences in backtraces, maybe? If this helps (and from the small sample size so far it looks like it does), I say go for it.
At the time #5228 was merged, the Travis builds apparently only took 5 minutes. Crazy how things have ballooned since then. Since we are doing the separate "starts without …
[av skip]
If we can fix #10205 and this PR works out, it will help a lot.
I'm not so sure the builds of C libraries are that big of a contributor to the build time, but if we can find an up-to-date enough PPA or bring them into juliadeps it'll help some. If this speeds things up enough, it might also be worth trying to bring back osx builds. I'm trying that and cleaning some other things up on a branch.
6 workers apparently led to a GC segfault, not the OOM killer, but not a good sign either: https://travis-ci.org/JuliaLang/julia/jobs/65938347
Incidentally, that's the same codepath I mentioned in #11606 (comment)... Edit: and I'm running a collection here just to make sure whether it's an issue or not...
And the stack overflow in dates (https://travis-ci.org/JuliaLang/julia/jobs/65938351) has been happening on the buildbots a bunch, but this might be the first time I've seen it on Travis, so I haven't filed a separate issue for it yet.
I'd be in favor of merging if you rebase out the second commit, or just cherry-picking the first.
So it happens with 4 workers as well? https://travis-ci.org/JuliaLang/julia/jobs/65979278
If I had to guess, I'd say probably shortly after the tuple overhaul. There's a similar-looking stack overflow in Enums that happened at 0cd2677 on this build (http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x86/builds/1366/steps/shell_2/logs/stdio), and one in dates a few days later (http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x86/builds/1423/steps/shell_2/logs/stdio).
Does it make sense to print frame numbers in the backtrace? I find it quite hard to compare backtraces printed with and without symbols stripped...
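For illustration only, here is a rough sketch of what numbered frames could look like; it uses the newer stacktrace() API rather than whatever Base's printing code actually does, and show_numbered_backtrace is a hypothetical helper name:

# Hypothetical helper (not part of Base): print each frame of the current
# stack trace prefixed with its index, so two backtraces are easier to line
# up even when one of them has its symbols stripped.
function show_numbered_backtrace()
    for (i, frame) in enumerate(stacktrace())
        println("[", i, "] ", frame)
    end
end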
I ran the sparse test 300 times last night and got the stack overflow error 3 times, plus this exception from the sparse test once, on worker 1:

ERROR: LoadError: LoadError: assertion failed: |F' \ ones(elty,5) - full(A1pd)' \ ones(5)| <= 1.1641532182693481e-5
F' \ ones(elty,5) = [1.8239389082351972e6,934570.1978360965,8.603310773126363,30.915492846568245,5.86573069208295]
full(A1pd)' \ ones(5) = [1.8239389082799375e6,934570.1978590211,8.603310773126363,30.91549284656825,5.86573069208295]
difference = 4.474027082324028e-5 > 1.1641532182693481e-5
in error at ./error.jl:22
in test_approx_eq at ./test.jl:139
in anonymous at ./no file:382
in include at ./boot.jl:253
in include_from_node1 at ./loading.jl:133
in include at ./boot.jl:253
in runtests at /home/yuyichao/projects/julia/master/test/testdefs.jl:197
in anonymous at ./multi.jl:644
in run_work_thunk at ./multi.jl:605
in remotecall_fetch at ./multi.jl:678
in remotecall_fetch at ./multi.jl:693
in anonymous at ./task.jl:1422
while loading /home/yuyichao/projects/julia/master/test/sparsedir/cholmod.jl, in expression starting on line 318
while loading /home/yuyichao/projects/julia/master/test/sparse.jl, in expression starting on line 6
ERROR: LoadError: LoadError: LoadError: assertion failed: |F' \ ones(elty,5) - full(A1pd)' \ ones(5)| <= 1.1641532182693481e-5
F' \ ones(elty,5) = [1.8239389082351972e6,934570.1978360965,8.603310773126363,30.915492846568245,5.86573069208295]
full(A1pd)' \ ones(5) = [1.8239389082799375e6,934570.1978590211,8.603310773126363,30.91549284656825,5.86573069208295]
difference = 4.474027082324028e-5 > 1.1641532182693481e-5
in error at ./error.jl:22
in test_approx_eq at ./test.jl:139
in anonymous at ./no file:382
in include at ./boot.jl:253
in include_from_node1 at ./loading.jl:133
in include at ./boot.jl:253
in runtests at /home/yuyichao/projects/julia/master/test/testdefs.jl:197
in anonymous at ./multi.jl:644
in run_work_thunk at ./multi.jl:605
in remotecall_fetch at ./multi.jl:678
in remotecall_fetch at ./multi.jl:693
in anonymous at ./task.jl:1422
while loading /home/yuyichao/projects/julia/master/test/sparsedir/cholmod.jl, in expression starting on line 318
while loading /home/yuyichao/projects/julia/master/test/sparse.jl, in expression starting on line 6
while loading /home/yuyichao/projects/julia/master/test/runtests.jl, in expression …

This looks like a normal precision error to me. Does the result make sense for the input, and should we relax the requirement here a little bit?
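If the tolerance does get relaxed, one way to do it (a sketch only; the 1e-4 bound is an arbitrary illustrative value, not a vetted choice) would be to switch the check in cholmod.jl from the default-tolerance macro to the explicit-tolerance one:

# Sketch: use the explicit-tolerance form of the approximate-equality check;
# 1e-4 is only an illustration of a looser bound than the default.
@test_approx_eq_eps F'\ones(elty,5) full(A1pd)'\ones(5) 1e-4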
Probably. cc @andreasnoack
I think it is better to set the seed to make it deterministic. If we relax the tolerance then it will just happen again, only with smaller probability.
Agreed. The seed should be fixed.
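A minimal sketch of what that could look like, assuming the random inputs in test/sparsedir/cholmod.jl come from the global RNG (srand was the API of that era; newer Julia uses Random.seed!, and 1234 is an arbitrary seed):

# Pin the global RNG state at the top of the test file so the randomly
# generated matrices are identical on every run.
srand(1234)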
Hoping this will help #11553 a bit. Seems to take about the same amount of time, but with half as many workers. (The tests are roughly 2x slower in a debug build.)
Were there reasons to run the tests in debug mode other than assertions?