Fix multiple problems with views-of-views #10505
Conversation
Due to what was probably a copy/paste error, the tests were grabbing a global variable `A` (rather than an argument) that happened to be a plain array. So we weren't really testing views-of-views. Unsurprisingly, fixing this problem uncovered bugs. Many "simple" things worked, but sufficiently complicated constructs like `A = rand(20, 20, 20); B = slice(A, 1:5, 6, 6:9); C = sub(B, :)` (specifically, those that use linear indexing in creation of a view-of-view) certainly had problems. Drat. Well, hopefully this has been rare up until now; it seems rather likely that folks would have gotten BoundsErrors if anyone had actually been using this. But once we start returning views from getindex it will become common, so better that it was caught now.

In debugging and fixing the problems, I took the opportunity to do some cleanup to make this more maintainable, specifically: adopting a consistent naming convention, improving many variable names, and adding comments.
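A hypothetical illustration of the kind of mistake being described, not the actual test code: the function names are made up, and `view` is used in place of the pre-0.4 `sub`/`slice`.

```julia
A = rand(20, 20, 20)                 # global plain array defined at the top of the test file

function check_indexing(SB)          # SB is meant to be the view under test...
    @assert A[1, 1, 1] == A[1]       # ...but the body accidentally uses the global A,
end                                  # so only plain-Array indexing ever gets exercised

function check_indexing_fixed(SB)
    @assert SB[1, 1, 1] == SB[1]     # the argument is used, so SubArray indexing really gets tested
end

check_indexing_fixed(view(A, 1:5, 6, 6:9))
```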
The appveyor failure looks like a timeout (unsurprisingly).
If we really need to run a substantially slower set of tests on every commit and PR, an easy option would be to split up the existing tests into more entries in the build matrix. This will impact our queue time (esp. on appveyor), but not much else.

While our tests have been gradually growing in coverage lately, it also seems there have been some regressions in codegen speed. Profiling and taking a few performance passes on codegen might be called for soon, particularly if we're aiming to upgrade LLVM before 0.4, which AIUI will only make this problem worse.

An outstanding todo has been to move more testing (the perf suite and otherwise) onto the buildbots. Other than Elliot and myself, I don't think many people are paying very close attention to those at the moment, though.
Where does one go to look at buildbot results, other than the coveralls page (which I do check now and then)? Indeed, a puzzling gap in coverage helped me discover the problem here.

Interesting failure in the bitarray tests here; I haven't seen that one before.
http://buildbot.e.ip.saba.us:8010/builders - I have a hard time remembering that myself; it would be great if build.julialang.org could be made to redirect there.
The redirect can be set up by @StefanKarpinski. It is probably time to have a comprehensive test suite and a quick test suite.
I saw a very similar failure on Win32 during the parallel tests on #10512. It was new to me, too. Glad that it isn't an issue with either of these PRs, but sad that it looks like we've got another non-deterministic type error on our hands.
OK, I think I've come up with a good solution: run a small random fraction of the subarray tests. There's an environment variable you can set to specify the fraction (including all of them). Strategy or name bikeshedding welcome.
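A minimal sketch of that strategy, assuming current Julia syntax; the index values, array size, and the lower-case `testfraction` name are placeholders, not the actual test code.

```julia
# Each combination of index types is run with probability `testfraction`,
# which can be overridden (e.g. set to 1 to run everything) via the environment.
testfraction = parse(Float64, get(ENV, "testfraction", "0.025"))

index_choices = (Colon(), 3, 2:7, 1:2:9, [4, 1, 5])   # Colon, Int, UnitRange, StepRange, Vector
A = rand(10, 10, 10)
for i1 in index_choices, i2 in index_choices, i3 in index_choices
    rand() <= testfraction || continue                # randomly skip most combinations
    V = view(A, i1, i2, i3)
    @assert sum(V) ≈ sum(A[i1, i2, i3])               # stand-in for the real SubArray checks
end
```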
I'm not a fan of the idea of running a random subset of tests. Nondeterministic test failures are probably the single most frustrating thing about working on/with Julia. Unless you're confident that the percentage you've chosen would be enough to always immediately catch any problem that might get introduced that would cause any of these tests to fail...

If we're going to choose based on an environment variable, wouldn't it be more customary to give it an all-caps name?
I appreciate the feedback as always, @tkelman. It's great to have such a conscientious steward of our CI infrastructure.

I'm not confident that 2.5% of the tests would be enough to catch any problem, so the issue you are worried about is real. But I think this is in a different category from the nondeterministic failures that plague other julia development (BTW, on Linux ...)

The alternative is to hand-pick a subset of tests to run consistently. That's not bad either, but I'm not confident of my ability to pick a subset that would catch any and all possible errors. Therefore, a potential outcome is that a bug might be introduced that affects a subset that isn't tested. Indeed, developers will change their code until the test suite passes, so a consistent, biased subset of the full set of tests might encourage this outcome. In that scenario, if no one is watching the buildbots, then it could go a long time without being caught.

Anyway, I'm willing to go either way here, but to me the random subset seems the safer solution. Indeed I can change to all-caps. I'll do that if this strategy ends up being approved, or delete it if not.
The reason I do not like picking a random subset of tests is that you can easily have a failure on one subset and not have it occur again. You never know then whether it is some CI environment issue or a real failure. I would much rather pick a small set (even imperfect) for regular testing where failures are deterministic, and put in efforts towards building a comprehensive test suite that runs nightly.
One option would be to pick a fixed random subset of these tests. That is deterministic but arbitrary.
Yep. Random tests are dangerous; better to pick a fixed random set. Would it be possible to achieve this with a fixed seed, or is the random sequence generated by a given seed hardware-specific?
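A sketch of the fixed-seed variant in current Julia syntax; for a given seed the MersenneTwister stream is the same on all hardware, although it is not guaranteed to stay identical across Julia releases.

```julia
using Random

rng  = MersenneTwister(1234)                      # fixed seed, so the "random" subset is reproducible
keep = randsubseq(rng, collect(1:125), 0.025)     # indices of the view combinations to actually test
```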
I dislike the "Write fewer tests" option, or never running all of them. If we make the choice of test permutations to run random, we should at least make it trivial to rerun the same set of tests (in the same order). The test failure message must be very clear that a randomly chosen test failed, and give detailed steps to rerun the exact same test run (e.g. print the seed used for the RNG).

Another way to pick tests would be to run every n-th one. A convenient interface might be:

```julia
@test_subset 5 for I_type in subtypes(Integer), F_type in subtypes(FloatingPoint)
    testfun(zero(I_type), one(F_type))
end
```

Where the `5` means that only every 5th combination is actually run.

PS: What is the recursive option for `subtypes`?
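A rough sketch of the every-n-th idea behind the proposed `@test_subset` interface, written as a plain function (with made-up names and concrete types) rather than a macro:

```julia
# Run `testfun` on every `stride`-th combination of the given choices,
# so the tested subset is fixed and deterministic rather than random.
function test_subset(testfun, stride::Int, choices...)
    for (i, combo) in enumerate(Iterators.product(choices...))
        i % stride == 0 || continue
        testfun(combo...)
    end
end

int_types   = (Int8, Int16, Int32, Int64)
float_types = (Float16, Float32, Float64)
test_subset(5, int_types, float_types) do I, F
    @assert zero(I) + one(F) == one(F)   # stand-in for a real test body
end
```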
What if I were to run all the tests inside a ...

@ivarne, your interface is a good idea. One challenge, though: currently, to run 2.5% of the total test suite, I construct a view from the array using 15.8% of the possible choices, and then a view from these views with another 15.8% of the choices.
This allows the tests to complete in a reasonable time. One can force all tests to run by setting the environment variable `testfraction=1`.
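For what it's worth, the arithmetic implied by those percentages (my reconstruction, not text from the PR):

```julia
per_level = sqrt(0.025)   # ≈ 0.158: fraction of index choices kept at each of the two view levels
total     = per_level^2   # ≈ 0.025: fraction of all view-of-view combinations, i.e. 2.5%
```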
It feels simpler and more robust to me to have an extensive test suite that would be run nightly and on demand, and to keep a (hopefully complete enough) subset to run on every PR. With a random subset, one would never be sure anyway that the next build wouldn't trigger a failure.

At least with a deterministic test suite, we can decide to have the most typical use cases in the default subset, so that 99% of mistakes are caught before merging. And we'll still be able to run the full test suite when a PR looks like it could break something subtle.

Finally, if some subset was known to be run only nightly, we could make it run very complete tests, like going over all possible combinations of types from Base. If tests were chosen at random, we would still be inclined to avoid overly time-consuming checks like that, for fear of diluting essential ones in an ocean of "just in case" tests.
The consensus seems clearly against random subsets, so I will change this as soon as I can.
There's another option, which is to craft a subset of tests that is hard to "game" and designed so that it covers most of the corner cases. Of course, that's easier said than done, but it would be ideal.
We don't anticipate people writing intentionally broken code that passes the first level of tests. The goal is rather that we should have a high probability of catching inadvertently introduced problems.
109991d to 9a86698: Fix multiple problems with views-of-views
OK, I went with ...
Thought I'd mention here that the parallel tests also have conditional execution of tests dependent on the env variable PTEST_FULL (https://github.com/JuliaLang/julia/blob/master/test/parallel.jl#L185).
@amitmurthy I think most of us had missed that comment and were unaware of that, otherwise we would probably have more people running with PTEST_FULL set.

(Also, when linking to lines of code, please hit `y` first so you get a permanent link.)
It's been introduced quite a ways back, which is probably the reason it has not been noticed. We can standardize on JULIA_TESTFULL. As written, the additional parallel tests launch and tear down up to 90 workers via ssh, which takes around 7-8 GB of additional RAM.
I think that number should be reduced a bit if we want more people running the full parallel test.
Thanks for bringing this to our attention, @amitmurthy. I agree with all of your proposals.
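A minimal sketch of the gating being discussed, in current Julia (where `addprocs`/`rmprocs` live in `Distributed`); the `JULIA_TESTFULL` name follows the standardization proposed above, the worker count is a made-up reduced value, and this is not the actual `test/parallel.jl` code.

```julia
using Distributed

# Run the heavier parallel tests only when explicitly requested via the environment.
if haskey(ENV, "JULIA_TESTFULL") || haskey(ENV, "PTEST_FULL")
    addprocs(30)            # a reduced worker count, instead of launching ~90 ssh workers
    # ... run the extended parallel tests here ...
    rmprocs(workers())      # tear the extra workers back down
end
```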
This is a public service announcement from your local "oh, crap" department.
Due to what was probably a copy/paste error, the SubArray tests were grabbing a global variable `A` rather than an argument `SB`. `A` happened to be a plain array, with the consequence that the SubArray tests were not thoroughly testing views-of-views. I note that, had I passed that file through Lint, it would have caught the problem.
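For reference, the construct referred to as "the one above" is the one quoted in the summary at the top of this thread, reproduced here with the pre-0.4 `slice`/`sub` names (the predecessors of today's `view`):

```julia
A = rand(20, 20, 20)
B = slice(A, 1:5, 6, 6:9)   # a view with the middle dimension sliced out
C = sub(B, :)               # a view of that view, created via linear indexing
```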
Unsurprisingly, fixing this test problem uncovered bugs in my implementation of SubArrays. Many "simple" (hah!) things worked because I had tested quite a lot via the command line, and also copied in some old hand-written tests. But constructs like the one above (specifically, those that use linear indexing in creation of a view-of-view, when one of the uber-parent dimensions had been sliced out) certainly had problems. I think people would have quickly gotten `BoundsError`s if this were being commonly used, so the fact that this hasn't been noticed until now is a good sign. Once we start returning views from indexing operations, views-of-views are going to be really common, so it surely would have turned up then.

One major downside to this fix is that, now that the tests are running properly, the `subarray` test takes (on my machine) 14 minutes to complete. I think that's probably too long. However, I also don't think the test is excessive in its thoroughness: the views are created using indexes chosen from one `Colon`, one `Int`, one `UnitRange{Int}`, one `StepRange{Int}`, and one `Vector{Int}`. It uses all possible combinations of these, leading to 125 views. It then creates views-of-views using indexes of the same types, for 1, 2, and 3-dimensional indexing. That's a lot of combinations (and a lot of code to compile), but the different choices have really important consequences for both cartesian indexing and inference of `LD` (the maximum dimension with a uniform stride), which is key to our current SubArrays' ability to perform fast linear indexing in many situations. The logic behind this inference is quite complicated, and seems to demand thorough testing.

I suspect there's no alternative but to hand-pick a subset of these combinations for routine testing, and perhaps save the exhaustive tests for a rarely-run test suite. But I didn't want to start decreasing the stringency of our tests without discussing it first, especially when an accidental lack of stringency led to bugs going unnoticed.
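A sketch of the combinatorial scheme described above, in current syntax (`view` instead of `sub`/`slice`); the index values and array size are made up, and the second level is abbreviated to the linear-indexing case that exposed the bugs.

```julia
index_choices = (Colon(), 3, 2:7, 1:2:9, [4, 1, 5])   # one Colon, Int, UnitRange, StepRange, Vector
A = rand(10, 10, 10)
for i1 in index_choices, i2 in index_choices, i3 in index_choices
    V = view(A, i1, i2, i3)                 # 5^3 = 125 first-level views
    ndims(V) == 0 && continue               # the all-scalar corner case is omitted from this sketch
    W = view(V, 1:length(V))                # a view-of-view built with a linear (1-d) index
    @assert collect(W) == vec(collect(V))   # the real tests also use 2-d and 3-d indexes here
end
```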
[Note that the large size of this commit partly reflects cleanup as well as bugfixes.]
Sorry, all!