APL indexing #15431
Conversation
Looks like we'll need to be more judicious in what minimal subset of indices we use in the non-full subarray tests… they're timing out the builds. Edit: the abridged tests are about 4x longer. The unabridged ones are still running.
```rst
``A[I_1[i_1], I_2[i_2], ..., I_n[i_n]]``. If ``I_1`` is changed to a
two-dimensional matrix, then ``X`` becomes an ``n+1``-dimensional array of
shape ``(size(I_1, 1), size(I_1, 2), length(I_2), ..., length(I_n))``. The
matrix adds a dimension. The location ``(i_1, i_2, i_3, ..., i_n+1)`` contains
```
does the last index need to be `i_{n+1}`?
Sure, that's clearer.
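As a concrete illustration of the rule being documented above (the array and indices below are made up for this example, not taken from the diff):

```julia
julia> A = reshape(1:24, 4, 6);   # a 4×6 parent array

julia> I1 = [1 2; 3 4];           # two-dimensional index matrix

julia> I2 = [2, 3, 5];            # ordinary vector index

julia> size(A[I1, I2])            # (size(I1, 1), size(I1, 2), length(I2))
(2, 2, 3)
```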
In the assignment doc it says "If X is an array, it must have the same number of elements as the product of the lengths of the indices". Isn't this a bit misleading, because `setindex_shape_check` (base/operators.jl#L351) demands that both shapes are the same after removing all singleton dimensions.
That's not quite it, either. It allows, for example…
Actually, we may want to loosen this up in the context of this PR:

```julia
julia> A = rand(3,4); B = rand(3,4)

julia> A[[1 2; 3 1], 1] = B[[1 2; 3 1], 1]
ERROR: DimensionMismatch("tried to assign 2x2 array to 4x1 destination")
```
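For context, here is a rough sketch (hypothetical code, not Base's actual `setindex_shape_check`) of the rule as described above, where both shapes must agree after singleton dimensions are dropped; as the comments note, the real check is somewhat more permissive:

```julia
# Simplified sketch: accept an assignment when source and destination shapes
# match after dropping singleton (length-1) dimensions. Purely illustrative.
function shape_check_sketch(dest_shape::Tuple, src_shape::Tuple)
    drop1(t) = filter(d -> d != 1, collect(t))
    drop1(dest_shape) == drop1(src_shape) ||
        throw(DimensionMismatch("tried to assign $(join(src_shape, 'x')) array to $(join(dest_shape, 'x')) destination"))
    return nothing
end

shape_check_sketch((4, 1), (4,))     # passes: both reduce to [4]
shape_check_sketch((4, 1), (2, 2))   # throws, matching the error shown above
```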
Really nice, @mbauman! The subarray tests are already so long, any increase is distressing. These tests seem to have gotten longer than they used to be, too (I'm seeing ~800s now, I think they used to be 300s), though perhaps that's a consequence of the move to LLVM 3.7?
Hmm, profiling suggests that this is unlikely, though a little more analysis might be justified. Most of the increase seems due to the addition of bounds checks, which triggers collection of a lot of backtraces. Rather than using …
Now that I think about it, I don't like that solution. Is there a way we could suspend backtrace collection from an error, and just test whether it throws the error?
The bigger issue is that codegen time is exponential in the number of index types. For each new type that we test (in …)
That's true, but it's also true that on my machine approximately half of …
Ah, but the "spot checks" should resolve that!
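For readers following along, a spot check here might look something like the following (hypothetical test code, not the PR's actual test file): instead of asserting a `BoundsError` for every index combination, only a couple of representative out-of-bounds cases are exercised, and in-bounds checks use the cheap `checkbounds(Bool, ...)` form that doesn't build a backtrace.

```julia
using Test  # `Base.Test` on the 0.4/0.5-era Julia discussed in this thread

A = rand(3, 4)

# A handful of representative failures per index type, rather than the full
# combinatorial set of out-of-bounds indices:
@test_throws BoundsError A[0, 1]
@test_throws BoundsError A[[1 2; 3 4], 5]    # matrix index, column out of range

# In-bounds combinations can be tested broadly without throwing at all:
@test checkbounds(Bool, A, 1:3, 1:4)
@test !checkbounds(Bool, A, 1:3, 1:5)
```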
@mbauman does this need anything other than a rebase? Would anyone be able to pick this up if Matt's too busy?
The major roadblock here is the time it takes to run the subarray tests. I'd like to test with 3 new index types: 0-, 2-, and 3-dimensional arrays. This results in an 8x increase in compilation time during testing. ReshapedArrays may add yet more types to the fray. I actually never managed to complete a JULIA_TESTFULL run on my wimpy laptop (IIRC it began swapping due to all the compiled methods, but even with unlimited RAM it'd probably take ~10-20 hours). So we may need to pare down the tests a little further, but I don't see how without losing much-needed coverage. I can rebase and trigger a new travis run, but I'm not sure when I'll be able to dig into the test time issue.
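To put a rough number on the combinatorial blow-up (the type lists here are invented for illustration; they are not the PR's actual test matrix): every distinct tuple of index types forces its own method specialization, so test compile time grows roughly like the number of index kinds raised to the number of index positions.

```julia
# Invented illustration of the growth in method specializations.
old_kinds = [Int, UnitRange{Int}, Vector{Int}, Colon]
new_kinds = [Array{Int,0}, Matrix{Int}, Array{Int,3}]   # the 0-, 2-, and 3-d additions

specializations(kinds, positions) = length(kinds)^positions

specializations(old_kinds, 3)                     # 64 combinations of A[i, j, k]
specializations(vcat(old_kinds, new_kinds), 3)    # 343: several times more code to compile
```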
The compiler has been getting faster, so it's worth another try.
Force-pushed from 15dd1a3 to 0e663bc
Hey - this is looking great. Here are the CI logs for the subarray test:

Most recent master: …
This PR: …
```julia
@inline replace_colons(V, indexes, I) = replace_colons_dim(V, indexes, 1, I)
@inline replace_colons_dim(V, indexes, dim, I::Tuple{}) = ()
@inline replace_colons_dim(V, indexes, dim, I::Tuple{Colon}) = (1:trailingsize(indexes[1], dim)*prod(index_lengths_dim(V.parent, length(V.indexes)-length(indexes)+2, tail(indexes)...)),)
@inline replace_colons_dim(V, indexes, dim, I::Tuple{Colon, Vararg{Any}}) = (1:size(indexes[1], dim), replace_colons_dim(V, indexes, dim+1, tail(I))...)
```
wrap some of the long lines here?
Done. I'm looking forward to seeing this section deleted entirely — linear indexing is really just implicitly indexing into a reshaped view of the array, so now that we have reshaped arrays, we can do just that.
#15999 should help, too.
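To unpack the reshaping remark above (a sketch in present-day syntax; the names and calls here are illustrative, not the code that eventually replaced this section): linear indexing of a view can be thought of as ordinary indexing into the same data reshaped to a vector.

```julia
A = rand(3, 4, 5)
V = view(A, 2:3, 1:4, :)          # a SubArray without fast linear indexing

# Conceptually, V[i] for a linear index i is just indexing a flattened,
# reshaped version of the same view:
R = reshape(V, length(V))
all(V[i] == R[i] for i in 1:length(V))   # true
```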
Force-pushed from d68315c to dea7ec8
Alright, this is good to go! With #15999, all CI systems are under 500 seconds. Looks like Linux32 timed out trying to compile BLAS or something… not sure what happened there.
```julia
@inline index_shape_dim(A, dim, i::AbstractVector, I...) = (length(i), index_shape_dim(A, dim+1, I...)...)
@inline index_shape_dim{N}(A, dim, i::AbstractVector{CartesianIndex{N}}, I...) = (length(i), index_shape_dim(A, dim+N, I...)...)
@inline index_shape_dim(A, dim, i::AbstractArray, I...) = (size(i)..., index_shape_dim(A, dim+1, I...)...)
@inline index_shape_dim(A, dim, i::AbstractArray{Bool}, I...) = (sum(i), index_shape_dim(A, dim+1, I...)...)
```
If I'm reading this right, does this mean all `AbstractArray{Bool}` act like they are vectors, for the purpose of determining output dimensionality?
That's correct. They can't really do anything else.
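Concretely (a made-up example to illustrate the shape rule in the quoted code, using a Bool vector; the thread below discusses restricting the multidimensional case): a boolean index contributes a single dimension whose length is the number of `true` entries, while an integer matrix contributes its full shape.

```julia
A = reshape(1:24, 4, 6)
mask = [true, false, true, false]     # Bool index for the first dimension

# The Bool index contributes sum(mask) == 2 to the output shape; the integer
# matrix contributes its shape (2, 2):
size(A[mask, [1 2; 3 4]])             # (2, 2, 2)
```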
One potentially crazy interpretation would be that they should remove dimensions from the parent:

```julia
A = rand(1:10, 3,4,5)
A[bitrand(3,4), 1] # would select a vector of random elements from the first page of `A`
```

That's a separate "feature" though, I think.
I agree that's pretty much the only thing they can do. I'd simply suggest that we document this, it's slightly at odds with the "sum of the dimensions of the indexes".

One additional option is to support `AbstractArray{Bool,N}` if it's the only index, and otherwise require each dimension to be an `AbstractVector{Bool}`.
Yeah, that's probably a reasonable restriction. As it stands, I have a hard time imagining a real use-case for indexing with a multidimensional logical array that's somewhere between 1 and N indices unless we allow it to span multiple dimensions. One of the nice things about this APL indexing patch, though, is how it generalizes the first-index behavior and removes special cases like that.
On the other hand, I must say that I really value how restrictive Julia currently is with regards to logical indexing. I frequently hit Matlab bugs where vectorized comparisons of struct arrays (`mask = [s.a] == x`) will silently drop empty elements, and then it'll silently allow me to index a larger vector with it (e.g., `[s(mask).b]`). So artificial restrictions here can be very good.
Sorry, I had been messing with the travis cache last time you pushed here. Restarted, should be fine now. Assuming it passes, is this okay to merge, are we ready here?
Let's disallow indexing by multidimensional BitArrays unless they're the only index and are the same dimension as the parent array. I think that can just be done as a bounds check to avoid contortions on dispatch.
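A rough sketch of what such a check might look like (hypothetical code, not the commit that implemented it):

```julia
# Hypothetical bounds-style check: a multidimensional Bool index is allowed
# only when it is the sole index and its shape matches the parent exactly.
function check_multidim_logical(A::AbstractArray, I...)
    for i in I
        if isa(i, AbstractArray{Bool}) && ndims(i) > 1
            (length(I) == 1 && size(i) == size(A)) || throw(BoundsError(A, I))
        end
    end
    return nothing
end

A = rand(3, 4, 5)
check_multidim_logical(A, rand(Bool, 3, 4, 5))   # ok: sole index, same shape
check_multidim_logical(A, rand(Bool, 3, 4), 1)   # throws BoundsError
```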
Ok, assuming CI passes this is good to go.
d'oh, needs another rebase. @timholy do you have any final comments here?
Just one: fantastic work, as usual.
* Just do spot-checks on throwing bounds errors
* Be more judicious in the index types tested
* Run the subarray test first
… a single index that matches the size of the array it indexes into
```diff
@@ -15,11 +15,11 @@ Upon return, `tests` is a vector of fully-expanded test names, and
 """ ->
 function choosetests(choices = [])
     testnames = [
-        "linalg", "core", "inference", "keywordargs", "numbers", "printf",
-        "char", "string", "triplequote", "unicode",
+        "linalg", "subarray", "core", "inference", "keywordargs", "numbers",
```
yay compiler getting faster
For the "array peasantry" among us, can we get a quick before and after example of the change here? |
Sure! This generalizes a behavior that we already have:

```julia
julia> (0.1:0.1:1.0)[[1 2; 3 4]]
2×2 Array{Float64,2}:
 0.1  0.2
 0.3  0.4
```

This is indexing into a vector with a two-dimensional array, and this works on Julia 0.4. It uses the shape of the index array to determine where to put the elements of the vector. It's effectively a reshape of the selected elements to the shape of the index array.

I find this sort of operation extremely useful when you're extracting windowed sections of data. Take a look at this:

```julia
julia> A = 0.1:0.1:9.9
       A[find(round(A) .== A)' .+ (-2:2)]
5x9 Array{Float64,2}:
 0.8  1.8  2.8  3.8  4.8  5.8  6.8  7.8  8.8
 0.9  1.9  2.9  3.9  4.9  5.9  6.9  7.9  8.9
 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
 1.1  2.1  3.1  4.1  5.1  6.1  7.1  8.1  9.1
 1.2  2.2  3.2  4.2  5.2  6.2  7.2  8.2  9.2
```

Here, we're extracting five indices about the whole numbers in the array — each column here represents a window about some repeated event. This also works on Julia 0.4.

But what happens if, instead of a vector, we have a matrix? This gets harder to display nicely in short meaningful chunks, but let's try this:

```julia
julia> A = [sin(-10:.1:10) cos(-10:.1:10)]
201×2 Array{Float64,2}:
  0.544021  -0.839072
  0.457536  -0.889191
  ⋮
 -0.457536  -0.889191
 -0.544021  -0.839072

julia> zero_crossings = find(diff(A[:,1] .>= 0))'
1×7 Array{Int64,2}:
 6  38  69  100  132  163  195

julia> A[zero_crossings .+ (-2:2), :]
5×7×2 Array{Float64,3}:
[:, :, 1] =
  0.271761   -0.21512     0.255541   -0.29552    0.239249   -0.279415   0.22289
  0.174327   -0.116549    0.157746   -0.198669   0.14112    -0.182163   0.124454
  0.0751511  -0.0168139   0.0583741  -0.0998334  0.0415807  -0.0830894  0.0247754
 -0.0247754   0.0830894  -0.0415807   0.0        -0.0583741   0.0168139  -0.0751511
 -0.124454    0.182163   -0.14112     0.0998334  -0.157746    0.116549   -0.174327

[:, :, 2] =
 -0.962365  0.976588  -0.966798  0.955336  -0.970958  0.96017   -0.974844
 -0.984688  0.993185  -0.98748   0.980067  -0.989992  0.983268  -0.992225
 -0.997172  0.999859  -0.998295  0.995004  -0.999135  0.996542  -0.999693
 -0.999693  0.996542  -0.999135  1.0       -0.998295  0.999859  -0.997172
 -0.992225  0.983268  -0.989992  0.995004  -0.98748   0.993185  -0.984688
```

Look at that! Now you have a snapshot into the coupled behavior between the two columns around each zero crossing.

For a somewhat less contrived example, take a look at the example on AxisArrays.jl's readme — that's where I initially implemented this sort of indexing. It's a little more complicated there since I use intervals to describe my windows in terms of the units on the axes, but the concept is exactly the same and those intervals just get reduced down to ranges (actually RangeMatrices) on their way to being used as an index.

In my line of research, I'm constantly taking windowed repetitions of multidimensional arrays like this. We often want to look at the behavior of some data signal at some event or time to create a windowed average (e.g., spike-triggered average; PDF warning) or peri-event histogram (e.g., PSTH; another PDF).
The dimensionality of indexing `A[I_1, I_2, …, I_N]` is now `sum(map(ndims, (I_1, I_2, …, I_N)))`. This brings SubArrays to full feature parity with builtin arrays with respect to indexing semantics.
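As a quick illustration of that rule (example values are mine, not from the PR description):

```julia
A = rand(4, 5, 6)
I_1, I_2, I_3 = [1 2; 3 4], [1, 3, 5], 2      # matrix, vector, and scalar indices

# ndims of the result is the sum of the ndims of the indices: 2 + 1 + 0
ndims(A[I_1, I_2, I_3]) == sum(map(ndims, (I_1, I_2, I_3)))   # true

# SubArrays follow the same rule:
V = view(A, :, :, 1:6)
size(V[I_1, I_2, I_3])                                         # (2, 2, 3)
```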