APL indexing #15431
Conversation
Looks like we'll need to be more judicious in what minimal subset of indices we use in the non-full subarray tests… they're timing out the builds. Edit: the abridged tests are about 4x longer. The unabridged ones are still running.
```rst
``A[I_1[i_1], I_2[i_2], ..., I_n[i_n]]``. If ``I_1`` is changed to a
two-dimensional matrix, then ``X`` becomes an ``n+1``-dimensional array of
shape ``(size(I_1, 1), size(I_1, 2), length(I_2), ..., length(I_n))``. The
matrix adds a dimension. The location ``(i_1, i_2, i_3, ..., i_n+1)`` contains
```
does the last index need to be `i_{n+1}`?
Sure, that's clearer.
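As a concrete illustration of the rule being documented above (the array and indices below are made up for this example, not taken from the diff):

```julia
julia> A = reshape(1:24, 4, 6);   # a 4×6 parent array

julia> I1 = [1 2; 3 4];           # two-dimensional index matrix

julia> I2 = [2, 3, 5];            # ordinary vector index

julia> size(A[I1, I2])            # (size(I1, 1), size(I1, 2), length(I2))
(2, 2, 3)
```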
In the assignment doc it says "If X is an array, it must have the same number of elements as the product of the lengths of the indices". Isn't this a bit misleading, because `setindex_shape_check` (base/operators.jl#L351) demands that both shapes are the same after removing all singleton dimensions.
That's not quite it, either. It allows, for example…
Actually, we may want to loosen this up in the context of this PR:

```julia
julia> A = rand(3,4); B = rand(3,4)

julia> A[[1 2; 3 1], 1] = B[[1 2; 3 1], 1]
ERROR: DimensionMismatch("tried to assign 2x2 array to 4x1 destination")
```
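For context, here is a rough sketch (hypothetical code, not Base's actual `setindex_shape_check`) of the rule as described above, where both shapes must agree after singleton dimensions are dropped; as the comments note, the real check is somewhat more permissive:

```julia
# Simplified sketch: accept an assignment when source and destination shapes
# match after dropping singleton (length-1) dimensions. Purely illustrative.
function shape_check_sketch(dest_shape::Tuple, src_shape::Tuple)
    drop1(t) = filter(d -> d != 1, collect(t))
    drop1(dest_shape) == drop1(src_shape) ||
        throw(DimensionMismatch("tried to assign $(join(src_shape, 'x')) array to $(join(dest_shape, 'x')) destination"))
    return nothing
end

shape_check_sketch((4, 1), (4,))     # passes: both reduce to [4]
shape_check_sketch((4, 1), (2, 2))   # throws, matching the error shown above
```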
Really nice, @mbauman! The subarray tests are already so long, any increase is distressing. These tests seem to have gotten longer than they used to be, too (I'm seeing ~800s now, I think they used to be 300s), though perhaps that's a consequence of the move to LLVM 3.7?
Hmm, profiling suggests that this is unlikely, though a little more analysis might be justified. Most of the increase seems due to the addition of bounds checks, which triggers collection of a lot of backtraces. Rather than using …
Now that I think about it, I don't like that solution. Is there a way we could suspend backtrace collection from an error, and just test whether it throws the error?
The bigger issue is that codegen time is exponential in the number of index types. For each new type that we test (in …)
That's true, but it's also true that on my machine approximately half of …
Ah, but the "spot checks" should resolve that!
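For readers following along, a spot check here might look something like the following (hypothetical test code, not the PR's actual test file): instead of asserting a `BoundsError` for every index combination, only a couple of representative out-of-bounds cases are exercised, and in-bounds checks use the cheap `checkbounds(Bool, ...)` form that doesn't build a backtrace.

```julia
using Test  # `Base.Test` on the 0.4/0.5-era Julia discussed in this thread

A = rand(3, 4)

# A handful of representative failures per index type, rather than the full
# combinatorial set of out-of-bounds indices:
@test_throws BoundsError A[0, 1]
@test_throws BoundsError A[[1 2; 3 4], 5]    # matrix index, column out of range

# In-bounds combinations can be tested broadly without throwing at all:
@test checkbounds(Bool, A, 1:3, 1:4)
@test !checkbounds(Bool, A, 1:3, 1:5)
```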
@mbauman does this need anything other than a rebase? Would anyone be able to pick this up if Matt's too busy?
The major roadblock here is the time it takes to run the subarray tests. I'd like to test with 3 new index types: 0-, 2-, and 3-dimensional arrays. This results in an 8x increase in compilation time during testing. ReshapedArrays may add yet more types to the fray. I actually never managed to complete a JULIA_TESTFULL run on my wimpy laptop (IIRC it began swapping due to all the compiled methods, but even with unlimited RAM it'd probably take ~10-20 hours). So we may need to pare down the tests a little further, but I don't see how without losing much-needed coverage. I can rebase and trigger a new travis run, but I'm not sure when I'll be able to dig into the test time issue.
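To put a rough number on the combinatorial blow-up (the type lists here are invented for illustration; they are not the PR's actual test matrix): every distinct tuple of index types forces its own method specialization, so test compile time grows roughly like the number of index kinds raised to the number of index positions.

```julia
# Invented illustration of the growth in method specializations.
old_kinds = [Int, UnitRange{Int}, Vector{Int}, Colon]
new_kinds = [Array{Int,0}, Matrix{Int}, Array{Int,3}]   # the 0-, 2-, and 3-d additions

specializations(kinds, positions) = length(kinds)^positions

specializations(old_kinds, 3)                     # 64 combinations of A[i, j, k]
specializations(vcat(old_kinds, new_kinds), 3)    # 343: several times more code to compile
```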
The compiler has been getting faster, so it's worth another try.
Force-pushed from 15dd1a3 to 0e663bc
Hey - this is looking great. Here are the CI logs for the subarray test:

Most recent master: …
This PR: …
```julia
@inline replace_colons(V, indexes, I) = replace_colons_dim(V, indexes, 1, I)
@inline replace_colons_dim(V, indexes, dim, I::Tuple{}) = ()
@inline replace_colons_dim(V, indexes, dim, I::Tuple{Colon}) = (1:trailingsize(indexes[1], dim)*prod(index_lengths_dim(V.parent, length(V.indexes)-length(indexes)+2, tail(indexes)...)),)
@inline replace_colons_dim(V, indexes, dim, I::Tuple{Colon, Vararg{Any}}) = (1:size(indexes[1], dim), replace_colons_dim(V, indexes, dim+1, tail(I))...)
```
wrap some of the long lines here?
Done. I'm looking forward to seeing this section deleted entirely — linear indexing is really just implicitly indexing into a reshaped view of the array, so now that we have reshaped arrays, we can do just that.
#15999 should help, too.
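To unpack the reshaping remark above (a sketch in present-day syntax; the names and calls here are illustrative, not the code that eventually replaced this section): linear indexing of a view can be thought of as ordinary indexing into the same data reshaped to a vector.

```julia
A = rand(3, 4, 5)
V = view(A, 2:3, 1:4, :)          # a SubArray without fast linear indexing

# Conceptually, V[i] for a linear index i is just indexing a flattened,
# reshaped version of the same view:
R = reshape(V, length(V))
all(V[i] == R[i] for i in 1:length(V))   # true
```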
Force-pushed from d68315c to dea7ec8
Alright, this is good to go! With #15999, all CI systems are under 500 seconds. Looks like Linux32 timed out trying to compile BLAS or something… not sure what happened there.
```julia
@inline index_shape_dim(A, dim, i::AbstractVector, I...) = (length(i), index_shape_dim(A, dim+1, I...)...)
@inline index_shape_dim{N}(A, dim, i::AbstractVector{CartesianIndex{N}}, I...) = (length(i), index_shape_dim(A, dim+N, I...)...)
@inline index_shape_dim(A, dim, i::AbstractArray, I...) = (size(i)..., index_shape_dim(A, dim+1, I...)...)
@inline index_shape_dim(A, dim, i::AbstractArray{Bool}, I...) = (sum(i), index_shape_dim(A, dim+1, I...)...)
```
If I'm reading this right, does this mean all `AbstractArray{Bool}` act like they are vectors, for the purpose of determining output dimensionality?
That's correct. They can't really do anything else.
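Concretely (a made-up example to illustrate the shape rule in the quoted code, using a Bool vector; the thread below discusses restricting the multidimensional case): a boolean index contributes a single dimension whose length is the number of `true` entries, while an integer matrix contributes its full shape.

```julia
A = reshape(1:24, 4, 6)
mask = [true, false, true, false]     # Bool index for the first dimension

# The Bool index contributes sum(mask) == 2 to the output shape; the integer
# matrix contributes its shape (2, 2):
size(A[mask, [1 2; 3 4]])             # (2, 2, 2)
```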
One potentially crazy interpretation would be that they should remove dimensions from the parent:

```julia
A = rand(1:10, 3,4,5)
A[bitrand(3,4), 1] # would select a vector of random elements from the first page of `A`
```

That's a separate "feature" though, I think.
I agree that's pretty much the only thing they can do. I'd simply suggest that we document this, it's slightly at odds with the "sum of the dimensions of the indexes".

One additional option is to support `AbstractArray{Bool,N}` if it's the only index, and otherwise require each dimension to be an `AbstractVector{Bool}`.
Yeah, that's probably a reasonable restriction. As it stands, I have a hard time imagining a real use-case for indexing with a multidimensional logical array that's somewhere between 1 and N indices unless we allow it to span multiple dimensions. One of the nice things about this APL indexing patch, though, is how it generalizes the first-index behavior and removes special cases like that.
On the other hand, I must say that I really value how restrictive Julia currently is with regards to logical indexing. I frequently hit Matlab bugs where vectorized comparisons of struct arrays (`mask = [s.a] == x`) will silently drop empty elements, and then it'll silently allow me to index a larger vector with it (e.g., `[s(mask).b]`). So artificial restrictions here can be very good.
Sorry, I had been messing with the travis cache last time you pushed here. Restarted, should be fine now. Assuming it passes, is this okay to merge, are we ready here?
Let's disallow indexing by multidimensional BitArrays unless they're the only index and are the same dimension as the parent array. I think that can just be done as a bounds check to avoid contortions on dispatch.
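A rough sketch of what such a check might look like (hypothetical code, not the commit that implemented it):

```julia
# Hypothetical bounds-style check: a multidimensional Bool index is allowed
# only when it is the sole index and its shape matches the parent exactly.
function check_multidim_logical(A::AbstractArray, I...)
    for i in I
        if isa(i, AbstractArray{Bool}) && ndims(i) > 1
            (length(I) == 1 && size(i) == size(A)) || throw(BoundsError(A, I))
        end
    end
    return nothing
end

A = rand(3, 4, 5)
check_multidim_logical(A, rand(Bool, 3, 4, 5))   # ok: sole index, same shape
check_multidim_logical(A, rand(Bool, 3, 4), 1)   # throws BoundsError
```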
Ok, assuming CI passes this is good to go.
d'oh, needs another rebase. @timholy do you have any final comments here?
Just one: fantastic work, as usual.
* Just do spot-checks on throwing bounds errors
* Be more judicious in the index types tested
* Run the subarray test first
… a single index that matches the size of the array it indexes into
```diff
@@ -15,11 +15,11 @@ Upon return, `tests` is a vector of fully-expanded test names, and
 """ ->
 function choosetests(choices = [])
     testnames = [
-        "linalg", "core", "inference", "keywordargs", "numbers", "printf",
-        "char", "string", "triplequote", "unicode",
+        "linalg", "subarray", "core", "inference", "keywordargs", "numbers",
```
yay compiler getting faster
For the "array peasantry" among us, can we get a quick before and after example of the change here? |
Sure! This generalizes a behavior that we already have:

```julia
julia> (0.1:0.1:1.0)[[1 2; 3 4]]
2×2 Array{Float64,2}:
 0.1  0.2
 0.3  0.4
```

This is indexing into a vector with a two-dimensional array, and this works on Julia 0.4. It uses the shape of the index array to determine where to put the elements of the vector. It's effectively a reshape of the selected elements to the shape of the index array.

I find this sort of operation extremely useful when you're extracting windowed sections of data. Take a look at this:

```julia
julia> A = 0.1:0.1:9.9
       A[find(round(A) .== A)' .+ (-2:2)]
5x9 Array{Float64,2}:
 0.8  1.8  2.8  3.8  4.8  5.8  6.8  7.8  8.8
 0.9  1.9  2.9  3.9  4.9  5.9  6.9  7.9  8.9
 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
 1.1  2.1  3.1  4.1  5.1  6.1  7.1  8.1  9.1
 1.2  2.2  3.2  4.2  5.2  6.2  7.2  8.2  9.2
```

Here, we're extracting five indices about the whole numbers in the array — each column here represents a window about some repeated event. This also works on Julia 0.4.

But what happens if, instead of a vector, we have a matrix? This gets harder to display nicely in short meaningful chunks, but let's try this:

```julia
julia> A = [sin(-10:.1:10) cos(-10:.1:10)]
201×2 Array{Float64,2}:
  0.544021  -0.839072
  0.457536  -0.889191
  ⋮
 -0.457536  -0.889191
 -0.544021  -0.839072

julia> zero_crossings = find(diff(A[:,1] .>= 0))'
1×7 Array{Int64,2}:
 6  38  69  100  132  163  195

julia> A[zero_crossings .+ (-2:2), :]
5×7×2 Array{Float64,3}:
[:, :, 1] =
  0.271761   -0.21512     0.255541   -0.29552    0.239249   -0.279415   0.22289
  0.174327   -0.116549    0.157746   -0.198669   0.14112    -0.182163   0.124454
  0.0751511  -0.0168139   0.0583741  -0.0998334  0.0415807  -0.0830894  0.0247754
 -0.0247754   0.0830894  -0.0415807   0.0        -0.0583741   0.0168139  -0.0751511
 -0.124454    0.182163   -0.14112     0.0998334  -0.157746    0.116549   -0.174327

[:, :, 2] =
 -0.962365  0.976588  -0.966798  0.955336  -0.970958  0.96017   -0.974844
 -0.984688  0.993185  -0.98748   0.980067  -0.989992  0.983268  -0.992225
 -0.997172  0.999859  -0.998295  0.995004  -0.999135  0.996542  -0.999693
 -0.999693  0.996542  -0.999135  1.0       -0.998295  0.999859  -0.997172
 -0.992225  0.983268  -0.989992  0.995004  -0.98748   0.993185  -0.984688
```

Look at that! Now you have a snapshot into the coupled behavior between the two columns around each zero crossing.

For a somewhat less contrived example, take a look at the example on AxisArrays.jl's readme — that's where I initially implemented this sort of indexing. It's a little more complicated there since I use intervals to describe my windows in terms of the units on the axes, but the concept is exactly the same and those intervals just get reduced down to ranges (actually RangeMatrices) on their way to being used as an index.

In my line of research, I'm constantly taking windowed repetitions of multidimensional arrays like this. We often want to look at the behavior of some data signal at some event or time to create a windowed average (e.g., spike-triggered average; PDF warning) or peri-event histogram (e.g., PSTH; another PDF).
The dimensionality of indexing `A[I_1, I_2, …, I_N]` is now `sum(map(ndims, (I_1, I_2, …, I_N)))`. This brings SubArrays to full feature parity with builtin arrays with respect to indexing semantics.
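As a quick illustration of that rule (example values are mine, not from the PR description):

```julia
A = rand(4, 5, 6)
I_1, I_2, I_3 = [1 2; 3 4], [1, 3, 5], 2      # matrix, vector, and scalar indices

# ndims of the result is the sum of the ndims of the indices: 2 + 1 + 0
ndims(A[I_1, I_2, I_3]) == sum(map(ndims, (I_1, I_2, I_3)))   # true

# SubArrays follow the same rule:
V = view(A, :, :, 1:6)
size(V[I_1, I_2, I_3])                                         # (2, 2, 3)
```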