RFC: Give AbstractArrays smart and performant indexing behaviors for free #10525
Conversation
Force-pushed from 6fc9483 to a7ea8e3
# Both index_shape and checksize compute sum(I); manually hoist it out
N = sum(I)
dest = similar(src, (N,))
size(dest) == (N,) || throw(DimensionMismatch())
why is this size check necessary? Haven't you just created dest of that size?
Not really - `similar` could give us the wrong answer, and the check here should be very cheap compared to the allocations and many assignments (although I may need to hide it in a function; I haven't profiled yet). In general, my approach has been to not trust the output of `similar`. Master effectively does the same thing for arrays (although `similar` and the check are split across method boundaries): multidimensional.jl:219
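To make that concrete, here is a minimal sketch (hypothetical `MyVec` type in current Julia syntax, not code from this PR) of a subtype whose `similar` override returns the wrong size, and the cheap check catching it:

```julia
# Hypothetical subtype with a buggy `similar` that ignores the requested dims.
struct MyVec <: AbstractVector{Int}
    data::Vector{Int}
end
Base.size(v::MyVec) = size(v.data)
Base.getindex(v::MyVec, i::Int) = v.data[i]
Base.similar(v::MyVec, ::Type{T}, dims::Dims) where {T} = Vector{T}(undef, 3)  # wrong size on purpose

src = MyVec([10, 20, 30, 40])
I = [true, false, false, true]     # logical mask with sum(I) == 2
N = sum(I)
dest = similar(src, (N,))          # the buggy similar hands back length 3
size(dest) == (N,) || throw(DimensionMismatch("similar returned the wrong size"))
```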
I'm super-excited about this, and will try to review soon. I just have pretty limited time slots right now.
This would be so awesome if it works out.
sub2ind(dims::(Integer,Integer), i1::Integer) = i1
sub2ind(dims::(Integer,Integer), i1::Integer, i2::Integer) = i1+dims[1]*(i2-1)
sub2ind(dims::(Integer,Integer), i1::Integer, i2::Integer, I::Integer...) = i1+dims[1]*(i2-1+sum(I)-length(I))
sub2ind(dims::(Integer,Integer,Integer...), i1::Integer, i2::Integer, I::Integer...) =
Do you need an `@inline` here to ensure good performance for higher dimensions?
Yes, for N>5. See #10337 (comment).
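As an illustration of the effect (a toy recursive vararg helper, not the actual `sub2ind` definitions above), the explicit `@inline` is what lets higher-arity calls collapse into straight-line code:

```julia
# Hypothetical toy reduction: without the @inline annotations, calls with many
# arguments pay for a chain of non-inlined recursive calls.
@inline addall() = 0
@inline addall(x::Integer, rest::Integer...) = x + addall(rest...)

addall(1, 2, 3, 4, 5, 6, 7)   # unrolls into straight-line additions when inlined
```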
This is awesome. In broad terms, I agree this is how indexing should work. You also have a lot of clever insights into how to implement this---it was fun to read your code. You're basically building this on top of #8227? Perhaps we should just merge that and deal with the consequences later.
I'm not sure I agree with this. ReshapedArrays.jl, Interpolations.jl, and your own AxisArrays.jl (if you pass in a

Finally, scalar indexing with non-reals is a genuine possibility, see the remarks re
Thanks!
Yes, in spirit. I just needed some way to express

I think your alternative proposal in #7799 (comment) might be more attractive, but that just punts the complexity up the chain to a system I don't know and only a few could implement. The method table would keep track of the methods compiled with and without bounds checks. When compiling without bounds checks, the compiler would simply elide everything within
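For context, a sketch of the `@boundscheck`/`@inbounds` mechanism Julia later adopted for exactly this kind of elision (not part of the comment above):

```julia
# Hypothetical accessor: the check wrapped in @boundscheck runs normally, but
# is elided when the function is inlined into an @inbounds caller.
@inline function getat(A::AbstractVector, i::Int)
    @boundscheck checkbounds(A, i)
    return @inbounds A[i]
end

safe(A, i)   = getat(A, i)             # bounds check runs
unsafe(A, i) = @inbounds getat(A, i)   # bounds check elided once getat inlines
```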
Hunh, that sounds a lot like multiple dispatch.
Using multiple dispatch has some elegance, but it has problems of its own. The first problem is that it makes defining

The second problem is that, at least as far as I can tell, there's still a lowering issue. It's not possible to create an
Following up from #10507 (comment). @JeffBezanson, your input here would be helpful: how can we restrict dispatch to calling a particular method only for a specific number of varargs? I see a couple of options:

To me the last seems most attractive, followed by the first or third.
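For reference, a minimal sketch of how later Julia versions ended up expressing "only for a specific number of varargs" using `Vararg{T,N}` (post-0.6 syntax; not one of the options discussed above):

```julia
# Hypothetical demo method: matches only when the number of Int indices equals
# the array's dimensionality N.
exact_index(A::AbstractArray{T,N}, I::Vararg{Int,N}) where {T,N} = CartesianIndex(I)

exact_index(zeros(2, 3), 1, 2)    # ok: exactly two Int indices for a 2-d array
# exact_index(zeros(2, 3), 1)     # MethodError: wrong number of indices
```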
Force-pushed from 17c649b to f837e7b
One more option worth considering: rather than a tuple, use a
Force-pushed from f837e7b to 84143ea
Barring further suggestions, I'm going to (time permitting) start playing around with adding a new field to a

`getindex{T,N}(A::AbstractArray{T,N}, indexes...N)`
That would be awesome!
Force-pushed from d6802ad to dea9350
Minor oversight from #10525: this restores the previous behavior where indexing a SubArray by, e.g., `[1 2; 3 4]`, returns an array of the same size as the index with the given linear indices.
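An illustration of the restored behavior (current syntax, using `view` in place of the 0.4-era `sub`):

```julia
A = collect(reshape(1:16, 4, 4))
S = view(A, 2:4, 2:4)      # a 3x3 SubArray
S[[1 2; 3 4]]              # 2x2 result: [S[1] S[2]; S[3] S[4]]
```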
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525), an invalid stack-read on 32-bit

This additionally:
- changes the julia specSig calling convention to pass non-primitive types by pointer instead of by-value
- fixes a bug in gen_cfunction that could be exposed by turning off specSig
- moves the alloca calls in ccall (and other places) to the entry BasicBlock in the function, ensuring that llvm detects them as static allocations and moves them into the function prologue
- fixes some undefined behavior from changing a variable's size through an alloca-cast instead of zext/sext/trunc
- prepares for turning back on allocating tuples as vectors, since the gc now guarantees 16-byte alignment

Future work this makes possible:
- create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
- allow moving pointers sometimes rather than always copying immutable data
- teach the GC how it can re-use an existing pointer as a box
Thanks to JuliaLang/julia#10525 we no longer need these :)
Back in #10525, I decided to deprecate `setindex!(A, x, I::Union{Real, Int, AbstractArray}...)` for symmetry, since `getindex` only allows vector indices when there's more than one index. But looking forward, I would really like to work towards APL semantics in 0.5, wherein the dimensionality of the output array is the sum of the dimensionalities of the indices. For example, indexing `A[[1 2; 3 4], 1]` would output a 2-dimensional `2x2` array: `[A[1,1] A[2, 1]; A[3,1] A[4,1]]`. In that case, we'd add support back in for `setindex!` with array indices for symmetry. This seems like needless churn - let's just leave things be until 0.5.
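A sketch of the APL-style rule being described, using the same example (sizes shown assume those semantics):

```julia
A = reshape(1:12, 4, 3)
B = A[[1 2; 3 4], 1]       # 2-d index + scalar (0-d) index -> 2-d result
# B == [A[1,1] A[2,1]; A[3,1] A[4,1]]
size(B)                    # (2, 2)
```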
Also use #10525 for indexing operations
This is still a work in progress, but I'd like to get some feedback on the architecture and design here before applying this same sort of scheme to `setindex!`.

The basic idea is that the only `getindex` method defined in base for abstract arrays is `getindex(::AbstractArray, I...)`. And the only methods that an AbstractArray subtype must define are `size` and just one `getindex` method:
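(A rough sketch of the two canonical forms, with hypothetical types in current syntax; a stand-in rather than the exact definitions from this PR:)

```julia
# "Linear fast" style: size plus one canonical method taking a single Int.
struct MyLinear{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(A::MyLinear) = size(A.data)
Base.getindex(A::MyLinear, i::Int) = A.data[i]

# "Linear slow" style: size plus one canonical method taking ndims(A) Ints
# (fixed here at 2, since arbitrary dimensionality can't be expressed).
struct MySlow{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::MySlow) = size(A.data)
Base.getindex(A::MySlow, i::Int, j::Int) = A.data[i, j]
```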
Unfortunately, it is currently impossible to express the latter method for arbitrary dimensionalities, but in practice it's not a big issue: most `LinearSlow()` arrays have a fixed dimension.
This is achieved through dispatch on an internal `_getindex` method, which recomputes the indices such that it can call the canonical `getindex` method that the user must define. If the user has not defined their canonical method, it falls back to an error method in `_getindex`. I use a similar scheme for `unsafe_getindex`, with the exception that we can fall back to the safe version if the subtype hasn't defined the canonical unsafe method. This enables fast vector indexing by checking the bounds of the index vectors instead of checking each element. And once `@inbounds` is extensible, AbstractArrays will be able to support it by default.

The difficulty with all this redirection is that an extra function call can wreck indexing performance, and it can be hard to avoid. I had particular difficulty getting good performance with `CartesianIndex`es, where I lagged master by 20x for big arrays (since fixed with a more sensible inlining strategy). I think call-site inline annotations would be a magic bullet, but there may be other tricks we can use, too. I've not looked into this very carefully yet, though.
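A very rough sketch of the redirection being described, simplified to the linear-indexing case (hypothetical names; the real `_getindex` in this PR does much more index recomputation):

```julia
# Public entry point forwards everything to an internal helper...
mygetindex(A::AbstractArray, I...) = _mygetindex(A, I...)

# ...which checks bounds and recomputes the indices into the canonical form the
# subtype defined (here: one linear index, the "linear fast" case).
@inline function _mygetindex(A::AbstractArray, I::Int...)
    checkbounds(A, I...)
    A[LinearIndices(size(A))[I...]]
end

mygetindex(rand(4, 3), 2, 3)   # recomputed to linear index 10, then A[10]
```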
TL/DR: In my cursory performance tests hacked onto Tim's indexing perf suite from his reshape work (more tests are needed), I'm close to matching or outperforming master with `Array` with only these definitions:

(Of course, in places where we're not quite able to close the gap we can always reinstate the specialized methods. This is just a very useful stress-test of both functionality and performance.)
cc: @timholy