Efficient cartesian iteration (new version of #6437) #8432

timholy · 2014-09-21T01:02:56Z

This is the complement of "adding array views": while it seems not to be widely appreciated, the reality is that none of the View proposals (mine included) have good performance in the general case if we rely on linear indexing (demos below, see for example also a recent StackOverflow post). This is basically a direct consequence of the fact that div is horribly slow. Consequently, we need to transition to writing algorithms with multidimensional/cartesian iterators. While the Cartesian macros provide a way to do this, I think it's fair to say that everyone would like a solution that doesn't rely on macros. While this PR offers only a small fraction of the power of Cartesian, at least simple cases can now get good performance without resorting to macros.

This is basically a redo of the long-languishing #6437, with two differences:

It exploits the @inline macro I just merged
It's based on @carlobaldassi's immutable approach rather than the tuple approach I tried in WIP: Add Cartesian product iteration. Fixes #1917 #6437. With the tuples there was a 2.5x penalty, which I think is enough in practice to discourage use.

At least for low dimensional Arrays, which have efficient linear indexing, there's scarcely any difference between linear and cartesian iteration. Nevertheless I preserved linear indexing for Arrays using the "traits trick". If you have an AbstractArray that has fast linear indexing, you can cause start to use it simply by declaring Base.linearindexing(::MyArrayType) = Base.LinearFast().

The only user-visible disadvantage I'm aware of in not using tuples is that the notation for multidimensional indexing needs to be A[I] rather than A[I...]. Consequently if/when we switch to tuples, people will have to update their code. In my opinion, that's a small price to pay for finally having efficient multidimensional iteration.

In the performance demos below, there's still room for some improvement. The most important one, I suspect, would be for types that have tuples in them to "inline" the tuple into the type (as pointed out by @simonster among others). Of course if we get that, this could be redone using tuples.

Performance demos:

function mdsum(A)
    s = 0.0
    for a in eachelement(A)
        s += a
    end
    s
end

function mdsum2(A)
    s = 0.0
    @inbounds for I in eachindex(A)
        s += A[I]
    end
    s
end

function sumlinear(A)
    s = 0.0
    @inbounds for i = 1:length(A)
        s += A[i]
    end
    s
end

function sumcart2(A::AbstractMatrix)
    s = 0.0
    @inbounds for j = 1:size(A,2), i = 1:size(A,1)
        s += A[i,j]
    end
    s
end

println("Arrays")
A = rand(10^4, 10^4)
mdsum(A); mdsum2(A); sumlinear(A); sumcart2(A)
@time mdsum(A)
@time mdsum2(A)
@time sumlinear(A)
@time sumcart2(A)

println("SubArrays")
B = sub(A, 2:9999, 2:9999)
mdsum(B); mdsum2(B); sumlinear(B); sumcart2(B)
@time mdsum(B)
@time mdsum2(B)
@time sumlinear(B)
@time sumcart2(B)

using ArrayViews
println("ArrayViews")
B = view(A, 2:9999, 2:9999)
mdsum(B); mdsum2(B); sumlinear(B); sumcart2(B)
@time mdsum(B)
@time mdsum2(B)
@time sumlinear(B)
@time sumcart2(B)

Results:

julia> include("/tmp/testiteration.jl")
Arrays
elapsed time: 0.15763366 seconds (13896 bytes allocated)
elapsed time: 0.209725186 seconds (96 bytes allocated)
elapsed time: 0.133438424 seconds (96 bytes allocated)
elapsed time: 0.133621887 seconds (96 bytes allocated)
SubArrays
elapsed time: 0.491196255 seconds (96 bytes allocated)
elapsed time: 0.406468852 seconds (96 bytes allocated)
elapsed time: 1.871101162 seconds (96 bytes allocated)
elapsed time: 0.283826772 seconds (96 bytes allocated)
ArrayViews
elapsed time: 1.331280489 seconds (96 bytes allocated)
elapsed time: 0.27865117 seconds (96 bytes allocated)
elapsed time: 2.543739468 seconds (96 bytes allocated)
elapsed time: 0.253180521 seconds (96 bytes allocated)
4.99765960924932e7

timholy · 2014-09-21T01:07:28Z

I should have added, the reason for the poor performance with mdsum and ArrayViews is the check for negative array dimensions. If you delete that, then the performance of the two mdsum variants are similar. It would be nice, of course, if the compiler would hoist out the size check.

carlobaldassi · 2014-09-21T22:27:59Z

@timholy sorry, I just saw your comment on #6035 (I've been catching up with my inbox today). If this gets merged, it could indeed be used to simplify and generalize the broadcasting code considerably (among other things), so I'm all for it.

timholy · 2014-09-25T14:34:44Z

I changed the title now to include WIP, since with the merger of stagedfunctions there are likely a couple of tweaks I want to make (see #8472 (comment)).

jakebolewski · 2014-09-25T14:41:06Z

Does this mean that cartesian.jl in its current form can be removed from Base?

timholy · 2014-09-25T15:01:26Z

Not even close---Cartesian does so much that this cannot (yet) do. But it does mean that one won't have to use @nloops in simple cases, like copy!, fill!, etc. (Basically, any function that has the pattern of visiting each element of an AbstractArray.) The reality is that most of base (main exceptions: reductions & broadcasting) doesn't need more than that, so I think this could have a pervasive effect in allowing us to write macro-free code that nevertheless performs well for arbitrary AbstractArrays.

It also allows something cool that Cartesian can't easily do: "zipped" iteration over containers that have commensurate sizes but different shapes, like copying a 5x1x3 array to a 5x3 array. I've always been annoyed by this open TODO. It really relied on linear indexing, and if X is of type that doesn't have efficient linear indexing, you're doomed. Cartesian struggled with this because @ngenerate can really only handle iteration over a single fixed dimensionality. (@Jutho added a nice extension that can handle such cases. It was merged into the Cartesian package but not into Base.)

It's also worth pointing out that with stagedfunctions, @ngenerate is on the chopping block. Very happy about that. But won't have time to get to it for a while.

timholy · 2014-11-11T11:26:33Z

While I can't find it now, I seem to recall a recent declaration that stagedfunctions should not have side effects. So, until we have stagedtypes (#8472), I think this should stay as-is. Will merge soon if no complaints arise.

vchuravy · 2014-11-11T11:36:54Z

Comment about side effects: #8886 (comment)

Jutho · 2014-11-11T11:45:43Z

I have one comment. The current implementation does not allow new methods to make use of the new Iterators, since they would need to be implemented in the

let implemented
...
end

block, together with eachelement and eachindex. A nicer implementation could be to have an

let implemented=IntSet()
    stagedfunction Base.call(::Type{SizeIterator},sizes::NTuple{N,Int})
        if !(N in implemented)
            generate ...
        end
       . ..
        return ...
    end
end

and then let eachelement do something like

function eachindex(A::AbstractArray)
return SizeIterator(size(A))
end

timholy · 2014-11-11T12:08:29Z

Thanks, @vchuravy.

Agreed, @Jutho. I'll look into that.

See #8432 (comment)

timholy · 2014-11-11T16:40:43Z

@Jutho, it's done, see if this is better.

Waiting for Travis (had to re-start OSX, but it passed Linux).

Jutho · 2014-11-11T22:43:27Z

I would think one could still do without the register function. I was thinking something along the lines of
https://gist.github.com/Jutho/832f3f4aee84cf927a53

I also took the liberty to rename your types to CartesianIndex and IndexIterator, which I find personally more appropriate than Subscript (kind of vague) and SizeIterator (it's not really iterating over sizes, but over indices). But that's a purely subjective matter.

timholy · 2014-11-12T01:57:46Z

It looks mostly good, but if I insert a println("Here") at this line, then I get a problem (note that it pre-constructs the types for dimensions 1-8, and starting at 9 it relies on dynamic generation):

julia> A9 = rand(3,3,3,3,3,3,3,3,3);

julia> A8 = rand(3,3,3,3,3,3,3,3);

julia> S9 = sub(A9, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3);

julia> S8 = sub(A8, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3);

julia> z = 0.0; for s in IteratorsMD.eachelement(S8)
           z += s
       end
Here

julia> z = 0.0; for s in IteratorsMD.eachelement(S9)
           z += s
       end

julia>

It never actually builds the type.

Jutho · 2014-11-12T07:02:41Z

Indeed, I made a small last-minute change which broke it. I restored this now (latest revision of the same gist) by having the start{T,N}(AT::(AbstractArray{T,N},LinearSlow)) out of the eval and letting it call CartesianIndex, which triggers the type generation when using eachelement. For eachindex, the types are already generated when the iterator is constructed. In principle, one could also move start{N}(iter::IndexIterator{N}) out of the eval and letting it call CartesianIndex, but the check for any zero dimension might be slightly uglier.

Closes #1917, closes #6437

…mplementations

See #8432 (comment)

timholy · 2014-11-12T22:02:53Z

I decided to go with your implementation. Technically it is not effect-free, since calling gen_cartesian from inside the stagedfunction can trigger consequences that are not encapsulated by the returned expression. I had been wondering about doing it this way, but decided against it given #8886 (comment). But it's such a carefully-managed side-effect, I'm back to thinking that this counts as a legitimate exception.

One nice thing is that this version allows us to get rid of eachelement. (Before we only needed it for dimensions > 8, but it's just nicer this way.) I also fixed a small bug in one implementation of next. b9744b1#diff-89e97f67f8058d32eac8d917db5bbdbfR80.

Let's see what Travis says; if it goes green, then I say it's merge-time.

ivarne · 2014-11-15T01:12:07Z

Not sure. I might be miss remembering this, but I'm pretty sure I tried this once, and I don't remember any problems. I would also guess absolute file references would just work, because the previous directory is still there. Now I'm starting to fear that I never finished that test, because the copy took too long and I got side tracked.

timholy · 2014-11-15T12:05:37Z

OK, I think I figured it out. julia was getting confused, especially during the bootstrap phase, about the difference between the traits-call start(::(AbstractArray,LinearFast/Slow)) and normal iteration for tuples.

I ran the entire perf suite, and didn't see anything convincing in terms of a performance regression. And of course, we have things like this:

function mdsum(A)
    s = 0.0
    for a in A
        s += a
    end
    s
end
A = randn(10^4,10^4)
S = sub(A, :, :)

After warmup, master:

julia> @time mdsum(A)
elapsed time: 0.184208858 seconds (13824 bytes allocated)
2311.344158094607

julia> @time mdsum(S)
elapsed time: 2.006607241 seconds (96 bytes allocated)
2311.344158094607

This PR:

julia> @time mdsum(A)
elapsed time: 0.217518824 seconds (13824 bytes allocated)
-653.3611064403863

julia> @time mdsum(S)
elapsed time: 0.223128702 seconds (216 bytes allocated)
-653.3611064403863

Any other issues to be concerned about before merging?

Jutho · 2014-11-15T12:34:14Z

That's great news. Makes sense, ... in hindsight. Great detective work.

timholy · 2014-11-15T13:15:18Z

Yeah. The tricky part was that (1) it wasn't causing errors, and (2) it only affected code compiled during the bootstrap sequence before type-inference gets turned on. So I was getting different timing results for whether this was being built on top of a successful build of master, or if I did a make clean && make. For a while that was quite confusing.

Efficient cartesian iteration (new version of #6437)

timholy · 2014-11-15T13:18:42Z

Man that felt good. I've been wanting this for ages.

StefanKarpinski · 2014-11-15T13:27:37Z

Exciting.

ViralBShah · 2014-11-15T18:57:14Z

This is really amazing to have.

tkelman · 2014-11-16T03:02:41Z

This is causing a bunch of ambiguity warnings on Windows with SharedArrays - https://gist.github.com/anonymous/dc0a75ac1474f5d67b8a (the backtrace failure at the end has been happening for a while, that's not relevant here)

Jutho · 2014-11-16T10:36:29Z

I guess the current implementation assumes that there were no SharedArrays for Windows, that't easy to change. However, the underlying problem might also be that the getindex definitions for SharedArray are to general. I had the same problem with trying to define a new getindex method for indexing an AbstractArray with a new custom type in TensorOperations.jl in order to implement some index notation. Because SharedArray defines a getindex(::SharedArray,::Any), you always need to define special methods for SharedArray, which requires looking into its definition etc. I don't think this is a viable strategy, so I would like to see the type of the index arguments in the getindex methods become restricted to e.g. Union(Real,AbstractVector). Then these extra patches like https://github.com/JuliaLang/julia/pull/8432/files#diff-89e97f67f8058d32eac8d917db5bbdbfR46 become unnecessary.

blakejohnson · 2014-11-16T21:21:44Z

base/exports.jl

@@ -515,6 +515,8 @@ export
    cumsum,
    cumsum!,
    cumsum_kbn,
+    eachelement,


In the end you ended up not keeping eachelement, so this export should be removed.

Oops, indeed. See 40a5a41. Thanks!

As reported and discussed here: #8432 (comment)

ivarne · 2014-11-19T13:20:16Z

Hmm... How does this new function eachindex() work?

timholy · 2014-11-19T14:35:56Z

Hmm... seems I should add some documentation!

timholy · 2014-11-19T15:02:55Z

@ivarne, see 9b4f212

ivarne · 2014-11-19T16:09:05Z

😃 Now I understand what this is really about.

Should it be defined for other indexable containers too?

Does it have enough overlap with keys() so that we can overload that function instead? (I guess the answer is NO, because eachindex returns a tuple you have to splat into getindex, while keys returns the key value)

Jutho · 2014-11-19T16:12:41Z

Well, it's not really a tuple because of the overhead, so its a new type CartesianIndex_N. Since getindex is overloaded to accept this type, there is also no splatting involved. So maybe we could consider this as another method of keys.

See JuliaLang#8432 (comment)

As reported and discussed here: JuliaLang#8432 (comment)

This was referenced Sep 22, 2014

Towards array nirvana #7941

Closed

WIP: Staged functions #7474

Merged

Jutho mentioned this pull request Sep 25, 2014

RFC: staged / meta / generated types ? #8472

Open

timholy changed the title ~~Efficient cartesian iteration (new version of #6437)~~ WIP: efficient cartesian iteration (new version of #6437) Sep 25, 2014

timholy mentioned this pull request Sep 28, 2014

New array views based on stagedfunctions #8501

Merged

jiahao force-pushed the master branch 3 times, most recently from 6c7c7e3 to 1a4c02f Compare October 11, 2014 22:06

jiahao force-pushed the master branch from cdde4df to 7fdc860 Compare October 28, 2014 04:20

timholy changed the title ~~WIP: efficient cartesian iteration (new version of #6437)~~ Efficient cartesian iteration (new version of #6437) Nov 11, 2014

timholy added a commit that referenced this pull request Nov 11, 2014

Make type-construction more modular

fb036e3

See #8432 (comment)

timholy and others added 4 commits November 12, 2014 14:53

Add a multidimensional (cartesian) iterator

0314913

Closes #1917, closes #6437

Make the output of setindex! for SharedArrays consistent with other i…

db2fa01

…mplementations

Make type-construction more modular

dbd1bd4

See #8432 (comment)

Rework iteration using stagedfunctions

b9744b1

timholy force-pushed the teh/cartesian_iteration branch from fb036e3 to b9744b1 Compare November 12, 2014 21:56

Avoid confusion with start(::Tuple) iteration

2fa852d

timholy added a commit that referenced this pull request Nov 15, 2014

Merge pull request #8432 from JuliaLang/teh/cartesian_iteration

37ffa13

Efficient cartesian iteration (new version of #6437)

timholy merged commit 37ffa13 into master Nov 15, 2014

blakejohnson reviewed Nov 16, 2014
View reviewed changes

Jutho added a commit that referenced this pull request Nov 17, 2014

Fix ambiguity warnings with SharedArray

1c12bcb

As reported and discussed here: #8432 (comment)

Jutho mentioned this pull request Nov 17, 2014

fix ambiguity warning with SharedArray #9036

Merged

timholy deleted the teh/cartesian_iteration branch November 18, 2014 18:42

waTeim pushed a commit to waTeim/julia that referenced this pull request Nov 23, 2014

Make type-construction more modular

1594923

See JuliaLang#8432 (comment)

waTeim pushed a commit to waTeim/julia that referenced this pull request Nov 23, 2014

Fix ambiguity warnings with SharedArray

031c7eb

As reported and discussed here: JuliaLang#8432 (comment)

timholy mentioned this pull request Jan 4, 2015

WIP: Output type handling (fixes #17) JuliaMath/Interpolations.jl#19

Merged

Jutho mentioned this pull request Jan 7, 2015

Tighten type of state in next and done of StepRange and UnitRange #9670

Closed

ivarne mentioned this pull request Feb 14, 2015

add some of Iterators.jl to Base? #10162

Closed

timholy mentioned this pull request Apr 17, 2015

Change for i=1:length(A) to for i in eachindex(A) #10858

Merged

6 tasks

Ken-B mentioned this pull request Jul 16, 2015

Typo #12170

Closed

stevengj mentioned this pull request Aug 24, 2015

change for i in 1:length(a) to i in eachindex(a) (continuing #10858) #12788

Closed

ivarne mentioned this pull request Aug 26, 2015

Add keys(::Array) and values(::Array) #12811

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient cartesian iteration (new version of #6437) #8432

Efficient cartesian iteration (new version of #6437) #8432

timholy commented Sep 21, 2014

timholy commented Sep 21, 2014

carlobaldassi commented Sep 21, 2014

timholy commented Sep 25, 2014

jakebolewski commented Sep 25, 2014

timholy commented Sep 25, 2014

timholy commented Nov 11, 2014

vchuravy commented Nov 11, 2014

Jutho commented Nov 11, 2014

timholy commented Nov 11, 2014

timholy commented Nov 11, 2014

Jutho commented Nov 11, 2014

timholy commented Nov 12, 2014

Jutho commented Nov 12, 2014

timholy commented Nov 12, 2014

ivarne commented Nov 15, 2014

timholy commented Nov 15, 2014

Jutho commented Nov 15, 2014

timholy commented Nov 15, 2014

timholy commented Nov 15, 2014

StefanKarpinski commented Nov 15, 2014

ViralBShah commented Nov 15, 2014

tkelman commented Nov 16, 2014

Jutho commented Nov 16, 2014

blakejohnson Nov 16, 2014

timholy Nov 16, 2014

ivarne commented Nov 19, 2014

timholy commented Nov 19, 2014

timholy commented Nov 19, 2014

ivarne commented Nov 19, 2014

Jutho commented Nov 19, 2014

Efficient cartesian iteration (new version of #6437) #8432

Efficient cartesian iteration (new version of #6437) #8432

Conversation

timholy commented Sep 21, 2014

timholy commented Sep 21, 2014

carlobaldassi commented Sep 21, 2014

timholy commented Sep 25, 2014

jakebolewski commented Sep 25, 2014

timholy commented Sep 25, 2014

timholy commented Nov 11, 2014

vchuravy commented Nov 11, 2014

Jutho commented Nov 11, 2014

timholy commented Nov 11, 2014

timholy commented Nov 11, 2014

Jutho commented Nov 11, 2014

timholy commented Nov 12, 2014

Jutho commented Nov 12, 2014

timholy commented Nov 12, 2014

ivarne commented Nov 15, 2014

timholy commented Nov 15, 2014

Jutho commented Nov 15, 2014

timholy commented Nov 15, 2014

timholy commented Nov 15, 2014

StefanKarpinski commented Nov 15, 2014

ViralBShah commented Nov 15, 2014

tkelman commented Nov 16, 2014

Jutho commented Nov 16, 2014

blakejohnson Nov 16, 2014

Choose a reason for hiding this comment

timholy Nov 16, 2014

Choose a reason for hiding this comment

ivarne commented Nov 19, 2014

timholy commented Nov 19, 2014

timholy commented Nov 19, 2014

ivarne commented Nov 19, 2014

Jutho commented Nov 19, 2014