Efficient cartesian iteration (new version of #6437) #8432
Conversation
I should have added, the reason for the poor performance with [...]
I changed the title now to include WIP, since with the merge of stagedfunctions there are likely a couple of tweaks I want to make (see #8472 (comment)).
Does this mean that `cartesian.jl` in its current form can be removed from Base?
Not even close---Cartesian does so much that this cannot (yet) do. But it does mean that one won't have to use [...]

It also allows something cool that Cartesian can't easily do: "zipped" iteration over containers that have commensurate sizes but different shapes, like copying a 5x1x3 array to a 5x3 array. I've always been annoyed by this open TODO. It really relied on linear indexing, and if [...]

It's also worth pointing out that with stagedfunctions, [...]
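(Aside, for illustration only, not code from this PR: a minimal sketch of the kind of "zipped" iteration described above, written against the eventual `eachindex` API; the helper name `copy_zipped!` is made up.)

```julia
# Sketch: copy between equal-length arrays of different shapes
# (e.g. 5x1x3 -> 5x3) by zipping each array's own index iterator,
# so neither side is forced through slow linear indexing of a view.
function copy_zipped!(dest::AbstractArray, src::AbstractArray)
    length(dest) == length(src) || throw(DimensionMismatch("arrays must have equal length"))
    for (Id, Is) in zip(eachindex(dest), eachindex(src))
        dest[Id] = src[Is]   # each index is the efficient kind for its own array
    end
    return dest
end

copy_zipped!(zeros(5, 3), rand(5, 1, 3))   # 5x1x3 -> 5x3
```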
(force-pushed from 6c7c7e3 to 1a4c02f)
While I can't find it now, I seem to recall a recent declaration that [...]
Comment about side effects: #8886 (comment)
I have one comment. The current implementation does not allow new methods to make use of the new iterators, since they would need to be implemented in the `let implemented ... end` block. Something along these lines might work instead:

```julia
let implemented = IntSet()
    stagedfunction Base.call{N}(::Type{SizeIterator}, sizes::NTuple{N,Int})
        if !(N in implemented)
            # generate ...
        end
        # ...
        return ...
    end
end
```

and then

```julia
function eachindex(A::AbstractArray)
    return SizeIterator(size(A))
end
```
@Jutho, it's done, see if this is better. Waiting for Travis (had to restart OSX, but it passed Linux).
I would think one could still do without the register function. I was thinking something along the lines of [...]

I also took the liberty to rename your types to [...]
It looks mostly good, but if I insert a [...]:

```julia
julia> A9 = rand(3,3,3,3,3,3,3,3,3);

julia> A8 = rand(3,3,3,3,3,3,3,3);

julia> S9 = sub(A9, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3);

julia> S8 = sub(A8, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3);

julia> z = 0.0; for s in IteratorsMD.eachelement(S8)
           z += s
       end
```

Here [...]:

```julia
julia> z = 0.0; for s in IteratorsMD.eachelement(S9)
           z += s
       end

julia>
```

It never actually builds the type.
Indeed, I made a small last-minute change which broke it. I restored this now (latest revision of the same gist) by having the [...]
(force-pushed from fb036e3 to b9744b1)
I decided to go with your implementation. Technically it is not effect-free, since calling [...]

One nice thing is that this version allows us to get rid of [...]

Let's see what Travis says; if it goes green, then I say it's merge-time.
Not sure. I might be misremembering this, but I'm pretty sure I tried this once, and I don't remember any problems. I would also guess absolute file references would just work, because the previous directory is still there. Now I'm starting to fear that I never finished that test, because the copy took too long and I got sidetracked.
OK, I think I figured it out. julia was getting confused, especially during the bootstrap phase, about the difference between the traits-call [...]

I ran the entire perf suite, and didn't see anything convincing in terms of a performance regression. And of course, we have things like this:

```julia
function mdsum(A)
    s = 0.0
    for a in A
        s += a
    end
    s
end

A = randn(10^4, 10^4)
S = sub(A, :, :)
```

After warmup, master:

```julia
julia> @time mdsum(A)
elapsed time: 0.184208858 seconds (13824 bytes allocated)
2311.344158094607

julia> @time mdsum(S)
elapsed time: 2.006607241 seconds (96 bytes allocated)
2311.344158094607
```

This PR:

```julia
julia> @time mdsum(A)
elapsed time: 0.217518824 seconds (13824 bytes allocated)
-653.3611064403863

julia> @time mdsum(S)
elapsed time: 0.223128702 seconds (216 bytes allocated)
-653.3611064403863
```

Any other issues to be concerned about before merging?
That's great news. Makes sense... in hindsight. Great detective work.
Yeah. The tricky part was that (1) it wasn't causing errors, and (2) it only affected code compiled during the bootstrap sequence, before type inference gets turned on. So I was getting different timing results depending on whether this was being built on top of a successful build of [...]
Merged: Efficient cartesian iteration (new version of #6437)
Man that felt good. I've been wanting this for ages.

Exciting.

This is really amazing to have.
This is causing a bunch of ambiguity warnings on Windows with SharedArrays: https://gist.github.com/anonymous/dc0a75ac1474f5d67b8a (the backtrace failure at the end has been happening for a while; that's not relevant here)
I guess the current implementation assumes that there are no SharedArrays on Windows; that's easy to change. However, the underlying problem might also be that the [...]
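(Context for readers, a generic illustration of how method-ambiguity warnings arise, not the actual SharedArray definitions involved here:)

```julia
# Neither method is more specific than the other for the call f(1, 2):
f(x::Integer, y::Any) = "first"
f(x::Any, y::Integer) = "second"

f(1, 2)   # ambiguous: a warning at definition time on 0.4-era Julia,
          # a MethodError (ambiguity) on current Julia
```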
```diff
@@ -515,6 +515,8 @@ export
     cumsum,
     cumsum!,
     cumsum_kbn,
+    eachelement,
```
In the end you didn't keep `eachelement`, so this export should be removed.
Oops, indeed. See 40a5a41. Thanks!
As reported and discussed here: #8432 (comment)
Hmm... How does this new function `eachindex()` work?
Hmm... seems I should add some documentation!
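(For later readers, a minimal usage sketch, assuming the `eachindex` API as it was eventually documented; the function name `sumall` is made up:)

```julia
function sumall(A)
    s = 0.0
    for I in eachindex(A)
        # I is the most efficient index type for A: a plain integer for a
        # dense Array (fast linear indexing), a multidimensional (cartesian)
        # index for containers like SubArrays that lack it
        s += A[I]
    end
    return s
end

sumall(rand(2, 3))
```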
😃 Now I understand what this is really about. Should it be defined for other indexable containers too? Does it have enough overlap with [...]?
Well, it's not really a tuple, because of the overhead, so it's a new type [...]
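(Loosely, and with made-up names rather than the ones the PR settled on: one immutable type per dimensionality, with a fixed `Int` field per dimension, so an index can live in registers instead of being a heap-allocated tuple.)

```julia
# Illustrative only (0.4-era syntax, to match the thread); the real
# types are generated on demand for each dimensionality N.
immutable CartesianIndex2   # hypothetical 2-d index type
    i1::Int
    i2::Int
end

# Indexing with it just forwards the fields:
Base.getindex(A::AbstractMatrix, I::CartesianIndex2) = A[I.i1, I.i2]
```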
This is the complement of "adding array views": while it seems not to be widely appreciated, the reality is that none of the View proposals (mine included) have good performance in the general case if we rely on linear indexing (demos below; see also, for example, a recent StackOverflow post). This is basically a direct consequence of the fact that `div` is horribly slow. Consequently, we need to transition to writing algorithms with multidimensional/cartesian iterators. While the Cartesian macros provide a way to do this, I think it's fair to say that everyone would like a solution that doesn't rely on macros. While this PR offers only a small fraction of the power of Cartesian, at least simple cases can now get good performance without resorting to macros.

This is basically a redo of the long-languishing #6437, with two differences:

- It uses the `@inline` macro I just merged.
- It uses an `immutable` approach rather than the tuple approach I tried in #6437 (WIP: Add Cartesian product iteration. Fixes #1917). With the tuples there was a 2.5x penalty, which I think is enough in practice to discourage use.

At least for low-dimensional `Array`s, which have efficient linear indexing, there's scarcely any difference between linear and cartesian iteration. Nevertheless I preserved linear indexing for `Array`s using the "traits trick". If you have an `AbstractArray` that has fast linear indexing, you can cause `start` to use it simply by declaring `Base.linearindexing(::MyArrayType) = Base.LinearFast()`.

The only user-visible disadvantage I'm aware of in not using tuples is that the notation for multidimensional indexing needs to be `A[I]` rather than `A[I...]`. Consequently, if/when we switch to tuples, people will have to update their code. In my opinion, that's a small price to pay for finally having efficient multidimensional iteration.

In the performance demos below, there's still room for some improvement. The most important one, I suspect, would be for types that have tuples in them to "inline" the tuple into the type (as pointed out by @simonster among others). Of course, if we get that, this could be redone using tuples.
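(A sketch of the "traits trick" described above: `MyWrapper` is a made-up type, while the trait function and value, `Base.linearindexing` / `Base.LinearFast`, are as quoted in the text; 0.4-era syntax to match the thread.)

```julia
# A toy AbstractArray wrapper whose storage supports fast linear indexing.
immutable MyWrapper{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(w::MyWrapper) = size(w.data)
Base.getindex(w::MyWrapper, i::Int) = w.data[i]

# Opt in to the fast path: with this declaration, iteration (`start` and
# friends) walks a MyWrapper with a single integer counter instead of
# constructing a cartesian iterator.
Base.linearindexing(::MyWrapper) = Base.LinearFast()
```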
Performance demos: [...]

Results: [...]