Using pointer arithmetic to avoid recomputing indexes #3

lindahua · 2013-10-09T21:37:24Z

For multidimensional array, calling getindex for each element isn't the optimal way to access data. Here is the code:

using Cartesian

function sum3d_handloop(a::Array{Float64,3})
    s = 0.
    d1 = size(a, 1)
    d2 = size(a, 2)
    d3 = size(a, 3)

    for i3=1:d3, i2=1:d2, i1=1:d1
        @inbounds s += a[i1, i2, i3]
    end
    s
end

function sum3d_nloop(a::Array{Float64,3})
    s = 0.
    @nloops 3 i a begin
        @inbounds s += @nref 3 a i
    end
    return s
end

function sum3d_ptarith(a::Array{Float64,3})  
    # using pointer arithmetics to move pointer

    s = 0.
    d1 = size(a, 1)
    d2 = size(a, 2)
    d3 = size(a, 3)
    t1 = stride(a, 1)
    t2 = stride(a, 2)
    t3 = stride(a, 3)

    pbase3 = pointer(a)
    esize = sizeof(Float64)

    for i3=1:d3
        pbase2 = pbase3
        for i2=1:d2
            pbase1 = pbase2
            for i1=1:d1
                s += unsafe_load(pbase1)
                pbase1 += t1 * esize
            end
            pbase2 += t2 * esize
        end
        pbase3 += t3 * esize
    end
end

const a = rand(100,100,100)

# warming
sum3d_handloop(a)
sum3d_nloop(a)
sum3d_ptarith(a)

# measure
@time for i = 1:100 sum3d_handloop(a) end  # 0.080215784 seconds
@time for i = 1:100 sum3d_nloop(a) end  # 0.081037291 seconds
@time for i = 1:100 sum3d_ptarith(a) end  # 0.042198018 seconds

This shows that @nloops can generate codes as efficient as a normally hand-crafted loop. However, it is about 2x slower than that using pointer arithmetics to move pointers. Also, the code that uses pointer arithmetics is regular enough and should be constructable using a macro.

The text was updated successfully, but these errors were encountered:

timholy · 2013-10-09T21:48:16Z

Good point. But, is that easy to get working for SubArrays too? If not, I don't think it would justify switching.

lindahua · 2013-10-09T21:58:53Z

The only requirement is that the array is strided, so it should work for both Array and SubArray.

timholy · 2013-10-10T00:30:38Z

Sorry, I read your note in about 30s as I was leaving work; I should have known better because I've written code like that myself in the past (not with the pointer, but with linear indexing, which is a less-efficient version of the same idea). Indeed, Cartesian v0.0 was based around macros that generated nested loops like these.

So, initially I was hasty also because I didn't want to go back to the bad old days. But, I've realized that we can get this almost for free from ec13dea, which I just pushed to provide support for easily writing reductions (as demonstrated on the mailing list; the version in test/ fixes one tiny bug). To enable you to write the code above we'll need one more thing, support for expressions like d->(d>1) ? (pbase_(k-1) = pbase_k) : nothing. Can't promise it tonight, unfortunately.

timholy · 2013-10-10T08:46:57Z

OK, you should have everything you need to write this type of code now. See the docs and examples in test/.

lindahua · 2013-10-10T12:18:06Z

You are amazing! I will play with this soon. I do wish all these can be built into the language with simplified syntax.

timholy · 2013-10-10T13:56:29Z

I do wish all these can be built into the language with simplified syntax.

Agreed. I suspect replacing the explicit macro-call syntax with something more readable wouldn't be super-hard (although certainly I don't know how one would go about that). I have a little harder time imagining how one could get the flexibility of the anonymous-function-expressions using a much simpler or more readable syntax. Something like

for I = <2:size(A,d)-1 for d = 1:ndims(A)>
    ...
end

might work? (I'm using angle-brackets to indicate something different than a comprehension.) And for reductions maybe something like

for I = <1:size(A,d) for d = 1:ndims(A)> with J = <size(out,d) == 1 ? 1 : I[d]>
    ...
end

The laplacian example, as an illustration of reference syntax, also bears thinking about. I could imagine

for d1 = 1:ndims(A)
    tmp += A[<d==d1 ? I[d]+1 : I[d] for d = 1:ndims(A)>]
end

Not a ton more readable than the original, but I think it's somewhat better. [EDIT: getting all this to work at compile time is important, and particularly the latter would present a challenge without some additional syntax.] Fundamentally it may be challenging to get both flexibility and readability.

CC @JeffBezanson, @StefanKarpinski.

timholy closed this as completed in 789b56e Oct 10, 2013

timholy mentioned this issue Oct 11, 2013

Cartesian Product iteration JuliaLang/julia#1917

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using pointer arithmetic to avoid recomputing indexes #3

Using pointer arithmetic to avoid recomputing indexes #3

lindahua commented Oct 9, 2013

timholy commented Oct 9, 2013

lindahua commented Oct 9, 2013

timholy commented Oct 10, 2013

timholy commented Oct 10, 2013

lindahua commented Oct 10, 2013

timholy commented Oct 10, 2013

Using pointer arithmetic to avoid recomputing indexes #3

Using pointer arithmetic to avoid recomputing indexes #3

Comments

lindahua commented Oct 9, 2013

timholy commented Oct 9, 2013

lindahua commented Oct 9, 2013

timholy commented Oct 10, 2013

timholy commented Oct 10, 2013

lindahua commented Oct 10, 2013

timholy commented Oct 10, 2013