Skip to content
This repository has been archived by the owner on Feb 9, 2020. It is now read-only.

Using pointer arithmetic to avoid recomputing indexes #3

Closed
lindahua opened this issue Oct 9, 2013 · 6 comments
Closed

Using pointer arithmetic to avoid recomputing indexes #3

lindahua opened this issue Oct 9, 2013 · 6 comments

Comments

@lindahua
Copy link

lindahua commented Oct 9, 2013

For multidimensional array, calling getindex for each element isn't the optimal way to access data. Here is the code:

using Cartesian

function sum3d_handloop(a::Array{Float64,3})
    s = 0.
    d1 = size(a, 1)
    d2 = size(a, 2)
    d3 = size(a, 3)

    for i3=1:d3, i2=1:d2, i1=1:d1
        @inbounds s += a[i1, i2, i3]
    end
    s
end

function sum3d_nloop(a::Array{Float64,3})
    s = 0.
    @nloops 3 i a begin
        @inbounds s += @nref 3 a i
    end
    return s
end

function sum3d_ptarith(a::Array{Float64,3})  
    # using pointer arithmetics to move pointer

    s = 0.
    d1 = size(a, 1)
    d2 = size(a, 2)
    d3 = size(a, 3)
    t1 = stride(a, 1)
    t2 = stride(a, 2)
    t3 = stride(a, 3)

    pbase3 = pointer(a)
    esize = sizeof(Float64)

    for i3=1:d3
        pbase2 = pbase3
        for i2=1:d2
            pbase1 = pbase2
            for i1=1:d1
                s += unsafe_load(pbase1)
                pbase1 += t1 * esize
            end
            pbase2 += t2 * esize
        end
        pbase3 += t3 * esize
    end
end

const a = rand(100,100,100)

# warming
sum3d_handloop(a)
sum3d_nloop(a)
sum3d_ptarith(a)

# measure
@time for i = 1:100 sum3d_handloop(a) end  # 0.080215784 seconds
@time for i = 1:100 sum3d_nloop(a) end  # 0.081037291 seconds
@time for i = 1:100 sum3d_ptarith(a) end  # 0.042198018 seconds

This shows that @nloops can generate codes as efficient as a normally hand-crafted loop. However, it is about 2x slower than that using pointer arithmetics to move pointers. Also, the code that uses pointer arithmetics is regular enough and should be constructable using a macro.

@timholy
Copy link
Owner

timholy commented Oct 9, 2013

Good point. But, is that easy to get working for SubArrays too? If not, I don't think it would justify switching.

@lindahua
Copy link
Author

lindahua commented Oct 9, 2013

The only requirement is that the array is strided, so it should work for both Array and SubArray.

@timholy
Copy link
Owner

timholy commented Oct 10, 2013

Sorry, I read your note in about 30s as I was leaving work; I should have known better because I've written code like that myself in the past (not with the pointer, but with linear indexing, which is a less-efficient version of the same idea). Indeed, Cartesian v0.0 was based around macros that generated nested loops like these.

So, initially I was hasty also because I didn't want to go back to the bad old days. But, I've realized that we can get this almost for free from ec13dea, which I just pushed to provide support for easily writing reductions (as demonstrated on the mailing list; the version in test/ fixes one tiny bug). To enable you to write the code above we'll need one more thing, support for expressions like d->(d>1) ? (pbase_(k-1) = pbase_k) : nothing. Can't promise it tonight, unfortunately.

@timholy
Copy link
Owner

timholy commented Oct 10, 2013

OK, you should have everything you need to write this type of code now. See the docs and examples in test/.

@lindahua
Copy link
Author

You are amazing! I will play with this soon. I do wish all these can be built into the language with simplified syntax.

@timholy
Copy link
Owner

timholy commented Oct 10, 2013

I do wish all these can be built into the language with simplified syntax.

Agreed. I suspect replacing the explicit macro-call syntax with something more readable wouldn't be super-hard (although certainly I don't know how one would go about that). I have a little harder time imagining how one could get the flexibility of the anonymous-function-expressions using a much simpler or more readable syntax. Something like

for I = <2:size(A,d)-1 for d = 1:ndims(A)>
    ...
end

might work? (I'm using angle-brackets to indicate something different than a comprehension.) And for reductions maybe something like

for I = <1:size(A,d) for d = 1:ndims(A)> with J = <size(out,d) == 1 ? 1 : I[d]>
    ...
end

The laplacian example, as an illustration of reference syntax, also bears thinking about. I could imagine

for d1 = 1:ndims(A)
    tmp += A[<d==d1 ? I[d]+1 : I[d] for d = 1:ndims(A)>]
end

Not a ton more readable than the original, but I think it's somewhat better. [EDIT: getting all this to work at compile time is important, and particularly the latter would present a challenge without some additional syntax.] Fundamentally it may be challenging to get both flexibility and readability.

CC @JeffBezanson, @StefanKarpinski.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants