
Speed of sprand() #6708

Closed
ViralBShah opened this issue May 1, 2014 · 30 comments

Labels: domain:arrays:sparse (Sparse arrays), performance (Must go faster)

Comments

@ViralBShah (Member)

Based on the discussion here, we need to investigate if sparse() can be sped up.

https://groups.google.com/forum/#!topic/julia-users/X8Jca4SZMxo

Potential opportunities include experimenting with different ways of sorting the inputs, avoiding the generic combine function, adding @inbounds, etc.

The following example takes 1.7 seconds for me:

s = sprand(70000,70000,0.001)
@ViralBShah (Member Author)

On quick checks, sticking @inbounds in front of the for loops and removing the combine function have no effect on the time. Checking for pre-sorted inputs should help.

@ViralBShah (Member Author)

Cc: @rwgardner

@stevengj (Member) commented May 1, 2014

On my machine, almost all of the time in your sprand test is spent in two lines:

  • ind = sort(rand(1:N, ik)) (here): takes about 0.6 seconds. (Probably this should be sort! to save memory, but that doesn't help performance much.)
  • I, J = ind2sub((m,n), uind) (here): takes about 1.3 seconds.

I'm not sure we can do much about the first, short of speeding up rand and sort! (although we might be able to avoid the sort entirely by a more clever statistical algorithm that generates the indices in order, via computing the probability that each term in 1:N appears P times ... this requires some careful statistics though). But maybe we can make ind2sub faster or avoid it entirely somehow?

Basically, to make this faster it seems like we should compute the entries row by row to start with.
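
(As context for the "generates the indices in order" idea above: if each element of 1:N is kept independently with probability p, the gap to the next kept index is geometrically distributed, so a sorted, duplicate-free subset can be produced directly, with no sort at all. A minimal sketch assuming 0 < p < 1; the helper name ordered_subset is hypothetical, though the fix eventually merged below introduced randsubseq along these lines.)

using Random  # for randexp (a sketch in recent Julia, not 0.3-era code)

function ordered_subset(N::Int, p::Float64)
    out = Int[]
    L = 1 / log1p(-p)                          # 1/log(1-p), negative for 0 < p < 1
    i = 0
    while true
        i += floor(Int, -randexp() * L) + 1    # gap to next kept index ~ Geometric(p)
        i > N && break
        push!(out, i)
    end
    return out                                  # already sorted, no duplicates
end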

@StefanKarpinski (Member)

Is that code trying to get ik unique values from 1:N? If so, this seems like it could be done much more efficiently. Note that choosing a uniformly random subset of 1:N is equivalent to choosing a random N-bit unsigned integer, or equivalently, N/64 random Uint64 values. That avoids any need for sorting or determining uniqueness.
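
(A minimal sketch of the bitmask idea, assuming the uniform case where each element is included with probability 1/2; the helper name random_subset_bits is hypothetical. Note that decoding still costs Θ(N), which is the objection raised below.)

function random_subset_bits(N::Int)
    words = rand(UInt64, cld(N, 64))   # the random N-bit integer, 64 bits at a time
    r = N % 64                         # zero the unused high bits of the last word
    r != 0 && (words[end] &= (UInt64(1) << r) - one(UInt64))
    # decode the set bits into sorted element indices
    return [i for i = 1:N if (words[((i-1) >> 6) + 1] >> ((i-1) & 63)) & 1 == 1]
end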

@stevengj (Member) commented May 1, 2014

@StefanKarpinski, I think we should probably avoid computing random elements in 1:N entirely. Instead, at least for low-density filling of large arrays, we should randomly generate the indices to set in each column (since we are using CSC).

@StefanKarpinski (Member)

Yes, I guess when something's very sparse that's likely to be much better. Of course, when the number of non-zeros is substantial, it's still a lot of work no matter how sparse the matrix is.

@stevengj (Member) commented May 1, 2014

I don't think that large matrices with a large fraction of nonzeros are a practical concern. You shouldn't be using sparse formats at all in that case.

@StefanKarpinski (Member)

I meant huge matrices with a small fraction of non-zeros where the number of non-zeros is still large.

@stevengj (Member) commented May 1, 2014

Of course the work will always be Ω(# nonzeros) (right now it is Θ(# log #)). This is mostly about improving the constant factor (although my proposal also reduces the complexity by a factor of log(#/n)/log(#), since sorting within each column costs # log(#/n) in total rather than # log #).

@stevengj (Member) commented May 1, 2014

@ViralBShah, this is all about speeding up sprand; the benchmark doesn't seem to have anything to do with sparse(). Should the issue be renamed, or is there a different benchmark showing performance problems in sparse?

@stevengj (Member) commented May 1, 2014

@StefanKarpinski, your solution of generating a random N-bit integer is not practical here, because it requires Θ(N) work and storage, which could be many orders of magnitude greater than the number of nonzeros. In any case, working by columns should make the log factor in the sort! negligible.

@StefanKarpinski (Member)

No, I see that. I'm trying to figure out where this is slow. Maybe ind2sub can be sped up, which is a good thing whether we manage to avoid it here or not.

@StefanKarpinski (Member)

I suspect it's because our div and rem operations are so horribly pessimized, but I haven't narrowed it down to that yet. Inlining is making precise profiling a bit difficult.

ViralBShah changed the title from "Speed of sparse()" to "Speed of sprand()" on May 1, 2014
@ViralBShah (Member Author)

Originally, I meant this issue to track the performance problems @rwgardner raised in sparse, but I conflated the sprand test and the sparse test. I have renamed this issue to track sprand, and will file a separate one for sparse if required.

@timholy (Member) commented May 1, 2014

The recent history behind sprand is described in #4726.

Yes, ind2sub is a big bottleneck and surely comes down to our div and rem being slow.
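
(For context, the per-index arithmetic in question is just one div/rem pair; a sketch of the computation for a column-major m-by-n matrix, not Base's exact code, so the divide-heavy inner loop is easy to see.)

# One hardware divide per nonzero: this is the operation ind2sub performs
# for each linear index when converting to (row, col).
function ind2sub_sketch(m::Int, ind::Int)
    col, row = divrem(ind - 1, m)
    return row + 1, col + 1
end

ind2sub_sketch(70000, 70001)  # == (1, 2): first entry of the second column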

@StefanKarpinski (Member)

I still think it's crazy for our integer division to be so damned slow. We do the fast, efficient but slightly unsafe thing with integers everywhere else in the language. I don't think div should be an exception. We could provide a checked_div operation like checked_add, but that seems silly since, unlike overflow, division by zero is easy to explicitly check for. There's also the trick where you can implement div by a fixed divisor as a multiply and a shift, with the factors computed in advance; that would be applicable to a vectorized ind2sub, for example.
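
(A minimal sketch of that multiply-and-shift trick, under the assumption of a fixed divisor 2 <= d < 2^32 and 32-bit dividends; the names magic and fastdiv are hypothetical, not an existing API.)

# Precompute ceil(2^64/d) once; then floor(x/d) is one widening multiply
# plus a shift, with no hardware divide in the loop.
magic(d::Integer) = div(typemax(UInt64), UInt64(d)) + one(UInt64)

fastdiv(x::UInt32, m::UInt64) = ((UInt128(m) * x) >> 64) % UInt32

const M70000 = magic(70000)            # pay for the division once, up front
fastdiv(UInt32(123456789), M70000)     # == UInt32(1763) == div(123456789, 70000)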

@JeffBezanson (Member)

There's a big difference between overflow and undefined behavior. Results that overflow are still well-defined, and can sometimes even be exploited to get the answer you want. The undefined answers, however, are random numbers, or possibly a trap if you're lucky. Getting different results for no clear reason is just not ok.

@StefanKarpinski (Member)

It seems like generating each sparse column separately would be a big win here. The expected density of each column is the same as the density of the matrix as a whole, so that's pretty simple. The hard part is patching things up after the fact if the realized fraction isn't close. It may make more sense to allocate non-zeros to the columns up front and then just choose that many unique indices per column and sort those.

@StefanKarpinski (Member)

So we're just not going to have a way to do integer division efficiently?

@JeffBezanson (Member)

We can if LLVM adds intrinsics for x86 division, or something similar. We can also add our own unsafe division intrinsics, and a compiler flag to turn off the checks globally. We could also decide that the typemin/-1 case is not important, and at least remove the check for that.

@timholy (Member) commented May 1, 2014

I really like the idea of generating the columns separately; I should have thought of that myself. The big win is not needing ind2sub, but as a nice bonus it will slightly decrease the sort time too (due to the log N factor).

@timholy (Member) commented May 1, 2014

Certainly for linear indexing of array views, fast integer division would be huge, and that's presumably a good example of a case where the unsafe algorithm should be fine?

@tkelman (Contributor) commented May 2, 2014

Possibly relevant? Proposed addition of LLVM intrinsics for safe division: http://article.gmane.org/gmane.comp.compilers.llvm.devel/72466

@StefanKarpinski (Member)

This implementation is much faster:

using Distributions

function spr(m,n,p)
    # split the expected number of nonzeros among the n columns at random
    M = Multinomial(iround(m*n*p), n)
    c = cumsum([1; rand(M)])            # colptr
    r = rand(1:m, c[end]-1)             # row indices, drawn with replacement
    for i = 1:length(c)-1
        lo, hi = c[i], c[i+1]-1
        sort!(r, lo, hi, QuickSort, Base.Order.Forward)  # sort rows within each column
    end
    v = rand(length(r))
    S = SparseMatrixCSC(m,n,c,r,v)
end

julia> @time spr(70000, 70000, 0.001)
elapsed time: 0.254208626 seconds (89577536 bytes allocated)
70000x70000 sparse matrix with 4900000 Float64 entries:
    [312  ,     1]  =  0.759103
    [1498 ,     1]  =  0.197246
    [5473 ,     1]  =  0.192934
    [7402 ,     1]  =  0.742496
    [9401 ,     1]  =  0.538901
    [9555 ,     1]  =  0.150737
    [9859 ,     1]  =  0.880329
    ⋮
    [60202, 70000]  =  0.290513
    [61760, 70000]  =  0.828304
    [62626, 70000]  =  0.123417
    [64031, 70000]  =  0.272032
    [64734, 70000]  =  0.451746
    [65528, 70000]  =  0.205945
    [66508, 70000]  =  0.0673513
    [66696, 70000]  =  0.938025

As compared to this:

julia> @time sprand(70000, 70000, 0.001);
elapsed time: 1.817448442 seconds (548796520 bytes allocated)

Of course, it's not quite correct since it allows duplicate row indices in the same column. When I insert some code to try to handle that, it slows back down to the same as what we have now:

function spr(m,n,p)
    M = Multinomial(iround(m*n*p), n)
    c = cumsum([1; rand(M)])
    r = rand(1:m, c[end]-1)
    for i = 1:length(c)-1
        lo, hi = c[i], c[i+1]-1
        sort!(r, lo, hi, QuickSort, Base.Order.Forward)
        # rejection loop: resample any duplicated row index in this column,
        # then re-sort and re-scan until the column is duplicate-free
        while true
            dups = false
            for j = lo:hi-1
                r[j] == r[j+1] || continue
                r[j] = rand(1:m)        # replace one of the duplicates
                dups = true
            end
            dups || break
            sort!(r, lo, hi, InsertionSort, Base.Order.Forward)  # nearly sorted
        end
    end
    v = rand(length(r))
    S = SparseMatrixCSC(m,n,c,r,v)
end

julia> @time spr(70000, 70000, 0.001)
elapsed time: 1.94048937 seconds (652454080 bytes allocated)
70000x70000 sparse matrix with 4900000 Float64 entries:
    [200  ,     1]  =  0.870909
    [316  ,     1]  =  0.675384
    [2708 ,     1]  =  0.561914
    [3678 ,     1]  =  0.919305
    [3717 ,     1]  =  0.777615
    [4848 ,     1]  =  0.628789
    [6701 ,     1]  =  0.00253897
    ⋮

If we can figure out a faster way to handle duplicates, this is a better algorithm by far. Of course, we'd need to copy the multinomial sampling code out of Distributions. Another nice feature of this implementation is that it tends to give exactly the requested sparsity.
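
(One possible way to get a duplicate-free column without the rejection loop, offered as a sketch rather than something from this thread, is Robert Floyd's sampling algorithm: it draws exactly k distinct values from 1:m in O(k) expected work, after which only the per-column sort remains.)

# Floyd's algorithm: exactly k distinct values from 1:m, no retry loop.
function sample_distinct(m::Int, k::Int)
    seen = Set{Int}()
    out = Int[]
    for j = m-k+1:m
        t = rand(1:j)
        if t in seen
            push!(seen, j); push!(out, j)   # t taken already; j is always fresh
        else
            push!(seen, t); push!(out, t)
        end
    end
    sort!(out)   # CSC needs sorted row indices within each column
    return out
end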

@timholy (Member) commented May 2, 2014

Won't this implementation slow down exponentially as the density increases (dups will start being true most of the time)? But I really like the fact that it hits the target density exactly (up to integer rounding).

@stevengj (Member) commented May 2, 2014

This implementation, which fills in the matrix by columns as I suggested, is about four times faster than the original on my machine:

# Return a random sorted subset of 1:N, where each element is included with probability p.
function randomsubset!(p,N,ind,uind)
    if p == 1
        resize!(uind, N)
        uind[:] = 1:N
        return uind
    end
    resize!(uind, 0)
    p == 0 && return uind
    if N < 100 # O(N) algorithm for small N
        for i = 1:N
            rand() < p && push!(uind, i)
        end
    else
        K = p <= 0.5 ? N*p : N*(1-p)
        # Use Newton's method to invert the birthday problem
        L = log1p(-1/N)
        KN = K/N
        k = K
        k = k + (expm1(-k*L) - exp(-k*L)*KN)/L
        k = k + (expm1(-k*L) - exp(-k*L)*KN)/L # for K<N/2, 2 iterations suffice
        ik = ifloor(k)
        if rand() < k - ik
            ik += 1
        end
        if ik == 0
            if p <= 0.5
                return uind
            else
                resize!(uind, N)
                uind[:] = 1:N
                return uind
            end
        end
        if (ik > length(ind))
            resize!(ind, ik)
        end
        for i = 1:ik
            ind[i] = rand(1:N)
        end
        sort!(ind,1,ik,QuickSort, Base.Order.Forward)
        if p <= 0.5
            j = ind[1]
            push!(uind, j)
            uj = j
            for i = 2:ik
                j = ind[i]
                if j != uj
                    push!(uind, j)
                    uj = j
                end
            end
        else
            push!(ind, N+1) # sentinel
            ii = 1
            for i = 1:N
                if i != ind[ii]
                    push!(uind, i)
                else
                    while (i == ind[ii])
                        ii += 1
                    end
                end
            end
        end
    end
    uind
end
function randomsubset(p, N)
    uind = Array(Int, int(N*p))
    sizehint(uind, int(N*p))
    randomsubset!(p,N,Int[],uind)
end
import Base.SparseMatrix.sparse_IJ_sorted!
function sprand2{T}(m::Integer, n::Integer, density::FloatingPoint, rng::Function,::Type{T}=eltype(rng(1)))
    0 <= density <= 1 || throw(ArgumentError("density must be between 0 and 1"))
    N = n*m
    N == 0 && return spzeros(T,m,n)
    N == 1 && return rand() <= density ? sparse(rng(1)) : spzeros(T,1,1)

    I, J = Array(Int, 0), Array(Int, 0) # indices of nonzero elements
    sizehint(I, int(N*density))
    sizehint(J, int(N*density))

    # coldensity = density of nonzero columns
    if density*m > 1
        coldensity = 1 - (1-density)^m 
    else # use Taylor series (= binomial expansion) to avoid cancellation errors
        x = -(coldensity = density * m)
        for k = 2:m
            x *= (k-m-1)*density/k
            oldcoldensity = coldensity
            coldensity -= x
            if oldcoldensity == coldensity
                break
            end
        end
    end

    rows = Array(Int, 0)
    sizehint(rows, int(m*density))
    ind = Int[]
    for j in randomsubset(coldensity, n)
        randomsubset!(density, m, ind, rows)
        append!(I, rows)
        nrows = length(rows)
        Jlen = length(J)
        resize!(J, Jlen+nrows)
        for i = Jlen+1:length(J)
            J[i] = j
        end
    end
    return sparse_IJ_sorted!(I, J, rng(length(I)), m, n, +)  # it will never need to combine
end
sprand2(m::Integer, n::Integer, density::FloatingPoint) = sprand2(m,n,density,rand,Float64)

@StefanKarpinski (Member)

@timholy – yes, and the duplicate elimination is way too slow anyway, so it's not really a serious proposal, just a starting point. The multinomial sampling approach for choosing column counts is quite good though. I had some thoughts about how to deduplicate more efficiently, although perhaps @stevengj's algorithm is better than that.

@timholy (Member) commented May 2, 2014

A 4x speedup is great. And I like the idea of splitting that function out on its own, as it's surely useful in other contexts.

@stevengj (Member) commented May 2, 2014

Note, however, that my solution in its current form is probably not what we want: its distribution of nonzero entries is not equivalent to the original uniform sampling.

This is easily fixed by randomizing the number of entries per column properly.
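
(A sketch of what that fix would mean, as an assumption rather than the merged code: under uniform i.i.d. sampling of entries, each column's nonzero count is an independent Binomial(m, p) draw, whereas the multinomial version above fixes the total.)

using Distributions

m, n, p = 70000, 70000, 0.001
colcounts = rand(Binomial(m, p), n)   # independent per-column counts, matching
                                      # i.i.d. inclusion of each entry with prob. p
colptr = cumsum([1; colcounts])       # colptr for the CSC construction above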

stevengj added commits to stevengj/julia referencing this issue on May 3 and May 4, 2014
stevengj added a commit that referenced this issue on May 6, 2014: "faster sprand, new randsubseq (fixes #6714 and #6708)"
stevengj closed this as completed on May 6, 2014
@stevengj (Member) commented May 6, 2014

(Fix merged.)
