Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse A+B is twice as slow with Int32 indices (on nightly build 0.4+pre7330) #12981

Closed
bdqp opened this issue Sep 6, 2015 · 7 comments
Closed
Labels
performance Must go faster sparse Sparse arrays

Comments

@bdqp
Copy link

bdqp commented Sep 6, 2015

On version 3.11, adding two sparse matrices A+B is equally fast with Int64 or Int32 indices. On the nightly build 0.4.0-pre+7330, the Int32 indices are 2X slower. Code and results below:

N=1000000; NNZ=20*N;
r = rand(int32(1:N),NNZ); # int32 or 64
c = rand(int32(1:N),NNZ); # int32 or 64
v = rand(Float32,NNZ);
A = sparse(r,c,v,N,N);

r = rand(int32(1:N),NNZ); # int32 or 64
c = rand(int32(1:N),NNZ); # int32 or 64
v = rand(Float32,NNZ);
B = sparse(r,c,v,N,N);

tic(); AB = A+B; toc()


# int32, ver 0.3.11
julia> tic(); AB = A+B; toc()
elapsed time: 0.766131035 seconds

# int64, ver 0.3.11
julia> tic(); AB = A+B; toc()
elapsed time: 0.782339827 seconds

# int32, ver 0.4.0-pre+7330
julia> tic(); AB = A+B; toc()
elapsed time: 2.187780335 seconds

# int64, ver 0.4.0-pre+7330
julia> tic(); AB = A+B; toc()
elapsed time: 0.98478374 seconds

[jiahao - code block formatting]

@KristofferC
Copy link
Member

Probably similar problem as the last one. I'll take a look.

@KristofferC
Copy link
Member

Should be fixed in #12984. Keep 'em coming. Good that we have someone checking performance with Int32s on a 64-bit system.

@jiahao jiahao added performance Must go faster sparse Sparse arrays labels Sep 6, 2015
@bdqp
Copy link
Author

bdqp commented Sep 6, 2015

Thanks for fixing these, really nice work.

There's one other thing that's a little "off" - unfortunately this isn't going to be a good bug report... It seems the sparse() function is about twice as slow as its equivalent in Octave. Something about the factor of 2 gives me deja vu all over again ;) Matlab is way slower, not competitive.

Looking at several other useful operations, it all looks pretty competitive in Julia - perhaps transpose is missing a trick somewhere? But it's not a particularly crucial function.

function prog()

N = 1000000;
NNZ = 20*N;
NRUNS = 5;

t = zeros(6,NRUNS);
for j = 1:NRUNS

r = rand(int64(1:N),NNZ);
c = rand(int64(1:N),NNZ);
v = rand(Float64,NNZ);
x = rand(Float64,N);

tic(); A = sparse(r,c,v,N,N); t[1,j] = toc();
tic(); B = 2*A;               t[2,j] = toc();
tic(); A = A';                t[3,j] = toc();
tic(); B = A+B;               t[4,j] = toc();
tic(); y = A*x;               t[5,j] = toc();
tic(); y = A'*x;              t[6,j] = toc();

end

# ignore 1st run
av = mean(t[:,2:end],2);
sd = std(t[:,2:end],2);

versioninfo()
@printf("\nN=%i NNZ=%i NRUNS=%i\n",N,NNZ,NRUNS)
@printf(" sparse: %f (%f)\n",av[1],sd[1]);
@printf(" 2*A   : %f (%f)\n",av[2],sd[2]);
@printf(" A'    : %f (%f)\n",av[3],sd[3]);
@printf(" A+B   : %f (%f)\n",av[4],sd[4]);
@printf(" A*x   : %f (%f)\n",av[5],sd[5]);
@printf(" A'*x  : %f (%f)\n",av[6],sd[6]);

end

Julia results:

Julia Version 0.4.0-pre+7330
Commit a63e79f (2015-09-06 09:38 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Celeron(R) CPU 1007U @ 1.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3

N=1000000 NNZ=20000000 NRUNS=5
 sparse: 4.679580 (0.093828)
 2*A   : 0.176557 (0.019690)
 A'    : 2.296293 (0.103127)
 A+B   : 0.657813 (0.017694)
 A*x   : 0.425249 (0.002387)
 A'*x  : 0.295716 (0.001954)
 58.556010 seconds (4.90 M allocations: 11.843 GB, 2.47% gc time)

Octave results:

GNU Octave Version: 4.0.0
GNU Octave License: GNU General Public License
Operating System: Linux 3.16.0-48-generic 64~14.04.1-Ubuntu SMP Thu Aug 20 23:03:57 UTC 2015 x86_64
no packages installed.

N=1000000 NNZ=20000000 NRUNS=5
 sparse: 2.709228 (0.011811)
 2*A   : 0.222145 (0.000448)
 A'    : 1.776688 (0.044113)
 A+B   : 1.081774 (0.007834)
 A*x   : 0.652453 (0.002009)
 A'*x  : 0.292271 (0.000704)

Matlab results:

MATLAB Version: 8.1.0.604 (R2013a)
MATLAB License Number: 00000000
Operating System: Linux 3.16.0-48-generic 64~14.04.1-Ubuntu SMP Thu Aug 20 23:03:57 UTC 2015 x86_64
Java Version: Java is not enabled

N=1000000 NNZ=20000000 NRUNS=5
 sparse: 8.625574 (0.008490)
 2*A   : 0.169647 (0.002336)
 A'    : 2.126985 (0.025129)
 A+B   : 0.543265 (0.001062)
 A*x   : 0.426534 (0.002292)
 A'*x  : 0.290045 (0.001137)
function prog()

N = 1000000;
NNZ = 20*N;
NRUNS = 5;

for j = 1:NRUNS

r = randi(N,NNZ,1);
c = randi(N,NNZ,1);
v = randn(NNZ,1);
x = randn(N,1);

tic; A = sparse(r,c,v,N,N); t(1,j) = toc;
tic; B = 2*A;               t(2,j) = toc;
tic; A = A';                t(3,j) = toc;
tic; B = A+B;               t(4,j) = toc;
tic; y = A*x;               t(5,j) = toc;
tic; y = A'*x;              t(6,j) = toc;

end

% ignore 1st run
av = mean(t(:,2:end),2);
sd = std(t(:,2:end),0,2);

ver
fprintf('\nN=%i NNZ=%i NRUNS=%i\n',N,NNZ,NRUNS);
fprintf(' sparse: %f (%f)\n', av(1),sd(1))
fprintf(' 2*A   : %f (%f)\n', av(2),sd(2))
fprintf(' A''    : %f (%f)\n',av(3),sd(3))
fprintf(' A+B   : %f (%f)\n', av(4),sd(4))
fprintf(' A*x   : %f (%f)\n', av(5),sd(5))
fprintf(' A''*x  : %f (%f)\n',av(6),sd(6))

end

@KristofferC
Copy link
Member

Regarding sparse, I guess we are slower than Octave because we use code to create a CSR matrix but use CSC so we have to transpose in the end.

@KristofferC
Copy link
Member

I inserted a gc() call between the function calls so that spillover allocations from a previous function would not contaminate the result. This is what I get (with Matlab in a single thread). My Matlab is much faster at sparse than yours, maybe they fixed that? I have Matlab 8.4.

Would be nice to make sparse faster. And as you say transpose is faster in Octave. Not sure what we can improve there though...

Func Julia Octave Matlab
sparse: 2.478101 (0.161879) 1.233007 (0.098839) 2.678221 (0.104867)
2*A : 0.052062 (0.000805) 0.120093 (0.042131) 0.070358 (0.001547)
A' : 1.131908 (0.010331) 0.810197 (0.075346) 1.159815 (0.246967)
A+B : 0.246413 (0.001277) 0.487021 (0.088988) 0.202465 (0.000276)
A*x : 0.169672 (0.003784) 0.340456 (0.009046) 0.162381 (0.016590)
A'*x : 0.146576 (0.003835) 1.191680 (0.099767) 0.135291 (0.003267)

@ViralBShah
Copy link
Member

It is preferable to have this discussion in a separate issue. How to read this table? What is the number outside the parentheses and inside?

@KristofferC
Copy link
Member

Average and standard deviation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

4 participants