Slow sparse() construction for matrices with lots of rows #13400

Closed
ViralBShah opened this issue Oct 1, 2015 · 24 comments
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks sparse Sparse arrays

Comments

@ViralBShah
Member

This is slow mainly because a CSR data structure is created that allocates a vector of size 100 million, before creating the CSC.

julia> @time sparse([1;100000000], [1;1], [1;1])
  2.237388 seconds (20 allocations: 2.235 GB, 8.30% gc time)
100000000x1 sparse matrix with 2 Int64 entries:
    [1        ,         1]  =  1
    [100000000,         1]  =  1
@ScottPJones
Contributor

I am currently testing a new sparse implementation which, at least on my machine, is >50x faster for this case, using the techniques I described in my comments in #12998 (which unfortunately have now been deleted). I've also written some improved code to benchmark various sparse operations, based on the code from @mb1234 (#12981 (comment)).
The benchmark routines display elapsed time (std), gc time (std), and memory allocation for each test, and run gc() before each run.
I will put these in my GitHub repository if you are interested.
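A minimal sketch of that kind of harness (hypothetical names; the actual benchmark code may differ) might look like this:

```julia
using Statistics  # mean, std (on Julia 0.x these lived in Base)

# Hypothetical harness: time f() over nruns runs, forcing a garbage
# collection before each run so earlier allocations don't pollute the
# measurement, and report mean and std of the elapsed times.
function benchrun(f, nruns::Integer)
    times = Float64[]
    for _ in 1:nruns
        GC.gc()                              # gc() on Julia 0.x
        t0 = time_ns()
        f()
        push!(times, (time_ns() - t0) / 1e9) # seconds
    end
    return mean(times), std(times)
end
```

Usage would be along the lines of `benchrun(() -> sparse([1; 100], [1; 1], [1; 1]), 5)`.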

@ViralBShah
Member Author

Why don't you submit a PR, if this fits with the existing code and data structures?

@ScottPJones
Contributor

I will, as soon as I have it in a good state.
Right now, I'm fighting some type-stability issues, and also a problem that I brought up on julia-dev, about how the stability of the sorting algorithm affects the results of the combine function.
https://groups.google.com/forum/#!topic/julia-dev/X7gUdQmWzJg
I'd be grateful for any thoughts on that matter.

@ScottPJones
Contributor

I have another question. Currently, I build a work vector with the indices of all the non-zero entries in V, which I then sort (using either a quicksort or a mergesort); however, the performance is not what I'd hoped for (for the sort, I compare the J values first and, if they are equal, then the I values).
I was wondering whether creating a vector of an immutable type (holding the row, column, and index into V), and comparing the row/col as one unsigned value, would make the sorting much faster in Julia (that's the next thing I'm about to try out).
Finally, I noticed that sparse normally creates sparse arrays whose colptr and rowval vectors have type Int, i.e. Int64 on most machines these days. That seems excessive, and can hurt performance with very large arrays. If both nrow and ncol are < 32768, or < 2^31, would it be OK to create the sparse matrix with Ti = Int16 or Ti = Int32, instead of Int64?
(At the very least, I'm planning to make the work vector only as big as necessary, since how much fits in cache can greatly affect the sorting speed.)
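A rough sketch of the single-word-comparison idea (hypothetical names, not the actual implementation; assumes both indices fit in 32 bits, and uses the later `struct` keyword rather than 0.4's `immutable`):

```julia
# Pack (col, row) into a single UInt64 key so the sort compares one machine
# word per element instead of two separate fields. Column goes in the high
# bits, so entries sort by column first, then row (CSC order).
struct Entry
    key::UInt64   # (col << 32) | row
    idx::Int      # index into the V vector
end

makekey(row::Integer, col::Integer) = (UInt64(col) << 32) | UInt64(row)

Base.isless(a::Entry, b::Entry) = a.key < b.key

function sortentries(I::AbstractVector{<:Integer}, J::AbstractVector{<:Integer})
    entries = [Entry(makekey(I[k], J[k]), k) for k in eachindex(I)]
    sort!(entries)   # single UInt64 comparison per element pair
    return entries
end
```

Recovering the coordinates afterwards is `key & 0xffffffff` for the row and `key >> 32` for the column.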

Again, thanks in advance for any suggestions.

@ViralBShah
Member Author

I am aware that the performance with sort is not what one hopes for, which is why we switched to the current implementation. There are probably special cases that could have a different implementation.

I am open to Int32 by default, but not in this issue or PR. We can open a separate issue.

@tkelman
Contributor

tkelman commented Oct 5, 2015

Making integer sizes depend on values of the input indices is not type stable, that's a nonstarter. It should be using the integer type of the inputs to determine the integer type of the output, promoting to the largest common integer size to avoid needing multiple separate integer type parameters.
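A sketch of that type-stable selection, using a hypothetical helper (not Base API):

```julia
# Derive the output index type from the input element types via promotion,
# so the result type depends only on the argument types, never on runtime
# values (which would break type stability).
indextype(::AbstractVector{Ti}, ::AbstractVector{Tj}) where {Ti<:Integer, Tj<:Integer} =
    promote_type(Ti, Tj)
```

For example, mixed Int32 and Int8 index vectors would yield Int32 indices in the output.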

@ScottPJones
Contributor

I raised the question in the Google group of whether sparse matrices need to be parameterized by Ti at all.
If that isn't exposed, then I think it would still be type stable.

@tkelman
Contributor

tkelman commented Oct 5, 2015

Parameterizing them by Ti is exactly how you achieve what you're asking for of using different integer types for the indices.

@ScottPJones
Contributor

No, not really. For example, if you have 1000000 rows and 100 columns, you could pass input vectors Vector{Int32} for I and Vector{Int8} for J. As it currently stands, both are forced to Int32.

@tkelman
Contributor

tkelman commented Oct 5, 2015

SparseMatrixCSC could hypothetically be parameterized by different integer types for the rows vs the columns, but that doesn't strike me as a necessary complication. You can always make your own version of the data structure that does whatever you want it to.
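For illustration, such a doubly-parameterized variant might look like this (hypothetical type, purely to show the complication being discussed; it is not the Base SparseMatrixCSC, which uses a single index type Ti for both):

```julia
# CSC layout with separate integer types for row indices and column pointers.
struct MySparseCSC{Tv, Tr<:Integer, Tc<:Integer}
    m::Int                # number of rows
    n::Int                # number of columns
    colptr::Vector{Tc}    # column j's entries are nzval[colptr[j]:colptr[j+1]-1]
    rowval::Vector{Tr}    # row index of each stored entry
    nzval::Vector{Tv}     # stored values
end
```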

@ScottPJones
Contributor

Also, it doesn't seem at all consistent with dense arrays, which aren't parameterized by the types that can be used to index their rows and columns.

abstract AbstractSparseArray{Tv,Ti,N} <: AbstractArray{Tv,N}

I'm not sure whether that parameterization really improves performance or just causes additional code to be generated, but I'll benchmark it.

@ScottPJones
Contributor

Here are my current results.
I test the current Base implementation; Kristoffer's sparse2 (even though that may not give correct results, because it doesn't handle 0s in the V array or the combine function for duplicates); and my own first pass at an implementation, using mergesort, but without optimizing the work vector.
One thing that showed up was that, using @mb1234's test case, A'*x was incredibly slow for some reason, allocating almost 16GB.
I had originally thought my sparse implementation was only around 56x faster than the implementation in Base for the example in this issue, but it's actually >266,000x faster.

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+619 (2015-10-04 22:17 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit b22e43a* (0 days old master)
|__/                   |  x86_64-apple-darwin15.0.0

julia> include("/j/sparsetst.jl")
benchsparse (generic function with 1 method)

julia> benchsparse(5)
Julia Version 0.5.0-dev+619
Commit b22e43a* (2015-10-04 22:17 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.0.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

N=1000000 NNZ=20000000 NRUNS=5
 sparse  : 4.665256 (0.277773) GC:0.093661 (0.015714) 679997632
 sparse2 : 15.050340 (0.406251) GC:0.886196 (0.059922) 1460688304
 sparsem : 6.933363 (0.336271) GC:0.038714 (0.003770) 568000384
 B = 2*A : 0.149884 (0.005311) GC:0.000497 (0.000024) 327997184
 A = A'  : 1.953446 (0.081938) GC:0.000204 (0.000005) 335997264
 B = A+B : 0.478192 (0.022557) GC:0.000209 (0.000036) 647994560
 y = A*x : 0.142446 (0.007932) GC:0.000000 (0.000000) 8000096
 y = A'*x: 95.633243 (4.670524) GC:0.290114 (0.049752) 15951452944

Kristoffer's test, runs=5
 sparse  : 0.554941 (0.040949) GC:0.012032 (0.012062) 195996960
 sparse2 : 2.663222 (0.479749) GC:0.098314 (0.009676) 146073312
 sparsem : 2.985522 (0.090272) GC:0.039565 (0.005959) 280800384
 B = 2*A : 0.008550 (0.000183) GC:0.000246 (0.000051) 32796512
 A = A'  : 0.057171 (0.007230) GC:0.000138 (0.000017) 33596592
 B = A+B : 0.037831 (0.004919) GC:0.000161 (0.000009) 64793248
 y = A*x : 0.012243 (0.003215) GC:0.000000 (0.000000) 800096
 y = A'*x: 5.762868 (0.106267) GC:0.042064 (0.004543) 251133312

sparse([1;100000000], [1;1], [1;1]), runs=5
 sparse  : 1.572819 (0.064329) GC:0.006269 (0.001235) 2400000768
 sparse2 : 0.000006 (0.000001) GC:0.000000 (0.000000) 704
 sparsem : 0.000002 (0.000000) GC:0.000000 (0.000000) 448
 B = 2*A : 0.000002 (0.000000) GC:0.000000 (0.000000) 288
 A = A'  : 0.721693 (0.006778) GC:0.000102 (0.000001) 1600000368

@ViralBShah
Member Author

We are not going to change the parametrization of row and column indices for SparseMatrixCSC. Despite my saying that this is not the discussion forum for that matter, I see that is exactly what happened here. Let's keep this discussion to the issue at hand and avoid tangents. In order to keep my sanity, I will delete tangential comments.

We can review all this in the PR once you have it ready; hopefully it does not degrade performance in other cases. If it uniformly improves performance everywhere, I for one will not be complaining.

@ScottPJones
Contributor

Are there already any suites of performance tests for sparse arrays?
I know @mbauman is doing a lot for sparse vectors. Currently I only have 3 examples to test with, and I want to make sure I cover different levels of sparsity, duplicate entries, zero values in the V input vector, already-sorted, reverse-sorted, and random input orderings, as well as the sizes and types of the input arrays. Have I missed any important variable?
My latest tests are now showing 15x faster on @mb1234's example, 10x on Kristoffer's, and >260,000x on the example at the top of this issue.

@mlubin
Member

mlubin commented Oct 5, 2015

Look at the sparse matrix collection: https://www.cise.ufl.edu/research/sparse/matrices/ (https://github.com/JuliaSparse/MatrixMarket.jl)
It would be worth dumping them to coordinate form and then calling sparse().

@ViralBShah
Member Author

There are some performance tests in test/perf/sparse, IIRC. Let's please not keep repeating how many times faster your code is and on which dimensions; let's just have the code.

@ScottPJones
Contributor

@mlubin Thanks for the pointers. It looks like I'll need to adapt the MatrixMarket.jl code, because I don't want the arrays already constructed; I want to read the sparse-array data in coordinate format (basically, do everything MatrixMarket.jl does except the call to sparse, and return I, J, V, nrows, and ncols, along with the information about the sparse matrix).

@mlubin
Member

mlubin commented Oct 8, 2015

@ScottPJones, a pretty decent test would be to take the matrices in CSC form, transpose them, run findnz, and then reconstruct
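That proposed test can be sketched as follows (hypothetical function name; the round trip through coordinate form should be lossless):

```julia
using SparseArrays

# Transpose a CSC matrix (so the coordinates come out ordered by the
# original rows rather than columns), extract coordinate form with findnz,
# and rebuild with sparse(). The reconstruction should equal the transpose.
function roundtrip(A::SparseMatrixCSC)
    At = copy(transpose(A))       # materialize the transpose in CSC form
    I, J, V = findnz(At)          # coordinate form, ordered by At's columns
    B = sparse(I, J, V, size(At, 1), size(At, 2))
    return B == At
end
```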

@ScottPJones
Contributor

Wouldn't they be in sorted order, though, if I did that?
(I do want to test three orderings: sorted, reverse-sorted, and random.)
The change to just get the coordinate form from MatrixMarket.jl is pretty simple; I'll submit a PR shortly.

@mlubin
Member

mlubin commented Oct 8, 2015

Sorted by rows, not columns, though.

@ScottPJones
Contributor

But within the rows, aren't the columns also sorted?
I have a feeling from what I've read that these matrices will not test two important things: "zero" entries in the V vector, and duplicates that need to be combined.
I think I'm going to have to generate those myself somehow.
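One way to generate such input (hypothetical helper; the fractions are arbitrary) would be:

```julia
using SparseArrays

# Generate coordinate input that exercises two cases the collection matrices
# won't: explicit zeros in V, and duplicate (i, j) entries that sparse()
# must combine (summed by default).
function makecoords(m::Int, n::Int, nz::Int; dupfrac = 0.1, zerofrac = 0.1)
    I = rand(1:m, nz)
    J = rand(1:n, nz)
    V = rand(nz)
    for _ in 1:round(Int, dupfrac * nz)         # force some duplicate coordinates
        src, dst = rand(1:nz), rand(1:nz)
        I[dst], J[dst] = I[src], J[src]
    end
    V[rand(1:nz, round(Int, zerofrac * nz))] .= 0.0   # explicit zeros in V
    return I, J, V
end
```

Shuffling, sorting, or reversing the returned vectors (keeping I, J, V aligned) would then cover the different input orderings.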

@ViralBShah
Member Author

Now that we have real sparse vectors, this is no longer an issue. It is also twice as fast as before. We should find more realistic test cases to speed up performance of such cases.

Cc @Sacha0

@tkelman tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Jul 17, 2017
@tkelman
Contributor

tkelman commented Jul 17, 2017

Is it being tracked in BaseBenchmarks? If not, it should be added there.

@Sacha0
Member

Sacha0 commented Jul 17, 2017

We should find more realistic test cases to speed up performance of such cases.

Cases like @augustinyiptong's from #20447 should hit this issue as well (when built from a coordinate representation, that is)? Best!
