Slow sparse() construction for matrices with lots of rows #13400

Closed
ViralBShah opened this issue Oct 1, 2015 · 24 comments
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks sparse Sparse arrays

Comments

@ViralBShah
Member

This is slow mainly because a CSR data structure is created that allocates a vector of size 100 million, before creating the CSC.

julia> @time sparse([1;100000000], [1;1], [1;1])
  2.237388 seconds (20 allocations: 2.235 GB, 8.30% gc time)
100000000x1 sparse matrix with 2 Int64 entries:
    [1        ,         1]  =  1
    [100000000,         1]  =  1
@ScottPJones
Contributor

I am currently testing a new sparse implementation which, at least on my machine, is >50x faster for this case, using the techniques I described in my comments in #12998 (which unfortunately have now been deleted). I've also written some improved code to benchmark various sparse operations, based on the code from @mb1234 (#12981 (comment)).
The benchmark routines display elapsed time (std), gc time (std), and memory allocation for each test, and run gc() before each run.
I will put these in my GitHub repository if you are interested.
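A minimal sketch of that kind of harness (hypothetical names; the actual benchmark code may differ) might look like this:

```julia
using Statistics  # mean, std (on Julia 0.x these lived in Base)

# Hypothetical harness: time f() over nruns runs, forcing a garbage
# collection before each run so earlier allocations don't pollute the
# measurement, and report mean and std of the elapsed times.
function benchrun(f, nruns::Integer)
    times = Float64[]
    for _ in 1:nruns
        GC.gc()                              # gc() on Julia 0.x
        t0 = time_ns()
        f()
        push!(times, (time_ns() - t0) / 1e9) # seconds
    end
    return mean(times), std(times)
end
```

Usage would be along the lines of `benchrun(() -> sparse([1; 100], [1; 1], [1; 1]), 5)`.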

@ViralBShah
Member Author

Why don't you submit a PR, if this fits with the existing code and data structures?

@ScottPJones
Contributor

I will, as soon as I have it in a good state.
Right now, I'm fighting some type-stability issues, and also a problem that I brought up on julia-dev, about how the stability of the sorting algorithm affects the results of the combine function.
https://groups.google.com/forum/#!topic/julia-dev/X7gUdQmWzJg
I'd be grateful for any thoughts on that matter.

@ScottPJones
Contributor

I have another question. Currently, I build a work vector with the indices of all the non-zero entries in V, which I then sort (using either a quicksort or a mergesort); however, the performance is not what I'd hoped for (for the sort, I compare the J values first and, if they are equal, then the I values).
I was wondering whether creating a vector of an immutable type (holding the row, column, and index into V), and comparing the row/col as one unsigned value, would make the sorting much faster in Julia (that's the next thing I'm about to try out).
Finally, I noticed that sparse normally creates sparse arrays whose colptr and rowval vectors have type Int, i.e. Int64 on most machines these days. That seems excessive, and can hurt performance with very large arrays. If both nrow and ncol are < 32768, or < 2^31, would it be OK to create the sparse matrix with Ti = Int16 or Ti = Int32, instead of Int64?
(At the very least, I'm planning to make the work vector only as big as necessary, since how much fits in cache can greatly affect the sorting speed.)
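A rough sketch of the single-word-comparison idea (hypothetical names, not the actual implementation; assumes both indices fit in 32 bits, and uses the later `struct` keyword rather than 0.4's `immutable`):

```julia
# Pack (col, row) into a single UInt64 key so the sort compares one machine
# word per element instead of two separate fields. Column goes in the high
# bits, so entries sort by column first, then row (CSC order).
struct Entry
    key::UInt64   # (col << 32) | row
    idx::Int      # index into the V vector
end

makekey(row::Integer, col::Integer) = (UInt64(col) << 32) | UInt64(row)

Base.isless(a::Entry, b::Entry) = a.key < b.key

function sortentries(I::AbstractVector{<:Integer}, J::AbstractVector{<:Integer})
    entries = [Entry(makekey(I[k], J[k]), k) for k in eachindex(I)]
    sort!(entries)   # single UInt64 comparison per element pair
    return entries
end
```

Recovering the coordinates afterwards is `key & 0xffffffff` for the row and `key >> 32` for the column.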

Again, thanks in advance for any suggestions.

@ViralBShah
Member Author

I am aware that the performance with sort is not what one hopes for, which is why we switched to the current implementation. There are probably special cases that could have a different implementation.

I am open to Int32 by default, but not in this issue or PR. We can open a separate issue.

@tkelman
Contributor

tkelman commented Oct 5, 2015

Making integer sizes depend on values of the input indices is not type stable, that's a nonstarter. It should be using the integer type of the inputs to determine the integer type of the output, promoting to the largest common integer size to avoid needing multiple separate integer type parameters.
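A sketch of that type-stable selection, using a hypothetical helper (not Base API):

```julia
# Derive the output index type from the input element types via promotion,
# so the result type depends only on the argument types, never on runtime
# values (which would break type stability).
indextype(::AbstractVector{Ti}, ::AbstractVector{Tj}) where {Ti<:Integer, Tj<:Integer} =
    promote_type(Ti, Tj)
```

For example, mixed Int32 and Int8 index vectors would yield Int32 indices in the output.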

@ScottPJones
Contributor

I raised the question in the Google group of whether sparse matrices need to be parameterized by Ti at all.
If that isn't exposed, then I think it would still be type stable.

@tkelman
Contributor

tkelman commented Oct 5, 2015

Parameterizing them by Ti is exactly how you achieve what you're asking for of using different integer types for the indices.

@ScottPJones
Contributor

No, not really. For example, if you have 1000000 rows and 100 columns, you could pass input vectors Vector{Int32} for I and Vector{Int8} for J. As it currently stands, both are forced to Int32.

@tkelman
Contributor

tkelman commented Oct 5, 2015

SparseMatrixCSC could hypothetically be parameterized by different integer types for the rows vs the columns, but that doesn't strike me as a necessary complication. You can always make your own version of the data structure that does whatever you want it to.
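For illustration, such a doubly-parameterized variant might look like this (hypothetical type, purely to show the complication being discussed; it is not the Base SparseMatrixCSC, which uses a single index type Ti for both):

```julia
# CSC layout with separate integer types for row indices and column pointers.
struct MySparseCSC{Tv, Tr<:Integer, Tc<:Integer}
    m::Int                # number of rows
    n::Int                # number of columns
    colptr::Vector{Tc}    # column j's entries are nzval[colptr[j]:colptr[j+1]-1]
    rowval::Vector{Tr}    # row index of each stored entry
    nzval::Vector{Tv}     # stored values
end
```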

@ScottPJones
Contributor

Also, it doesn't seem at all consistent with dense arrays, which aren't parameterized by the types that can be used to index their rows and columns.

abstract AbstractSparseArray{Tv,Ti,N} <: AbstractArray{Tv,N}

I'm not sure whether that parameterization really improves performance or just causes additional code to be generated, but I'll benchmark it.

@ScottPJones
Contributor

Here are my current results.
I test the current Base implementation; Kristoffer's sparse2 (even though that may not give correct results, because it doesn't handle 0s in the V array or the combine function for duplicates); and my own first pass at an implementation, using mergesort, but without optimizing the work vector.
One thing that showed up was that, using @mb1234's test case, A'*x was incredibly slow for some reason, allocating almost 16GB.
I had originally thought my sparse implementation was only around 56x faster than the implementation in Base for the example in this issue, but it's actually >266,000x faster.

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+619 (2015-10-04 22:17 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit b22e43a* (0 days old master)
|__/                   |  x86_64-apple-darwin15.0.0

julia> include("/j/sparsetst.jl")
benchsparse (generic function with 1 method)

julia> benchsparse(5)
Julia Version 0.5.0-dev+619
Commit b22e43a* (2015-10-04 22:17 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.0.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

N=1000000 NNZ=20000000 NRUNS=5
 sparse  : 4.665256 (0.277773) GC:0.093661 (0.015714) 679997632
 sparse2 : 15.050340 (0.406251) GC:0.886196 (0.059922) 1460688304
 sparsem : 6.933363 (0.336271) GC:0.038714 (0.003770) 568000384
 B = 2*A : 0.149884 (0.005311) GC:0.000497 (0.000024) 327997184
 A = A'  : 1.953446 (0.081938) GC:0.000204 (0.000005) 335997264
 B = A+B : 0.478192 (0.022557) GC:0.000209 (0.000036) 647994560
 y = A*x : 0.142446 (0.007932) GC:0.000000 (0.000000) 8000096
 y = A'*x: 95.633243 (4.670524) GC:0.290114 (0.049752) 15951452944

Kristoffer's test, runs=5
 sparse  : 0.554941 (0.040949) GC:0.012032 (0.012062) 195996960
 sparse2 : 2.663222 (0.479749) GC:0.098314 (0.009676) 146073312
 sparsem : 2.985522 (0.090272) GC:0.039565 (0.005959) 280800384
 B = 2*A : 0.008550 (0.000183) GC:0.000246 (0.000051) 32796512
 A = A'  : 0.057171 (0.007230) GC:0.000138 (0.000017) 33596592
 B = A+B : 0.037831 (0.004919) GC:0.000161 (0.000009) 64793248
 y = A*x : 0.012243 (0.003215) GC:0.000000 (0.000000) 800096
 y = A'*x: 5.762868 (0.106267) GC:0.042064 (0.004543) 251133312

sparse([1;100000000], [1;1], [1;1]), runs=5
 sparse  : 1.572819 (0.064329) GC:0.006269 (0.001235) 2400000768
 sparse2 : 0.000006 (0.000001) GC:0.000000 (0.000000) 704
 sparsem : 0.000002 (0.000000) GC:0.000000 (0.000000) 448
 B = 2*A : 0.000002 (0.000000) GC:0.000000 (0.000000) 288
 A = A'  : 0.721693 (0.006778) GC:0.000102 (0.000001) 1600000368

@ViralBShah
Member Author

We are not going to change the parametrization of row and column indices for SparseMatrixCSC. Despite my saying that this is not the discussion forum for that matter, I see that is exactly what happened here. Let's keep this discussion to the issue at hand and avoid tangents. In order to keep my sanity, I will delete tangential comments.

We can review all this in the PR once you have it ready; hopefully it does not degrade performance in other cases. If it uniformly improves performance everywhere, I for one will not be complaining.

@ScottPJones
Contributor

Are there already any suites of performance tests for sparse arrays?
I know @mbauman is doing a lot for sparse vectors. Currently I only have 3 examples to test with, and I want to make sure I cover different levels of sparsity, duplicate entries, zero values in the V input vector, already-sorted, reverse-sorted, and random input orderings, as well as the sizes and types of the input arrays. Have I missed any important variable?
My latest tests are now showing 15x faster on @mb1234's example, 10x on Kristoffer's, and >260,000x on the example at the top of this issue.

@mlubin
Member

mlubin commented Oct 5, 2015

Look at the sparse matrix collection: https://www.cise.ufl.edu/research/sparse/matrices/ (https://github.com/JuliaSparse/MatrixMarket.jl)
It would be worth dumping them to coordinate form and then calling sparse().

@ViralBShah
Member Author

There are some performance tests in test/perf/sparse, IIRC. Let's please not keep repeating how many times faster your code is and on which dimensions; let's just have the code.

@ScottPJones
Contributor

@mlubin Thanks for the pointers. It looks like I'll need to adapt the MatrixMarket.jl code, because I don't want the arrays already constructed; I want to read the sparse-array data in coordinate format (basically, do everything MatrixMarket.jl does except the call to sparse, and return I, J, V, nrows, and ncols, along with the information about the sparse matrix).

@mlubin
Member

mlubin commented Oct 8, 2015

@ScottPJones, a pretty decent test would be to take the matrices in CSC form, transpose them, run findnz, and then reconstruct
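That proposed test can be sketched as follows (hypothetical function name; the round trip through coordinate form should be lossless):

```julia
using SparseArrays

# Transpose a CSC matrix (so the coordinates come out ordered by the
# original rows rather than columns), extract coordinate form with findnz,
# and rebuild with sparse(). The reconstruction should equal the transpose.
function roundtrip(A::SparseMatrixCSC)
    At = copy(transpose(A))       # materialize the transpose in CSC form
    I, J, V = findnz(At)          # coordinate form, ordered by At's columns
    B = sparse(I, J, V, size(At, 1), size(At, 2))
    return B == At
end
```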

@ScottPJones
Contributor

Wouldn't they be in sorted order, though, if I did that?
(I do want to test three orderings: sorted, reverse-sorted, and random.)
The change to just get the coordinate form from MatrixMarket.jl is pretty simple; I'll submit a PR shortly.

@mlubin
Member

mlubin commented Oct 8, 2015

Sorted by rows, not columns, though.

@ScottPJones
Contributor

But within the rows, aren't the columns also sorted?
I have a feeling from what I've read that these matrices will not test two important things: "zero" entries in the V vector, and duplicates that need to be combined.
I think I'm going to have to generate those myself somehow.
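One way to generate such input (hypothetical helper; the fractions are arbitrary) would be:

```julia
using SparseArrays

# Generate coordinate input that exercises two cases the collection matrices
# won't: explicit zeros in V, and duplicate (i, j) entries that sparse()
# must combine (summed by default).
function makecoords(m::Int, n::Int, nz::Int; dupfrac = 0.1, zerofrac = 0.1)
    I = rand(1:m, nz)
    J = rand(1:n, nz)
    V = rand(nz)
    for _ in 1:round(Int, dupfrac * nz)         # force some duplicate coordinates
        src, dst = rand(1:nz), rand(1:nz)
        I[dst], J[dst] = I[src], J[src]
    end
    V[rand(1:nz, round(Int, zerofrac * nz))] .= 0.0   # explicit zeros in V
    return I, J, V
end
```

Shuffling, sorting, or reversing the returned vectors (keeping I, J, V aligned) would then cover the different input orderings.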

@ViralBShah
Member Author

Now that we have real sparse vectors, this is no longer an issue. It is also twice as fast as before. We should find more realistic test cases to speed up performance of such cases.

Cc @Sacha0

@tkelman tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Jul 17, 2017
@tkelman
Contributor

tkelman commented Jul 17, 2017

Is it being tracked in BaseBenchmarks? If not, it should be added there.

@Sacha0
Member

Sacha0 commented Jul 17, 2017

We should find more realistic test cases to speed up performance of such cases.

Cases like @augustinyiptong's from #20447 should hit this issue as well (when built from a coordinate representation, that is)? Best!
