similar(::SparseMatrixCSC) waste of memory #26560

Closed
KlausC opened this issue Mar 21, 2018 · 6 comments · Fixed by #40444

@KlausC
Contributor

KlausC commented Mar 21, 2018

When calling similar with new dimensions on a sparse matrix, the resulting sparse matrix contains uninitialized memory that is not accessible through the defined user interface.
The fields rowval and nzval have the same length as the corresponding fields in the source. The only way to use those entries is to modify colptr, which is not a documented user interface.
I propose reducing the length of those arrays to colptr[end]-1 of the new matrix, i.e. to the number of entries it actually stores.

This behaviour is new in v0.7 and may be a regression.

```julia
julia> A = sprand(20, 10, 0.1)
20×10 SparseMatrixCSC{Float64,Int64} with 19 stored entries:

julia> (B = similar(A, 2, 5))
2×5 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> dump(B)
SparseMatrixCSC{Float64,Int64}
  m: Int64 2
  n: Int64 5
  colptr: Array{Int64}((6,)) [1, 1, 1, 1, 1, 1]
  rowval: Array{Int64}((19,)) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28]
  nzval: Array{Float64}((19,)) [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.33398e-322]

julia> B[2,5] = 42
42

julia> dump(B)
SparseMatrixCSC{Float64,Int64}
  m: Int64 2
  n: Int64 5
  colptr: Array{Int64}((6,)) [1, 1, 1, 1, 1, 2]
  rowval: Array{Int64}((20,)) [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28]
  nzval: Array{Float64}((20,)) [42.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.33398e-322]
```
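The proposed trimming can be sketched as a small wrapper, assuming Julia 1.x with the SparseArrays standard library. trimmed_similar is a hypothetical name, not an existing function. The sketch relies on nnz being computed from colptr (colptr[end]-1) and on rowvals/nonzeros returning the matrix's own storage vectors, so resize! on them shrinks rowval and nzval in place.

```julia
using SparseArrays

# Hypothetical helper (not part of SparseArrays): like similar(A, m, n), but
# trims rowval/nzval to nnz(B) == colptr[end] - 1, so the result holds no
# storage that the public interface cannot reach.
function trimmed_similar(A::SparseMatrixCSC, m::Integer, n::Integer)
    B = similar(A, m, n)
    k = nnz(B)                 # derived from colptr, not from length(nonzeros(B))
    resize!(rowvals(B), k)     # rowvals/nonzeros return B's own storage vectors
    resize!(nonzeros(B), k)
    return B
end

A = sprand(20, 10, 0.1)
B = trimmed_similar(A, 2, 5)
B[2, 5] = 42                   # storage grows from 0 to 1 entry
```

With the vectors trimmed, the later insertion grows rowval and nzval to exactly one entry instead of leaving unreachable slots behind, as in the dump above.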

KlausC changed the title from "similar(::SparseMatrixCSC) memory wasting" to "similar(::SparseMatrixCSC) waste of memory" on Mar 21, 2018
mbauman added the domain:arrays:sparse (Sparse arrays) label on Mar 21, 2018
@mbauman
Sponsor Member

mbauman commented Mar 21, 2018

Seems reasonable. On 0.6, similar(::SparseMatrixCSC) and similar(::CSC, ::Type) pre-allocated the same number of stored values, but if you specified any dimensions, then you got spzeros. There won't be a best option in all cases, but 0.6's behavior was probably more often what folks wanted.

@KlausC
Contributor Author

KlausC commented Mar 21, 2018

Yes! The case without dimensions also copies colptr (in both v0.6 and v0.7), so all slots in nzval are accessible with getindex/setindex!.
The issue case is different: if you add a value, the vectors are resized to hold one more element, while the unreachable slots stay unreachable, as the last dump(B) demonstrates.
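A small check of the dimension-preserving case, assuming Julia 1.x with SparseArrays: because similar(A) copies colptr and rowval, every slot of nzval belongs to a stored entry and can be written through ordinary indexing without growing the storage. The 3×2 matrix here is made up for illustration, and the values in C are uninitialized until written.

```julia
using SparseArrays

# Illustrative 3×2 matrix built directly from CSC buffers:
# column 1 stores rows 1 and 3, column 2 stores row 2.
A = SparseMatrixCSC(3, 2, [1, 3, 4], [1, 3, 2], [1.0, 2.0, 3.0])

C = similar(A)                 # same size and colptr/rowval, uninitialized nzval
@assert C.colptr == A.colptr
@assert rowvals(C) == rowvals(A)

# Every nzval slot is reachable through ordinary indexing...
C[1, 1] = 10.0
C[3, 1] = 20.0
C[2, 2] = 30.0

# ...and overwriting already-stored entries does not grow the storage.
@assert nnz(C) == 3
@assert length(nonzeros(C)) == 3
```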

@Sacha0
Member

Sacha0 commented Mar 22, 2018

If memory serves, this aspect of similar's present behavior is intentional: similar on sparse matrices retains as much of the original's structure as possible. In this case (a new shape of the same rank), the specific nonzero structure is no longer meaningful and hence is not retained, but the storage structure can still be meaningful and hence is retained. If memory serves, this behavior is motivated by, for example, controlling allocation in sparse broadcast. Best!

@KlausC
Contributor Author

KlausC commented Mar 23, 2018

My argument is that the retained space in rowval and nzval is inaccessible to API users. I can imagine that in some cases the space can be used in the course of constructing a new sparse matrix (which has to patch colptr in order to make use of the otherwise unreachable cells). But I would not offer this unfinished construct to the end user, who would not know about it or be able to make use of it. For this internal use, I would propose an unexported _similar or so...

@ViralBShah
Member

ViralBShah commented Dec 17, 2018

I think it would be perfectly fine for similar to have length colptr[end]-1. The only issue is whether there are internal functions that depend on the current behaviour.

@abraunst
Copy link
Contributor

If avoiding reallocation is the concern, maybe the best of both worlds would be to just sizehint! both vectors in similar (or in its callers)?
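The sizehint! idea can be sketched as follows, assuming Julia 1.x with SparseArrays; similar_with_hint is a hypothetical name, not the library's implementation. sizehint! only reserves capacity, so rowval and nzval keep length colptr[end]-1 (zero here) and expose no unreachable entries, while later insertions may avoid reallocating.

```julia
using SparseArrays

# Hypothetical helper, not the SparseArrays implementation: an m×n matrix with
# no stored entries whose rowval/nzval have length 0, but with capacity reserved
# via sizehint! for as many entries as A stores.
function similar_with_hint(A::SparseMatrixCSC{Tv,Ti}, m::Integer, n::Integer) where {Tv,Ti}
    colptr = fill(one(Ti), n + 1)      # all columns empty: colptr == [1, 1, ..., 1]
    rowval = sizehint!(Ti[], nnz(A))   # length 0; the capacity is only a hint
    nzval  = sizehint!(Tv[], nnz(A))
    return SparseMatrixCSC(m, n, colptr, rowval, nzval)
end

A = sprand(20, 10, 0.1)
B = similar_with_hint(A, 2, 5)
B[2, 5] = 42                           # storage grows from 0 to 1 entry
```

The capacity hint is invisible through the user interface: dump(B) would show zero-length vectors before the assignment, rather than 19-element ones.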

This issue was closed.