
Check that sparse matrix is valid before constructing a CHOLMOD.Sparse #20464

Merged: 1 commit merged into master from anj/spcheck on Feb 14, 2017

Conversation

@andreasnoack (Member):

These are just some consistency checks of the inputs before constructing a CHOLMOD.Sparse. Julia's SparseMatrixCSC should always pass these checks unless the buffers have been modified directly. However, as seen in #20024, serialization might corrupt the sparse matrices, which can lead to cryptic bugs. The checks introduced in this PR would probably have made it easier to detect that issue.

Fixes #20024
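
A rough illustration (not code from the PR) of the failure mode being guarded against: if serialization or direct modification corrupts a buffer so that the CSC invariants no longer hold, checks like the ones in the diff below turn a cryptic CHOLMOD failure into an immediate ArgumentError.

```julia
# Hypothetical corrupted buffers (illustration only; the real checks and
# messages are in the diff below):
colptr = [1, 2, 3]   # claims 2 stored entries in a 2-column matrix
rowval = [1]         # ...but only one row index is present
nzval  = [1.0]

# The kind of consistency check this PR adds before calling into CHOLMOD:
if colptr[end] - 1 > length(rowval)
    throw(ArgumentError("row indices are inconsistent with colptr"))
end
```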

# check if columns are sorted
# checks
## length of input
if length(nzval) != length(rowval)

@Sacha0 (Member) commented on Feb 5, 2017:

CSC does not require that length(nzval) == length(rowval), and we do not generally enforce that condition (nor should we)?

@andreasnoack (Member, Author):

I can see why you'd like to allow length(nzval) > colptr[end] but why would you allow length(nzval) != length(rowval)? It also appears that CHOLMOD assumes that these have the same size.

if length(nzval) != length(rowval)
throw(ArgumentError("nzval and rowval must have same length"))
end
if length(colptr) != n + 1

@Sacha0 (Member):

Perhaps length(colptr) >= n + 1?

@andreasnoack (Member, Author):

Why shouldn't length(colptr) == n + 1? Again, that seems to be the assumption of CHOLMOD.

if length(colptr) != n + 1
throw(ArgumentError("length of colptr must be n + 1 = $(n + 1) but was $(length(colptr))"))
end
if colptr[end] != length(rowval)

@Sacha0 (Member) commented on Feb 5, 2017:

CSC does not require that length(rowval) == colptr[n + 1], and we do not generally enforce that condition (nor should we)? Perhaps length(rowval) + 1 >= colptr[n + 1] and likewise for nzval?

@andreasnoack (Member, Author):

I can see why you don't want this so I'll change it. However, I think I might have assumed this all over the CHOLMOD wrappers so there might be other places where the number of non-zeros is queried with length(nzval) which is then wrong.

@@ -898,6 +909,17 @@ function (::Type{Sparse}){Tv<:VTypes}(m::Integer, n::Integer,
end

function (::Type{Sparse}){Tv<:VTypes}(A::SparseMatrixCSC{Tv,SuiteSparse_long}, stype::Integer)
## Check length of input. This should never fail but see #20024
if length(A.nzval) != length(A.rowval)

if length(A.colptr) != size(A,2) + 1
throw(ArgumentError("length of colptr must be size(A,2) + 1 = $(size(A,2) + 1) but was $(length(A.colptr))"))
end
if A.colptr[end] != length(A.rowval) + 1

@Sacha0 (Member) commented on Feb 5, 2017:

Re. discussion in the review comments above, the general argument for the weaker conditions (length(colptr) >= n + 1 rather than length(colptr) == n + 1, length(rowval) >= colptr[n + 1] - 1 rather than length(rowval) == colptr[n + 1] - 1 and likewise for nzval, and no restriction on the relative lengths of rowval and nzval) is better allocation control / predictability / reuse. Examples:

Re. length(rowval) >= colptr[n + 1] - 1 rather than length(rowval) == colptr[n + 1] - 1 (and likewise for nzval), suppose you want to store sparse matrices with different patterns (and particularly stored entry counts) in the same data structure at different points during execution. Under the weaker condition you can do so by allocating enough storage for all expected patterns once at the outset and reusing that storage, whereas under the stronger condition (and concomitant resize!ing) allocation becomes unpredictable. (This issue impacts e.g. sparse broadcast[!].)

Re. length(colptr) >= n + 1 rather than length(colptr) == n + 1, suppose you want to store sparse matrices with different shapes at different points during execution. Under the weaker condition you can do so by allocating enough storage for all expected shapes once at the outset and reusing that storage (constructing only a new SparseMatrixCSC wrapper for each new sparse matrix), whereas under the stronger condition (and concomitant resize!ing) allocation becomes unpredictable. (This issue impacts e.g. the applications discussed in the thread regarding immutability of SparseMatrixCSC.)

Re. no restrictions on the relative lengths of rowval and nzval, suppose you are operating solely on a pattern rather than a complete CSC sparse matrix. Under the weaker condition you can cut your memory footprint by almost a factor of two, whereas under the stronger condition you must carry around a potentially large dead buffer of one form or another. Though your CSC sparse matrix isn't technically valid in that scenario, that scenario is common, for example in the symbolic phases of sparse factorization or in preordering. Alternatively, suppose you want to write a preordering or factorization routine that accepts some buffers and returns a CSC sparse matrix that recycles those buffers as storage; those buffers may need to have different lengths, in which case the weaker condition grants better predictability of and control over allocation downstream. (Issues like these I encountered while working on ApproxMinimumDegree.jl.)
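
To make the three scenarios above concrete, here is a minimal sketch in terms of the raw CSC vectors (the helper `stored_entries` and the sizes are illustrative, not from the PR): under the weaker conditions the stored-entry count is always read from colptr, so oversized rowval/nzval buffers, an oversized colptr, or a missing nzval all work without reallocation.

```julia
# Sketch only: allocate once, reuse the buffers for patterns of different sizes.
n      = 4                       # columns in the current matrix
maxnnz = 32                      # capacity for the largest expected pattern
colptr = ones(Int, n + 1)        # column pointers (1-based), length >= n + 1
rowval = zeros(Int, maxnnz)      # row indices, allocated once
nzval  = zeros(Float64, maxnnz)  # values, allocated once (optional for pattern-only work)

# The stored-entry count comes from colptr, never from the buffer lengths.
stored_entries(colptr, n) = colptr[n + 1] - 1

# Fill in a small pattern (here the 4x4 identity); the trailing buffer space
# simply stays unused and can hold a larger pattern later without reallocating.
for j in 1:n
    colptr[j] = j
    rowval[j] = j
    nzval[j]  = 1.0
end
colptr[n + 1] = n + 1

@assert stored_entries(colptr, n) == 4 <= length(rowval)
```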

Of course if CHOLMOD expects any of the stronger conditions and will complain or fail if they are not satisfied, I agree wholeheartedly that checking them at CHOLMOD entry points is a great idea :).

As you note above some code in Base assumes some subset of the stronger conditions. For reasons such as the above, we might want to wean that code off those assumptions over time, and e.g. remove automatic resize!ing from a variety of operations. Similarly, decoupling SparseVector storage length from stored entry count is another thing we should strongly consider (to mirror CSC's flexibility and control). (I planned to write a little Julep on the above when things settled down a bit, but it's mostly above already :).)

Best!

@andreasnoack (Member, Author):

Thanks for the explanation. Generally, I'm fine with what you suggest but we should then really take a pass through the sparse code to make sure we use the buffers accordingly. E.g. I can see that this method makes the same assumptions as I did whereas the one right above doesn't. I'll update this PR to use your CSC definition.

@Sacha0 (Member) commented on Feb 5, 2017:

> we should then really take a pass through the sparse code to make sure we use the buffers accordingly

Wholeheartedly agreed :). Best!

@kshyatt added the domain:linear algebra and domain:arrays:sparse labels on Feb 5, 2017
@andreasnoack force-pushed the anj/spcheck branch 2 times, most recently from 26e5b8d to a26f482, on February 6, 2017
@andreasnoack (Member, Author):

I've changed the tests following the conclusion above. I've also changed the CHOLMOD code and the nnz definition to follow the rules you described, and I've just confirmed that it fixes #20024.

if length(colptr) <= n
throw(ArgumentError("length of colptr must be at least n + 1 = $(n + 1) but was $(length(colptr))"))
end
if colptr[n + 1] > length(rowval)

@Sacha0 (Member) commented on Feb 6, 2017:

Should these checks be colptr[n + 1] > length(rowval) + 1 and colptr[n + 1] > length(nzval) + 1? If memory serves, colptr[n + 1] should be one greater than the number of stored entries? (Or is colptr indexed from zero here, in which case the existing checks would be correct?)

@andreasnoack (Member, Author):

This constructor is zero based. I'm wondering if it should have a more alarming name for that reason.

@Sacha0 (Member):

> This constructor is zero based. I'm wondering if it should have a more alarming name for that reason.

That sounds great! Similarly, if some instances of colptr are zero based, it might be worth naming them appropriately as well? Best!
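
As a side note for readers, a small illustration (not from the PR) of the two conventions being discussed, for a matrix with two columns and two stored entries:

```julia
# Same 2-column, 2-stored-entry pattern under both conventions:
colptr_onebased  = [1, 2, 3]  # SparseMatrixCSC: stored entries == colptr[n + 1] - 1 == 2
colptr_zerobased = [0, 1, 2]  # CHOLMOD-style:   stored entries == colptr[n + 1]     == 2
```

This is why the zero-based constructor's checks and copy counts can use colptr[n + 1] directly, without a + 1 or - 1 adjustment.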

@@ -869,12 +880,12 @@ function (::Type{Sparse}){Tv<:VTypes}(m::Integer, n::Integer,
end
end

-o = allocate_sparse(m, n, length(nzval), iss, true, stype, Tv)
+o = allocate_sparse(m, n, colptr[n + 1], iss, true, stype, Tv)

@Sacha0 (Member) commented on Feb 6, 2017:

Should the third argument to allocate_sparse be the number of stored entries or one more than the number of stored entries? (Or is colptr indexed from zero here, in which case the existing statement is correct?)

unsafe_copy!(s.i, pointer(rowval), length(rowval))
unsafe_copy!(s.x, pointer(nzval), length(nzval))
unsafe_copy!(s.p, pointer(colptr), n + 1)
unsafe_copy!(s.i, pointer(rowval), colptr[n + 1])

@Sacha0 (Member) commented on Feb 6, 2017:

Should these be colptr[n + 1] - 1, i.e. the number of stored entries? (Or is colptr indexed from zero here, in which case the existing statement is correct?)

@@ -42,7 +42,7 @@ julia> nnz(A)
3
```
"""
-nnz(S::SparseMatrixCSC) = Int(S.colptr[end]-1)
+nnz(S::SparseMatrixCSC) = Int(S.colptr[S.n + 1]-1)

@tkelman (Contributor) commented on Feb 7, 2017:

the distinction here could use a more direct test that doesn't go through cholmod-specific methods

@andreasnoack (Member, Author):

I've added some tests.
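
For context, the distinction tkelman points to only matters when the colptr buffer is longer than n + 1; a rough illustration (not the actual tests added in the PR):

```julia
# With an oversized colptr buffer, colptr[end] is not the pointer past the
# last column, so the count must be read at index n + 1:
n      = 2
colptr = [1, 2, 3, 0, 0]   # only the first n + 1 = 3 entries are meaningful
colptr[end]   - 1          # -1: wrong stored-entry count (old definition, S.colptr[end] - 1)
colptr[n + 1] - 1          #  2: correct stored-entry count (new definition, S.colptr[S.n + 1] - 1)
```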

Commit 769d37b:
Don't use length of underlying arrays to query sizes
Rename variables in zero based constructors
@andreasnoack merged commit 769d37b into master on Feb 14, 2017
@andreasnoack deleted the anj/spcheck branch on February 14, 2017
tkelman pushed a commit that referenced this pull request (#20464) on Mar 1, 2017:

Don't use length of underlying arrays to query sizes
Rename variables in zero based constructors
(cherry picked from commit 769d37b)

testsets removed for release-0.5, and Vector{<:VTypes} written out