Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: AbstractSparseMatrixCSC interface #33054

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions stdlib/SparseArrays/docs/src/AbstractSparseMatrixCSC.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
AbstractSparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}

Supertype for matrix with compressed sparse column (CSC).

!!! compat "Julia 1.4"
`AbstractSparseMatrixCSC` requires at least Julia 1.4.

## `AbstractSparseMatrixCSC` interface

In addition to the [Abstract array interface](@ref man-interface-array), every
`AbstractSparseMatrixCSC` subtype `TS` must provide the following methods:

* [`size(::TS)`](@ref size)
* [`coloffsets(::TS) :: AbstractVector{<:Ti}`](@ref coloffsets)
* [`rowvals(::TS) :: AbstractVector{<:Ti}`](@ref rowvals)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #25118 @simonbyrne suggested rowindices. It is a better name but rowvals is already a documented public API so I don't know if we want to change this at this point. But this may be a good opportunity to do so?

Copy link
Member

@ViralBShah ViralBShah Aug 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again agree. But documented APIs should not be changed. We can do this for 2.0, or whenever we can version stdlib / or move sparse out of stdlib.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I think we can define rowvals to be an alias to rowindices like this:

rowindices(S::SparseMatrixCSC) = getfield(S, :rowval)
rowvals(x) = rowindices(x)

and then use rowindices internally everywhere.

In the documentation, we can mention that the use of rowvals is highly discouraged and will be deprecated in the next major release.

This means that for end-users rowvals keep working as-is but for package authors of custom array types they are forced to use rowindices from the get-go.

* [`nonzeros(::TS) :: AbstractVector{<:Tv}`](@ref nonzeros)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #25118 @simonbyrne suggested values. Indeed, it's a bit strange that nonzeros(A) can contain zeros.

Unfortunately, values(::SparseMatrixCSC) is already a valid method which is defined to be an identity function by the fallback. But maybe it could be considered a "minor change" if we define appropriate key(::AbstractSparseMatrixCSC)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reference, #22907 is the PR where keys, values and pairs are introduced for arrays. There seems to be no discussion on the treatment of structured and sparse arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it would be breaking for functions like findmax implemented in #22907.

Copy link
Member

@ViralBShah ViralBShah Aug 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think values is a bit generic. Perhaps storedvals? But still can't change the documented API. Note it for future change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storedvals sounds like a good idea to me. If we go this way, we can rename nonzeroinds for SparseVector to storedindices as well (ref #33050 (comment)).


Other sparse matrix methods such as [`nzrange`](@ref) and [`nnz`](@ref) are automatically
defined in terms of above functions.

## Assumed invariance

To use algorithms defined in SparseArrays, a matrix `A` of type `AbstractSparseMatrixCSC`
must satisfy the following constraints.

### Matrix `A` and all vectors constituting `A` have one-based indexing
Copy link
Member Author

@tkf tkf Aug 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think one-based indexing is necessary, in principle. But I think most of the code in SparseArrays.jl requires it so I think it's better to specify it.


```julia
@assert !has_offset_axes(A)
@assert !has_offset_axes(coloffsets(A))
@assert !has_offset_axes(rowvals(A))
@assert !has_offset_axes(nonzeros(A))
```

### Row indices in `rowval(A)` for each column are sorted

```julia
for col in axes(A, 2)
@assert issorted(rowval(A)[nzrange(A, col)])
end
```

### Column pointers in `coloffsets(A)` are increasing, started at 1

```julia
@assert coloffsets(A)[1] === 1
@assert all(diff(coloffsets(A)) .>= 0)
```

### Row index vector `rowvals(A)` and non-zero value vector `nonzeros(A)` are long enough

```julia
@assert nnz(A) <= length(rowvals(A))
@assert nnz(A) <= length(nonzeros(A))
Copy link
Member Author

@tkf tkf Aug 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment (below) says 0 <= length(nonzeros(A)) but I suppose it was a typo? I'm using nnz(A) <= length(nonzeros(A)) instead (above).

# Assumes that 0 <= length(nzval) < typemax(Ti)

```

### Indices for rows, `rowvals(A)` and `nonzeros(A)` are representable by `Ti`

```julia
@assert size(A, 1) <= typemax(Ti)
@assert nnz(A) <= typemax(Ti)
Copy link
Member Author

@tkf tkf Aug 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is something I don't understand in #31724. It was mentioned that both m = size(A, 1) and n = size(A, 2) must be representable by Ti:

julia/HISTORY.md

Lines 88 to 90 in fb9cd34

* `SparseMatrixCSC(m,n,colptr,rowval,nzval)` perform consistency checks for arguments:
`colptr` must be properly populated and lengths of `colptr`, `rowval`, and `nzval`
must be compatible with `m`, `n`, and `eltype(colptr)`.

I understand that size(A, 1) <= typemax(Ti) has to hold because we use rowvals(A) for indexing to the rows [1]. However, size(A, 2) only appears as the indexing into getcolptr(S) so why does Ti matter?

Also regarding the current comment:

# Assumes that nnz <= length(rowval) < typemax(Ti)
# Assumes that 0 <= length(nzval) < typemax(Ti)

  • It says nnz(A) < typemax(Ti). But isn't nnz(A) == typemax(Ti) OK?
  • I suppose length(rowvals(A)) <= typemax(Ti) and length(nonzeros(A)) <= typemax(Ti) don't have to hold because the tail of those vectors can just be ignored?

[1] But technically size(A, 1) > typemax(Ti) sounds OK as long as the structural nonzeros appear only in the upper 1:typemax(Ti) rows?

```