Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Allow arbitrary array type for nzval of sparse matrix #30173

Closed
wants to merge 2 commits into from

Conversation

tkf
Copy link
Member

@tkf tkf commented Nov 28, 2018

I request a generalization of SparseMatrixCSC such that nzval can be an arbitrary vector type, not just a Vector. This let us compose the sparse matrix type with custom array types. For example, the following code works with this PR:

using SparseArrays
using MappedArrays
using LazyArrays
using FillArrays
using Setfield: @set

S = sprandn(10, 10, 0.5)

# Creating a mapped array for a sparse matrix:
M1 = @set S.nzval = mappedarray(abs, S.nzval)
# which is equivalent to:
M1 = GenericMatrixCSC(S.m, S.n, S.colptr, S.rowval, mappedarray(abs, S.nzval))

# Similar effect with LazyArrays.BroadcastArray:
M2 = @set S.nzval = BroadcastArray(abs, S.nzval)

# Sometimes it is very useful to have a constant value for `nzval` to save
# space and time.  You can do it with `FillArrays.Fill`:
M3 = @set S.nzval = Fill('a', nnz(S))

Achieving similar effect with current SparseMatrixCSC is hard. For example, here is my attempt to implement very small subset of mapped sparse array based on MappedArrays. It took > 30 lines of code and yet have fewer features than the matrix M1 above (e.g., efficient mul!). Doing something like constant nzval as in M3 with current SparseMatrixCSC would be hard. I can imagine doing so with singleton number-like struct but that would be very awkward and inefficient.

Of course, new feature shouldn't break old ones. I tired to make SparseMatrixCSC is backward compatible as much as possible. In fact, this PR so far only changed stdlib/SparseArrays/src except for two lines of introspection-based code in stdlib/SparseArrays/test. (Of course, once the direction of this PR is approved I will add the tests for the generalized interface.) Since the tests for SuiteSparse pass and it uses SparseMatrixCSC heavily, I think the approach in this PR seems to be working. I also assumed that show is part of API (otherwise doctest would break) and defined backward compatible show(::IO, ::Type{<:SparseMatrixCSC}).

The main change in this PR is the new struct which I tentatively call GenericMatrixCSC. This is just a simple generalization of SparseMatrixCSC:

struct GenericMatrixCSC{Tv, Ti<:Integer,
                        Tc<:AbstractVector{Ti},
                        Tr<:AbstractVector{Ti},
                        Tz<:AbstractVector{Tv}} <: AbstractSparseMatrix{Tv,Ti}
    m::Int                  # Number of rows
    n::Int                  # Number of columns
    colptr::Tc              # Column i is in colptr[i]:(colptr[i+1]-1)
    rowval::Tr              # Row indices of stored values
    nzval::Tz               # Stored values, typically nonzeros

    ...

SparseMatrixCSC is redefined as

const SparseMatrixCSC{Tv,Ti} = GenericMatrixCSC{Tv,Ti,Vector{Ti},Vector{Ti},Vector{Tv}}

Other than that, the PR is mostly just simple renames although the diff look large.

An alternative approach would be to define all CSC matrix code based on an abstract type and interface (e.g., nonzeros) add add GenericMatrixCSC while keeping SparseMatrixCSC untouched. The disadvantage is that we need to maintain two very similar structs. I think SparseMatrixCSC would have coded like GenericMatrixCSC if the ability to use arbitrary array type for nzval was foreseen. If that's the case using concrete struct as in this PR is more natural. Furthermore, this approach let us succinctly construct sparse matrices using Setfield.@set (as in the code above).

Needless to say, the implementation detail is not important. I can switch to abstract type approach if that's required.

Relevant PRs: #25118, #25151

@kshyatt kshyatt added the domain:arrays:sparse Sparse arrays label Nov 28, 2018
@tkf tkf changed the title Request: Allow arbitrary array type for nzval of Sparse matrix Request: Allow arbitrary array type for nzval of sparse matrix Nov 28, 2018
@ViralBShah
Copy link
Member

I think it would be better to have this capability as a separate package, and making changes to stdlib/SparseArrays to make it easy to make such capabilities first class.

@ViralBShah
Copy link
Member

Maybe this could be part of the JuliaSparse org.

@tkf
Copy link
Member Author

tkf commented Dec 17, 2018

How about introducing AbstractCSCMatrix and replace the use of SparseMatrixCSC in SparseArrays with AbstractCSCMatrix whenever possible? I think this is quite straight forward (although the patch will be much larger).

making changes to stdlib/SparseArrays to make it easy to make such capabilities first class.

Thanks for considering this. I appreciate if SparseArrays can expose more "hook points" so that users can play with the low-level APIs like I wrote in the PR description.

@ViralBShah
Copy link
Member

It seems that for the purposes of what you are trying here, what we want to do is widen the nzval to be an AbstractVector as much as possible throughout the sparse codebase.

@tkf
Copy link
Member Author

tkf commented Dec 17, 2018

Hmm... yes, but that's exactly what my PR does. I thought you preferred to do this in a separate package?

Just to be clear, I wasn't suggesting to add the capabilities of MappedArrays, LazyArrays, FillArrays, Setfield, etc. to SparseArrays. That's was just a demo that how it could be composed with other packages, if nzval can be a vector of an arbitrary type.

@ViralBShah
Copy link
Member

If we want to add a new type, like GenericMatrixCSC, it is best done in a separate package. But I think we can fix SparseMatrixCSC such that it can have non-numeric data as well. See https://github.com/JuliaLang/julia/issues/30573

@tkf
Copy link
Member Author

tkf commented Jan 4, 2019

We still need some change in SparseArrays.jl for downstream packages to extend sparse matrix, right?

Let me know what is the best approach:

  1. Current PR: Define GenericMatrixCSC{Tv,Ti,Tc,Tr,Tz} and then make SparseMatrixCSC{Tv,Ti} an alias to GenericMatrixCSC{Tv,Ti,Vector{Ti},Vector{Ti},Vector{Tv}}. Here, Tc, Tr, and Tz are AbstractVector subtypes for colptr, rowval, and nzval.

  2. Like 1, but just extend SparseMatrixCSC{Tv,Ti} to SparseMatrixCSC{Tv,Ti,Tc,Tr,Tz}.

  3. Add AbstractMatrixCSC and re-define all SparseMatrixCSC methods in terms of AbstractMatrixCSC, whenever possible.

  4. Other suggestions?

(I know you dismissed 1, but I'm listing it for completeness)

Approach 1 and 2 are almost equivalent (1 can be made almost non-breaking). Approach 3 is slightly more challenging but it's (probably) completely non-breaking and more powerful. For example, maybe you can implement a different behavior in an external package for the case zero(T) is not defined (as discussed in #30573).

Approach 2 is the only one that does not introduce a new type. However, it breaks the following code:

struct MyStruct{Tv,Ti}
    i_need_this_to_be_concrete::SparseMatrixCSC{Tv,Ti}
end

@ViralBShah
Copy link
Member

What about a case where the SparseMatrixCSC constructor can take AbstractVector{T} for nzval, where T may be any Julia datatype? If we adopt iszero consistently everywhere and clean up other things wherever necessary, shouldn't we be able to have this with no new types?

@tkf
Copy link
Member Author

tkf commented Jan 28, 2019

the SparseMatrixCSC constructor can take AbstractVector{T} for nzval

That's approach 2 I suggested, right? As I mentioned, it breaks the code using ::SparseMatrixCSC{Tv,Ti} for fields, dispatch using ::Type{SparseMatrixCSC{Tv,Ti}}, etc. I prefer 1 or 3 since they are not breaking.

If we adopt iszero consistently everywhere and clean up other things wherever necessary, shouldn't we be able to have this with no new types?

The aim of this PR is to allow arbitrary vector type, not the element type (which can already be anything; see below [1]). So, I believe iszero is more-or-less separated issue.


[1] SparseMatrixCSC as a container can already hold any eltype as there is no constraint:

struct SparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
m::Int # Number of rows
n::Int # Number of columns
colptr::Vector{Ti} # Column i is in colptr[i]:(colptr[i+1]-1)
rowval::Vector{Ti} # Row indices of stored values
nzval::Vector{Tv} # Stored values, typically nonzeros
function SparseMatrixCSC{Tv,Ti}(m::Integer, n::Integer, colptr::Vector{Ti}, rowval::Vector{Ti},
nzval::Vector{Tv}) where {Tv,Ti<:Integer}
@noinline throwsz(str, lbl, k) =
throw(ArgumentError("number of $str ($lbl) must be ≥ 0, got $k"))
m < 0 && throwsz("rows", 'm', m)
n < 0 && throwsz("columns", 'n', n)
new(Int(m), Int(n), colptr, rowval, nzval)
end
end

julia> SparseMatrixCSC(3, 3, [1, 2, 2, 2], [1], [:hello])
3×3 SparseMatrixCSC{Symbol,Int64} with 1 stored entry:
  [1, 1]  =  :hello

@codecov-io
Copy link

Codecov Report

Merging #30173 into master will increase coverage by 55.7%.
The diff coverage is 91.53%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #30173      +/-   ##
==========================================
+ Coverage   39.12%   94.83%   +55.7%     
==========================================
  Files         344      327      -17     
  Lines       61276    54302    -6974     
==========================================
+ Hits        23977    51496   +27519     
+ Misses      37299     2806   -34493
Impacted Files Coverage Δ
base/meta.jl 65.95% <ø> (+3.58%) ⬆️
base/version.jl 97.5% <ø> (+15.27%) ⬆️
base/compiler/compiler.jl 100% <ø> (+66.66%) ⬆️
base/special/hyperbolic.jl 98.95% <ø> (+4.26%) ⬆️
base/strings/search.jl 93.44% <ø> (+15.72%) ⬆️
base/deprecated.jl 73.01% <ø> (+20.63%) ⬆️
base/simdloop.jl 94.87% <ø> (+14.38%) ⬆️
base/special/trig.jl 97.79% <ø> (+3.8%) ⬆️
base/iostream.jl 97.16% <ø> (+18.17%) ⬆️
base/file.jl 96.13% <ø> (+28.23%) ⬆️
... and 597 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f8a7b9...39d5bd6. Read the comment docs.

@ViralBShah
Copy link
Member

We definitely do not want to introduce new types in the sparse matrix code in stdlib. If anything we need to work towards making such things possible outside of stdlib, and refactor stuff in stdlib that might be getting in the way.

@ViralBShah ViralBShah closed this Jun 26, 2019
@tkf
Copy link
Member Author

tkf commented Jun 27, 2019

If anything we need to work towards making such things possible outside of stdlib, and refactor stuff in stdlib that might be getting in the way.

Could you please re-read my previous comments and reconsider. I've been trying to convince you that the whole purpose of this PR and other variants I proposed are exactly for "making such things possible outside of stdlib" and not implementing any extra features for end-users.

I appreciate if you point out why my proposals are not aligned to your point or any other unclear points. It also would be great if you assign other reviewers as well.

@fredrikekre fredrikekre reopened this Jun 27, 2019
@ViralBShah
Copy link
Member

ViralBShah commented Jun 27, 2019

Unfortunately I can't assign reviewers - we can ping people and see who is interested. I will re-review the whole thread myself.

@ViralBShah
Copy link
Member

Is it possible to rebase this PR and retrigger the tests with the new CI?

@tkf
Copy link
Member Author

tkf commented Jun 27, 2019

Thank you for taking the time to reconsider this.

@ViralBShah
Copy link
Member

Can we not change SparseMatrixCSC fields to be AbstractVectors?

@tkf
Copy link
Member Author

tkf commented Jun 27, 2019

I think that would mean to add AbstractMatrixCSC and refactor SparseMatrixCSC related code to use the interface functions (e.g., rowvals(A) instead of A.rowval). I think it's doable and is the safest option (especially no change in the type SparseMatrixCSC).

@ViralBShah
Copy link
Member

I think using interface functions to access the internals of the data structure is actually a good thing.

@tkf
Copy link
Member Author

tkf commented Jun 28, 2019

Thank you. So, I guess creating a patch with AbstractMatrixCSC is worth a shot? I think there will be (1) refactoring commits to use rowvals(A) etc. in the internals and then (2) I'll introduce AbstractMatrixCSC.

@ViralBShah
Copy link
Member

ViralBShah commented Jun 28, 2019

Yes, that sounds like the right direction to me. The challenge of course is that we can only know if it feels right after a couple of attempts. Please open a new PR for that, so we have this one to refer to as well.

Let's use the full name AbstractSparseMatrixCSC.

@tkf
Copy link
Member Author

tkf commented Jun 28, 2019

Thanks. I understand that there is always a possibility that PR is rejected. I'll do it once I have a long enough chunk of time. I'll use AbstractSparseMatrixCSC.

@ViralBShah
Copy link
Member

I really do want to try get it in. Looking forward.

@ViralBShah
Copy link
Member

I will close this one since we want to take a different approach. Let's do the new approach as a new PR - and we will still have this one to refer to for comparison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants