Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SparseMatrix and SparseArray don't consitently define zero elements and allowed element types #24790

Closed
KlausC opened this issue Nov 26, 2017 · 12 comments
Labels
sparse Sparse arrays

Comments

@KlausC
Copy link
Contributor

KlausC commented Nov 26, 2017

When I first worked with sparse matrices, I was convinced, that the element type of those objects were AbstractFloat types. I am easily convinced, that all Number types make sense as well. Nevertheless there there seems to be a sloppy idea of which element types are actually intended.

The constructors of sparse arrays accept any element type. But you can call certain functions on sparse arrays only, if the element type supports certain methods. So the element types are de facto restricted by a kind of duck typing. The methods, which I refer to, include != 0, == 0, != zero, == zero, isequal, iszero for the question, if an element to be processed is zero or not.
I see, that all those alternatives lead to the same result for Numbers (with the exception of == and isequal, which deserves a separate consideration in the case of ±0.0).
But if I want to use a different element type Tv, I have to support this type with a consistent behaviour for all of those methods. That is far more inconvenient than just have to define zero(::Type{Tv}), what would be sufficient in the case of a consistent use within the SparseArrays module.

Using iszero(x) would be a proper choice. It falls back to x == zero(typeof(x)). Comparison with 0 is good for if x isa Number, but leads into a blind alley, for other types, because all elements would be stored silently (even if zero is defined).

Here is a list of the current implementation (v0.7 2017-11-24) which illustrates the diversity in handling zeros:
How (and where) is defined, if an entry is considered a zero?

check expression source function comments
v != 0 sparsevector.jl:387 convert from dense
v != 0 sparsevector.jl:292 setindex! but only, if index is new
v != 0 sparsematrix.jl:2324 setindex! (but only, if index is new)
v != zero(Tv) sparsematrix.jl:2598 setindex!
nothing sparseivector.jl:751 getindex(x,::AbstractUnitange) new sparse vector constructed, but no check
x != 0 sparsevector.jl:1975 dropzeros!
x != 0 sparsematrix.jl:1227 dropzeros!
x != 0 sparsevector.jl:724 findnz
x != 0 sparsematrix.jl:1303 findnz
x != 0 sparsematrix.jl:1786 findz
x != 0 sparsematrix.jl:3189 is_hermsym
x != 0 sparsematrix.jl:3227 istriu
x != 0 sparsematrix.jl:3246 istril
x != 0 sparsematrix.jl:1564,1567,1574,1577 ==
convert(eltype(A), x) != zero(eltype(A)) sparsematrix.jl:2033 Base.fill!
v != zero(v) sparsevector.jl:1590,:1689 A_mul_B!(::Number, ...)
v == zero(T) sparsematrix.jl:1641 Base._mapreduce
isequal(f(zero(Tv)), zero(Tv) sparsematrix.jl:1744 _mapreducecols!
iszero(A.nzval) sparsematrix.jl:1513 Base.iszero
iszero(x) sparsematrix.jl:1520 Base.isone
isequal(v, zero(T)) sparsematrix.jl:3472 Base.hash
x == 0 higherorderfns.jl:139 _iszero
Base.iszero(x::Number) higherorderfns.jl:140 _iszero
Base.iszero(x::AbstractArray) higherorderfns.jl:141 _iszero

How is zero element created?

expression source function comments
zero(eltype(A)) sparsevector.jl:688 find(p, A)
zero(Tv) sparsevector:751 getindex(x,::Integer)
zero(T) sparsematrix:1890,1892 getindex(x,::Integer)
zero(Tv) sparsematrix.jl:2594,2598,2616,2623 setindex!
zero(eltype(S)) sparsematrix.jl:73 count
zero(eltype(S)) sparsematrix.jl:1262 find
zeros(Tv, m, n) sparsematrix.jl:377 convert (to dense - if sparse array has no structural zeros, not used)
zero(T) sparsematrix.jl:1598,1620,1629,1631,1639 Base._mapreduce
zero(T) sparsematrix.jl:1708 Base._mapreducedim!
zero(T) sparsematrix.jl:1743 _mapreducecols!
zero(T) sparsematrix.jl:1810 _findr
zeros(Tv,m) sparsematrix.jl:3373 sortSparseMatrixCSC!
zeros(eltype(A)) higherorderfns.jl:142,143 _zeros_eltypes

Inconsitencies

setindex!(A, -0.0, ind...) behavior depends on ind is structural zero or not

Example where the result of getindex depends on the pre-history (before the last setindex!):

julia> B = spzeros(2)
2-element SparseVector{Float64,Int64} with 0 stored entries
julia> B[1] = 1;
julia> B[1] = -0.0; B[2] = -0.0;
julia> B[1], B[2]
(-0.0, 0.0)

Proposal:

  1. Change in sparse/*jl all the checks if x is zero by the call iszero(x).

  2. Amend the documentation with a hint, that the element types of SparseArrays need to define the `zero(::Type{T}) method.

  3. Change the values of all AbstractFloat zero values -0.0 to +0.0 of the same type, before they are stored into nzvals.

  4. Change the documentation about getindex(::AbstractSparseArray{T},...) to mention, that all returned zeros have positive signs for T<:Number.

  5. Define a fallback method Base.zero(x) = zero(typeof(x)) and Base.zero(::Type{T}) where T = error("..."). That would be in accordance to current doc of zero.

@KristofferC
Copy link
Member

Could you give a code snippet as an example where these "inconsistencies" lead to an inconsistent result?

@fredrikekre fredrikekre added the sparse Sparse arrays label Nov 26, 2017
@ViralBShah
Copy link
Member

The issue needs a better title! Would be nice to have consistency.

@KlausC KlausC changed the title SparseMatrix and SparseArray have unclear design SparseMatrix and SparseArray don't consitently define zero elements and allowed element types Nov 26, 2017
@KlausC
Copy link
Contributor Author

KlausC commented Nov 26, 2017

Another code snippet, demonstrating the need (v0.6.1 and v0.7-DEV):

julia> using Base.Dates

julia> S = sparse([ Second(0); Second(1)])
2-element SparseVector{Base.Dates.Second,Int64} with 2 stored entries:
  [1]  =  0 seconds
  [2]  =  1 second
julia> zero(Second(1))
0 seconds
julia> iszero(Second(0))
true
julia> Second(0) == 0
false

Generally it is not possible to take advantage of the storage economy of sparse arrays for dimensionful or unitful element types.

@KristofferC
Copy link
Member

Ref #9325

@KlausC
Copy link
Contributor Author

KlausC commented Nov 26, 2017

@KristofferC : I see that this issue has bee discussed 3 years ago. The main argument then, not to change anything (as I undestood), was performance.
I cannot verify that with the current version of Julia. The timing of x == 0 and iszero(x) does not seem to differ for x isa Union{Float64,BigInt,BigFloat}.
Maybe it is time now to review the arguments.

@yurivish
Copy link
Contributor

I ran into this while giving a demonstration of Julia to a friend. We had a custom struct type with zero and iszero defined and wanted to make a sparse matrix out of a dense matrix containing a custom type.

Due to the != 0 check, there was no way for us to communicate our zero value to the sparse matrix code, and the zeros were thus represented explicitly in the sparse matrix:

julia> struct MyType
           val::Bool
       end

julia> Base.zero(::Type{MyType}) = MyType(false)

julia> Base.iszero(x::MyType) = x.val == false

julia> sparse([MyType(true), MyType(false)])
2-element SparseVector{MyType,Int64} with 2 stored entries:
  [1]  =  MyType(true)
  [2]  =  MyType(false)

@ekinakyurek
Copy link
Contributor

I had same issue, is there a progress on this issue ? What are the challenges ?

@KristofferC
Copy link
Member

In best case, changing some x == 0 to iszero(x).

@Lecrapouille
Copy link

@yurivish I made a PR on your problem. Now returning:

julia> sparse([MyType(true), MyType(false)])
2-element SparseVector{MyType,Int64} with 1 stored entry:
  [1]  =  MyType(true)

Is it ok for you ?

@yurivish
Copy link
Contributor

Thanks! That is the behavior I think is right in this case.

@KlausC KlausC closed this as completed Feb 5, 2020
@Lecrapouille
Copy link

@KlausC do you think my commit has a chance to be merged?

@KlausC
Copy link
Contributor Author

KlausC commented Feb 6, 2020

@Lecrapouille, which PR do you mean exactly? You know I have been trying to clean-up some of the inconsistencies in SparseMatrixCSC for a while, but found unexpected resistance. So my answer to this question is not important. You should ask people with permissions to merge.

The bugs you point out in issue https://github.com/JuliaLang/julia/issues/33332 are still not fixed. I support your fixes mentioned there, but lost overview of which is the PR providing the fix.

BTW, the example you mention here: #24790 (comment) seems to work for me without your PR merged. (not in v"1.2" but in v"1.5.0")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

7 participants