`map` on sparse arrays does not work with non-numeric data #19561

amitmurthy · 2016-12-12T03:44:29Z

julia> map(x-> x != 0.0 ? "Hello" : x, sparse(eye(4)))
ERROR: MethodError: no method matching zero(::Type{Any})

Also a problem when mixing ints and floats.

julia> map(x-> x != 0.0 ? 1 : x, sparse(eye(4)))
ERROR: MethodError: no method matching zero(::Type{Any})

If sparse arrays are not meant to be used with non-numeric data, we should at least throw a better error message.

oscardssmith · 2016-12-12T03:52:22Z

What version of Julia are you using? On 0.5.0 this works.

amitmurthy · 2016-12-12T03:54:06Z

master. map was recently implemented for sparse arrays.

Sacha0 · 2016-12-12T05:33:09Z

The new map[!]/broadcast[!] methods for SparseMatrixCSCs check whether the map/broadcast operation yields zero when all arguments are zero. Specifically, for e.g. map(f, A, B), the check is

fofzeros = f(zero(eltype(A)), zero(eltype(B)))
fpreszeros = fofzeros == zero(fofzeros)

This check requires that (1) the input array eltypes provide zero and (2) the return type of f (evaluated for arguments of the input array eltypes) provides zero.
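As a rough sketch of the check described above, the following wraps it in a standalone function. The name `preserves_zeros` is hypothetical (it is not part of Base or SparseArrays), and the snippet assumes a recent Julia where `SparseArrays` is a standard library. It shows the check succeeding, the check reporting a non-zero-preserving function, and requirement (2) failing for a `String` result.

```julia
using SparseArrays

# Hypothetical helper (not part of Base); mirrors the check quoted above.
function preserves_zeros(f, A, B)
    fofzeros = f(zero(eltype(A)), zero(eltype(B)))
    return fofzeros == zero(fofzeros)
end

A = sparse([1, 2], [1, 2], [1.0, 2.0])
B = sparse([1, 2], [1, 2], [3.0, 4.0])

plus_ok = preserves_zeros(+, A, B)                      # zeros in, zero out
shifted = preserves_zeros((x, y) -> x + y + 1, A, B)    # f(0, 0) is 1.0, not zero

# A String result provides no `zero`, so requirement (2) is violated
# and the check itself throws a MethodError.
threw = try
    preserves_zeros((x, y) -> "Hello", A, B)
    false
catch err
    err isa MethodError
end
```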

The first requirement is fundamental: A sparse matrix C is only well defined if zero(eltype(C)) is well defined, as otherwise C's unstored entries are not well defined (in which case mapping over C makes little sense).

The second requirement is relaxable: We need only the result of the hypothetical iszero(fofzeros). Not having iszero, we must use either fofzeros == zero(fofzeros) or fofzeros == 0 (alternatives?). The former handles units gracefully, but fails where zero(fofzeros) is not defined. The latter does not handle units gracefully, but works in some other cases where zero(fofzeros) is not defined.

Both examples above violate the second requirement and would work given iszero (without tradeoff) or with fofzeros == 0 in place of fofzeros == zero(fofzeros) (trading off handling units).

Thoughts? Best!

oscardssmith · 2016-12-12T05:46:53Z

Alternatively, wouldn't defining zero for type Any work? I'm not sure what that would be, though.

tkelman · 2016-12-12T06:02:56Z

If you only ever do operations on the stored entries (or their locations), then it doesn't always matter whether the unstored entries have a realizable value in the same element type as the stored entries. I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.

We should just add iszero.

Sacha0 · 2016-12-12T20:15:16Z

> If you only ever do operations on the stored entries (or their locations), then it doesn't always matter whether the unstored entries have a realizable value in the same element type as the stored entries. I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.

Absolutely, SparseMatrixCSCs with only stored entries well defined have utility. Applying whole-array operations (e.g. map) to such objects makes little sense though; the hypothetical mapstored seems the appropriate operation in that case.

> We should just add iszero.

+1 :). Best!

nalimilan · 2016-12-12T20:28:03Z

But how would iszero help here? We wouldn't define it on Symbol, String nor Function, right?

Sacha0 · 2016-12-12T20:55:33Z

> But how would iszero help here? We wouldn't define it on Symbol, String nor Function, right?

In both cases above, iszero(fofzeros) could check fofzeros == 0 rather than fofzeros == zero(fofzeros), skirting lack of definition of zero for Any?

Sacha0 · 2016-12-12T21:11:15Z

Looked closer and I'm mistaken. The zero preservation check discussed above works as written in these cases. Rather, checking whether output entry Cx = f(xs...) is zero via Cx != zero(eltype(C)) causes the reported failures. Replacing that check with iszero(Cx), or shy of that Cx != zero(Cx), would fix the second reported failure (the output sparse matrix having mixed numeric type). But with a bit more thought, the present semantics of map[!]/broadcast[!] over sparse matrices (not storing zeros) preclude making the first reported failure work without the pun of defining zero for Strings (or at least comparison with 0). Best!

Sacha0 · 2016-12-12T21:26:46Z

> But with a bit more thought, the present semantics of map[!]/broadcast[!] over sparse matrices (not storing zeros) preclude making the first reported failure work without the pun of defining zero for Strings (or at least comparison with 0).

That thought was silly. iszero would save us in that case as well, there being no problem with comparison against 0: Replacing Cx != zero(eltype(C)) with iszero(Cx) and having the fallback iszero(x) = x == 0 would fix the first reported failure as well. Thoughts? Best!
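To illustrate the difference between the two checks, here is a minimal sketch. `permissive_iszero` is a hypothetical name standing in for the proposed fallback `iszero(x) = x == 0`; it is not how Base defines `iszero`. The sketch assumes a recent Julia, where comparing a `String` with `0` falls back to generic `==` and returns `false`, while `zero(Any)` has no method.

```julia
# Hypothetical stand-in for the proposed fallback; not Base's `iszero`.
permissive_iszero(x) = x == 0

is_zero_float = permissive_iszero(0.0)      # numeric zero is detected
is_zero_int   = permissive_iszero(1)        # a nonzero Int is kept
is_zero_str   = permissive_iszero("Hello")  # a String never equals 0

# The check Cx != zero(eltype(C)) needs zero(Any) when eltype(C) is Any,
# which reproduces the reported MethodError.
zero_any_fails = try
    zero(Any)
    false
catch err
    err isa MethodError
end
```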

stevengj · 2016-12-16T20:58:42Z

Using sparse arrays/vectors for non-algebraic data (types that do not define zero) seems like a bad pun to me. These are linear-algebra objects.

stevengj · 2016-12-16T21:05:17Z

For example, what does getindex return for the non-stored entries if zero is not defined?

stevengj · 2016-12-16T21:06:44Z

If you just want a general "sparse" associative "array" from Int to a random type T, you should use Dict or similar.

stevengj · 2016-12-16T21:11:01Z

And if you want to map the nonzero elements of a sparse array to non-numeric values, you should use map(f, nonzeros(A)).
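A small sketch of that suggestion, assuming a recent Julia with the `SparseArrays` standard library. `nonzeros` returns the stored values in column-major order, and `findnz` returns the row indices, column indices, and values together, so the mapped results can be paired with their positions. The labeling function and variable names are made up for illustration.

```julia
using SparseArrays

A = sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0])

# Map only the stored values; the result is an ordinary Vector{String}.
labels = map(x -> x > 1.5 ? "big" : "small", nonzeros(A))

# Keep the positions as well by going through findnz.
rows, cols, vals = findnz(A)
labeled = Dict((i, j) => (v > 1.5 ? "big" : "small")
               for (i, j, v) in zip(rows, cols, vals))
```

The unstored entries are never visited, so no `zero` is needed for the output type.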

tkelman · 2016-12-16T21:17:23Z

Dict isn't CSC. It's useful on occasion to have a sparse array that has the same index iteration structure as a numeric SparseMatrixCSC, with effectively undefined non-stored entries. Linear algebra operations won't work on them, but many array manipulation operations will.

stevengj · 2016-12-16T21:22:53Z

Lots of things are occasionally useful but lead to poor library design. We normally try to avoid bad type puns in Base.

stevengj · 2016-12-16T21:29:04Z

Could you give a concrete example where nonzeros(A) would not suffice?

tkelman · 2016-12-16T21:34:32Z

We allow non-numeric or heterogeneously typed dense arrays. Allowing for the same with sparse matrices is a feature that it would be a regression to lose. Not being able to do linear algebra on some array types doesn't always make it a pun or bad library design to support.

For this particular example of map I agree nonzeros is mostly good enough. Will have to check whether the assumption that zero(eltype(A)) exists impacts other use cases.

stevengj · 2016-12-16T23:18:42Z

The definition of sparsity here is "mostly zero", which is what makes it seem like a pun, and rather different from an Associative type (a partial function), which is what you seem to want to use it for.

stevengj · 2016-12-16T23:22:54Z

The generalization would make more sense if we allowed an arbitrary default value, but that could cause trouble with code expecting "sparse" to mean "default zero".

Maybe we should have a DefaultArray type with an arbitrary default value, and have SparseMatrixCSC be a subtype?
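To make the idea concrete, here is a minimal sketch of what an array with an arbitrary default value could look like. Everything in it is an assumption for illustration: the type name `DefaultMatrix`, its fields, and the `Dict`-based storage are hypothetical, and it ignores the CSC layout and the subtype relationship discussed here.

```julia
# Hypothetical type for illustration; no such type exists in Base.
struct DefaultMatrix{T} <: AbstractMatrix{T}
    default::T                       # value returned for unstored entries
    stored::Dict{Tuple{Int,Int},T}   # explicitly stored entries
    dims::Tuple{Int,Int}
end

Base.size(A::DefaultMatrix) = A.dims

function Base.getindex(A::DefaultMatrix, i::Int, j::Int)
    @boundscheck checkbounds(A, i, j)
    return get(A.stored, (i, j), A.default)
end

M = DefaultMatrix("", Dict((1, 1) => "a", (2, 2) => "b"), (2, 2))

stored_entry  = M[1, 1]          # explicitly stored
default_entry = M[2, 1]          # falls back to the default ""
upper = map(uppercase, M)        # generic fallback; produces a dense Matrix
```

With a default of `""`, `getindex` is well defined for every position even though `String` provides no `zero`.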

Sacha0 · 2016-12-16T23:25:58Z

Ref. https://github.com/JuliaComputing/IndexedTables.jl

oscardssmith · 2016-12-17T00:52:36Z

@stevengj this is a really good idea. It would also provide a path where we don't return dense arrays when we broadcast functions that do not preserve zeros, which could be a huge win for memory usage.

stevengj · 2016-12-17T01:15:38Z

If we want to use it for avoiding dense broadcast output, then the sparse case can't be a subtype (wouldn't be type stable). I dunno, maybe all sparse arrays should be DefaultArrays after all. @StefanKarpinski was arguing for this in another issue, and I was skeptical, but maybe it's worth the trouble.

oscardssmith · 2016-12-17T02:16:51Z

The really big advantage would be that all broadcasts would be efficient and produce the same type no matter what. Also, if I understood his proposal correctly, we could do even better: his was a way to cheat a default out of a sparse array, but if it were going into Base, we could further simplify the code.

Sacha0 · 2016-12-18T21:33:20Z

For a potential stopgap solution, please see #17623 (comment). Best!

If some feel strongly about the stricter iszero definition while others feel strongly about {map/broadcast over sparse matrices where the output element types don't provide zero}, we can accommodate both desires with a strict iszero definition and a _sparsebc_iszero which wraps iszero but provides the permissive fallback.

Sacha0 · 2017-03-02T18:47:46Z

Type inference seems to have improved (:tada:) to the point that half of the tests for this issue are ineffective:

julia> intoneorfloatzero(x) = x != 0.0 ? Int(1) : Float64(x)
intoneorfloatzero (generic function with 1 method)

julia> foo = map(intoneorfloatzero, speye(4))
4×4 SparseMatrixCSC{Union{Float64, Int64},Int64} with 4 stored entries:
  [1, 1]  =  1
  [2, 2]  =  1
  [3, 3]  =  1
  [4, 4]  =  1

julia> eltype(foo)
Union{Float64, Int64}

julia> zero(eltype(foo))
0

(Previously zero(eltype(foo)) would fail.) Additionally, I found four code paths not fixed by #19589 and missed by the existing tests. Stronger tests and fix inbound. Best!

amitmurthy · 2017-05-22T05:39:26Z

Closed by #20862 I believe.

andreasnoack · 2017-07-26T13:28:57Z

As mentioned in #22945 (comment), the fix here might cause some trouble elsewhere. @amitmurthy would you be able to share some real-world examples where map on a SparseMatrix{String,Int} would be useful?

amitmurthy · 2017-07-26T15:15:42Z

The specific requirement was for a SparseMatrix{Ref,Int} in light of how asyncmap (and hence pmap) is implemented. While still a TODO -

julia/base/asyncmap.jl

Lines 263 to 268 in 0a37b3d

    
    # TODO: Optimize for sparse arrays
    # For now process as regular arrays and convert back
    function asyncmap(f, s::AbstractSparseArray...; kwargs...)
        sa = map(Array, s)
        return sparse(asyncmap(f, sa...; kwargs...))
    end

- it can be optimized if SparseMatrix{Ref,Int} continues to be supported.

Other than the above I don't have any other real-world examples for non-numeric sparse arrays.

Sacha0 · 2017-07-26T22:44:20Z

Above @tkelman mentions use cases for SparseMatrixCSCs of symbols and/or functions:

> I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.

tkelman added sparse Sparse arrays regression Regression in behavior compared to a previous version labels Dec 12, 2016

oscardssmith mentioned this issue Dec 13, 2016

Taking Sparse Arrays Seriously #19573

Closed

Sacha0 mentioned this issue Dec 14, 2016

Fix #19561 (sparse map/broadcast where the output eltype is not a concrete subtype of Number) #19589

Merged

pabloferz mentioned this issue Dec 15, 2016

Make sparse operations less dependent on inference #19611

Closed

martinholters mentioned this issue Dec 16, 2016

Remove explicit dependence of sparse broadcast on type inference #19623

Closed

Sacha0 mentioned this issue Dec 16, 2016

make "dot" operations (.+ etc) fusing broadcasts #17623

Merged

10 tasks

Sacha0 closed this as completed in #19589 Jan 1, 2017

Sacha0 added a commit that referenced this issue Jan 1, 2017

Merge pull request #19589 from Sacha0/mapbciszero

1539061

Fix #19561 (sparse map/broadcast where the output eltype is not a concrete subtype of Number)

Sacha0 reopened this Mar 2, 2017

Sacha0 mentioned this issue Mar 2, 2017

fix #19561 v2 (sparse broadcast[!] where output eltype is not a concrete <:Number) #20862

Merged

Sacha0 added a commit to Sacha0/julia that referenced this issue Mar 2, 2017

Fix sparse broadcast[!] for some cases where the output eltype is not…

3dc07d6

… a concrete subtype of Number (JuliaLang#19561, later part).

tkelman pushed a commit that referenced this issue Mar 4, 2017

Fix sparse broadcast[!] for some cases (#20862)

bb76add

where the output eltype is not a concrete subtype of Number (#19561, later part).

amitmurthy closed this as completed May 22, 2017

andreasnoack mentioned this issue Jul 26, 2017

Fix vecnorm for Vector{Vector{T}} #22945

Merged

abraunst mentioned this issue Jan 14, 2022

sparse arrays with algebraic, non-numerical data JuliaSparse/SparseArrays.jl#46

Closed
