
Fix #18974 (promotion in sparse unary broadcast methods) #18975

Closed · wants to merge 1 commit

Conversation

@Sacha0 (Member) commented Oct 16, 2016

Without this PR

julia> sin.(spdiagm(1:4))
ERROR: InexactError()
 in convert(::Type{Int64}, ::Float64) at ./float.jl:637
 in _broadcast_unary_nz2z_z2z_T(::Base.#sin, ::SparseMatrixCSC{Int64,Int64}, ::Type{Int64}) at ./sparse/sparsematrix.jl:1425
 in broadcast(::Base.#sin, ::SparseMatrixCSC{Int64,Int64}) at ./sparse/sparsematrix.jl:1403

whereas with this PR

julia> sin.(spdiagm(1:4))
4×4 sparse matrix with 4 Float64 nonzero entries:
        [1, 1]  =  0.841471
        [2, 2]  =  0.909297
        [3, 3]  =  0.14112
        [4, 4]  =  -0.756802

Best!
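
For context, the one-line change computes the output element type with Base.promote_op(f, Tv) instead of reusing the input element type Tv, so sin's Float64 results are no longer forced back into Int64. A quick check of the promotion the patched method relies on (expected output shown):

julia> Base.promote_op(sin, Int64)
Float64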

@Sacha0 added the sparse (Sparse arrays) label on Oct 16, 2016
# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
let
A = spdiagm(1:4)
@test eltype(expm1.(A)) <: AbstractFloat # representative for _unary_nz2nz_z2z class
Contributor commented:

may as well be more specific?

Member Author (@Sacha0) commented:

Good call. Forced Int64/Float64 in the tests. Thanks!

@@ -1433,7 +1433,7 @@ function _broadcast_unary_nz2z_z2z_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCS
     return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
 end
 function _broadcast_unary_nz2z_z2z{Tv}(f::Function, A::SparseMatrixCSC{Tv})
-    _broadcast_unary_nz2z_z2z_T(f, A, Tv)
+    _broadcast_unary_nz2z_z2z_T(f, A, promote_op(f, Tv))
Member commented:

Wouldn't it be better to call broadcast(f, A.nzval) and just pass the resulting array? That way we re-use the type computations in broadcast (which are more sophisticated than promote_op for non-empty arrays).
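
A minimal sketch of that suggestion for the nz2nz_z2z class (hypothetical helper name, not part of the PR; it assumes A's stored-value buffers have length nnz(A)), essentially what the newbcnv benchmark function further down does:

function nz2nz_bc_sketch{Tv,Ti}(f::Function, A::SparseMatrixCSC{Tv,Ti})
    # Reuse A's structure; broadcasting over the stored values lets
    # broadcast's own type computation pick the output element type.
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, copy(A.colptr), copy(A.rowval), Bnzval)
end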

Contributor commented:

For nz2nz_z2z that would probably make sense. I think there was a different issue that discussed removing the nz2z_z2z class.

Member (@stevengj) commented Oct 17, 2016:

Even for nz2z_z2z you could use broadcast(f, A.nzval) and filter out zeros. You could even do it in-place (with a resize! at the end), without wasting much memory if the fraction of zeros introduced is small.
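
A minimal sketch of that zero-filtering idea (hypothetical name, not from the PR; assumes length(A.nzval) >= nnz(A)):

function nz2z_bc_sketch{Tv,Ti}(f::Function, A::SparseMatrixCSC{Tv,Ti})
    Bnzval  = broadcast(f, A.nzval)   # broadcast picks the output eltype
    Bcolptr = Vector{Ti}(A.n + 1)
    Browval = Vector{Ti}(nnz(A))
    Bcolptr[1] = 1
    bk = 0
    @inbounds for j in 1:A.n
        for k in A.colptr[j]:(A.colptr[j+1] - 1)
            v = Bnzval[k]
            if v != 0
                bk += 1
                Browval[bk] = A.rowval[k]
                Bnzval[bk] = v   # compact in place; bk <= k always holds
            end
        end
        Bcolptr[j+1] = bk + 1
    end
    resize!(Browval, bk)   # drop the slack left by filtered-out zeros
    resize!(Bnzval, bk)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end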

Member Author (@Sacha0) commented:

Great idea, that. Another rewrite of sparse unary broadcast is on my agenda. Would you prefer to (1) merge this PR as an interim fix, or (2) close this PR and incorporate a better fix for #18974 (specifically your suggestion) as part of that rewrite? Thanks!

Member commented:

I'd prefer to focus on fixing broadcast generally for sparse-matrix types, since an interim solution is not urgent.

Member Author (@Sacha0) commented:

@stevengj, I benchmarked two versions of your suggestion (for nz2nz_z2z-class unary operations). The overhead from calling broadcast(f, view(A.nzval, 1:nnz(A))) (and to a lesser degree broadcast(f, A.nzval)) seems substantial for small to moderate size sparse matrices. Thoughts? (Benchmarks below). Best!

function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv})
    oldbc_T(f, A, Base.promote_op(f, Tv))
end
function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    Bnzval = Vector{TvB}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    @inbounds @simd for k in 1:nnz(A)
        Bnzval[k] = f(A.nzval[k])
    end
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

using BenchmarkTools

for N in [10^2, 10^3, 10^4]
    println(N)
    A = sprand(N, N, 0.1)
    oldbench = @benchmark oldbc($abs, $A)
    newbench = @benchmark newbc($abs, $A)
    newnvbench = @benchmark newbcnv($abs, $A)
    println(ratio(median(newnvbench), median(oldbench)))
    println(ratio(median(newbench), median(oldbench)))
end

yields

100
BenchmarkTools.TrialRatio:
  time:             11.641124140719773
  gctime:           1.0
  memory:           1.0632790028763184
  allocs:           7.75
  time tolerance:   5.00%
  memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
  time:             11.815406389001213
  gctime:           1.0
  memory:           1.0661553211888783
  allocs:           8.0
  time tolerance:   5.00%
  memory tolerance: 1.00%
1000
BenchmarkTools.TrialRatio:
  time:             1.2436132534507298
  gctime:           1.0
  memory:           1.0006739279095351
  allocs:           5.833333333333333
  time tolerance:   5.00%
  memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
  time:             1.6859576533933556
  gctime:           1.0
  memory:           1.000703660023191
  allocs:           6.0
  time tolerance:   5.00%
  memory tolerance: 1.00%
10000
BenchmarkTools.TrialRatio:
  time:             1.0039543258604076
  gctime:           1.0021435474746039
  memory:           1.0000067974835716
  allocs:           5.142857142857143
  time tolerance:   5.00%
  memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
  time:             1.2308898483951354
  gctime:           1.004561220662931
  memory:           1.0000070973725528
  allocs:           5.285714285714286
  time tolerance:   5.00%
  memory tolerance: 1.00%

Member commented:

That's odd; when I've tried broadcast on any non-tiny array, it's usually nearly as fast as a manual loop. Can you drill down to find the source of the slowdown?

Member Author (@Sacha0) commented Oct 22, 2016:

Drilling down a bit. What is your metric for tiny? Best! (Edit: I originally forgot to interpolate globals in the benchmarks below. Fixing that substantially decreased the performance gap. Perhaps something went amiss with the benchmarks above as well... (Edit edit: No, the benchmarks above were correct. Mystery elsewhere.))

function manmap(f, x)
    y = similar(x)
    @inbounds @simd for k in indices(x, 1)
        y[k] = f(x[k])
    end
    return y
end

using BenchmarkTools

foo = ones(10^3);
mapbench = @benchmark map($abs, $foo);
manbench = @benchmark manmap($abs, $foo);
bcbench = @benchmark broadcast($abs, $foo);
ratio(median(mapbench), median(manbench))
ratio(median(bcbench), median(manbench))

yields

julia> ratio(median(mapbench), median(manbench))
BenchmarkTools.TrialRatio:
  time:             1.1104553119730185
  gctime:           1.0
  memory:           1.0019685039370079
  allocs:           2.0
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> ratio(median(bcbench), median(manbench))
BenchmarkTools.TrialRatio:
  time:             1.0185497470489038
  gctime:           1.0
  memory:           1.0
  allocs:           1.0
  time tolerance:   5.00%
  memory tolerance: 1.00%

(Julia Commit 438d1ea (2016-10-22 07:39 UTC) / Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz)

Member Author (@Sacha0) commented:

Mystery solved. Specialization on the type of the function being broadcast is not occurring:

function structure{TvA,TiA}(A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    return Bcolptr, Browval
end
function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv})
    oldbc_T(f, A, Base.promote_op(f, Tv))
end
function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB})
    Bcolptr, Browval = structure(A)
    Bnzval = Vector{TvB}(nnz(A))
    @inbounds @simd for k in 1:nnz(A)
        Bnzval[k] = f(A.nzval[k])
    end
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

using BenchmarkTools

N = 10^2;
A = sprand(N, N, 0.1);
oldbcbench = @benchmark oldbc($abs, $A);
newbcbench = @benchmark newbc($abs, $A);
newbcnvbench = @benchmark newbcnv($abs, $A);
newbc_fsbench = @benchmark newbc_forcespec($abs, $A);
newbcnv_fsbench = @benchmark newbcnv_forcespec($abs, $A);

begin
    println("new with view versus old")
    println(ratio(median(newbcbench), median(oldbcbench)))
    println("new without view versus old")
    println(ratio(median(newbcnvbench), median(oldbcbench)))
    println("new with view and forced function-type specialization versus old")
    println(ratio(median(newbc_fsbench), median(oldbcbench)))
    println("new (without view) and forced function-type specialization versus old")
    println(ratio(median(newbcnv_fsbench), median(oldbcbench)))
end

yields

new with view versus old
BenchmarkTools.TrialRatio:
  time:             11.564562706270626
  gctime:           1.0
  memory:           1.0655270655270654
  allocs:           6.6
  time tolerance:   5.00%
  memory tolerance: 1.00%
new without view versus old
BenchmarkTools.TrialRatio:
  time:             11.552599009900991
  gctime:           1.0
  memory:           1.0626780626780628
  allocs:           6.4
  time tolerance:   5.00%
  memory tolerance: 1.00%
new with view and forced function-type specialization versus old
BenchmarkTools.TrialRatio:
  time:             1.5556930693069306
  gctime:           1.0
  memory:           1.0037986704653372
  allocs:           1.4
  time tolerance:   5.00%
  memory tolerance: 1.00%
new (without view) and forced function-type specialization versus old
BenchmarkTools.TrialRatio:
  time:             0.9927805280528053
  gctime:           1.0
  memory:           1.0009496676163343
  allocs:           1.2
  time tolerance:   5.00%
  memory tolerance: 1.00%

So with forced specialization this approach performs well, though using a view incurs some overhead. Should forcing specialization be necessary? Best!
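
For illustration (toy functions, not part of the PR): Julia heuristically may not specialize on a Function-typed argument that is merely passed through to another call rather than called directly, whereas a function-type parameter forces specialization, which is what the _forcespec variants above exploit:

passthrough(f::Function, x) = broadcast(f, x)       # f only forwarded; may not specialize on typeof(f)
forcespec{F<:Function}(f::F, x) = broadcast(f, x)   # type parameter forces specialization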

Member commented:

@JeffBezanson, this seems like undesirable behavior; I don't think we want <:Function to be necessary to get good performance in higher-order functions?

# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
let
A = spdiagm(Int64(1):Int64(4))
@test is(eltype(expm1.(A)), Float64) # representative for _unary_nz2nz_z2z class
Contributor (@tkelman) commented Oct 19, 2016:

use `isa` or `===`, since `is` was recently deprecated

Member Author (@Sacha0) commented:

Good catch. Fixed. Thanks!

Contributor commented:

Sorry, I think it'll need to be `eltype(expm1.(A)) === Float64`, or use `isa` with the array type.
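
For example, either of these forms would do (a sketch of the adjusted assertion, given A = spdiagm(Int64(1):Int64(4)) as above):

@test eltype(expm1.(A)) === Float64
@test isa(expm1.(A), SparseMatrixCSC{Float64,Int64})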

@@ -1635,3 +1635,12 @@ let
@test isa(abs.(A), SparseMatrixCSC) # representative for _unary_nz2nz_z2z class
@test isa(exp.(A), Array) # representative for _unary_nz2nz_z2nz class
end

# Check that `broadcast` methods specialized for unary operations over
# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
Contributor commented:

#19065 didn't have a test; should this be added?

Member Author (@Sacha0) commented:

Good catch, will prepare a short PR. Thanks for the reminder!

Sacha0 added a commit to Sacha0/julia that referenced this pull request Nov 26, 2016
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.
Sacha0 added a commit to Sacha0/julia that referenced this pull request Nov 27, 2016
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.
Sacha0 added a commit to Sacha0/julia that referenced this pull request Dec 1, 2016
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.