Fix #18974 (promotion in sparse unary broadcast methods) #18975

stevengj · 2016-10-17T17:36:29Z

Wouldn't it be better to call broadcast(f, A.nzval) and just pass the resulting array? That way we re-use the type computations in broadcast (which are more sophisticated than promote_op for non-empty arrays).

For nz2nz_z2z that would probably make sense. I think there was a different issue that discussed removing the nz2z_z2z class.

Even for nz2z_z2z you could use broadcast(f, A.nzval) and filter out zeros. You could even do it in-place (with a resize! at the end), without wasting much memory if the fraction of zeros introduced is small.

Great idea that. Another rewrite of sparse unary broadcast is on my agenda. Would you prefer to (1) merge this PR as an interim fix or (2) close this PR and incorporate a better fix for #18974 (specifically your suggestion) as part of that rewrite? Thanks!

I'd prefer to focus on fixing broadcast generally for sparse-matrix types, since an interim solution is not urgent.

@stevengj, I benchmarked two versions of your suggestion (for nz2nz_z2z-class unary operations). The overhead from calling broadcast(f, view(A.nzval, 1:nnz(A))) (and to a lesser degree broadcast(f, A.nzval)) seems substantial for small to moderate size sparse matrices. Thoughts? (Benchmarks below). Best!

function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv}) oldbc_T(f, A, Base.promote_op(f, Tv)) end function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB}) Bcolptr = Vector{TiA}(A.n + 1) Browval = Vector{TiA}(nnz(A)) Bnzval = Vector{TvB}(nnz(A)) copy!(Bcolptr, 1, A.colptr, 1, A.n + 1) copy!(Browval, 1, A.rowval, 1, nnz(A)) @inbounds @simd for k in 1:nnz(A) Bnzval[k] = f(A.nzval[k]) end return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA}) Bcolptr = Vector{TiA}(A.n + 1) Browval = Vector{TiA}(nnz(A)) copy!(Bcolptr, 1, A.colptr, 1, A.n + 1) copy!(Browval, 1, A.rowval, 1, nnz(A)) Bnzval = broadcast(f, view(A.nzval, 1:nnz(A))) return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA}) Bcolptr = Vector{TiA}(A.n + 1) Browval = Vector{TiA}(nnz(A)) copy!(Bcolptr, 1, A.colptr, 1, A.n + 1) copy!(Browval, 1, A.rowval, 1, nnz(A)) Bnzval = broadcast(f, A.nzval) return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end using BenchmarkTools for N in [10^2, 10^3, 10^4] println(N) A = sprand(N, N, 0.1) oldbench = @benchmark oldbc($abs, $A) newbench = @benchmark newbc($abs, $A) newnvbench = @benchmark newbcnv($abs, $A) println(ratio(median(newnvbench), median(oldbench))) println(ratio(median(newbench), median(oldbench))) end

yields

100 BenchmarkTools.TrialRatio: time: 11.641124140719773 gctime: 1.0 memory: 1.0632790028763184 allocs: 7.75 time tolerance: 5.00% memory tolerance: 1.00% BenchmarkTools.TrialRatio: time: 11.815406389001213 gctime: 1.0 memory: 1.0661553211888783 allocs: 8.0 time tolerance: 5.00% memory tolerance: 1.00% 1000 BenchmarkTools.TrialRatio: time: 1.2436132534507298 gctime: 1.0 memory: 1.0006739279095351 allocs: 5.833333333333333 time tolerance: 5.00% memory tolerance: 1.00% BenchmarkTools.TrialRatio: time: 1.6859576533933556 gctime: 1.0 memory: 1.000703660023191 allocs: 6.0 time tolerance: 5.00% memory tolerance: 1.00% 10000 BenchmarkTools.TrialRatio: time: 1.0039543258604076 gctime: 1.0021435474746039 memory: 1.0000067974835716 allocs: 5.142857142857143 time tolerance: 5.00% memory tolerance: 1.00% BenchmarkTools.TrialRatio: time: 1.2308898483951354 gctime: 1.004561220662931 memory: 1.0000070973725528 allocs: 5.285714285714286 time tolerance: 5.00% memory tolerance: 1.00%

That's odd, when I've tried broadcast on any non-tiny array it's usually nearly as fast as a manual loop. Can you drill down to find the source of the slowdown?

Drilling down a bit. What is your metric for tiny? Best! (Edit: I originally forgot to interpolate globals in the benchmarks below. Fixing that substantially decreased the performance gap. Perhaps something went amiss with the benchmarks above as well... (Edit edit: No, the benchmarks above were correct. Mystery elsewhere.))

function manmap(f, x) y = similar(x) @inbounds @simd for k in indices(x, 1) y[k] = f(x[k]) end return y end using BenchmarkTools foo = ones(10^3); mapbench = @benchmark map($abs, $foo); manbench = @benchmark manmap($abs, $foo); bcbench = @benchmark broadcast($abs, $foo); ratio(median(mapbench), median(manbench)) ratio(median(bcbench), median(manbench))

yields

julia> ratio(median(mapbench), median(manbench)) BenchmarkTools.TrialRatio: time: 1.1104553119730185 gctime: 1.0 memory: 1.0019685039370079 allocs: 2.0 time tolerance: 5.00% memory tolerance: 1.00% julia> ratio(median(bcbench), median(manbench)) BenchmarkTools.TrialRatio: time: 1.0185497470489038 gctime: 1.0 memory: 1.0 allocs: 1.0 time tolerance: 5.00% memory tolerance: 1.00%

(Julia Commit 438d1ea (2016-10-22 07:39 UTC) / Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz)

Mystery solved. Specialization on the type of the function being broadcast is not occurring:

function structure{TvA,TiA}(A::SparseMatrixCSC{TvA,TiA}) Bcolptr = Vector{TiA}(A.n + 1) Browval = Vector{TiA}(nnz(A)) copy!(Bcolptr, 1, A.colptr, 1, A.n + 1) copy!(Browval, 1, A.rowval, 1, nnz(A)) return Bcolptr, Browval end function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv}) oldbc_T(f, A, Base.promote_op(f, Tv)) end function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB}) Bcolptr, Browval = structure(A) Bnzval = Vector{TvB}(nnz(A)) @inbounds @simd for k in 1:nnz(A) Bnzval[k] = f(A.nzval[k]) end return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA}) Bcolptr, Browval = structure(A) Bnzval = broadcast(f, view(A.nzval, 1:nnz(A))) return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA}) Bcolptr, Browval = structure(A) Bnzval = broadcast(f, A.nzval) return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end function newbc_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA}) Bcolptr, Browval = structure(A) Bnzval = broadcast(f, view(A.nzval, 1:nnz(A))) return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end function newbcnv_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA}) Bcolptr, Browval = structure(A) Bnzval = broadcast(f, A.nzval) return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval) end using BenchmarkTools N = 10^2; A = sprand(N, N, 0.1); oldbcbench = @benchmark oldbc($abs, $A); newbcbench = @benchmark newbc($abs, $A); newbcnvbench = @benchmark newbcnv($abs, $A); newbc_fsbench = @benchmark newbc_forcespec($abs, $A); newbcnv_fsbench = @benchmark newbcnv_forcespec($abs, $A); begin println("new with view versus old") println(ratio(median(newbcbench), median(oldbcbench))) println("new without view versus old") println(ratio(median(newbcnvbench), median(oldbcbench))) println("new with view and forced function-type specialization versus old") println(ratio(median(newbc_fsbench), median(oldbcbench))) println("new (without view) and forced function-type specialization versus old") println(ratio(median(newbcnv_fsbench), median(oldbcbench))) end

yields

new with view versus old BenchmarkTools.TrialRatio: time: 11.564562706270626 gctime: 1.0 memory: 1.0655270655270654 allocs: 6.6 time tolerance: 5.00% memory tolerance: 1.00% new without view versus old BenchmarkTools.TrialRatio: time: 11.552599009900991 gctime: 1.0 memory: 1.0626780626780628 allocs: 6.4 time tolerance: 5.00% memory tolerance: 1.00% new with view and forced function-type specialization versus old BenchmarkTools.TrialRatio: time: 1.5556930693069306 gctime: 1.0 memory: 1.0037986704653372 allocs: 1.4 time tolerance: 5.00% memory tolerance: 1.00% new (without view) and forced function-type specialization versus old BenchmarkTools.TrialRatio: time: 0.9927805280528053 gctime: 1.0 memory: 1.0009496676163343 allocs: 1.2 time tolerance: 5.00% memory tolerance: 1.00%

So with forced specialization this approach performs well, though using a view incurs some overhead. Should forcing specialization be necessary? Best!

@JeffBezanson, this seems like undesirable behavior; I don't think we want <:Function to be necessary to get good performance in higher-order functions?

tkelman · 2016-11-25T09:59:23Z

#19065 didn't have a test, should this be added?

Good catch, will prepare a short PR. Thanks for the reminder!

-Original file line number
+Diff line change
@@ Expand Up @@
         return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
     end
     function _broadcast_unary_nz2z_z2z{Tv}(f::Function, A::SparseMatrixCSC{Tv})
-        _broadcast_unary_nz2z_z2z_T(f, A, Tv)
+        _broadcast_unary_nz2z_z2z_T(f, A, promote_op(f, Tv))
     end
     @_enumerate_childmethods(_broadcast_unary_nz2z_z2z,
         sin, sinh, sind, asin, asinh, asind,
@@ Expand Down Expand Up @@
         return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
     end
     function _broadcast_unary_nz2nz_z2z{Tv}(f::Function, A::SparseMatrixCSC{Tv})
-        _broadcast_unary_nz2nz_z2z_T(f, A, Tv)
+        _broadcast_unary_nz2nz_z2z_T(f, A, promote_op(f, Tv))
     end
     @_enumerate_childmethods(_broadcast_unary_nz2nz_z2z,
         log1p, expm1, abs, abs2, conj)
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -1635,3 +1635,12 @@ let @@
         @test isa(abs.(A), SparseMatrixCSC) # representative for _unary_nz2nz_z2z class
         @test isa(exp.(A), Array) # representative for _unary_nz2nz_z2nz class
     end
+    # Check that `broadcast` methods specialized for unary operations over
+    # `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
+    let
+        A = spdiagm(Int64(1):Int64(4))
+        @test isa(eltype(expm1.(A)), Float64) # representative for _unary_nz2nz_z2z class
+        @test isa(eltype(exp.(A)), Float64) # representative for _unary_nz2nz_z2nz class
+        @test isa(eltype(sin.(A)), Float64) # representative for _unary_nz2z_z2z class
+    end

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #18974 (promotion in sparse unary broadcast methods) #18975

Diff view

Diff view

There are no files selected for viewing

stevengj Oct 17, 2016

tkelman Oct 17, 2016

stevengj Oct 17, 2016 •

edited

Loading

Sacha0 Oct 19, 2016

stevengj Oct 19, 2016

Sacha0 Oct 21, 2016

stevengj Oct 22, 2016

Sacha0 Oct 22, 2016 •

edited

Loading

Sacha0 Oct 22, 2016

stevengj Oct 23, 2016

tkelman Nov 25, 2016

Sacha0 Nov 26, 2016

Sacha0 Nov 26, 2016

Fix #18974 (promotion in sparse unary broadcast methods) #18975

Fix #18974 (promotion in sparse unary broadcast methods) #18975

Diff view

Diff view

There are no files selected for viewing

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevengj Oct 17, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Sacha0 Oct 22, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevengj Oct 17, 2016 •

edited

Loading

Sacha0 Oct 22, 2016 •

edited

Loading