
Fix #18974 (promotion in sparse unary broadcast methods) #18975

Closed · wants to merge 1 commit

Conversation

@Sacha0 (Member) commented Oct 16, 2016

Without this PR

julia> sin.(spdiagm(1:4))
ERROR: InexactError()
 in convert(::Type{Int64}, ::Float64) at ./float.jl:637
 in _broadcast_unary_nz2z_z2z_T(::Base.#sin, ::SparseMatrixCSC{Int64,Int64}, ::Type{Int64}) at ./sparse/sparsematrix.jl:1425
 in broadcast(::Base.#sin, ::SparseMatrixCSC{Int64,Int64}) at ./sparse/sparsematrix.jl:1403

whereas with this PR

julia> sin.(spdiagm(1:4))
4×4 sparse matrix with 4 Float64 nonzero entries:
        [1, 1]  =  0.841471
        [2, 2]  =  0.909297
        [3, 3]  =  0.14112
        [4, 4]  =  -0.756802

Best!
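
For context, the one-line change computes the output element type with Base.promote_op(f, Tv) instead of reusing the input element type Tv, so sin's Float64 results are no longer forced back into Int64. A quick check of the promotion the patched method relies on (expected output shown):

julia> Base.promote_op(sin, Int64)
Float64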

@Sacha0 added the sparse (Sparse arrays) label on Oct 16, 2016
# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
let
A = spdiagm(1:4)
@test eltype(expm1.(A)) <: AbstractFloat # representative for _unary_nz2nz_z2z class
Contributor commented:

may as well be more specific?

Member Author (@Sacha0) commented:

Good call. Forced Int64/Float64 in the tests. Thanks!

@@ -1433,7 +1433,7 @@ function _broadcast_unary_nz2z_z2z_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCS
     return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
 end
 function _broadcast_unary_nz2z_z2z{Tv}(f::Function, A::SparseMatrixCSC{Tv})
-    _broadcast_unary_nz2z_z2z_T(f, A, Tv)
+    _broadcast_unary_nz2z_z2z_T(f, A, promote_op(f, Tv))
Member commented:

Wouldn't it be better to call broadcast(f, A.nzval) and just pass the resulting array? That way we re-use the type computations in broadcast (which are more sophisticated than promote_op for non-empty arrays).
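
A minimal sketch of that suggestion for the nz2nz_z2z class (hypothetical helper name, not part of the PR; it assumes A's stored-value buffers have length nnz(A)), essentially what the newbcnv benchmark function further down does:

function nz2nz_bc_sketch{Tv,Ti}(f::Function, A::SparseMatrixCSC{Tv,Ti})
    # Reuse A's structure; broadcasting over the stored values lets
    # broadcast's own type computation pick the output element type.
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, copy(A.colptr), copy(A.rowval), Bnzval)
end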

Contributor commented:

For nz2nz_z2z that would probably make sense. I think there was a different issue that discussed removing the nz2z_z2z class.

Member (@stevengj) commented Oct 17, 2016:

Even for nz2z_z2z you could use broadcast(f, A.nzval) and filter out zeros. You could even do it in-place (with a resize! at the end), without wasting much memory if the fraction of zeros introduced is small.
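
A minimal sketch of that zero-filtering idea (hypothetical name, not from the PR; assumes length(A.nzval) >= nnz(A)):

function nz2z_bc_sketch{Tv,Ti}(f::Function, A::SparseMatrixCSC{Tv,Ti})
    Bnzval  = broadcast(f, A.nzval)   # broadcast picks the output eltype
    Bcolptr = Vector{Ti}(A.n + 1)
    Browval = Vector{Ti}(nnz(A))
    Bcolptr[1] = 1
    bk = 0
    @inbounds for j in 1:A.n
        for k in A.colptr[j]:(A.colptr[j+1] - 1)
            v = Bnzval[k]
            if v != 0
                bk += 1
                Browval[bk] = A.rowval[k]
                Bnzval[bk] = v   # compact in place; bk <= k always holds
            end
        end
        Bcolptr[j+1] = bk + 1
    end
    resize!(Browval, bk)   # drop the slack left by filtered-out zeros
    resize!(Bnzval, bk)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end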

Member Author (@Sacha0) commented:

Great idea, that. Another rewrite of sparse unary broadcast is on my agenda. Would you prefer to (1) merge this PR as an interim fix, or (2) close this PR and incorporate a better fix for #18974 (specifically your suggestion) as part of that rewrite? Thanks!

Member commented:

I'd prefer to focus on fixing broadcast generally for sparse-matrix types, since an interim solution is not urgent.

Member Author (@Sacha0) commented:

@stevengj, I benchmarked two versions of your suggestion (for nz2nz_z2z-class unary operations). The overhead from calling broadcast(f, view(A.nzval, 1:nnz(A))) (and to a lesser degree broadcast(f, A.nzval)) seems substantial for small to moderate size sparse matrices. Thoughts? (Benchmarks below). Best!

function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv})
    oldbc_T(f, A, Base.promote_op(f, Tv))
end
function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    Bnzval = Vector{TvB}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    @inbounds @simd for k in 1:nnz(A)
        Bnzval[k] = f(A.nzval[k])
    end
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

using BenchmarkTools

for N in [10^2, 10^3, 10^4]
    println(N)
    A = sprand(N, N, 0.1)
    oldbench = @benchmark oldbc($abs, $A)
    newbench = @benchmark newbc($abs, $A)
    newnvbench = @benchmark newbcnv($abs, $A)
    println(ratio(median(newnvbench), median(oldbench)))
    println(ratio(median(newbench), median(oldbench)))
end

yields

100
BenchmarkTools.TrialRatio:
  time:             11.641124140719773
  gctime:           1.0
  memory:           1.0632790028763184
  allocs:           7.75
  time tolerance:   5.00%
  memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
  time:             11.815406389001213
  gctime:           1.0
  memory:           1.0661553211888783
  allocs:           8.0
  time tolerance:   5.00%
  memory tolerance: 1.00%
1000
BenchmarkTools.TrialRatio:
  time:             1.2436132534507298
  gctime:           1.0
  memory:           1.0006739279095351
  allocs:           5.833333333333333
  time tolerance:   5.00%
  memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
  time:             1.6859576533933556
  gctime:           1.0
  memory:           1.000703660023191
  allocs:           6.0
  time tolerance:   5.00%
  memory tolerance: 1.00%
10000
BenchmarkTools.TrialRatio:
  time:             1.0039543258604076
  gctime:           1.0021435474746039
  memory:           1.0000067974835716
  allocs:           5.142857142857143
  time tolerance:   5.00%
  memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
  time:             1.2308898483951354
  gctime:           1.004561220662931
  memory:           1.0000070973725528
  allocs:           5.285714285714286
  time tolerance:   5.00%
  memory tolerance: 1.00%

Member commented:

That's odd; when I've tried broadcast on any non-tiny array, it's usually nearly as fast as a manual loop. Can you drill down to find the source of the slowdown?

Member Author (@Sacha0) commented Oct 22, 2016:

Drilling down a bit. What is your metric for tiny? Best! (Edit: I originally forgot to interpolate globals in the benchmarks below. Fixing that substantially decreased the performance gap. Perhaps something went amiss with the benchmarks above as well... (Edit edit: No, the benchmarks above were correct. Mystery elsewhere.))

function manmap(f, x)
    y = similar(x)
    @inbounds @simd for k in indices(x, 1)
        y[k] = f(x[k])
    end
    return y
end

using BenchmarkTools

foo = ones(10^3);
mapbench = @benchmark map($abs, $foo);
manbench = @benchmark manmap($abs, $foo);
bcbench = @benchmark broadcast($abs, $foo);
ratio(median(mapbench), median(manbench))
ratio(median(bcbench), median(manbench))

yields

julia> ratio(median(mapbench), median(manbench))
BenchmarkTools.TrialRatio:
  time:             1.1104553119730185
  gctime:           1.0
  memory:           1.0019685039370079
  allocs:           2.0
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> ratio(median(bcbench), median(manbench))
BenchmarkTools.TrialRatio:
  time:             1.0185497470489038
  gctime:           1.0
  memory:           1.0
  allocs:           1.0
  time tolerance:   5.00%
  memory tolerance: 1.00%

(Julia Commit 438d1ea (2016-10-22 07:39 UTC) / Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz)

Member Author (@Sacha0) commented:

Mystery solved. Specialization on the type of the function being broadcast is not occurring:

function structure{TvA,TiA}(A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    return Bcolptr, Browval
end
function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv})
    oldbc_T(f, A, Base.promote_op(f, Tv))
end
function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB})
    Bcolptr, Browval = structure(A)
    Bnzval = Vector{TvB}(nnz(A))
    @inbounds @simd for k in 1:nnz(A)
        Bnzval[k] = f(A.nzval[k])
    end
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end

using BenchmarkTools

N = 10^2;
A = sprand(N, N, 0.1);
oldbcbench = @benchmark oldbc($abs, $A);
newbcbench = @benchmark newbc($abs, $A);
newbcnvbench = @benchmark newbcnv($abs, $A);
newbc_fsbench = @benchmark newbc_forcespec($abs, $A);
newbcnv_fsbench = @benchmark newbcnv_forcespec($abs, $A);

begin
    println("new with view versus old")
    println(ratio(median(newbcbench), median(oldbcbench)))
    println("new without view versus old")
    println(ratio(median(newbcnvbench), median(oldbcbench)))
    println("new with view and forced function-type specialization versus old")
    println(ratio(median(newbc_fsbench), median(oldbcbench)))
    println("new (without view) and forced function-type specialization versus old")
    println(ratio(median(newbcnv_fsbench), median(oldbcbench)))
end

yields

new with view versus old
BenchmarkTools.TrialRatio:
  time:             11.564562706270626
  gctime:           1.0
  memory:           1.0655270655270654
  allocs:           6.6
  time tolerance:   5.00%
  memory tolerance: 1.00%
new without view versus old
BenchmarkTools.TrialRatio:
  time:             11.552599009900991
  gctime:           1.0
  memory:           1.0626780626780628
  allocs:           6.4
  time tolerance:   5.00%
  memory tolerance: 1.00%
new with view and forced function-type specialization versus old
BenchmarkTools.TrialRatio:
  time:             1.5556930693069306
  gctime:           1.0
  memory:           1.0037986704653372
  allocs:           1.4
  time tolerance:   5.00%
  memory tolerance: 1.00%
new (without view) and forced function-type specialization versus old
BenchmarkTools.TrialRatio:
  time:             0.9927805280528053
  gctime:           1.0
  memory:           1.0009496676163343
  allocs:           1.2
  time tolerance:   5.00%
  memory tolerance: 1.00%

So with forced specialization this approach performs well, though using a view incurs some overhead. Should forcing specialization be necessary? Best!
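
For illustration (toy functions, not part of the PR): Julia heuristically may not specialize on a Function-typed argument that is merely passed through to another call rather than called directly, whereas a function-type parameter forces specialization, which is what the _forcespec variants above exploit:

passthrough(f::Function, x) = broadcast(f, x)       # f only forwarded; may not specialize on typeof(f)
forcespec{F<:Function}(f::F, x) = broadcast(f, x)   # type parameter forces specialization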

Member commented:

@JeffBezanson, this seems like undesirable behavior; I don't think we want <:Function to be necessary to get good performance in higher-order functions?

# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
let
A = spdiagm(Int64(1):Int64(4))
@test is(eltype(expm1.(A)), Float64) # representative for _unary_nz2nz_z2z class
Contributor (@tkelman) commented Oct 19, 2016:

use `isa` or `===`, since `is` was recently deprecated

Member Author (@Sacha0) commented:

Good catch. Fixed. Thanks!

Contributor commented:

Sorry, I think it'll need to be `eltype(expm1.(A)) === Float64`, or use `isa` with the array type.
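
For example, either of these forms would do (a sketch of the adjusted assertion, given A = spdiagm(Int64(1):Int64(4)) as above):

@test eltype(expm1.(A)) === Float64
@test isa(expm1.(A), SparseMatrixCSC{Float64,Int64})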

@@ -1635,3 +1635,12 @@ let
@test isa(abs.(A), SparseMatrixCSC) # representative for _unary_nz2nz_z2z class
@test isa(exp.(A), Array) # representative for _unary_nz2nz_z2nz class
end

# Check that `broadcast` methods specialized for unary operations over
# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
Contributor commented:

#19065 didn't have a test; should this be added?

Member Author (@Sacha0) commented:

Good catch, will prepare a short PR. Thanks for the reminder!

Sacha0 added a commit to Sacha0/julia that referenced this pull request Nov 26, 2016
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.
Sacha0 added a commit to Sacha0/julia that referenced this pull request Nov 27, 2016
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.
Sacha0 added a commit to Sacha0/julia that referenced this pull request Dec 1, 2016
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.