Fix #18974 (promotion in sparse unary broadcast methods) #18975
Conversation
# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
let
    A = spdiagm(1:4)
    @test eltype(expm1.(A)) <: AbstractFloat # representative for _unary_nz2nz_z2z class
may as well be more specific?
Good call. Forced `Int64`/`Float64` in the tests. Thanks!
@@ -1433,7 +1433,7 @@ function _broadcast_unary_nz2z_z2z_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCS
     return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
 end
 function _broadcast_unary_nz2z_z2z{Tv}(f::Function, A::SparseMatrixCSC{Tv})
-    _broadcast_unary_nz2z_z2z_T(f, A, Tv)
+    _broadcast_unary_nz2z_z2z_T(f, A, promote_op(f, Tv))
Wouldn't it be better to call `broadcast(f, A.nzval)` and just pass the resulting array? That way we re-use the type computations in `broadcast` (which are more sophisticated than `promote_op` for non-empty arrays).
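To illustrate the distinction (a hedged sketch; `expm1` is just a representative function here): `promote_op` infers a result element type from the argument types alone, whereas `broadcast` over a non-empty array can determine its eltype from the values it actually computes.

```julia
# Base.promote_op infers an output eltype purely from the argument types:
Base.promote_op(expm1, Int)          # Float64

# broadcast over a non-empty array determines its eltype from the
# computed results, which can be more precise for some functions:
eltype(broadcast(expm1, [1, 2, 3]))  # Float64
```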
For `nz2nz_z2z` that would probably make sense. I think there was a different issue that discussed removing the `nz2z_z2z` class.
Even for `nz2z_z2z` you could use `broadcast(f, A.nzval)` and filter out zeros. You could even do it in-place (with a `resize!` at the end), without wasting much memory if the fraction of zeros introduced is small.
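A minimal sketch of that suggestion (hypothetical helper name, not the PR's code): apply `f` to the stored values via `broadcast` so it picks the result eltype, then drop any entries that mapped to zero, shrinking the buffers with `resize!` at the end. It assumes `f(0) == 0` (the `nz2z_z2z` class), so the implicit zeros need no attention.

```julia
using SparseArrays  # Julia ≥ 0.7; in the 0.5-era code above this lived in Base

# Hypothetical sketch: map f over stored values, drop resulting zeros in place.
function map_nz_dropzeros(f, A::SparseMatrixCSC)
    vals = broadcast(f, A.nzval)       # broadcast determines the eltype
    colptr = similar(A.colptr)
    rowval = similar(A.rowval)
    nzval = similar(vals)
    colptr[1] = 1
    k = 0
    for j in 1:A.n
        for p in A.colptr[j]:(A.colptr[j+1] - 1)
            v = vals[p]
            if !iszero(v)              # keep only entries f mapped to nonzero
                k += 1
                rowval[k] = A.rowval[p]
                nzval[k] = v
            end
        end
        colptr[j+1] = k + 1
    end
    resize!(rowval, k)                 # shrink in place, per the suggestion
    resize!(nzval, k)
    return SparseMatrixCSC(A.m, A.n, colptr, rowval, nzval)
end
```

For example, `map_nz_dropzeros(round, sparse([0.4 0.0; 1.6 2.2]))` stores only the two entries that round to a nonzero value; the stored `0.4`, which rounds to zero, is dropped from the structure.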
Great idea, that. Another rewrite of sparse unary broadcast is on my agenda. Would you prefer to (1) merge this PR as an interim fix, or (2) close this PR and incorporate a better fix for #18974 (specifically your suggestion) as part of that rewrite? Thanks!
I'd prefer to focus on fixing `broadcast` generally for sparse-matrix types, since an interim solution is not urgent.
@stevengj, I benchmarked two versions of your suggestion (for nz2nz_z2z-class unary operations). The overhead from calling `broadcast(f, view(A.nzval, 1:nnz(A)))` (and to a lesser degree `broadcast(f, A.nzval)`) seems substantial for small to moderate size sparse matrices. Thoughts? (Benchmarks below.) Best!
function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv})
    oldbc_T(f, A, Base.promote_op(f, Tv))
end
function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    Bnzval = Vector{TvB}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    @inbounds @simd for k in 1:nnz(A)
        Bnzval[k] = f(A.nzval[k])
    end
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
using BenchmarkTools
for N in [10^2, 10^3, 10^4]
    println(N)
    A = sprand(N, N, 0.1)
    oldbench = @benchmark oldbc($abs, $A)
    newbench = @benchmark newbc($abs, $A)
    newnvbench = @benchmark newbcnv($abs, $A)
    println(ratio(median(newnvbench), median(oldbench)))
    println(ratio(median(newbench), median(oldbench)))
end
yields
100
BenchmarkTools.TrialRatio:
time: 11.641124140719773
gctime: 1.0
memory: 1.0632790028763184
allocs: 7.75
time tolerance: 5.00%
memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
time: 11.815406389001213
gctime: 1.0
memory: 1.0661553211888783
allocs: 8.0
time tolerance: 5.00%
memory tolerance: 1.00%
1000
BenchmarkTools.TrialRatio:
time: 1.2436132534507298
gctime: 1.0
memory: 1.0006739279095351
allocs: 5.833333333333333
time tolerance: 5.00%
memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
time: 1.6859576533933556
gctime: 1.0
memory: 1.000703660023191
allocs: 6.0
time tolerance: 5.00%
memory tolerance: 1.00%
10000
BenchmarkTools.TrialRatio:
time: 1.0039543258604076
gctime: 1.0021435474746039
memory: 1.0000067974835716
allocs: 5.142857142857143
time tolerance: 5.00%
memory tolerance: 1.00%
BenchmarkTools.TrialRatio:
time: 1.2308898483951354
gctime: 1.004561220662931
memory: 1.0000070973725528
allocs: 5.285714285714286
time tolerance: 5.00%
memory tolerance: 1.00%
That's odd; when I've tried `broadcast` on any non-tiny array, it's usually nearly as fast as a manual loop. Can you drill down to find the source of the slowdown?
Drilling down a bit. What is your metric for tiny? Best! (Edit: I originally forgot to interpolate globals in the benchmarks below. Fixing that substantially decreased the performance gap. Perhaps something went amiss with the benchmarks above as well... (Edit edit: No, the benchmarks above were correct. Mystery elsewhere.))
function manmap(f, x)
    y = similar(x)
    @inbounds @simd for k in indices(x, 1)
        y[k] = f(x[k])
    end
    return y
end
using BenchmarkTools
foo = ones(10^3);
mapbench = @benchmark map($abs, $foo);
manbench = @benchmark manmap($abs, $foo);
bcbench = @benchmark broadcast($abs, $foo);
ratio(median(mapbench), median(manbench))
ratio(median(bcbench), median(manbench))
yields
julia> ratio(median(mapbench), median(manbench))
BenchmarkTools.TrialRatio:
time: 1.1104553119730185
gctime: 1.0
memory: 1.0019685039370079
allocs: 2.0
time tolerance: 5.00%
memory tolerance: 1.00%
julia> ratio(median(bcbench), median(manbench))
BenchmarkTools.TrialRatio:
time: 1.0185497470489038
gctime: 1.0
memory: 1.0
allocs: 1.0
time tolerance: 5.00%
memory tolerance: 1.00%
(Julia Commit 438d1ea (2016-10-22 07:39 UTC) / Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz)
Mystery solved. Specialization on the type of the function being broadcast is not occurring:
function structure{TvA,TiA}(A::SparseMatrixCSC{TvA,TiA})
    Bcolptr = Vector{TiA}(A.n + 1)
    Browval = Vector{TiA}(nnz(A))
    copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
    copy!(Browval, 1, A.rowval, 1, nnz(A))
    return Bcolptr, Browval
end
function oldbc{Tv}(f::Function, A::SparseMatrixCSC{Tv})
    oldbc_T(f, A, Base.promote_op(f, Tv))
end
function oldbc_T{TvA,TiA,TvB}(f::Function, A::SparseMatrixCSC{TvA,TiA}, ::Type{TvB})
    Bcolptr, Browval = structure(A)
    Bnzval = Vector{TvB}(nnz(A))
    @inbounds @simd for k in 1:nnz(A)
        Bnzval[k] = f(A.nzval[k])
    end
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv{TvA,TiA}(f::Function, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbc_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, view(A.nzval, 1:nnz(A)))
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
function newbcnv_forcespec{TvA,TiA,FT<:Function}(f::FT, A::SparseMatrixCSC{TvA,TiA})
    Bcolptr, Browval = structure(A)
    Bnzval = broadcast(f, A.nzval)
    return SparseMatrixCSC(A.m, A.n, Bcolptr, Browval, Bnzval)
end
using BenchmarkTools
N = 10^2;
A = sprand(N, N, 0.1);
oldbcbench = @benchmark oldbc($abs, $A);
newbcbench = @benchmark newbc($abs, $A);
newbcnvbench = @benchmark newbcnv($abs, $A);
newbc_fsbench = @benchmark newbc_forcespec($abs, $A);
newbcnv_fsbench = @benchmark newbcnv_forcespec($abs, $A);
begin
    println("new with view versus old")
    println(ratio(median(newbcbench), median(oldbcbench)))
    println("new without view versus old")
    println(ratio(median(newbcnvbench), median(oldbcbench)))
    println("new with view and forced function-type specialization versus old")
    println(ratio(median(newbc_fsbench), median(oldbcbench)))
    println("new (without view) and forced function-type specialization versus old")
    println(ratio(median(newbcnv_fsbench), median(oldbcbench)))
end
yields
new with view versus old
BenchmarkTools.TrialRatio:
time: 11.564562706270626
gctime: 1.0
memory: 1.0655270655270654
allocs: 6.6
time tolerance: 5.00%
memory tolerance: 1.00%
new without view versus old
BenchmarkTools.TrialRatio:
time: 11.552599009900991
gctime: 1.0
memory: 1.0626780626780628
allocs: 6.4
time tolerance: 5.00%
memory tolerance: 1.00%
new with view and forced function-type specialization versus old
BenchmarkTools.TrialRatio:
time: 1.5556930693069306
gctime: 1.0
memory: 1.0037986704653372
allocs: 1.4
time tolerance: 5.00%
memory tolerance: 1.00%
new (without view) and forced function-type specialization versus old
BenchmarkTools.TrialRatio:
time: 0.9927805280528053
gctime: 1.0
memory: 1.0009496676163343
allocs: 1.2
time tolerance: 5.00%
memory tolerance: 1.00%
So with forced specialization this approach performs well, though using a view incurs some overhead. Should forcing specialization be necessary? Best!
@JeffBezanson, this seems like undesirable behavior; I don't think we want `<:Function` to be necessary to get good performance in higher-order functions?
# `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
let
    A = spdiagm(Int64(1):Int64(4))
    @test is(eltype(expm1.(A)), Float64) # representative for _unary_nz2nz_z2z class
Use `isa` or `===`, since `is` was recently deprecated.
Good catch. Fixed. Thanks!
Sorry, I think it'll need to be `eltype(expm1.(A)) === Float64`, or use `isa` with the array type.
…ver SparseMatrixCSCs.
@@ -1635,3 +1635,12 @@ let
     @test isa(abs.(A), SparseMatrixCSC) # representative for _unary_nz2nz_z2z class
     @test isa(exp.(A), Array) # representative for _unary_nz2nz_z2nz class
 end

 # Check that `broadcast` methods specialized for unary operations over
 # `SparseMatrixCSC`s determine a reasonable return element type. (Issue #18974.)
#19065 didn't have a test, should this be added?
Good catch, will prepare a short PR. Thanks for the reminder!
…ver a sparse matrix) from closed PR JuliaLang#18975 that was missed in PR JuliaLang#19065.
Without this PR
whereas with this PR
Best!