Classes of unary operations in broadcast over sparse arrays #18309

Closed · Sacha0 opened this issue Aug 31, 2016 · 4 comments
Sacha0 (Member) commented Aug 31, 2016

The implementation of broadcast over SparseMatrixCSCs separates unary functions into three classes (with disjoint logic): (1) unary functions that map zeros to zeros and may map nonzeros to zeros; (2) unary functions that map zeros to zeros and nonzeros to nonzeros; and (3) unary functions that map both zeros and nonzeros to nonzeros.

In the first class, when a nonzero in the original sparse matrix maps to exact zero under, e.g., sin or round, the implementation expunges that entry.
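
For concreteness, a small sketch of what the class-1 behavior means in practice (illustrative only; the matrix and entry counts below are not from the implementation):

```julia
A = sprand(4, 4, 0.5)      # stored entries uniform in (0, 1)
B = broadcast(round, A)    # round: zeros stay zero, but roughly half of these nonzeros round to 0.0
nnz(A), nnz(B)             # with the current class-1 kernel, B stores only the
                           # entries that rounded to a nonzero value
```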

On the one hand, this behavior seems like a holdover from the aggressive-stored-zero-expunging era, now inconsistent with behavior elsewhere. Additionally, this behavior comes at the cost of a branch and additional communication on every iteration of an inner loop, the inability to apply an @simd decoration on that inner loop, and some additional code.
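
Schematically, the cost difference looks something like the following (illustrative sketches over a plain vector of stored values, not the actual Base.SparseArrays kernels):

```julia
# Class-1 style: test each result and skip exact zeros. The branch and the
# growing output buffer rule out an @simd annotation on the loop.
function map_expunging(f, vals::Vector{Float64})
    out = Float64[]
    for v in vals
        fv = f(v)
        fv != 0 && push!(out, fv)
    end
    return out
end

# Class-2 style: write every result unconditionally. The loop is branch-free
# and straightforward to decorate with @simd.
function map_preserving!(f, out::Vector{Float64}, vals::Vector{Float64})
    @inbounds @simd for i in eachindex(vals)
        out[i] = f(vals[i])
    end
    return out
end
```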

On the other hand, in some cases, for example broadcasting round over a sparse matrix with uniformly or standard normally distributed entries, a substantial fraction of the nonzeros might map to zero, and expunging those zeros may be advantageous for some applications (though perhaps not for others).
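
A rough estimate of how large that fraction can be for those two distributions (illustrative only):

```julia
U = sprand(10^3, 10^3, 0.1)                        # stored entries uniform in (0, 1)
count(x -> x == 0, round.(nonzeros(U))) / nnz(U)   # ≈ 0.5 of the nonzeros round to zero

A = sprandn(10^3, 10^3, 0.1)                       # standard normal stored entries
count(x -> x == 0, round.(nonzeros(A))) / nnz(A)   # ≈ 0.38 of the nonzeros round to zero
```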

Should this first class still exist? Thoughts? Best!

Ref. #17265.

tkelman (Contributor) commented Aug 31, 2016

Worth benchmarking whether the current implementation is any different performance-wise than a simpler implementation that combines classes 1 and 2, and calls dropzeros! if desired for class 1?
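
In user-facing terms, that simpler approach would amount to something like the following (a sketch, assuming the merged kernel simply retains the original sparsity pattern):

```julia
A = sprand(10^3, 10^3, 0.1)
B = broadcast(round, A)   # merged class-1/class-2 kernel: numerical zeros remain stored
dropzeros!(B)             # opt-in expunging, recovering today's class-1 result
```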

tkelman added the domain:arrays:sparse label Aug 31, 2016
JaredCrean2 (Contributor) commented

I'd be in favor of getting rid of the first class, for the reasons you mention.

Sacha0 (Member, Author) commented Oct 23, 2016

> Worth benchmarking whether the current implementation is any different performance-wise than a simpler implementation that combines classes 1 and 2, and calls dropzeros! if desired for class 1?

Benchmarks below. tl;dr:

(1) For most operations in class one (fourteen trigonometric and hyperbolic functions), merging class one into class two would improve default performance (i.e. when not expunging numerical zeros) by up to a few percent, and degrade performance by less than that when also calling dropzeros! on the result.

(2) For the remaining functions in class one (those that more frequently create numerical zeros, namely ceil, floor, trunc, round, real, and imag), and outside of pathological cases, merging the two classes would improve default performance by up to a few tens of percent, but degrade performance by a somewhat larger amount of the same order when also calling dropzeros! on the result.

(3) For the functions identified in (2), pathological cases exist in which performance would degrade substantially (up to several hundred percent longer runtime in the tests below), e.g. when flooring a sparse matrix whose stored entries are uniformly distributed on (0, 1), so that essentially all of them map to zero.

Thoughts on whether to trade the performance reductions for the greater behavioral consistency, reduced implementation complexity, and moderate performance improvements? Thanks!

using BenchmarkTools

for (fn, desc) in (
        (sin, ", representative of most functions in class one"),
        (round, ", representative of remaining functions in class one (non-pathological case)"),
        (floor, ", representative of the pathological case") )
    println("Results for $(string(fn))$(desc)")
    for N in [10^2, 10^3, 10^4]
        A = sprand(N, N, 0.1) * (fn == floor ? 1 : 10)
        nz2zbench = @benchmark Base.SparseArrays._broadcast_unary_nz2z_z2z($fn, $A)
        nz2nzbench = @benchmark Base.SparseArrays._broadcast_unary_nz2nz_z2z($fn, $A)
        nz2nzdzbench = @benchmark dropzeros!(Base.SparseArrays._broadcast_unary_nz2nz_z2z($fn, $A))
        println("--> For size-$(size(A)) sparse matrix with $(nnz(A)) entries")
        println("-- --> nz2nz versus nz2z")
        println(judge(median(nz2nzbench), median(nz2zbench)))
        println("-- --> nz2nz with dropzeros! versus nz2z")
        println(judge(median(nz2nzdzbench), median(nz2zbench)))
    end
    println()
end

yields

Results for sin, representative of most functions in class one
--> For size-(100,100) sparse matrix with 956 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -8.31% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +4.27% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 100181 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -4.44% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +0.43% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 10002843 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -3.93% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +0.86% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Results for round, representative of remaining functions in class one (non-pathological case)
--> For size-(100,100) sparse matrix with 1017 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -25.20% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +35.54% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 99922 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -33.23% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +40.89% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 9998682 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -18.28% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +40.83% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Results for floor, representative of the pathological case
--> For size-(100,100) sparse matrix with 944 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +25.27% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +89.14% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 99952 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +247.06% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +407.48% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 10001131 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +399.42% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +495.62% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Sacha0 (Member, Author) commented Dec 8, 2016

See related discussion in #19239, #19371, and #19372, and downstream PRs #19438 and #19518. #19518 should close this issue. Best!

tkelman closed this as completed Dec 31, 2016