Classes of unary operations in broadcast over sparse arrays #18309

Closed · Sacha0 opened this issue Aug 31, 2016 · 4 comments
Sacha0 (Member) commented Aug 31, 2016

The implementation of broadcast over SparseMatrixCSCs separates unary functions into three classes (with disjoint logic): (1) unary functions that map zeros to zeros and may map nonzeros to zeros; (2) unary functions that map zeros to zeros and nonzeros to nonzeros; and (3) unary functions that map both zeros and nonzeros to nonzeros.

In the first class, when a nonzero in the original sparse matrix maps to exact zero under, e.g., sin or round, the implementation expunges that entry.
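
For concreteness, a small sketch of what the class-1 behavior means in practice (illustrative only; the matrix and entry counts below are not from the implementation):

```julia
A = sprand(4, 4, 0.5)      # stored entries uniform in (0, 1)
B = broadcast(round, A)    # round: zeros stay zero, but roughly half of these nonzeros round to 0.0
nnz(A), nnz(B)             # with the current class-1 kernel, B stores only the
                           # entries that rounded to a nonzero value
```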

On the one hand, this behavior seems like a holdover from the aggressive-stored-zero-expunging era, now inconsistent with behavior elsewhere. Additionally, this behavior comes at the cost of a branch and additional communication on every iteration of an inner loop, the inability to apply an @simd decoration on that inner loop, and some additional code.
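
Schematically, the cost difference looks something like the following (illustrative sketches over a plain vector of stored values, not the actual Base.SparseArrays kernels):

```julia
# Class-1 style: test each result and skip exact zeros. The branch and the
# growing output buffer rule out an @simd annotation on the loop.
function map_expunging(f, vals::Vector{Float64})
    out = Float64[]
    for v in vals
        fv = f(v)
        fv != 0 && push!(out, fv)
    end
    return out
end

# Class-2 style: write every result unconditionally. The loop is branch-free
# and straightforward to decorate with @simd.
function map_preserving!(f, out::Vector{Float64}, vals::Vector{Float64})
    @inbounds @simd for i in eachindex(vals)
        out[i] = f(vals[i])
    end
    return out
end
```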

On the other hand, in some cases, for example broadcasting round over a sparse matrix with uniformly or standard normally distributed entries, a substantial fraction of the nonzeros might map to zero, and expunging those zeros may be advantageous for some applications (though perhaps not for others).
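
A rough estimate of how large that fraction can be for those two distributions (illustrative only):

```julia
U = sprand(10^3, 10^3, 0.1)                        # stored entries uniform in (0, 1)
count(x -> x == 0, round.(nonzeros(U))) / nnz(U)   # ≈ 0.5 of the nonzeros round to zero

A = sprandn(10^3, 10^3, 0.1)                       # standard normal stored entries
count(x -> x == 0, round.(nonzeros(A))) / nnz(A)   # ≈ 0.38 of the nonzeros round to zero
```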

Should this first class still exist? Thoughts? Best!

Ref. #17265.

tkelman (Contributor) commented Aug 31, 2016

Worth benchmarking whether the current implementation is any different performance-wise than a simpler implementation that combines classes 1 and 2, and calls dropzeros! if desired for class 1?
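
In user-facing terms, that simpler approach would amount to something like the following (a sketch, assuming the merged kernel simply retains the original sparsity pattern):

```julia
A = sprand(10^3, 10^3, 0.1)
B = broadcast(round, A)   # merged class-1/class-2 kernel: numerical zeros remain stored
dropzeros!(B)             # opt-in expunging, recovering today's class-1 result
```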

tkelman added the domain:arrays:sparse label Aug 31, 2016
JaredCrean2 (Contributor) commented

I'd be in favor of getting rid of the first class, for the reasons you mention.

Sacha0 (Member, Author) commented Oct 23, 2016

> Worth benchmarking whether the current implementation is any different performance-wise than a simpler implementation that combines classes 1 and 2, and calls dropzeros! if desired for class 1?

Benchmarks below. tl;dr:

(1) For most operations in class one (fourteen trigonometric and hyperbolic functions), merging class one into class two would improve default performance (i.e. when not expunging numerical zeros) by up to a few percent, and degrade performance by less than that when also calling dropzeros! on the result.

(2) For the remaining functions in class one (those that more frequently create numerical zeros, namely ceil, floor, trunc, round, real, and imag), and outside of pathological cases, merging the two classes would improve default performance by up to a few tens of percent, but degrade performance by a somewhat larger amount of the same order when also calling dropzeros! on the result.

(3) For the functions identified in (2), pathological cases exist in which performance would degrade substantially (up to several hundred percent longer runtime in the tests below), e.g. when flooring a sparse matrix whose stored entries are uniformly distributed on (0, 1), so that essentially all of them map to zero.

Thoughts on whether to trade the performance reductions for the greater behavioral consistency, reduced implementation complexity, and moderate performance improvements? Thanks!

using BenchmarkTools

for (fn, desc) in (
        (sin, ", representative of most functions in class one"),
        (round, ", representative of remaining functions in class one (non-pathological case)"),
        (floor, ", representative of the pathological case") )
    println("Results for $(string(fn))$(desc)")
    for N in [10^2, 10^3, 10^4]
        A = sprand(N, N, 0.1) * (fn == floor ? 1 : 10)
        nz2zbench = @benchmark Base.SparseArrays._broadcast_unary_nz2z_z2z($fn, $A)
        nz2nzbench = @benchmark Base.SparseArrays._broadcast_unary_nz2nz_z2z($fn, $A)
        nz2nzdzbench = @benchmark dropzeros!(Base.SparseArrays._broadcast_unary_nz2nz_z2z($fn, $A))
        println("--> For size-$(size(A)) sparse matrix with $(nnz(A)) entries")
        println("-- --> nz2nz versus nz2z")
        println(judge(median(nz2nzbench), median(nz2zbench)))
        println("-- --> nz2nz with dropzeros! versus nz2z")
        println(judge(median(nz2nzdzbench), median(nz2zbench)))
    end
    println()
end

yields

Results for sin, representative of most functions in class one
--> For size-(100,100) sparse matrix with 956 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -8.31% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +4.27% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 100181 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -4.44% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +0.43% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 10002843 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -3.93% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +0.86% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Results for round, representative of remaining functions in class one (non-pathological case)
--> For size-(100,100) sparse matrix with 1017 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -25.20% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +35.54% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 99922 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -33.23% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +40.89% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 9998682 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   -18.28% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +40.83% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Results for floor, representative of the pathological case
--> For size-(100,100) sparse matrix with 944 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +25.27% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +89.14% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 99952 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +247.06% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +407.48% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 10001131 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +399.42% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
  time:   +495.62% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Sacha0 (Member, Author) commented Dec 8, 2016

See related discussion in #19239, #19371, and #19372, and downstream PRs #19438 and #19518. #19518 should close this issue. Best!

tkelman closed this as completed Dec 31, 2016