
Fix v[i] = -0.0 and support reinterpret #296

Merged
merged 10 commits into JuliaSparse:main on Jan 20, 2023

Conversation

@LilithHafner (Member) commented Nov 24, 2022

  • assign on mysparsearray[index] = v whenever v !== zero(eltype(sparsecollection))
  • stop prohibiting reinterpret

Similar to @StefanKarpinski's suggestion here
Fixes #289
Fixes #294
Fixes #304
Fixes #305
Fixes JuliaLang/julia#48079

The primary motivation for this PR is to fix the bug where one person writes a method that assumes every AbstractArray can be reinterpreted (Base defines a reinterpret method for ::AbstractArray), and someone else passes a sparse array to that method and runs into an error.
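For concreteness, here is a small sketch (not a test taken from this PR) of the two behaviors the change is meant to guarantee; the wrapper type returned by reinterpret is whatever Base's generic AbstractArray method produces:

using SparseArrays, Test

v = spzeros(3)
v[2] = -0.0                    # previously discarded because iszero(-0.0) is true
@test v[2] === -0.0            # with this PR, the assigned value round-trips bit-for-bit

r = reinterpret(UInt64, v)     # previously threw; now falls back to Base's generic method
@test r[2] == reinterpret(UInt64, -0.0)   # 0x8000000000000000, the bit pattern of -0.0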

@LilithHafner changed the title from "Fix 'v[i] = -0.0' and support 'reinterpret'" to "Fix v[i] = -0.0 and support reinterpret" on Nov 24, 2022
@codecov-commenter commented Nov 24, 2022

Codecov Report

Merging #296 (319c509) into main (31b491e) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #296      +/-   ##
==========================================
+ Coverage   93.71%   93.74%   +0.02%     
==========================================
  Files          12       12              
  Lines        7433     7431       -2     
==========================================
  Hits         6966     6966              
+ Misses        467      465       -2     
Impacted Files          Coverage Δ
src/abstractsparse.jl   60.41% <ø> (+2.41%) ⬆️
src/sparsematrix.jl     95.46% <100.00%> (ø)
src/sparsevector.jl     94.95% <100.00%> (ø)


@fredrikekre (Member)

Is it really useful to allow reinterpret here? It seems like quite a performance trap, since the result won't be considered sparse. For example, for something like a matmul you will not even hit BLAS, just the generic matmul.
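To illustrate the concern (a sketch of the trap being described, assuming the PR's fallback to Base's generic reinterpret):

using SparseArrays

A = sprand(1_000, 1_000, 0.01)
R = reinterpret(Float64, A)    # Base.ReinterpretArray wrapping the sparse matrix
R isa AbstractSparseArray      # false, so sparse-aware kernels no longer apply
# R * rand(1_000) therefore dispatches to generic (dense-style) methods,
# visiting every index instead of only the stored entries.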

@SobhanMP (Member) commented Nov 24, 2022 via email

@LilithHafner (Member, Author)

AbstractSparseArray <: AbstractArray, so operations like broadcasted addition with a scalar are supported even though they produce dense results. reinterpret falls into the same category: an operation that is defined for all other AbstractArrays and produces a dense result.

Both broadcasted addition with a scalar and reinterpret could, with specialized code here, produce sparse results. Nevertheless, I see a clear hierarchy of sparse result > dense result > error and don't think that the possibility of producing a sparse result should get in the way of removing an unnecessary error.

This error exists because of behavior problems, not because of conversion to a dense array (source). This PR fixes the behavior problem.

Variable default values would let both these operations cleanly produce sparse results (and allow IEEE compliance with non-infinite floating point results), but that is not for this PR.

@test m[1, 1] === -0.0 looks really bad as iszero(-0.0)

How do you feel about the current behavior?

julia> x = sprand(5, .5)
5-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  0.00352086
  [3]  =  0.111171
  [5]  =  0.223453

julia> x .= -0.0
5-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  -0.0
  [3]  =  -0.0
  [5]  =  -0.0

julia> x .=== -0.0
5-element SparseVector{Bool, Int64} with 3 stored entries:
  [1]  =  1
  [3]  =  1
  [5]  =  1

julia> collect(x .=== -0.0)
5-element Vector{Bool}:
 1
 0
 1
 0
 1

@SobhanMP (Member)

How do you feel about the current behavior?

Looks like intended behavior? The rule is: do not create new entries unless the value is non-zero, but keep existing entries unless dropzeros is called.

@SobhanMP (Member)

I also think @dkarrasch spent a bit of time getting _isnotzero(v) to work nicely, but you're simply ignoring that some types may have expensive zeros (think static arrays) or that they can have multiple zeros.

@LilithHafner (Member, Author)

the rule is do not make new entries unless non-zero but keep entries in an array unless dropzero

This is the rule in the publicly documented API: "In some applications, it is convenient to store explicit zero values in a SparseMatrixCSC. These are accepted by functions in Base (but there is no guarantee that they will be preserved in mutating operations)."
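A small illustration of that documented behavior (independent of this PR): explicit zeros handed to the sparse constructor are stored, and they are only removed when dropzeros! is called.

using SparseArrays

A = sparse([1, 2], [1, 2], [0.0, 3.0])  # explicitly store a zero at (1, 1)
nnz(A)                                   # 2: the explicit zero is a stored entry
dropzeros!(A)                            # stored zeros go away only on request
nnz(A)                                   # 1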

you're simply ignoring that some types may have expensive zeros (think static arrays)

This implementation calls zero slightly fewer times for static arrays (and all other array types).

It's also worth noting that, for static arrays, the x !== zero(x) check is faster than the existing _isnotzero method:

julia> @btime SparseArrays._isnotzero(x) setup=(x = @SVector rand(20))
  6.143 ns (0 allocations: 0 bytes)
true

julia> @btime x !== zero(x) setup=(x = @SVector rand(20))
  4.909 ns (0 allocations: 0 bytes)
true

julia> @btime x !== zero(x) setup=(x = @SVector zeros(20))
  1.781 ns (0 allocations: 0 bytes)
false

julia> @btime SparseArrays._isnotzero(x) setup=(x = @SVector zeros(20))
  1.915 ns (0 allocations: 0 bytes)
false

More importantly, actual benchmarks indicate a small (negligible) speedup from this PR:

julia> function f(n, m)
           s = spzeros(SVector{10, Float64}, n, n)
           for _ in 1:m
               s[rand(eachindex(s))] = @SVector rand(10)
           end
           out = 0
           for _ in 1:m
               out += sum(s[rand(eachindex(s))])
           end
           out
       end
f (generic function with 1 method)

julia> @benchmark f(1_000, 10_000) # master
BenchmarkTools.Trial: 119 samples with 1 evaluation.
 Range (min … max):  27.345 ms … 43.278 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.643 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.044 ms ±  3.999 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▇█▂                                                       
  ▃▅▇████▄▃▃▁▃▁▁▁▄▄▃▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▄▃▄ ▃
  27.3 ms         Histogram: frequency by time        43.2 ms <

 Memory estimate: 3.51 MiB, allocs estimate: 19.

julia> @benchmark f(1_000, 10_000) # PR
BenchmarkTools.Trial: 169 samples with 1 evaluation.
 Range (min … max):  26.936 ms … 52.152 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.515 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   29.270 ms ±  2.905 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▆█▆▂                                                     
  ▂▁▆█████▅▆▅▃▃▃▂▃▃▂▂▃▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▃▁▁▁▁▁▁▂▁▁▁▂ ▂
  26.9 ms         Histogram: frequency by time        41.5 ms <

 Memory estimate: 3.51 MiB, allocs estimate: 19.

julia> versioninfo()
Julia Version 1.10.0-DEV.27
Commit 90934ff730 (2022-11-19 21:56 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.3.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0

or that they can have multiple zeros.

Addressing the case of multiple zeros more reliably is the point. A classic assumption folks make about arrays is that if you set an element to a value and then check back later (without any intervening writes) to see what the value is, it will be the exact same value you set initially. This PR makes SparseArrays conform to that behavior without a substantial runtime cost.
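The invariant being argued for can be written as a tiny property check; this is a sketch, not a test from this PR. Dense arrays already satisfy it, and sparse arrays satisfy it after this change:

using SparseArrays, Test

writes_roundtrip(v, i, x) = (v[i] = x; v[i] === x)   # read back exactly what was written

@test writes_roundtrip(zeros(3), 2, -0.0)    # dense: holds
@test writes_roundtrip(spzeros(3), 2, -0.0)  # sparse: holds with this PR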

@fredrikekre (Member)

AbstractSparseArray <: AbstractArray, so operations like broadcasted addition with a scalar are supported even though they produce dense results. reinterpret falls into the same category: an operation that is defined for all other AbstractArrays and produces a dense result.

But in many cases it would probably be better to get an error for that addition. And if you really want a dense result, then why use a sparse array to begin with?

I wouldn't say that reinterpret is in the same category here, because it would always give a result that no longer behaves as sparse, so it is always bad.

@LilithHafner (Member, Author)

I suppose there are two things at play here: fixing v[i] = -0.0 and supporting reinterpret. It seems that you oppose supporting reinterpret. Do you also oppose making v[i] = -0.0; v[i] === -0.0 always hold?

@StefanKarpinski (Contributor)

Not super specific to this PR, but my take is that v[i] = x should always result in a structural non-zero being stored, regardless of the value of x. That makes v[i] = -0.0 automatically satisfy v[i] === -0.0. If you want to drop structural non-zeros that have a zero value, then you should have to call dropzeros!(v), and that could have an option for either keeping or dropping negative zeros. However, that's very much not how sparse arrays currently work, so I'm not sure whether changing this would be inconsistent in an undesirable way.
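A purely hypothetical sketch of what such an option might look like; neither the keyword nor the function below exists in SparseArrays, and a real version would presumably mutate in place like dropzeros!:

using SparseArrays

# Hypothetical helper: drop stored zeros, optionally keeping stored -0.0 entries.
function dropzeros_sketch(v::SparseVector{Tv}; keep_negative_zeros::Bool=false) where {Tv<:AbstractFloat}
    inds = SparseArrays.nonzeroinds(v)      # stored indices (internal accessor)
    vals = nonzeros(v)                      # stored values
    keep = [!iszero(x) || (keep_negative_zeros && signbit(x)) for x in vals]
    return SparseVector(length(v), inds[keep], vals[keep])
end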

@rayegun (Member) commented Nov 30, 2022

v[i] = x should always result in structural non-zero being stored, regardless of the value of x.

This is how the GraphBLAS spec works and how I intend the Finch.jl supported ecosystem to work as well. I don't think we can do this until v2.0 here though.

@LilithHafner (Member, Author)

Branching off the reinterpret question into a larger discussion on operations that produce dense results: #307

@fredrikekre (Member)

my take is that v[i] = x should always result in structural non-zero being stored, regardless of the value of x

I agree. Right now you have to jump through hoops to insert stored entries (for example adding and subtracting some value at the same index...).
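The hoop-jumping in question looks roughly like this (illustrative only, relying on the pre-PR rule that existing entries are kept even when overwritten with zero):

using SparseArrays

A = spzeros(3, 3)
A[1, 1] = 0.0    # pre-PR: silently discarded, nothing is stored
A[1, 1] += 1.0   # force a stored entry with a nonzero value...
A[1, 1] -= 1.0   # ...then bring the value back to zero; the entry stays stored
nnz(A)           # 1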

@LilithHafner (Member, Author)

Triage approves.

Triage spent a while discussing the objection that there could be a major performance regression if someone counts on assignments of -0.0 remaining structurally zero, but reached consensus that this should not be an issue in sensible use cases.

Triage briefly discussed the objection that someone may have counted on the semantics of -0.0 turning into 0.0, but dismissed this as an unreasonable expectation.

@StefanKarpinski (Contributor) commented Jan 5, 2023

The reasoning about "sensible use cases" is that the potential issue would be if you're assigning a lot of -0.0 values into a sparse array and counting on them being discarded in order to not blow up memory; but if you're doing that, then you have a performance problem anyway, since you're doing O(n*m) work even if you weren't using O(n*m) storage. Basically, reasonable sparse matrix code shouldn't be relying on assignments not being stored in the first place.

@LilithHafner (Member, Author)

backport-1.9 because of JuliaLang/julia#48079

@fredrikekre (Member)

What is the purpose/use case for reinterpret here? To hit slow AbstractArray fallbacks? Personally I would prefer an error when hitting such a code path.

@LilithHafner (Member, Author)

#289 from the OP is one such example

@fredrikekre (Member)

No, you wouldn't be able to use the result of reinterpret for anything useful, like passing to lldl as asked for in the issue (https://github.com/JuliaSmoothOptimizers/LimitedLDLFactorizations.jl/blob/51eec8dae729b9c14486ae3465edb700fb2130aa/src/LimitedLDLFactorizations.jl#L472).

@LilithHafner (Member, Author)

"anything useful" is quite a broad umbrella. Three potentially useful things I can imagine doing are taking a random sample with rand(reinterpret(T, x), 100), visualizing a reinterpreted array with julia> reinterpret(T, x), and even computing a cheap operation like sum(reinterpret(T, x)). These operations could be faster if there existed a specialized reinterpret function for sparse arrays but if they are not the bottleneck, it is sometimes okay to ignore sparsity. In general, we want to allow folks to write efficient code but not require premature optimization by throwing on inefficient operations.

SparseVector <: AbstractVector, which indicates that when a specialized operation is not implemented, we want to fall back to the generic AbstractArray case instead of throwing a MethodError. Singling out one inefficient operation (e.g. reinterpret) and making it throw, while leaving other even less efficient operations (e.g. various sorting functions) at their defaults, requires a strong justification for going out of our way to override the default behavior of reinterpret.

The strong justification that motivated the introduction of this error—incorrect behavior—is now gone.

In any case, it should be possible to implement an efficient specialization of reinterpret for sparse arrays, which would make this particular objection moot. I don't feel qualified to write that implementation, though, because I have neither an in-depth understanding of SparseArrays.jl internals nor a precise specification of what it entails to subtype AbstractSparseArray.
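For what it's worth, a rough sketch of what a same-size specialization might look like (this is not part of this PR or of SparseArrays; it assumes sizeof(T) matches the stored element size, so the index structure and the mapping of the implicit zero are preserved, as for Float64 <-> UInt64):

using SparseArrays

function sparse_reinterpret(::Type{T}, x::SparseVector) where {T}
    sizeof(T) == sizeof(eltype(x)) ||
        throw(ArgumentError("only same-size reinterpretation preserves the index structure"))
    nzind = copy(SparseArrays.nonzeroinds(x))      # stored indices are unchanged
    nzval = collect(reinterpret(T, nonzeros(x)))   # reinterpret only the stored values
    return SparseVector(length(x), nzind, nzval)
end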

@LilithHafner LilithHafner merged commit d4c36be into JuliaSparse:main Jan 20, 2023
@LilithHafner LilithHafner deleted the always-assign branch January 20, 2023 18:10
@github-actions

The backport to 1.9 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.9 1.9
# Navigate to the new working tree
cd .worktrees/backport-1.9
# Create a new branch
git switch --create backport-296-to-1.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 d4c36be30c62762bb1b9224c1963b723494260f9
# Push it to GitHub
git push --set-upstream origin backport-296-to-1.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.9

Then, create a pull request where the base branch is 1.9 and the compare/head branch is backport-296-to-1.9.

LilithHafner added a commit that referenced this pull request Jan 20, 2023
* assign whenever v !== zero(eltype(sparsecollection))

* test -0.0

* allow reinterpret

* test reinterpret
dkarrasch pushed a commit that referenced this pull request Jan 20, 2023
* assign whenever v !== zero(eltype(sparsecollection))

* test -0.0

* allow reinterpret

* test reinterpret
@LilithHafner (Member, Author)

@dkarrasch, did this make it into 1.9? I'm still seeing the bug it is supposed to fix:

(@v1.9) pkg> activate --temp
  Activating new project at `/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_FokowT`

(jl_FokowT) pkg> st
Status `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_FokowT/Project.toml` (empty project)

julia> versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

julia> using SparseArrays

julia> sort!(sprand(100, .1))
ERROR: `reinterpret` on sparse arrays is discontinued.
Try reinterpreting the value itself instead.

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] reinterpret(#unused#::Type, A::SparseVector{Float64, Int64})
   @ SparseArrays ~/.julia/juliaup/julia-1.9.0-rc2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/SparseArrays/src/abstractsparse.jl:79
...

@KristofferC (Member)

It should be on the latest backport PR in Julia.
