Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diag of SparseMatrixCSC should always return SparseVector #23261

Merged
merged 6 commits into from
Aug 18, 2017

Conversation

fredrikekre
Copy link
Member

diag(A::SparseMatrixCSC, k::Int=0) now dispatches here and then to getindex for sparse matrices. Will do some benchmarking if it is worth updating SpDiagIterator, or keep it as is.

fix #21064

@tkelman
Copy link
Contributor

tkelman commented Aug 14, 2017

if it doesn't help performance to adapt SpDiagIterator, we should maybe deprecate that if it isn't used anywhere else

@fredrikekre
Copy link
Member Author

It is used in trace too, but thats it. Will see what benchmarks say tomorrow.

@ararslan ararslan added linear algebra Linear algebra sparse Sparse arrays labels Aug 14, 2017
Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superficially lgtm :).

@@ -879,7 +879,7 @@ for f in (:\, :Ac_ldiv_B, :At_ldiv_B)
if m == n
if istril(A)
if istriu(A)
return ($f)(Diagonal(A), B)
return ($f)(Diagonal(Vector(diag(A))), B)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to avoid changing the behavior? I guess it should still work with a sparse vector, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it works, but will result in a SparseMatrixCSC, so the result is not inferable.

@fredrikekre
Copy link
Member Author

It seems that using SpDiagIterator is faster so we should rewrite that to work with arbitrary diagonal. I suppose this could be merged as is and we can make performance improvements later, but might as well leave it as WIP until I get around to fix it.

@fredrikekre
Copy link
Member Author

I updated SpDiagIterator such that we can iterate over any diagonal.

ind = Vector{Ti}(); sizehint!(ind, min(l, nnz(A)))
val = Vector{Tv}(); sizehint!(val, min(l, nnz(A)))
for (i, v) in enumerate(SpDiagIterator(A, d))
if !iszero(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does SpDiagIterator skip structural zeros or produce them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is like getindex so it returns either the stored value at that position or zero(T). I am tempted to just iterate over stored values but that would be breaking I guess. Could also just remove SpDiagIterator and move it into the diag function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be breaking I guess

in what way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

for i in SpDiagIterator(spzeros(3,3))
    # ...
end

Currently gives a loop of length 3, with i being 0 all three times. If the iterator only returned stored values, this loop would be done on the first iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good point. I just grepped all registered packages and it's never used outside of Base, so I think as long as it still does what it's supposed to in the places that Base uses it, we can change its behavior if it would be an improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I did some searching too, but found it nowhere else. In that case I think we can just remove it. I did some benchmarking for trace and it seems that

function trace2(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
    n = checksquare(A)
    d = zero(Tv)
    for i in 1:n
        d += A[i,i]
    end
    return d
end

is faster than using the iterator. So I will remove it, and then include the iteration code inside the diag body instead.

@fredrikekre
Copy link
Member Author

Take 3: Removed SpDiagIterator and imported the iteration code into the diag body. Now we collect all the stored values, so stored zeros will be preserved, which is kinda neat IMO

julia> A
3×3 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  0.0

julia> diag(A)
3-element SparseVector{Float64,Int64} with 1 stored entry:
  [1]  =  0.0

(r1 > r2) && (return (zero(Tv), j+1))
r1 = searchsortedfirst(A.rowval, j, r1, r2, Forward)
(((r1 > r2) || (A.rowval[r1] != j)) ? zero(Tv) : A.nzval[r1], j+1)
function diag(A::SparseMatrixCSC{Tv,Ti}, d::Int=0) where {Tv,Ti}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like most of the other methods allow Integer for the second argument

@fredrikekre fredrikekre changed the title [WIP] diag of SparseMatrixCSC should always return SparseVector diag of SparseMatrixCSC should always return SparseVector Aug 17, 2017
throw(ArgumentError("requested diagonal, $d, out of bounds in matrix of size ($m, $n)"))
end
l = d < 0 ? min(m+d,n) : min(n-d,m)
r, c = d <= 0 ? (-d, 0) : (0, d) # start row/col -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps either type r and c or use zero to on the right side to avoid potential type instability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted d directly to Int so should be ok now :)

ind = Vector{Ti}()
val = Vector{Tv}()
for i in 1:l
r += 1; c += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here, either type r and c or use oneunit on the right side to avoid potential type instability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed by converting to Int directly.

for d in SpDiagIterator(A)
s += d
for i in 1:n
s += A[i,i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this version perform similarly to the original? Sparse getindex is fairly complex / expensive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it do pretty much exactly what the sparse iterator what doing? You still gotta search, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect so, at least roughly. But my conjectures often fail, so explicit verification has become my friend :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some benchmarking suggested this was faster. The iterator was even allocating some stuff. I will get back with number :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using BenchmarkTools
function trace2(A::SparseMatrixCSC{Tv}) where Tv
    n = Base.LinAlg.checksquare(A)
    s = zero(Tv)
    for i in 1:n
        s += A[i,i]
    end
    return s
end

for s in (1000, 5000), p in (0.1, 0.01, 0.005)
    S1 = sprand(s, s, p)
    S2 = S1 + speye(s, s) # typical case with values on all diagonal positions
    println("trace")
    @btime trace($S1)
    @btime trace($S2)
    println("trace2")
    @btime trace2($S1)
    @btime trace2($S2)
end

with output:

trace
  24.820 μs (6 allocations: 176 bytes)
  26.273 μs (6 allocations: 176 bytes)
trace2
  25.804 μs (0 allocations: 0 bytes)
  25.010 μs (0 allocations: 0 bytes)
trace
  11.110 μs (6 allocations: 176 bytes)
  11.680 μs (6 allocations: 176 bytes)
trace2
  9.736 μs (0 allocations: 0 bytes)
  10.217 μs (0 allocations: 0 bytes)
trace
  9.370 μs (6 allocations: 176 bytes)
  9.862 μs (6 allocations: 176 bytes)
trace2
  7.443 μs (0 allocations: 0 bytes)
  7.938 μs (0 allocations: 0 bytes)
trace
  204.548 μs (6 allocations: 176 bytes)
  218.412 μs (6 allocations: 176 bytes)
trace2
  197.253 μs (0 allocations: 0 bytes)
  210.470 μs (0 allocations: 0 bytes)
trace
  117.837 μs (6 allocations: 176 bytes)
  120.147 μs (6 allocations: 176 bytes)
trace2
  112.441 μs (0 allocations: 0 bytes)
  113.419 μs (0 allocations: 0 bytes)
trace
  104.461 μs (6 allocations: 176 bytes)
  106.751 μs (6 allocations: 176 bytes)
trace2
  97.655 μs (0 allocations: 0 bytes)
  100.329 μs (0 allocations: 0 bytes)

r1 = Int(A.colptr[c])
r2 = Int(A.colptr[c+1]-1)
r1 > r2 && continue
r1 = searchsortedfirst(A.rowval, r, r1, r2, Forward)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC searchsortedfirst always returns an Int?

S3 = sprand(T, 5, 10, 0.5)
for S in (S1, S2, S3)
A = Matrix(S)
@test diag(S)::SparseVector{T,Int} == diag(A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space before ==?

A = Matrix(S)
@test diag(S)::SparseVector{T,Int} == diag(A)
for k in -size(S,1):size(S,2)
@test diag(S, k)::SparseVector{T,Int} == diag(A, k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space before ==?

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fredrikekre! :)

The Travis macOS failure appears unrelated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

diag(sparse matrix) should be a sparse vector?
6 participants