
Extending inner products #16573

Closed · cortner opened this issue May 25, 2016 · 28 comments · Fixed by #32739

@cortner commented May 25, 2016

Two related questions:

  1. I have the need for a generalised Euclidean inner product
     dot(x, M, y)  =  dot(x, M * y)

which I don't want to hard-code; I'd prefer the ability to overload it. Is this functionality sufficiently general (I think it is) that it should be in Base by default for the standard arrays?

EDIT: second question is deleted; it needs more thinking.

@andreasnoack (Member)

I think this would be useful to have in Base, and I've thought about adding it several times. One of the benefits is that you can completely avoid the temporary M*y.
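For concreteness, here is a minimal pure-Julia sketch of the requested operation (the name dot3 is hypothetical, chosen to avoid clashing with Base): it returns the same value as dot(x, M * y), but as one fused column-major loop, so the temporary M * y is never allocated.

# Hypothetical sketch: same value as dot(x, M * y), but fused so that
# the temporary M * y is never allocated. Assumes 1-based indexing.
function dot3(x::AbstractVector, M::AbstractMatrix, y::AbstractVector)
    m, n = size(M)
    (length(x) == m && length(y) == n) || throw(DimensionMismatch())
    s = zero(promote_type(eltype(x), eltype(M), eltype(y)))
    @inbounds for j = 1:n            # traverse M column by column
        t = zero(s)
        @simd for i = 1:m
            t += conj(x[i]) * M[i, j]
        end
        s += t * y[j]
    end
    return s
end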

@cortner (Author) commented May 25, 2016

Yes, that was exactly the origin of this; I should have said so, sorry.

@stevengj added the domain:linear algebra label May 25, 2016
@stevengj (Member) commented May 25, 2016

Would be nice to get some benefit from BLAS here, though it's only a BLAS2 operation. @xianyi, do you have a suggestion on the most efficient way to use BLAS for x'*M*y?

One option would be to break x into chunks of, say, 16 elements (big enough to exceed a cache line but small enough that the memory allocation of a length-16 temporary vector is not an issue), compute x'*M on that chunk via BLAS2 using a subset of the columns of M (since Julia is column-major, this doesn't require any copying), then accumulate that chunk of the dot product with y. However, this is suboptimal because it doesn't fully exploit multi-threading: each of the chunk dot products could ideally be computed in a parallel thread.
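A sketch of one reading of this chunking strategy (modern Julia syntax; the function name xMy_chunked is made up): walk the columns of M in blocks of 16, form x'*M[:, block] with a single BLAS2 gemv! into a small reusable buffer, and fold each block into the running dot product with the matching slice of y.

using LinearAlgebra

# Hypothetical sketch of the chunked BLAS2 strategy described above.
function xMy_chunked(x::Vector{Float64}, M::Matrix{Float64}, y::Vector{Float64})
    m, n = size(M)
    (length(x) == m && length(y) == n) || throw(DimensionMismatch())
    chunk = 16
    tmp = Vector{Float64}(undef, chunk)          # small reusable buffer
    z = 0.0
    j = 1
    while j <= n
        c = min(chunk, n - j + 1)
        cols = view(M, :, j:j+c-1)               # contiguous in column-major layout
        t = view(tmp, 1:c)
        BLAS.gemv!('T', 1.0, cols, x, 0.0, t)    # t = cols' * x  (BLAS2)
        z += dot(t, view(y, j:j+c-1))            # fold this chunk into the result
        j += c
    end
    return z
end

Each block's gemv! could in principle run on its own thread, which is the multi-threading opportunity mentioned above.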

@stevengj (Member) commented May 25, 2016

Ideally you'd also want a specialized method for the common case where M is a diagonal matrix, especially a diagonal real matrix.

And tridiagonal "mass" matrices are also common, as well as general sparse matrices M, in problems arising from discretized PDEs.
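For instance, a sketch of the diagonal case (hypothetical helper name): with M = Diagonal(d), x'*M*y collapses to a single O(n) loop with no temporary.

using LinearAlgebra

# Hypothetical sketch of the Diagonal special case:
# x' * Diagonal(d) * y == sum of conj(x[i]) * d[i] * y[i].
# Assumes matching 1-based axes for x, d, and y.
function dot_diag(x::AbstractVector, D::Diagonal, y::AbstractVector)
    d = D.diag
    length(x) == length(d) == length(y) || throw(DimensionMismatch())
    s = zero(promote_type(eltype(x), eltype(d), eltype(y)))
    @inbounds for i in eachindex(d)
        s += conj(x[i]) * d[i] * y[i]
    end
    return s
end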

@cortner (Author) commented May 25, 2016

I was mostly thinking of sparse cases for my own use. These should be relatively easy to implement in Julia?

@andreasnoack (Member)

For the matvec, it seems that multithreading is almost the only speedup you'll get from BLAS. For a 500x500 problem and a Julia double loop with @inbounds and @simd I get

julia> blas_set_num_threads(1);

julia> @time for i = 1:10000; Ac_mul_B!(y, A, x); end # OpenBLAS 1 thread
  0.484342 seconds

julia> blas_set_num_threads(4);

julia> @time for i = 1:10000; Ac_mul_B!(y, A, x); end # OpenBLAS 4 threads
  0.184681 seconds

julia> @time for i = 1:10000; mymul(y, A, x); end # Julia
  0.509110 seconds

so if/when the @threads overhead is gone, a pure-Julia version should probably be okay.
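The thread doesn't show the definition of mymul; a plausible reconstruction of a double loop with @inbounds and @simd computing y = A'x (the pure-Julia counterpart of Ac_mul_B!(y, A, x)) would be:

# Hypothetical reconstruction of the `mymul` benchmarked above: y = A' * x
# as a plain double loop, relying on the compiler for SIMD.
function mymul(y::Vector{Float64}, A::Matrix{Float64}, x::Vector{Float64})
    m, n = size(A)
    (length(y) == n && length(x) == m) || throw(DimensionMismatch())
    @inbounds for j = 1:n
        s = 0.0
        @simd for i = 1:m
            s += A[i, j] * x[i]      # column j of A dotted with x
        end
        y[j] = s
    end
    return y
end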

@stevengj (Member)

I'm surprised there isn't some SIMD speedup.

@mschauer (Contributor)

Probably also worthwhile to special-case x === y and issym(A).
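A sketch of what the x === y, real-symmetric special case could look like (hypothetical name; only the upper triangle of A is read, so roughly half the memory traffic of the general bilinear form):

# Hypothetical sketch: x' * A * x for real symmetric A, reading only the
# upper triangle and counting each off-diagonal term twice.
function quadform_sym(A::AbstractMatrix{T}, x::AbstractVector{T}) where {T<:Real}
    n = length(x)
    size(A) == (n, n) || throw(DimensionMismatch())
    z = zero(T)
    @inbounds for j = 1:n
        s = zero(T)
        for i = 1:j-1
            s += A[i, j] * x[i]              # strict upper triangle
        end
        z += (2s + A[j, j] * x[j]) * x[j]    # off-diagonals appear twice
    end
    return z
end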

@andreasnoack (Member)

@stevengj I don't understand. Both OpenBLAS and Julia use SIMD, so where would you have expected to see the speedup?

@stevengj (Member)

In OpenBLAS the SIMD is hard-coded (I thought), whereas in Julia we rely on the compiler to do its magic, which doesn't always work as well. Maybe this is one of the cases where the compiler does a good job of exploiting SIMD.

@JaredCrean2 (Contributor)

In my experiments with mat-vec and mat-mat, Julia has done a good job exploiting SIMD. For mat-vec my implementation was asymptotically as fast as OpenBLAS, although they use a better algorithm for mat-mat, so theirs was faster in that case (about 3x IIRC).

@simonbyrne (Contributor)

To follow up on this: when everything is sparse we can also do much better by interleaving the computations, e.g. in a recent project I did:

function mul_ut_X_v(u::SparseVector, X::SparseMatrixCSC, v::SparseVector)
    u.n == X.m && X.n == v.n || throw(DimensionMismatch())

    z = zero(promote_type(eltype(u), eltype(X), eltype(v)))
    for (vi, vv) in zip(v.nzind, v.nzval)      # nonzeros of v select columns of X
        X_ptr_lo = X.colptr[vi]
        X_ptr_hi = X.colptr[vi+1] - 1
        if X_ptr_lo <= X_ptr_hi                # column vi has stored entries
            # sparse dot of u with column vi of X, scaled by v[vi]
            z += Base.SparseArrays._spdot(dot, 1, length(u.nzind), u.nzind, u.nzval,
                                          X_ptr_lo, X_ptr_hi, X.rowval, X.nzval) * vv
        end
    end
    z
end

Once we have lazy transposes, we could overload the ternary multiplication operator u' * X * v for this. Until then, I would be fine with overloading dot, or adding a bilinear function.

@simonbyrne added the domain:arrays:sparse label Oct 13, 2017
@simonbyrne (Contributor)

As pointed out by @andyferris, we can now overload ternary * for this, e.g.

function Base.:*(u::RowVector{T}, X::Matrix{T}, v::Vector{T}) where {T}
    m, n = size(X)
    length(u) == m && length(v) == n || throw(DimensionMismatch())
    z = zero(T)
    for j = 1:n
        zz = zero(T)                 # accumulate u * X[:, j]
        @simd for i = 1:m
            @inbounds zz += u[i] * X[i, j]
        end
        z += zz * v[j]
    end
    z
end

This gives a 2x speedup on my machine.

@andyferris (Member)

Credit to @ViralBShah also, I think? (Note we need to make the mechanics of Ac_mul_B a bit different, i.e. nuke it from orbit, to make this work.)

@stevengj (Member)

I still think it would be nice to have dot(x, A, y), even if ternary * calls this.

@simonbyrne (Contributor)

@andyferris Why would we need to change Ac_mul_B?

@andyferris (Member)

I just mean the lowering. You want it to come out as *(adjoint(x), A, y) rather than Ac_mul_B(x, A) * y.

@simonbyrne (Contributor)

Overloading the ternary * seems to do that already

@mbauman (Member) commented Oct 17, 2017

Yeah, as long as you use an explicit *. It's fiddly, though:

julia> expand(Main, :(x'*A*y))
:(adjoint(x) * A * y)

julia> expand(Main, :(x'A*y))
:(Ac_mul_B(x, A) * y)

julia> expand(Main, :(x'*A))
:(Ac_mul_B(x, A))

julia> expand(Main, :(x'A))
:(Ac_mul_B(x, A))

@andreasnoack (Member)

This would be fixed if we settle on a definition of (and implement) a matrix transpose because then all the Ax_mul_Bxs can go away.

@mschauer (Contributor)

Have there been any regrets about implementing a vector transpose?

@andreasnoack (Member)

I think that people are generally happy with the vector transpose. There have been a few comments where people have been surprised/confused/annoyed but I think that is to be expected given the length of #4774.

@StefanKarpinski (Member)

Yes, I have to say the experience with RowVectors, behaviorally embedded as row matrices except for linear-algebraic operations, has been lovely. They generally just do what you want.

@ViralBShah (Member)

The generalized inner product would be nice to have without creating a temporary.

@chriscoey commented May 9, 2019

What is the current thinking on bringing this to Julia? This type of quadratic form x'Qy (often with x = y and Q symmetric) is ubiquitous in mathematical optimization (I'm surprised there is no BLAS function for it). I see there is some implementation in PDMats.jl for the case where Q is PSD (which is restrictive). Brief related discussion: https://discourse.julialang.org/t/matrix-multiplication-a-b-a

@andyferris (Member)

@chriscoey I suspect this might be an issue waiting for someone to contribute a pull request to LinearAlgebra? The discussion above seems pretty receptive to the idea.

@dkarrasch (Member)

So, we do have a ternary * for Diagonals, and I quickly wrote up one for SparseMatrixCSC and AbstractVectors:

function *(x::Adjoint{<:Any,<:AbstractVector}, A::SparseMatrixCSC, y::AbstractVector)
    length(x) == A.m || throw(DimensionMismatch())
    A.n == length(y) || throw(DimensionMismatch())
    T = promote_type(eltype(x), eltype(A), eltype(y))
    out = zero(T)
    rvals = rowvals(A)
    nzvals = nonzeros(A)

    @inbounds for col = 1:A.n
        temp = zero(T)
        for k in nzrange(A, col)        # stored entries of column `col`
            temp += x[rvals[k]] * nzvals[k]
        end
        out += temp * y[col]            # accumulate (x * A)[col] * y[col]
    end
    return out
end

Would we insert this into SparseArrays as is (perhaps together with @simonbyrne's all-sparse version above), or should we first introduce a generic dot(x, A, y) that throws in the general case, add the more specific implementations as methods of it, and make the ternary *s call those?

@stevengj (Member)

I would have the dot(x, A, y) first.
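A hedged sketch of that layering (the helper is named xAy here purely to avoid type piracy in the example): treat the three-argument dot as the primitive, which specialized methods can override per matrix type, and have the ternary * forward to it.

using LinearAlgebra

# Hypothetical forwarding layer: the ternary product dispatches to the
# three-argument dot rather than the other way around.
xAy(x::Adjoint{<:Any,<:AbstractVector}, A::AbstractMatrix, y::AbstractVector) =
    dot(parent(x), A, y)   # dot conjugates its first argument, matching x' * A * y

This is essentially the shape that eventually landed: #32739, the PR that closed this issue, added dot(x, A, y) to LinearAlgebra (available since Julia 1.4).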
