
@mul gives Array{TrackedReal} instead of TrackedArray #8

Closed
baggepinnen opened this issue Nov 27, 2019 · 3 comments
Labels
wontfix This will not be worked on

Comments

@baggepinnen

Similar to mcabbott/SliceMap.jl#3

The code below produces an Array of TrackedReals instead of a TrackedArray. If the arrays are CuArrays, the code fails, since there is no CuArray(::Array{TrackedReal}) constructor.

julia> Zd = TrackedArray(randn(4,3,2)); M = TrackedArray(randn(3,2));

julia> @mul E[a,d] := Zd[a,b,d]*M[b,d]
4×2 Array{Tracker.TrackedReal{Float64},2}:
  2.1719     0.248542
  2.50329   -0.100249
 -0.90269   -0.509704
 -0.290149   1.33654 
@baggepinnen
Author

For future reference, this is a workaround

dropdims(sum(Zd.*reshape(M,1,size(M)...), dims=2), dims=2)
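Spelled out, this computes `E[a,d] = Σ_b Zd[a,b,d] * M[b,d]` by broadcasting a reshaped `M` against `Zd` and then summing out `b`. A quick sanity check with plain arrays (no Tracker), comparing against an explicit loop:

```julia
Zd = randn(4, 3, 2); M = randn(3, 2)

# Workaround: reshape M to 1×3×2 so it broadcasts along `a`,
# multiply elementwise, then sum out the contracted index `b` (dims=2).
E1 = dropdims(sum(Zd .* reshape(M, 1, size(M)...), dims=2), dims=2)

# Explicit loop over the same contraction, for comparison.
E2 = [sum(Zd[a, b, d] * M[b, d] for b in axes(Zd, 2))
      for a in axes(Zd, 1), d in axes(Zd, 3)]

E1 ≈ E2  # true
```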

@mcabbott
Owner

This use of @mul calls my very naive batch-matrix-multiplication function, which makes slices. I was actually going to remove it in the re-write, #5

It would not be super-hard to add a gradient definition for this function. It would be better still to call someone else’s function, more in the spirit of this package just being the front-end. For CuArrays there are special kernels for doing such things. But making all of this work nicely in Julia is a work in progress... Relevant links:
FluxML/NNlib.jl#100
https://github.com/Roger-luo/BatchedRoutines.jl
https://github.com/chengchingwen/Transformers.jl/blob/master/src/fix/batchedmul.jl
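For context, the slice-based batched multiplication mentioned above amounts to something like the following sketch, given the shapes in the example (`Zd` is `(a,b,d)`, `M` is `(b,d)`, `E` is `(a,d)`). This is illustrative only, not the package's actual code, and `naive_batched_mul` is a hypothetical name:

```julia
# One ordinary matrix-vector product per batch index d, then glue the
# resulting column vectors together into the (a,d) result.
function naive_batched_mul(Zd::AbstractArray{<:Any,3}, M::AbstractMatrix)
    hcat((Zd[:, :, d] * M[:, d] for d in axes(Zd, 3))...)
end
```

On plain Arrays this agrees with the @reduce form below; the point of the linked packages is to do the same contraction in one fused kernel rather than slice by slice.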

That workaround is precisely this, which should be correct but won’t be fast:

@reduce E[a,d] := sum(b) Zd[a,b,d] * M[b,d]

@mcabbott
Owner

I discovered that OMEinsum now supports batch matmul, as of under-Peter/OMEinsum.jl#74. So at least with Zygote your example can now work:

using Zygote, Random, OMEinsum#master
Random.seed!(42);
Zd_ = randn(4,3,2); M_ = randn(3,2);
f(Zd, M) = (@ein E[a,d] := Zd[a,b,d]*M[b,d]; sum(exp.(E)))
Zygote.gradient(f, Zd_, M_)

using TensorCast#master  # @mul doesn't handle this, but the @reduce fallback does:
g(Zd, M) = (@reduce E[a,d] := sum(b) Zd[a,b,d]*M[b,d]; sum(exp.(E)))
Zygote.gradient(g, Zd_, M_) # agrees!

Surprisingly, you have to get to quite large arrays before its gradient is faster than the @reduce method:

julia> Zd_0 = randn(400,300,200); M_0 = randn(300,200);

julia> @btime Zygote.gradient(f, $Zd_0, $M_0);
  209.318 ms (240768 allocations: 375.26 MiB)

julia> @btime f($Zd_0, $M_0);
  6.751 ms (237 allocations: 1.24 MiB)

julia> @btime Zygote.gradient(g, $Zd_0, $M_0);
  339.116 ms (240106 allocations: 740.82 MiB)

julia> @btime g($Zd_0, $M_0);
  96.580 ms (27 allocations: 184.33 MiB)

Anyway, I'm going to close this, essentially as won't-fix; the relevant bit of @mul is already gone from master.

@mcabbott mcabbott added the wontfix This will not be worked on label Jan 30, 2020