
@mul gives Array{TrackedReal} instead of TrackedArray #8

Closed
baggepinnen opened this issue Nov 27, 2019 · 3 comments
Labels
wontfix This will not be worked on

Comments

@baggepinnen

Similar to mcabbott/SliceMap.jl#3

The code below produces an Array of TrackedReals instead of a TrackedArray. If the arrays are CuArrays, the code fails, since there is no CuArray(::Array{TrackedReal}) constructor.

julia> Zd = TrackedArray(randn(4,3,2)); M = TrackedArray(randn(3,2));

julia> @mul E[a,d] := Zd[a,b,d]*M[b,d]
4×2 Array{Tracker.TrackedReal{Float64},2}:
  2.1719     0.248542
  2.50329   -0.100249
 -0.90269   -0.509704
 -0.290149   1.33654 
@baggepinnen
Author

For future reference, this is a workaround

dropdims(sum(Zd.*reshape(M,1,size(M)...), dims=2), dims=2)
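Spelled out, this computes `E[a,d] = Σ_b Zd[a,b,d] * M[b,d]` by broadcasting a reshaped `M` against `Zd` and then summing out `b`. A quick sanity check with plain arrays (no Tracker), comparing against an explicit loop:

```julia
Zd = randn(4, 3, 2); M = randn(3, 2)

# Workaround: reshape M to 1×3×2 so it broadcasts along `a`,
# multiply elementwise, then sum out the contracted index `b` (dims=2).
E1 = dropdims(sum(Zd .* reshape(M, 1, size(M)...), dims=2), dims=2)

# Explicit loop over the same contraction, for comparison.
E2 = [sum(Zd[a, b, d] * M[b, d] for b in axes(Zd, 2))
      for a in axes(Zd, 1), d in axes(Zd, 3)]

E1 ≈ E2  # true
```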

@mcabbott
Owner

This use of @mul calls my very naive batch-matrix-multiplication function, which makes slices. I was actually going to remove it in the re-write, #5

It would not be super-hard to add a gradient definition for this function. It would be better still to call someone else’s function, more in the spirit of this package just being the front-end. For CuArrays there are special kernels for doing such things. But making all of this work nicely in Julia is a work in progress... Relevant links:
FluxML/NNlib.jl#100
https://github.com/Roger-luo/BatchedRoutines.jl
https://github.com/chengchingwen/Transformers.jl/blob/master/src/fix/batchedmul.jl
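For context, the slice-based batched multiplication mentioned above amounts to something like the following sketch, given the shapes in the example (`Zd` is `(a,b,d)`, `M` is `(b,d)`, `E` is `(a,d)`). This is illustrative only, not the package's actual code, and `naive_batched_mul` is a hypothetical name:

```julia
# One ordinary matrix-vector product per batch index d, then glue the
# resulting column vectors together into the (a,d) result.
function naive_batched_mul(Zd::AbstractArray{<:Any,3}, M::AbstractMatrix)
    hcat((Zd[:, :, d] * M[:, d] for d in axes(Zd, 3))...)
end
```

On plain Arrays this agrees with the @reduce form below; the point of the linked packages is to do the same contraction in one fused kernel rather than slice by slice.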

That workaround is precisely this, which should be correct but won’t be fast:

@reduce E[a,d] := sum(b) Zd[a,b,d] * M[b,d]

@mcabbott
Owner

I discovered that OMEinsum now supports batch matmul, as of under-Peter/OMEinsum.jl#74. So at least with Zygote your example can now work:

using Zygote, Random, OMEinsum#master
Random.seed!(42);
Zd_ = randn(4,3,2); M_ = randn(3,2);
f(Zd, M) = (@ein E[a,d] := Zd[a,b,d]*M[b,d]; sum(exp.(E)))
Zygote.gradient(f, Zd_, M_)

using TensorCast#master  # @mul doesn't handle this, but the @reduce fallback does:
g(Zd, M) = (@reduce E[a,d] := sum(b) Zd[a,b,d]*M[b,d]; sum(exp.(E)))
Zygote.gradient(g, Zd_, M_) # agrees!

Surprisingly, you have to get to quite large arrays before its gradient is faster than the @reduce method:

julia> Zd_0 = randn(400,300,200); M_0 = randn(300,200);

julia> @btime Zygote.gradient(f, $Zd_0, $M_0);
  209.318 ms (240768 allocations: 375.26 MiB)

julia> @btime f($Zd_0, $M_0);
  6.751 ms (237 allocations: 1.24 MiB)

julia> @btime Zygote.gradient(g, $Zd_0, $M_0);
  339.116 ms (240106 allocations: 740.82 MiB)

julia> @btime g($Zd_0, $M_0);
  96.580 ms (27 allocations: 184.33 MiB)

Anyway, I'm going to close this, essentially as won't-fix; the relevant bit of @mul is already gone from master.

@mcabbott mcabbott added the wontfix This will not be worked on label Jan 30, 2020