Add correct batch size for RNN hidden layer #706
base: master
Conversation
Maybe add a test?
As I understand it, I don't think this PR would do any harm, but that seems fishy and it may be worth understanding why it happens in case something else is going wrong.
Line 188 in eac33f8
h ). But that doesn't happen in the forwards pass...
This is just to apply the same treatment as
I believe that line mirrors this one which does happen in the forward pass (we just do it again in the backwards pass because we don't have access to the expanded It would be worth printing out the size of
Found this: (size(ho), size(dho)) = ((8, 10), (8,)) This was when the model was All subsequent calls have the same size. I am assuming it's because Line 185 in eac33f8
What should the test actually be? Just calling into the
If The fact that
@DhairyaLGandhi @MikeInnes Have you had a moment to look at the above? With the following latest releases, the issue no longer seems to be reproducible:
[3a865a2d] CuArrays v2.2.2
[a93c6f00] DataFrames v0.21.2
[587475ba] Flux v0.10.4
[28b8d3ca] GR v0.50.1
[91a5bcdd] Plots v1.4.3
[2913bbd2] StatsBase v0.33.0
[e88e6eb3] Zygote v0.4.22
However, I couldn't identify which change to Flux, CuArrays or Zygote may have solved the issue. Any pointers would be welcome.
Am I right in understanding that the MWE in FluxML/Flux.jl#1114 works as expected now?
Yes!
Wonder if the Adapt issue is related then, incorrect size in the backwards pass could result from an
Ref JuliaGPU/Adapt.jl#24, I suppose
Fixes FluxML/Flux.jl#1114
The context here is that on the first call to the layer (which can also be simulated by calling Flux.reset! on the structure), the gradients for the hidden layer were returned with the same size as the state at that time (a vector), whereas the size of the state changes once we step through time. This needs to be accounted for so that a gradient of the correct size is sent back. We already do this automatically for subsequent calls; the issue only shows up on the first run.
@MikeInnes @maleadt, would this be an alright fix?
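To make the size mismatch concrete, here is a minimal, language-agnostic sketch in plain Python (not the actual CuArrays or Flux code). It reuses the sizes reported in the conversation, a hidden state of size (8, 10) and a gradient of size (8,), and shows one plausible way to bring the gradient to the batched shape by repeating it along the batch dimension. The helpers `shape` and `expand_to_batch` are hypothetical names introduced for illustration only, and whether repetition is the right treatment for the real layer is an assumption.

```python
# Illustrative only: nested lists stand in for arrays; this is not CuArrays or Flux code.

def shape(a):
    """Return the shape of a vector (flat list) or a matrix (list of rows)."""
    if a and isinstance(a[0], list):
        return (len(a), len(a[0]))
    return (len(a),)

def expand_to_batch(v, batch_size):
    """Hypothetical helper: repeat a length-n vector across batch_size columns."""
    if len(shape(v)) == 2:
        return v  # already batched, leave it untouched
    return [[x] * batch_size for x in v]

hidden, batch = 8, 10
ho = [[0.0] * batch for _ in range(hidden)]  # state after stepping through time
dho = [0.0] * hidden                         # gradient as seen on the first call

print(shape(ho), shape(dho))                 # (8, 10) (8,)
dho_expanded = expand_to_batch(dho, batch)
print(shape(dho_expanded) == shape(ho))      # True
```

On subsequent calls the gradient would already be batched, so in this sketch `expand_to_batch` returns its input unchanged.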