LSTM, 'add' and 'write' produce the same results? #725
Yes! If you are using
I mean: if I am using 'write' for the LSTM, should it produce the same (and correct) result as using 'add'? If that's the case, where is 'add' useful? Thanks!
Hmm, I think I see what you mean: 'add' only makes a difference when we call updater(idx, grad, weight).
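For reference, a hypothetical updater with that (idx, grad, weight) signature (not from this thread; the learning rate is an assumption). It simply applies whatever is sitting in the gradient buffer, so the difference between 'write' and 'add' is decided before the updater is ever called:

```python
def updater(idx, grad, weight):
    # Hypothetical SGD-style update; lr is an assumed learning rate.
    # `grad` is just the contents of the gradient buffer: the accumulated
    # sum under grad_req='add', or the last backward pass under 'write'.
    lr = 0.01
    weight[:] -= lr * grad
```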
I think 'add' means the calculated gradient should be added to, rather than written to, the gradient buffer. This is useful for RNNs, since an RNN computes t gradients of the gate parameters (where t is the unrolling length), and all of these gradients should be accumulated before updating the gate parameters. In fact, I am quite surprised that the 'write' strategy gives the same result as the 'add' strategy. Let me look into it.
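A minimal sketch (not from this thread) of how grad_req behaves when backward is called more than once, using the symbolic executor API; the symbol, shapes, and values are illustrative assumptions:

```python
import mxnet as mx

x = mx.sym.Variable('x')
y = x * x  # dy/dx = 2x, so with x = 1 each backward pass produces a gradient of 2

for req in ('write', 'add'):
    # grad_req controls what happens to the gradient buffer on each backward pass
    exe = y.simple_bind(ctx=mx.cpu(), x=(3,), grad_req=req)
    exe.arg_dict['x'][:] = 1.0
    exe.grad_dict['x'][:] = 0.0       # start from a clean gradient buffer
    head_grad = mx.nd.ones((3,))
    for _ in range(2):                # two backward passes, as an unrolled RNN would do
        exe.forward(is_train=True)
        exe.backward(head_grad)
    print(req, exe.grad_dict['x'].asnumpy())

# Expected under these assumptions:
#   write [2. 2. 2.]   -- the second pass overwrites the first
#   add   [4. 4. 4.]   -- the second pass accumulates onto the first
```

With 'add', the caller is responsible for zeroing the gradient buffer between parameter updates; with 'write', each backward pass starts fresh automatically.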
Is it related to this? #568
I have been working on LSTM recently and found that the 'add' and 'write' grad_reqs produce the same results.
For the first two epochs ('write' version) I have,
and for the 'add' version,
Shouldn't these two be different?