This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

LSTM, 'add' and 'write' produce the same results? #725

Closed
Kublai-Jing opened this issue Nov 26, 2015 · 5 comments

Comments

@Kublai-Jing
Contributor

I have been working on an LSTM recently and found that the 'add' and 'write' values of grad_req produce the same results. With 'write':

rnn_exec = rnn_sym.bind(ctx=ctx, args=arg_arrays,
                        args_grad=args_grad,
                        grad_req="write")    

For the first two epochs, this gives:
[Screenshot: output for the first two epochs with grad_req="write"]

And for the 'add' version:

rnn_exec = rnn_sym.bind(ctx=ctx, args=arg_arrays,
                        args_grad=args_grad,
                        grad_req="add")    

[Screenshot: output with grad_req="add"]

Shouldn't these two be different?

@antinucleon
Contributor

Yes! If you are using 'write', you can safely remove the gradient-reset code at the end of the epoch.
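To illustrate the point about the reset code, here is a minimal sketch in plain Python. It is a toy model of a gradient buffer, not MXNet code: `GradBuffer`, `store`, and `reset` are hypothetical names, and the gradient values are made up. It assumes the training loop zeroes the gradient arrays between backward passes, which is what the reset code mentioned above would do.

```python
# Toy model of a gradient buffer; this is NOT MXNet code.
# GradBuffer, store and reset are hypothetical names for illustration.

class GradBuffer:
    def __init__(self, size, grad_req):
        assert grad_req in ("write", "add")
        self.grad_req = grad_req
        self.data = [0.0] * size

    def store(self, grad):
        """Store the result of one backward pass."""
        if self.grad_req == "write":
            # Overwrite: the previous contents are discarded.
            self.data = list(grad)
        else:
            # Accumulate: the new gradient is added to the old contents.
            self.data = [old + g for old, g in zip(self.data, grad)]

    def reset(self):
        """Zero the buffer."""
        self.data = [0.0] * len(self.data)


# Made-up gradients from two consecutive backward passes.
batch_grads = [[1.0, 2.0], [0.5, -1.0]]

write_buf = GradBuffer(2, "write")
add_buf = GradBuffer(2, "add")
add_reset_buf = GradBuffer(2, "add")

for grad in batch_grads:
    write_buf.store(grad)
    add_buf.store(grad)
    add_reset_buf.reset()      # zero the buffer before storing
    add_reset_buf.store(grad)

print(write_buf.data)      # only the last gradient
print(add_buf.data)        # sum of both gradients
print(add_reset_buf.data)  # same as "write", because of the reset
```

In this toy model, 'add' with a reset before every store behaves exactly like 'write', which would explain identical results; without the reset, 'add' keeps summing across passes.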

@Kublai-Jing
Contributor Author

I mean, if I am using 'write' for the LSTM, should it produce the same (and correct) result as using 'add'? If that's the case, where is 'add' useful? Thanks!

@Kublai-Jing
Contributor Author

Hmm, I think I see what you mean: 'add' only makes a difference when we call updater(idx, grad, weight).
Using 'write' overwrites the gradient used in that call, whereas 'add' adds to it. So I guess the engine itself sums the gradients for the same parameter when we call backward, which means that using 'write' is also correct for the LSTM in this case.
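The guess above can be sketched with a toy scalar recurrence in plain Python. This is not MXNet's engine: `forward` and `backward` are hypothetical helpers, the recurrence `h_t = w * h_{t-1} + x_t` stands in for an unrolled LSTM, and the loss is simply the last hidden state. The point it illustrates is that the contributions of all time steps to a shared weight are summed inside a single backward pass, so how the final result is stored does not matter when the buffer starts at zero.

```python
# Toy unrolled recurrence with one shared weight; NOT MXNet code.
# forward and backward are hypothetical helpers for illustration.

def forward(w, xs, h0=0.0):
    """Unroll h_t = w * h_{t-1} + x_t and return all hidden states."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs

def backward(w, hs):
    """Gradient of the last hidden state w.r.t. the shared weight w."""
    grad_w = 0.0
    dh = 1.0  # d(loss)/d(h_T), with loss = h_T
    for t in range(len(hs) - 1, 0, -1):
        grad_w += dh * hs[t - 1]  # contribution of time step t
        dh *= w                   # propagate back to h_{t-1}
    return grad_w

w = 0.5
xs = [1.0, 2.0, 3.0]
hs = forward(w, xs)
total = backward(w, hs)  # already the sum over all time steps

# Storing the summed gradient into a zeroed buffer:
buf_write = total        # "write": overwrite the buffer
buf_add = 0.0 + total    # "add": add to a buffer that starts at zero

# Central finite difference as an independent check.
eps = 1e-4
numeric = (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)

print(total, buf_write, buf_add, numeric)
```

Under these assumptions the per-step accumulation happens in `backward` itself, and the two storage modes only differ if the buffer holds a non-zero value beforehand.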

@jermainewang
Contributor

I think 'add' means the calculated gradient should be added to, rather than written to, the gradient variable. This is useful for an RNN, since it calculates 't' gradients of the gate parameters ('t' being the unrolling length), and all of these gradients should be accumulated to update the gate parameters. In fact, I am quite surprised that the 'write' strategy gives the same result as the 'add' strategy. Let me look into it.

@pluskid
Contributor

pluskid commented Nov 26, 2015

Is it related to this? #568
