LSTM, 'add' and 'write' produce the same results? #725

Kublai-Jing · 2015-11-26T03:56:59Z

I have been working on an LSTM recently and found that the 'add' and 'write' grad_req settings produce the same results.

rnn_exec = rnn_sym.bind(ctx=ctx, args=arg_arrays,
                        args_grad=args_grad,
                        grad_req="write")

With this, for the first two epochs I get:

And for the 'add' version

rnn_exec = rnn_sym.bind(ctx=ctx, args=arg_arrays,
                        args_grad=args_grad,
                        grad_req="add")

Shouldn't these two give different results?


antinucleon · 2015-11-26T04:08:01Z

Yes! If you are using 'write', you can safely remove the gradient-reset code at the end of each epoch.

Kublai-Jing · 2015-11-26T04:09:57Z

I mean, if I am using 'write' for the LSTM, should it produce the same (and correct) result as using 'add'? If that's the case, where is 'add' useful? Thanks!

Kublai-Jing · 2015-11-26T04:14:29Z

Hmm, I think I see what you mean: 'add' only makes a difference when we call updater(idx, grad, weight).
Using 'write' will overwrite the gradient inside that call, whereas using 'add' will add to it. So I guess the engine itself adds up the gradients for the same parameter when we call backward, which means that using 'write' is also correct for the LSTM in this case.
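
A toy model of the two settings may help here. This is a sketch in plain Python, not MXNet code: `backward`, `run`, and the one-element list used as a gradient buffer are hypothetical names for illustration. It assumes, as guessed above, that a single backward pass sums all contributions to a shared parameter before the result touches the buffer.

```python
# Toy model of gradient-buffer semantics; NOT MXNet code.
# Assumption: one backward pass sums every contribution to a shared
# parameter internally, then commits the total to the buffer once.

def backward(grad_buffer, contributions, grad_req):
    """Commit one backward pass for a single shared parameter."""
    total = sum(contributions)
    if grad_req == "write":
        grad_buffer[0] = total      # overwrite whatever was stored
    elif grad_req == "add":
        grad_buffer[0] += total     # accumulate onto the stored value
    else:
        raise ValueError("unknown grad_req: %r" % grad_req)
    return grad_buffer[0]


def run(grad_req, reset_between_batches):
    """Run two batches and return the buffer value seen after each."""
    buf = [0.0]
    seen = []
    for contributions in ([1.0, 2.0, 3.0], [4.0, 5.0]):
        if reset_between_batches:
            buf[0] = 0.0            # the "reset gradient" step
        seen.append(backward(buf, contributions, grad_req))
    return seen


print(run("write", False))  # [6.0, 9.0]
print(run("add", True))     # [6.0, 9.0]
print(run("add", False))    # [6.0, 15.0]
```

Under that assumption, 'write' without a reset and 'add' with a reset see the same gradients, which would explain identical training results. 'add' only differs when the buffer is left non-zero between backward calls.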

jermainewang · 2015-11-26T04:58:17Z

I think 'add' means the calculated gradient should be added to, rather than written to, the gradient variable. This is useful for RNNs, since an unrolled RNN calculates 't' gradients of the gate parameters ('t' is the unrolling length), and all these gradients should be accumulated to update the gate parameters. In fact, I am quite surprised that the 'write' strategy gives the same result as the 'add' strategy. Let me look into it.
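
To make the accumulation over 't' steps concrete, here is a minimal sketch on a scalar linear RNN, h_t = w * h_{t-1} + x_t with loss = h_T. The model and the helper names (`forward`, `per_step_grads`, `numeric_grad`) are assumptions chosen for illustration, not the LSTM from this issue. It checks that the sum of the per-step contributions matches a finite-difference gradient, while a single step's contribution does not.

```python
# Toy unrolled RNN with one shared weight w; NOT the LSTM from this issue.

def forward(w, xs):
    """Return hidden states h_0..h_T for h_t = w * h_{t-1} + x_t, h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs


def per_step_grads(w, xs):
    """Contribution of each unrolled step to dL/dw, for loss L = h_T.

    Returned in backward order: step T first, step 1 last.
    """
    hs = forward(w, xs)
    dh = 1.0                          # dL/dh_T
    grads = []
    for t in range(len(xs), 0, -1):
        grads.append(dh * hs[t - 1])  # step t uses w once: dh_t/dw = h_{t-1}
        dh *= w                       # propagate to dL/dh_{t-1}
    return grads


def numeric_grad(w, xs, eps=1e-6):
    """Central finite-difference estimate of dL/dw."""
    return (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)


w, xs = 0.5, [1.0, 2.0, 3.0]
grads = per_step_grads(w, xs)
print(grads)                 # [2.5, 0.5, 0.0]
print(sum(grads))            # 3.0
print(numeric_grad(w, xs))   # close to 3.0
```

If each step overwrote the buffer separately, only one of these contributions would survive. 'write' can only be correct here if the contributions are summed before the buffer is written, which is the hypothesis raised earlier in this thread.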

pluskid · 2015-11-26T05:32:30Z

Is it related to this? #568

tqchen closed this as completed Dec 15, 2015

zixuanweeei mentioned this issue Oct 22, 2019

Fused RNN Operators have nonsupport of add grad_req with mkl-dnn #16578

Closed
