Element-Research rnn #60

nicholas-leonard · 2016-04-08T17:18:07Z

Hi @jcjohnson,

So I work on the Element-Research/rnn package. I have a couple of questions:

why did you chose to build your own LSTM/RNN implementations?
what does torch-rnn package do better? What does rnn do worse?
would you be open to merging these repositories?

The last questions came out from the NVIDIA GPU Tech Conference this week. If you prefer, we can talk offline.

Regards,
Nicholas Leonard

jcjohnson · 2016-04-08T18:11:34Z

Hi @nicholas-leonard,

Great work on Element-Research/rnn!

(1) I wrote it partly because I enjoy implementing things, and tend to learn a lot by doing so. It started off as a challenge to see whether I could implement RNNs without any cloning, because I always seem to run into subtle bugs with clones. It was then a fun challenge to see how efficient I could make things.

(2) From my point of view, torch-rnn and rnn have slightly different design philosophies; I'd be interested to see if you agree. It seems to me that rnn is designed to be very general, with a lot of great modular pieces that give you a ton of freedom to hook up different kinds of recurrences. However this freedom comes at a price: the API has a much larger surface area, and although I haven't run benchmarks myself I imagine that it ends up being a bit more inefficient, especially with regards to memory. I think that as it stands, if you just want to use a basic RNN or LSTM on sequences, then torch-rnn is a good choice; if you want to do anything fancier and especially if you want to play around with different recurrences, then rnn is the way to go.

(3) I'd be open to merging. Currently torch-rnn is in a weird state where there are some reusable modules that feel more like a library, and also an application built around those modules for character-level language modeling. I'd like for the reusable modules to be easy to use in other projects, so I had already been thinking about splitting them out into a separate "library" repo and making this repo entirely an application. Merging with rnn might be a good way to achieve this effect.

Justin

nicholas-leonard · 2016-04-08T19:03:19Z

Nice to meet you @jcjohnson!

(1) Clones are so evil. It took me a lot to make the clone + param sharing as efficient as possible : https://github.com/Element-Research/dpnn/blob/master/Module.lua#L32-L161. Took me too many iterations and unit tests to get that working. And I can't agree more about the learning from your own implementations bit. I am the same.

(2) Yeah I just noticed that you basically implement LSTM/RNN without using any existing modules. That must have been a lot of work! It must indeed result in blazing speed as you basically keep the Lua overhead to a bare minimum.

(3) I would love to merge your code into the rnn package. We could definitely benefit from the speedup. For backward-compatibility I would still need to preserve the current version of LSTM, FastLSTM and what not. But we could include your LSTM as SequenceLSTM, FastestLSTM, FullLSTM, or whatnot. Basically, I would love it if you could use rnn as your library repo.

If you are up for it, just fork rnn and send a PR. If you want, I can get you commit rights to the repository. I could use the help anyway. So what do you say?

nicholas-leonard · 2016-04-12T01:19:02Z

@jcjohnson I started working on a PR to integrate LSTM. I want to modify it to work with T x N inputs instead of N x T (like the rest of rnn). Hope that works for you?

MTSranger · 2016-04-13T06:13:13Z

I can definitely concur with memory problems of cloning. I've got an in-house cloning implementation that's somewhat similar to the one in rnn (the difference is that instead of setting nil, I make a "clone template" which is made by :resize(0) to everything in :parameters() on a full clone, and then use that for all future clones; also I modified it to support clones of clones for fast-prototyping purposes).

In my case, it seems like cloning nngraph LSTM modules causes huge memory leaks during training when the number of clones expand to 1000s (we had an LSTM which we had to unroll over 4000 time steps). Switching to torch-rnn essentially fixed the problem.

In any case thank you all for this wonderful work and I do look forward to having the fast clone-less LSTM modules merged into rnn!

gokceneraslan · 2016-04-13T08:27:52Z

Although it's an amazing effort and a very advanced rnn solution, I should say that the interface of https://github.com/Element-Research/rnn is quite confusing, especially for beginners. There are several classes to go through, Recurrent, Recurrence, Recursor, Repeater, Sequencer... It makes the learning curve really steep. By merging torch-rnn, you'll be introducing more LSTM classes, so it'll be like LSTM, FastLSTM, SequenceLSTM, FullLSTM...

Given that the learning curve of Torch is already steeper than others due to Lua, Tensor interface whatnot, it should be considered to make the rnn implementation a bit simpler. In other high level frameworks like Keras and Lasagne, all you need is just one (1) class, LSTM.

nicholas-leonard · 2016-04-13T17:50:58Z

@gokceneraslan Yes rnn has many classes. But a single LSTM/RNN class will not support all use cases. For example, the nn.LSTM for this repo is basically like a nn.Sequencer(nnFastLSTM) from rnn. Like @jcjohnson mentioned above, sometimes you need something other than a Sequencer, like RecurrentAttention where the input sequence is not know in advance (it depends on the output of each time-step). I guess Keras and Lasagne do not support this kind of thing either.

Also for new users, the main point of confusion is what to use to build their RNN : lstm, torch-rnn, rnn, char-rnn, Recurrent? So many RNN packages. It would be nice to consolidate the main components into a single package so as not to duplicate effort... This point was raised at the NVIDIA GTC last week.

nicholas-leonard · 2016-04-13T22:32:03Z

@jcjohnson Here is the SeqLSTM so far : Element-Research/rnn#207 . For now I made a simple unit test to match its behaviour to FastLSTM. I still need to modify it to take inputs of size seqlen x batchsize x inputsize instead of batchsize x seqlen x inputsize. Also needs benchmarks and documentation.

nicholas-leonard · 2016-04-21T16:39:57Z

Ok so SeqLSTM is merged, unit tested and documented.

blackyang mentioned this issue Apr 26, 2016

documentation on how the faster and memory-efficient results are achieved #76

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Element-Research rnn #60

Element-Research rnn #60

nicholas-leonard commented Apr 8, 2016

jcjohnson commented Apr 8, 2016

nicholas-leonard commented Apr 8, 2016

nicholas-leonard commented Apr 12, 2016

MTSranger commented Apr 13, 2016

gokceneraslan commented Apr 13, 2016

nicholas-leonard commented Apr 13, 2016

nicholas-leonard commented Apr 13, 2016

nicholas-leonard commented Apr 21, 2016

Element-Research rnn #60

Element-Research rnn #60

Comments

nicholas-leonard commented Apr 8, 2016

jcjohnson commented Apr 8, 2016

nicholas-leonard commented Apr 8, 2016

nicholas-leonard commented Apr 12, 2016

MTSranger commented Apr 13, 2016

gokceneraslan commented Apr 13, 2016

nicholas-leonard commented Apr 13, 2016

nicholas-leonard commented Apr 13, 2016

nicholas-leonard commented Apr 21, 2016