Element-Research rnn #60

Open
nicholas-leonard opened this issue Apr 8, 2016 · 8 comments

@nicholas-leonard

Hi @jcjohnson,

So I work on the Element-Research/rnn package. I have a couple of questions:

  1. Why did you choose to build your own LSTM/RNN implementations?
  2. What does the torch-rnn package do better? What does rnn do worse?
  3. Would you be open to merging these repositories?

The last question came up at the NVIDIA GPU Tech Conference this week. If you prefer, we can talk offline.

Regards,
Nicholas Leonard

@jcjohnson
Owner

Hi @nicholas-leonard,

Great work on Element-Research/rnn!

(1) I wrote it partly because I enjoy implementing things and tend to learn a lot by doing so. It started off as a challenge to see whether I could implement RNNs without any cloning, because I always seem to run into subtle bugs with clones. It then became a fun exercise to see how efficient I could make things.

(2) From my point of view, torch-rnn and rnn have slightly different design philosophies; I'd be interested to see if you agree. It seems to me that rnn is designed to be very general, with a lot of great modular pieces that give you a ton of freedom to hook up different kinds of recurrences. However, this freedom comes at a price: the API has a much larger surface area, and although I haven't run benchmarks myself, I imagine it ends up being a bit less efficient, especially with regard to memory. As it stands, if you just want to use a basic RNN or LSTM on sequences, torch-rnn is a good choice; if you want to do anything fancier, and especially if you want to play around with different recurrences, rnn is the way to go.
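
To make the comparison concrete, here is roughly what the "basic LSTM on sequences" case looks like in rnn's style; the torch-rnn side is just a single module applied to a batch-first tensor, so take this as an illustrative sketch rather than a benchmark:

```lua
-- Sketch of the "basic LSTM on a sequence" case in rnn's style. torch-rnn's
-- equivalent is a single module applied to a batch-first tensor; its exact
-- module/argument names are not shown here.
require 'rnn'

local inputSize, hiddenSize = 10, 20
local seqlen, batchsize = 5, 3

-- wrap a per-time-step FastLSTM in a Sequencer to consume a whole sequence
local seqModel = nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))

-- Sequencer accepts a table of seqlen tensors, each batchsize x inputSize
local inputs = {}
for t = 1, seqlen do
   inputs[t] = torch.randn(batchsize, inputSize)
end
local outputs = seqModel:forward(inputs)  -- table of batchsize x hiddenSize tensors
```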

(3) I'd be open to merging. Currently torch-rnn is in a weird state where there are some reusable modules that feel more like a library, and also an application built around those modules for character-level language modeling. I'd like for the reusable modules to be easy to use in other projects, so I had already been thinking about splitting them out into a separate "library" repo and making this repo entirely an application. Merging with rnn might be a good way to achieve this effect.

Justin

@nicholas-leonard
Author

Nice to meet you @jcjohnson!

(1) Clones are so evil. It took me a lot of work to make clone + param sharing as efficient as possible: https://github.com/Element-Research/dpnn/blob/master/Module.lua#L32-L161. It took too many iterations and unit tests to get that working. And I can't agree more about learning from your own implementations; I am the same.
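
(For reference, the plain nn way to get parameter-sharing clones looks like the sketch below; the dpnn code linked above does considerably more bookkeeping, so this is only the basic pattern, not that implementation.)

```lua
-- Minimal sketch of clone + parameter sharing with stock nn: each clone keeps
-- its own activation buffers but shares weight and gradient storage with the
-- original, which is the invariant the dpnn code above maintains.
require 'nn'

local step = nn.Linear(10, 10)  -- stand-in for a per-time-step module

local clones = {}
for t = 1, 5 do
   clones[t] = step:clone('weight', 'bias', 'gradWeight', 'gradBias')
end
```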

(2) Yeah, I just noticed that you basically implement LSTM/RNN without using any existing modules. That must have been a lot of work! It must indeed result in blazing speed, since you keep the Lua overhead to a bare minimum.

(3) I would love to merge your code into the rnn package. We could definitely benefit from the speedup. For backward compatibility I would still need to preserve the current versions of LSTM, FastLSTM and so on, but we could include your LSTM as SequenceLSTM, FastestLSTM, FullLSTM, or the like. Basically, I would love it if you could use rnn as your library repo.

If you are up for it, just fork rnn and send a PR. If you want, I can get you commit rights to the repository. I could use the help anyway. So what do you say?

@nicholas-leonard
Author

@jcjohnson I started working on a PR to integrate LSTM. I want to modify it to work with T x N inputs instead of N x T (like the rest of rnn). Hope that works for you?
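
To illustrate the layout change (shapes only; the module that consumes the batch-first tensor is left out, so this is purely a sketch of the convention difference):

```lua
-- rnn's convention is seqlen x batchsize x inputsize (T x N x D); torch-rnn's
-- LSTM works on batchsize x seqlen x inputsize (N x T x D). Converting between
-- the two is a transpose of the first two dimensions.
require 'torch'

local seqlen, batchsize, inputsize = 5, 3, 10
local input_TND = torch.randn(seqlen, batchsize, inputsize)

-- batch-first view for an N x T x D module; :contiguous() because transpose
-- only changes strides
local input_NTD = input_TND:transpose(1, 2):contiguous()
print(input_NTD:size())  -- 3 x 5 x 10
```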

@MTSranger

I can definitely concur with the memory problems of cloning. I've got an in-house cloning implementation that's somewhat similar to the one in rnn. The difference is that instead of setting nil, I make a "clone template" by calling :resize(0) on everything in :parameters() of a full clone, and then use that template for all future clones; I also modified it to support clones of clones, for fast prototyping.
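
Roughly, the clone-template idea looks like this (a sketch with made-up names, not the actual in-house code):

```lua
-- Sketch of the "clone template" approach: make one full clone, empty its
-- parameter storages with :resize(0), and reuse that lightweight template to
-- build further clones whose parameters are re-pointed at the original.
require 'nn'

local function makeCloneTemplate(module)
   local template = module:clone()
   local params, gradParams = template:parameters()
   if params then
      for i = 1, #params do
         params[i]:resize(0)      -- drop the weight storage from the template
         gradParams[i]:resize(0)  -- drop the gradient storage too
      end
   end
   return template
end

local function cloneFromTemplate(template, original)
   local clone = template:clone()  -- cheap: no parameter storage to copy
   local cParams, cGradParams = clone:parameters()
   local oParams, oGradParams = original:parameters()
   for i = 1, #cParams do
      cParams[i]:set(oParams[i])          -- share the original's weights
      cGradParams[i]:set(oGradParams[i])  -- and its gradients
   end
   return clone
end

-- usage: build many cheap clones of a per-step module
local step = nn.Linear(10, 10)
local template = makeCloneTemplate(step)
local clones = {}
for t = 1, 4 do
   clones[t] = cloneFromTemplate(template, step)
end
```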

In my case, cloning nngraph LSTM modules seemed to cause huge memory leaks during training once the number of clones expanded into the thousands (we had an LSTM that we had to unroll over 4000 time steps). Switching to torch-rnn essentially fixed the problem.

In any case thank you all for this wonderful work and I do look forward to having the fast clone-less LSTM modules merged into rnn!

@gokceneraslan

Although it's an amazing effort and a very advanced RNN solution, I should say that the interface of https://github.com/Element-Research/rnn is quite confusing, especially for beginners. There are several classes to work through (Recurrent, Recurrence, Recursor, Repeater, Sequencer...), which makes the learning curve really steep. By merging torch-rnn, you'll be introducing even more LSTM classes, so it'll be LSTM, FastLSTM, SequenceLSTM, FullLSTM...

Given that the learning curve of Torch is already steeper than others' due to Lua, the Tensor interface, and whatnot, it would be worth considering making the rnn interface a bit simpler. In other high-level frameworks like Keras and Lasagne, all you need is a single class: LSTM.

@nicholas-leonard
Author

@gokceneraslan Yes, rnn has many classes, but a single LSTM/RNN class will not support all use cases. For example, the nn.LSTM in this repo is basically an nn.Sequencer(nn.FastLSTM) from rnn. Like @jcjohnson mentioned above, sometimes you need something other than a Sequencer, like RecurrentAttention, where the input sequence is not known in advance (it depends on the output of each time-step). I guess Keras and Lasagne do not support this kind of thing either.
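
A rough sketch of that step-by-step use case, where each input depends on the previous output (decodeNextInput is a hypothetical stand-in, not an rnn function):

```lua
-- Sketch of feeding an rnn module one time-step at a time, with each step's
-- input derived from the previous output. A Sequencer can't do this because it
-- needs the whole input sequence up front.
require 'rnn'

local lstm = nn.FastLSTM(10, 10)

-- hypothetical stand-in for whatever maps an output to the next input
-- (e.g. choosing the next glimpse location in RecurrentAttention)
local function decodeNextInput(output)
   return output:clone()
end

local input = torch.randn(4, 10)  -- initial batch of 4 inputs
for step = 1, 5 do
   local output = lstm:forward(input)  -- FastLSTM keeps hidden state between calls
   input = decodeNextInput(output)
end
```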

Also, for new users the main point of confusion is which package to use to build their RNN: lstm, torch-rnn, rnn, char-rnn, Recurrent? There are so many RNN packages. It would be nice to consolidate the main components into a single package so as not to duplicate effort... This point was raised at the NVIDIA GTC last week.

@nicholas-leonard
Author

@jcjohnson Here is the SeqLSTM so far: Element-Research/rnn#207. For now I have made a simple unit test to match its behaviour to FastLSTM. I still need to modify it to take inputs of size seqlen x batchsize x inputsize instead of batchsize x seqlen x inputsize. It also needs benchmarks and documentation.
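
Once the layout change lands, usage should look roughly like this (a sketch of the intended end state, not the PR as it currently stands):

```lua
-- Sketch of SeqLSTM usage with the target seqlen x batchsize x inputsize layout
-- (the PR as it stands still expects batchsize x seqlen x inputsize).
require 'rnn'

local inputsize, outputsize = 10, 20
local seqlen, batchsize = 7, 4

local seqlstm = nn.SeqLSTM(inputsize, outputsize)
local input = torch.randn(seqlen, batchsize, inputsize)
local output = seqlstm:forward(input)  -- seqlen x batchsize x outputsize
print(output:size())
```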

@nicholas-leonard
Author

OK, so SeqLSTM is merged, unit-tested, and documented.
