
SeqLSTM #207

Merged: 16 commits merged into Element-Research:master on Apr 21, 2016

Conversation

nicholas-leonard
Member

Super fast LSTM code from Justin's torch-rnn

@jcjohnson

@nicholas-leonard Looks great!

I'm not sure about the best way to handle (N x T x D) vs (T x N x D) layouts (where N = minibatch size, T = sequence length, D = input size). TND fits better with the rest of rnn and is more memory-friendly, so it will probably be a bit faster; however, NTD seems like a more natural fit with the rest of nn, which is why I chose that layout.

Thoughts?

@nicholas-leonard
Member Author

@jcjohnson Yeah, my thoughts exactly. I want SeqLSTM to default to TND to make the transition more seamless for rnn users. The reason I chose this layout for rnn was, as you said, the memory-friendliness. But then, as you also said, there is the rest of the torch user base and the torch-rnn package...

I was thinking we could add something like SeqLSTM.batchfirst = true to make it easy to switch to NTD instead of the default TND. If you insist, since it is your code and this affects your backwards compatibility, I don't really mind making it default to false.

So it's really up to you :) What do you choose?

@SeanNaren
Contributor

Really awesome to see this in rnn, nice job guys. @nicholas-leonard I basically created a version of BiSequencer for the torch-rnn package which may be of use; I opened a PR here. I would be more than happy to create a PR after you implement this!

@nicholas-leonard
Member Author

@SeanNaren Awesome! Would really appreciate that!

@jcjohnson

@nicholas-leonard TND default sounds good to me; that way the default behavior is also the fastest.

I like the idea of a batchFirst field that switches over to NTD layout. I'm not too worried about the default behavior maintaining backward compatibility with torch-rnn since I am probably the main consumer of the SeqLSTM right now, and a line to set batchFirst in my own code isn't a big deal.
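
For illustration, here is a minimal sketch of the two layouts under the batchfirst switch discussed above (assuming the constructor takes (inputsize, outputsize), as in the repro later in this thread):

-- TND (default): input is seqlen x batchsize x inputsize
lstm = nn.SeqLSTM(10, 20)
out = lstm:forward(torch.Tensor(5, 32, 10))    -- T=5, N=32, D=10 -> output is 5 x 32 x 20

-- NTD: input is batchsize x seqlen x inputsize
lstm2 = nn.SeqLSTM(10, 20)
lstm2.batchfirst = true
out2 = lstm2:forward(torch.Tensor(32, 5, 10))  -- N=32, T=5, D=10 -> output is 32 x 5 x 20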

willfrey and others added 2 commits on April 15, 2016:
Fixed remember comparisons (= to ==)
Update SeqLSTM.lua [changed = to == in comparison]
@northanapon

A couple of questions after trying out the code. SeqLSTM.batchfirst does not work when remember_state = true. I think the indices of self.cell and self._output might need to be swapped.

Second, when the batch size changes (e.g. during test time), SeqLSTM does not reset its previous states properly, so the batch sizes mismatch. This might also affect training. I am not sure how to implement :forget(). Maybe:

function SeqLSTM:forget()
  parent.forget(self)  -- call the superclass forget (parent.forget(self), not parent:forget())
  self:resetStates()   -- clear the cached c0/h0 so they are re-sized for the next batch
end

@nicholas-leonard
Member Author

@northanapon I fixed your first bug. I wasn't able to reproduce your second bug (see the unit test); maybe it got fixed by the fix to the first. Can you test your second use case again with the newest commit?

@nicholas-leonard changed the title from "SeqLSTM (work in progress)" to "SeqLSTM" on Apr 20, 2016
@nicholas-leonard merged commit 6a71750 into Element-Research:master on Apr 21, 2016
@northanapon

northanapon commented Apr 21, 2016

I tried the new SeqLSTM (from master). The batch size problem still exists when using :remember('both'); the test case resets with :remember('neither'). Here is code to reproduce the error:

lstm = nn.SeqLSTM(10, 10)
lstm.batchfirst = true                   -- N x T x D layout
lstm:remember('both')                    -- keep hidden state across calls
lstm:training()
lstm:forward(torch.Tensor(32, 20, 10))   -- batch size 32
lstm:evaluate()
lstm:forget()
lstm:forward(torch.Tensor(1, 1, 10))     -- batch size 1: fails, c0/h0 are still sized for 32

I have to set :remember('both') in order to sample output one step at a time, but :forget() does not reset the sizes of c0 and h0.
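
Until a fix lands, a workaround in the spirit of the :forget() override suggested above is to clear the cached state by hand whenever the batch size changes (a hedged sketch, assuming resetStates() clears c0/h0 as it does in torch-rnn):

lstm:evaluate()
lstm:forget()        -- reset the recurrent bookkeeping
lstm:resetStates()   -- also clear the cached c0/h0 so the next forward re-sizes them
lstm:forward(torch.Tensor(1, 1, 10))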

@nicholas-leonard
Member Author

@northanapon I see what you mean now. Fixed in 7116a3d . Thanks for pointing this out!
