Bi-Directional RNN support #66
Conversation
@elbamos I've opened a PR here, but there is a merge with RNN happening here. It would be nice to get BRNN support for the LSTM in RNN; hopefully @jcjohnson agrees!
Cool! I just hope it doesn't make rnn even more complicated.
-- reverse output
local k = 1
for i = input:size(1), 1, -1 do
  self.output[k] = input[i]
I'm worried that an explicit loop for reversing will be very slow for CudaTensors, since each execution of this line will require device synchronization and a kernel launch. Can you reverse the sequence in a vectorized way to avoid this overhead?
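For reference, one way to do the reversal with a single vectorized call along the lines suggested above; this is only a sketch, and the T x N x H layout and variable names are illustrative assumptions rather than the code in this PR:

-- Sketch: reverse a sequence along the time dimension with one index call
-- instead of a per-timestep copy loop (illustrative layout: T x N x H).
local input = torch.randn(6, 4, 10)

-- Build the reversed index {T, T-1, ..., 1} once; it can be cached and
-- reused as long as the sequence length stays the same.
local T = input:size(1)
local revIdx = torch.range(T, 1, -1):long()

-- A single call copies the whole reversed sequence, so on the GPU this is
-- one kernel launch rather than T separate ones.
local output = input:index(1, revIdx)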
@SeanNaren Sorry I've been slow to respond to this; I agree that bidirectional RNNs are a good thing that we should support. Overall the BRNN and ReverseSequence API looks good to me, but I have some inline comments.
No worries, thanks for the comments, I'll work through them; they all seem like very valid points! I agree about the loop, I wasn't really a fan of it, so I'll try to find a better solution. Any suggestions would be awesome!
@jcjohnson Added a different version of reversing the sequence based on this response. How does it look now? As for clearState, it seems to be handled in nn.Container, and I add the modules to the container, so hopefully that will take care of it. Let me know of any more feedback or if I missed anything!
@SeanNaren The solution using narrow is pretty much the same as before, just using a different syntax; each call still incurs the same per-timestep overhead. Other than that it's looking good. Can you add some unit tests to make sure everything works as expected?
@jcjohnson Thanks, your suggestion helped a lot! I've pushed a change that uses gather; let me know of any feedback you may have on it. I'll add some unit tests now.
Added tests, let me know if there are any more issues or feedback.
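As a rough illustration of what such a test can check, here is a minimal forward/backward sanity check for the reversal module; the no-argument ReverseSequence constructor and the way the module is loaded are assumptions, not the actual test code in this PR:

-- Minimal sanity check for a sequence-reversal module (illustrative only;
-- the no-argument constructor below is an assumption).
require 'nn'

local T, N, H = 5, 3, 8
local input = torch.randn(T, N, H)
local mod = nn.ReverseSequence()

-- Forward: output[t] should equal input[T - t + 1].
local output = mod:forward(input)
for t = 1, T do
  assert((output[t] - input[T - t + 1]):abs():max() == 0,
         'forward reversal mismatch at timestep ' .. t)
end

-- Backward: the gradient should be reversed the same way.
local gradOutput = torch.randn(T, N, H)
local gradInput = mod:backward(input, gradOutput)
for t = 1, T do
  assert((gradInput[t] - gradOutput[T - t + 1]):abs():max() == 0,
         'backward reversal mismatch at timestep ' .. t)
end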
for x = 1, T do
  indices:narrow(1, x, 1):fill(T - x + 1)
end
self.gradInput = gradOutput:gather(1, indices)
This allocates new memory at every function call. Could you do something like
self.gradInput:gather(gradOutput, 1, indices)
instead?
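For clarity, the two forms side by side, using the same indices tensor as in the snippet above (assuming self.gradInput is already a tensor of the right type, as nn modules usually initialize it to be):

-- Allocating form: gather returns a new tensor on every backward call.
self.gradInput = gradOutput:gather(1, indices)

-- In-place form: gather writes into the existing self.gradInput buffer,
-- so repeated backward calls reuse the same memory.
self.gradInput:gather(gradOutput, 1, indices)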
I have some comments on the code (mostly minor), but I commented on two things that I think deserve some attention. Overall I think this is nice :)
@fmassa Thanks a lot man! Lots of good fixes, cheers. I think I've covered all the issues; let me know if you have any more feedback.
Dear SeanNaren: How do I apply the BRNN? Would the command th train.lua -input_h5 my_data.h5 -input_json my_data.json -model_type brnn work?
How do I integrate it into train.lua? The new code doesn't seem to be plugged into the existing torch-rnn facilities. Also, how do they behave in practice? According to papers, they're good at certain kinds of tasks and achieve better loss than other types of RNN. But how do they compare with, say, an LSTM on character-level language modelling? Do they produce better or more interesting results?
Re-opened the request on a new branch.
It contains an implementation of a BRNN for the LSTM module, based on the BiSequencer module here.
Let me know of any feedback/alterations!
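For anyone skimming the thread, here is a rough sketch of how such a bidirectional wrapper is typically composed out of two LSTMs and a sequence-reversal module; the module names and constructor arguments are assumptions, and the actual BRNN module in this branch may differ:

-- Rough sketch of a bidirectional wrapper (illustrative assumptions only).
require 'nn'

local inputDim, hiddenDim = 64, 128

-- Forward direction: an LSTM over the sequence as given (T x N x inputDim).
local fwd = nn.LSTM(inputDim, hiddenDim)

-- Backward direction: reverse the sequence, run a second LSTM, then reverse
-- its outputs back so both directions line up per timestep.
local bwd = nn.Sequential()
  :add(nn.ReverseSequence())
  :add(nn.LSTM(inputDim, hiddenDim))
  :add(nn.ReverseSequence())

-- Run both directions on the same input and join their hidden states along
-- the feature dimension, giving a T x N x (2 * hiddenDim) output.
local brnn = nn.Sequential()
  :add(nn.ConcatTable():add(fwd):add(bwd))
  :add(nn.JoinTable(3))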