Added BRNN support #3
Conversation
Looks good. Can you add some tests, too?
Good idea, but I'm not sure how to test the RNN script. Is there a way we could get the biases and weights of the gates? How would you suggest testing the two modules?
cudnn provides the cudnnGetRNNLinLayerMatrixParams and cudnnGetRNNLinLayerBiasParams functions for getting pointers to the biases and weights.
I'm trying to replicate the sample to act as the test, but I can't figure out how to create the double pointer required for the last argument of cudnnGetRNNLinLayerMatrixParams. How would I create the double pointer? My attempt so far may also help:
This Stack Overflow question might help.
Thanks! I managed to cast it with ffi to get the pointer, but I'm not sure how to access the actual data from it. In the C RNN example all values in the matrix/bias are initialised to a set value; would I be able to replicate that in Lua somehow?
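For reference, here is a minimal sketch of one way to build that double pointer with LuaJIT's ffi. The descriptor fields (rnn.rnnDesc, rnn.xDescs, rnn.wDesc), the createFilterDescriptors helper and the layer/linLayerID variables are assumptions about how the module might store its cudnn descriptors, not code taken from this PR:

```lua
local ffi = require 'ffi'
local errcheck = cudnn.errcheck

-- A one-element array of float*: passing it (cast to void**) lets
-- cudnnGetRNNLinLayerMatrixParams write back a pointer into the packed
-- weight buffer for the requested layer/linLayerID.
local linLayerMatDesc = rnn:createFilterDescriptors(1)  -- assumed helper
local matrixPointer = ffi.new("float*[1]")
errcheck('cudnnGetRNNLinLayerMatrixParams',
         cudnn.getHandle(),
         rnn.rnnDesc[0],   -- assumed descriptor fields on the RNN module
         layer,
         rnn.xDescs[0],
         rnn.wDesc[0],
         rnn.weight:data(),
         linLayerID,
         linLayerMatDesc[0],
         ffi.cast("void**", matrixPointer))
-- matrixPointer[0] now points at the first element of this layer's matrix.
```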
You'll have to get the weight size from:
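A hedged sketch of what that could look like: read the dimensions of the filter descriptor filled in above, then wrap the returned pointer's offset into the flat weight tensor so it can be filled with a constant, as the C sample does. The member names are again assumptions:

```lua
-- Query the filter descriptor for the matrix dimensions.
local dataType = ffi.new("cudnnDataType_t[1]")
local format = ffi.new("cudnnTensorFormat_t[1]")
local nbDims = torch.IntTensor(1)
local minDim = 3
local filterDimA = torch.ones(minDim):int()
errcheck('cudnnGetFilterNdDescriptor',
         linLayerMatDesc[0],
         minDim,
         dataType,
         format,
         nbDims:data(),
         filterDimA:data())

-- The returned pointer lies inside the flat weight tensor, so the element
-- offset is just the pointer difference. Wrapping that region in a tensor
-- lets us initialise it to a set value, like the cudnn C sample.
local nElements = filterDimA:prod()
local offset = matrixPointer[0] - rnn.weight:data()
local matrix = torch.CudaTensor(rnn.weight:storage(), offset + 1, nElements)
matrix:fill(1.0 / nElements)
```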
Awesome! And would the same approach work for the bias values? The command is nearly identical:
Yes, this should work for biases too.
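For completeness, a sketch of the bias variant under the same assumptions; only the function name and the descriptor being filled change:

```lua
-- Same pattern as the matrix: cudnnGetRNNLinLayerBiasParams writes back a
-- pointer to the bias of the requested layer/linLayerID.
local linLayerBiasDesc = rnn:createFilterDescriptors(1)  -- assumed helper
local biasPointer = ffi.new("float*[1]")
errcheck('cudnnGetRNNLinLayerBiasParams',
         cudnn.getHandle(),
         rnn.rnnDesc[0],
         layer,
         rnn.xDescs[0],
         rnn.wDesc[0],
         rnn.weight:data(),
         linLayerID,
         linLayerBiasDesc[0],
         ffi.cast("void**", biasPointer))
-- Its size comes from cudnnGetFilterNdDescriptor on linLayerBiasDesc, as above.
```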
Added a test for the RNN that uses the checksums taken from the cudnn RNN C sample. The checksums for a few of the backward passes are not correct; any help would be great! I'll try to fix them as well.
The test now works for ReLU/Tanh/LSTM/GRU. One test is still failing (the weight checksum for ReLU is slightly off, not entirely sure why).
Great, thanks!
Just as a side note, would it make more sense to add the LSTM/GRU RNNs as separate modules, like the BLSTM?
local dataType = 'CUDNN_DATA_FLOAT'
local format = 'CUDNN_TENSOR_NCHW'
local nbDims = torch.IntTensor(1)
local filterDimA = torch.IntTensor({ 1, 1, 1 })
You can remove the hard-coded 3 dimensions:
local minDim = 3
local filterDimA = torch.ones(minDim)
Then replace the '3' argument to the GetFilterNdDescriptor call with minDim, and replace filterDimA[1] * filterDimA[2] * filterDimA[3] with filterDimA:prod().
Added batchFirst to the RNN, let me know what you guys think! EDIT: should we support RNN ReLU/Tanh as separate modules? The only thing I'm concerned about is that after setting self.mode to either one, you have to reset the RNN descriptor (a call to reset()). Does this warrant a separate module?
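As an illustration of the flag (a hedged sketch assuming batchFirst is the fourth constructor argument, which may differ from the final API): without it the input is laid out seqLength x miniBatch x inputSize; with it, miniBatch x seqLength x inputSize.

```lua
-- Hypothetical usage of batchFirst; sizes and argument order are assumptions.
local inputSize, hiddenSize, numLayers = 10, 20, 2
local seqLength, miniBatch = 5, 4

local rnn = cudnn.BLSTM(inputSize, hiddenSize, numLayers, true)  -- batchFirst
local input = torch.CudaTensor(miniBatch, seqLength, inputSize):uniform()
local output = rnn:forward(input)  -- miniBatch x seqLength x (2 * hiddenSize)
```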
@SeanNaren, the Tanh, ReLU and Sigmoid activations all have their own separate modules, so I don't see why Tanh or ReLU RNNs shouldn't. Great work and thanks for your help!
@ngimel Maybe other people would, but I do not need the ability to drill down into hx and hy. I would, though, benefit from a choice between keeping all hidden outputs versus only the last output of the sequence, but not in a B-RNN. Is that what you meant?
I'm seeing this with the current version when I try to create a new BLSTM layer. The BLSTM layer I created last night continues to function and train.
@elbamos that's a big hidden dim you have there; I don't think I could even fit that into memory to test! It works fine with a smaller input dim on my end, however. Added separate modules for Tanh and for ReLU. @ngimel Sounds great, but I'm sure you are much better informed than me, so any guidance on how to approach this would be great :) EDIT: should've done this some time ago, but I've added you as a collaborator, @ngimel; feel free to change anything you see fit! EDIT2: Don't want to spam too much, but the tests are now much faster. Should've realised earlier that it was the manual for-loops that were taking up all our time...
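Hypothetical usage of the new activation-specific modules, assuming they share the constructor of the other RNN modules; each one fixes self.mode at construction, so no manual reset() is needed:

```lua
-- Illustrative only; constructor arguments are assumed to match cudnn.RNN.
local reluRnn = cudnn.RNNReLU(inputSize, hiddenSize, numLayers)
local tanhRnn = cudnn.RNNTanh(inputSize, hiddenSize, numLayers)
```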
@borisfom I think the support is mostly done and the tests are looking good. How would you suggest we merge the work here and RNN into main? Shall I rebase onto R5 in a new branch and open a new request from that branch into R5?
Please hold on - I am rebasing my R5 onto upstream R5 (a nontrivial exercise due to the selective merge they did earlier). This will likely result in a new branch - then I will ask you to do a PR against that branch.
@SeanNaren Thanks! Looks like neither you nor @elbamos needs more flexibility for hidden layers, so we can leave that until a later date. You guys serve as proxies for real RNN users ;-)
@SeanNaren: actually, please never mind, I will merge the PR before the rebase.
@SeanNaren: yes, the best course would be for you to create a new branch from the current one, squash it down to very few commits, and then send a new PR from that branch (unless you can change the branch in the PR directly).
@SeanNaren: yes, one commit is perfect! Please open the PR against branch R5-re.
You can always add your outputs outside of cuDNN. We went with concatenation rather than addition because it preserves the information and leaves the user free to postprocess the output however they see fit. There is no overlap between layers in a bidirectional network, so if you split the layers and add the outputs yourself, you should not lose much performance.
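A sketch of that postprocessing, assuming the bidirectional output concatenates the forward and backward features along the last dimension (so the feature width is 2 * hiddenSize):

```lua
-- output: seqLength x miniBatch x (2 * hiddenSize) from a bidirectional RNN.
-- Split the two directions and add them, if a summed output is preferred.
local fwd = output:narrow(3, 1, hiddenSize)
local bwd = output:narrow(3, hiddenSize + 1, hiddenSize)
local summed = fwd + bwd   -- seqLength x miniBatch x hiddenSize
```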
Awesome, thanks!
Hey, I've added BLSTM support and I hope it looks good! I've had to expose some methods in RNN in order to inherit from it. Hopefully it's useful; let me know of any feedback.
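A rough sketch of how such a subclass might look, assuming the BLSTM inherits from cudnn.RNN and only overrides the cudnn mode fields before rebuilding its descriptors; the field names here (mode, bidirectional, numDirections) are assumptions, not necessarily the exact ones in this PR:

```lua
-- Hypothetical BLSTM built on top of the RNN module.
local BLSTM, parent = torch.class('cudnn.BLSTM', 'cudnn.RNN')

function BLSTM:__init(inputSize, hiddenSize, numLayers, batchFirst)
   parent.__init(self, inputSize, hiddenSize, numLayers, batchFirst)
   self.mode = 'CUDNN_LSTM'                   -- LSTM cell
   self.bidirectional = 'CUDNN_BIDIRECTIONAL' -- forward + backward passes
   self.numDirections = 2                     -- output width doubles (concatenated)
   self:reset()                               -- rebuild descriptors with the new mode
end
```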