nn.BiSequencer, cudnn and non-contiguous input #178

Closed

willfrey opened this issue Mar 26, 2016 · 4 comments
@willfrey
Contributor

I'm having a problem using nn.BiSequencer() with cudnn.

Here's a simple example:

require 'rnn'
require 'cunn'
require 'cudnn'

batch_size = 16
maxLen = 100
nFeat = 201
hiddenSize = 256

net = nn.Sequential()
net:add(nn.SplitTable(1,2)) -- batch x seqlen x feat -> table of seqlen tensors
net:add(nn.BiSequencer(nn.FastLSTM(nFeat,hiddenSize), nn.FastLSTM(nFeat,hiddenSize), nn.CAddTable(true)))
net:cuda()
cudnn.convert(net, cudnn)

inputs = torch.randn(batch_size,maxLen,nFeat):cuda()
outputs = net:forward(inputs)

Here's the error I get:

cudnn/Pointwise.lua:11: Non-contiguous inputs not supported yet

I read through some closed issues and tried these variations for the network. None of them work, unfortunately.

net = nn.Sequential()
net:add(nn.Transpose({1,2})) -- batch x seqlen x feat -> seqlen x batch x feat
net:add(nn.SplitTable(1)) -- trying to maintain contiguous inputs for the BiSequencer
net:add(nn.Copy(nil, nil, true)) -- forceCopy=true: really trying to force contiguous data for the BiSequencer
net:add(nn.BiSequencer(nn.FastLSTM(nFeat,hiddenSize), nn.FastLSTM(nFeat,hiddenSize), nn.CAddTable(true)))
net:cuda()
cudnn.convert(net, cudnn)
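
One more variation worth sketching (untested; it assumes nn.Contiguous from nn, wrapped in nn.Sequencer so each split step gets copied into contiguous storage):

net = nn.Sequential()
net:add(nn.SplitTable(1,2)) -- each slice is a non-contiguous view into the batch tensor
net:add(nn.Sequencer(nn.Contiguous())) -- assumption: forces a contiguous copy of each step
net:add(nn.BiSequencer(nn.FastLSTM(nFeat,hiddenSize), nn.FastLSTM(nFeat,hiddenSize), nn.CAddTable(true)))
net:cuda()
cudnn.convert(net, cudnn)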

Any help would be appreciated.

@northanapon
Copy link

I found a similar problem in nn.FastLSTM with cuDNN R5. Here's an example:

lstm = nn.FastLSTM(2,10)
lstm:cuda()
cudnn.convert(lstm, cudnn)

Non-batch mode works fine:

lstm:forget()
lstm:forward(torch.rand(1, 2):cuda())

Batch mode does not work:

lstm:forget()
lstm:forward(torch.rand(4, 2):cuda())

Here is the error message:

In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 6 module of nn.Sequential:
In 1 module of nn.ParallelTable:
... Non-contiguous inputs not supported yet

So the failure is in cudnn.Sigmoid (and possibly the other three cudnn pointwise activations) when computing the LSTM gates.
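
If that is the culprit, one possible workaround is to leave the pointwise activations in nn. This is only a sketch: it assumes the exclusion-function form of cudnn.convert (a third argument accepted by newer cudnn.torch, returning true for modules to skip):

-- assumption: cudnn.convert(net, dst, exclusion_fn) is available;
-- convert everything except the gate activations, which stay in nn
cudnn.convert(lstm, cudnn, function(module)
   return torch.type(module):find('Sigmoid') or torch.type(module):find('Tanh')
end)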

Is there a plan to adopt cuDNN R5's LSTM and GRU API? I heard they are much faster at NVIDIA's recent talk at my university.

@nicholas-leonard
Copy link
Member

@northanapon Yes, there is a plan: borisfom/cudnn.torch#3. In the meantime, you can use Justin's super-fast SeqLSTM: #207. Not sure if it will solve your bug, though.
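
Roughly, swapping SeqLSTM in would look like this (a sketch only, assuming the rnn SeqLSTM interface: a single module taking seqlen x batchsize x inputsize input, so no SplitTable is needed):

-- sketch: unidirectional SeqLSTM in place of Sequencer-wrapped FastLSTM
net = nn.Sequential()
net:add(nn.Transpose({1,2})) -- batch x seqlen x feat -> seqlen x batch x feat
net:add(nn.SeqLSTM(nFeat, hiddenSize))
net:cuda()

outputs = net:forward(torch.randn(batch_size, maxLen, nFeat):cuda()) -- seqlen x batch x hiddenSize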

@northanapon
Copy link

Thanks @nicholas-leonard, I tried SeqLSTM. It is faster than FastLSTM on GPU.
cudnn.convert(seqlstm, cudnn) works as well, but it does not give any speedup over basic SeqLSTM.

@ngimel
Copy link

ngimel commented Apr 15, 2016

@northanapon cudnn is not expected to give a speed-up over basic SeqLSTM: most of the work is in the Linear layers, which are mapped to cuBLAS, not cudnn. The only thing that gets mapped to cudnn is the activations, which are a small fraction of the computation, and the nn implementation of those is reasonable. The same would be true for FastLSTM: even if you could convert it to cudnn, you wouldn't see a speedup. But stay tuned for Torch bindings for the cudnn LSTM implementation.
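
For reference, the bindings in progress (borisfom/cudnn.torch#3) would be used roughly like this. A sketch only; it assumes the cudnn.LSTM(inputSize, hiddenSize, numLayers) interface and the seqlen x batch x inputSize layout:

-- assumption: cudnn.LSTM as proposed in borisfom/cudnn.torch#3,
-- not part of mainline cudnn.torch at the time of writing
rnn = cudnn.LSTM(nFeat, hiddenSize, 1):cuda() -- inputSize, hiddenSize, numLayers
input = torch.randn(maxLen, batch_size, nFeat):cuda() -- seqlen x batch x inputSize
output = rnn:forward(input) -- seqlen x batch x hiddenSize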
