incompatibility with nngraph's gmodule in evaluate mode #172

Open
justinmaojones opened this issue Mar 19, 2016 · 3 comments

Running the following script:

require 'rnn';
require 'nngraph';

n1 = 3
n2 = 4
n3 = 3

-- a gModule with two inputs (x1 and the table x23, which is split into x2 and x3)
-- and two outputs (y1 and y2)
x1 = nn.Identity()()
x23 = nn.Identity()()
x2,x3 = x23:split(2)
z  = nn.JoinTable(1)({x1,x2,x3})
y1 = nn.Linear(n1+n2+n3,n2)(z)
y2 = nn.Linear(n1+n2+n3,n3)(z)
m = nn.gModule({x1,x23},{y1,y2})

-- wrap the gModule in Recurrence and call forward twice in evaluate mode
outputSize = {{n2},{n3}}
r = nn.Recurrence(m,outputSize,1)
a = torch.randn(n1)
--r:forward(a)
--r:forward(a)
r:forget()
r:evaluate()
r:forward(a)
r:forward(a)

will generate the following error:

torch/install/share/lua/5.1/nngraph/gmodule.lua:255: split(2) cannot split 0 outputs
stack traceback:
    [C]: in function 'error'
    ...e/jmj418/torch/install/share/lua/5.1/nngraph/gmodule.lua:255: in function 'neteval'
    ...e/jmj418/torch/install/share/lua/5.1/nngraph/gmodule.lua:287: in function 'updateOutput'
    /home/jmj418/torch/install/share/lua/5.1/rnn/Recurrence.lua:106: in function 'forward'
    checkRecurrence.lua:30: in main chunk
    [C]: in function 'dofile'
    [string "_RESULT={dofile 'checkRecurrence.lua' }"]:1: in main chunk
    [C]: in function 'xpcall'
    /home/jmj418/torch/install/share/lua/5.1/trepl/init.lua:645: in function 'repl'
    ...j418/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
    [C]: at 0x00405f30

The cause of this error appears to be lines 330-336 of gmodule.lua (commit ebde0bd), reproduced below.

-- first clear the input states
for _,node in ipairs(self.forwardnodes) do
  local input = node.data.input
  while input and #input>0 do
    table.remove(input)
  end
end

Whenever gmodule:updateOutput() is called, the inputs to each node are cleared. When Recurrence:updateOutput() runs in evaluate mode, the previous output of Recurrence is set to self.outputs[self.step-1] (see line 96 of Recurrence.lua in commit e77fe74), which points to the input of one of the nngraph forwardnodes and therefore gets cleared as soon as gmodule:updateOutput() is called again. This is only a problem in evaluate mode, because evaluate mode does not use step clones.
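
To make the failure mode concrete, here is a minimal stand-alone sketch (plain Lua, not rnn or nngraph code) of the same aliasing problem: the caller keeps a reference to a table that the input-clearing loop above later empties in place.

-- a node-like table whose input list will later be cleared in place
local node = { data = { input = { "y1(t-1)", "y2(t-1)" } } }

-- the caller keeps a reference instead of a copy, as prevOutput does above
local prevOutput = node.data.input

-- analogous to the input-clearing loop in gmodule:updateOutput()
local input = node.data.input
while input and #input > 0 do table.remove(input) end

print(#prevOutput)  -- prints 0: there is nothing left to split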

One way I managed to get around this was to save a copy of the current output of Recurrence in a separate buffer. In other words:

function Recurrence:updateOutput(input)
   -- batchSize is now needed on every step (CHANGE #2 below resizes the buffer),
   -- so compute it before the step == 1 branch
   local batchSize = self:getBatchSize(input)
   local prevOutput
   if self.step == 1 then
      if self.userPrevOutput then
         -- user provided previous output
         prevOutput = self.userPrevOutput
      else
         -- first previous output is zeros
         self.zeroTensor = self:recursiveResizeZero(self.zeroTensor, self.outputSize, batchSize)
         prevOutput = self.zeroTensor
      end
   else
      -- previous output of this module
-->>>>>> CHANGE #1
      prevOutput = self.output
      --prevOutput = self.outputs[self.step-1]
--<<<<<<
   end
   -- output(t) = recurrentModule{input(t), output(t-1)}
   local output
   if self.train ~= false then
      self:recycle()
      local recurrentModule = self:getStepModule(self.step)
      -- the actual forward propagation
      output = recurrentModule:updateOutput{input, prevOutput}
   else
      output = self.recurrentModule:updateOutput{input, prevOutput}
   end

   self.outputs[self.step] = output

-->>>>>> CHANGE #2: SAVE TO BUFFER
   self.output = self:recursiveResizeZero(self.output, self.outputSize, batchSize)
   self.output = nn.utils.recursiveAdd(self.output, output)
   --self.output = output
--<<<<<<

   self.step = self.step + 1
   self.gradPrevOutput = nil
   self.updateGradInputStep = nil
   self.accGradParametersStep = nil

   return self.output
end
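
With those two changes applied locally (a sketch against the Recurrence.lua linked above, not something in the released rnn package), the evaluate-mode calls from the reproduction script should run without hitting the split error:

-- m, n1, n2, n3 as defined in the reproduction script above
r = nn.Recurrence(m, {{n2},{n3}}, 1)
r:forget()
r:evaluate()
r:forward(torch.randn(n1))
r:forward(torch.randn(n1))  -- previously raised "split(2) cannot split 0 outputs"
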
@nicholas-leonard

@justinmaojones I opened up a ticket here: torch/nngraph#109. To my mind this should work. Will wait to see what the nngraph guys have to say.

Kaixhin commented Jun 18, 2016

@nicholas-leonard Any update on this? According to Ivo's reply, it seems that @justinmaojones's workaround is the solution.

@nicholas-leonard

@Kaixhin Yeah, just make sure the output is copied. So if the network is shallow, copy the output using nn.Copy or whatnot. Ideally, rnn would do this for you. It could be done with a test that is executed the first time forward is called in evaluation mode. If the behaviour of the rnn with a nn.Copy appended to the end is the same as without, then no nn.Copy is necessary; otherwise it is.
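
One possible reading of that suggestion, reusing the variable names from the reproduction script in the issue body (a sketch, not tested here), is to append nn.Copy with forceCopy = true to each graph output before building the gModule. Whether this is sufficient for this particular graph would need to be checked with exactly the test described above, i.e. comparing behaviour with and without the nn.Copy.

-- x1, x23, y1, y2, n1, n2, n3 as in the reproduction script above
y1c = nn.Copy('torch.DoubleTensor', 'torch.DoubleTensor', true)(y1)  -- force a copy of output 1
y2c = nn.Copy('torch.DoubleTensor', 'torch.DoubleTensor', true)(y2)  -- force a copy of output 2
m = nn.gModule({x1, x23}, {y1c, y2c})

r = nn.Recurrence(m, {{n2},{n3}}, 1)
r:evaluate()
r:forward(torch.randn(n1))
r:forward(torch.randn(n1))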

Kaixhin added a commit to Kaixhin/Atari that referenced this issue Aug 6, 2016