Training failure when using non-default number of threads #3

Open

rszeto opened this issue Aug 17, 2017 · 0 comments

Comments
rszeto commented Aug 17, 2017

Hi Emily,

I ran into a problem when running the train_drnet.lua script with the --nThreads flag set to values other than 0 (I tried 1, 2, and 5). I get the following output:

Found Environment variable CUDNN_PATH = /usr/local/cudnn-v5.1/lib64/libcudnn.so.5{
  contentDim : 64
  seed : 1
  beta1 : 0.9
  name : "default"
  learningRate : 0.002
  movingDigits : 1
  batchSize : 100
  imageSize : 64
  optimizer : "adam"
  model : "dcgan"
  save : "logs//moving_mnist/default"
  gpu : 0
  dataRoot : "data"
  depth : 18
  dataWarmup : 10
  advWeight : 0
  dataset : "moving_mnist"
  epochSize : 50000
  cropSize : 227
  maxStep : 12
  normalize : false
  nEpochs : 200
  poseDim : 5
  decoder : "dcgan"
  dataPool : 200
  nThreads : 1
  nShare : 1
}
<torch> set nb of threads to 1	
<gpu> using device 0	
Loaded models from file	
/home/szetor/build/torch/install/bin/luajit: .../szetor/build/torch/install/share/lua/5.1/trepl/init.lua:389: ...or/build/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ./data/moving_mnist.lua:12: attempt to index local 'opt' (a nil value)
stack traceback:
	./data/moving_mnist.lua:12: in function 'getData'
	./data/moving_mnist.lua:32: in function '__init'
	.../szetor/build/torch/install/share/lua/5.1/torch/init.lua:91: in function <.../szetor/build/torch/install/share/lua/5.1/torch/init.lua:87>
	[C]: in function 'MovingMNISTLoader'
	./data/moving_mnist.lua:150: in main chunk
	[C]: in function 'require'
	./data/data.lua:10: in function 'getDatasourceFun'
	./data/threads.lua:28: in function <./data/threads.lua:18>
	[C]: in function 'xpcall'
	...or/build/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
	...etor/build/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...etor/build/torch/install/share/lua/5.1/threads/queue.lua:41>
	[C]: in function 'pcall'
	...etor/build/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
	[string "  local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
	[C]: in function 'error'
	.../szetor/build/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	train_drnet.lua:321: in main chunk
	[C]: in function 'dofile'
	...uild/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

My guess is that the MovingMNISTLoader class doesn't have access to the global opt variable when it is constructed inside a worker thread, unlike when opt.nThreads is 0 and the loader runs in the main process.
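
If it helps, a pattern I've seen in other Torch projects is to capture opt as an upvalue in the worker initialization callback, so the threads library serializes it into each worker and re-exposes it there as a global. Something along these lines (a minimal sketch against the standard threads API; the pool name and the exact file loaded are illustrative, not taken from this repo):

local Threads = require 'threads'

do
   -- Capture the main-thread options table as an upvalue; upvalues of the
   -- callbacks below are serialized and sent to every worker thread.
   local options = opt
   workerPool = Threads(
      opt.nThreads,
      function(threadid)
         require 'torch'
      end,
      function(threadid)
         -- Re-expose the options as the global 'opt' inside the worker so
         -- that code such as data/moving_mnist.lua can index it.
         opt = options
         paths.dofile('data/moving_mnist.lua')
      end
   )
end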

I would appreciate your help in fixing this issue. Thank you.
