Training failure when using non-default number of threads #3

Open

rszeto opened this issue Aug 17, 2017 · 0 comments

Comments
rszeto commented Aug 17, 2017

Hi Emily,

I ran into a problem when running the train_drnet.lua script with the --nThreads flag set to values other than 0 (I tried 1, 2, and 5). I get the following output:

Found Environment variable CUDNN_PATH = /usr/local/cudnn-v5.1/lib64/libcudnn.so.5{
  contentDim : 64
  seed : 1
  beta1 : 0.9
  name : "default"
  learningRate : 0.002
  movingDigits : 1
  batchSize : 100
  imageSize : 64
  optimizer : "adam"
  model : "dcgan"
  save : "logs//moving_mnist/default"
  gpu : 0
  dataRoot : "data"
  depth : 18
  dataWarmup : 10
  advWeight : 0
  dataset : "moving_mnist"
  epochSize : 50000
  cropSize : 227
  maxStep : 12
  normalize : false
  nEpochs : 200
  poseDim : 5
  decoder : "dcgan"
  dataPool : 200
  nThreads : 1
  nShare : 1
}
<torch> set nb of threads to 1	
<gpu> using device 0	
Loaded models from file	
/home/szetor/build/torch/install/bin/luajit: .../szetor/build/torch/install/share/lua/5.1/trepl/init.lua:389: ...or/build/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ./data/moving_mnist.lua:12: attempt to index local 'opt' (a nil value)
stack traceback:
	./data/moving_mnist.lua:12: in function 'getData'
	./data/moving_mnist.lua:32: in function '__init'
	.../szetor/build/torch/install/share/lua/5.1/torch/init.lua:91: in function <.../szetor/build/torch/install/share/lua/5.1/torch/init.lua:87>
	[C]: in function 'MovingMNISTLoader'
	./data/moving_mnist.lua:150: in main chunk
	[C]: in function 'require'
	./data/data.lua:10: in function 'getDatasourceFun'
	./data/threads.lua:28: in function <./data/threads.lua:18>
	[C]: in function 'xpcall'
	...or/build/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
	...etor/build/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...etor/build/torch/install/share/lua/5.1/threads/queue.lua:41>
	[C]: in function 'pcall'
	...etor/build/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
	[string "  local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
	[C]: in function 'error'
	.../szetor/build/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	train_drnet.lua:321: in main chunk
	[C]: in function 'dofile'
	...uild/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

My guess is that the MovingMNISTLoader class doesn't have access to the global opt variable when it is constructed inside a worker thread, unlike when opt.nThreads is 0 and the loader runs in the main process.
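
If it helps, a pattern I've seen in other Torch projects is to capture opt as an upvalue in the worker initialization callback, so the threads library serializes it into each worker and re-exposes it there as a global. Something along these lines (a minimal sketch against the standard threads API; the pool name and the exact file loaded are illustrative, not taken from this repo):

local Threads = require 'threads'

do
   -- Capture the main-thread options table as an upvalue; upvalues of the
   -- callbacks below are serialized and sent to every worker thread.
   local options = opt
   workerPool = Threads(
      opt.nThreads,
      function(threadid)
         require 'torch'
      end,
      function(threadid)
         -- Re-expose the options as the global 'opt' inside the worker so
         -- that code such as data/moving_mnist.lua can index it.
         opt = options
         paths.dofile('data/moving_mnist.lua')
      end
   )
end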

I would appreciate your help in fixing this issue. Thank you.
