I have a 4-GPU setup and am trying to make DALI work as a backend for the DataLoader. I made slight corrections to the existing NeMo code so that it at least runs without errors up to pipeline compilation, but training still fails at the PTL sanity check:
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 257, in forward
self.padding, self.dilation, self.groups)
**RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 2 (while checking arguments for cudnn_convolution)**
**Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 2** is the main part here, and the same error repeats for devices {0, 1, 3}.
It seems that some of the operations in the pipeline run on the CPU and others on a different GPU, or something along those lines. Is there a way to work around this problem?
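One thing that may be worth checking, offered as a sketch and not a confirmed diagnosis: the error says the input batch and the convolution weights sit on different GPUs, which could happen if each training process builds its data pipeline on a GPU other than the one holding its model replica. The snippet below assumes one process per GPU, that the launcher exports a `LOCAL_RANK` environment variable for each process, and that the model replica for local rank `r` lives on GPU `r`. The helper names `local_device_id` and `mismatched_ranks` are hypothetical and are not part of NeMo, DALI, or PTL.

```python
import os


def local_device_id(env=None, default=0):
    """Return the GPU index this process should use.

    Assumption: the launcher exports LOCAL_RANK for every worker process.
    Falls back to `default` when the variable is absent.
    """
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", default))


def mismatched_ranks(data_device_by_rank):
    """Return the local ranks whose batches arrive on a GPU other than their own.

    `data_device_by_rank` maps a local rank to the GPU index its batches
    were produced on (hypothetical bookkeeping, for illustration only).
    """
    return sorted(
        rank for rank, device in data_device_by_rank.items() if device != rank
    )


if __name__ == "__main__":
    # Each process would pass this value as the device_id of its DALI
    # pipeline and place its model replica on the same GPU.
    print("device for this process:", local_device_id())
```

If every process reports the same device index here, or the index differs from the device the model parameters are on, that would be consistent with the mismatch in the traceback.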
Environment details
Environment location: DGX-1
Torch '1.6.0'
Python 3.7.6
Ubuntu 18.04.4 LTS