ONNX model does not save on GPU #3144

lezwon · 2020-08-25T03:35:04Z

🐛 Bug

Attempting to export to ONNX after training a model on GPU throws an error if the input_sample or example_input_array is not a CUDA tensor.

To Reproduce

Steps to reproduce the behavior:

1. Train a model on GPU.
2. Try to export it to ONNX with self.example_input_array = torch.zeros(1, 1, 500, 500) or input_sample = torch.zeros(1, 1, 500, 500), both of which are CPU tensors.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-32-cd8009a0b6a3> in <module>
      1 filepath = 'model.onnx'
----> 2 model.to_onnx(filepath, export_params=True)

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py in to_onnx(self, file_path, input_sample, **kwargs)
   1721         if 'example_outputs' not in kwargs:
   1722             self.eval()
-> 1723             kwargs['example_outputs'] = self(input_data)
   1724 
   1725         torch.onnx.export(self, input_data, file_path, **kwargs)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-24-51cae3b5e57f> in forward(self, inputs)
     20 
     21     def forward(self, inputs):
---> 22         return self.model(inputs)
     23 
     24     def training_step(self, batch, batch_idx):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
     98     def forward(self, input):
     99         for module in self:
--> 100             input = module(input)
    101         return input
    102 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    351 
    352     def forward(self, input):
--> 353         return self._conv_forward(input, self.weight)
    354 
    355 class Conv3d(_ConvNd):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
    348                             _pair(0), self.dilation, self.groups)
    349         return F.conv2d(input, weight, self.bias, self.stride,
--> 350                         self.padding, self.dilation, self.groups)
    351 
    352     def forward(self, input):

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

Code sample

filepath = 'model.onnx'
model.to_onnx(filepath, export_params=True)
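Until to_onnx handles this itself, one workaround is to move the example input onto the same device as the model's parameters before exporting. The sketch below uses a plain torch.nn.Sequential as a stand-in for the trained model (an assumption; the original architecture is not shown in the report) and infers the device from the model's first parameter. With a LightningModule, the moved tensor would then be passed as the input_sample argument of to_onnx.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained model from the report.
model = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU())
if torch.cuda.is_available():
    model = model.cuda()  # mimic "trained on GPU" when a GPU is present

# A CPU tensor, as in the report.
input_sample = torch.zeros(1, 1, 500, 500)

# Infer the model's device from its parameters and move the sample there.
device = next(model.parameters()).device
input_sample = input_sample.to(device)

model.eval()
with torch.no_grad():
    output = model(input_sample)  # input and weights now share a device

# With a LightningModule, the aligned tensor can then be passed explicitly:
# model.to_onnx('model.onnx', input_sample, export_params=True)
```

Alternatively, calling model.cpu() before exporting makes the weights match a CPU example input; which option is preferable depends on where the exported graph should be traced.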

Expected behavior

to_onnx should automatically move example_input_array or input_sample to the model's device and then save the model to ONNX.
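The expected behavior amounts to moving the sample, which may be a single tensor or a nested tuple, list, or dict of tensors, onto the model's device before running the model and calling torch.onnx.export. This is a minimal sketch under stated assumptions: model_device, move_to_device, and export_with_device_fix are hypothetical helper names, the sketch omits the example_outputs handling visible in the traceback, and it is not necessarily how #3145 implemented the fix.

```python
import torch
import torch.nn as nn


def model_device(module: nn.Module) -> torch.device:
    """Device of the module's first parameter (CPU if it has no parameters)."""
    param = next(module.parameters(), None)
    return param.device if param is not None else torch.device("cpu")


def move_to_device(sample, device: torch.device):
    """Recursively move every tensor inside `sample` to `device`."""
    if isinstance(sample, torch.Tensor):
        return sample.to(device)
    if isinstance(sample, tuple):
        return tuple(move_to_device(s, device) for s in sample)
    if isinstance(sample, list):
        return [move_to_device(s, device) for s in sample]
    if isinstance(sample, dict):
        return {k: move_to_device(v, device) for k, v in sample.items()}
    return sample  # non-tensor values pass through unchanged


def export_with_device_fix(module: nn.Module, file_path: str, input_sample, **kwargs):
    """Hypothetical wrapper: align the sample's device, then export to ONNX."""
    input_sample = move_to_device(input_sample, model_device(module))
    module.eval()
    torch.onnx.export(module, input_sample, file_path, **kwargs)
```

Handling nested containers matters because example_input_array may be a tuple when forward takes several inputs; a single .to(device) call would fail on a tuple.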


Borda · 2020-08-25T07:09:53Z

I suspect the problem could be the distributed mode; could you check by running on a single GPU only?

lezwon · 2020-08-25T09:31:24Z

I ran this in a Kaggle notebook. When I tried to save the model after training, it threw the error.

lezwon added the bug (Something isn't working) and help wanted (Open to be worked on) labels Aug 25, 2020

lezwon mentioned this issue in PR #3145, fix ONNX model save on GPU (merged), Aug 25, 2020

Borda added the priority: 0 (High priority task) label Aug 25, 2020

mergify bot closed this as completed in #3145 Aug 26, 2020
