
Errors when testing on CPU #154

Open

bryanbocao opened this issue Jun 24, 2022 · 11 comments

Comments

@bryanbocao

layer_name: <class 'torch.nn.modules.conv.Conv2d'>, total_params: 15121584, total_traina_params: 15121584, n_layers: 39
device:  cpu
Traceback (most recent call last):
  File "main.py", line 208, in <module>
    test(epoch)
  File "main.py", line 189, in test
    outputs = net(inputs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Repos/pytorch-cifar/models/dla_simple.py", line 106, in forward
    out = self.base(x)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward
@logan-mo

logan-mo commented Aug 6, 2022

@bryanbocao Your device is not being set to the GPU. Can you make sure your CUDA drivers are properly installed and that all models and datasets are being sent to GPU memory?

@bryanbocao
Author

bryanbocao commented Aug 6, 2022

@Phillibob55 That's a different problem. The current issue is not about CUDA driver installation or configuration.
I can run it on the GPU, but I intentionally wanted to test the CPU runtime, which requires both the model and the data to be in CPU memory instead of GPU memory.

The model was trained and saved on the GPU, so you need to pass the map_location=device argument when loading the checkpoint, with device='cpu', in order to run the model on the CPU. I've solved this issue by:

parser.add_argument('--select_device', type=str, default='gpu', help='gpu | cpu')
...
device = 'cuda' if torch.cuda.is_available() and args.select_device == 'gpu' else 'cpu'
...
checkpoint = torch.load('./checkpoint/{}_ckpt.pth'.format(args.net), map_location=device)
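For context, here is a minimal sketch of how those pieces fit together when testing on either device (args and testloader come from the surrounding main.py; the model class and loop below are illustrative, not the exact file):

import torch
from models import SimpleDLA  # any model class from this repo would do here

device = 'cuda' if torch.cuda.is_available() and args.select_device == 'gpu' else 'cpu'

net = SimpleDLA().to(device)  # the model must live on the chosen device

# map_location remaps tensors that were saved on the GPU onto the CPU
checkpoint = torch.load('./checkpoint/{}_ckpt.pth'.format(args.net), map_location=device)
net.load_state_dict(checkpoint['net'])
net.eval()

with torch.no_grad():
    for inputs, targets in testloader:
        # the data must be on the same device as the model
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = net(inputs)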


bryanbocao reopened this Aug 6, 2022
@bryanbocao
Author

bryanbocao commented Aug 6, 2022

@Phillibob55
Check this out
#152
b782bba

@logan-mo

logan-mo commented Aug 7, 2022


Ooh yeah, that makes sense. The OC seems to be inactive, so I'm working on my own version of this, which runs on any image dataset and doesn't have these problems.

@logan-mo

logan-mo commented Aug 7, 2022

@bryanbocao Could you give some guidance on making these models work with image sizes other than 32x32?

@bryanbocao
Author

@Phillibob55 I am happy to work on that.
Do you mean
(1) simply resizing arbitrary images to 32x32 resolution and feeding them into these models? That can be done by adding one more command-line argument and a resize transform in the code, as sketched below;
or
(2) preparing a set of models whose input shape is different from 32x32?
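A minimal sketch of option (1), assuming torchvision is available (the --img_size argument is hypothetical, not something the repo currently has):

import argparse
import torchvision.transforms as transforms

parser = argparse.ArgumentParser(description='PyTorch CIFAR10 Training')
parser.add_argument('--img_size', type=int, default=32, help='resolution images are resized to')
args = parser.parse_args()

# Resize everything to img_size x img_size before the usual CIFAR-10 normalization
transform_test = transforms.Compose([
    transforms.Resize((args.img_size, args.img_size)),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])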

@logan-mo

logan-mo commented Aug 7, 2022

@bryanbocao
I started off with the first approach and just added a resize transform, but that loses a lot of information. For datasets like ImageNet, this doesn't give accuracy above 50%. So I was thinking that if I make the models accept images of any size, it might give better results, at the cost of more training time of course.

The codebase in the repo has a lot of hardcoded elements. I got around the hardcoded 10 output classes by adding a num_classes argument to every model class, but I don't know enough about what's going on in the more complex models to modify them to accept images of any size.

P.S. I'm using a Jupyter notebook instead of main.py.
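For what it's worth, the usual trick for making such a classifier head size-agnostic (a generic sketch, not code from this repo) is to expose num_classes as an argument and pool with nn.AdaptiveAvgPool2d, so the linear layer no longer depends on the input resolution:

import torch
import torch.nn as nn

class SizeAgnosticHead(nn.Module):
    """Illustrative classifier head that works for any spatial input size."""
    def __init__(self, in_channels, num_classes=10):
        super().__init__()
        # AdaptiveAvgPool2d(1) always yields a 1x1 feature map, so the
        # linear layer's input size no longer depends on the image resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = self.pool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

# The same head handles feature maps from 32x32 and 224x224 inputs alike.
head = SizeAgnosticHead(in_channels=512, num_classes=100)
print(head(torch.randn(2, 512, 4, 4)).shape)    # torch.Size([2, 100])
print(head(torch.randn(2, 512, 28, 28)).shape)  # torch.Size([2, 100])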

@logan-mo

logan-mo commented Aug 7, 2022

I've created a repo for it here

@bryanbocao
Author


@Phillibob55 Sounds good. If you would like to create an easy-to-use repo where we can just change a few arguments to train and test many different models, I am happy to contribute in my spare time. I have forked your repo to
https://github.com/bryanbocao/image-classification

@CopyABCs

CopyABCs commented Mar 3, 2023

Hello, I would like to ask what caused the following error after running, and how to deal with it:

D:\anaconda3\envs\datudui\python.exe C:/Users/52254/Desktop/pytorch-cifar-master/main.py
'stty' is not recognized as an internal or external command, operable program or batch file.
Traceback (most recent call last):
  File "C:/Users/52254/Desktop/pytorch-cifar-master/main.py", line 15, in <module>
    from utils import progress_bar
  File "C:\Users\52254\Desktop\pytorch-cifar-master\utils.py", line 45, in <module>
    _, term_width = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)

@Selfpline6


I have the same problem as you.
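This error comes from utils.py calling the Unix-only stty command to read the terminal width; on Windows the command does not exist, so the read returns nothing. A possible workaround (just a sketch, not an official fix in this repo) is to use Python's standard shutil.get_terminal_size instead, which works on Windows and falls back to a default size when no terminal is attached:

# in utils.py, instead of: _, term_width = os.popen('stty size', 'r').read().split()
import shutil

term_width = shutil.get_terminal_size(fallback=(80, 24)).columns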
