
run_ssd_live_demo.py: "RuntimeError: expected device cpu but got device cuda:0" #89

Open · Jaftem opened this issue Dec 6, 2019 · 7 comments


Jaftem commented Dec 6, 2019

Hi,

I trained a mb2-ssd-lite model with a subset (just 1 class) of Open Images on just 20 epochs. I'm now attempting to run the live demo with this model:

$ python run_ssd_live_demo.py mb2-ssd-lite models/mb2-ssd-lite-Epoch-19-Loss-3.6359732536622036.pth models/open-images-model-labels.txt 

And I get the following runtime error:

Traceback (most recent call last):
  File "run_ssd_live_demo.py", line 65, in <module>
    boxes, labels, probs = predictor.predict(image, 10, 0.4)
  File "/ml/playground/pytorch-ssd/vision/ssd/predictor.py", line 37, in predict
    scores, boxes = self.net.forward(images)
  File "/ml/playground/pytorch-ssd/vision/ssd/ssd.py", line 93, in forward
    locations, self.priors, self.config.center_variance, self.config.size_variance
  File "/ml/playground/pytorch-ssd/vision/utils/box_utils.py", line 104, in convert_locations_to_boxes
    locations[..., :2] * center_variance * priors[..., 2:] + priors[..., :2],
RuntimeError: expected device cpu but got device cuda:0

I can run the live demo on the pretrained model as per the README's instructions without error. Any ideas?


Jaftem commented Dec 6, 2019

So it looks like the issue is the locations tensor being a CPU tensor and priors being a CUDA tensor. On line 93 of vision/ssd/ssd.py I made the following change:

    locations.to(self.device), self.priors, self.config.center_variance, self.config.size_variance

This gets the live demo working. But because mb1-ssd works fine, I believe the issue originates at some point earlier, and the above fix is more of a workaround. I haven't reviewed the entire code base closely enough to know whether there is a better fix.
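
For context, here is a simplified sketch of the surrounding branch of SSD.forward in vision/ssd/ssd.py with the workaround applied (names taken from the traceback; the actual method has more context than shown here):

    # Sketch of the is_test branch of SSD.forward (simplified). The
    # workaround moves `locations` onto the model's device before it is
    # combined with `self.priors`, which already lives on cuda:0.
    if self.is_test:
        confidences = F.softmax(confidences, dim=2)
        boxes = box_utils.convert_locations_to_boxes(
            locations.to(self.device),  # workaround: align devices
            self.priors,
            self.config.center_variance,
            self.config.size_variance,
        )
        boxes = box_utils.center_form_to_corner_form(boxes)
        return confidences, boxes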

@hsahovic

I ran into the same issue as @Jaftem. His workaround solved my problem.
It seems that loading models trained on GPUs does not work with the current code base.

@vladserkoff

Since the demo runs inference on the CPU, you either want to pass map_location='cpu' where the checkpoint is loaded here:

net.load(model_path)

or explicitly move the model to the GPU somewhere in run_ssd_live_demo.py.
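
A minimal sketch of the first option, assuming net.load wraps torch.load plus load_state_dict (the actual method in the repo may differ):

    import torch

    # Force every tensor in the checkpoint onto the CPU at load time,
    # regardless of the device it was saved from.
    state_dict = torch.load(model_path, map_location='cpu')
    net.load_state_dict(state_dict)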

@TheCamilovisk

@Jaftem try changing line 50 of run_ssd_live_demo.py from this:

predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200)

to this:

predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200, device=torch.device('cuda'))

Initializing the Predictor class this way solves the issue without touching the SSD class.
I don't know why the CPU is the default device for the mb2-ssd-lite Predictor, but this line of the README may be a clue:

You may notice MobileNetV2 SSD/SSD-Lite is slower than MobileNetV1 SSD/Lite on PC. However, MobileNetV2 is faster on mobile devices.

So, this model may be intended to be used on mobile devices.
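
If the demo should run on both CPU-only and GPU machines, a small variant of the above is to pick the device at runtime (just a sketch; the device variable name is illustrative):

    import torch

    # Use CUDA when it is available, otherwise fall back to the CPU.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200, device=device)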

@MADHAVAN001

As @Jaftem pointed out, this error is reproducible when running on a node with GPUs. PyTorch seems to be loading the model onto the GPU by default.

Running this on a CPU-only node works.

@YaYaB

YaYaB commented Apr 10, 2020

Yep, exactly. The issue comes from
https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/ssd.py#L35
The SSD model is loaded onto the CUDA device if one is available: SSD's constructor has a device parameter that is None by default. When the mobilenet_v2_ssd_lite network is built, SSD's constructor is called without any device, so it falls back to CUDA if available:
https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/mobilenet_v2_ssd_lite.py#L58

On my side I just forced the device to be CPU in ssd.py, because that is what I need. However, a better solution would be to add a device parameter to create_mobilenetv2_ssd_lite so that callers can specify a device, as sketched below.
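
A minimal sketch of the proposed usage, assuming a device keyword argument is added to the factory and threaded through to SSD's constructor (the device kwarg is the suggested change, not existing API):

    import torch
    from vision.ssd.mobilenet_v2_ssd_lite import create_mobilenetv2_ssd_lite

    # Proposed usage: choose the device explicitly instead of relying on
    # torch.cuda.is_available() inside SSD's constructor. The `device`
    # keyword argument is the suggested addition.
    device = torch.device('cpu')
    net = create_mobilenetv2_ssd_lite(num_classes=2, is_test=True, device=device)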

@AiueoABC

I got a similar error when I ran convert_to_caffe2_models.py to convert the mobilenet_v2_ssd_lite model to ONNX.
In this case, @Jaftem's solution helped me get the ONNX export, but I'm not sure this is okay.
