RuntimeError:Cuda call failed #114

Open
toprocker opened this issue Jan 2, 2019 · 6 comments

@toprocker

My Environment:
Ubuntu 16.04, Python 3.6, CUDA 9.2, cuDNN 7.2, PyTorch 0.4.1

--inference --model FlowNet2 --save_flow --inference_dataset MpiSintelClean --inference_dataset_root /home/toprocker/datasets/MpiSintel/training/ --resume /home/toprocker/PycharmProjects/FlowNet2-torch/checkpoints/FlowNet2_checkpoint.pth.tar

Initializing Adam Optimizer
[0.001s] amsgrad = False (<class 'bool'>)
[0.001s] weight_decay = 0 (<class 'int'>)
[0.001s] eps = 1e-08 (<class 'float'>)
[0.001s] betas = (0.9, 0.999) (<class 'tuple'>)
[0.001s] lr = 0.001 (<class 'float'>)
[0.001s] Operation finished

Overall Progress: 0%| | 0/1 [00:00<?, ?it/s]
Inferencing : 0%| | 0/1041.0 [00:00<?, ?it/s]/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/toprocker/PycharmProjects/FlowNet2-torch/main.py:375: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
total_loss += loss_val.data[0]
/home/toprocker/PycharmProjects/FlowNet2-torch/main.py:376: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
loss_values = [v.data[0] for v in losses]

Inference Averages for Epoch 0: L1: 0.205, EPE: 0.326: 0%| | 0/1041.0 [00:02<?, ?it/s]
Inference Averages for Epoch 0: L1: 0.205, EPE: 0.326: 0%| | 1/1041.0 [00:02<43:36, 2.52s/it]
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/home/toprocker/PycharmProjects/FlowNet2-torch/main.py", line 409, in <module>
stats = inference(args=args, epoch=epoch - 1, data_loader=inference_loader, model=model_and_loss, offset=offset)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/main.py", line 371, in inference
losses, output = model(data[0], target[0], inference=True)
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/main.py", line 170, in forward
output = self.model(data)
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/models.py", line 118, in forward
flownetc_flow2 = self.flownetc(x)[0]
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/networks/FlowNetC.py", line 86, in forward
out_corr = self.corr(out_conv3a, out_conv3b) # False
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/networks/correlation_package/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/networks/correlation_package/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:79)
frame #0: <unknown function> + 0x136e7 (0x7fa5b90c16e7 in /home/toprocker/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x1392e (0x7fa5b90c192e in /home/toprocker/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0xfde2 (0x7fa5b90bdde2 in /home/toprocker/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: ./work() [0x50682c]
frame #4: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #5: ./work() [0x596285]
frame #6: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #7: THPFunction_do_forward(THPFunction*, _object*) + 0x2ad (0x7fa5e7736bfd in /home/toprocker/.local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: PyCFunction_Call + 0x52 (0x566b52 in ./work)
frame #9: ./work() [0x54d7f9]
frame #10: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #11: ./work() [0x5067b0]
frame #12: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #13: ./work() [0x596285]
frame #14: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #15: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #16: ./work() [0x504232]
frame #17: ./work() [0x5966f8]
frame #18: ./work() [0x54d4ee]
frame #19: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #20: ./work() [0x5067b0]
frame #21: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #22: ./work() [0x596285]
frame #23: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #24: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #25: ./work() [0x504232]
frame #26: ./work() [0x5966f8]
frame #27: ./work() [0x54d4ee]
frame #28: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #29: ./work() [0x5067b0]
frame #30: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #31: ./work() [0x596285]
frame #32: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #33: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #34: ./work() [0x504232]
frame #35: ./work() [0x5966f8]
frame #36: ./work() [0x54d4ee]
frame #37: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #38: ./work() [0x5067b0]
frame #39: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #40: ./work() [0x504232]
frame #41: ./work() [0x596698]
frame #42: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #43: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #44: ./work() [0x504232]
frame #45: ./work() [0x596698]
frame #46: ./work() [0x54d4ee]
frame #47: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #48: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #49: ./work() [0x504232]
frame #50: ./work() [0x596698]
frame #51: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #52: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #53: ./work() [0x504232]
frame #54: ./work() [0x596698]
frame #55: ./work() [0x54d4ee]
frame #56: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #57: ./work() [0x5067b0]
frame #58: _PyEval_EvalFrameDefault + 0x15c6 (0x508386 in ./work)
frame #59: ./work() [0x504232]
frame #60: ./work() [0x505e83]
frame #61: ./work() [0x5066f0]
frame #62: _PyEval_EvalFrameDefault + 0x15c6 (0x508386 in ./work)
frame #63: ./work() [0x504232]

@toprocker

no kernel image is available for execution on the device
My GPU is a GeForce 930M, whose compute capability is 5.0 rather than the 5.2 required by the correlation_cuda setup.py.
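
A minimal sketch of how one might check this mismatch before building. It assumes the extension's setup.py passes nvcc a list of `-gencode` entries (not confirmed in this thread); `gencode_flags` and `detect_capability` are hypothetical helper names, while `torch.cuda.get_device_capability` is an existing PyTorch call.

```python
def gencode_flags(capability):
    """Return the nvcc -gencode flags for one (major, minor) compute capability."""
    major, minor = capability
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


def detect_capability(device=0):
    """Ask PyTorch for the GPU's compute capability.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print("compute capability:", cap)
        # Assumption: these flags would be added to the nvcc arguments in setup.py.
        print("nvcc flags for this GPU:", gencode_flags(cap))
```

If the reported capability is missing from the architectures the extension was compiled for, rebuilding with the matching flag would be the usual way to get a kernel image the device can run, rather than suppressing the error.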

@toprocker

I solved it by commenting out the error check below forward in correlation_cuda.cc. In the end I got a series of .flo files. Thanks.

@Ritchiegit

Ritchiegit commented Jan 26, 2019

I got the same error in my environment:
GeForce 940MX, Ubuntu 16.04, Python 3.6, CUDA 9.2, cuDNN 7.2, PyTorch 0.4.1
Thanks to your advice, I worked around it as follows.
Step 1: Comment out the error-check block, like this:
// check for errors
// if (!success) {
// AT_ERROR("CUDA call failed");
// }
The file is flownet2-pytorch/networks/correlation_package/correlation_cuda.cc, and the check is at line 79.

Step 2: Rerun install.sh from the flownet2-pytorch directory:
/flownet2-pytorch$ bash install.sh

Thank you. @toprocker

@eugenelyj

@toprocker Thank you! I think that for the "no kernel image" error, we need to comment out the error-checking code in correlation_cuda_kernel.cu.

Nepoh added a commit to Nepoh/DAIN that referenced this issue Jan 30, 2020
@ivanthewebber

I find this fix rather surprising (but I'm glad I found it because I have the same issue); did the model train to expected levels of performance and everything?
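
One way to probe that question is to inspect the .flo files produced after the check is commented out. If the correlation kernel never actually ran, its output may not have been filled in, so checking the flow values for sanity seems worthwhile. The sketch below assumes the files use the standard Middlebury .flo layout (a float32 magic value of 202021.25, int32 width and height, then interleaved float32 u, v values); `read_flo` and `flow_summary` are hypothetical helper names, not part of the repository.

```python
import math
import struct
import sys
from array import array

FLO_MAGIC = 202021.25  # sanity-check value at the start of a Middlebury .flo file


def read_flo(path):
    """Read a .flo file and return (width, height, flat array of u, v floats)."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<f", f.read(4))
        if magic != FLO_MAGIC:
            raise ValueError(f"{path}: bad magic value {magic}, not a .flo file")
        width, height = struct.unpack("<ii", f.read(8))
        data = array("f")
        data.fromfile(f, width * height * 2)  # raises EOFError if truncated
    if sys.byteorder == "big":
        data.byteswap()  # the file stores little-endian floats
    return width, height, data


def flow_summary(path):
    """Summarize a flow field: size, count of NaN/inf values, largest magnitude."""
    width, height, data = read_flo(path)
    finite = [v for v in data if math.isfinite(v)]
    return {
        "size": (width, height),
        "non_finite": len(data) - len(finite),
        "max_abs": max((abs(v) for v in finite), default=0.0),
    }
```

A field that is all zeros, or that contains NaN or inf values, would suggest the workaround only hid the failure; comparing the EPE against published numbers on a few frames would be a stronger check.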

@m00yu

m00yu commented May 16, 2023

@Ritchiegit Thanks, your explanation solved my error!
