RuntimeError:Cuda call failed #114

toprocker · 2019-01-02T10:20:05Z

My Environment:
ubuntu16.04+python3.6+cuda9.2+cudnn7.2+pytorch0.4.1

--inference --model FlowNet2 --save_flow --inference_dataset MpiSintelClean --inference_dataset_root /home/toprocker/datasets/MpiSintel/training/ --resume /home/toprocker/PycharmProjects/FlowNet2-torch/checkpoints/FlowNet2_checkpoint.pth.tar

**Initializing Adam Optimizer
[0.001s] amsgrad = False (<class 'bool'>)
[0.001s] weight_decay = 0 (<class 'int'>)
[0.001s] eps = 1e-08 (<class 'float'>)
[0.001s] betas = (0.9, 0.999) (<class 'tuple'>)
[0.001s] lr = 0.001 (<class 'float'>)
[0.001s] Operation finished

Overall Progress: 0%| | 0/1 [00:00<?, ?it/s]
Inferencing : 0%| | 0/1041.0 [00:00<?, ?it/s]
/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/toprocker/PycharmProjects/FlowNet2-torch/main.py:375: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
total_loss += loss_val.data[0]
/home/toprocker/PycharmProjects/FlowNet2-torch/main.py:376: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
loss_values = [v.data[0] for v in losses]

Inference Averages for Epoch 0: L1: 0.205, EPE: 0.326: 0%| | 0/1041.0 [00:02<?, ?it/s]
Inference Averages for Epoch 0: L1: 0.205, EPE: 0.326: 0%| | 1/1041.0 [00:02<43:36, 2.52s/it]
Inference Averages for Epoch 0: L1: 0.205, EPE: 0.326: 0%| | 1/1041.0 [00:02<43:36, 2.52s/it]
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/home/toprocker/PycharmProjects/FlowNet2-torch/main.py", line 409, in <module>
stats = inference(args=args, epoch=epoch - 1, data_loader=inference_loader, model=model_and_loss, offset=offset)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/main.py", line 371, in inference
losses, output = model(data[0], target[0], inference=True)
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/main.py", line 170, in forward
output = self.model(data)
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/models.py", line 118, in forward
flownetc_flow2 = self.flownetc(x)[0]
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/networks/FlowNetC.py", line 86, in forward
out_corr = self.corr(out_conv3a, out_conv3b) # False
File "/home/toprocker/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/networks/correlation_package/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/home/toprocker/PycharmProjects/FlowNet2-torch/networks/correlation_package/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:79)
frame #0: + 0x136e7 (0x7fa5b90c16e7 in /home/toprocker/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0x1392e (0x7fa5b90c192e in /home/toprocker/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0xfde2 (0x7fa5b90bdde2 in /home/toprocker/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: ./work() [0x50682c]
frame #4: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #5: ./work() [0x596285]
frame #6: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #7: THPFunction_do_forward(THPFunction, _object) + 0x2ad (0x7fa5e7736bfd in /home/toprocker/.local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: PyCFunction_Call + 0x52 (0x566b52 in ./work)
frame #9: ./work() [0x54d7f9]
frame #10: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #11: ./work() [0x5067b0]
frame #12: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #13: ./work() [0x596285]
frame #14: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #15: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #16: ./work() [0x504232]
frame #17: ./work() [0x5966f8]
frame #18: ./work() [0x54d4ee]
frame #19: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #20: ./work() [0x5067b0]
frame #21: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #22: ./work() [0x596285]
frame #23: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #24: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #25: ./work() [0x504232]
frame #26: ./work() [0x5966f8]
frame #27: ./work() [0x54d4ee]
frame #28: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #29: ./work() [0x5067b0]
frame #30: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #31: ./work() [0x596285]
frame #32: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #33: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #34: ./work() [0x504232]
frame #35: ./work() [0x5966f8]
frame #36: ./work() [0x54d4ee]
frame #37: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #38: ./work() [0x5067b0]
frame #39: _PyEval_EvalFrameDefault + 0x4de (0x50729e in ./work)
frame #40: ./work() [0x504232]
frame #41: ./work() [0x596698]
frame #42: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #43: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #44: ./work() [0x504232]
frame #45: ./work() [0x596698]
frame #46: ./work() [0x54d4ee]
frame #47: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #48: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #49: ./work() [0x504232]
frame #50: ./work() [0x596698]
frame #51: PyObject_Call + 0x3e (0x59cf6e in ./work)
frame #52: _PyEval_EvalFrameDefault + 0x1ab3 (0x508873 in ./work)
frame #53: ./work() [0x504232]
frame #54: ./work() [0x596698]
frame #55: ./work() [0x54d4ee]
frame #56: _PyObject_FastCallKeywords + 0x19c (0x59d9fc in ./work)
frame #57: ./work() [0x5067b0]
frame #58: _PyEval_EvalFrameDefault + 0x15c6 (0x508386 in ./work)
frame #59: ./work() [0x504232]
frame #60: ./work() [0x505e83]
frame #61: ./work() [0x5066f0]
frame #62: _PyEval_EvalFrameDefault + 0x15c6 (0x508386 in ./work)
frame #63: ./work() [0x504232]

toprocker · 2019-01-03T04:30:12Z

no kernel image is available for execution on the device
My GPU is a GeForce 930M, whose compute capability is 5.0 rather than the 5.2 required by the correlation_cuda setup.py.
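
To make the mismatch concrete, here is a minimal sketch of the rule as I understand it: a compiled `sm_XY` image only runs on a device with the same major version and a minor version of at least Y, while embedded `compute_XY` PTX can be JIT-compiled for any equal or newer capability. The flag list below is illustrative and not copied from the repository's setup.py, and `parse_gencode` and `has_kernel_image` are hypothetical helpers. On a real machine the device tuple could come from `torch.cuda.get_device_capability()`.

```python
import re

# Illustrative flags only (an assumption, not the repository's actual list).
ASSUMED_NVCC_ARGS = [
    "-gencode", "arch=compute_52,code=sm_52",
    "-gencode", "arch=compute_60,code=sm_60",
    "-gencode", "arch=compute_61,code=sm_61",
]

_GENCODE = re.compile(r"arch=compute_(\d+)(\d),code=(sm|compute)_(\d+)(\d)")


def parse_gencode(args):
    """Split nvcc -gencode values into binary (sm_) and PTX (compute_) targets."""
    cubin, ptx = set(), set()
    for arg in args:
        match = _GENCODE.fullmatch(arg)
        if match is None:
            continue  # skips the literal "-gencode" tokens
        target = (int(match.group(4)), int(match.group(5)))
        if match.group(3) == "sm":
            cubin.add(target)
        else:
            ptx.add(target)
    return cubin, ptx


def has_kernel_image(device_cc, args):
    """Return True if some embedded image could execute on device_cc."""
    major, minor = device_cc
    cubin, ptx = parse_gencode(args)
    # Binary code: same major version, built for an equal or lower minor version.
    if any(M == major and m <= minor for M, m in cubin):
        return True
    # PTX: can be JIT-compiled for any device of equal or higher capability.
    return any((M, m) <= (major, minor) for M, m in ptx)


print(has_kernel_image((5, 0), ASSUMED_NVCC_ARGS))  # False: no image for a 5.0 device
print(has_kernel_image((5, 2), ASSUMED_NVCC_ARGS))  # True
```

Under these assumed flags a 5.0 device finds no usable image, which is consistent with the "no kernel image is available" message.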

toprocker · 2019-01-03T17:17:26Z

I solved it by commenting out the error check below the forward call in correlation_cuda.cc. In the end I got a series of .flo files. Thanks.

Ritchiegit · 2019-01-26T11:48:38Z

I got the same error in my environment:
Geforce940MX+ubuntu16.04+python3.6+cuda9.2+cudnn7.2+pytorch0.4.1
Thanks to your advice, I worked around it as follows.
Step 1: Comment out the error-check block, like this:
// check for errors
// if (!success) {
// AT_ERROR("CUDA call failed");
// }
The block is at line 79 of flownet2-pytorch/networks/correlation_package/correlation_cuda.cc.

Step 2: Rerun install.sh from the flownet2-pytorch directory, like:
/flownet2-pytorch$ bash install.sh

Thank you. @toprocker
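
A possible alternative to editing the error check, assuming the root cause is the missing compute capability 5.0 target mentioned earlier in this thread, would be to add a matching `-gencode` entry to the nvcc arguments before rebuilding. This is only a sketch: `gencode_for` is a hypothetical helper, the starting `nvcc_args` list is illustrative rather than the repository's real one, and I have not verified this on the hardware discussed here.

```python
def gencode_for(major, minor, include_ptx=False):
    """Build nvcc -gencode arguments for one compute capability.

    include_ptx additionally embeds PTX so newer GPUs can JIT-compile it.
    """
    arch = f"{major}{minor}"
    flags = ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    if include_ptx:
        flags += ["-gencode", f"arch=compute_{arch},code=compute_{arch}"]
    return flags


# Illustrative starting point (an assumption, not the repository's actual list).
nvcc_args = ["-gencode", "arch=compute_52,code=sm_52"]

# Prepend a target for a 5.0 device such as the GeForce 930M.
nvcc_args = gencode_for(5, 0) + nvcc_args
print(nvcc_args)
```

The extension would still need to be rebuilt (for example by rerunning install.sh) for any change to the flags to take effect.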

eugenelyj · 2019-08-04T04:11:00Z

@toprocker Thank you! I think that for the "no kernel image" error, we need to comment out the error-checking code in correlation_cuda_kernel.cu.

NVIDIA/flownet2-pytorch#114 (comment)

ivanthewebber · 2020-06-24T17:25:55Z

I find this fix rather surprising (but I'm glad I found it because I have the same issue); did the model train to expected levels of performance and everything?

m00yu · 2023-05-16T07:50:42Z

@Ritchiegit Thanks, your explanation solved my error!

Nepoh added a commit to Nepoh/DAIN that referenced this issue Jan 30, 2020

remove error check

6aee7d0

NVIDIA/flownet2-pytorch#114 (comment)

Barath19 mentioned this issue Jun 25, 2023

RuntimeError:Cuda call failed #114 Barath19/CaLiB#1

Closed
