Exception when using CPP package with GPU (TitanX) with mlp() example #5633

Bumblebee1964 · 2017-03-30T11:19:44Z

For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.

Environment info

Windows 10, 64 bit
Compiler:

Package used (Python/R/Scala/Julia):
cpp
MXNet version:

Or if installed from source:
Latest (30-3-2017)
MXNet commit hash (git rev-parse HEAD):

If you are using python package, please provide

Python version and distribution:

If you are using R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.
Result in console window:
make the Executor
Training
epoch 0
[[13:08:2413:08:24] ] C:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h:C:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h300: [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function:
300: [13:08:24] [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device functionC:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h
:300: [13:08:24] c:\projects\mxnet\base\src\engine./threaded_engine.h:329: [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
[13:08:24] C:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h:300: [13:08:24] c:\projects\mxnet\base\src\engine./threaded_engine.h:329: [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Minimum reproducible example

if you are using your own code, please provide a short script that reproduces the error.

Steps to reproduce

or if you are running standard examples, please provide the commands you have run that lead to the error.

I tried to create a Visual Studio 2015 solution with a C++ project, since one is not supplied (as it was in the old separate cpp-package), and used the example file mlp.cpp. It was a hell of a job to get everything right, such as environment variables and include paths.
When run with the CPU context, it works as it should. When run with GPU device 0 (a GT640, compute capability 2.1 graphics card), it works. When run with GPU device 1 (an NVIDIA Titan X board), it crashes, probably when copying the NDArray to the GPU.
With the old cpp-package I had the opposite: the graphics card did not work, while the Titan X did :) I have no clue what caused the problem with the graphics card in that case.

What have you tried to solve it?

Tried to debug it. The exception is raised somewhere while waiting for the copy to the GPU. Because of the multiple threads, I couldn't find the exact place where the exception is generated.
Checked the usual suspects, such as environment variables.
In contrast to the previous version of the c-api / cpp package, it is quite time-consuming to fix everything. Maybe someone with the skills can get it to the same level as the previous version?

Has anyone been able to run the cpp examples with the new mxnet version and c-api on a windows machine?


ptrendx · 2017-03-31T17:43:55Z

The "invalid device function" error suggests that the code is not being generated for the proper GPU compute capability. I don't have experience with Visual Studio, but somewhere in the configuration you should be able to set the compute capabilities for which to generate the code. If your TitanX is Maxwell, it should be 5.2; if it is Pascal, it should be 6.1.
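To make the compute capability point a bit more concrete, here is a minimal sketch of how a capability such as 5.2 or 6.1 maps onto the `-gencode` option that nvcc accepts. The helper name `gencode_flag` is made up for illustration and is not part of MXNet or CUDA; in a real program the major/minor numbers would typically come from the `major` and `minor` fields that `cudaGetDeviceProperties` fills in, which is left out here so the snippet builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an MXNet or CUDA API): builds the nvcc -gencode
// option for a single compute capability, e.g. 5.2 -> compute_52 / sm_52.
std::string gencode_flag(int major, int minor) {
    // Compute capabilities are written as <major>.<minor> with a one-digit minor.
    assert(major >= 1 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    // Emit machine code (sm_XX) for exactly this architecture.
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For a Maxwell TitanX, `gencode_flag(5, 2)` yields `-gencode arch=compute_52,code=sm_52`; for a Pascal one, `gencode_flag(6, 1)` yields the `compute_61` / `sm_61` variant. If the build does not include the architecture of the card you run on, errors like "invalid device function" are a plausible outcome.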

Bumblebee1964 · 2017-04-14T07:15:42Z

Thanks for your reply. I couldn't find any inconsistencies, as the CUDA 8.0 Toolkit was referenced in the projects. However, when I copied the project to a new machine and installed CUDA 8.0, the problem no longer occurred. So I guess something is broken on my old machine.

matt32106 mentioned this issue Apr 8, 2017

[QA] why not all examples run out of the box? #5717

Open

Bumblebee1964 closed this as completed Jun 20, 2017
