This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Exception when using CPP package with GPU (TitanX) with mlp() example #5633

Closed
Bumblebee1964 opened this issue Mar 30, 2017 · 2 comments

@Bumblebee1964

For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.

Environment info

Windows 10, 64 bit
Compiler:

Package used (Python/R/Scala/Julia):
cpp
MXNet version:

Or if installed from source:
Latest (30-3-2017)
MXNet commit hash (git rev-parse HEAD):

If you are using python package, please provide

Python version and distribution:

If you are using R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.
Result in console window:
make the Executor
Training
epoch 0
[[13:08:2413:08:24] ] C:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h:C:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h300: [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function:
300: [13:08:24] [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device functionC:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h
:300: [13:08:24] c:\projects\mxnet\base\src\engine./threaded_engine.h:329: [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
[13:08:24] C:\Projects\MXNet\Base\dmlc-core\include\dmlc/logging.h:300: [13:08:24] c:\projects\mxnet\base\src\engine./threaded_engine.h:329: [13:08:24] c:\projects\mxnet\base\mshadow\mshadow./cuda/tensor_gpu-inl.cuh:106: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Minimum reproducible example

if you are using your own code, please provide a short script that reproduces the error.

Steps to reproduce

or if you are running standard examples, please provide the commands you have run that lead to the error.

  1. Tried to create a Visual Studio 2015 solution with a C++ project, as one is not supplied (as it was in the old separate cpp-package). Used the example file mlp.cpp. Hell of a job to get everything right, such as environment variables and include paths.
  2. When run with context CPU, it works as it should. When run with GPU device 0 (a GT640, compute capability 2.1 graphics card) it works. When run with GPU device 1 (an NVIDIA TitanX board) it crashes, probably when copying the NDArray to the GPU.
  3. With the old cpp-package I had the opposite, graphics card did not work, Titan X did work:) No clue on the cause of the problem with the graphics card in this particular case.

What have you tried to solve it?

  1. Tried to debug it. The exception is raised somewhere while waiting for the copy to the GPU. Because of the multiple threads I couldn't find the real place where the exception is generated.
  2. Looked at all possible issues, such as environment variables.
  3. In contrast to the previous version of the c-api / cpp package, it is quite time consuming to fix everything. Maybe someone with the skills can get it to the same level as the previous version?

Has anyone been able to run the cpp examples with the new mxnet version and c-api on a windows machine?

@ptrendx
Member

ptrendx commented Mar 31, 2017

The "invalid device function" error suggests you do not generate the code for the proper GPU compute capability. I don't have experience with Visual Studio, but somewhere in the configuration you should be able to set compute capabilities for which to generate the code. If your TitanX is Maxwell it should be 5.2, if Pascal, then it should be 6.1
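
To illustrate what setting the compute capability amounts to, here is a minimal sketch, assuming the CUDA sources end up being compiled by nvcc. It only builds the `-gencode` option string for a given compute capability. `GencodeFlag` is a hypothetical helper written for this illustration; it is not part of MXNet or the CUDA toolkit, and it assumes the virtual and real architecture numbers match, which holds for 5.2 and 6.1.

```cpp
#include <string>

// Hypothetical helper (not part of MXNet or CUDA): builds the nvcc -gencode
// option for a compute capability given as major.minor, e.g. 5.2 or 6.1.
// Assumption: the virtual (compute_XY) and real (sm_XY) architectures share
// the same number, as they do for Maxwell 5.2 and Pascal 6.1.
std::string GencodeFlag(int major, int minor) {
  const std::string cc = std::to_string(major) + std::to_string(minor);
  return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For a Maxwell TitanX, `GencodeFlag(5, 2)` gives `-gencode arch=compute_52,code=sm_52`; for a Pascal one, `GencodeFlag(6, 1)` gives `-gencode arch=compute_61,code=sm_61`. In a Visual Studio CUDA project the same pair is normally entered as `compute_52,sm_52` in the CUDA C/C++ "Code Generation" property, though where exactly it is set for this particular solution is not something the thread confirms.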

@Bumblebee1964
Author

Thanks for your reply. I couldn't find any inconsistencies as CUDA 8.0 Toolkit was referenced in the projects. However when I copied the project to a new machine and installed CUDA 8.0, the problem was not apparent anymore. So I guess on my old machine, there are things broken.
