MXNetError after first detection and recognition #415
Comments
Another error that can occur looks like this: terminate called after throwing an instance of 'dmlc::Error' Stack trace returned 9 entries: |
Same problem, and it occurs unpredictably. |
@diggerdu I solved the problem by ensuring that only a single thread calls the initialized MXNet model. If multiple threads call the same model, this kind of error can occur. Specifically, my server script uses Flask, which by default enables multithreading to handle incoming requests. After I set the threaded parameter to False, everything worked perfectly. You can find more information here: apache/mxnet#3946. Hope this helps you solve your problem. |
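For servers where turning off multithreading is not an option, the same idea (only one thread ever touches the model) can be sketched by routing every call through a single dedicated worker thread. This is a minimal illustration, not code from this repository: `SingleThreadModel`, `fake_get_input`, and `run_demo` are hypothetical names, `fake_get_input` merely stands in for a real call such as `model.get_input(image)`, and the sketch assumes Python 3 for `concurrent.futures`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadModel:
    """Run every call to `model_fn` on one dedicated worker thread."""

    def __init__(self, model_fn):
        self._model_fn = model_fn
        # With max_workers=1 the executor owns exactly one worker thread,
        # so submitted calls run one at a time, always on that thread.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __call__(self, *args, **kwargs):
        # The calling (request) thread blocks until the worker returns.
        future = self._executor.submit(self._model_fn, *args, **kwargs)
        return future.result()

    def close(self):
        self._executor.shutdown(wait=True)


def fake_get_input(image):
    # Hypothetical stand-in for the real model call; it reports which
    # thread executed it so the behaviour can be checked.
    return (threading.current_thread().name, image)


def run_demo(n_requests=8):
    """Simulate several request threads sharing one wrapped model."""
    model = SingleThreadModel(fake_get_input)
    results = []
    results_lock = threading.Lock()

    def handle_request(i):
        out = model("img-%d" % i)
        with results_lock:
            results.append(out)

    threads = [threading.Thread(target=handle_request, args=(i,))
               for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    model.close()
    return results
```

Every tuple returned by `run_demo()` carries the same thread name, even though the calls originate from different request threads. In a plain Flask development server, the simpler route reported in this thread is `app.run(threaded=False)`.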
Thank you very much @WIll-Xu35 |
Hi @WIll-Xu35, thank you. |
Thank you @WIll-Xu35, I struggled with this error the whole afternoon :( app.run(threaded=False) |
You're a legend, I bow down to you. |
Nice job! |
Thanks a lot, this solved my problem perfectly. |
Thank you!! It works for me!! |
Hi all,
I am trying to use this repository in a server-side face registration and recognition service. I tried detecting faces and generating embeddings for all the photos in a directory (300 photos), and it worked fine. However, when I attach it to my local server code, it can only do a single face detection and embedding generation. As soon as a second detection is called, it raises an error.
My configuration is CUDA 9.0, MXNet 1.3.0, cuDNN 7, Python 2.7.
Detailed error messages here:
File "server.py", line 118, in login
login_res, message = face_verification(file_path, regis_path, username)
File "server.py", line 14, in face_verification
result, data = server_function.verify(embedding_dir, photo_dir, login_id)
File "/home/wenbin/project/mxnet_faceID/server_function.py", line 88, in verify
img_tmp = model.get_input(image)
File "/home/wenbin/project/mxnet_faceID/face_model.py", line 71, in get_input
ret = self.detector.detect_face(face_img, det_type = self.args.det)
File "/home/wenbin/project/mxnet_faceID/mtcnn_detector.py", line 493, in detect_face
output = self.LNet.predict(input_buf)
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/model.py", line 717, in predict
o_list.append(o_nd[0:real_size].asnumpy())
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/base.py", line 210, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [11:00:54] src/operator/nn/./cudnn/cudnn_convolution-inl.h:156: Check failed: e == CUDNN_STATUS_SUCCESS (7 vs. 0) cuDNN: CUDNN_STATUS_MAPPING_ERROR
Stack trace returned 10 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f1372ee4dcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1372ee5938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::op::CuDNNConvolutionOp::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocatormxnet::TBlob > const&, std::vector<mxnet::OpReqType, std::allocatormxnet::OpReqType > const&, std::vector<mxnet::TBlob, std::allocatormxnet::TBlob > const&)+0x389) [0x7f1377346829]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::op::ConvolutionComputemshadow::gpu(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocatormxnet::TBlob > const&, std::vector<mxnet::OpReqType, std::allocatormxnet::OpReqType > const&, std::vector<mxnet::TBlob, std::allocatormxnet::TBlob > const&)+0xbfc) [0x7f137733bbec]
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0x59) [0x7f13754883f9]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(+0x317c8d3) [0x7f13754348d3]
[bt] (6) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7f1375a92185]
[bt] (7) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>, std::shared_ptrdmlc::ManualEvent const&)+0xeb) [0x7f1375aa931b]
[bt] (8) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptrdmlc::ManualEvent), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptrdmlc::ManualEvent)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptrdmlc::ManualEvent&&)+0x4e) [0x7f1375aa958e]
[bt] (9) /home/wenbin/mxnet/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptrdmlc::ManualEvent)> (std::shared_ptrdmlc::ManualEvent)> >::_M_run()+0x4a) [0x7f1375a9178a]
[11:00:54] src/resource.cc:262: Ignore CUDA Error [11:00:54] src/storage/./pooled_storage_manager.h:85: CUDA: an illegal memory access was encountered
Stack trace returned 10 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f1372ee4dcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1372ee5938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::storage::GPUPooledStorageManager::DirectFreeNoLock(mxnet::Storage::Handle)+0x95) [0x7f1375ab5815]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::storage::GPUPooledStorageManager::DirectFree(mxnet::Storage::Handle)+0x3d) [0x7f1375ab81bd]
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::StorageImpl::DirectFree(mxnet::Storage::Handle)+0x68) [0x7f1375ab1418]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::resource::ResourceManagerImpl::ResourceTempSpace::~ResourceTempSpace()::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0xff) [0x7f1375b8090f]
[bt] (6) /home/wenbin/mxnet/lib/libmxnet.so(+0x37dfe01) [0x7f1375a97e01]
[bt] (7) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x8e5) [0x7f1375a92185]
[bt] (8) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)+0x65) [0x7f1375aad085]
[bt] (9) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocatormxnet::engine::Var* > const&, std::vector<mxnet::engine::Var*, std::allocatormxnet::engine::Var* > const&, mxnet::FnProperty, int, char const*, bool)+0x1b0) [0x7f1375a98400]
Any help would be appreciated!