NaiveEngine asynchronous error in multi-threading #8966

xinghedyc · 2017-12-06T09:23:23Z

When I use NaiveEngine with OpenMP multi-threading, I bind two executors, one on gpu(0) and one on gpu(1), and call exe.forward on them in parallel. However, I get the following error when running the program:

[08:56:51] /data1/yuchendai/lpr_sdk/mxnet/dmlc-core/include/dmlc/logging.h:308: [08:56:51] src/engine/naive_engine.cc:169: Check failed: this->req_completed_ NaiveEngine only support synchronize Push so far

Stack trace returned 10 entries:
[bt] (0) ./bin/test(_ZN4dmlc15LogMessageFatalD1Ev+0x30) [0x45d614]
[bt] (1) /data1/yuchendai/lpr_sdk/mxnet/lib/libmxnet.so(_ZN5mxnet6engine11NaiveEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKc+0x3b4) [0x7f3001f9f8f4]
[bt] (2) /data1/yuchendai/lpr_sdk/mxnet/lib/libmxnet.so(_ZN5mxnet6engine11NaiveEngine4PushEPNS0_3OprENS_7ContextEib+0xad) [0x7f3001f9fd4d]
[bt] (3) /data1/yuchendai/lpr_sdk/mxnet/lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor6RunOpsEbmm+0x584) [0x7f3002014dc4]
[bt] (4) /data1/yuchendai/lpr_sdk/mxnet/lib/libmxnet.so(MXExecutorForward+0x15) [0x7f3001fd8935]
[bt] (5) ./bin/test(_ZN5mxnet3cpp8Executor7ForwardEb+0x45) [0x47a0f3]
[bt] (6) ./bin/test(_ZN9PLATE_OCR16CustomizationOcr7forwardERKSsRSt6vectorINS_4WordESaIS4_EE+0x30b) [0x4793d9]
[bt] (7) ./bin/test(_ZNK9PLATE_OCR25CustomizationOcrInterface9RecognizeERKSsRNS_12RecognizeResEi+0x53) [0x45cbff]
[bt] (8) ./bin/test() [0x458c77]
[bt] (9) /lib64/libgomp.so.1(+0xdde5) [0x7f2ffd96ade5]

Why does NaiveEngine have asynchronous operations?


goswamig · 2018-03-10T00:07:05Z

Please add labels: "Feature request", "Thread Safety"

mseeger · 2018-08-22T11:50:48Z

@KellenSunderland

mseeger · 2018-08-29T06:45:55Z

I am getting the same error when running some fairly benign code on a normal CPU instance (m4.xlarge, Ubuntu Deep Learning AMI). The code binds several executors in sequence. The error occurs only with NaiveEngine, and it does not occur on my Mac.

rosenrodt · 2018-08-30T04:58:09Z

It seems that, when running NaiveEngine, the internal memory pool tries to call cudaFree() on the same resource assigned to two or more MXNet instances. The memory pool is supposed to be a thread-local singleton, so that the MXNet instances spawned by different threads do not contend for the same resource, but with NaiveEngine that apparently is not the case. I get all sorts of errors, such as CUDA invalid pointer errors and, eventually, cuBLAS failures.
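To make the "thread-local singleton" idea concrete, here is a minimal sketch of the pattern in plain C++. It is not MXNet's memory pool: `Pool`, `Get`, `id`, and `CollectPoolIds` are hypothetical names made up for illustration. The only assumption is that the pool is meant to behave like a `thread_local` instance, so that each thread sees its own object.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread resource pool (not MXNet code).
class Pool {
 public:
  // Returns the calling thread's own instance. The function-local
  // thread_local object is constructed once per thread, on first use.
  static Pool& Get() {
    thread_local Pool instance;
    return instance;
  }

  // Identifier assigned at construction; distinct for every instance.
  int id() const { return id_; }

 private:
  Pool() : id_(next_id_.fetch_add(1)) {}

  static std::atomic<int> next_id_;
  int id_;
};

std::atomic<int> Pool::next_id_{0};

// Starts n threads and records which pool instance each thread observes.
// Each thread writes only to its own slot, so no extra locking is needed.
std::vector<int> CollectPoolIds(std::size_t n) {
  std::vector<int> ids(n, -1);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n; ++i) {
    workers.emplace_back([&ids, i] { ids[i] = Pool::Get().id(); });
  }
  for (auto& t : workers) {
    t.join();
  }
  return ids;
}
```

If the pattern holds, every thread reports a different id and no two threads ever touch the same pool. The errors described above would be consistent with two threads ending up with the same underlying resource, although this sketch does not show where that happens in MXNet.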

apeforest · 2019-07-18T00:00:13Z

Why not just use the ThreadedEngine (default one) for multithreaded inference?
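For anyone who wants to try this from C++, the engine is chosen through the `MXNET_ENGINE_TYPE` environment variable, whose documented values are `NaiveEngine`, `ThreadedEngine`, and `ThreadedEnginePerDevice` (the last one is, as far as I know, the actual default). The sketch below only sets the variable for the current process. `SelectEngine` is a hypothetical helper name, `setenv` is POSIX-only, and it assumes the variable is read when MXNet first creates its engine, so it should be called before any other MXNet call.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of MXNet): sets MXNET_ENGINE_TYPE for the
// current process. Returns true if the variable was set successfully.
// Assumption: MXNet reads this variable once, when its engine is created,
// so this must run before the first MXNet call in the process.
bool SelectEngine(const std::string& engine_type) {
  // POSIX setenv; the final argument 1 overwrites any existing value.
  return setenv("MXNET_ENGINE_TYPE", engine_type.c_str(), 1) == 0;
}
```

Calling `SelectEngine("ThreadedEnginePerDevice")` at the very start of `main` would have the same effect as exporting the variable in the shell before launching the program.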

xinghedyc changed the title ~~NaiveEngine error in multi-threading~~ NaiveEngine asynchronous error in multi-threading Dec 6, 2017

sandeep-krishnamurthy added the Feature request, Thread Safety, and C++ labels Mar 11, 2018

marcoabreu added the Backend label and removed the C++ label Jul 17, 2018

arcadiaphy mentioned this issue Jul 17, 2019 in pull request #15574, "fix naive engine for multi-threaded inference" (merged)
