
[MXNET-910] Multithreading inference. #12456

Merged: 20 commits merged into apache:master on Sep 19, 2018

Conversation

zheng-da (Contributor) commented Sep 4, 2018

Description

MXNet's Executor isn't thread-safe, so the predictor built on it isn't thread-safe either. However, there is a use case where we want to run inference on the same model from multiple threads in parallel. In this case, we need to create multiple executors that share the same weight arrays. The current C predict API doesn't support this, so we add a new API to create multiple predictors for parallel inference and demonstrate its use in the image-classification example.
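For illustration, here is a minimal sketch of how the new API could be used, assuming the full signature mirrors MXPredCreate with an extra num_threads parameter and an output array of handles (consistent with the header excerpt quoted further down). Model loading is elided, as in the image-classification example.

```cpp
#include <thread>
#include <vector>
#include <mxnet/c_predict_api.h>

// Sketch only: symbol_json and param_bytes are assumed to be loaded
// elsewhere, as in the image-classification example.
void run_parallel_inference(const char* symbol_json, const void* param_bytes,
                            int param_size, int num_threads) {
  const char* input_keys[] = {"data"};
  const mx_uint shape_indptr[] = {0, 4};
  const mx_uint shape_data[] = {1, 3, 224, 224};  // example NCHW input shape

  // One call creates num_threads predictors that share the same weights;
  // the handle array must be big enough to hold all of them.
  std::vector<PredictorHandle> handles(num_threads);
  MXPredCreateMultithread(symbol_json, param_bytes, param_size,
                          1 /* cpu */, 0 /* dev_id */,
                          1, input_keys, shape_indptr, shape_data,
                          num_threads, handles.data());

  // Each worker thread drives its own predictor; handles are never shared.
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    PredictorHandle h = handles[i];
    workers.emplace_back([h] {
      // ... MXPredSetInput(h, "data", ...);
      MXPredForward(h);
      // ... MXPredGetOutput(h, 0, ...);
    });
  }
  for (auto& t : workers) t.join();
  for (auto h : handles) MXPredFree(h);
}
```

The key point is that each handle is owned by exactly one worker thread for its whole lifetime.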

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • New C API MXPredCreateMultithread that creates multiple predictors sharing the same weight arrays, with API doc
  • Multi-threaded inference demo in the image-classification example, with README update

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

marcoabreu (Contributor)

Shouldn't we hide this abstraction from our users? The engine should be smart enough to determine when to multithread - the APIs just have to support concurrent calls.

zheng-da (Contributor, Author) commented Sep 5, 2018

This PR is mainly a demo.
As for your comment, I think it's necessary to expose the number of threads for parallel inference. The computation inside an executor is already parallelized, so running multiple executors in separate threads adds a second level of parallelism on top of the per-executor parallelism. It's hard for the system to figure out the right balance automatically; users should experiment and decide.
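As a concrete way to experiment and decide, a small hypothetical harness could time a full pass at several thread counts (run_parallel_inference stands for a driver like the sketch in the description; it is not part of the PR):

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical timing harness: compares wall-clock time at different
// thread counts so the two parallelism levels can be balanced empirically.
double time_pass(int num_threads) {
  auto start = std::chrono::steady_clock::now();
  // run_parallel_inference(symbol_json, param_bytes, param_size, num_threads);
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}

int main() {
  for (int n : {1, 2, 4, 8})
    std::printf("num_threads=%d: %.3f s\n", n, time_pass(n));
  return 0;
}
```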

zheng-da closed this Sep 5, 2018
zheng-da reopened this Sep 5, 2018
zheng-da changed the title from [WIP] Multithreading inference. to [MXNET-910] Multithreading inference. Sep 7, 2018
return EXIT_FAILURE;
}

std::string test_file(argv[1]);
int num_threads = std::atoi(argv[2]);
Contributor:
Example interface changed. Add a default value for num_threads?
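One possible way to address this suggestion, sketched against the example's main() (not what the PR does):

```cpp
// Fall back to a single thread when the second argument is omitted.
int num_threads = (argc > 2) ? std::atoi(argv[2]) : 1;
```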

* enough to keep `num_threads` predictors.
* \return 0 when success, -1 when failure.
*/
MXNET_DLL int MXPredCreateMultithread(const char* symbol_json_str,
Contributor:
MultiThread?

ret->out_shapes = out_shapes;
ret->out_arrays = ret->exec->outputs();

if (!lazy) {
Member:
Why is this made lazy?

zheng-da (Author):
The fundamental problem is that if we create multiple executors in the same thread (e.g., the main thread), those executors share the same temporary resources, which leads to a race condition when they are used from different threads. To avoid this, we don't create the executors when the predictors are created in the main thread; each executor is actually created the first time its predictor is used in a worker thread. As long as the executor is always used from that worker thread, there is no race condition.
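A minimal sketch of the lazy-creation pattern being described, with stand-in types rather than the actual MXNet internals:

```cpp
#include <memory>
#include <mutex>

struct Executor {                  // stand-in for the real executor
  void Forward() { /* run the graph */ }
};

struct Predictor {
  std::unique_ptr<Executor> exec;  // left empty when created lazily
  std::once_flag init_flag;

  void Forward() {
    // The first call happens in the worker thread, so the executor and
    // its temporary resources are created there, not in the main thread
    // that created the predictor.
    std::call_once(init_flag, [this] { exec.reset(new Executor()); });
    exec->Forward();
  }
};
```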


If I create 10 different PredictorHandles with MXPredCreate() in the main thread, and each of 10 threads calls MXPredSetInput(), MXPredForward(), and MXPredGetOutput() on one of those handles, is that safe?
What if a thread grabs whichever PredictorHandle is currently available, so a given thread may use different handles over time? Is that safe?

zheng-da (Author):

You can give it a try. I'm not sure.

> You can give it a try. I'm not sure.

I got a deadlock in MXPredSetInput(), MXPredForward(), or MXPredGetOutput(), so it seems this usage isn't supported.

eric-haibin-lin merged commit d8984e8 into apache:master Sep 19, 2018
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
* add multi-threading inference.

* demo multi-threading inference.

* add new capi.

* make naive engine thread local.

* create an executor inside each thread.

* fix format.

* fix format.

* fix format.

* Revert "make naive engine thread local."

This reverts commit b9d844e.

* Update CAPI.

* add doc.

* fix lint.

* update example.

* update.

* fix.

* add check.

* fix.

* fix example.

* update name.

* update README.