
[MXNET-910] Multithreading inference. #12456

Merged: 20 commits merged into apache:master on Sep 19, 2018

Conversation

zheng-da (Contributor) commented Sep 4, 2018

Description

MXNet's Executor isn't thread-safe, so the predictor built on it isn't thread-safe either. However, there is a use case where we want to run inference on the same model from multiple threads in parallel. In this case, we need to create multiple executors that share the same weight arrays. The current C predict API doesn't support this, so we add a new API to create multiple predictors for parallel inference and demonstrate its use in the image-classification example.
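For illustration, here is a minimal sketch of how the new API could be used, assuming the full signature mirrors MXPredCreate with an extra num_threads parameter and an output array of handles (consistent with the header excerpt quoted further down). Model loading is elided, as in the image-classification example.

```cpp
#include <thread>
#include <vector>
#include <mxnet/c_predict_api.h>

// Sketch only: symbol_json and param_bytes are assumed to be loaded
// elsewhere, as in the image-classification example.
void run_parallel_inference(const char* symbol_json, const void* param_bytes,
                            int param_size, int num_threads) {
  const char* input_keys[] = {"data"};
  const mx_uint shape_indptr[] = {0, 4};
  const mx_uint shape_data[] = {1, 3, 224, 224};  // example NCHW input shape

  // One call creates num_threads predictors that share the same weights;
  // the handle array must be big enough to hold all of them.
  std::vector<PredictorHandle> handles(num_threads);
  MXPredCreateMultithread(symbol_json, param_bytes, param_size,
                          1 /* cpu */, 0 /* dev_id */,
                          1, input_keys, shape_indptr, shape_data,
                          num_threads, handles.data());

  // Each worker thread drives its own predictor; handles are never shared.
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    PredictorHandle h = handles[i];
    workers.emplace_back([h] {
      // ... MXPredSetInput(h, "data", ...);
      MXPredForward(h);
      // ... MXPredGetOutput(h, 0, ...);
    });
  }
  for (auto& t : workers) t.join();
  for (auto h : handles) MXPredFree(h);
}
```

The key point is that each handle is owned by exactly one worker thread for its whole lifetime.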

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • New C API MXPredCreateMultithread that creates multiple predictors sharing the same weight arrays, with API doc
  • Multi-threaded inference demo in the image-classification example, with README update

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

marcoabreu (Contributor)

Shouldn't we hide this abstraction from our users? The engine should be smart enough to determine when to multithread - the APIs just have to support concurrent calls.

zheng-da (Contributor, Author) commented Sep 5, 2018

This PR is mainly a demo.
As for your comment, I think it's necessary to expose the number of threads for parallel inference. The computation inside an executor is already parallelized, so running multiple executors in separate threads adds a second level of parallelism on top of the per-executor parallelism. It's hard for the system to figure out the right balance automatically; users should experiment and decide.
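As a concrete way to experiment and decide, a small hypothetical harness could time a full pass at several thread counts (run_parallel_inference stands for a driver like the sketch in the description; it is not part of the PR):

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical timing harness: compares wall-clock time at different
// thread counts so the two parallelism levels can be balanced empirically.
double time_pass(int num_threads) {
  auto start = std::chrono::steady_clock::now();
  // run_parallel_inference(symbol_json, param_bytes, param_size, num_threads);
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}

int main() {
  for (int n : {1, 2, 4, 8})
    std::printf("num_threads=%d: %.3f s\n", n, time_pass(n));
  return 0;
}
```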

zheng-da closed this Sep 5, 2018
zheng-da reopened this Sep 5, 2018
zheng-da changed the title from [WIP] Multithreading inference. to [MXNET-910] Multithreading inference. Sep 7, 2018
return EXIT_FAILURE;
}

std::string test_file(argv[1]);
int num_threads = std::atoi(argv[2]);
Contributor:
Example interface changed. Add a default value for num_threads?
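One possible way to address this suggestion, sketched against the example's main() (not what the PR does):

```cpp
// Fall back to a single thread when the second argument is omitted.
int num_threads = (argc > 2) ? std::atoi(argv[2]) : 1;
```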

* enough to keep `num_threads` predictors.
* \return 0 when success, -1 when failure.
*/
MXNET_DLL int MXPredCreateMultithread(const char* symbol_json_str,
Contributor:
MultiThread?

ret->out_shapes = out_shapes;
ret->out_arrays = ret->exec->outputs();

if (!lazy) {
Member:
Why is this made lazy?

zheng-da (Author):
The fundamental problem is that if we create multiple executors in the same thread (e.g., the main thread), those executors share the same temporary resources, which leads to a race condition when they are used from different threads. To avoid this, we don't create the executors when the predictors are created in the main thread; each executor is actually created the first time its predictor is used in a worker thread. As long as the executor is always used from that worker thread, there is no race condition.
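A minimal sketch of the lazy-creation pattern being described, with stand-in types rather than the actual MXNet internals:

```cpp
#include <memory>
#include <mutex>

struct Executor {                  // stand-in for the real executor
  void Forward() { /* run the graph */ }
};

struct Predictor {
  std::unique_ptr<Executor> exec;  // left empty when created lazily
  std::once_flag init_flag;

  void Forward() {
    // The first call happens in the worker thread, so the executor and
    // its temporary resources are created there, not in the main thread
    // that created the predictor.
    std::call_once(init_flag, [this] { exec.reset(new Executor()); });
    exec->Forward();
  }
};
```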


If I create 10 different PredictorHandles with MXPredCreate() in the main thread, and each of 10 threads calls MXPredSetInput(), MXPredForward(), and MXPredGetOutput() on one of those handles, is that safe?
What if a thread grabs whichever PredictorHandle is currently available, so a given thread may use different handles over time? Is that safe?

zheng-da (Author):

You can give it a try. I'm not sure.

> You can give it a try. I'm not sure.

I got a deadlock in MXPredSetInput(), MXPredForward(), or MXPredGetOutput(), so it seems this usage isn't supported.

eric-haibin-lin merged commit d8984e8 into apache:master Sep 19, 2018
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
* add multi-threading inference.

* demo multi-threading inference.

* add new capi.

* make naive engine thread local.

* create an executor inside each thread.

* fix format.

* fix format.

* fix format.

* Revert "make naive engine thread local."

This reverts commit b9d844e.

* Update CAPI.

* add doc.

* fix lint.

* update example.

* update.

* fix.

* add check.

* fix.

* fix example.

* update name.

* update README.