This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix naive engine for multi-threaded inference #15574

Merged
merged 3 commits into apache:master on Aug 21, 2019

Conversation

arcadiaphy
Member

Description

In the cpp example of image classification, MXPredCreateMultiThread is used to perform multi-threaded inference with the naive engine. The req_completed_ variable in the naive engine is not thread-safe; this PR fixes that.

The problem is reported in #8966.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Make req_completed_ in the naive engine thread-safe (ultimately removed in favor of a stateless implementation)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@apeforest
Contributor

@anirudh2290 Could you please take a look?

@marcoabreu
Contributor

Thanks for the contribution, but MXNet's frontend is not considered thread-safe in general. Thus, I'd prefer that we hold off on this fix.

I feel like we should rather do a proper analysis of the thread-safety of MXNet and then act on that. Small patches won't really solve the problem but might introduce further issues.

@@ -238,7 +238,11 @@ class NaiveEngine final : public Engine {
     static_cast<NaiveEngine*>(engine)->req_completed_ = true;
   }
   // whether action is completed
-  bool req_completed_;
+#if DMLC_CXX11_THREAD_LOCAL
+  static thread_local bool req_completed_;

@pengzhao-intel
Contributor

@ZhennanQin could you help review the fix?

@ZhennanQin
Contributor

I just evaluated the thread-safety situation for NaiveEngine, and I agree with @marcoabreu that this point fix is not enough. There are too many data races when using MXNet from multiple threads; even for NaiveEngine we need to take care of all class members, not only req_completed_. Creating a private NaiveEngine for each thread is another option. So we need a whole-picture plan for MXNet multi-thread support.

+  static thread_local bool req_completed_;
+#else
+  static MX_THREAD_LOCAL bool req_completed_;
+#endif
Member

This introduces an additional overhead for thread local, and the number of ops in mxnet model can be in the scale of 100s. I am not sure we should add this without a build flag since we are adding overhead for single threaded use cases too.

@marcoabreu
Contributor
Jul 19, 2019

Please, not another build flag - at some point we're gonna spend more money on CI than NASA :D

Could we make some benchmarks to determine the impact here?

I'm not that deep inside the engine code, so excuse me if this is a stupid question. But why are we declaring the internal variables of the engine as thread-local instead of creating thread-local instantiations of the engine? Each thread gets its own engine (or multiple engines, or a single engine shared across threads and synchronized with locks), and we switch all static variables to private ones. The current design of the engine doesn't seem to have the creation of multiple instantiations in mind, so we should fix that instead of making all the static variables thread-local.
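A minimal sketch of the per-thread-instance idea (class and function names are hypothetical, not MXNet's actual engine API): the whole engine object is thread-local, so completion state can stay a plain member and no individual member needs to be static or thread_local.

```cpp
#include <cassert>

// Hypothetical engine with per-instance state only (no statics).
class PerThreadEngine {
 public:
  void PushSync() {
    req_completed_ = false;
    // ... run the operator synchronously here ...
    req_completed_ = true;
  }
  bool completed() const { return req_completed_; }

 private:
  bool req_completed_ = false;  // plain member: state is per instance
};

// One engine instance per thread: the object itself is thread_local,
// not its individual members, so threads never share mutable state.
PerThreadEngine& ThreadLocalEngine() {
  static thread_local PerThreadEngine engine;
  return engine;
}
```

Within one thread, repeated calls to `ThreadLocalEngine()` return the same instance; a different thread gets its own copy, constructed on first use.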

@anirudh2290
Member

@marcoabreu @ZhennanQin I agree we need a bigger solution for the problem at hand. Having said that, the scope of this PR looks limited to fixing the existing multi-threaded inference path in the C Predict API, which should be fine.

@arcadiaphy
Member Author

arcadiaphy commented Jul 18, 2019

@apeforest @ZhennanQin @marcoabreu @anirudh2290
Multi-threaded inference is very common in deployment, so we need a solution for this scenario and a workable example for users. Let me take some time to sum up what I know so far.

Since the threaded engine uses multi-thread scheduling for operator computation, a natural idea is to utilize it for parallel inference, which is covered in this proposal. There are two main problems with this kind of usage in MXNet:

  • The threaded engine is not thread-safe, so we need to restrict computation pushing and result pulling to one thread; otherwise we'll hit all kinds of errors in the engine.
  • For computation pushing, one thread is OK, but when we pull the result using functions like asnumpy, it blocks the thread and increases latency.

Due to these problems and the lack of a workable example, I focus on the naive engine instead. The only example in MXNet now is the cpp predict example with the MXPredCreateMultiThread API, which creates an executor for each thread while sharing model parameters, a very attractive option with regard to both speed and memory footprint. In a production environment, the naive engine looks like a more usable method, since it restricts one inference to a single thread. I also think this approach is the basis for more advanced methods: if you can't make it work in the naive engine, then very likely you can't make it work in the threaded engine either. There are two problems with this method too:

  • The naive engine is not thread-safe, which I want to fix in this PR. Without this fix, the cpp example is not usable at all. I agree that a more thorough analysis is needed.
  • There may also be some issues in MKLDNN calls; I've found a very strange phenomenon, described in Multi-threaded inference broken with MKLDNN #15576. I hope that this can be fixed too, since MKLDNN is so good.

@arcadiaphy
Member Author

@apeforest @anirudh2290
Do we really need req_completed_ in the naive engine? I'm inclined to remove it now.

@anirudh2290
Member

@arcadiaphy I guess that would be nice to explore. We can remove it only if we can make sure that every exec_fun pushed makes a call to the callback, since its original purpose is to prevent async calls from being pushed.
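The guard being described can be sketched like this (a simplified illustration with hypothetical names, not the actual NaiveEngine code): the flag exists so the engine can detect an exec_fun that defers its completion callback instead of completing synchronously.

```cpp
#include <cassert>
#include <functional>

using CallbackFn = std::function<void()>;
using ExecFn = std::function<void(CallbackFn)>;

struct NaiveEngineSketch {
  bool req_completed_ = false;

  void PushSync(const ExecFn& exec_fun) {
    req_completed_ = false;
    // The exec_fun is expected to invoke the callback before returning.
    exec_fun([this]() { req_completed_ = true; });
    // If exec_fun stashed the callback for later (an async push),
    // the flag is still false here and the check fires.
    assert(req_completed_ && "exec_fun must complete synchronously");
  }
};
```

Removing the flag is safe only if every pushed exec_fun is guaranteed to invoke its callback before returning, which is exactly the condition discussed here.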

@marcoabreu
Contributor

marcoabreu commented Jul 19, 2019

I understand that you're trying to keep the changes to a minimum, but judging from the static variable definition, it doesn't seem like the engine was designed to be thread-safe, or even to allow multiple instantiations. I'd prefer that we make the engine access only internal variables, get rid of static ones (aka proper object-oriented design), and add synchronization to the APIs.

But I don't really have a dog in that hunt, so take my opinion with a grain of salt here.

@arcadiaphy
Member Author

arcadiaphy commented Jul 21, 2019

@anirudh2290 Yes, in order to prevent async calls from being pushed, a single req_completed_ is not enough. We can add req_completed_ to NaiveOpr and check whether it's been set to true in the complete callback.

Edit:
I used a simpler implementation that removes the field req_completed_ entirely. Since the CPU stream is a dummy stream, for CPU mode the naive engine is now stateless, not counting the bool flag shutdown_phase_.

@piyushghai
Contributor

@apeforest @anirudh2290 Gentle ping for the review.

@anirudh2290 anirudh2290 merged commit d225074 into apache:master Aug 21, 2019
@arcadiaphy arcadiaphy deleted the pr_thread branch December 5, 2019 10:36
7 participants