Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThreadLocalStore not thread safe ? #571

Closed
anirudh2290 opened this issue Oct 3, 2019 · 1 comment
Closed

ThreadLocalStore not thread safe ? #571

anirudh2290 opened this issue Oct 3, 2019 · 1 comment

Comments

@anirudh2290
Copy link
Contributor

MXNet heavily uses ThreadLocalStore to keep the C APIs thread safe. I observed that ThreadLocalStore itself has thread safety issues probably because of gcc bug with thread_local ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60673, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81880 ) .

Please see the below minimal reproducible step:

#include <time.h>
#include <dmlc/logging.h>
#include <dmlc/thread_group.h>
#include <dmlc/omp.h>
#include <mxnet/c_api.h>
#include <mxnet/engine.h>
#include <mxnet/ndarray.h>
#include <dmlc/timer.h>
#include <cstdio>
#include <thread>
#include <chrono>
#include <vector>


struct A {
    std::vector<int*> a;
};

static std::mutex api_lock;
int ThreadSafetyTest() {
  std::unique_lock<std::mutex> lock(api_lock);
  A *ret = dmlc::ThreadLocalStore<A>::Get();
  std::vector<int*> tmp_inputs;
  tmp_inputs.reserve(10);
  for (int i = 0; i < 10; ++i) {
      tmp_inputs.push_back(new int(i));
  }
  ret->a.clear();
  ret->a.reserve(10);
  for (int  i = 0; i < 10; ++i) {
      ret->a.push_back(tmp_inputs[i]);
  }
  LOG(INFO) << dmlc::BeginPtr(ret->a);
}

int main(int argc, char const *argv[]) {
    auto func = [&](int num) {
        ThreadSafetyTest();
        };
    std::vector<std::thread> worker_threads(5);
    int count = 0;
    for (auto&& i : worker_threads) {
        i = std::thread(func, count);
        count++;
    }

    for (auto&& i : worker_threads) {
        i.join();
    }
}

If you look at the printed value for BeginPtr(ret->a) it has the same value between different threads, which means its points to the same address and can cause issues. This issue is not reproduced when using MX_THREAD_LOCAL

@anirudh2290
Copy link
Contributor Author

This may not specifically be an issue with thread safety, but because of the difference in the behavior of destructor between the two thread local implementations. Projects like MXNet depend on the lifetime of the thread_local data extending beyond the threads lifetime. I have opened a PR for dependent projects to choose the implementation: #573

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant