
Current AsyncWorker scheduling conflict with threads used by other libraries #8

Closed
RReverser opened this issue Nov 10, 2022 · 8 comments · Fixed by #9

Comments

@RReverser
Contributor

Currently, AsyncWorker uses the global pthread pool for scheduling its own tasks, taking threads as they become available. This works fine if AsyncWorker is the exclusive user of multithreading, but not when the addon is linked with other libraries that also rely on threads.

Here's the general problematic scenario:

  • addon A has an async_method that spawns tasks onto an AsyncWorker
  • it links with library B and invokes B.some_method(), which itself uses up to N threads
  • logically, the user builds the addon with PTHREAD_POOL_SIZE=N+1 (N threads for the library + 1 for the async op in the addon itself)

Now, if the user invokes A.async_method(), addon A takes one thread from the pool for the worker, leaving N threads; the library takes those N threads, finishes its work, and everything is fine.

But now another user has code that invokes async_method() on various inputs in parallel, e.g. via Promise.all:

Promise.all([input1, input2].map(input => A.async_method(input)))

Now addon A takes 2 threads (one per async_method() invocation) to create the AsyncWorkers. Each of those workers invokes B.some_method(), and each of those parallel B.some_method() invocations tries to allocate N threads, i.e. 2*N in total, but the pool only has (N+1)-2 == N-1 threads left at this point! So each of them deadlocks.

The user could increase PTHREAD_POOL_SIZE to 2*(N+1), but then the same failure will happen as soon as they run 3 invocations of the library in parallel, and so on.

I actually observed this behaviour, with deadlocks, in a real-world addon I'm cross-compiling to Wasm with emnapi right now, and it took some time to figure out what was happening.

In my case I fixed this by monkey-patching the async methods to queue them one by one when compiled to Wasm, but, ideally, this should be handled at the emnapi level.

I think the only reliable solution here is for emnapi to have a separate, exclusive thread pool instead of taking as many threads as it wants from the global Emscripten pool. Then, it could either use the same JS logic or even some C++ implementation of work-stealing to execute at most EMNAPI_WORKER_POOL_SIZE tasks in parallel, and no more. This would allow users to configure EMNAPI_WORKER_POOL_SIZE to the desired number of tasks, and PTHREAD_POOL_SIZE to EMNAPI_WORKER_POOL_SIZE * (N + 1) in advance, and avoid the risk of deadlocks altogether.

And then, those users who know that they're not linking against any multithreaded libraries will be able to set both PTHREAD_POOL_SIZE and EMNAPI_WORKER_POOL_SIZE to (number of CPU cores) and get maximum parallelisation, as they can today.
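
To make the proposal above more concrete, here is a minimal sketch of what such a dedicated, bounded pool could look like, built only on the public pthread API. This is just an illustration, not emnapi's actual code, and the emnapi_pool_* names are made up:

/* Minimal sketch of a dedicated, bounded worker pool (illustration only;
   the emnapi_pool_* names are hypothetical, not the real emnapi API). */
#include <pthread.h>
#include <stdlib.h>

typedef struct task {
  void (*run)(void *);
  void *data;
  struct task *next;
} task;

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;
static task *queue_head, *queue_tail;

/* Each pool thread blocks until a task is queued, then runs it. Only the
   threads created in emnapi_pool_init ever pull from this queue, so at most
   that many async works run concurrently, no matter how many are queued. */
static void *worker_main(void *arg) {
  (void)arg;
  for (;;) {
    pthread_mutex_lock(&pool_lock);
    while (queue_head == NULL)
      pthread_cond_wait(&pool_cond, &pool_lock);
    task *t = queue_head;
    queue_head = t->next;
    if (queue_head == NULL) queue_tail = NULL;
    pthread_mutex_unlock(&pool_lock);
    t->run(t->data);
    free(t);
  }
  return NULL;
}

/* Called once at startup: spawn exactly `size` dedicated threads,
   e.g. size == EMNAPI_WORKER_POOL_SIZE. */
void emnapi_pool_init(int size) {
  for (int i = 0; i < size; i++) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker_main, NULL);
    pthread_detach(tid);
  }
}

/* Queue a task; it runs as soon as one of the dedicated threads is free. */
void emnapi_pool_post(void (*run)(void *), void *data) {
  task *t = malloc(sizeof(*t));
  t->run = run;
  t->data = data;
  t->next = NULL;
  pthread_mutex_lock(&pool_lock);
  if (queue_tail) queue_tail->next = t; else queue_head = t;
  queue_tail = t;
  pthread_cond_signal(&pool_cond);
  pthread_mutex_unlock(&pool_lock);
}

Because only the threads created in emnapi_pool_init ever run these tasks, queued async works simply wait their turn instead of competing with library B for threads from the global Emscripten pool.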

@RReverser
Contributor Author

RReverser commented Nov 10, 2022

Off-topic: it was pretty funny how I found this library. I was actually googling for my identically named library from 2017 - https://github.com/RReverser/emnapi - as I wanted to add some more functions to it, but instead I found your version of emnapi, saw that you'd made further progress than me, and am now trying to use it instead 😅

@toyobayashi
Owner

toyobayashi commented Nov 11, 2022

@RReverser
Thanks a lot for your feedback. I also learned a lot about wasm/wasi from your open source projects and blogs.

Now I understand the scenarios where problems can arise, but I still have questions about the following points:

I think the only reliable solution here is for emnapi to have a separate, exclusive thread pool instead of taking as many threads as it wants from the global Emscripten pool.

Do you mean emnapi should implement the worker scheduling mechanism and not depend on Emscripten pthreads?

This would allow users to configure EMNAPI_WORKER_POOL_SIZE to the desired number of tasks, and PTHREAD_POOL_SIZE to EMNAPI_WORKER_POOL_SIZE * (N + 1) in advance, and avoid the risk of deadlocks altogether.

As a result, users still need to correctly configure EMNAPI_WORKER_POOL_SIZE and PTHREAD_POOL_SIZE themselves. How is this different from letting users directly configure PTHREAD_POOL_SIZE to the size they want?

@toyobayashi
Owner

So the purpose of EMNAPI_WORKER_POOL_SIZE is to correctly queue AsyncWorkers that can spawn child threads in parallel?

@toyobayashi
Owner

toyobayashi commented Nov 11, 2022

I did some work on the feat-worker-pool branch and tested it locally; it works fine. Suggestions or a PR are welcome.

@RReverser
Contributor Author

Thanks a lot for your feedback. I also learned a lot about wasm/wasi from your open source projects and blogs.

Thank you! You have no idea how much this feedback means to me. I'm surprised every time that someone actually reads those 😅

Do you mean emnapi should implement the worker scheduling mechanism and not depend on Emscripten pthreads?

So the purpose of EMNAPI_WORKER_POOL_SIZE is to correctly queue AsyncWorkers that can spawn child threads in parallel?

Yes and yes. That is, it should be a subset of PTHREAD_POOL_SIZE that is exclusively allocated and used only by emnapi, and by no one else, to prevent deadlocks.
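
To put purely illustrative numbers on that: if library B uses N = 4 threads and EMNAPI_WORKER_POOL_SIZE = 2, the earlier formula gives PTHREAD_POOL_SIZE = 2 * (4 + 1) = 10, i.e. two threads for the two concurrently running AsyncWorkers plus four library threads for each of them.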

I did some work on the feat-worker-pool branch and tested it locally; it works fine. Suggestions or a PR are welcome.

I actually won't be near a computer today; if you don't mind, I'll take a look on Monday. Thanks for the quick response!

@toyobayashi
Owner

After b8535be, napi_queue_async_work still depends on pthread_create, but if EMNAPI_WORKER_POOL_SIZE is predefined, it checks the emnapi-specific running worker count and then decides whether to spawn a thread or queue the async work. This works in my test; it would be nice if it actually works in your scenario as well.
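
For readers following along, the spawn-or-queue gating described above amounts to roughly the following. This is a simplified sketch with hypothetical names and types, not the actual code on the branch; it only assumes EMNAPI_WORKER_POOL_SIZE is passed at build time (e.g. -DEMNAPI_WORKER_POOL_SIZE=2):

/* Simplified sketch of the spawn-or-queue decision (hypothetical names and
   types, not the actual emnapi implementation). */
#include <pthread.h>
#include <stdlib.h>

#ifndef EMNAPI_WORKER_POOL_SIZE
#define EMNAPI_WORKER_POOL_SIZE 1 /* normally passed as -DEMNAPI_WORKER_POOL_SIZE=... */
#endif

typedef struct async_work {
  void (*execute)(void *);
  void *data;
  struct async_work *next;
} async_work;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int running_workers;                     /* emnapi-specific running count */
static async_work *pending_head, *pending_tail; /* async works waiting for a slot */

static void *run_work(void *arg);

/* Spawning still goes through pthread_create, but only when a slot is free. */
static void spawn_worker(async_work *work) {
  pthread_t tid;
  pthread_create(&tid, NULL, run_work, work);
  pthread_detach(tid);
}

static void queue_async_work(async_work *work) {
  pthread_mutex_lock(&lock);
  if (running_workers < EMNAPI_WORKER_POOL_SIZE) {
    running_workers++;
    pthread_mutex_unlock(&lock);
    spawn_worker(work); /* free slot: spawn a thread right away */
  } else {
    work->next = NULL;  /* no free slot: queue it for later */
    if (pending_tail) pending_tail->next = work; else pending_head = work;
    pending_tail = work;
    pthread_mutex_unlock(&lock);
  }
}

/* Worker thread: run the work, then hand the slot to the next pending work. */
static void *run_work(void *arg) {
  async_work *work = arg;
  work->execute(work->data);
  pthread_mutex_lock(&lock);
  async_work *next = pending_head;
  if (next) {
    pending_head = next->next;
    if (pending_head == NULL) pending_tail = NULL;
  } else {
    running_workers--; /* nothing pending: release the slot */
  }
  pthread_mutex_unlock(&lock);
  if (next) spawn_worker(next);
  return NULL;
}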

Currently I have no idea how to implement async work without relying on pthreads. (emscripten-core/emscripten#17213)

I'll take a look on Monday

Looking forward to your code review and testing. Have a nice weekend.

@RReverser
Contributor Author

Yeah, at least with -DEMNAPI_WORKER_POOL_SIZE=1 it seems to do what my workaround did and doesn't deadlock, thanks!

That said, I'm still concerned about "piggy-backing" on the PThread object for a couple of reasons:

  1. The PThread JS object (like built-in JS libraries in general) is considered an implementation detail of Emscripten's pthread support rather than a documented API intended for use by other developers. It doesn't change too often, but it's still a risk to rely on it.
  2. Overriding built-ins like .push on array objects is not only problematic for composition, but also inhibits JIT optimisations in modern engines. That might not be too noticeable in this case, considering that the cost of creating new threads is already high enough, but still.

I think a safer and more efficient option would be to have your own worker pool on the C/C++ side that uses the public pthread/std::thread API. Something like https://github.com/bshoshany/thread-pool might be a source of inspiration or even be used directly (this is a random search result that looks promising; there are other implementations of work-stealing thread pools).

At the very least, as an intermediate option to fix just issue (2) from the list above, you could override PThread.getNewWorker + PThread.returnWorkerToPool for now, so that you don't override array methods, only the higher-level API.

@toyobayashi
Owner

Great, thanks for your advice. I have ported libuv's thread pool implementation. It looks perfect!
