
Current AsyncWorker scheduling conflict with threads used by other libraries #8

Closed
RReverser opened this issue Nov 10, 2022 · 8 comments · Fixed by #9

Comments

@RReverser
Contributor

Currently, AsyncWorker uses the global pthread pool for scheduling its own tasks, taking threads as they become available. This works fine if AsyncWorker is the exclusive user of multithreading, but not when the addon is linked with other libraries that also rely on threads.

Here's the general problematic scenario:

  • addon A has an async_method that spawns tasks onto an AsyncWorker
  • it links with library B and invokes B.some_method(), which itself uses up to N threads
  • logically, the user builds the addon with PTHREAD_POOL_SIZE=N+1 (N threads for the library + 1 for the async op in the addon itself)

Now, if the user invokes A.async_method(), addon A takes one thread from the pool for the worker, leaving N threads; the library takes those N threads, finishes its work, and everything is fine.

But now another user has code that invokes async_method() on various inputs in parallel, e.g. via Promise.all:

Promise.all([input1, input2].map(input => A.async_method(input)))

Now addon A takes 2 threads (one per async_method() invocation) to create the AsyncWorkers. Each of those workers invokes B.some_method(), and each of those parallel B.some_method() invocations tries to allocate N threads, i.e. 2*N in total, but the pool only has (N+1)-2 == N-1 threads left at this point! So each of them deadlocks.

The user could increase PTHREAD_POOL_SIZE to 2*(N+1), but then the same failure will happen as soon as they run 3 invocations of the library in parallel, and so on.

I actually observed this behaviour, with deadlocks, in a real-world addon I'm cross-compiling to Wasm with emnapi right now, and it took some time to figure out what was happening.

In my case I fixed this by monkey-patching the async methods to queue them one by one when compiled to Wasm, but, ideally, this should be handled at the emnapi level.

I think the only reliable solution here is for emnapi to have a separate, exclusive thread pool instead of taking as many threads as it wants from the global Emscripten pool. Then, it could either use the same JS logic or even some C++ implementation of work-stealing to execute at most EMNAPI_WORKER_POOL_SIZE tasks in parallel, and no more. This would allow users to configure EMNAPI_WORKER_POOL_SIZE to the desired number of tasks, and PTHREAD_POOL_SIZE to EMNAPI_WORKER_POOL_SIZE * (N + 1) in advance, and avoid the risk of deadlocks altogether.

And then, those users who know that they're not linking against any multithreaded libraries will be able to set both PTHREAD_POOL_SIZE and EMNAPI_WORKER_POOL_SIZE to (number of CPU cores) and get maximum parallelisation, as they can today.
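
To make the proposal above more concrete, here is a minimal sketch of what such a dedicated, bounded pool could look like, built only on the public pthread API. This is just an illustration, not emnapi's actual code, and the emnapi_pool_* names are made up:

/* Minimal sketch of a dedicated, bounded worker pool (illustration only;
   the emnapi_pool_* names are hypothetical, not the real emnapi API). */
#include <pthread.h>
#include <stdlib.h>

typedef struct task {
  void (*run)(void *);
  void *data;
  struct task *next;
} task;

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;
static task *queue_head, *queue_tail;

/* Each pool thread blocks until a task is queued, then runs it. Only the
   threads created in emnapi_pool_init ever pull from this queue, so at most
   that many async works run concurrently, no matter how many are queued. */
static void *worker_main(void *arg) {
  (void)arg;
  for (;;) {
    pthread_mutex_lock(&pool_lock);
    while (queue_head == NULL)
      pthread_cond_wait(&pool_cond, &pool_lock);
    task *t = queue_head;
    queue_head = t->next;
    if (queue_head == NULL) queue_tail = NULL;
    pthread_mutex_unlock(&pool_lock);
    t->run(t->data);
    free(t);
  }
  return NULL;
}

/* Called once at startup: spawn exactly `size` dedicated threads,
   e.g. size == EMNAPI_WORKER_POOL_SIZE. */
void emnapi_pool_init(int size) {
  for (int i = 0; i < size; i++) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker_main, NULL);
    pthread_detach(tid);
  }
}

/* Queue a task; it runs as soon as one of the dedicated threads is free. */
void emnapi_pool_post(void (*run)(void *), void *data) {
  task *t = malloc(sizeof(*t));
  t->run = run;
  t->data = data;
  t->next = NULL;
  pthread_mutex_lock(&pool_lock);
  if (queue_tail) queue_tail->next = t; else queue_head = t;
  queue_tail = t;
  pthread_cond_signal(&pool_cond);
  pthread_mutex_unlock(&pool_lock);
}

Because only the threads created in emnapi_pool_init ever run these tasks, queued async works simply wait their turn instead of competing with library B for threads from the global Emscripten pool.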

@RReverser
Contributor Author

RReverser commented Nov 10, 2022

Off-topic: it was pretty funny how I found this library. I was actually googling for my identically named library from 2017 - https://github.com/RReverser/emnapi - as I wanted to add some more functions to it, but instead I found your version of emnapi, saw that you'd made further progress than me, and am now trying to use it instead 😅

@toyobayashi
Owner

toyobayashi commented Nov 11, 2022

@RReverser
Thanks a lot for your feedback. I also learned a lot about wasm/wasi from your open source projects and blogs.

Now I understand the scenarios where problems can arise, but I still have questions about the following points:

I think the only reliable solution here is for emnapi to have a separate, exclusive thread pool instead of taking as many threads as it wants from the global Emscripten pool.

Do you mean emnapi should implement the worker scheduling mechanism and not depend on Emscripten pthreads?

This would allow users to configure EMNAPI_WORKER_POOL_SIZE to the desired number of tasks, and PTHREAD_POOL_SIZE to EMNAPI_WORKER_POOL_SIZE * (N + 1) in advance, and avoid the risk of deadlocks altogether.

As a result, users still need to correctly configure EMNAPI_WORKER_POOL_SIZE and PTHREAD_POOL_SIZE themselves. How is this different from letting users directly configure PTHREAD_POOL_SIZE to the size they want?

@toyobayashi
Owner

So the purpose of EMNAPI_WORKER_POOL_SIZE is to correctly queue AsyncWorkers that can spawn child threads in parallel?

@toyobayashi
Owner

toyobayashi commented Nov 11, 2022

I did some work on the feat-worker-pool branch and tested it locally; it works fine. Suggestions or a PR are welcome.

@RReverser
Contributor Author

Thanks a lot for your feedback. I also learned a lot about wasm/wasi from your open source projects and blogs.

Thank you! You have no idea how much this feedback means to me. I'm surprised every time that someone actually reads those 😅

Do you mean emnapi should implement the worker scheduling mechanism and not depend on Emscripten pthreads?

So the purpose of EMNAPI_WORKER_POOL_SIZE is to correctly queue AsyncWorkers that can spawn child threads in parallel?

Yes and yes. That is, it should be a subset of PTHREAD_POOL_SIZE that is exclusively allocated and used only by emnapi, and by no one else, to prevent deadlocks.
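
To put purely illustrative numbers on that: if library B uses N = 4 threads and EMNAPI_WORKER_POOL_SIZE = 2, the earlier formula gives PTHREAD_POOL_SIZE = 2 * (4 + 1) = 10, i.e. two threads for the two concurrently running AsyncWorkers plus four library threads for each of them.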

I did some work on the feat-worker-pool branch and tested it locally; it works fine. Suggestions or a PR are welcome.

I actually won't be near a computer today; if you don't mind, I'll take a look on Monday. Thanks for the quick response!

@toyobayashi
Owner

After b8535be, napi_queue_async_work still depends on pthread_create, but if EMNAPI_WORKER_POOL_SIZE is predefined, it checks the emnapi-specific running worker count and then decides whether to spawn a thread or queue the async work. This works in my test; it would be nice if it actually works in your scenario as well.
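
For readers following along, the spawn-or-queue gating described above amounts to roughly the following. This is a simplified sketch with hypothetical names and types, not the actual code on the branch; it only assumes EMNAPI_WORKER_POOL_SIZE is passed at build time (e.g. -DEMNAPI_WORKER_POOL_SIZE=2):

/* Simplified sketch of the spawn-or-queue decision (hypothetical names and
   types, not the actual emnapi implementation). */
#include <pthread.h>
#include <stdlib.h>

#ifndef EMNAPI_WORKER_POOL_SIZE
#define EMNAPI_WORKER_POOL_SIZE 1 /* normally passed as -DEMNAPI_WORKER_POOL_SIZE=... */
#endif

typedef struct async_work {
  void (*execute)(void *);
  void *data;
  struct async_work *next;
} async_work;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int running_workers;                     /* emnapi-specific running count */
static async_work *pending_head, *pending_tail; /* async works waiting for a slot */

static void *run_work(void *arg);

/* Spawning still goes through pthread_create, but only when a slot is free. */
static void spawn_worker(async_work *work) {
  pthread_t tid;
  pthread_create(&tid, NULL, run_work, work);
  pthread_detach(tid);
}

static void queue_async_work(async_work *work) {
  pthread_mutex_lock(&lock);
  if (running_workers < EMNAPI_WORKER_POOL_SIZE) {
    running_workers++;
    pthread_mutex_unlock(&lock);
    spawn_worker(work); /* free slot: spawn a thread right away */
  } else {
    work->next = NULL;  /* no free slot: queue it for later */
    if (pending_tail) pending_tail->next = work; else pending_head = work;
    pending_tail = work;
    pthread_mutex_unlock(&lock);
  }
}

/* Worker thread: run the work, then hand the slot to the next pending work. */
static void *run_work(void *arg) {
  async_work *work = arg;
  work->execute(work->data);
  pthread_mutex_lock(&lock);
  async_work *next = pending_head;
  if (next) {
    pending_head = next->next;
    if (pending_head == NULL) pending_tail = NULL;
  } else {
    running_workers--; /* nothing pending: release the slot */
  }
  pthread_mutex_unlock(&lock);
  if (next) spawn_worker(next);
  return NULL;
}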

Currently I have no idea how to implement async work without relying on pthreads. (emscripten-core/emscripten#17213)

I'll take a look on Monday

Looking forward to your code review and testing. Have a nice weekend.

@RReverser
Contributor Author

Yeah, at least with -DEMNAPI_WORKER_POOL_SIZE=1 it seems to do what my workaround did and doesn't deadlock, thanks!

That said, I'm still concerned about "piggy-backing" on the PThread object for a couple of reasons:

  1. The PThread JS object (like built-in JS libraries in general) is considered an implementation detail of Emscripten's pthread support rather than a documented API intended for use by other developers. It doesn't change too often, but it's still a risk to rely on it.
  2. Overriding built-ins like .push on array objects is not only problematic for composition, but also inhibits JIT optimisations in modern engines. That might not be too noticeable in this case, considering that the cost of creating new threads is already high enough, but still.

I think a safer and more efficient option would be to have your own worker pool on the C/C++ side that uses the public pthread/std::thread API. Something like https://github.com/bshoshany/thread-pool might be a source of inspiration or even be used directly (this is a random search result that looks promising; there are other implementations of work-stealing thread pools).

At the very least, as an intermediate option to fix just issue (2) from the list above, you could override PThread.getNewWorker + PThread.returnWorkerToPool for now, so that you don't override array methods, only the higher-level API.

@toyobayashi
Owner

Great, thanks for your advice. I have ported libuv's thread pool implementation. It looks perfect!
