
Guarantee synchronous creation of Workers under limited conditions? #10228

Closed
juj opened this issue Mar 28, 2024 · 8 comments · Fixed by web-platform-tests/wpt#45502
Labels
addition/proposal (New features or enhancements) · needs implementer interest (Moving the issue forward requires implementers to express interest) · needs tests (Moving the issue forward requires someone to write tests) · topic: workers

Comments


juj commented Mar 28, 2024

What problem are you trying to solve?

Today, the web platform has a limitation that constructing a new Worker(url) does not actually make progress until the main JS thread yields back to the event loop. Historically, the rationale given was that the Worker constructor performs a fetch to a potentially remote network location.

We request a guarantee that whenever the URL passed to new Worker(url) refers to an in-memory Blob (i.e. data already synchronously available in RAM), the construction of the new Worker completes without the caller needing to yield back to the event loop.

Rationale

For users of multithreaded WebAssembly and SharedArrayBuffer (SAB), the delayed behavior of new Worker creates major challenges in correctly planning the resource utilization of a web page, across a wide variety of use cases that follow best practices for multithreading.

This challenge forces sites to balance between gratuitously over-allocating page resources and risking a deadlock in their programs.

It is our request that when the URL passed to a Worker represents an in-memory Blob (so no remote network request is needed), the new Worker() constructor should be guaranteed to complete without needing to yield back to the browser event loop.

That is, today, the following code will deadlock in a browser, but we wish that the standard would provide a guarantee that it should not do so.

a.html

<html><body><script>
fetch('a.js').then(response => response.blob()).then(blob => {
  let worker = new Worker(URL.createObjectURL(blob));
  let sab = new Uint8Array(new SharedArrayBuffer(16));
  worker.postMessage(sab);
  console.log('Waiting for Worker to finish');
  while(sab[0] != 1) /*wait to join with the result*/;
  console.log(`Worker finished. SAB: ${sab[0]}`);
});
</script></body></html>

a.js

onmessage = (e) => {
  console.log('Received SAB');
  e.data[0] = 1;
}

Today, browsers deadlock in the while() loop, because new Worker() will not proceed to actually create the Worker until the main-thread JS code yields back to the browser's event loop. We wish that instead the launch of the Worker would progress even while the main-thread JS code is busy executing the while() loop.

Note that the while() loop in the above test case is contrived, to illustrate the problem in minimal terms. In this example it may seem nonsensical to want to wait for new Worker() to finish, but in real-world use cases there is a strong motivation to do so, explained further below.

Why is this a problem (worth solving)?

There are several algorithms and problem spaces where a fork-join pattern is the most efficient and best-practice way of structuring program code.

This may seem counterintuitive to developers who have learned the "sync=bad, async=good" mantra, but in multithreaded programming the opposite can be true. To see why, let's briefly look at a couple of examples.

Example 1: multithreaded rendering

In realtime interactive rendering applications, a scene graph is traversed to update and collect items to render in the 3D view at each requestAnimationFrame() call.

If the simulation contents are complex, instead of performing the update-and-collect step on the single main thread, performance and responsiveness can be greatly improved by splitting the scene traversal across a pool of worker threads. This is a typical fork-join model of computation: the main thread forks off as many threads as there are logical cores to quickly iterate through the scene graph, parallelizing both throughput and latency extremely well. In the join stage, the main JS thread blocks until the worker thread(s) complete.

Ideally, the first time the program code needs to fork off computations, it would be able to synchronously create the work pool it needs. Later rAF() callbacks would then reuse this already-populated pool.
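The join stage described above can be sketched with shared memory and Atomics. This is a minimal illustration, not an existing API: the counter layout, the pool, and poolSize are assumptions for the sketch.

```javascript
// Sketch of the join stage of a fork-join render pass. workersDone() is a
// plain helper over an Int32Array view of a SharedArrayBuffer; the actual
// Worker dispatch is illustrative, browser-only pseudocode in the comments.
function workersDone(counter, expected) {
  // Atomics.load gives a tear-free read of the shared completion counter.
  return Atomics.load(counter, 0) >= expected;
}

// Main-thread join (browser sketch): each worker, when its slice of the
// scene is traversed, would do:
//   Atomics.add(counter, 0, 1); Atomics.notify(counter, 0);
// and the main thread would spin (or a worker could Atomics.wait):
//   while (!workersDone(counter, poolSize)) { /* join */ }
```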

Example 2: multithreaded garbage collection

A second example is multithreaded mark-and-sweep garbage collection. In https://github.com/juj/emgc you can find an example of a multithreaded GC, to be used for example when compiling a C#, Java or Python VM into WebAssembly.

In such a scenario, if the WebAssembly heap gets unlucky and runs out of memory in a malloc, the VM may need to trigger an on-demand GC to reclaim memory. To improve the overall performance of the GC and the responsiveness of the main thread, it is desirable to perform the GC marking phase with the help of multiple Workers, instead of on the main thread alone.

But to achieve such on-demand GC marking, the main thread needs to synchronously fork off the marking phase to background Workers, and then join when the Workers finish.
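Assuming the guarantee requested in this issue held, the on-demand pool could be a simple lazy getter. This is a sketch only; createMarkWorker, markerBlob and the pool size choice are hypothetical names, not an existing API.

```javascript
// Sketch of on-demand pool creation for the GC example: the pool is built
// the first time a GC actually triggers, not at page startup.
// makeLazyPool() is a plain memoization helper; the factory is hypothetical.
function makeLazyPool(createWorkerFn, size) {
  let pool = null;
  return function getPool() {
    // First call creates the workers; later GCs reuse the same pool.
    if (!pool) pool = Array.from({ length: size }, () => createWorkerFn());
    return pool;
  };
}

// Browser usage sketch (hypothetical names):
// const getMarkPool = makeLazyPool(
//   () => new Worker(URL.createObjectURL(markerBlob)),
//   navigator.hardwareConcurrency);
// function gcMark(roots) { const pool = getMarkPool(); /* fork, then join */ }
```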

Example 3: Parallel for

Some compiled languages support a parallel for construct, in which a for loop is sliced into segments that are each processed on a background thread. Parallel for loops are desirable for their programming simplicity, and their performance improvements can be tremendous.
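A rough sketch of how such a construct might slice iterations across workers (plain helpers only; the actual Worker dispatch and SAB result plumbing are omitted, and all names here are illustrative):

```javascript
// planParallelFor() divides the iteration space [begin, end) into contiguous
// slices, one per worker. runSlice() shows the loop semantics each worker
// would apply to its own slice.
function planParallelFor(begin, end, workerCount) {
  const chunk = Math.ceil((end - begin) / workerCount);
  const slices = [];
  for (let s = begin; s < end; s += chunk) {
    slices.push([s, Math.min(s + chunk, end)]);
  }
  return slices;
}

function runSlice(slice, body) {
  // Each worker would run this over its slice, writing results into a SAB.
  for (let i = slice[0]; i < slice[1]; i++) body(i);
}
```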


The above examples are valid, best-practice use cases of synchronously handing off computations to background Workers, resulting in greatly improved performance and responsiveness of the main JS thread (as opposed to having the main JS thread undertake such computations alone).

However, since these fork-join computations are synchronous in nature, today the developer must ensure ahead of time that the needed Workers have been "pre-cached" or "pre-pooled", because the Worker() constructor cannot complete synchronously.

This pre-pooling is exactly what WebAssembly/SharedArrayBuffer users have been doing so far: they warm up a pool of Workers in advance, so that when the time comes to perform a synchronous fork-join operation, all the Workers are guaranteed to be ready to synchronously receive commands.
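The bookkeeping behind that workaround can be sketched as a fixed pool that fork-join code must acquire from. This is illustrative only: the factory passed in is a hypothetical stand-in for Worker creation, and real pools also handle ready-handshakes.

```javascript
// Sketch of pre-pooling bookkeeping: a fixed set of Workers is created up
// front, and synchronous fork-join code draws from that set.
class FixedPool {
  constructor(createFn, size) {
    // All slots are created at startup (problem 1 above: startup cost).
    this.free = Array.from({ length: size }, () => createFn());
  }
  acquire() {
    // Problem 2 above in miniature: if the size estimate was too low,
    // there is nothing sensible to do here but fail (or deadlock waiting).
    if (this.free.length === 0) throw new Error('pool exhausted');
    return this.free.pop();
  }
  release(worker) {
    this.free.push(worker);
  }
}
```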

Now, with the experience of the years that have passed, this pre-pooling workaround is increasingly seen as considerably harmful, for several reasons.

Pre-pooling Workers leads to worse web sites in the wild

On simple web sites, the pre-pooling workaround is typically easy enough to implement; but as sites and WebAssembly programs scale, the technique does not, and it leads to several problems. Pre-pooling is considered harmful for at least the following reasons:

  1. Pre-pooling pessimises site startup times. Because all the synchronously-needed Workers typically must be pre-pooled before WebAssembly Module instantiation, the site must wait until every Worker in the pool has successfully launched.

  2. Pre-pooling risks web site deadlock. Developers must manually estimate the maximum number of pre-pooled threads they will ever need. If they underestimate, then when their program needs a new thread for synchronous computations, one cannot be created, and the site's computation halts.

  3. Pre-pooling wastes site resources. Because of the above, developers generally err on the side of caution, choosing pre-pooled thread counts that are overly generous, which leads to allocating Workers that the site may never practically use.

  4. Unused Workers are hard to free up. Since a multithreaded WebAssembly application must ensure it has all the Workers it may ever need, it can be very difficult to reason about when unused Workers could be reclaimed. This results in a monolithic, "grow-only" style of Worker pooling on these pages.

In summary, developers who produce multithreaded WebAssembly sites today must estimate how many simultaneous pthread_create()s their whole codebase will ever perform. As codebases grow, or when composing software from multiple authors, this may become impossible.

What would be gained by solving this?

If the example code in a.html and a.js above were guaranteed to work, then multithreaded WebAssembly applications would be able to spin up any needed Workers (or pools of Workers) on demand, rather than orchestrating their creation well in advance with oracle knowledge.

This would greatly ease the development of such applications and reduce the surface area for novel "web-only" bugs. It would remove this source of resource over-utilization on all multithreaded WebAssembly pages, and it would help sites shrink down Worker pools when likely not in use, without risking a page crash if that assumption did not 100% hold.

Sidenote: getting an exception if Worker cannot be created?

Today, new Worker(url) has a further undesirable behavior: if the site has already used up all the Workers it is allowed to spawn, the new Worker operation pauses, and only progresses once a previous Worker has been garbage-collected.

It would be desirable to have an API that throws a catchable exception if no Workers are currently available. That way, program code could at least make an alternative decision (perhaps performing the operation on the main thread instead). The current behavior risks sites hanging, waiting for a computation that may never arrive.
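The fallback pattern such an exception would enable might look like the following. This is a hypothetical helper, not an existing API; today new Worker() queues rather than throws, so the catch branch never fires in current browsers.

```javascript
// Sketch of graceful degradation if Worker creation could throw when the
// per-page Worker limit is hit. Both function arguments are caller-supplied.
function tryWorkerOrFallback(createWorkerFn, runOnMainThreadFn) {
  try {
    // Preferred path: offload the computation to a new Worker.
    return { worker: createWorkerFn(), fellBack: false };
  } catch (e) {
    // No Worker slot available: run on the main thread instead of hanging.
    return { result: runOnMainThreadFn(), fellBack: true };
  }
}
```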

juj added the addition/proposal and needs implementer interest labels Mar 28, 2024

annevk commented Mar 28, 2024

I don't think this works. Blobs cannot be read synchronously. I could maybe see this if we have a new API where you construct a dedicated worker from a string or a Uint8Array, but even then it seems rather fishy given that postMessage() is supposed to schedule (and might well be contending for priority with tasks the worker itself created).


juj commented Mar 28, 2024

The specific syntax of the API could be different. The general wish here is to have something that does not require allowing the unsafe-eval CSP policy.


juj commented Mar 29, 2024

To motivate this use case better, here is another example code snippet b.html that reflects code that does work today:

b.html

<html><body><script>
fetch('b.js').then(response => response.blob()).then(blob => {
  let worker = new Worker(URL.createObjectURL(blob));
  worker.postMessage('init');
  worker.onmessage = () => {
    let sab = new Uint8Array(new SharedArrayBuffer(16));
    worker.postMessage(sab);
    console.log('Waiting for Worker to finish');
    while(sab[0] != 1) /*wait to join with the result*/; 
    console.log(`Worker finished. SAB: ${sab[0]}`);
  };
});
</script></body></html>

b.js

onmessage = (e) => {
  if (e.data == 'init') {
    console.log('Worker received SAB');
    postMessage(0);
  } else {
    console.log('Received SAB');
    e.data[0] = 1;
  }
}

The above code example does not hang, but works as expected in all browsers.

The workaround scheme shown by b.html is what WebAssembly/SharedArrayBuffer users use today, since a.html does not work. This code pattern currently ships in all multithreaded Emscripten WebAssembly programs in the wild.

The difference between b.html and a.html is that in b.html, worker.postMessage() is able to progress even while the main thread is spin-waiting for the worker, whereas new Worker() in a.html is not.

But the trouble with the workaround in b.html is that one must preallocate the Worker up front, while still executing in an asynchronous context. By the time the synchronous part of the multithreaded program begins, the needed Workers must already be available.

Continuing example 2 (the multithreaded GC) from above: in that GC, I would like to speed up the marking step by using a pool of background Workers. But I would also like to spawn that marking pool only on demand, when necessary, instead of delaying the whole site's page startup until all the GC Workers have spun up (Workers that may never even be needed, depending on what the user does on the site!).

If the code example in a.html worked, I would be able to spawn the GC Worker pool only on the first occasion that I need to GC, enabling a kind of "only-pay-if-you-use-it" allocation of site resources.

So ideally, if new Worker(inMemoryBlob) were able to complete without yielding back to the main JS event loop, all of that wasteful preallocation of Workers could be avoided, and multithreaded WebAssembly sites would not need to start up by creating an avalanche of Workers that they might only potentially ever use. That would be a big help to WebAssembly site startup performance overall.


Kaiido commented Mar 30, 2024

I think there are some confusing parts here.
Actually, you are not asking for synchronous creation of the Worker (which would be impossible, as Anne said), but rather that it be created in a way that doesn't block the owner's thread.

Currently, the spec for new Worker() does ask that "run a worker" is done in parallel. So, per the current spec, what you are doing is already supposed to work as you expect, no matter where the Worker script comes from. So I'd rather label this [interop] than [addition/proposal].


juj commented Mar 30, 2024

Actually, you are not asking for synchronous creation of the Worker (which would be impossible, as Anne said), but rather that it be created in a way that doesn't block the owner's thread.

Thanks for helping to clarify. Yeah, that wording is completely accurate.


annevk commented Apr 2, 2024

Hmm yeah, on reflection both examples should probably already work.

So I think next steps here are:

  1. Adding web-platform-tests coverage.
  2. Filing implementation bugs.

annevk added the needs tests label Apr 2, 2024
juj added a commit to juj/wpt that referenced this issue Apr 3, 2024
juj added a commit to juj/wpt that referenced this issue Apr 3, 2024

juj commented Apr 3, 2024

Had a stab at adding tests in web-platform-tests/wpt#45502. LMK how that looks.


juj commented Apr 9, 2024

Implementation bugs at

annevk pushed a commit to web-platform-tests/wpt that referenced this issue Apr 10, 2024
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Apr 19, 2024
…stMessage() to happen in parallel, a=testonly

Automatic update from web-platform-tests
Add tests for new Worker() and worker.postMessage() to happen in parallel

Resolves whatwg/html#10228.
--

wpt-commits: 2060611f666a08629a55d5d594a0188c49c9ef5e
wpt-pr: 45502
ErichDonGubler pushed a commit to erichdongubler-mozilla/firefox that referenced this issue Apr 22, 2024