
[DISCUSS] Synchronization or Lightweight Sync to Async Abstraction #47

Closed
tqchen opened this issue May 15, 2020 · 17 comments
Labels
async Asynchronous operations and callbacks

Comments

@tqchen

tqchen commented May 15, 2020

One important property of WebGPU is that it is asynchronous, which means synchronization can only be done through callbacks.

Such an API design forces async semantics on the application, which makes complete sense in a host execution environment such as JavaScript. For example, in our recent work to support machine learning, we simply wrap the async interface as a JS future and use JavaScript's await for synchronization.

For a native-only wasm application, though, the async nature of the API puts quite a lot of burden on the application itself. For example, it is extremely painful to use the webgpu-native C API directly to write a C/C++ application, because there is no built-in async/await mechanism in the language.

I want to use this thread to gather thoughts along this direction, as there are quite a few design choices. These choices relate to both the WASM execution environment and the header definition. I will list them below:

  • C0: Keep only the async C API, and rely on async/await support in the languages that compile to wasm
    • Explanation: it is certainly easier to target the current C API from Rust, because the language has native async/await support.
  • C1: Introduce a sync API, with sync support on native
    • Most of the downstream APIs (Metal, Vulkan) do have synchronization primitives, and we could simply expose them as an API.
  • C2: Introduce a sync API, and think about asynchronization
    • Same as C1, but we acknowledge that async is the nature of WebGPU. Because the synchronization (blocking) happens at the WASM-to-system boundary, there are certainly techniques (with limitations) to turn the synchronizing call into an async version. However, such a feature depends on either the compiler or the WASM VM (runtime). As a simple example, if we place a restriction that the async system call can only resume at the call site, then we could simply "freeze" the state of the wasm VM, do other jobs, and then re-enter without any backup of the stack (because the stack is already in the linear memory of the wasm). This removes the overhead of a pause/resume, but requires support from the WASM runtime.

My current take is that C2 is the most ideal one, as it enables applications to be written as if native while still deploying to most platforms. However, there are certain gaps in runtimes (related to standardization) to make that happen.

@Kangz
Collaborator

Kangz commented May 15, 2020

This is somewhat related to #18.

In native there is the possibility to wait synchronously with ProcessEvents (even though neither Dawn nor wgpu implements it for now). However, on the Web, promises are only allowed to become resolved when control is yielded back to the browser, so code that uses ProcessEvents won't work on the Web.

C2 is interesting, but it is more of a toolchain issue than a webgpu-native issue. For example WASM has an "asyncify" mechanism that allows yielding to the browser in the middle of C++ code, but it comes with some drawbacks.

Something we could do is to provide helper functions built against webgpu.h that make things like device creation appear synchronous in native. However they wouldn't work when compiled to WebAssembly. I'm not sure what more can be done given the constraints of the Web.

@tqchen
Author

tqchen commented May 16, 2020

Thanks @Kangz. C2 is indeed tied to the toolchains and whether things can be standardized a bit. I opened a thread in the WASI community to see if we can make some progress there: WebAssembly/WASI#276

@lachlansneff

I just want to point out that C++ does now (or will soon, I'm not sure which version it's in) have "async/await" support, in much the same way that Rust does.

@grovesNL
Member

For example, it is extremely painful to directly use the webgpu-native C API to write a C/C++ application, because there is no built in async/await mechanism in the language.

I don't think language-level support for async/await is necessary (even though it may be slightly more convenient). The Rust bindings in wgpu-rs initially used callbacks but switched to futures to make it easier to integrate with the rest of the Rust ecosystem. For example, C applications could use the existing callbacks with state machines to track the status of the application (similar to how async/await is implemented in a lot of languages).

I think C1 and C2 are interesting but it seems difficult to hide async APIs behind sync APIs without:

  • decreasing portability with the web in some way
    • e.g. busy-waiting for a callback to fire on native isn't going to translate well to the web, and exposing additional synchronization primitives would move webgpu-headers further away from the WebGPU spec
  • decreasing flexibility
    • e.g. processing multiple callbacks at once should be possible, like Promise.all in JS

I'd slightly prefer C0 and trying to find ways to make this easier for developers (like #18) in a way that works both natively and on the web.

@tqchen
Author

tqchen commented May 16, 2020

Agreed with all of your points (async gives you the most flexibility and portability).

The main goal of C2 is to explore whether we could build something on top of C0 (or in addition to it) that has limited flexibility (traded for simplicity, where such flexibility is less needed) but retains portability (runs on the web without busy-waiting). Keep in mind that async programming is hard without async/await support, and there are applications that can still be both simple and performant with a certain amount of synchronization semantics.

@lachlansneff

Honestly, the few things in WebGPU that are async are nearly useless in a synchronous context. It doesn't make sense to block on an async map, because you'll potentially be blocking across multiple frames.

@tqchen
Author

tqchen commented May 16, 2020

To be clear, I am not advocating for synchronization throughout the entire application. Async APIs are useful and should be used to overlap computations and allow multiple concurrent executions when possible.

On the other hand, there will always be application boundaries where having some form of synchronization helps to simplify the programming model, especially when the host application itself has a synchronous programming model.

The following code is a fairly typical example of an ML prediction pipeline:

model = CreateModel()
# async copy of the input data to the GPU
inputdata = data.copytogpu()
# async prediction
output = model.predict(inputdata)
# wait until the computation finishes
gpu(0).sync()
print(output.asnumpy())

In the above example, we can do all kinds of async callback tricks inside predict() (and usually people do). But from the application user's POV, gpu(0).sync() is the simplest (and the most common) way to get the async work back to the sync world.

Having such a primitive is important to ship a WebGPU wasm VM that binds to languages like Python and C without forcing the user into async land. For the cases where the outside host is async in nature, the async version of the API certainly could and should be used.

@grovesNL
Member

I think it's already possible to implement sync() natively with busy-waiting (calling ProcessEvents in a loop), but this wouldn't work well on the web. If a VM wants to pipeline async operations internally to act as if the API is synchronous then I guess it's possible for the VM to perform this busy-waiting.

In general Python applications might also benefit from exposing the callbacks as Python futures (e.g. asyncio, concurrent), in which case the application/library could decide whether to run callbacks to completion (e.g. by busy-waiting if the executor allows that).

@tqchen
Author

tqchen commented May 17, 2020

We agree that we don't want busy-waiting.

This is the whole point of C2. Because wasm natively runs on a VM, it could be possible to block the wasm (from the wasm's POV) and turn that into an async op, so the wasm VM itself stays async and won't block.

@lachlansneff

@tqchen The kind of async that you're suggesting is just blocking, in the same way that a blocking system call doesn't stop the kernel from doing other stuff while the operation completes.

@tqchen
Author

tqchen commented May 17, 2020

@lachlansneff great insight with the analogy of the kernel and kernel threads :)

It is blocking from the wasm's POV, but not busy-waiting. And this kind of blocking is implementable on the web, as long as the VM itself runs in an async environment; from the web's POV, everything proceeds in an async fashion (resources are spent on other async tasks that are not part of the particular wasm instance).

The advantage of C2 is that from the wasm's own POV, the programmer does not need to worry about async programming if the goal is just to write simple programs that do not need concurrency within the wasm.

The async API should still be the primary one exposed to the wasm VM. Because the wasm VM (the "kernel") is async, it can effectively make use of the async version of the API.

@Kangz
Collaborator

Kangz commented May 17, 2020

When designing an API for the Web, blocking behavior is disallowed. WebGL has gl.finish(), but it is very much the exception and will never happen again under the current set of guidelines for designing Web APIs. This means that WebGPU (the JS API) will not have blocking behavior, and it also is the reason why WebAssembly currently doesn't support blocking calls.

We should not add blocking calls to webgpu.h, because it won't be possible to implement the same behavior when running in WebAssembly on the Web. It can be a shim over ProcessEvents where the runtime is allowed to block.

@tqchen
Author

tqchen commented May 17, 2020

We all agree to the fact that:

  • The wasm VM should not block (so that it can be implemented on the Web).
  • The WebGPU JS API should not contain any blocking calls.

The main goal of C2 is a bit subtle: we want to explore whether it is still possible to allow "blocking-style" calls from the wasm program's POV by translating them into an async pause of the wasm VM instead of busy-waiting. Specifically:

  • From the wasm program's POV (running inside the wasm VM), the program "blocks".
  • From the JS engine's and wasm VM's POV, the blocking call in the wasm becomes an async call into the WebGPU JS API, and the callback resumes the execution of the wasm VM.

My main goal is not to push for such an API, but rather to use this discussion thread to make us aware of the option and to think about the potential benefits it could bring (e.g. the simplicity of the program that compiles to wasm).

The code example below demonstrates the idea (it would need enhancements to the wasm execution environment):

// A blocking call from the wasm's POV
extern void wait_for_event();

void test() {
   wait_for_event();
}
// Example JavaScript code that invokes test.
async function test() {
    const imports = {
        env: {
          // Resolving the promise returned by wait_for_event resumes the wasm instance.
          wait_for_event: async () => { await some_event(); }
        }
    };
    const { instance } = await WebAssembly.instantiate(wasmSource, imports);
    await instance.exports.test();
}

@Kangz
Collaborator

Kangz commented May 17, 2020

I must be missing something, if this isn't about changing the shape of webgpu.h shouldn't this thread be on some WASM issue board instead?

Also, I think we'll eventually want to expose some kind of platform-specific event / fd / ... object that gets signaled when ProcessEvents might run a callback. It's loosely related to this, because if you can block on that event somehow in WASI, then you could implement the blocking call you're suggesting with proper yielding instead of spin-looping.

@tqchen
Author

tqchen commented May 17, 2020

Well, one could argue that, given that synchronization can be implemented via proper yielding, it could be part of a webgpu addon API that exposes this function.

One would hope that the webgpu-native API can provide a unified API so that stand-alone applications can be built once and run on the web and in native wasm environments without changing any code.

Now, let us think about a scenario where a wasm app wants to target WebGPU. Without the synchronize API, the wasm app itself (written in C/C++) would have to deal with the callbacks properly, which is extremely hard if it is built in a standalone manner (without async/await support). Of course, we can build async-to-sync on top of certain runtimes (e.g. in the case of JS). In that case, however, we would simply use the WebGPU JS API directly instead of webgpu-native, and the solution would be platform-specific (it won't run on standalone wasm VMs).

If we have a platform-agnostic "synchronize" as an addon API that can somehow be implemented on all platforms (by proper yielding on the web, or by redirecting to blocking downstream APIs in Vulkan/Metal on native), then a standalone program that targets webgpu-native can be written once and run both on the web and in wasm-VM environments such as wasmtime.

@tqchen
Author

tqchen commented May 18, 2020

Related proposal on the wasm side: WebAssembly/design#1345

@kainino0x added the async (Asynchronous operations and callbacks) label May 19, 2023
@kainino0x
Collaborator

#199 will track changes to the C header for better async, so closing this issue to centralize tracking there.
Anything that's not part of the C API design is out of scope for this repo.

@kainino0x closed this as not planned Aug 3, 2023