
[DISCUSS] Synchronization or Lightweight Sync to Async Abstraction #47

Closed
tqchen opened this issue May 15, 2020 · 17 comments
Labels
async Asynchronous operations and callbacks

Comments

@tqchen

tqchen commented May 15, 2020

One important property of WebGPU is that it is asynchronous, which means synchronization can only be done through callbacks.

Such an API design forces async semantics on the application, which makes complete sense in a host execution environment such as JavaScript. For example, in our recent work to support machine learning, we simply wrap the async interface as a JS future and use JavaScript's await for synchronization.

For a native-only wasm application, though, the async nature of the API puts quite a lot of burden on the application itself. For example, it is extremely painful to use the webgpu-native C API directly to write a C/C++ application, because there is no built-in async/await mechanism in the language.

I want to use this thread to gather thoughts along this direction, as there are quite a few design choices. These choices relate to both the WASM execution environment and the header definition. I will list them below:

  • C0: Keep only the async C API, and rely on async/await support in the languages that compile to wasm
    • Explanation: it is certainly easier to target the current C API from Rust, because the language has native async/await support.
  • C1: Introduce a sync API, with sync support on native
    • Most of the downstream APIs (Metal, Vulkan) do have synchronization primitives, and we could simply expose them as an API.
  • C2: Introduce a sync API, and think about asynchronization
    • Same as C1, but we acknowledge that async is the nature of WebGPU. Because the synchronization (blocking) happens at the WASM-to-system boundary, there are certainly techniques (with limitations) to turn the synchronizing call into an async version. However, such a feature depends on either the compiler or the WASM VM (runtime). As a simple example, if we place a restriction that the async system call can only resume at the call site, then we could simply "freeze" the state of the wasm VM, do other jobs, and then re-enter without any backup of the stack (because the stack is already in the linear memory of the wasm). This removes the overhead of a pause/resume, but requires support from the WASM runtime.

My current take is that C2 is the most ideal one, as it enables applications to be written as if native while still deploying to most platforms. However, there are certain gaps in runtimes (related to standardization) to make that happen.

@Kangz
Collaborator

Kangz commented May 15, 2020

This is somewhat related to #18.

In native there is the possibility to wait synchronously with ProcessEvents (even though neither Dawn nor wgpu implements it for now). However, on the Web, promises are only allowed to become resolved when control is yielded back to the browser, so code that uses ProcessEvents won't work on the Web.

C2 is interesting, but it is more of a toolchain issue than a webgpu-native issue. For example WASM has an "asyncify" mechanism that allows yielding to the browser in the middle of C++ code, but it comes with some drawbacks.

Something we could do is to provide helper functions built against webgpu.h that make things like device creation appear synchronous in native. However they wouldn't work when compiled to WebAssembly. I'm not sure what more can be done given the constraints of the Web.

@tqchen
Author

tqchen commented May 16, 2020

Thanks @Kangz. C2 is indeed tied to the toolchains and whether things can be standardized a bit. I opened a thread in the WASI community to see if we can make some progress there: WebAssembly/WASI#276

@lachlansneff

I just want to point out that C++ does now (or will soon, I'm not sure which version it's in) have "async/await" support, in much the same way that Rust does.

@grovesNL
Member

For example, it is extremely painful to directly use the webgpu-native C API to write a C/C++ application, because there is no built in async/await mechanism in the language.

I don't think language-level support for async/await is necessary (even though it may be slightly more convenient). The Rust bindings in wgpu-rs initially used callbacks but switched to futures to make it easier to integrate with the rest of the Rust ecosystem. For example, C applications could use the existing callbacks with state machines to track the status of the application (similar to how async/await is implemented in a lot of languages).

I think C1 and C2 are interesting but it seems difficult to hide async APIs behind sync APIs without:

  • decreasing portability with the web in some way
    • e.g. busy-waiting for a callback to fire on native isn't going to translate well to the web, and exposing additional synchronization primitives would move webgpu-headers further away from the WebGPU spec
  • decreasing flexibility
    • e.g. processing multiple callbacks at once should be possible, like Promise.all in JS

I'd slightly prefer C0 and trying to find ways to make this easier for developers (like #18) in a way that works both natively and on the web.

@tqchen
Author

tqchen commented May 16, 2020

Agreed with all of your points (async gives you the most flexibility and portability).

The main goal of C2 is to explore whether we could build something on top of C0 (or in addition to it) that has limited flexibility (traded for simplicity, where such flexibility is less needed) but retains portability (runs on the web without busy-waiting). Keep in mind that async programming is hard without async/await support, and there are applications that can still be both simple and performant with a certain amount of synchronization semantics.

@lachlansneff

Honestly, the few things in WebGPU that are async are nearly useless in a synchronous context. It doesn't make sense to block on an async map, because you'll potentially be blocking across multiple frames.

@tqchen
Author

tqchen commented May 16, 2020

To be clear, I am not advocating for synchronization throughout the entire application. Async APIs are useful and should be used to overlap computations and allow multiple concurrent executions when possible.

On the other hand, there will always be application boundaries where having some form of synchronization helps to simplify the programming model, especially when the host application itself has a synchronous programming model.

The following code is a fairly typical example of an ML prediction pipeline:

model = CreateModel()
# async copy of the input data to the GPU
inputdata = data.copytogpu()
# async prediction
output = model.predict(inputdata)
# wait until the computation finishes
gpu(0).sync()
print(output.asnumpy())

In the above example, we can do all kinds of async callback tricks inside predict() (and usually people do). But from the application user's POV, gpu(0).sync() is the simplest (and the most common) way to get the async work back to the sync world.

Having such a primitive is important to ship a WebGPU wasm VM that binds to languages like Python and C without forcing the user into async land. For the cases where the outside host is async in nature, the async version of the API certainly could and should be used.

@grovesNL
Member

I think it's already possible to implement sync() natively with busy-waiting (calling ProcessEvents in a loop), but this wouldn't work well on the web. If a VM wants to pipeline async operations internally to act as if the API is synchronous then I guess it's possible for the VM to perform this busy-waiting.

In general Python applications might also benefit from exposing the callbacks as Python futures (e.g. asyncio, concurrent), in which case the application/library could decide whether to run callbacks to completion (e.g. by busy-waiting if the executor allows that).

@tqchen
Author

tqchen commented May 17, 2020

We agree that we don't want busy-waiting.

This is the whole point of C2. Because wasm natively runs on a VM, it could be possible to block the wasm (from the wasm's POV) and turn that into an async op, so the wasm VM itself stays async and won't block.

@lachlansneff

@tqchen The kind of async that you're suggesting is just blocking, in the same way that a blocking system call doesn't stop the kernel from doing other stuff while the operation completes.

@tqchen
Author

tqchen commented May 17, 2020

@lachlansneff great insight with the analogy of the kernel and kernel threads :)

It is blocking from the wasm's POV, but not busy-waiting. And this kind of blocking is implementable on the web, as long as the VM itself runs in an async environment; from the web's POV, everything proceeds in an async fashion (resources are spent on other async tasks that are not part of the particular wasm instance).

The advantage of C2 is that from the wasm's own POV, the programmer does not need to worry about async programming if the goal is just to write simple programs that do not need concurrency within the wasm.

The async API should still be the primary one exposed to the wasm VM. Because the wasm VM (the "kernel") is async, it can effectively make use of the async version of the API.

@Kangz
Collaborator

Kangz commented May 17, 2020

When designing an API for the Web, blocking behavior is disallowed. WebGL has gl.finish(), but it is very much the exception and will never happen again under the current set of guidelines for designing Web APIs. This means that WebGPU (the JS API) will not have blocking behavior, and it also is the reason why WebAssembly currently doesn't support blocking calls.

We should not add blocking calls to webgpu.h, because it won't be possible to implement the same behavior when running in WebAssembly on the Web. It can be a shim over ProcessEvents where the runtime is allowed to block.

@tqchen
Author

tqchen commented May 17, 2020

We all agree to the fact that:

  • The wasm VM should not block (so that it can be implemented on the Web).
  • The WebGPU JS API should not contain any blocking calls.

The main goal of C2 is a bit subtle: we want to explore whether it is still possible to allow "blocking-style" calls from the wasm program's POV by translating them into an async pause of the wasm VM instead of busy-waiting. Specifically:

  • From the wasm program's POV (running inside the wasm VM), the program "blocks".
  • From the JS engine's and wasm VM's POV, the blocking call in the wasm becomes an async call into the WebGPU JS API, and the callback resumes the execution of the wasm VM.

My main goal is not to push for such an API, but rather to use this discussion thread to make us aware of the option and to think about the potential benefits it could bring (e.g. the simplicity of the program that compiles to wasm).

The code example below demonstrates the idea (it would need enhancements to the wasm execution environment):

// A blocking call from the wasm's POV
extern void wait_for_event();

void test() {
   wait_for_event();
}
// Example JavaScript code that invokes test.
async function test() {
    const imports = {
        env: {
          // Resolving the promise returned by wait_for_event resumes the wasm instance.
          wait_for_event: async () => { await some_event(); }
        }
    };
    const { instance } = await WebAssembly.instantiate(wasmSource, imports);
    await instance.exports.test();
}

@Kangz
Collaborator

Kangz commented May 17, 2020

I must be missing something, if this isn't about changing the shape of webgpu.h shouldn't this thread be on some WASM issue board instead?

Also, I think we'll eventually want to expose some kind of platform-specific event / fd / ... object that gets signaled when ProcessEvents might run a callback. It's loosely related to this, because if you can block on that event somehow in WASI, then you could implement the blocking call you're suggesting with proper yielding instead of spin-looping.

@tqchen
Author

tqchen commented May 17, 2020

Well, one could argue that, given that synchronization can be implemented via proper yielding, it could be part of a webgpu addon API that exposes this function.

One would hope that the webgpu-native API can provide a unified API so that stand-alone applications can be built once and run on the web and in native wasm environments without changing any code.

Now, let us think about a scenario where a wasm app wants to target WebGPU. Without the synchronize API, the wasm app itself (written in C/C++) would have to deal with the callbacks properly, which is extremely hard if it is built in a standalone manner (without async/await support). Of course, we can build async-to-sync on top of certain runtimes (e.g. in the case of JS). In that case, however, we would simply use the WebGPU JS API directly instead of webgpu-native, and the solution would be platform-specific (it won't run on standalone wasm VMs).

If we have a platform-agnostic "synchronize" as an addon API that can somehow be implemented on all platforms (by proper yielding on the web, or by redirecting to blocking downstream APIs in Vulkan/Metal on native), then a standalone program that targets webgpu-native can be written once and run both on the web and in wasm-VM environments such as wasmtime.

@tqchen
Author

tqchen commented May 18, 2020

Related proposal on the wasm side: WebAssembly/design#1345

@kainino0x added the async (Asynchronous operations and callbacks) label May 19, 2023
@kainino0x
Collaborator

#199 will track changes to the C header for better async, so closing this issue to centralize tracking there.
Anything that's not part of the C API design is out of scope for this repo.

@kainino0x closed this as not planned Aug 3, 2023