[DISCUSS] Synchronization or Lightweight Sync to Async Abstraction #47
This is somewhat related to #18. In native there is the possibility to wait synchronously. C2 is interesting, but it is more of a toolchain issue than a webgpu-native issue. For example WASM has an "asyncify" mechanism that allows yielding to the browser in the middle of C++ code, but it comes with some drawbacks. Something we could do is to provide helper functions built against …
Thanks @Kangz. C2 is indeed tied to the toolchains and whether things can standardize a bit. I opened this thread in the WASI community to see if we can make some progress there: WebAssembly/WASI#276
I just want to point out that C++ does now (or will soon, I'm not sure what version it's in) have "async/await" support, in much the same way that Rust does.
I don't think language-level support for async/await is necessary (even though it may be slightly more convenient). The Rust bindings in wgpu-rs initially used callbacks but switched to futures to make it easier to integrate with the rest of the Rust ecosystem. For example, C applications could use the existing callbacks with state machines to track the status of the application (similar to how async/await is implemented in a lot of languages). I think C1 and C2 are interesting, but it seems difficult to hide async APIs behind sync APIs without: …
I'd slightly prefer C0 and trying to find ways to make this easier for developers (like #18) in a way that works both natively and on the web.
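The callback-plus-state-machine pattern mentioned above can be sketched as follows. This is illustrative Python with invented names (`App`, `on_map_done`, `try_consume`); webgpu.h's real callback signatures differ. A C application would hold the same state in a struct and thread it through the callback's userdata pointer.

```python
# Illustrative sketch (invented names): a state machine that tracks an
# async buffer map via a plain callback, no async/await required.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    MAPPING = auto()   # map requested, callback not yet fired
    READY = auto()     # callback fired, data available

class App:
    def __init__(self):
        self.state = State.IDLE
        self.mapped_data = None

    def request_map(self):
        # In real code this would call an async map entry point,
        # registering on_map_done as the completion callback.
        self.state = State.MAPPING

    def on_map_done(self, data):
        # Invoked by the implementation once the map completes.
        self.mapped_data = data
        self.state = State.READY

    def try_consume(self):
        # Called once per frame / event-loop turn; advances the state
        # machine only when the callback has already fired.
        if self.state is not State.READY:
            return None          # not ready yet; go do other work
        data, self.state = self.mapped_data, State.IDLE
        return data

app = App()
app.request_map()
assert app.try_consume() is None      # still waiting on the callback
app.on_map_done(b"\x01\x02\x03")      # implementation fires the callback
assert app.try_consume() == b"\x01\x02\x03"
assert app.state is State.IDLE
```

The application stays single-threaded and callback-driven; the explicit `State` enum plays the role that compiler-generated suspension points play in languages with native async/await.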
Agree with all of your points (async gives you the most flexibility and portability). The main goal of C2 is to explore whether we could build something on top of C0 (or in addition to it) that has limited flexibility (trading flexibility for simplicity where flexibility is less needed) but still has portability (runs on the web without busy waiting). I'm thinking about the fact that async programming is hard without async/await support, and that there are applications that can still be both simple and performant with a certain amount of synchronization semantics.
Honestly, the few things in webgpu that are async are nearly useless in a synchronous context. It doesn't make sense to block on an async map, because you'll potentially be blocking through multiple frames.
To be clear, I am not advocating for synchronization throughout the entire application. Async APIs are useful and should be used to overlap computations and allow multiple concurrent executions when possible. On the other hand, there will always be application boundaries where having some form of synchronization helps to simplify the programming model, especially when the host application itself has a synchronous programming model. The following code is a quite typical example of an ML prediction pipeline:

model = CreateModel()
# async
inputdata = data.copytogpu()
# async
output = model.predict(inputdata)
# wait until computation finishes
gpu(0).sync()
print(output.asnumpy())

In the above example, we can do all kinds of async callback tricks (and usually people are doing that), but having such a primitive is important to ship a WebGPU wasm VM that binds to languages like Python and C without forcing the user into the async land. For the cases where the outside host is async in nature, then certainly the async version of the API could and should be used.
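One way to read the `gpu(0).sync()` call above is as a blocking-style helper that simply drives the implementation's event loop until all pending work has completed, rather than spinning on an unchanging flag. The sketch below simulates that with an invented `FakeGPU` and `process_events()`; a real native backend would instead block in the driver (e.g. a fence wait), and on the web the yield would have to go through the VM as discussed later in this thread.

```python
# Illustrative sketch (invented names): sync() implemented as
# "process events until all pending async work has completed".
class FakeGPU:
    def __init__(self):
        self.jobs = []              # ticks remaining per pending async op

    def submit_async(self, ticks):
        # e.g. an async copy or prediction; completes after `ticks` turns.
        self.jobs.append(ticks)

    def process_events(self):
        # One turn of the event loop: advance every pending job,
        # dropping the ones that have completed.
        self.jobs = [t - 1 for t in self.jobs if t - 1 > 0]

    def sync(self):
        # Blocking from the caller's point of view, but implemented by
        # repeatedly yielding into the event loop -- not a busy spin.
        while self.jobs:
            self.process_events()

gpu = FakeGPU()
gpu.submit_async(3)   # "copy input to GPU"
gpu.submit_async(5)   # "run prediction"
gpu.sync()            # wait until computation finishes
assert gpu.jobs == [] # everything completed
```

The caller gets the simple sequential pipeline shown above, while all actual progress still happens through the async machinery inside `sync()`.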
I think it's already possible to implement this. In general, Python applications might also benefit from exposing the callbacks as Python futures (e.g. …).
We agree that we don't want busy waiting. This is the whole point of C2. Because wasm natively runs on a VM, it could be possible to block the wasm (from the wasm's POV) and turn that into an async op, so the wasm VM itself can stay async and won't block.
@tqchen The kind of async that you're suggesting is just blocking, in the same way that a blocking system call doesn't stop the kernel from doing other stuff while the operation completes.
@lachlansneff great insight about the analogy of the kernel and kernel threads :) It is blocking from the wasm's POV, but not busy waiting. Such blocking is implementable on the web, as long as the VM itself runs in an async environment; from the web's POV, everything goes on in an async fashion (resources are spent on other async tasks that are not part of the particular wasm instance). The advantage of C2 is that from the wasm's own POV, the programmer does not need to worry about async programming if the goal is just to write simple programs that do not need concurrency within wasm. The async API should still be the primary one that is exposed to the wasm VM. Because the wasm VM (the "kernel") is async, it can effectively make use of the async version of the API.
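The kernel analogy can be made concrete with a toy round-robin scheduler (all names here are invented for illustration): one task blocks on an event while the "VM" keeps scheduling other tasks, so the block is not a busy wait from the host's point of view.

```python
# Illustrative sketch: a round-robin "VM" that keeps running other tasks
# while one task is blocked on an event, like a blocked syscall in a kernel.
class Task:
    def __init__(self, waits_for_event):
        self.waits_for_event = waits_for_event
        self.progress = 0

def run_vm(tasks, turns, event_turn):
    event_fired = False
    for turn in range(turns):
        if turn == event_turn:
            event_fired = True          # the awaited event arrives
        for task in tasks:
            if task.waits_for_event and not event_fired:
                continue                # blocked: the VM just skips it
            task.progress += 1          # runnable tasks keep making progress

blocked, other = Task(True), Task(False)
run_vm([blocked, other], turns=10, event_turn=6)
assert other.progress == 10       # other work was never held up
assert 0 < blocked.progress < 10  # resumed after the event, idle before it
```

From inside the blocked task the call looks synchronous; from the VM's side it is just one more async event to dispatch.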
When designing an API for the Web, blocking behavior is disallowed. WebGL has gl.finish(), but it is very much the exception and will never happen again under the current set of guidelines for designing Web APIs. This means that WebGPU (the JS API) will not have blocking behavior, and it is also the reason why WebAssembly currently doesn't support blocking calls. We should not add blocking calls to …
We all agree on the following facts: …
The main goal of C2 is a bit subtle: we want to explore whether it is possible to still allow "blocking-style" calls from the wasm program's POV by translating them into an async pause of the wasm VM instead of busy waiting. Specifically: …
My main goal is not to push for such an API, but more or less to use the discussion thread to make us aware of such an option, and to think about the potential benefit it could bring (e.g. the simplicity of the program that compiles to wasm). The code example below demonstrates the idea (it would need enhancements to the wasm execution environment):

// A blocking call from the wasm's POV
extern void wait_for_event();

void test() {
  wait_for_event();
}

// Example JavaScript code that invokes the test.
async function test() {
  const imports = {
    env: {
      // The callback of wait_for_event will resume the execution of the wasm instance
      wait_for_event: async () => { await some_event(); }
    }
  };
  const x = await WebAssembly.instantiate(wasmSource, imports);
  await x.instance.exports.test();
}
I must be missing something if this isn't about changing the shape of … Also, I think we'll eventually want to expose some kind of platform-specific event / fd / ... object that gets signaled when ProcessEvents might run a callback. It's loosely related to this, because if you can block on that event somehow in WASI, then you could implement the blocking call you're suggesting with proper yielding instead of spin-looping.
Well, one could argue that given … One would hope that the webgpu-native API can provide a unified API so that stand-alone applications can be built once and run on the web and in native wasm environments without changing any code.

Now, let us think about a scenario where a wasm app wants to target WebGPU. Without the synchronize API, the wasm app itself (written in C/C++) would have to deal with the callbacks properly, which is extremely hard if it is built in a standalone manner (without async/await support). Of course, we can build async-to-sync on top of a certain runtime (e.g. in the case of JS). In that case, however, we would simply use the WebGPU JS API directly instead of webgpu-native, and the solution would be platform specific (it won't run on standalone wasm VMs).

If we have a platform-agnostic "synchronize" as an add-on API that can somehow be implemented on all platforms (by proper yielding on the web, or by redirecting to blocking downstream APIs in Vulkan/Metal natively), then a standalone program that targets webgpu-native can be written once and run on both the web and wasm-VM-like environments such as wasmtime.
Related proposal in the wasm community: WebAssembly/design#1345
#199 will track changes to the C header for better async, so closing this issue to centralize tracking there. |
One important aspect of WebGPU is that it is asynchronous, which means synchronization can only be done through callbacks.
Such an API design forces async semantics on the application, which totally makes sense in a host execution environment such as JavaScript. For example, in our recent work to support machine learning, we simply wrap the async interface as a JS future and use JavaScript's await for synchronization.
In a native-only wasm application though, the async nature of the API puts quite a lot of burden on the application itself. For example, it is extremely painful to directly use the webgpu-native C API to write a C/C++ application, because there is no built-in async/await mechanism in the language.
I want to use this thread to gather thoughts along this direction, as there are quite a few design choices. These design choices relate both to the WASM execution environment and to the header definition. I will list them below: …
My current take is that C2 is the most ideal one, as it enables applications to be written as if native but still deploy to most platforms; however, there are certain gaps in the runtime (related to standardization) to make that happen.