Add create_*_pipeline_async() #3794

Open
kpreid opened this issue May 22, 2023 · 17 comments
Labels: area: api, help required, type: enhancement

Comments

kpreid (Contributor) commented May 22, 2023

The WebGPU specification includes

  • GPUDevice.createComputePipelineAsync()
  • GPUDevice.createRenderPipelineAsync()

and their use is recommended to avoid blocking upon pipeline creation. There are currently no corresponding functions in wgpu; presumably there should be.
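For concreteness, the wgpu counterparts might look roughly like this (hypothetical signatures mirroring the existing create_*_pipeline methods, not an actual wgpu API):

impl Device {
    // Hypothetical async counterparts to the existing blocking methods.
    // The return types could also be Result<_, Error> to surface validation
    // errors, as createRenderPipelineAsync does via promise rejection.
    async fn create_render_pipeline_async(&self, desc: &RenderPipelineDescriptor<'_>) -> RenderPipeline;
    async fn create_compute_pipeline_async(&self, desc: &ComputePipelineDescriptor<'_>) -> ComputePipeline;
}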

teoxoy added the type: enhancement, help required, and area: api labels on May 24, 2023
JMS55 (Contributor) commented May 27, 2023

How would we go about implementing this on the vk/dx12/metal backends? A dedicated OS thread for each pipeline?

PJB3005 (Contributor) commented May 28, 2023

The best solution would probably require wgpu to use a thread pool, since spawning an OS thread for each individual job like that could have nontrivial overhead.

If so I'd definitely prefer if wgpu had a way to disable its own thread pool. My understanding is that this is an API that is mostly useful for browser usage, since WebGPU in a browser doesn't yet have any way to do multithreading. In a native context I should just be able to call create_*_pipeline() myself from my own thread pool.

notgull commented May 28, 2023

How is this function implemented in the browser? Do they use thread pools for this?

kpreid (Contributor, Author) commented May 28, 2023

I wrote the original issue simply as “here is a discrepancy with the spec” without further thought, but here are some further thoughts:

My use of wgpu targets web and desktop (as is probably desirable for many games). Therefore, I want to be able to have good performance (no event loop hiccups) on web, which means that I must use async. Conveniently, the web environment already “has an async executor”, inherently, and wgpu-on-web targets that executor (unavoidably, insofar as it has anything async at all).

This together with @PJB3005's point suggests that the solution here and for similar situations might be for wgpu to allow plugging in an executor in some fashion; to be handed a “spawn_blocking()” function that it can use when some operation is supposed to be async but this is not natively supported. That way, wgpu would not need to manage its own thread pool.

Certainly this could be done by a layer on top of wgpu, but it is annoying to have to design and maintain additional code just to get what WebGPU specifies to be possible. Also, in particular, if wgpu had an executor (or thread pool) available to it, then it could make map_async() much more straightforward to use — right now it is awkward to use in a truly async fashion without potentially missing wakeups.
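A minimal sketch of the kind of hook I mean (all names here are hypothetical, not existing wgpu API; tokio is shown only as an example executor):

type SpawnBlocking = Box<dyn Fn(Box<dyn FnOnce() + Send>) + Send + Sync>;

// Hand wgpu a "spawn_blocking()" function at instance creation; wgpu would
// invoke it whenever an operation is supposed to be async but the backend
// can only perform it by blocking.
fn tokio_spawn_blocking_hook() -> SpawnBlocking {
    // Must be called from within a tokio runtime context.
    Box::new(|work| {
        tokio::task::spawn_blocking(work);
    })
}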

Dinnerbone (Contributor) commented May 31, 2023

These would be very useful for us: we bulk-create a lot of pipelines, and that is a significant cost on some machines, especially with the WebGL backend. I've been tempted to try to write a hypothetical "DeviceExt::create_render_pipeline_bulk()" method, but async would solve it much better.

cwfitzgerald (Member) commented
We definitely don't want to use or create threads on our own. In line with the other async functions, a simple native implementation of this API would be to call the standard create_*_pipeline and then return an already-ready future.

This is a hard decision to make, as it's very hard to paper over the differences between native and web.
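A minimal sketch of that approach (a hypothetical wrapper, not an actual wgpu API): the work happens inline, and the returned future completes on its first poll.

// On native, "async" creation can simply block inside the async fn; the
// future it returns is already ready by the time the caller can await it.
async fn create_render_pipeline_async(
    device: &wgpu::Device,
    desc: &wgpu::RenderPipelineDescriptor<'_>,
) -> wgpu::RenderPipeline {
    device.create_render_pipeline(desc)
}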

cwfitzgerald (Member) commented Jun 29, 2023

So we've been musing about this same problem in the webgpu.h standardization meetings, and we have come up with a possible solution that we're asking for feedback on. The rough C solution is here, but I will translate it to the Rust API:

type Task = Box<dyn FnOnce() + Send>;
type TaskCallback = Box<dyn Fn(Task) + Send + Sync>;

// Maybe actually the device
struct InstanceDescriptor {
    // ...
    // Callback which will be called when the implementation wants to do work on another thread.
    // If this is not provided, the implementation will not do any work on any threads.
    //
    // The callback will be called with the task that the runtime wants to run on a thread.
    // This task should be spawned onto a thread pool, immediately invoked inside the callback, or otherwise
    // made to execute.
    //
    // It should be assumed that the work spawned via this callback will take substantial time (1ms+) and be pure compute.
    task_executor: Option<TaskCallback>,
    // ...
}

impl Device {
    // On webgpu will call createRenderPipeline.
    // 
    // On native will:
    // - If allow_async is false will create the render pipeline inside the call.
    // - If allow_async is true, the implementation is allowed (but not required) to spawn a
    //   job on the task callback to do the work of compilation if such a callback exists. This leads to
    //   less predictable performance but increased overall performance as compilation is parallelized.
    fn create_render_pipeline(&self, desc: RenderPipelineDescriptor, allow_async: bool) -> RenderPipeline;

    // On webgpu will call createRenderPipelineAsync.
    // 
    // On native will:
    // - Spawn a job on the instance's `task_executor` if it exists to generate the pipeline. Otherwise:
    // - Create the render pipeline inside the call.
    async fn create_render_pipeline_async(&self, desc: RenderPipelineDescriptor) -> RenderPipeline;
}

This api should allow people to use arbitrary integrations:

let desc = InstanceDescriptor {
    // ...
    task_executor: Some(Box::new(|task| { tokio::task::spawn_blocking(task); })),
};
let desc = InstanceDescriptor {
    // ...
    task_executor: Some(Box::new(|task| rayon::spawn(task))),
};
let my_fancy_threadpool_spawner: Arc<T> = ...;
let desc = InstanceDescriptor {
    // ...
    task_executor: Some(Box::new(move |task| my_fancy_threadpool_spawner.spawn(task))),
};

Looking forward to people's thoughts on this. This kind of design would also open the door to other possible optimizations.

jimblandy (Member) commented Jun 29, 2023

Why do we have to worry about this at all? Why can't the user just (see the sketch below)

  • put their Device in an Arc or use scoped threads or whatever,
  • create their own thread or get one from a pool however they like,
  • call Device::create_compute_pipeline there, and
  • send it back to the thread that needs it when it's done?
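For illustration, a minimal sketch of that userspace approach, assuming a 2023-era wgpu (the exact descriptor fields vary between versions):

use std::sync::{mpsc, Arc};
use std::thread;

// Spawn a worker thread, create the pipeline there, and hand it back over
// a channel. The descriptor is built on the worker so that no non-'static
// borrows have to cross the thread boundary.
fn create_pipeline_on_worker(device: Arc<wgpu::Device>) -> mpsc::Receiver<wgpu::ComputePipeline> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let module = device.create_shader_module(wgpu::ShaderModuleDescriptor {
            label: None,
            source: wgpu::ShaderSource::Wgsl("@compute @workgroup_size(1) fn main() {}".into()),
        });
        let pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &module,
            entry_point: "main",
        });
        let _ = tx.send(pipeline); // send it back to the thread that needs it
    });
    rx
}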

jimblandy (Member) commented
In other words: we have this thread-safe API, so the whole point should be that the user can deal with that stuff themselves. @kpreid, can't you just write your own create_compute_pipeline_async? It's not like wgpu is any better at threading than you are.

cwfitzgerald (Member) commented
There are a couple of considerations here pushing towards internal handling:

  • Unifying the behavior of create_compute_pipeline_async on both wasm and native. The async version is the preferred version on the web, and this would make it consistent on native.
  • Dawn really wants to be able to make the non-async version of pipeline creation do as much work in parallel as possible. We want to have a C backend on top of webgpu.h, and having this exposed means that you can hook these implementations into your own thread pool.

The question of "why do we care at all" is still a good one.

kainino0x commented
It pretty much boils down to the WASM implementations. For Jim's userspace solution to work on WASM:

  • There needs to be multithreading in the Web API (or the WASM binding implementation needs to proxy the pipeline creation call back to the device's thread)
  • To actually get any benefit out of it, the browser must implement a multithreading API and also have multiple GPU-process threads, either 1:1 with JS threads or with its own thread pool and scheduler, such that the pipeline creation doesn't block other work

Additionally, on WASM, the extra thread needed to initiate that pipeline creation is useless - the actual parallelization is happening in the GPU process, so an extra JS thread is wasted overhead. And JS threads are very expensive compared to native threads. So it works fine if you have that thread already, but it's detrimental if you didn't actually need it.

Hence I think it's best for the default experience in native to match the experience in JS (or other remoting implementations) closely where possible.

jimblandy (Member) commented Jun 30, 2023

Okay - I understand what I wasn't getting before.

  • The WebGPU create*PipelineAsync functions expose parallelism not otherwise available to web content: at least in today's browsers, content cannot just fork off another thread, call GPUDevice.create*Pipeline (sans Async), and send the pipeline back to the caller once it's usable - but create*PipelineAsync essentially gets you the same effect.

  • That means that this new parallelism can only be offered to programs using wgpu on the web if wgpu's web backend calls those functions.

  • That means that wgpu's own API must include those functions.

  • Since wgpu wants to provide a common API for web and native, wgpu's native API must now support these functions (even though native-only code should always prefer to just take care of the matter itself, as I suggested above).

  • Connor's proposal assumes that native wgpu is just going to fork off a thread, and lets wgpu's user set the policy for doing so, in a way that should only impact setup code.

jimblandy (Member) commented

Additionally, on WASM, the extra thread needed to initiate that pipeline creation is useless - the actual parallelization is happening in the GPU process, so an extra JS thread is wasted overhead.

Right - a separate thread in the content process invites the use of a separate thread in the GPU process, but doesn't require it, so it's useless.

nical (Contributor) commented Jul 6, 2023

I like the general idea. How would a user of the API know when a task is done?

cwfitzgerald (Member) commented

The user would need to have their own signalling as part of the function provided to the hook.
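For example, a sketch of user-side signalling, reusing the Task/TaskCallback aliases from the proposal above and assuming a rayon pool: the hook counts tasks in and out, and the user can watch the counter (or pair it with a channel or condvar) to know when everything has finished.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

type Task = Box<dyn FnOnce() + Send>;
type TaskCallback = Box<dyn Fn(Task) + Send + Sync>;

let in_flight = Arc::new(AtomicUsize::new(0));
let counter = in_flight.clone();
let task_executor: Option<TaskCallback> = Some(Box::new(move |task: Task| {
    counter.fetch_add(1, Ordering::SeqCst);
    let counter = counter.clone();
    rayon::spawn(move || {
        task();
        // Signal completion: watchers see `in_flight` drop back to zero.
        counter.fetch_sub(1, Ordering::SeqCst);
    });
}));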

Elabajaba (Contributor) commented

Just to add a reason why this is needed on native: Metal is weird and will block on creating pipelines unless you pass in a callback at pipeline creation time. (Bevy's async pipeline compilation ran into this with wgpu's existing create_*_pipeline(), and it ended up being quite a bit slower to try to create Metal pipelines asynchronously/in parallel.)

Ralith commented Mar 5, 2024

VK_KHR_deferred_host_operations may be interesting as API design prior art and/or a backend on which this might be implemented.

Wumpf added this to the WebGPU Specification V1 milestone on Jul 3, 2024