Built-in polling thread #1871

xq-tec · 2021-08-27T11:50:56Z

Is your feature request related to a problem? Please describe.
The current interface for mapping a buffer is a little inconvenient: Calling BufferSlice::map_async() followed by device.poll(Maintain::Wait) blocks the thread, defeating the purpose of that future. Integrating device.poll() into an event loop, as suggested by the docs, is also suboptimal: Either you block the entire event loop waiting for the GPU to become idle, or you're back to wasting CPU time by polling continuously.

Describe the solution you'd like
In my application, I start a dedicated thread for polling the device. This thread is usually idle (parked) and is woken up only when a buffer is mapped.

fn map_buffer(
    poll_thread: &PollThread,
    buffer: &wgpu::BufferSlice,
    mode: wgpu::MapMode,
) -> impl std::future::Future<Output = Result<(), wgpu::BufferAsyncError>> + Send {
    let fut = buffer.map_async(mode);
    poll_thread.request_poll();
    fut
}

pub struct PollThread {
    thread: std::thread::JoinHandle<()>,
    exit_flag: Arc<AtomicBool>,
}

impl PollThread {
    pub fn start(device: Arc<wgpu::Device>) -> PollThread {
        let exit_flag = Arc::new(AtomicBool::new(false));
        let exit_flag_2 = exit_flag.clone();
        let thread = std::thread::spawn(move || poll_loop(device, exit_flag_2));
        PollThread { thread, exit_flag }
    }

    pub fn request_poll(&self) {
        self.thread.thread().unpark();
    }

    pub fn exit(&self) {
        self.exit_flag.store(true, Ordering::Release);
        self.thread.thread().unpark();
    }
}

impl Drop for PollThread {
    fn drop(&mut self) {
        self.exit();
    }
}

fn poll_loop(device: Arc<wgpu::Device>, exit_flag: Arc<AtomicBool>) {
    while !exit_flag.load(Ordering::Acquire) {
        device.poll(wgpu::Maintain::Wait);
        // If request_poll() was called while we were in device.poll(),
        // park() will return immediately.
        std::thread::park();
    }
}

This PollThread is roughly used as follows:

let device = Arc::new(device);
let poll_thread = PollThread::start(device.clone());
...
let buffer_slice = buffer.slice(..);
// No blocking, and idiomatic async behavior:
if let Ok(()) = map_buffer(&poll_thread, &buffer_slice, wgpu::MapMode::Read).await {
    ...
}

This works really well, especially with multiple futures which might map independent buffers concurrently, and has very low overhead. But it would be even better if it was integrated into wgpu itself: Each Device starts its own PollThread instance, and BufferSlice::map_async() calls PollThread::request_poll() for the device it belongs to. You could even make the starting of a PollThread optional at device creation.

Would this approach be acceptable for you? If yes, I could prepare a pull request.

The text was updated successfully, but these errors were encountered:

grovesNL · 2021-08-27T12:12:03Z

Hi! 👋 we talked about some options to do this in gfx-rs/wgpu-rs#727 which used a background thread directly in the implementation of the Futures that we return here.

It's a bit complicated because some applications don't want us wgpu to automatically create a background thread. Instead we're hoping to integrate directly into the event loop somehow, which would hopefully avoid the extra background thread. We haven't figured out how the event loop integration would work yet though.

xq-tec · 2021-08-27T13:22:18Z

we talked about some options to do this in gfx-rs/wgpu-rs#727

There's an important difference between the approach in gfx-rs/wgpu-rs#727 and my approach: In my approach, the background thread is completely idle while no buffer mapping is needed, and yet there's no delay once a mapping is requested. In your approach, there are either long delays before the future completes (up to 100 ms in your code), or the background thread has to be woken up very frequently.

It's a bit complicated because some applications don't want us wgpu to automatically create a background thread.

It's easy to make the creation of the background thread optional:

impl Adapter {
    Adapter::request_device(..., launch_background_thread: bool) {...}
}

we're hoping to integrate directly into the event loop somehow

I don't see how this could be done in a sensible way. There are basically three options for this:

Call device.poll(Maintain::Poll) whenever the event loop is idle. Bad, because it wastes an entire CPU core.
Call device.poll(Maintain::Wait) whenever the event loop is idle. Even worse, because it wastes a CPU core when the GPU is idle, and blocks event processing for all windows while the GPU is busy.
Call device.poll(Maintain::Poll) every X milliseconds. Also bad, because it introduces unnecessary delay for completing the buffer mapping future.

In contrast, there's basically no drawback to my proposed solution: It's almost zero-cost when enabled, and zero-cost when not enabled.

grovesNL · 2021-08-27T13:59:54Z

There's an important difference between the approach in gfx-rs/wgpu-rs#727 and my approach

Right, the approach in gfx-rs/wgpu-rs#727 was more of a proof of concept -- I think we would've wanted more control over how the device is polled (basically the same way you described, and directly integrated with pending futures).

xq-tec · 2021-08-28T07:21:24Z

I'll try to implement my idea, then we'll see how well it integrates with wgpu :)

xq-tec · 2021-08-31T20:14:17Z

Hi Josh, I've created PR #1891. Let me know what you think!

cwfitzgerald · 2023-06-29T22:03:23Z

An update on this department. The webgpu.h group has been talking about adding a couple apis that would make this possible to implement in user-space. There is more that goes into this than what is here, but the proposal isn't final, so I haven't transposed the whole api, but we definitely want to make this possible.

struct WgpuFuture(...);

struct InstanceDescriptor {
    ...
    // Called once every time a WgpuFuture is created
    future_callback: Box<dyn Fn(WgpuFuture)>,
}

impl Device {
    fn waitAnyFutures(futures: &[WgpuFuture], timeout: Duration);
}

These two primitives will allow you to create a runtime which constantly polls in runtime.

There is also discussions about allowing export to Fds so integration with runtimes like tokio on certain platforms. More details to come.

grovesNL added area: api Issues related to API surface type: question Further information is requested labels Aug 27, 2021

xq-tec mentioned this issue Aug 31, 2021

[WIP] Background thread for automatic device polling #1891

Closed

raphlinus mentioned this issue Sep 28, 2023

Strategy for robust dynamic memory, readback, and async linebender/vello#366

Open

Wumpf mentioned this issue Oct 22, 2024

DownloadBuffer: Improve usability and documentation #6426

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Built-in polling thread #1871

Built-in polling thread #1871

xq-tec commented Aug 27, 2021

grovesNL commented Aug 27, 2021

xq-tec commented Aug 27, 2021 •

edited

Loading

grovesNL commented Aug 27, 2021

xq-tec commented Aug 28, 2021

xq-tec commented Aug 31, 2021

cwfitzgerald commented Jun 29, 2023

Built-in polling thread #1871

Built-in polling thread #1871

Comments

xq-tec commented Aug 27, 2021

grovesNL commented Aug 27, 2021

xq-tec commented Aug 27, 2021 • edited Loading

grovesNL commented Aug 27, 2021

xq-tec commented Aug 28, 2021

xq-tec commented Aug 31, 2021

cwfitzgerald commented Jun 29, 2023

xq-tec commented Aug 27, 2021 •

edited

Loading