Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built-in polling thread #1871

Open
xq-tec opened this issue Aug 27, 2021 · 6 comments
Open

Built-in polling thread #1871

xq-tec opened this issue Aug 27, 2021 · 6 comments
Labels
area: api Issues related to API surface type: question Further information is requested

Comments

@xq-tec
Copy link

xq-tec commented Aug 27, 2021

Is your feature request related to a problem? Please describe.
The current interface for mapping a buffer is a little inconvenient: Calling BufferSlice::map_async() followed by device.poll(Maintain::Wait) blocks the thread, defeating the purpose of that future. Integrating device.poll() into an event loop, as suggested by the docs, is also suboptimal: Either you block the entire event loop waiting for the GPU to become idle, or you're back to wasting CPU time by polling continuously.

Describe the solution you'd like
In my application, I start a dedicated thread for polling the device. This thread is usually idle (parked) and is woken up only when a buffer is mapped.

fn map_buffer(
    poll_thread: &PollThread,
    buffer: &wgpu::BufferSlice,
    mode: wgpu::MapMode,
) -> impl std::future::Future<Output = Result<(), wgpu::BufferAsyncError>> + Send {
    let fut = buffer.map_async(mode);
    poll_thread.request_poll();
    fut
}

pub struct PollThread {
    thread: std::thread::JoinHandle<()>,
    exit_flag: Arc<AtomicBool>,
}

impl PollThread {
    pub fn start(device: Arc<wgpu::Device>) -> PollThread {
        let exit_flag = Arc::new(AtomicBool::new(false));
        let exit_flag_2 = exit_flag.clone();
        let thread = std::thread::spawn(move || poll_loop(device, exit_flag_2));
        PollThread { thread, exit_flag }
    }

    pub fn request_poll(&self) {
        self.thread.thread().unpark();
    }

    pub fn exit(&self) {
        self.exit_flag.store(true, Ordering::Release);
        self.thread.thread().unpark();
    }
}

impl Drop for PollThread {
    fn drop(&mut self) {
        self.exit();
    }
}

fn poll_loop(device: Arc<wgpu::Device>, exit_flag: Arc<AtomicBool>) {
    while !exit_flag.load(Ordering::Acquire) {
        device.poll(wgpu::Maintain::Wait);
        // If request_poll() was called while we were in device.poll(),
        // park() will return immediately.
        std::thread::park();
    }
}

This PollThread is roughly used as follows:

let device = Arc::new(device);
let poll_thread = PollThread::start(device.clone());
...
let buffer_slice = buffer.slice(..);
// No blocking, and idiomatic async behavior:
if let Ok(()) = map_buffer(&poll_thread, &buffer_slice, wgpu::MapMode::Read).await {
    ...
}

This works really well, especially with multiple futures which might map independent buffers concurrently, and has very low overhead. But it would be even better if it was integrated into wgpu itself: Each Device starts its own PollThread instance, and BufferSlice::map_async() calls PollThread::request_poll() for the device it belongs to. You could even make the starting of a PollThread optional at device creation.

Would this approach be acceptable for you? If yes, I could prepare a pull request.

@grovesNL
Copy link
Collaborator

Hi! 👋 we talked about some options to do this in gfx-rs/wgpu-rs#727 which used a background thread directly in the implementation of the Futures that we return here.

It's a bit complicated because some applications don't want us wgpu to automatically create a background thread. Instead we're hoping to integrate directly into the event loop somehow, which would hopefully avoid the extra background thread. We haven't figured out how the event loop integration would work yet though.

@grovesNL grovesNL added area: api Issues related to API surface type: question Further information is requested labels Aug 27, 2021
@xq-tec
Copy link
Author

xq-tec commented Aug 27, 2021

we talked about some options to do this in gfx-rs/wgpu-rs#727

There's an important difference between the approach in gfx-rs/wgpu-rs#727 and my approach: In my approach, the background thread is completely idle while no buffer mapping is needed, and yet there's no delay once a mapping is requested. In your approach, there are either long delays before the future completes (up to 100 ms in your code), or the background thread has to be woken up very frequently.

It's a bit complicated because some applications don't want us wgpu to automatically create a background thread.

It's easy to make the creation of the background thread optional:

impl Adapter {
    Adapter::request_device(..., launch_background_thread: bool) {...}
}

we're hoping to integrate directly into the event loop somehow

I don't see how this could be done in a sensible way. There are basically three options for this:

  1. Call device.poll(Maintain::Poll) whenever the event loop is idle. Bad, because it wastes an entire CPU core.
  2. Call device.poll(Maintain::Wait) whenever the event loop is idle. Even worse, because it wastes a CPU core when the GPU is idle, and blocks event processing for all windows while the GPU is busy.
  3. Call device.poll(Maintain::Poll) every X milliseconds. Also bad, because it introduces unnecessary delay for completing the buffer mapping future.

In contrast, there's basically no drawback to my proposed solution: It's almost zero-cost when enabled, and zero-cost when not enabled.

@grovesNL
Copy link
Collaborator

There's an important difference between the approach in gfx-rs/wgpu-rs#727 and my approach

Right, the approach in gfx-rs/wgpu-rs#727 was more of a proof of concept -- I think we would've wanted more control over how the device is polled (basically the same way you described, and directly integrated with pending futures).

@xq-tec
Copy link
Author

xq-tec commented Aug 28, 2021

I'll try to implement my idea, then we'll see how well it integrates with wgpu :)

@xq-tec
Copy link
Author

xq-tec commented Aug 31, 2021

Hi Josh, I've created PR #1891. Let me know what you think!

@cwfitzgerald
Copy link
Member

An update on this department. The webgpu.h group has been talking about adding a couple apis that would make this possible to implement in user-space. There is more that goes into this than what is here, but the proposal isn't final, so I haven't transposed the whole api, but we definitely want to make this possible.

struct WgpuFuture(...);

struct InstanceDescriptor {
    ...
    // Called once every time a WgpuFuture is created
    future_callback: Box<dyn Fn(WgpuFuture)>,
}

impl Device {
    fn waitAnyFutures(futures: &[WgpuFuture], timeout: Duration);
}

These two primitives will allow you to create a runtime which constantly polls in runtime.

There is also discussions about allowing export to Fds so integration with runtimes like tokio on certain platforms. More details to come.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: api Issues related to API surface type: question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants