Add asyncify setup #1

Merged
reeselevine merged 23 commits into webgpu-integration from webgpu-asyncify on Apr 3, 2026

@reeselevine

Owner

Opening a separate PR that targets the other one for now, since I think there are enough changes here to warrant a separate review/discussion. This adds support for using ASYNCIFY, which in theory lets the WebGPU backend work on more systems while JSPI development continues.

  • This PR creates four builds of llama.cpp: single-thread and multi-thread versions of both JSPI and ASYNCIFY. We can decide whether we want to keep all of them, although I'm surprised by how small the WASM blobs are (~5 MB for the ASYNCIFY builds).
  • JSPI builds are chosen if JSPI is available. On Chrome on my M3, the JSPI build was ~10% faster when running Llama-3.2-1B-Instruct-Q4_0.gguf. The JSPI WASM blobs are also ~60% smaller than the ASYNCIFY ones.
  • I ran into quite an annoying issue with using EXPORT_ALL in the build, so I moved all the necessary methods to EXPORTED_FUNCTIONS. See the issue I opened with Emscripten here.
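To make the "JSPI builds are chosen if JSPI is available" rule concrete, here is a minimal sketch of how a loader might pick one of the four builds. It is not the code from this PR: the file names, `detectCapabilities`, and `pickBuild` are hypothetical, and the detection assumes that JSPI shows up as `WebAssembly.Suspending` and `WebAssembly.promising`, and that the multi-thread builds need `SharedArrayBuffer` on a cross-origin-isolated page.

```typescript
// Hypothetical sketch: none of these names or file names come from the PR.
type AsyncMode = "jspi" | "asyncify";
type ThreadMode = "mt" | "st";

interface Capabilities {
  jspi: boolean;
  threads: boolean;
}

// Detect capabilities from a global-like object (defaults to globalThis),
// so the logic can be exercised outside a browser.
function detectCapabilities(env: any = globalThis): Capabilities {
  const wasm = env.WebAssembly;
  // Assumption: the current JSPI API exposes both of these functions.
  const jspi =
    typeof wasm === "object" &&
    wasm !== null &&
    typeof wasm.Suspending === "function" &&
    typeof wasm.promising === "function";
  // Assumption: threaded builds need SharedArrayBuffer, which browsers
  // only expose on cross-origin-isolated pages.
  const threads =
    typeof env.SharedArrayBuffer === "function" &&
    env.crossOriginIsolated === true;
  return { jspi, threads };
}

// Prefer JSPI when available, otherwise fall back to ASYNCIFY.
function pickBuild(caps: Capabilities): string {
  const asyncMode: AsyncMode = caps.jspi ? "jspi" : "asyncify";
  const threadMode: ThreadMode = caps.threads ? "mt" : "st";
  return `llama-${asyncMode}-${threadMode}.wasm`; // hypothetical naming scheme
}
```

In a page this would be called as `pickBuild(detectCapabilities())`. Keeping detection separate from selection makes it easy to force a particular build when comparing JSPI and ASYNCIFY performance.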

I found that each browser has its own issues and discussion points, so here is a summary:

Chrome

As in the other PR, JSPI works well here, and ASYNCIFY is ~10% slower with the WebGPU backend. Otherwise, there are no issues.

Safari

Unfortunately, using the WebGPU backend leads to the model returning gibberish (although it does so pretty quickly). I'm not sure whether this is an issue with the llama.cpp WebGPU backend or a bug in Safari's (WebKit's) WebGPU implementation. I'll note that WebLLM, which has a WebGPU backend, does work, but that runs through JavaScript directly, not WASM, so there could be differences in the handling there.

Firefox

The llama.cpp WebGPU backend currently doesn't work here because Firefox's WebGPU compiler does not handle some of the program constructs we use. I've opened an issue for that here, so hopefully that gets resolved and we can test on Firefox.

Apart from that, the multi-thread CPU backend seems to work fine on all the browsers, even with the WebGPU backend enabled in the builds.

@reeselevine

Owner Author

With the current setup, we should have stable support on many laptops and mobile devices.

@reeselevine reeselevine merged commit 808f388 into webgpu-integration Apr 3, 2026
2 of 4 checks passed