Add asyncify setup by reeselevine · Pull Request #1 · reeselevine/wllama

reeselevine · 2026-01-20T07:04:02Z

Opening a separate PR targeting the other one for now, since I think there are enough changes here to make it worth a separate review/discussion. This adds support for using ASYNCIFY, which theoretically allows the WebGPU backend to work on more systems while JSPI development continues.

This PR creates four builds of llama.cpp: single-thread and multi-thread versions of both JSPI and ASYNCIFY. We can decide whether we want to keep all of them, although I'm surprised by how small the WASM blobs are (~5 MB for the ASYNCIFY builds).
JSPI builds are chosen if JSPI is available. On Chrome on my M3, the JSPI build was ~10% faster when running Llama-3.2-1B-Instruct-Q4_0.gguf. The JSPI WASM blobs are also ~60% smaller than the ASYNCIFY ones.
I ran into quite an annoying issue with using EXPORT_ALL in the build, so I moved all the necessary methods to EXPORTED_FUNCTIONS. See the issue I opened with Emscripten here.
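To make the build selection concrete, here is a minimal sketch of how one of the four builds could be picked at load time. It is not the code from this PR: the `Capabilities` and `BuildVariant` names and the variant strings are hypothetical. The JSPI check assumes the engine exposes `WebAssembly.Suspending` and `WebAssembly.promising`, and the thread check assumes that multi-thread builds need `SharedArrayBuffer` on a cross-origin isolated page.

```typescript
// Hypothetical capability flags; these names do not come from the PR.
interface Capabilities {
  jspi: boolean;    // the engine exposes the JSPI API
  threads: boolean; // SharedArrayBuffer is usable (cross-origin isolated page)
}

// Hypothetical labels for the four builds described above.
type BuildVariant =
  | "jspi-multi-thread"
  | "jspi-single-thread"
  | "asyncify-multi-thread"
  | "asyncify-single-thread";

// Prefer JSPI when it is available; otherwise fall back to ASYNCIFY.
function chooseBuild(caps: Capabilities): BuildVariant {
  const asyncMode = caps.jspi ? "jspi" : "asyncify";
  const threading = caps.threads ? "multi-thread" : "single-thread";
  return `${asyncMode}-${threading}` as BuildVariant;
}

// Best-effort detection from globals. This is an assumed heuristic,
// not necessarily what wllama does.
function detectCapabilities(): Capabilities {
  const g = globalThis as any;
  const jspi =
    typeof g.WebAssembly?.Suspending === "function" &&
    typeof g.WebAssembly?.promising === "function";
  const threads =
    typeof g.SharedArrayBuffer !== "undefined" &&
    g.crossOriginIsolated === true;
  return { jspi, threads };
}
```

In a browser this would be called as `chooseBuild(detectCapabilities())`. Keeping `chooseBuild` a pure function makes the preference order easy to check without a browser.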

I found that there are different issues/discussion points on different browsers, so here's some of that information:

Chrome

As in the other PR, JSPI works well here, and ASYNCIFY is ~10% slower with the WebGPU backend. Otherwise, no issues.

Safari

Unfortunately, using the WebGPU backend leads to the model returning gibberish (although it runs pretty quickly in doing so). I'm not sure if this is an issue with the llama.cpp WebGPU backend, or a bug in Safari's WebGPU implementation (WebKit). I'll note that WebLLM, which has a WebGPU backend, does work, but that runs through JavaScript directly, not WASM, so there could be differences in the handling there.

Firefox

The llama.cpp WebGPU backend currently doesn't work here due to Firefox's WebGPU compiler not handling some of the program constructs we use. I've opened an issue for that here, so hopefully that can be resolved and allow testing here.

Apart from that, the multi-thread CPU backend seems to work fine on all the browsers, even when the WebGPU backend is enabled in the builds.
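Since the CPU backend works everywhere, one way an application could use these builds is to try WebGPU first and fall back to CPU. The sketch below is an assumption about how a caller might do that, not something this PR implements. `pickBackend`, `Backend`, and `GpuLike` are hypothetical names; the only real API relied on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no adapter is available.

```typescript
// Minimal structural type for the part of the WebGPU API used here,
// so the sketch compiles without WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Hypothetical backend labels.
type Backend = "webgpu" | "cpu";

// Use WebGPU only if an adapter can actually be obtained; otherwise use CPU.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  if (!gpu) return "cpu"; // the browser does not expose navigator.gpu
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "cpu";
  } catch {
    return "cpu"; // treat a failed adapter request as "no WebGPU"
  }
}
```

In a browser the call would be `pickBackend((navigator as any).gpu)`. Note that an available adapter does not rule out the Safari and Firefox problems described above, so a check like this cannot detect incorrect output or shader compilation failures.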

reeselevine · 2026-04-03T19:05:00Z

With the current setup, we should have stable support on many laptops and mobile devices.

Add asyncify setup

06f5537

reeselevine mentioned this pull request Jan 20, 2026

WebGPU Integration (continued) ngxson/wllama#201

Closed

reeselevine added 22 commits January 22, 2026 13:31

Fixes for builds with ASSERTIONS=1

819c8d5

Update to faster llama.cpp webgpu

65bb53b

Fix asyncify vs. jspi path choice

b92aa33

Update to support qwen3.5

24f66ed

Update to support safari/firefox

c91277e

Update llama.cpp

c867db0

update llama.cpp version

0d389c7

Fix bug

340c5e6

Update emdawnwebgpu

532b66f

try lower submit count

0b24107

extreme throttling

a862732

new

05c5bd2

New batching with throttling

949fe5a

start bisecting

bcf4f34

next config

b7a362e

next config

7e75768

next config

14acac0

model screens

67b437b

back to 1 in flight

3b33798

Try 2 batches of 16

1c49e9a

UPdate llama.cpp

b68a099

Merge branch 'ios-debug2' into webgpu-asyncify

d908478

reeselevine merged commit 808f388 into webgpu-integration Apr 3, 2026
2 of 4 checks passed

reeselevine merged 23 commits into webgpu-integration from webgpu-asyncify
