Test fixes by reeselevine · Pull Request #208 · ngxson/wllama

reeselevine · 2026-04-07T16:47:08Z

Fix `npm run test` locally and add a couple of WebGPU tests

Summary by CodeRabbit

Release Notes

New Features
- GPU acceleration support via WebGPU backend with experimental stability on select devices
- Performance monitoring capabilities: retrieve token generation rates and timing metrics
- Performance metrics reset functionality
- GitHub Pages automatic deployment workflow

Improvements
- Enhanced model selection UI with WebGPU budget awareness and memory limits
- Expanded WASM build variants for improved compatibility
- Updated model library and dependencies
- Improved documentation reflecting GPU acceleration availability

Chores
- Build infrastructure enhancements
- Type definitions for WebGPU support

reeselevine · 2026-04-07T16:47:24Z

whoops meant to open on my repo

coderabbitai · 2026-04-07T16:47:29Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1f9da21-eed1-45a5-8ec5-9120ba14f414

📥 Commits

Reviewing files that changed from the base of the PR and between 8778d7b and 0362e25.

⛔ Files ignored due to path filters (9)

examples/main/package-lock.json is excluded by !**/package-lock.json
package-lock.json is excluded by !**/package-lock.json
src/asyncify-multi-thread/wllama.wasm is excluded by !**/*.wasm
src/asyncify-single-thread/wllama.wasm is excluded by !**/*.wasm
src/jspi-multi-thread/wllama.wasm is excluded by !**/*.wasm
src/jspi-single-thread/wllama.wasm is excluded by !**/*.wasm
src/multi-thread/wllama.wasm is excluded by !**/*.wasm
src/single-thread/wllama.wasm is excluded by !**/*.wasm
src/webgpu-single-thread/wllama.wasm is excluded by !**/*.wasm

📒 Files selected for processing (39)

.github/workflows/deploy-examples-main.yml
CMakeLists.txt
cpp/actions.hpp
cpp/glue.hpp
cpp/test_glue.cpp
cpp/wllama.cpp
examples/main/package.json
examples/main/src/components/ChatScreen.tsx
examples/main/src/components/GuideScreen.tsx
examples/main/src/components/ModelScreen.tsx
examples/main/src/config.ts
examples/main/src/utils/custom-models.tsx
examples/main/src/utils/displayed-model.tsx
examples/main/src/utils/types.ts
examples/main/src/utils/utils.ts
examples/main/src/utils/wllama.context.tsx
examples/main/tsconfig.app.json
llama.cpp
package.json
scripts/build_wasm.sh
scripts/build_worker.sh
scripts/docker-compose.yml
src/asyncify-multi-thread/wllama.js
src/asyncify-single-thread/wllama.js
src/cache-manager.ts
src/glue/messages.ts
src/jspi-multi-thread/wllama.js
src/jspi-single-thread/wllama.js
src/mjs.test.ts
src/multi-thread/wllama.js
src/single-thread/wllama.js
src/webgpu-single-thread/wllama.js
src/wllama.test.ts
src/wllama.ts
src/worker.ts
src/workers-code/generated.ts
src/workers-code/llama-cpp.js
tsconfig.build.json
vitest.config.ts

📝 Walkthrough

Walkthrough

This PR adds comprehensive WebGPU backend support, introduces performance context APIs for timing metrics, restructures WASM build variants (JSPI and asyncify), updates the glue protocol (v1→v2), and enhances the example app UI with WebGPU memory budgeting, model filtering, and performance statistics display.

Changes

| Cohort / File(s) | Summary |
|---|---|
| WebGPU Backend Infrastructure: `CMakeLists.txt`, `cpp/actions.hpp`, `cpp/glue.hpp`, `cpp/wllama.cpp` | Added CMake options (`GGML_WEBGPU`, `GGML_WEBGPU_JSPI`, `LLAMA_WASM_MEM64`). Extended C++ backend to select and manage WebGPU/CPU device via `ggml_backend_dev_t`, with fallback logic and device unavailability error handling. |
| Performance Context APIs: `cpp/actions.hpp`, `cpp/glue.hpp`, `cpp/test_glue.cpp`, `cpp/wllama.cpp`, `src/glue/messages.ts` | Introduced `action_perf_context` and `action_perf_reset` handlers with new message types (`pctx_req`, `pctx_res`, `prst_req`, `prst_res`). Wire protocol bumped from v1 to v2. Load request now includes `use_webgpu` and `no_perf` flags. |
| WASM Build System Refactor: `scripts/build_wasm.sh`, `scripts/build_worker.sh`, `scripts/docker-compose.yml`, `llama.cpp` | Updated Emscripten from 4.0.3 to 4.0.20. Replaced binary single/multi-thread builds with four variants (JSPI & asyncify, each single & multi-thread). Added Dawn WebGPU package integration into build. Updated exported build constants. |
| TypeScript Worker & Runtime: `src/worker.ts`, `src/workers-code/llama-cpp.js`, `src/wllama.ts` | Refactored worker module loading to support build-type selection (JSPI vs asyncify). Added unsigned pointer normalization in llama-cpp.js. Generalized async/sync wrapper logic. Updated `Wllama` class with `preferWebGPU`, `noPerf`, `getPerfContext()`, `resetPerfContext()`, and `usingWebGPU()` APIs. |
| Example App - Performance Display: `examples/main/src/components/ChatScreen.tsx`, `examples/main/src/utils/utils.ts`, `examples/main/src/utils/wllama.context.tsx` | Added UI for prefill/decode token rates with "Reset" button. Implemented `getWebGPUMemoryBudget()` with iOS-specific capping. Extended context to parameterize WebGPU preference and track runtime WebGPU status. |
| Example App - Model Selection & Budget: `examples/main/src/components/ModelScreen.tsx`, `examples/main/src/config.ts`, `examples/main/src/utils/displayed-model.tsx`, `examples/main/src/utils/custom-models.tsx` | Added "Prefer WebGPU" checkbox, WebGPU memory budget UI, and model size blocking logic. Implemented GGUF split-file parsing and i-quant model filtering. Updated model list and added `isIQuantModel()` utility. |
| Configuration & Types: `examples/main/src/utils/types.ts`, `examples/main/package.json`, `examples/main/tsconfig.app.json`, `package.json`, `tsconfig.build.json` | Extended `RuntimeInfo` and `InferenceParams` with WebGPU fields. Updated `@huggingface/jinja` (0.2.2→0.5.3) and added `@webgpu/types` to dependencies and TypeScript configurations. |
| Documentation & Tests: `examples/main/src/components/GuideScreen.tsx`, `src/wllama.test.ts`, `src/mjs.test.ts`, `vitest.config.ts`, `.github/workflows/deploy-examples-main.yml` | Updated guide copy to reflect experimental WebGPU support. Added WebGPU-specific completion tests. Updated tests to use new build-variant paths. Increased test timeout to 60s. Added GitHub Actions workflow for example app deployment. |
| Utilities: `src/cache-manager.ts` | Added await and metadata file cleanup in `deleteMany` operation. |
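The model-selection row above mentions GGUF split-file parsing and i-quant filtering without showing either. As a rough illustration only, the sketch below parses shard names of the form `name-00001-of-00003.gguf` (the suffix pattern written by llama.cpp's gguf-split tool) and flags file names that appear to carry an i-quant tag such as `IQ4_XS`. The names `parseSplitName` and `looksLikeIQuant` are hypothetical; the PR's own `isIQuantModel()` and split handling are not visible here and may work differently.

```typescript
// Hypothetical helpers for illustration; not taken from the wllama source.
interface SplitInfo {
  baseName: string; // file name without the shard suffix and extension
  index: number;    // 1-based shard number
  total: number;    // total number of shards
}

// Matches names such as "model-00001-of-00003.gguf".
const SPLIT_RE = /^(.*)-(\d{5})-of-(\d{5})\.gguf$/;

function parseSplitName(fileName: string): SplitInfo | null {
  const m = SPLIT_RE.exec(fileName);
  if (!m) return null; // not a split file
  const [, baseName = "", idx = "", tot = ""] = m;
  const index = Number(idx);
  const total = Number(tot);
  // Reject impossible shard numbers such as 00000-of-00003 or 00004-of-00003.
  if (index < 1 || index > total) return null;
  return { baseName, index, total };
}

// Assumption: i-quant files carry a tag like "IQ2_XXS" or "IQ4_NL" in the
// file name, preceded by a separator or the start of the name.
function looksLikeIQuant(fileName: string): boolean {
  return /(^|[-_.])IQ\d_[A-Z0-9]+/i.test(fileName);
}
```

A caller could use `parseSplitName` to group shards by `baseName` before summing their sizes against a memory budget, and `looksLikeIQuant` to hide entries from a model list.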

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Wllama
    participant Worker
    participant WASM
    participant Backend as WebGPU/CPU Backend

    Client->>Wllama: loadModel(preferWebGPU: true)
    Wllama->>Wllama: Check navigator.gpu availability
    alt WebGPU Available
        Wllama->>Worker: init(buildType: 'jspi', use_webgpu: true)
    else WebGPU Unavailable
        Wllama->>Wllama: Fallback to CPU (warn)
        Wllama->>Worker: init(buildType: 'jspi', use_webgpu: false)
    end
    Worker->>WASM: wllama_start()
    Worker->>WASM: wllama_action('load', {...use_webgpu, n_gpu_layers...})
    WASM->>Backend: ggml_backend_dev_by_name('WebGPU') or ggml_backend_dev_by_type(CPU)
    Backend-->>WASM: device_handle
    WASM-->>Worker: load_res
    Worker-->>Wllama: Model loaded with device
    Wllama-->>Client: Ready (usingWebGPU() = true/false)
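The fallback branch in the diagram above can be sketched as a small pure function. This is a minimal sketch, assuming that the availability check amounts to testing whether `navigator.gpu` exists; the real code may also request an adapter or apply other checks. `NavigatorLike`, `BackendChoice`, and `chooseBackend` are hypothetical names, and the navigator object is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// Hypothetical names throughout; not taken from the wllama source.
interface NavigatorLike {
  gpu?: unknown; // present when the browser exposes WebGPU
}

interface BackendChoice {
  useWebGPU: boolean; // would map to the `use_webgpu` load flag
  warning?: string;   // set when a WebGPU request falls back to CPU
}

function chooseBackend(
  preferWebGPU: boolean,
  nav: NavigatorLike | undefined
): BackendChoice {
  // The caller did not ask for WebGPU: use the CPU without any warning.
  if (!preferWebGPU) {
    return { useWebGPU: false };
  }
  const available =
    nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  if (available) {
    return { useWebGPU: true };
  }
  // WebGPU was requested but is not exposed: fall back to CPU and warn.
  return {
    useWebGPU: false,
    warning: "WebGPU requested but navigator.gpu is unavailable; using CPU",
  };
}
```

In a browser this would be called as `chooseBackend(true, navigator)`, with the result deciding which flag is sent in the load request.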

sequenceDiagram
    participant UI as ChatScreen
    participant Wllama
    participant Worker
    participant WASM

    UI->>Wllama: createCompletion(prompt)
    Wllama->>Worker: wllama_action('completion', ...)
    Worker->>WASM: inference on selected backend
    WASM-->>Worker: completion_res
    Worker-->>Wllama: result
    Wllama-->>UI: completion done
    UI->>Wllama: getPerfContext()
    Wllama->>Worker: wllama_action('perf_context', ...)
    Worker->>WASM: perf_context (retrieve t_p_eval_ms, t_eval_ms, n_p_eval, n_eval)
    WASM-->>Worker: pctx_res {timing, counters}
    Worker-->>Wllama: metrics
    Wllama-->>UI: PerfContextData
    UI->>UI: Display token rates (tok/s)
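The last step of the diagram above, turning the returned counters into tok/s figures, is simple arithmetic: tokens divided by elapsed seconds. The sketch below assumes the perf data exposes the four fields named in the diagram (`t_p_eval_ms`, `t_eval_ms`, `n_p_eval`, `n_eval`); the actual `PerfContextData` shape in wllama is not shown in this PR page and may differ. `PerfSample`, `TokenRates`, and `tokenRates` are hypothetical names.

```typescript
// Field names mirror the values listed in the diagram; the interface itself
// is a hypothetical stand-in for the real PerfContextData type.
interface PerfSample {
  t_p_eval_ms: number; // time spent evaluating the prompt (prefill)
  t_eval_ms: number;   // time spent generating tokens (decode)
  n_p_eval: number;    // number of prompt tokens evaluated
  n_eval: number;      // number of tokens generated
}

interface TokenRates {
  prefillTokPerSec: number | null;
  decodeTokPerSec: number | null;
}

// Returns tokens per second, or null when there is nothing to measure yet
// (for example right after the counters were reset).
function rate(tokens: number, ms: number): number | null {
  if (tokens <= 0 || ms <= 0) return null;
  return (tokens * 1000) / ms;
}

function tokenRates(perf: PerfSample): TokenRates {
  return {
    prefillTokPerSec: rate(perf.n_p_eval, perf.t_p_eval_ms),
    decodeTokPerSec: rate(perf.n_eval, perf.t_eval_ms),
  };
}
```

For example, 100 prompt tokens in 500 ms gives 200 tok/s prefill, and 50 generated tokens in 2000 ms gives 25 tok/s decode. Returning `null` for empty counters lets a UI show a placeholder instead of dividing by zero.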

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

- sync with upstream llama.cpp source code #171: Updates the llama.cpp submodule commit pointer, similar submodule maintenance.
- sync with llama.cpp upstream #192: Synchronizes the llama.cpp upstream submodule to a newer commit.
- Strict tsconfig.json #178: Modifies glue message type definitions and protocols in `src/glue/messages.ts`.

Suggested reviewers

ngxson

Poem

🐰 Hops with joy through WebGPU gates,
Where tokens dance at faster rates,
Perf metrics bloom, a rabbit's delight,
JSPI and asyncify shine so bright,
GPU backends now within reach—
A speedier wllama we teach! 🚀

boingboomtschak and others added 30 commits December 22, 2025 10:16

Changes to get llama.cpp WebGPU backend working in build process

677940f

Removing __pycache__

ed56306

Bumping emsdk image ver to 4.0.20 from 4.0.10, for newer included emdawnwebgpu port, removing remote port file from repo

7d141ec

Adding WebGPU backend build to both single and multi threaded paths

ba4905a

Explicitly exporting HEAPU8 (required by newer Emscripten ver)

133cff4

Pushing 64-bit flag change, submodule update, some JSPI/Asyncify investigation

2f1f500

Separate path for webgpu build

7cf42a8

working

134e50e

Move to passing backend choice to wasm

b3dbe23

Cleanup

1323ec1

Update flags

d718207

deploy

bb64e35

Add webgpu types to main example

690da82

Add performance tracking

1f9b342

Add webgpu support query to wllama api, address some issues

be3305e

Merge remote-tracking branch 'origin/perf'

fd9dd59

Update wasm blob with max(4gb, supported size)

f5a4058

Add back n_gpu_layers, update builds

492c423

formatting

d9112bd

Update llama.cpp submodule and rebuild wasm blobs

a8a9546

Minor feedback addressed

23bcdb7

Add asyncify setup

06f5537

Fixes for builds with ASSERTIONS=1

819c8d5

Update to faster llama.cpp webgpu

65bb53b

Fix asyncify vs. jspi path choice

b92aa33

Update to support qwen3.5

24f66ed

Update to support safari/firefox

c91277e

Update llama.cpp

c867db0

update llama.cpp version

0d389c7

Fix bug

340c5e6

reeselevine added 22 commits April 1, 2026 21:09

Update emdawnwebgpu

532b66f

try lower submit count

0b24107

extreme throttling

a862732

new

05c5bd2

New batching with throttling

949fe5a

start bisecting

bcf4f34

next config

b7a362e

next config

7e75768

next config

14acac0

model screens

67b437b

back to 1 in flight

3b33798

Try 2 batches of 16

1c49e9a

Merge pull request #2 from reeselevine/webgpu-integration

2b5e467

Minor feedback addressed

UPdate llama.cpp

b68a099

Merge branch 'ios-debug2' into webgpu-asyncify

d908478

Merge pull request #1 from reeselevine/webgpu-asyncify

808f388

Add asyncify setup

Merge pull request #3 from reeselevine/webgpu-integration

0e6b72f

Webgpu integration

Lower safari iOS limit further

d605117

Update examples/main with more iOS friendly models, filter out i-quants for now

14a6ed4

Update examples and fix some handling of split models

1c6d32b

npm run test passing locally

4c09193

Add webgpu tests

0362e25

reeselevine closed this Apr 7, 2026

reeselevine wants to merge 52 commits into ngxson:master from reeselevine:test-fixes
