Test fixes #208
…awnwebgpu port, removing remote port file from repo
Minor feedback addressed
Add asyncify setup
Webgpu integration
whoops meant to open on my repo
Caution: Review failed. The pull request is closed.
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Files ignored due to path filters: 9
Files selected for processing: 39
Walkthrough
This PR adds comprehensive WebGPU backend support, introduces performance context APIs for timing metrics, restructures WASM build variants (JSPI and asyncify), updates the glue protocol (v1→v2), and enhances the example app UI with WebGPU memory budgeting, model filtering, and performance statistics display.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Wllama
participant Worker
participant WASM
participant Backend as WebGPU/CPU Backend
Client->>Wllama: loadModel(preferWebGPU: true)
Wllama->>Wllama: Check navigator.gpu availability
alt WebGPU Available
Wllama->>Worker: init(buildType: 'jspi', use_webgpu: true)
else WebGPU Unavailable
Wllama->>Wllama: Fallback to CPU (warn)
Wllama->>Worker: init(buildType: 'jspi', use_webgpu: false)
end
Worker->>WASM: wllama_start()
Worker->>WASM: wllama_action('load', {...use_webgpu, n_gpu_layers...})
WASM->>Backend: ggml_backend_dev_by_name('WebGPU') or ggml_backend_dev_by_type(CPU)
Backend-->>WASM: device_handle
WASM-->>Worker: load_res
Worker-->>Wllama: Model loaded with device
Wllama-->>Client: Ready (usingWebGPU() = true/false)
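The fallback branch in the diagram above can be sketched roughly as follows. This is only an illustration of the decision, not the code in this PR: `chooseBackend`, `hasNavigatorGpu`, and the `BackendChoice` shape are hypothetical names, and the `'jspi'` build type is simply copied from the diagram.

```typescript
// Hypothetical result shape; not taken from the wllama source.
interface BackendChoice {
  buildType: 'jspi';
  useWebGPU: boolean;
  warning?: string;
}

// Assumption: WebGPU is considered available when `navigator.gpu` exists.
// The cast goes through `unknown` so the sketch compiles without DOM typings.
function hasNavigatorGpu(): boolean {
  const g = globalThis as unknown as { navigator?: { gpu?: unknown } };
  return g.navigator !== undefined && g.navigator.gpu !== undefined;
}

// Mirrors the alt/else branch: use WebGPU only when it was requested
// and is available; otherwise fall back to CPU and attach a warning
// if the caller had asked for WebGPU.
function chooseBackend(preferWebGPU: boolean, gpuAvailable: boolean): BackendChoice {
  if (preferWebGPU && gpuAvailable) {
    return { buildType: 'jspi', useWebGPU: true };
  }
  const choice: BackendChoice = { buildType: 'jspi', useWebGPU: false };
  if (preferWebGPU) {
    choice.warning = 'WebGPU unavailable, falling back to CPU';
  }
  return choice;
}

// Usage: decide once before initialising the worker.
const choice = chooseBackend(true, hasNavigatorGpu());
if (choice.warning !== undefined) {
  console.warn(choice.warning);
}
```

Taking availability as a parameter keeps the decision testable outside a browser; the real implementation may well probe the adapter asynchronously instead.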
sequenceDiagram
participant UI as ChatScreen
participant Wllama
participant Worker
participant WASM
UI->>Wllama: createCompletion(prompt)
Wllama->>Worker: wllama_action('completion', ...)
Worker->>WASM: inference on selected backend
WASM-->>Worker: completion_res
Worker-->>Wllama: result
Wllama-->>UI: completion done
UI->>Wllama: getPerfContext()
Wllama->>Worker: wllama_action('perf_context', ...)
Worker->>WASM: perf_context (retrieve t_p_eval_ms, t_eval_ms, n_p_eval, n_eval)
WASM-->>Worker: pctx_res {timing, counters}
Worker-->>Wllama: metrics
Wllama-->>UI: PerfContextData
UI->>UI: Display token rates (tok/s)
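As a rough sketch of the last step, token rates could be derived from the perf context fields named in the diagram. The field names come from the diagram; the exact shape of `PerfContextData` and the helper names `tokensPerSecond` and `tokenRates` are assumptions made for illustration.

```typescript
// Assumed shape, using the field names shown in the diagram.
interface PerfContextData {
  t_p_eval_ms: number; // time spent evaluating the prompt, in ms
  t_eval_ms: number;   // time spent generating tokens, in ms
  n_p_eval: number;    // number of prompt tokens evaluated
  n_eval: number;      // number of tokens generated
}

// Convert a token count and elapsed milliseconds into tokens per second.
// Returns 0 when no time was recorded, to avoid dividing by zero.
function tokensPerSecond(count: number, elapsedMs: number): number {
  return elapsedMs > 0 ? (count * 1000) / elapsedMs : 0;
}

// Hypothetical helper: one rate for prompt processing, one for generation.
function tokenRates(perf: PerfContextData): { prompt: number; generation: number } {
  return {
    prompt: tokensPerSecond(perf.n_p_eval, perf.t_p_eval_ms),
    generation: tokensPerSecond(perf.n_eval, perf.t_eval_ms),
  };
}

// Usage with made-up numbers: 100 prompt tokens in 500 ms, 50 generated in 2000 ms.
const rates = tokenRates({ t_p_eval_ms: 500, t_eval_ms: 2000, n_p_eval: 100, n_eval: 50 });
console.log(`${rates.prompt} tok/s prompt, ${rates.generation} tok/s generation`);
// prints "200 tok/s prompt, 25 tok/s generation"
```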
Estimated code review effort: 4 (Complex), ~60 minutes
Fix npm run test locally and add a couple of WebGPU tests