Add support for async file read #221
📝 Walkthrough
This PR adds JSPI-gated async file reads for WebAssembly: linker wraps and C++ fread wrappers with a 1 MB cache, a JS bridge that requests chunks from the main thread, worker-side handlers that await responses and write into the wasm heap, and runtime toggles that disable mmap when async reads are available.
Changes
Async file read support for WebAssembly
Sequence Diagram(s)
sequenceDiagram
participant Wasm as Wasm code
participant Wrap as __wrap_fread
participant Cache as C++ cache
participant Bridge as js_file_read
participant Worker as Worker (llama-cpp.js)
participant Main as ProxyToWorker (main thread)
Wasm->>Wrap: fread(ptr,size,nmemb,FILE*)
Wrap->>Cache: try_cache(FILE*, pos, bytes)
alt cache hit
Cache->>Wrap: copy bytes -> ptr
else cache miss or large read
Wrap->>Bridge: js_file_read(path, offset, size, out_ptr)
Bridge->>Worker: calls _wllama_js_file_read
Worker->>Main: postMessage fs.read_req
Main->>Worker: postMessage fs.read_res (ArrayBuffer)
Worker->>Bridge: resolve, write into wasm heap
Bridge->>Wrap: return byte count
end
Wrap->>Wasm: return elements read
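The cache branch of the diagram above can be sketched as a small read-through cache. This is a minimal illustration, not the code in `cpp/wllama-fs.h`: the class name, the `ReadAt` callback (standing in for the `js_file_read` bridge) and the refill policy are assumptions; only the 1 MB window size comes from the walkthrough.

```typescript
// Hypothetical sketch: only the 1 MB window size is taken from the PR walkthrough.
const CACHE_SIZE = 1024 * 1024;

// Stand-in for the bridge call; under JSPI it looks synchronous to the caller.
type ReadAt = (offset: number, size: number) => Uint8Array;

class ReadThroughCache {
  private data: Uint8Array = new Uint8Array(0); // cached window contents
  private start = 0;                            // file offset of the window
  private readAt: ReadAt;
  bridgeCalls = 0;                              // counts trips across the bridge

  constructor(readAt: ReadAt) {
    this.readAt = readAt;
  }

  read(pos: number, bytes: number): Uint8Array {
    // Large reads bypass the cache and go straight to the bridge.
    if (bytes >= CACHE_SIZE) {
      this.bridgeCalls++;
      return this.readAt(pos, bytes);
    }
    const hit =
      pos >= this.start && pos + bytes <= this.start + this.data.length;
    if (!hit) {
      // Miss: refill the window starting at the requested position.
      this.bridgeCalls++;
      this.data = this.readAt(pos, CACHE_SIZE);
      this.start = pos;
    }
    const from = pos - this.start;
    // slice() clamps at the end of the window, so a read near EOF is short.
    return this.data.slice(from, from + bytes);
  }
}
```

Many small sequential reads (typical when parsing file metadata) then cost one bridge round trip per window, while large tensor reads skip the extra copy through the cache.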
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
examples/main/src/components/GuideScreen.tsx (1)
51-51: ⚡ Quick win
Consider version-scoping the Safari support claim for maintainability.
The claim "Safari is not supported" is currently accurate (as of Safari 26.5 in May 2026, Memory64 remains unsupported in both stable and Technology Preview releases). However, versioning the statement will help prevent the guide from misleading users as browser support evolves.
For example: "Safari 26.5 and earlier do not support Memory64" or "As of May 2026, Memory64 is not supported in Safari."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/main/src/components/GuideScreen.tsx` at line 51, Update the static message in the GuideScreen component that currently reads "Safari is not supported" to include a version or date scope; locate the string in GuideScreen.tsx (the text node containing "Safari is not supported, due to lack of Memory64 support") and replace it with a version-scoped phrasing such as "As of May 2026, Safari (≤ 26.5) does not support Memory64" or "Safari 26.5 and earlier do not support Memory64" so the statement remains accurate as browser support changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/wllama-fs.h`:
- Around line 12-43: Global state (s_file_path_map, ready, use_async,
cache_data, cache_start, cache_file) is mutated concurrently from
fopen/fclose/fread and must be serialized; either protect accesses with a mutex
(e.g., add a static std::mutex like s_fs_mutex and lock it in make_sure_ready(),
try_cache(), and all fopen/fclose/fread wrappers) or refactor the single cache
into per-FILE* entries stored in s_file_path_map so each FILE* has its own cache
and is updated under a mutex. Update all code paths that read/write these
symbols (make_sure_ready, try_cache and the functions in the 61-147 region) to
acquire the mutex while touching the globals to prevent races.
In `@README-dev.md`:
- Line 72: The README sentence currently mentions only fopen, fseek and fread
but should list the entire set of intercepted libc stdio functions implemented
in this PR; update that line to explicitly enumerate all wrapped stdio calls
(not just fopen, fseek, fread) by copying the exact function names from the
wrapper implementation (the module that defines the proxies to the main thread)
and replace the sentence so it accurately documents the full intercepted surface
(use the existing symbols fopen, fseek, fread to locate the sentence to edit).
In `@src/worker.ts`:
- Around line 244-263: fileReadResponse currently awaits chunk.arrayBuffer()
without error handling, so a rejection never posts a response and the
worker-side _wllama_js_file_read stays hung; wrap the blob read in try/catch and
on error post a failure message (e.g., verb 'fs.read_err' with an error string
or code) via this.worker.postMessage so the worker bridge can reject the pending
read; apply the same pattern to the analogous range/other read handler
referenced (same logic as in fileReadResponse) and ensure you include
identifying symbols: fileReadResponse, onRecvMsg, and _wllama_js_file_read to
locate and mirror the error-path behavior.
---
Nitpick comments:
In `@examples/main/src/components/GuideScreen.tsx`:
- Line 51: Update the static message in the GuideScreen component that currently
reads "Safari is not supported" to include a version or date scope; locate the
string in GuideScreen.tsx (the text node containing "Safari is not supported,
due to lack of Memory64 support") and replace it with a version-scoped phrasing
such as "As of May 2026, Safari (≤ 26.5) does not support Memory64" or "Safari
26.5 and earlier do not support Memory64" so the statement remains accurate as
browser support changes.
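The error path requested for `src/worker.ts` could look roughly like the sketch below. It is not the PR's code: the message shapes, the `id` field, the `BlobLike` interface and the function names are hypothetical, and `fs.read_err` is simply the verb the review suggests. The point is that every request produces exactly one reply, so the waiting side can always settle its promise.

```typescript
// Hypothetical message shapes; only the verbs fs.read_req / fs.read_res appear in the PR.
interface ReadRequest {
  verb: 'fs.read_req';
  id: number; // assumed correlation id
  offset: number;
  size: number;
}
type ReadReply =
  | { verb: 'fs.read_res'; id: number; data: ArrayBuffer }
  | { verb: 'fs.read_err'; id: number; error: string };

// Minimal shape of a Blob; a browser Blob or File should satisfy it.
interface BlobLike {
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Main-thread side: always post a reply, even when the blob read rejects.
async function fileReadResponseSketch(
  blob: BlobLike,
  req: ReadRequest,
  post: (msg: ReadReply) => void
): Promise<void> {
  try {
    const data = await blob.slice(req.offset, req.offset + req.size).arrayBuffer();
    post({ verb: 'fs.read_res', id: req.id, data });
  } catch (e) {
    const error = e instanceof Error ? e.message : String(e);
    post({ verb: 'fs.read_err', id: req.id, error });
  }
}

// Worker side: settle the pending read instead of hanging forever.
const pendingReads = new Map<
  number,
  { resolve: (b: ArrayBuffer) => void; reject: (e: Error) => void }
>();

function onReadReply(msg: ReadReply): void {
  const p = pendingReads.get(msg.id);
  if (!p) return; // unknown or already settled request
  pendingReads.delete(msg.id);
  if (msg.verb === 'fs.read_res') p.resolve(msg.data);
  else p.reject(new Error(msg.error));
}
```

In the real bridge, `post` would be `this.worker.postMessage` on the main thread, and the rejection would surface in `_wllama_js_file_read` as a failed read.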
ℹ️ Review info
⛔ Files ignored due to path filters (1)
- src/wasm/wllama.wasm is excluded by !**/*.wasm
📒 Files selected for processing (12)
- CMakeLists.txt
- README-dev.md
- cpp/wllama-fs.h
- cpp/wllama.cpp
- examples/main/src/components/GuideScreen.tsx
- llama.cpp
- src/utils.ts
- src/wasm/wllama.js
- src/wllama.ts
- src/worker.ts
- src/workers-code/generated.ts
- src/workers-code/llama-cpp.js
🧹 Nitpick comments (1)
examples/main/src/utils/custom-models.tsx (1)
60-64: ⚡ Quick win
Remove the commented-out validation code or restore it conditionally based on async file read availability.
Leaving code commented out is a code smell. The PR removes size validation to support large files via async reads (JSPI), but the fallback path (non-JSPI environments) still uses HeapFS with heap allocation, which has size constraints. Either:
- If async reads are guaranteed: Remove the commented code entirely.
- If HeapFS fallback is still used: Restore the validation conditionally to prevent cryptic heap allocation failures when users without JSPI support load oversized files.
Currently, users without JSPI will encounter silent runtime errors instead of the helpful upfront message about splitting shards.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/main/src/utils/custom-models.tsx` around lines 60 - 64, The commented-out size validation around MAX_GGUF_SIZE (the sizes.some(...) check) must be either removed or restored conditionally: detect whether async file reads / JSPI are available (the code path that avoids HeapFS) and only enforce the sizes.some((s) => s >= MAX_GGUF_SIZE) -> throw new Error(...) guard when running the HeapFS fallback (non-JSPI) so users get a clear "GGUF file is too big" message instead of cryptic heap allocation failures; if you guarantee JSPI/async reads for all supported runtimes, remove the commented block entirely, otherwise re-enable it and gate it on the runtime/feature-detection flag used elsewhere in this module.
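One way to gate the check, sketched under stated assumptions: the `MAX_GGUF_SIZE` value, the function names and the error wording below are hypothetical, and checking for `WebAssembly.Suspending` is one possible JSPI detection that the project may or may not use.

```typescript
// Hypothetical limit; the real MAX_GGUF_SIZE value is not shown in the review.
const MAX_GGUF_SIZE = 2 * 1024 * 1024 * 1024;

// Assumption: JSPI availability is detected via the WebAssembly.Suspending constructor.
function isJspiAvailable(): boolean {
  const wasm = (globalThis as any).WebAssembly;
  return typeof wasm?.Suspending === 'function';
}

// Enforce the size guard only on the HeapFS fallback path (no async reads).
function validateShardSizes(
  sizes: number[],
  asyncReadAvailable: boolean = isJspiAvailable()
): void {
  if (asyncReadAvailable) return; // files are read on demand; no heap size limit applies
  if (sizes.some((s) => s >= MAX_GGUF_SIZE)) {
    throw new Error('GGUF file is too big; split it into smaller shards');
  }
}
```

Passing the flag as a parameter keeps the guard testable and lets the caller reuse whatever feature-detection result the rest of the module already computes.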
ℹ️ Review info
📒 Files selected for processing (1)
examples/main/src/utils/custom-models.tsx
Quite insane, but I managed to run Qwen3.6 27B 100% on WebGPU. Still trying to add MTP support...
Screen.Recording.2026-05-18.at.00.55.23.mov
Great achievement!!
@ngxson not sure if you are running into issues with MTP support on Qwen specifically, but it looks like the main PR didn't add the code needed for GATED_DELTA_NET to the WebGPU backend. I opened a PR here: ggml-org/llama.cpp#23299
Add support for async file reads via JSPI, allowing the wasm context to read files on demand.
This implementation hooks into the `fread()` function and translates the underlying call into an async Blob read. As a result, tensors can be loaded on demand without allocating the whole file in memory. This is particularly useful when, for example, a large part of the weights is offloaded to the WebGPU backend: those weights are then never allocated in RAM.
Hopefully this allows bigger models (> 2GB in size) to run directly on WebGPU. CC @reeselevine if you are interested in doing some testing.
Note: on systems that don't support JSPI, we fall back to the HeapFS implementation (allocate the file on the heap, then mmap it to avoid copying the file).
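The core of an on-demand read can be sketched as slicing the Blob and copying the bytes into wasm memory. This is an illustration only: `readChunkIntoHeap`, `SliceableBlob` and the parameter names are hypothetical, and `heap` stands in for a view over the wasm memory such as `HEAPU8`.

```typescript
// Minimal shape of a Blob; a browser Blob or File should satisfy it.
interface SliceableBlob {
  size: number;
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Reads up to `size` bytes at `offset` and writes them at heap[outPtr...].
// Returns the number of bytes actually read (0 at or past end of file).
async function readChunkIntoHeap(
  blob: SliceableBlob,
  offset: number,
  size: number,
  heap: Uint8Array,
  outPtr: number
): Promise<number> {
  const end = Math.min(offset + size, blob.size); // clamp to end of file
  if (offset >= end) return 0;
  const buf = await blob.slice(offset, end).arrayBuffer();
  heap.set(new Uint8Array(buf), outPtr); // copy into the wasm heap view
  return buf.byteLength;
}
```

Only the requested byte range is ever materialized, which is why the whole file no longer has to fit in the wasm heap.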