Add support for async file read by ngxson · Pull Request #221 · ngxson/wllama

ngxson · 2026-05-16T21:34:45Z

Add support for async file reads via JSPI, allowing the wasm context to read files on demand.

This implementation hooks into the fread() function and translates the underlying call into an async Blob read. As a result, tensors can be loaded on demand without allocating the whole file. This is particularly useful when, for example, a large part of the weights is offloaded to the WebGPU backend: those weights are then never allocated in RAM.

Hopefully this will allow bigger models (> 2GB in size) to run directly on WebGPU. CC @reeselevine if you are interested in doing some testing.

Note: on systems that don't support JSPI, we fall back to the HeapFS implementation (allocate the file on the heap, then mmap it to avoid copying the file).
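
To make the on-demand idea concrete, here is a minimal sketch of reading a single byte range from a Blob without materializing the whole file. It is not the code from this PR: `BlobLike` and `readRange` are hypothetical names chosen for illustration, and the built-in `Blob` is assumed to satisfy the `BlobLike` shape.

```typescript
// Hypothetical sketch; names are illustrative, not wllama's actual API.

/** Minimal structural type covering the Blob members this sketch needs. */
interface BlobLike {
  readonly size: number;
  slice(start?: number, end?: number): BlobLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}

/**
 * Read `size` bytes starting at `offset`, touching only that range.
 * The range is clamped to the blob, so a read near the end returns
 * fewer bytes, much like a short fread().
 */
async function readRange(
  blob: BlobLike,
  offset: number,
  size: number
): Promise<Uint8Array> {
  const start = Math.min(Math.max(offset, 0), blob.size);
  const end = Math.min(start + Math.max(size, 0), blob.size);
  // slice() is lazy; only arrayBuffer() actually reads the bytes.
  const buf = await blob.slice(start, end).arrayBuffer();
  return new Uint8Array(buf);
}
```

Because `slice()` only creates a view, memory use is bounded by the size of the requested range rather than by the size of the model file.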

Summary by CodeRabbit

New Features
- Async file-read support for faster, non-blocking model loads in supported environments
- Optional in-memory 1MB read-ahead cache for small reads
- Worker/bridge updates to support deferred Blob-based reads and conditional memory-mapped loading
Bug Fixes
- Updated browser compatibility notes: Safari excluded due to missing Memory64 support
Documentation
- Expanded developer guide with a new “File access” section describing async reads, HeapFS usage, caching, and memory considerations
Chores
- Build config updated to enable WebAssembly-level file I/O interception

coderabbitai · 2026-05-16T21:34:56Z

📝 Walkthrough

This PR adds JSPI‑gated async file reads for WebAssembly: linker wraps and C++ fread wrappers with a 1MB cache, a JS bridge to request chunks from the main thread, worker-side handlers that await responses and write into wasm heap, and runtime toggles to disable mmap when async is available.

Changes

Async file read support for WebAssembly

| Layer / File(s) | Summary |
|---|---|
| Capability detection and model load configuration `src/utils.ts`, `src/wllama.ts` | `canUseAsyncFileRead()` exports JSPI support; model loading uses `use_mmap: !canUseAsyncFileRead()` and disables `use_mlock`. |
| C++ filesystem interception and read bridging `CMakeLists.txt`, `cpp/wllama-fs.h`, `cpp/wllama.cpp` | Linker wrap options for `fopen`/`fclose`/`fread`/`fseek`/`ftell`; C++ wrappers maintain FILE*→path map, implement a 1MB read‑ahead cache, and provide `js_file_read` EM_ASYNC_JS bridge calling `_wllama_js_file_read`. |
| Worker thread async file read handler `src/workers-code/llama-cpp.js` | Worker uses BigInt mmap pointers, optional mmap-backed allocation, writes directly into wasm heap, implements `_wllama_js_file_read` that posts `fs.read_req` and awaits `fs.read_res`, and guards message handling during JSPI suspension. |
| Main thread file blob storage and read response `src/worker.ts` | `ProxyToWorker` tracks `useAsyncFile` and per-file `Blob`s, passes `allocBuffer` to `fs.alloc`, skips streaming when async enabled, and slices/postMessage `fs.read_res` with transferable ArrayBuffers for read requests. |
| Documentation and minor UX changes `README-dev.md`, `examples/main/src/components/GuideScreen.tsx`, `examples/main/src/utils/custom-models.tsx`, `llama.cpp` | Adds a detailed "File access" doc section describing async reads and HeapFS; updates GuideScreen Safari Memory64 note; removes a max-size validation in examples; advances `llama.cpp` submodule pointer. |
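
As an illustration of the first row, the sketch below shows one way JSPI availability could be detected and then mapped to load options. It is an assumption-laden sketch, not the contents of `src/utils.ts`: `detectJspi`, `LoadOptions`, and `buildLoadOptions` are hypothetical names, and the check simply looks for the `WebAssembly.Suspending` and `WebAssembly.promising` entry points of the JSPI API.

```typescript
// Hypothetical sketch; the real canUseAsyncFileRead() may check differently.

/** True when the given WebAssembly namespace exposes the JSPI entry points. */
function detectJspi(wasm: unknown = (globalThis as any).WebAssembly): boolean {
  if (typeof wasm !== "object" || wasm === null) return false;
  const w = wasm as Record<string, unknown>;
  return (
    typeof w["Suspending"] === "function" &&
    typeof w["promising"] === "function"
  );
}

interface LoadOptions {
  use_mmap: boolean;
  use_mlock: boolean;
}

/**
 * Mirror the rule from the table: mmap is used only when async file
 * reads are unavailable, and mlock is always disabled.
 */
function buildLoadOptions(asyncFileRead: boolean): LoadOptions {
  return { use_mmap: !asyncFileRead, use_mlock: false };
}
```

Typical usage would be `buildLoadOptions(detectJspi())`, so the HeapFS + mmap path is selected automatically on runtimes without JSPI.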

Sequence Diagram(s)

sequenceDiagram
  participant Wasm as Wasm code
  participant Wrap as __wrap_fread
  participant Cache as C++ cache
  participant Bridge as js_file_read
  participant Worker as Worker (llama-cpp.js)
  participant Main as ProxyToWorker (main thread)
  Wasm->>Wrap: fread(ptr,size,nmemb,FILE*)
  Wrap->>Cache: try_cache(FILE*, pos, bytes)
  alt cache hit
    Cache->>Wrap: copy bytes -> ptr
  else cache miss or large read
    Wrap->>Bridge: js_file_read(path, offset, size, out_ptr)
    Bridge->>Worker: calls _wllama_js_file_read
    Worker->>Main: postMessage fs.read_req
    Main->>Worker: postMessage fs.read_res (ArrayBuffer)
    Worker->>Bridge: resolve, write into wasm heap
    Bridge->>Wrap: return byte count
  end
  Wrap->>Wasm: return elements read
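
The cache branch in the diagram above can be sketched as follows. The PR implements this in C++ (`cpp/wllama-fs.h`); this TypeScript version is only an illustration of the same idea, and `ReadAheadCache`, `BackingRead`, and the exact hit/miss rules are assumptions. The backing read is modelled as a synchronous callback, which is roughly how the JSPI-suspended call looks from the wasm side.

```typescript
// Hypothetical sketch of a read-ahead cache; not the C++ code from the PR.

const CACHE_SIZE = 1024 * 1024; // 1 MB, the size mentioned in the PR summary

/** Reads up to `size` bytes at `offset` from the backing file. */
type BackingRead = (offset: number, size: number) => Uint8Array;

class ReadAheadCache {
  private data: Uint8Array = new Uint8Array(0);
  private start = 0;
  private readonly readBacking: BackingRead;
  private readonly cacheSize: number;
  /** Number of calls made to the backing reader (for inspection). */
  fetchCount = 0;

  constructor(readBacking: BackingRead, cacheSize: number = CACHE_SIZE) {
    this.readBacking = readBacking;
    this.cacheSize = cacheSize;
  }

  read(offset: number, size: number): Uint8Array {
    // Large reads (e.g. tensor data) bypass the cache entirely.
    if (size >= this.cacheSize) return this.fetch(offset, size);

    const hit =
      offset >= this.start &&
      offset + size <= this.start + this.data.length;
    if (!hit) {
      // Miss: refill with one cache-sized chunk starting at `offset`.
      this.data = this.fetch(offset, this.cacheSize);
      this.start = offset;
    }
    const rel = offset - this.start;
    return this.data.subarray(rel, Math.min(rel + size, this.data.length));
  }

  private fetch(offset: number, size: number): Uint8Array {
    this.fetchCount++;
    return this.readBacking(offset, size);
  }
}
```

The point of the design is that many small sequential reads, such as those made while parsing GGUF metadata, cost a single round trip to the main thread, while large tensor reads go straight to their destination buffer.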

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

ngxson/wllama#215: JSPI support and capability detection work related to async Blob/file-read plumbing.

Poem

🐰 I hop from heap to JS and back,

I fetch the bytes the models lack,
One-meg chunks in gentle streams,
I bridge the wasm-to-main-thread beams,
A caching rabbit in the stack.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add support for async file read' directly and clearly summarizes the main change of the PR: adding async file reading capability via JSPI and related infrastructure. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

examples/main/src/components/GuideScreen.tsx (1)
51-51: ⚡ Quick win

Consider version-scoping the Safari support claim for maintainability.

The claim "Safari is not supported" is currently accurate (as of Safari 26.5 in May 2026, Memory64 remains unsupported in both stable and Technology Preview releases). However, versioning the statement will help prevent the guide from misleading users as browser support evolves.

For example: "Safari 26.5 and earlier do not support Memory64" or "As of May 2026, Memory64 is not supported in Safari."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/main/src/components/GuideScreen.tsx` at line 51, Update the static
message in the GuideScreen component that currently reads "Safari is not
supported" to include a version or date scope; locate the string in
GuideScreen.tsx (the text node containing "Safari is not supported, due to lack
of Memory64 support") and replace it with a version-scoped phrasing such as "As
of May 2026, Safari (≤ 26.5) does not support Memory64" or "Safari 26.5 and
earlier do not support Memory64" so the statement remains accurate as browser
support changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/wllama-fs.h`:
- Around line 12-43: Global state (s_file_path_map, ready, use_async,
cache_data, cache_start, cache_file) is mutated concurrently from
fopen/fclose/fread and must be serialized; either protect accesses with a mutex
(e.g., add a static std::mutex like s_fs_mutex and lock it in make_sure_ready(),
try_cache(), and all fopen/fclose/fread wrappers) or refactor the single cache
into per-FILE* entries stored in s_file_path_map so each FILE* has its own cache
and is updated under a mutex. Update all code paths that read/write these
symbols (make_sure_ready, try_cache and the functions in the 61-147 region) to
acquire the mutex while touching the globals to prevent races.

In `@README-dev.md`:
- Line 72: The README sentence currently mentions only fopen, fseek and fread
but should list the entire set of intercepted libc stdio functions implemented
in this PR; update that line to explicitly enumerate all wrapped stdio calls
(not just fopen, fseek, fread) by copying the exact function names from the
wrapper implementation (the module that defines the proxies to the main thread)
and replace the sentence so it accurately documents the full intercepted surface
(use the existing symbols fopen, fseek, fread to locate the sentence to edit).

In `@src/worker.ts`:
- Around line 244-263: fileReadResponse currently awaits chunk.arrayBuffer()
without error handling, so a rejection never posts a response and the
worker-side _wllama_js_file_read stays hung; wrap the blob read in try/catch and
on error post a failure message (e.g., verb 'fs.read_err' with an error string
or code) via this.worker.postMessage so the worker bridge can reject the pending
read; apply the same pattern to the analogous range/other read handler
referenced (same logic as in fileReadResponse) and ensure you include
identifying symbols: fileReadResponse, onRecvMsg, and _wllama_js_file_read to
locate and mirror the error-path behavior.
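
The error path requested for `src/worker.ts` could look roughly like the sketch below. This is not the fix that landed in the PR: `ReadReply`, `answerReadRequest`, and the `reqId` field are hypothetical, and `fs.read_err` is the example verb suggested in the comment above rather than a confirmed message name.

```typescript
// Hypothetical sketch; message shapes and names are assumptions.

type ReadReply =
  | { verb: "fs.read_res"; reqId: number; data: ArrayBuffer }
  | { verb: "fs.read_err"; reqId: number; error: string };

/**
 * Answer one read request. Whatever happens, exactly one reply is posted,
 * so the worker side never waits forever on a rejected Blob read.
 */
async function answerReadRequest(
  reqId: number,
  readChunk: () => Promise<ArrayBuffer>,
  post: (msg: ReadReply) => void
): Promise<void> {
  try {
    const data = await readChunk();
    post({ verb: "fs.read_res", reqId, data });
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    post({ verb: "fs.read_err", reqId, error });
  }
}
```

In a real worker bridge, `readChunk` would presumably wrap `blob.slice(...).arrayBuffer()` and `post` would wrap `worker.postMessage`, with the ArrayBuffer passed as a transferable on the success path.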



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 583cb48d-dbd4-404f-9c29-524d2062ee53

📥 Commits

Reviewing files that changed from the base of the PR and between e923fba and 2d693a9.

⛔ Files ignored due to path filters (1)

src/wasm/wllama.wasm is excluded by !**/*.wasm

📒 Files selected for processing (12)

CMakeLists.txt
README-dev.md
cpp/wllama-fs.h
cpp/wllama.cpp
examples/main/src/components/GuideScreen.tsx
llama.cpp
src/utils.ts
src/wasm/wllama.js
src/wllama.ts
src/worker.ts
src/workers-code/generated.ts
src/workers-code/llama-cpp.js

ngxson · 2026-05-17T10:38:01Z

Ok so I was able to load the Qwen3VL-8B-Q4_K_M with this PR, all layers offloaded to GPU:

coderabbitai

🧹 Nitpick comments (1)

examples/main/src/utils/custom-models.tsx (1)
60-64: ⚡ Quick win

Remove the commented-out validation code or restore it conditionally based on async file read availability.

Leaving code commented out is a code smell. The PR removes size validation to support large files via async reads (JSPI), but the fallback path (non-JSPI environments) still uses HeapFS with heap allocation, which has size constraints. Either:

If async reads are guaranteed: Remove the commented code entirely.

If HeapFS fallback is still used: Restore the validation conditionally to prevent cryptic heap allocation failures when users without JSPI support load oversized files.

Currently, users without JSPI will encounter silent runtime errors instead of the helpful upfront message about splitting shards.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/main/src/utils/custom-models.tsx` around lines 60 - 64, The
commented-out size validation around MAX_GGUF_SIZE (the sizes.some(...) check)
must be either removed or restored conditionally: detect whether async file
reads / JSPI are available (the code path that avoids HeapFS) and only enforce
the sizes.some((s) => s >= MAX_GGUF_SIZE) -> throw new Error(...) guard when
running the HeapFS fallback (non-JSPI) so users get a clear "GGUF file is too
big" message instead of cryptic heap allocation failures; if you guarantee
JSPI/async reads for all supported runtimes, remove the commented block
entirely, otherwise re-enable it and gate it on the runtime/feature-detection
flag used elsewhere in this module.
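
A minimal sketch of the conditional guard suggested above, assuming the size limit only matters on the HeapFS fallback. `assertLoadableSizes` and its parameters are hypothetical; the actual `MAX_GGUF_SIZE` value is not shown in this thread, so the limit is passed in rather than hard-coded.

```typescript
// Hypothetical sketch; the limit value and function name are assumptions.

/**
 * Throw early when a file cannot be loaded through the HeapFS fallback.
 * With async file reads, files are read on demand, so no limit is applied.
 */
function assertLoadableSizes(
  sizes: number[],
  asyncFileRead: boolean,
  maxHeapFileSize: number
): void {
  if (asyncFileRead) return;
  if (sizes.some((s) => s >= maxHeapFileSize)) {
    throw new Error(
      "GGUF file is too big for the HeapFS fallback; split it into smaller shards"
    );
  }
}
```

This keeps the upfront error for users without JSPI while leaving large single-file models loadable where async reads are available.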


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1bee32c-82f1-49a6-9b49-381dc6949138

📥 Commits

Reviewing files that changed from the base of the PR and between cd746d7 and b9fe57a.

📒 Files selected for processing (1)

examples/main/src/utils/custom-models.tsx

ngxson · 2026-05-17T23:01:15Z

Quite insane but I managed to run Qwen3.6 27B 100% on WebGPU. Still trying to add MTP support...

Screen.Recording.2026-05-18.at.00.55.23.mov

felladrin · 2026-05-17T23:07:46Z

> Quite insane but I managed to run Qwen3.6 27B to run 100% on WebGPU. Still trying to add MTP support...

Great achievement!!

reeselevine · 2026-05-18T22:22:50Z

@ngxson not sure if you are running into issues with MTP support on Qwen specifically, but it looks like the main PR didn't add the code to the WebGPU backend that is needed for GATED_DELTA_NET. I opened a PR here: ggml-org/llama.cpp#23299

ngxson added 3 commits May 16, 2026 14:02

wip async file read

cad0810

support async file read

6de9d46

nits

2d693a9

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Comment thread cpp/wllama-fs.h

Comment thread README-dev.md

Comment thread src/worker.ts Outdated

handle read error

cd746d7

remove "GGUF file is too big" error

b9fe57a

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

ngxson merged commit e8e134f into master May 17, 2026
6 checks passed

This was referenced May 23, 2026

enable async file read in compat mode #224

Merged

Add debug stack trace #239

Merged


ngxson merged 5 commits into master from xsn/async_file_read
