Sync with latest upstream, fix useCache, add getLibllamaVersion() by ngxson · Pull Request #189 · ngxson/wllama

ngxson · 2025-08-30T23:09:10Z

Applied the fix from #174 (thanks to @khromov)

Add Wllama.getLibllamaVersion() to get the build number, for example: b6327-4d74393
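
A minimal sketch of how a caller might consume that string, assuming it always has the shape shown in the example above (`b` + build number + `-` + short commit hash). `parseLibllamaVersion` and `LibllamaVersion` are hypothetical names used for illustration and are not part of wllama; in an application the input would come from `Wllama.getLibllamaVersion()`.

```typescript
// Hypothetical helper (not part of wllama): splits a version string such as
// 'b6327-4d74393' into its build number and short commit hash.
interface LibllamaVersion {
  buildNumber: number;
  shortHash: string;
}

function parseLibllamaVersion(version: string): LibllamaVersion | null {
  // Assumed format: 'b' + decimal build number + '-' + lowercase hex hash.
  const match = /^b(\d+)-([0-9a-f]+)$/.exec(version);
  if (match === null) {
    return null; // unknown format, let the caller decide what to do
  }
  return { buildNumber: Number(match[1]), shortHash: match[2] };
}

// The literal below stands in for the value returned by the new getter.
const parsed = parseLibllamaVersion('b6327-4d74393');
console.log(parsed);
```

Returning `null` for an unrecognized format keeps the helper safe if the version string ever changes shape.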

Summary by CodeRabbit

New Features
- Added API to retrieve the bundled library version.
- Worker build now embeds a human-readable library version string.
Bug Fixes
- Retrieving current status no longer clears in-memory token history.
Refactor
- Flash-attention configuration switched to explicit modes (auto/disabled).
Chores
- Updated model backend revision and bumped package version.
- CDN Wasm URLs updated to the new release.
- CI checkout adjusted to fetch full history and submodules.

Co-authored-by: khromov <khromov@users.noreply.github.com>

coderabbitai · 2025-08-30T23:09:17Z

Walkthrough

The PR switches flash attention from a boolean to an enum in load handling, makes status reporting copy tokens instead of moving them, updates the llama.cpp submodule, embeds a libllama build/version string into generated worker code and exposes it via a new Wllama getter, and adds sentinel behavior to KV cache removal.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Flash attention & status handling (C++) `cpp/actions.hpp` | Replaces boolean `flash_attn` with enum `cparams.flash_attn_type` (maps to `LLAMA_FLASH_ATTN_TYPE_AUTO` or `LLAMA_FLASH_ATTN_TYPE_DISABLED`) during model load; `action_current_status` now copies `app.tokens` into `res.tokens.arr` instead of moving. |
| Submodule update `llama.cpp` | Advances the `llama.cpp` submodule pointer (820de57… → 4d74393…). |
| Build/version metadata & exposure `scripts/build_worker.sh`, `src/workers-code/generated.ts`, `src/wllama.ts` | Build script computes `BUILD_NUMBER` and `SHORT_HASH` from `llama.cpp` and writes `LIBLLAMA_VERSION` into `generated.ts`; `Wllama` imports it and adds `static getLibllamaVersion(): string`. |
| KV cache behavior & minor TS changes `src/wllama.ts` | `kvRemove` treats `nDiscard < 0` as a sentinel to set `nCachedTokens = nKeep`; otherwise decrements by `nDiscard`. Also adds/clarifies inline comment on `flash_attn` semantics. |
| Generated worker code & CDN bump `src/workers-code/generated.ts`, `src/wasm-from-cdn.ts`, `package.json` | Adds exported `LIBLLAMA_VERSION` constant (e.g., `'b6327-4d74393'`); bumps package version (2.3.4 → 2.3.5) and updates CDN wasm URLs to the new version. |
| CI checkout behavior `.github/workflows/verify-generated-code.yml` | Checkout step now configures `actions/checkout@v4` with `fetch-depth: 0` and `submodules: 'true'` to ensure full history and submodule init. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Wllama
  participant LlamaC as "llama.cpp C API"

  rect rgb(235,245,255)
  note over Client,Wllama: Load model with flash attention flag
  Client->>Wllama: loadModel({ flash_attn: true|false })
  Wllama->>Wllama: map boolean -> enum (AUTO or DISABLED)
  Wllama->>LlamaC: create context with cparams.flash_attn_type
  LlamaC-->>Wllama: context created
  Wllama-->>Client: ready
  end

sequenceDiagram
  autonumber
  participant Caller
  participant Wllama

  rect rgb(240,255,240)
  note over Caller,Wllama: KV removal with sentinel support
  Caller->>Wllama: kvRemove(nKeep, nDiscard)
  alt nDiscard < 0
    Wllama->>Wllama: nCachedTokens = nKeep
  else nDiscard >= 0
    Wllama->>Wllama: nCachedTokens -= nDiscard
  end
  Wllama-->>Caller: done
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

if KV rm fails, we should clear the whole cache #188 — Adjusts flash_attn load parameter handling and KV cache removal behavior; overlaps code areas changed here.
sync with latest upstream llama.cpp #187 — Also updates the llama.cpp submodule pointer and bumps package/CDN versions.
sync with upstream llama.cpp source code #171 — Another PR that advances the llama.cpp submodule pointer; related to the submodule update in this PR.

Poem

I thump my paws at version’s gleam,
b6327 hops through build-stream.
Flash flips to enum, neat and bright,
KV trims tails in moonlit night.
Tokens copied, none shall flee—🐇✨

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

src/wllama.ts (2)
119-120: Consider future-proofing flash_attn config.

Underlying C++ now uses an enum; keeping a boolean here is fine for AUTO/DISABLED, but you may want a string union (e.g., 'auto' | 'disabled') later to expose more modes without another breaking change.
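
One possible shape for such a string union, as a sketch only. `FlashAttnMode` and `resolveFlashAttnMode` are hypothetical names; wllama's actual option in this PR remains the boolean `flash_attn`. The boolean branch follows the mapping described for this PR (true to auto, false to disabled), and an unset value is passed through so the backend default is left alone.

```typescript
// Hypothetical config type; not part of wllama's current API.
type FlashAttnMode = 'auto' | 'disabled';

// Accepts the existing boolean or the hypothetical string mode and
// normalizes both to a single representation.
function resolveFlashAttnMode(
  value: boolean | FlashAttnMode | undefined
): FlashAttnMode | undefined {
  if (value === undefined) {
    return undefined; // not set: keep whatever the backend defaults to
  }
  if (typeof value === 'boolean') {
    return value ? 'auto' : 'disabled'; // legacy boolean mapping
  }
  return value; // already one of the string modes
}

console.log(resolveFlashAttnMode(true));
console.log(resolveFlashAttnMode('disabled'));
```

Accepting both forms would let existing callers keep passing a boolean while new modes are added to the union later.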

1208-1213: Clamp nCachedTokens to avoid negative values.

When nDiscard is positive, subtracting blindly can underflow in edge cases. Clamp to at least nKeep.

Apply this diff:
-    if (nDiscard < 0) {
-      this.nCachedTokens = nKeep;
-    } else {
-      this.nCachedTokens -= nDiscard;
-    }
+    if (nDiscard < 0) {
+      this.nCachedTokens = nKeep;
+    } else {
+      this.nCachedTokens = Math.max(nKeep, this.nCachedTokens - nDiscard);
+    }
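
To make the edge case concrete, here is a standalone sketch of the counter update with the clamp applied. It models only the `nCachedTokens` bookkeeping; `nextCachedTokens` is a hypothetical free function written for illustration, not the real `kvRemove` method.

```typescript
// Hypothetical model of the bookkeeping only; the real method also talks to
// the worker to remove entries from the KV cache.
function nextCachedTokens(
  nCachedTokens: number,
  nKeep: number,
  nDiscard: number
): number {
  if (nDiscard < 0) {
    // Sentinel: everything after the first nKeep tokens is removed.
    return nKeep;
  }
  // Clamped variant: never report fewer cached tokens than nKeep.
  return Math.max(nKeep, nCachedTokens - nDiscard);
}

console.log(nextCachedTokens(100, 10, 30)); // 70
console.log(nextCachedTokens(100, 10, -1)); // 10
console.log(nextCachedTokens(20, 10, 30));  // 10 (unclamped would be -10)
```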

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c267097 and 8c6c097.

⛔ Files ignored due to path filters (2)

src/multi-thread/wllama.wasm is excluded by !**/*.wasm
src/single-thread/wllama.wasm is excluded by !**/*.wasm

📒 Files selected for processing (5)

cpp/actions.hpp (2 hunks)
llama.cpp (1 hunks)
scripts/build_worker.sh (1 hunks)
src/wllama.ts (4 hunks)
src/workers-code/generated.ts (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-04-18T08:24:14.434Z

Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

Applied to files:

src/workers-code/generated.ts
scripts/build_worker.sh

🧬 Code graph analysis (1)

src/wllama.ts (1)

src/workers-code/generated.ts (1)

LIBLLAMA_VERSION (4-4)

🪛 GitHub Actions: Verify generated worker code is up-to-date

src/workers-code/generated.ts

[error] 1-1: Generated code file is not up-to-date. Changes not staged for commit: 'src/workers-code/generated.ts'. Run 'npm run build:worker' to update.

🔇 Additional comments (6)

llama.cpp (1)

1-1: Submodule bump to 4d74393 validated

LIBLLAMA_VERSION in generated.ts is “b6327-4d74393”, matching the llama.cpp gitlink (4d74393).

C++ glue in cpp/actions.hpp converts the old flash_attn boolean to the new flash_attn_type enum.

src/workers-code/generated.ts (1)

4-4: Commit updated auto-generated worker file
The diff shows LIBLLAMA_VERSION updated to 'b159-8c6c097'; please commit the regenerated src/workers-code/generated.ts.

cpp/actions.hpp (2)

205-205: Boolean → enum mapping for flash attention looks correct.

Mapping true → LLAMA_FLASH_ATTN_TYPE_AUTO and false → LLAMA_FLASH_ATTN_TYPE_DISABLED aligns with llama.cpp’s enum. Guarded by not_null(), so defaults aren’t overridden unintentionally.

778-778: Good fix: copy tokens instead of moving them.

Returning a copy preserves app.tokens for subsequent calls to status-sensitive APIs.

src/wllama.ts (1)

34-34: Import of LIBLLAMA_VERSION is appropriate.

This cleanly enables the public getter without altering existing initialization paths.

scripts/build_worker.sh (1)

7-16: BUILD_NUMBER computation and script robustness confirmed
Script now sets set -euo pipefail, quotes all paths, detects and deepens shallow llama.cpp clones before computing BUILD_NUMBER, and produces a stable LIBLLAMA_VERSION; regenerated file shows only the expected version bump with no additional drift.

coderabbitai

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6c097 and a4a4e6e.

📒 Files selected for processing (4)

.github/workflows/verify-generated-code.yml (1 hunks)
package.json (1 hunks)
src/wasm-from-cdn.ts (1 hunks)
src/wllama.ts (4 hunks)

✅ Files skipped from review due to trivial changes (1)

package.json

🚧 Files skipped from review as they are similar to previous changes (1)

src/wllama.ts

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-04-18T08:24:14.434Z

Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

Applied to files:

src/wasm-from-cdn.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test (ubuntu-latest, 22)

coderabbitai · 2025-08-30T23:38:27Z

+  'single-thread/wllama.wasm': 'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm',
+  'multi-thread/wllama.wasm': 'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm',


💡 Verification agent

🧩 Analysis chain

CDN version bump to 2.3.5 looks good

Links are consistent with the package bump.

Run this quick check to ensure the CDN assets for 2.3.5 are live:

🏁 Script executed:

#!/bin/bash
set -euo pipefail
urls=(
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm"
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm"
)
for u in "${urls[@]}"; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -L "$u")
  echo "$code $u"
  [[ "$code" =~ ^(200|302)$ ]] || { echo "Missing or not yet published: $u" >&2; exit 1; }
done

Length of output: 795

CDN assets for wllama@2.3.5 are not yet published
The single-thread WASM URL returns 404, so consumers will fail to load the asset. Delay merging until both single-thread/wllama.wasm and multi-thread/wllama.wasm are live on jsdelivr.

🤖 Prompt for AI Agents

In src/wasm-from-cdn.ts around lines 5-6 the entries point to wllama@2.3.5 WASM files that are not yet published on jsdelivr (single-thread URL 404), causing runtime failures; either revert these entries to the previous working CDN/version or replace them with a local/bundled path until jsdelivr publishes both artifacts, or implement a runtime availability check (HEAD/fetch) with a clear fallback and error message so consumers don't crash; update the PR to remove the non-working URLs and add a note to reintroduce the 2.3.5 URLs once both WASM files are live.
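
The runtime availability check mentioned above could look roughly like the following sketch. `findMissingAssets` and the `Probe` type are hypothetical and not part of wllama; the probe is injected so the logic does not depend on a particular HTTP client. In a browser, a caller might pass something like `(url) => fetch(url, { method: 'HEAD' })`.

```typescript
// Minimal probe signature so the sketch does not depend on DOM or Node typings.
type Probe = (url: string) => Promise<{ status: number }>;

// Returns the URLs that did not answer with HTTP 200.
async function findMissingAssets(
  urls: string[],
  probe: Probe
): Promise<string[]> {
  const missing: string[] = [];
  for (const url of urls) {
    try {
      const res = await probe(url);
      if (res.status !== 200) {
        missing.push(url);
      }
    } catch {
      // A network error is treated the same as an unavailable asset.
      missing.push(url);
    }
  }
  return missing;
}
```

A caller can then fall back to another source or raise a clear error when the returned list is not empty, instead of failing later while loading the Wasm binary.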

ngxson: should be ok now

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Checking CDN availability for wllama@2.3.5..."
urls=(
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm"
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm"
)
all_ok=true
for url in "${urls[@]}"; do
  echo "Checking: $url"
  code=$(curl -s -o /dev/null -w '%{http_code}' -L "$url")
  echo " HTTP Status: $code"
  if [[ "$code" == "200" ]]; then
    echo " ✅ Available"
  else
    echo " ❌ Not available (HTTP $code)"
    all_ok=false
  fi
  echo
done
if $all_ok; then
  echo "🎉 All CDN assets are now available!"
else
  echo "⚠️ Some CDN assets are still not available"
fi

Length of output: 1323

@ngxson Perfect! ✅ Both CDN URLs are now returning HTTP 200 and the WASM assets are available:

single-thread/wllama.wasm ✅ Available

multi-thread/wllama.wasm ✅ Available

The version bump to 2.3.5 in src/wasm-from-cdn.ts is now safe to merge. Thanks for publishing the package!

🧠 Learnings used

Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

ngxson and others added 3 commits August 31, 2025 00:41

fix problem with useCache

a2107d6

Co-authored-by: khromov <khromov@users.noreply.github.com>

bump to latest upstream llama.cpp

c723c2b

add api for getting libllama version number

8c6c097

ngxson changed the title ~~Sync with latest stream, fix useCache, add getLibllamaVersion()~~ Sync with latest upstream, fix useCache, add getLibllamaVersion() Aug 30, 2025

coderabbitai Bot reviewed Aug 30, 2025

ngxson added 4 commits August 31, 2025 01:28

correct doc

9dc024c

fix CI

77b448d

v2.3.5

458281f

fix submodule

a4a4e6e

ngxson merged commit 4c9f0bb into master Aug 30, 2025
5 of 6 checks passed

coderabbitai Bot reviewed Aug 30, 2025

coderabbitai Bot mentioned this pull request Oct 6, 2025

sync with llama.cpp upstream #192

Merged

coderabbitai Bot mentioned this pull request Nov 27, 2025

sync upstream llama.cpp (b7179) #194

Merged

coderabbitai Bot mentioned this pull request Mar 7, 2026

Modify emcmake and set emsdk to "latest" #205

Closed

ngxson deleted the xsn/sync25 branch May 23, 2026 11:07

coderabbitai Bot mentioned this pull request May 30, 2026

fix CDN build #244

Merged
