Sync with latest upstream, fix useCache, add getLibllamaVersion()#189

Merged
ngxson merged 7 commits into master from xsn/sync25
Aug 30, 2025

@ngxson ngxson commented Aug 30, 2025

Owner

Applied the fix from #174 (thanks to @khromov)

Add Wllama.getLibllamaVersion() to get the build number, for example: b6327-4d74393
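
A minimal sketch of how a caller might consume this string. It assumes the format is always `b<build number>-<short commit hash>`, which is inferred only from the single example above; `parseLibllamaVersion` and `LibllamaVersion` are hypothetical names, not part of wllama.

```typescript
// Hypothetical helper (not part of wllama): splits a version string such as
// "b6327-4d74393" into its build number and short commit hash.
interface LibllamaVersion {
  build: number;
  commit: string;
}

function parseLibllamaVersion(version: string): LibllamaVersion | null {
  // Assumed format: "b" + decimal build number + "-" + lowercase hex hash.
  const match = /^b(\d+)-([0-9a-f]+)$/.exec(version);
  if (match === null) {
    return null; // Unknown format: let the caller decide what to do.
  }
  return { build: Number(match[1]), commit: match[2] };
}

// In an application the input would come from Wllama.getLibllamaVersion().
const parsedVersion = parseLibllamaVersion('b6327-4d74393');
console.log(parsedVersion);
```

Returning `null` for an unrecognized string keeps the caller from depending on a format that this PR does not formally specify.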

Summary by CodeRabbit

  • New Features

    • Added API to retrieve the bundled library version.
    • Worker build now embeds a human-readable library version string.
  • Bug Fixes

    • Retrieving current status no longer clears in-memory token history.
  • Refactor

    • Flash-attention configuration switched to explicit modes (auto/disabled).
  • Chores

    • Updated model backend revision and bumped package version.
    • CDN Wasm URLs updated to the new release.
    • CI checkout adjusted to fetch full history and submodules.

ngxson and others added 3 commits August 31, 2025 00:41
Co-authored-by: khromov <khromov@users.noreply.github.com>

coderabbitai Bot commented Aug 30, 2025

Contributor

Walkthrough

The PR switches flash attention from a boolean to an enum in load handling, makes status reporting copy tokens instead of moving them, updates the llama.cpp submodule, embeds a libllama build/version string into generated worker code and exposes it via a new Wllama getter, and adds sentinel behavior to KV cache removal.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Flash attention & status handling (C++): cpp/actions.hpp | Replaces boolean flash_attn with enum cparams.flash_attn_type (maps to LLAMA_FLASH_ATTN_TYPE_AUTO or LLAMA_FLASH_ATTN_TYPE_DISABLED) during model load; action_current_status now copies app.tokens into res.tokens.arr instead of moving. |
| Submodule update: llama.cpp | Advances the llama.cpp submodule pointer (820de57… → 4d74393…). |
| Build/version metadata & exposure: scripts/build_worker.sh, src/workers-code/generated.ts, src/wllama.ts | Build script computes BUILD_NUMBER and SHORT_HASH from llama.cpp and writes LIBLLAMA_VERSION into generated.ts; Wllama imports it and adds static getLibllamaVersion(): string. |
| KV cache behavior & minor TS changes: src/wllama.ts | kvRemove treats nDiscard < 0 as sentinel to set nCachedTokens = nKeep; otherwise decrements by nDiscard. Also adds/clarifies inline comment on flash_attn semantics. |
| Generated worker code & CDN bump: src/workers-code/generated.ts, src/wasm-from-cdn.ts, package.json | Adds exported LIBLLAMA_VERSION constant (e.g., 'b6327-4d74393'); bumps package version (2.3.4 → 2.3.5) and updates CDN wasm URLs to the new version. |
| CI checkout behavior: .github/workflows/verify-generated-code.yml | Checkout step now configures actions/checkout@v4 with fetch-depth: 0 and submodules: 'true' to ensure full history and submodule init. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Wllama
  participant LlamaC as "llama.cpp C API"

  rect rgb(235,245,255)
  note over Client,Wllama: Load model with flash attention flag
  Client->>Wllama: loadModel({ flash_attn: true|false })
  Wllama->>Wllama: map boolean -> enum (AUTO or DISABLED)
  Wllama->>LlamaC: create context with cparams.flash_attn_type
  LlamaC-->>Wllama: context created
  Wllama-->>Client: ready
  end
sequenceDiagram
  autonumber
  participant Caller
  participant Wllama

  rect rgb(240,255,240)
  note over Caller,Wllama: KV removal with sentinel support
  Caller->>Wllama: kvRemove(nKeep, nDiscard)
  alt nDiscard < 0
    Wllama->>Wllama: nCachedTokens = nKeep
  else nDiscard >= 0
    Wllama->>Wllama: nCachedTokens -= nDiscard
  end
  Wllama-->>Caller: done
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

I thump my paws at version’s gleam,
b6327 hops through build-stream.
Flash flips to enum, neat and bright,
KV trims tails in moonlit night.
Tokens copied, none shall flee—🐇✨

@ngxson ngxson changed the title from "Sync with latest stream, fix useCache, add getLibllamaVersion()" to "Sync with latest upstream, fix useCache, add getLibllamaVersion()" on Aug 30, 2025

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/wllama.ts (2)

119-120: Consider future-proofing flash_attn config.

Underlying C++ now uses an enum; keeping a boolean here is fine for AUTO/DISABLED, but you may want a string union (e.g., 'auto' | 'disabled') later to expose more modes without another breaking change.
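
One way the suggested string union could coexist with the current boolean is sketched below. `FlashAttnMode`, `FlashAttnOption` and `normalizeFlashAttn` are hypothetical names for illustration and do not exist in wllama; the true → auto and false → disabled mapping follows the description of the C++ change in this PR.

```typescript
// Hypothetical types (not part of wllama): accept the existing boolean
// as well as named modes, so further modes could be added later.
type FlashAttnMode = 'auto' | 'disabled';
type FlashAttnOption = boolean | FlashAttnMode;

function normalizeFlashAttn(
  option: FlashAttnOption | undefined
): FlashAttnMode | undefined {
  if (option === undefined) {
    return undefined; // Nothing set: leave the backend default untouched.
  }
  if (typeof option === 'boolean') {
    return option ? 'auto' : 'disabled';
  }
  return option; // Already a named mode.
}

console.log(normalizeFlashAttn(true)); // auto
console.log(normalizeFlashAttn(false)); // disabled
console.log(normalizeFlashAttn('disabled')); // disabled
```

Keeping the boolean accepted means existing callers would not break if a named-mode option were introduced.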


1208-1213: Clamp nCachedTokens to avoid negative values.

When nDiscard is positive, subtracting blindly can underflow in edge cases. Clamp to at least nKeep.

Apply this diff:

-    if (nDiscard < 0) {
-      this.nCachedTokens = nKeep;
-    } else {
-      this.nCachedTokens -= nDiscard;
-    }
+    if (nDiscard < 0) {
+      this.nCachedTokens = nKeep;
+    } else {
+      this.nCachedTokens = Math.max(nKeep, this.nCachedTokens - nDiscard);
+    }
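
The clamped variant from the diff above can be exercised in isolation as a pure function. This is a sketch of the reviewer's suggestion only, not necessarily the code that was merged; `nextCachedTokens` is a hypothetical name.

```typescript
// Hypothetical pure-function form of the suggested update to nCachedTokens.
function nextCachedTokens(
  nCachedTokens: number,
  nKeep: number,
  nDiscard: number
): number {
  if (nDiscard < 0) {
    // Sentinel: everything after the first nKeep tokens is discarded.
    return nKeep;
  }
  // Never report fewer cached tokens than the nKeep prefix.
  return Math.max(nKeep, nCachedTokens - nDiscard);
}

console.log(nextCachedTokens(100, 10, 30)); // 70
console.log(nextCachedTokens(100, 10, -1)); // 10 (sentinel)
console.log(nextCachedTokens(20, 10, 50)); // 10 (clamped instead of -30)
```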
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c267097 and 8c6c097.

⛔ Files ignored due to path filters (2)
  • src/multi-thread/wllama.wasm is excluded by !**/*.wasm
  • src/single-thread/wllama.wasm is excluded by !**/*.wasm
📒 Files selected for processing (5)
  • cpp/actions.hpp (2 hunks)
  • llama.cpp (1 hunks)
  • scripts/build_worker.sh (1 hunks)
  • src/wllama.ts (4 hunks)
  • src/workers-code/generated.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-18T08:24:14.434Z
Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

Applied to files:

  • src/workers-code/generated.ts
  • scripts/build_worker.sh
🧬 Code graph analysis (1)
src/wllama.ts (1)
src/workers-code/generated.ts (1)
  • LIBLLAMA_VERSION (4-4)
🪛 GitHub Actions: Verify generated worker code is up-to-date
src/workers-code/generated.ts

[error] 1-1: Generated code file is not up-to-date. Changes not staged for commit: 'src/workers-code/generated.ts'. Run 'npm run build:worker' to update.

🔇 Additional comments (6)
llama.cpp (1)

1-1: Submodule bump to 4d74393 validated

  • LIBLLAMA_VERSION in generated.ts is “b6327-4d74393”, matching the llama.cpp gitlink (4d74393).
  • C++ glue in cpp/actions.hpp converts the old flash_attn boolean to the new flash_attn_type enum.
src/workers-code/generated.ts (1)

4-4: Commit updated auto-generated worker file
The diff shows LIBLLAMA_VERSION updated to 'b159-8c6c097'; please commit the regenerated src/workers-code/generated.ts.

cpp/actions.hpp (2)

205-205: Boolean → enum mapping for flash attention looks correct.

Mapping true → LLAMA_FLASH_ATTN_TYPE_AUTO and false → LLAMA_FLASH_ATTN_TYPE_DISABLED aligns with llama.cpp’s enum. Guarded by not_null(), so defaults aren’t overridden unintentionally.


778-778: Good fix: copy tokens instead of moving them.

Returning a copy preserves app.tokens for subsequent calls to status-sensitive APIs.
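
The difference between the two behaviors can be illustrated with a TypeScript analogue. The real change is in C++ (cpp/actions.hpp), whose code is not shown in this thread; `TokenHistory` and its methods are hypothetical names used only to show why moving empties the history while copying preserves it.

```typescript
// TypeScript analogue only; the actual fix lives in the C++ glue code.
class TokenHistory {
  private tokens: number[] = [];

  append(token: number): void {
    this.tokens.push(token);
  }

  // Analogue of moving: the caller gets the array, the history is left empty.
  statusByMove(): number[] {
    const out = this.tokens;
    this.tokens = [];
    return out;
  }

  // Analogue of copying: the caller gets a snapshot, the history is intact.
  statusByCopy(): number[] {
    return [...this.tokens];
  }

  get length(): number {
    return this.tokens.length;
  }
}

const tokenHistory = new TokenHistory();
[1, 2, 3].forEach((t) => tokenHistory.append(t));

tokenHistory.statusByCopy();
console.log(tokenHistory.length); // 3

tokenHistory.statusByMove();
console.log(tokenHistory.length); // 0
```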

src/wllama.ts (1)

34-34: Import of LIBLLAMA_VERSION is appropriate.

This cleanly enables the public getter without altering existing initialization paths.

scripts/build_worker.sh (1)

7-16: BUILD_NUMBER computation and script robustness confirmed
Script now sets set -euo pipefail, quotes all paths, detects and deepens shallow llama.cpp clones before computing BUILD_NUMBER, and produces a stable LIBLLAMA_VERSION; regenerated file shows only the expected version bump with no additional drift.

Comment thread src/wllama.ts
@ngxson ngxson merged commit 4c9f0bb into master Aug 30, 2025
5 of 6 checks passed

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6c097 and a4a4e6e.

📒 Files selected for processing (4)
  • .github/workflows/verify-generated-code.yml (1 hunks)
  • package.json (1 hunks)
  • src/wasm-from-cdn.ts (1 hunks)
  • src/wllama.ts (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • package.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/wllama.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-18T08:24:14.434Z
Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

Applied to files:

  • src/wasm-from-cdn.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (ubuntu-latest, 22)

Comment thread .github/workflows/verify-generated-code.yml
Comment thread src/wasm-from-cdn.ts
Comment on lines +5 to +6
'single-thread/wllama.wasm': 'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm',
'multi-thread/wllama.wasm': 'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm',

@coderabbitai coderabbitai Bot Aug 30, 2025

Contributor

💡 Verification agent

🧩 Analysis chain

CDN version bump to 2.3.5 looks good

Links are consistent with the package bump.

Run this quick check to ensure the CDN assets for 2.3.5 are live:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
urls=(
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm"
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm"
)
for u in "${urls[@]}"; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -L "$u")
  echo "$code  $u"
  [[ "$code" =~ ^(200|302)$ ]] || { echo "Missing or not yet published: $u" >&2; exit 1; }
done

Length of output: 795


CDN assets for wllama@2.3.5 are not yet published
The single-thread WASM URL returns 404, so consumers will fail to load the asset. Delay merging until both single-thread/wllama.wasm and multi-thread/wllama.wasm are live on jsdelivr.

🤖 Prompt for AI Agents
In src/wasm-from-cdn.ts around lines 5-6 the entries point to wllama@2.3.5 WASM
files that are not yet published on jsdelivr (single-thread URL 404), causing
runtime failures; either revert these entries to the previous working
CDN/version or replace them with a local/bundled path until jsdelivr publishes
both artifacts, or implement a runtime availability check (HEAD/fetch) with a
clear fallback and error message so consumers don't crash; update the PR to
remove the non-working URLs and add a note to reintroduce the 2.3.5 URLs once
both WASM files are live.
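
The runtime availability check mentioned in the prompt could look roughly like the sketch below. `FetchLike` and `findMissingAssets` are hypothetical names, not wllama APIs; the fetch function is injected so the example runs offline with a stub. The stub's 404 for the single-thread file only mimics the failure reported above.

```typescript
// Minimal shape of the fetch function this sketch needs (assumption).
type FetchLike = (
  url: string,
  init: { method: string }
) => Promise<{ ok: boolean; status: number }>;

// Returns the URLs that did not answer a HEAD request with a 2xx status.
async function findMissingAssets(
  urls: string[],
  fetchFn: FetchLike
): Promise<string[]> {
  const missing: string[] = [];
  for (const url of urls) {
    try {
      const res = await fetchFn(url, { method: 'HEAD' });
      if (!res.ok) {
        missing.push(url);
      }
    } catch {
      // Network errors are treated the same as a missing asset.
      missing.push(url);
    }
  }
  return missing;
}

// Offline stub standing in for the real fetch.
const stubFetch: FetchLike = async (url) => {
  const ok = !url.includes('single-thread');
  return { ok, status: ok ? 200 : 404 };
};

findMissingAssets(
  [
    'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm',
    'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm',
  ],
  stubFetch
).then((missing) => console.log('missing:', missing));
```

In a browser or a recent Node.js runtime, the global `fetch` could be passed in place of `stubFetch`; a non-empty result would be the point at which to fall back to a previous version or show a clear error.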

Owner Author

should be ok now

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking CDN availability for wllama@2.3.5..."
urls=(
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm"
  "https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm"
)

all_ok=true
for url in "${urls[@]}"; do
  echo "Checking: $url"
  code=$(curl -s -o /dev/null -w '%{http_code}' -L "$url")
  echo "  HTTP Status: $code"
  
  if [[ "$code" == "200" ]]; then
    echo "  ✅ Available"
  else
    echo "  ❌ Not available (HTTP $code)"
    all_ok=false
  fi
  echo
done

if $all_ok; then
  echo "🎉 All CDN assets are now available!"
else
  echo "⚠️  Some CDN assets are still not available"
fi

Length of output: 1323


@ngxson Perfect! ✅ Both CDN URLs are now returning HTTP 200 and the WASM assets are available:

  • single-thread/wllama.wasm ✅ Available
  • multi-thread/wllama.wasm ✅ Available

The version bump to 2.3.5 in src/wasm-from-cdn.ts is now safe to merge. Thanks for publishing the package!

    /|   /|   
   ( :v:  )
    |(_)|

🧠 Learnings used
Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

@ngxson ngxson deleted the xsn/sync25 branch May 23, 2026 11:07
@coderabbitai coderabbitai Bot mentioned this pull request May 30, 2026