Sync with latest upstream, fix useCache, add getLibllamaVersion() #189
Co-authored-by: khromov <khromov@users.noreply.github.com>
Walkthrough
The PR switches flash attention from a boolean to an enum in load handling, makes status reporting copy tokens instead of moving them, updates the llama.cpp submodule, embeds a libllama build/version string into generated worker code and exposes it via a new Wllama getter, and adds sentinel behavior to KV cache removal.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Wllama
participant LlamaC as "llama.cpp C API"
rect rgb(235,245,255)
note over Client,Wllama: Load model with flash attention flag
Client->>Wllama: loadModel({ flash_attn: true|false })
Wllama->>Wllama: map boolean -> enum (AUTO or DISABLED)
Wllama->>LlamaC: create context with cparams.flash_attn_type
LlamaC-->>Wllama: context created
Wllama-->>Client: ready
end
sequenceDiagram
autonumber
participant Caller
participant Wllama
rect rgb(240,255,240)
note over Caller,Wllama: KV removal with sentinel support
Caller->>Wllama: kvRemove(nKeep, nDiscard)
alt nDiscard < 0
Wllama->>Wllama: nCachedTokens = nKeep
else nDiscard >= 0
Wllama->>Wllama: nCachedTokens -= nDiscard
end
Wllama-->>Caller: done
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/wllama.ts (2)
119-120: Consider future-proofing flash_attn config.
The underlying C++ now uses an enum; keeping a boolean here is fine for AUTO/DISABLED, but you may want a string union (e.g., 'auto' | 'disabled') later to expose more modes without another breaking change.
1208-1213: Clamp nCachedTokens to avoid negative values.
When nDiscard is positive, subtracting blindly can underflow in edge cases. Clamp to at least nKeep.
Apply this diff:
-    if (nDiscard < 0) {
-      this.nCachedTokens = nKeep;
-    } else {
-      this.nCachedTokens -= nDiscard;
-    }
+    if (nDiscard < 0) {
+      this.nCachedTokens = nKeep;
+    } else {
+      this.nCachedTokens = Math.max(nKeep, this.nCachedTokens - nDiscard);
+    }
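The suggested update can be tried in isolation with a small sketch. The function name `nextCachedTokens` is hypothetical and not part of wllama; it only models the counter arithmetic from the diff above, assuming a negative `nDiscard` is the sentinel for "discard everything after `nKeep`".

```typescript
// Hypothetical helper, not part of wllama: computes the cached-token count
// that should remain after a kvRemove(nKeep, nDiscard) call.
function nextCachedTokens(
  nCachedTokens: number,
  nKeep: number,
  nDiscard: number
): number {
  if (nDiscard < 0) {
    // Sentinel: a negative nDiscard means everything after nKeep is dropped.
    return nKeep;
  }
  // Clamp so that discarding more tokens than are cached never goes below nKeep.
  return Math.max(nKeep, nCachedTokens - nDiscard);
}

console.log(nextCachedTokens(100, 10, -1)); // 10
console.log(nextCachedTokens(100, 10, 30)); // 70
console.log(nextCachedTokens(20, 10, 30)); // 10
```

The sketch assumes `nKeep` does not exceed the current count; if it could, the clamp would report more cached tokens than actually exist, so a caller may want to validate that separately.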
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)
- src/multi-thread/wllama.wasm is excluded by !**/*.wasm
- src/single-thread/wllama.wasm is excluded by !**/*.wasm
📒 Files selected for processing (5)
- cpp/actions.hpp (2 hunks)
- llama.cpp (1 hunks)
- scripts/build_worker.sh (1 hunks)
- src/wllama.ts (4 hunks)
- src/workers-code/generated.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-18T08:24:14.434Z
Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.
Applied to files:
- src/workers-code/generated.ts
- scripts/build_worker.sh
🧬 Code graph analysis (1)
src/wllama.ts (1)
src/workers-code/generated.ts (1)
LIBLLAMA_VERSION (4-4)
🪛 GitHub Actions: Verify generated worker code is up-to-date
src/workers-code/generated.ts
[error] 1-1: Generated code file is not up-to-date. Changes not staged for commit: 'src/workers-code/generated.ts'. Run 'npm run build:worker' to update.
🔇 Additional comments (6)
llama.cpp (1)
1-1: Submodule bump to 4d74393 validated
- LIBLLAMA_VERSION in generated.ts is “b6327-4d74393”, matching the llama.cpp gitlink (4d74393).
- C++ glue in cpp/actions.hpp converts the old flash_attn boolean to the new flash_attn_type enum.
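The first check above (the version string agreeing with the submodule commit) can be sketched as follows. The helper names are hypothetical, and the format `b<build number>-<short commit hash>` is an assumption inferred from the example strings in this review, not a documented contract.

```typescript
// Hypothetical helpers, not part of wllama. Assumed format:
// "b<build number>-<abbreviated commit hash>", e.g. "b6327-4d74393".
interface LibllamaVersion {
  buildNumber: number;
  commit: string;
}

function parseLibllamaVersion(version: string): LibllamaVersion | null {
  const match = /^b(\d+)-([0-9a-f]+)$/.exec(version);
  if (match === null) {
    return null;
  }
  return { buildNumber: Number(match[1]), commit: match[2] };
}

function matchesSubmodule(version: string, gitlink: string): boolean {
  const parsed = parseLibllamaVersion(version);
  if (parsed === null) {
    return false;
  }
  // The embedded hash is abbreviated, so compare it as a prefix of the gitlink.
  return gitlink.slice(0, parsed.commit.length) === parsed.commit;
}

console.log(matchesSubmodule('b6327-4d74393', '4d74393')); // true
console.log(matchesSubmodule('b159-8c6c097', '4d74393')); // false
```

A check like this could run in CI next to the generated-code verification, so that a stale `generated.ts` is caught by a hash mismatch rather than by manual inspection.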
src/workers-code/generated.ts (1)
4-4: Commit updated auto-generated worker file
The diff shows LIBLLAMA_VERSION updated to 'b159-8c6c097'; please commit the regenerated src/workers-code/generated.ts.
cpp/actions.hpp (2)
205-205: Boolean → enum mapping for flash attention looks correct.
Mapping true → LLAMA_FLASH_ATTN_TYPE_AUTO and false → LLAMA_FLASH_ATTN_TYPE_DISABLED aligns with llama.cpp’s enum. Guarded by not_null(), so defaults aren’t overridden unintentionally.
778-778: Good fix: copy tokens instead of moving them.
Returning a copy preserves app.tokens for subsequent calls to status-sensitive APIs.
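For readers following along from the TypeScript side, the boolean-to-enum mapping described for line 205 can be modelled roughly as below. This is an illustrative mirror of the described behavior, not the actual C++ glue: `ContextParams` and `applyFlashAttn` are hypothetical names, and the string labels stand in for the enum members without claiming their numeric values.

```typescript
// Labels mirror the two enum members named in the review comment; the string
// values are illustrative and are not the values used by llama.cpp.
type FlashAttnType =
  | 'LLAMA_FLASH_ATTN_TYPE_AUTO'
  | 'LLAMA_FLASH_ATTN_TYPE_DISABLED';

// Hypothetical stand-in for the context parameters set during model load.
interface ContextParams {
  flash_attn_type?: FlashAttnType;
}

function applyFlashAttn(
  cparams: ContextParams,
  flashAttn?: boolean
): ContextParams {
  if (flashAttn === undefined) {
    // Option not supplied: leave the existing default untouched,
    // which models the not_null() guard mentioned in the review.
    return cparams;
  }
  return {
    ...cparams,
    flash_attn_type: flashAttn
      ? 'LLAMA_FLASH_ATTN_TYPE_AUTO'
      : 'LLAMA_FLASH_ATTN_TYPE_DISABLED',
  };
}

console.log(applyFlashAttn({}, true).flash_attn_type); // LLAMA_FLASH_ATTN_TYPE_AUTO
console.log(applyFlashAttn({}, false).flash_attn_type); // LLAMA_FLASH_ATTN_TYPE_DISABLED
```

Note that `true` maps to AUTO rather than to a forced-on mode, so the runtime still decides whether flash attention is actually used.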
src/wllama.ts (1)
34-34: Import of LIBLLAMA_VERSION is appropriate.
This cleanly enables the public getter without altering existing initialization paths.
scripts/build_worker.sh (1)
7-16: BUILD_NUMBER computation and script robustness confirmed
Script now sets set -euo pipefail, quotes all paths, detects and deepens shallow llama.cpp clones before computing BUILD_NUMBER, and produces a stable LIBLLAMA_VERSION; regenerated file shows only the expected version bump with no additional drift.
Actionable comments posted: 2
📜 Review details
📒 Files selected for processing (4)
- .github/workflows/verify-generated-code.yml (1 hunks)
- package.json (1 hunks)
- src/wasm-from-cdn.ts (1 hunks)
- src/wllama.ts (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- package.json
🚧 Files skipped from review as they are similar to previous changes (1)
- src/wllama.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-18T08:24:14.434Z
Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.
Applied to files:
src/wasm-from-cdn.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (ubuntu-latest, 22)
'single-thread/wllama.wasm': 'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm',
'multi-thread/wllama.wasm': 'https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm',
💡 Verification agent
🧩 Analysis chain
CDN version bump to 2.3.5 looks good
Links are consistent with the package bump.
Run this quick check to ensure the CDN assets for 2.3.5 are live:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
urls=(
"https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm"
"https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm"
)
for u in "${urls[@]}"; do
code=$(curl -s -o /dev/null -w '%{http_code}' -L "$u")
echo "$code $u"
[[ "$code" =~ ^(200|302)$ ]] || { echo "Missing or not yet published: $u" >&2; exit 1; }
done
Length of output: 795
CDN assets for wllama@2.3.5 are not yet published
The single-thread WASM URL returns 404, so consumers will fail to load the asset. Delay merging until both single-thread/wllama.wasm and multi-thread/wllama.wasm are live on jsdelivr.
🤖 Prompt for AI Agents
In src/wasm-from-cdn.ts around lines 5-6 the entries point to wllama@2.3.5 WASM
files that are not yet published on jsdelivr (single-thread URL 404), causing
runtime failures; either revert these entries to the previous working
CDN/version or replace them with a local/bundled path until jsdelivr publishes
both artifacts, or implement a runtime availability check (HEAD/fetch) with a
clear fallback and error message so consumers don't crash; update the PR to
remove the non-working URLs and add a note to reintroduce the 2.3.5 URLs once
both WASM files are live.
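The runtime availability check with a fallback, mentioned as one option above, might look like the following sketch. `firstAvailableUrl`, `HeadResult`, and `HeadFn` are hypothetical names that do not exist in wllama; the request function is injected so the logic can be exercised without network access.

```typescript
// Minimal response shape needed here; a fetch Response also satisfies it.
interface HeadResult {
  ok: boolean;
  status: number;
}

// Injected request function. In a browser this could be
// (url) => fetch(url, { method: 'HEAD' }).
type HeadFn = (url: string) => Promise<HeadResult>;

// Hypothetical helper: returns the first candidate URL that answers with a
// success status, or throws an error listing every failed attempt.
async function firstAvailableUrl(
  candidates: string[],
  head: HeadFn
): Promise<string> {
  const failures: string[] = [];
  for (const url of candidates) {
    try {
      const res = await head(url);
      if (res.ok) {
        return url;
      }
      failures.push(`${url} (HTTP ${res.status})`);
    } catch (err) {
      // Network-level failure (DNS, CORS, offline): record it and try the next.
      failures.push(`${url} (${String(err)})`);
    }
  }
  throw new Error(`No WASM asset available. Tried: ${failures.join(', ')}`);
}
```

The candidate list would typically start with the new CDN URL, followed by a previously published version or a bundled path. The extra HEAD request adds a round trip before loading, which is the cost of getting a clear error instead of a failed WASM fetch.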
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking CDN availability for wllama@2.3.5..."
urls=(
"https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/single-thread/wllama.wasm"
"https://cdn.jsdelivr.net/npm/@wllama/wllama@2.3.5/src/multi-thread/wllama.wasm"
)
all_ok=true
for url in "${urls[@]}"; do
echo "Checking: $url"
code=$(curl -s -o /dev/null -w '%{http_code}' -L "$url")
echo " HTTP Status: $code"
if [[ "$code" == "200" ]]; then
echo " ✅ Available"
else
echo " ❌ Not available (HTTP $code)"
all_ok=false
fi
echo
done
if $all_ok; then
echo "🎉 All CDN assets are now available!"
else
echo "⚠️ Some CDN assets are still not available"
fi
Length of output: 1323
@ngxson Perfect! ✅ Both CDN URLs are now returning HTTP 200 and the WASM assets are available:
- single-thread/wllama.wasm ✅ Available
- multi-thread/wllama.wasm ✅ Available
The version bump to 2.3.5 in src/wasm-from-cdn.ts is now safe to merge. Thanks for publishing the package!
Applied the fix from #174 (thanks to @khromov)
Add Wllama.getLibllamaVersion() to get the build number, for example: b6327-4d74393