chore(deps): upgrade llama.rn to 0.12.4 #743
Memory profile — iPhone 13 Pro
| Checkpoint | Baseline | Current | Δ | Δ % |
|---|---|---|---|---|
| app_launch | 91.8 MB | 80.8 MB | −11.1 MB | −12.0% |
| models_screen | 91.3 MB | 82.3 MB | −9.0 MB | −9.9% |
| chat_screen | 93.2 MB | 84.7 MB | −8.5 MB | −9.1% |
| model_loaded | 2148.4 MB | 2158.1 MB | +9.7 MB | +0.5% |
| chat_active | 2143.9 MB | 2145.2 MB | +1.3 MB | +0.1% |
| post_chat_idle | 2140.6 MB | 2147.7 MB | +7.1 MB | +0.3% |
| model_unloaded | 140.2 MB | 140.3 MB | +0.1 MB | +0.1% |
| Peak | 2148.4 MB | 2158.1 MB | +9.7 MB | +0.5% |
Verdict: ✅ PASS — peak +0.5% (+9.7 MB on a ~2 GB base), well within the regression threshold. The UI checkpoints are lower than baseline (−8.5 to −11.1 MB across the launch, models, and chat screens). The +9.7 MB at model_loaded is consistent with the framework-size increase from embedding the ggml-metal source into the iOS framework binary (llama.rn #349).
Pixel 9 run pending; will follow up.
Generated by PocketPal Dev Team
Memory profile — Pixel 9
| Checkpoint | Baseline | Current | Δ | Δ % |
|---|---|---|---|---|
| app_launch | 225.6 MB | 272.3 MB | +46.8 MB | +20.7% |
| models_screen | 228.4 MB | 253.3 MB | +25.0 MB | +10.9% |
| chat_screen | 216.9 MB | 259.1 MB | +42.3 MB | +19.5% |
| model_loaded | 1732.6 MB | 1806.5 MB | +73.9 MB | +4.3% |
| chat_active | 1809.7 MB | 1860.4 MB | +50.7 MB | +2.8% |
| post_chat_idle | 1810.6 MB | 1859.9 MB | +49.2 MB | +2.7% |
| model_unloaded | 345.5 MB | 389.9 MB | +44.4 MB | +12.8% |
| Peak | 1810.6 MB | 1860.4 MB | +49.8 MB | +2.8% |
Verdict: ✅ PASS — peak +2.8% (+49.8 MB), well within the >10% AND >200 MB regression threshold. The Δ% on the low-memory UI checkpoints (launch / models / chat / unloaded) exceeds 10%, but the absolute swing is only ~25–47 MB on a roughly 220–350 MB base. That is normal Android jitter at this size, not a real regression, and the AND-condition gate correctly reports PASS. On the loaded checkpoints (model_loaded / chat_active / post_chat_idle), where the working set is dominated by the model itself, the deltas tighten to +2.7–4.3% (+49–74 MB), in line with the small peak change on iPhone (+0.5%). No memory regression is introduced by the llama.rn 0.12.3 → 0.12.4 bump on Android.
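The AND-condition gate quoted in the verdict can be sketched as below. This is a minimal illustration, not the contents of `e2e/scripts/memory-profile.sh`: the function name `is_regression` and its parameters are assumptions, and only the ">10% AND >200 MB" rule and the checkpoint values come from this thread.

```python
def is_regression(baseline_mb: float, current_mb: float,
                  pct_threshold: float = 10.0,
                  abs_threshold_mb: float = 200.0) -> bool:
    """Flag a regression only when BOTH the relative and the absolute delta exceed their thresholds."""
    delta_mb = current_mb - baseline_mb
    delta_pct = 100.0 * delta_mb / baseline_mb
    return delta_pct > pct_threshold and delta_mb > abs_threshold_mb


# Pixel 9 checkpoints from the table above: (baseline MB, current MB)
checkpoints = {
    "app_launch": (225.6, 272.3),
    "model_loaded": (1732.6, 1806.5),
    "Peak": (1810.6, 1860.4),
}
verdicts = {name: is_regression(b, c) for name, (b, c) in checkpoints.items()}
print(verdicts)  # every checkpoint is False, i.e. no regression flagged
```

app_launch is the instructive case: it is above 10% in relative terms, but its absolute delta is far below 200 MB, so the gate does not trip.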
PR-743 (llama.rn 0.12.3 → 0.12.4) — bench results
Smoke + focused matrix, 3 devices, matched-settings (cpu+hex
Coverage
vs PR-740
Summary — median Δ per (device, backend)
Each Δ is the median of per-cell percent changes. Absolutes aren't aggregated here because mixing model/quant cells would mix workloads; see the representative-cell table for real tok/s.
smoke
focused
Representative cell —
smoke
| device | backend | pp PR-743 | pp PR-740 | Δpp | tg PR-743 | tg PR-740 | Δtg |
|---|---|---|---|---|---|---|---|
| poco-myron | cpu | 297.8 | 252.7 | +17.8% | 38.6 | 38.1 | +1.4% |
| poco-myron | gpu | 588.9 | 586.4 | +0.4% | 28.0 | 28.7 | -2.5% |
| poco-myron | hexagon | 890.2 | 727.2 | +22.4% | 30.9 | 31.6 | -2.3% |
| samsung-s23 | cpu | 109.7 | 112.0 | -2.1% | 17.7 | 17.6 | +0.5% |
| samsung-s23 | gpu | 253.8 | 228.6 | +11.0% | 14.3 | 12.4 | +14.9% |
| samsung-s23 | hexagon | 515.6 | 461.7 | +11.7% | 21.2 | 21.1 | +0.5% |
| poco-x7-klee | cpu | 171.2 | 165.8 | +3.2% | 21.3 | 21.9 | -2.6% |
focused
| device | backend | pp PR-743 | pp PR-740 | Δpp | tg PR-743 | tg PR-740 | Δtg |
|---|---|---|---|---|---|---|---|
| poco-myron | cpu | 205.6 | 212.1 | -3.1% | 38.7 | 38.0 | +1.8% |
| poco-myron | gpu | 540.8 | 536.7 | +0.8% | 27.1 | 27.1 | +0.3% |
| poco-myron | hexagon | 831.1 | 720.0 | +15.4% | 32.5 | 30.4 | +6.6% |
| samsung-s23 | cpu | 113.0 | 110.2 | +2.5% | 18.3 | 17.1 | +7.3% |
| samsung-s23 | gpu | 252.2 | 264.3 | -4.6% | 15.8 | 16.6 | -5.2% |
| samsung-s23 | hexagon | 487.4 | 459.9 | +6.0% | 19.9 | 20.9 | -4.7% |
| poco-x7-klee | cpu | 143.2 | 144.5 | -0.9% | 22.0 | 22.3 | -1.3% |
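The Δ columns and the per-(device, backend) medians used in this report can be reproduced with a sketch like the following. The helper names `pct_change` and `median_delta_by_group` are hypothetical and not part of the bench tooling; the two input rows are the poco-myron hexagon pp values from the tables above, grouped together purely for illustration (the actual summary aggregates each tier separately over all matched cells).

```python
from collections import defaultdict
from statistics import median


def pct_change(current: float, baseline: float) -> float:
    """Percent change of `current` relative to `baseline`."""
    return 100.0 * (current - baseline) / baseline


def median_delta_by_group(cells):
    """cells: iterable of (device, backend, current_tok_s, baseline_tok_s).

    Returns the median of the per-cell percent changes for each (device, backend).
    """
    groups = defaultdict(list)
    for device, backend, current, baseline in cells:
        groups[(device, backend)].append(pct_change(current, baseline))
    return {key: median(values) for key, values in groups.items()}


# pp tok/s for poco-myron / hexagon: smoke row, then focused row.
cells = [
    ("poco-myron", "hexagon", 890.2, 727.2),
    ("poco-myron", "hexagon", 831.1, 720.0),
]
print(round(pct_change(890.2, 727.2), 1))  # 22.4, matching the smoke table
result = median_delta_by_group(cells)
```

Taking the median of percent changes, rather than of absolute tok/s, keeps cells with very different workloads (model and quant) comparable.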
Key findings
- Hex on HTP v81 (Myron) recovers the PR-740 regression. Smoke median +16.4% pp, focused median +3.4% pp; representative-cell pp jumps +22.4% smoke / +15.4% focused on `qwen3-1.7b/q4_0`. PR-740's headline finding was a -14% pp walkback on Myron hex; PR-743 takes most of that back, putting Myron hex at roughly +83% pp vs the PR-713 baseline (PR-740 was at +66–77%). Natural candidates: HMX quantized matmul rework (ggml#23368) and the repl optimization in flash-attn softmax (ggml#23455) — both PocketPal-relevant Hexagon items listed in the PR body.
- S23 GPU smoke +9.1% pp / +2.9% tg, with the representative cell at +11.0% pp / +14.9% tg. Likely the OpenCL batch profiling speedup (ggml#23495) and/or backend init refactor (ggml#23318). S23 hex also +11.7% pp on the representative cell. Myron GPU stays flat — same workload but different SoC; the new Adreno MoE generalisation (ggml#23449) is a MoE-only path and our matrix has no MoE models, so no win expected there.
- CPU on Myron is asymmetric: +13.5% smoke / -1.3% focused. This matches the thermal pattern we've documented on this device: smoke runs cool, focused runs warm (it runs second). PR-740's CPU regression was also worse on focused than smoke. Net read: CPU code itself is approximately flat vs PR-740 (the smoke uplift is largely thermal-favourable). Klee CPU and S23 CPU are flat-to-marginal across both tiers.
- Memory: `total_mib` unchanged across the board (all backends, all devices, 0.0% median). No memory regression from the bump on the Android bench, consistent with the +0.5% iOS / +2.8% Android peaks reported separately in the memory-profile comments.
- No real backend fallbacks. Every gpu cell reports `effective_backend=opencl` (label-only rename, same Adreno path) — identical pattern to PR-728 and PR-740. No silent CPU fallbacks anywhere.
Caveats
- Thermal: Myron CPU smoke +13.5% is partly thermal (cool device); the focused -1.3% is the steady-state read. Take CPU deltas with that in mind.
- Coverage gaps: S23 lost 18 focused cells (gpu died at phi-4-mini/q4_0 cell 13/20 = same crash mode as PR-740; hex died at phi-4-mini/q6_k cell 30/40 — new this PR but the matched-vs-PR-740 hex count is unchanged at 14, so the deltas above are computed on the same cell set). Klee lost 5 focused cells (phi-4-mini/q6_k crash + all 4 gemma-4-e2b cells expected to OOM at 7.5 GiB RAM).
- MTP / multimodal not exercised: PR-743 includes several MTP refinements and mtmd changes (DeepSeek-OCR, HunyuanOCR→HunyuanVL merge, WAV MIME). None are measured by this single-prompt non-speculative text bench.
- Apple-only items not measured here: the iOS-side packaging change (ggml-metal embedded in framework) and the Metal concat kernel optimization were exercised via the iOS memory-profile run (separate comment above, peak +0.5%).
- `total_mib` is from `log_signals.memory_buffers` (per AGENTS.md §7). `peak_memory_mb` is omitted from the summary tables because it's a noisy process-RSS sample.
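The coverage caveat relies on computing deltas only over cells that completed in both runs. A minimal sketch of that matching step follows; the function name `matched_deltas`, the cell keys, and the tok/s values are all made up for illustration and are not bench data.

```python
def matched_deltas(current: dict, baseline: dict) -> dict:
    """Percent change per cell, restricted to cells present in BOTH runs.

    Cells missing from either run (for example because a backend crashed
    partway through the matrix) are skipped rather than counted as zero.
    """
    common = current.keys() & baseline.keys()
    return {
        cell: 100.0 * (current[cell] - baseline[cell]) / baseline[cell]
        for cell in sorted(common)
    }


# Hypothetical tok/s values keyed by (model, quant).
baseline_run = {("model-a", "q4_0"): 100.0, ("model-b", "q6_k"): 50.0}
current_run = {("model-a", "q4_0"): 110.0}  # model-b cell did not complete

deltas = matched_deltas(current_run, baseline_run)
print(deltas)  # {('model-a', 'q4_0'): 10.0}
```

Restricting to the intersection is what lets a run with lost cells still be compared against an earlier run without the missing cells skewing the median.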
Recommendation
✅ Safe to merge from a perf standpoint. PR-743 is a clear net win on Hexagon (Myron especially) and S23 GPU, flat-to-slightly-positive everywhere else, with no memory regression. The Hexagon perf walkback that PR-740 introduced on HTP v81 is mostly recovered here. No new crash modes on comparable cells vs PR-740: the S23 hex crash at phi-4-mini/q6_k is new in this PR but falls outside the matched cell set. It is worth a follow-up if it reproduces on subsequent runs, but it is not a blocker for this PR.
Reports on bench host
- `~/bench-bundle/bench-results/pr-743/reports/SUMMARY.md`
- `~/bench-bundle/bench-results/pr-743/reports/divergence-vs-pr-740.md` (full per-cell tables vs PR-740)
- Raw per-backend reports under `~/bench-bundle/bench-results/pr-743/reports/poco-myron-{smoke,focused}-{on,off}.json` etc.
Summary
Bumps `llama.rn` 0.12.3 → 0.12.4. Dependency-only upgrade (package.json + lockfiles). Native iOS + Android builds pass; targeted Jest suites green. Memory profile re-verification on iPhone 13 Pro + Pixel 9 is PENDING (must be run by human via memory-profile skill on physical devices before merge — see Verification below).
Effective llama.cpp range covered by this upgrade: b9254 → b9309 (55 commits) plus one llama.rn-side packaging change.
Changes
- `package.json` — `"llama.rn": "0.12.3"` → `"llama.rn": "0.12.4"`
- `yarn.lock` — regenerated (llama.rn block only, 4+/4-)
- `ios/Podfile.lock` — `llama-rn (0.12.3)` → `llama-rn (0.12.4)`, checksum `2bb735f3…` → `94584e50…`
3 files, 7 insertions / 7 deletions. No consumer code touched; the llama.rn Jest mock surface is version-agnostic.
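A dependency-only bump like this one touches the same version string in several files, so a quick consistency check can be useful. The sketch below is an assumption-laden illustration, not a script from the repo: the function name `llama_rn_versions` is hypothetical, it assumes `llama.rn` sits under `dependencies` in `package.json`, and it works on inline sample text rather than the real files.

```python
import json
import re


def llama_rn_versions(package_json_text: str, podfile_lock_text: str) -> dict:
    """Extract the llama.rn version recorded in each file's text."""
    # Assumption: llama.rn is listed under "dependencies" in package.json.
    pkg_version = json.loads(package_json_text)["dependencies"]["llama.rn"]
    # Matches the "llama-rn (x.y.z)" form quoted in the change list above.
    pod_match = re.search(r"llama-rn \(([^)]+)\)", podfile_lock_text)
    return {
        "package.json": pkg_version,
        "Podfile.lock": pod_match.group(1) if pod_match else None,
    }


# Inline samples standing in for the real files.
sample_package_json = '{"dependencies": {"llama.rn": "0.12.4"}}'
sample_podfile_lock = "PODS:\n  - llama-rn (0.12.4):\n"

versions = llama_rn_versions(sample_package_json, sample_podfile_lock)
consistent = len(set(versions.values())) == 1
print(versions, consistent)
```

In practice the two text arguments would be read from `package.json` and `ios/Podfile.lock`; `yarn.lock` could be added with a similar pattern once its exact entry format is confirmed.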
llama.cpp / llama.rn changelog (PocketPal-relevant)
Scoped to items that touch surfaces PocketPal actually ships. Dropped: server, CUDA, SYCL, Vulkan, WebGPU, ZenDNN, web UI, cmake/ci/docs, perplexity overflow fixes.
Hexagon NPU (Snapdragon)
OpenCL / Adreno
Metal (Apple)
Correctness / crash fixes
- `get_devices_str` (common/speculative : fix nullptr crash in get_devices_str ggml-org/llama.cpp#23386)
- `llm_graph_input_attn_kv_iswa` for SWA-only models — Gemma-3 stability (llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models ggml-org/llama.cpp#23131)
- `kv_type` and `q_type` (quantized-cache correctness) (ggml-webgpu: replace f32 with kv_type and q_type ggml-org/llama.cpp#23372)
Speculative decoding (MTP) — refinements
- `inp_out_ids` to skip logit computation (mtp: use inp_out_ids for skipping logit computation ggml-org/llama.cpp#23433)
Multimodal
- `img_tool::resize` padding refactor (mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor ggml-org/llama.cpp#23345)
Vocab / tokenizer
llama.rn sync points
- `.metal` source now ships inside the framework binary. Transparent to PocketPal's build (verified during iOS Release build below).
Verification
- `yarn install` clean — `yarn.lock` change scoped to llama.rn block (4+/4-)
- `pod install` clean — `Podfile.lock` change scoped to llama-rn pod + checksum
- `yarn ios:build:release` succeeds (~171s, Build Succeeded, `PocketPal.app` produced; no new ggml-metal/metallib warnings — llama.rn#349 packaging change transparent)
- `yarn build:android:release` succeeds (~3m 58s, BUILD SUCCESSFUL, `app-prod-release.aab` ~100 MB produced)
Draft until both memory-profile runs complete and report PASS (regression threshold: >10% AND >200 MB, per `e2e/scripts/memory-profile.sh` convention).
Risk
Dependency-only; mock surface (loadLlamaModelInfo, LlamaContext, completion, bench, getFormattedChat, initMultimodal) is unchanged. No wrapper or version-string edits required — confirms the `quick` classification held. Pattern follows prior llama.rn upgrade PRs: #722 (0.12.0 stable), #728 (0.12.1), #740 (0.12.3).