Bump llama.cpp to 52fb93a2b (30 commits) by nyo16 · Pull Request #42 · nyo16/llama_cpp_ex

nyo16 · 2026-05-21T17:43:20Z

Summary

Bumps vendor/llama.cpp from b28a2f372 → 52fb93a2b (30 upstream commits).
No public API changes. Existing NIF and LlamaCppEx.MTP bindings continue to work unchanged.
README/MTP docs unchanged: Metal MTP optimization (#23114) is still unmerged upstream, and the documented LlamaCppEx.MTP API surface is unaffected.

Headlines

MTP / speculative

Backend sampling for the MTP draft path (#23287): adds a backend_sampling field to common_params_speculative_draft (default true, additive). Our NIF doesn't touch this field, so behavior is unchanged.
Skip logit computation via inp_out_ids (#23433): internal optimization on the MTP draft path.
Fix nullptr crash in common_speculative_get_devices_str (#23386).
Server: free draft/MTP resources on slot sleep (#23461). Fixes an upstream VRAM leak (not user-facing for our binding, but good hygiene if the server code is ever reused).
Doc typo fix (#23435).
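
To illustrate why a defaulted field such as backend_sampling is additive for a caller that never sets it, here is a minimal C++ sketch. Only the field name and its default of true come from this PR description; the struct name `draft_params_sketch`, the `n_max` field, and the helper `make_draft_params` are hypothetical stand-ins and do not reflect the real upstream struct layout or our NIF code.

```cpp
// Hypothetical stand-in for an upstream params struct. Only the
// `backend_sampling` name and its default (true) are taken from the PR text.
struct draft_params_sketch {
    int  n_max            = 16;   // hypothetical pre-existing field
    bool backend_sampling = true; // new field with a default member initializer
};

// Hypothetical binding-side helper, written before the new field existed.
// It sets only the fields it knows about and is not edited for the bump.
draft_params_sketch make_draft_params(int n_max) {
    draft_params_sketch p;  // default member initializers apply to every field
    p.n_max = n_max;        // caller overrides only what it already used
    return p;
}
```

Calling `make_draft_params(8)` yields a struct whose `backend_sampling` is true without the helper mentioning it. Under these assumptions, a binding that default-constructs the struct picks up the upstream default after a rebuild, with no source changes on its side.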

Other notable upstream changes

llama: null-buffer crash fix in llm_graph_input_attn_kv_iswa on SWA-only models (#23131).
vocab: Carbon-3B HybridDNATokenizer support (#23410).
server: re-inject subcommand when the router spawns children under the unified binary (#23442).
app: introduce the llama unified executable (#23296); add batched-bench, fit-params, quantize, perplexity subcommands (#23459); show version (#23426).
mtmd: merge HunyuanOCR into HunyuanVL + OCR vision precision fix (#23329); DeepSeek-OCR image-processing fixes + img_tool::resize padding refactor (#23345); fit_params accounts for mmproj (#21489); WAV MIME-type variants + audio format detection (#23396).
ggml: correct iface-method check before 2D-get fallback (#23306).
metal: optimize pad + cpy (#23354).
CUDA: Programmatic Dependent Launch (PDL) for Hopper+ (#22522); RDNA3 Q6_K MMVQ nwarps tuning (#23349).
vulkan: optimize IM2COL shader (#22685).
opencl: refactor backend initialization (#23318).
hexagon: ssm-conv fix for large prompts (#23307); HMX quantized matmul rework (#23368).
snapdragon: toolchain v0.6 (#23369).
webui: max image size option (#22849); reactive isMobile in viewport store (#23330); pointer-events fix on hidden div wrapper (#23390); text attachments before message content in chat-completions payload (#23406); improved UI dev git hooks (#23403).
docker: copy conversion files (#23370).

Test plan

NIF rebuilds cleanly against 52fb93a2b (forced full rebuild on macOS / Metal backend).
mix test → 89 passed, 4 skipped.
CI green on Linux + macOS workflows.
Smoke-test LlamaCppEx.MTP.stream/3 against a Qwen 3.6 MTP GGUF locally (optional — internal MTP changes are additive but worth a sanity check before release).

No public API changes — NIF and LlamaCppEx.MTP bindings continue to work unchanged. Headlines: backend sampling for MTP draft path (#23287, additive `backend_sampling` default-on), MTP logit-skip optimization (#23433), nullptr crash fix in common_speculative (#23386), server slot sleep VRAM leak fix (#23461). Plus assorted mtmd, server unified-binary, vulkan/cuda/metal kernel improvements. See CHANGELOG.md for the full breakdown.

nyo16 merged commit 65950ef into master May 22, 2026
4 checks passed

nyo16 deleted the bump-llama-cpp-52fb93a2b branch May 22, 2026 09:39
