Bump llama.cpp to 52fb93a2b (30 commits) #42

Merged: nyo16 merged 1 commit into master from bump-llama-cpp-52fb93a2b on May 22, 2026.

nyo16 (Owner) commented May 21, 2026

Summary

  • Bumps vendor/llama.cpp from b28a2f372 to 52fb93a2b (30 upstream commits).
  • No public API changes. Existing NIF and LlamaCppEx.MTP bindings continue to work unchanged.
  • README and MTP docs are unchanged: the Metal MTP optimization (#23114) is still unmerged upstream, and the documented
    LlamaCppEx.MTP API surface is unaffected.

Headlines

MTP / speculative

  • Backend sampling for the MTP draft path (#23287) — adds a backend_sampling field to common_params_speculative_draft (default
    true, additive). Our NIF doesn't touch this field, so behavior is unchanged.
  • Skip logit computation via inp_out_ids (#23433) — internal optimization on the MTP draft path.
  • Fix nullptr crash in common_speculative_get_devices_str (#23386).
  • Server: free draft/MTP resources on slot sleep (#23461) — fixes an upstream VRAM leak (not user-facing for our binding, but good
    hygiene if the server code is ever reused).
  • Doc typo fix (#23435).

Other notable upstream changes

  • llama: null-buffer crash fix in llm_graph_input_attn_kv_iswa on SWA-only models (#23131).
  • vocab: Carbon-3B HybridDNATokenizer support (#23410).
  • server: re-inject subcommand when the router spawns children under the unified binary (#23442).
  • app: introduce the llama unified executable (#23296); add batched-bench, fit-params, quantize, perplexity subcommands
    (#23459); show version (#23426).
  • mtmd: merge HunyuanOCR into HunyuanVL + OCR vision precision fix (#23329); DeepSeek-OCR image-processing fixes +
    img_tool::resize padding refactor (#23345); fit_params accounts for mmproj
    (#21489); WAV MIME-type variants + audio format detection (#23396).
  • ggml: correct iface-method check before 2D-get fallback (#23306).
  • metal: optimize pad + cpy (#23354).
  • CUDA: Programmatic Dependent Launch (PDL) for Hopper+ (#22522); RDNA3 Q6_K MMVQ nwarps tuning
    (#23349).
  • vulkan: optimize IM2COL shader (#22685).
  • opencl: refactor backend initialization (#23318).
  • hexagon: ssm-conv fix for large prompts (#23307); HMX quantized matmul rework
    (#23368).
  • snapdragon: toolchain v0.6 (#23369).
  • webui: max image size option (#22849); reactive isMobile in viewport store
    (#23330); pointer-events fix on hidden div wrapper (#23390); text attachments
    before message content in chat-completions payload (#23406); improved UI dev git hooks
    (#23403).
  • docker: copy conversion files (#23370).

Test plan

  • NIF rebuilds cleanly against 52fb93a2b (forced full rebuild on macOS / Metal backend).
  • `mix test`: 89 passed, 4 skipped.
  • CI green on Linux + macOS workflows.
  • Smoke-test LlamaCppEx.MTP.stream/3 against a Qwen 3.6 MTP GGUF locally (optional — internal MTP changes are additive but worth a sanity check before release).

No public API changes — NIF and LlamaCppEx.MTP bindings continue to
work unchanged. Headlines: backend sampling for MTP draft path
(#23287, additive `backend_sampling` default-on), MTP logit-skip
optimization (#23433), nullptr crash fix in common_speculative
(#23386), server slot sleep VRAM leak fix (#23461). Plus assorted
mtmd, server unified-binary, vulkan/cuda/metal kernel improvements.

See CHANGELOG.md for the full breakdown.
nyo16 merged commit 65950ef into master on May 22, 2026
4 checks passed
nyo16 deleted the bump-llama-cpp-52fb93a2b branch on May 22, 2026 at 09:39