Bump llama.cpp to 52fb93a2b (30 commits) #42
Merged
No public API changes — NIF and LlamaCppEx.MTP bindings continue to work unchanged. Headlines: backend sampling for MTP draft path (#23287, additive `backend_sampling` default-on), MTP logit-skip optimization (#23433), nullptr crash fix in common_speculative (#23386), server slot sleep VRAM leak fix (#23461). Plus assorted mtmd, server unified-binary, vulkan/cuda/metal kernel improvements. See CHANGELOG.md for the full breakdown.
Summary
- Bump `vendor/llama.cpp` from `b28a2f372` → `52fb93a2b` (30 upstream commits).
- No public API changes: NIF and `LlamaCppEx.MTP` bindings continue to work unchanged.
- `LlamaCppEx.MTP` API surface is unaffected.
Headlines
MTP / speculative
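- Backend sampling for the MTP draft path (#23287): adds a `backend_sampling` field to `common_params_speculative_draft` (default `true`, additive). Our NIF doesn't touch this field, so behavior is unchanged.
- MTP logit-skip optimization involving `inp_out_ids` (#23433): internal optimization on the MTP draft path.
- Fix for a `nullptr` crash in `common_speculative_get_devices_str` (#23386).
- Server slot sleep VRAM leak fix (#23461) (… hygiene if the server code is ever reused).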
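As a rough illustration of why an additive, defaulted field leaves existing callers untouched, consider the sketch below. The struct `draft_params_sketch`, its `n_max` field, and the helper `make_params_like_old_caller` are hypothetical stand-ins invented for this example; they are not the real `common_params_speculative_draft` definition, and only the `backend_sampling` name and its `true` default come from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for an upstream params struct.
// NOT the real common_params_speculative_draft layout.
struct draft_params_sketch {
    int  n_max            = 16;    // assumed pre-existing field (illustrative value)
    bool backend_sampling = true;  // newly added field, default-on
};

// A caller written before the new field existed: it only sets
// the fields it already knew about and never mentions backend_sampling.
inline draft_params_sketch make_params_like_old_caller(int n_max) {
    draft_params_sketch p;  // default member initializers apply here
    p.n_max = n_max;
    return p;
}
```

Under these assumptions, code that never references the new field still compiles and simply picks up the default, so `make_params_like_old_caller(8).backend_sampling` evaluates to `true`. That is the sense in which the change is additive for a binding layer that does not set the field.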
Other notable upstream changes
- `llm_graph_input_attn_kv_iswa` on SWA-only models (#23131).
- `HybridDNATokenizer` support (#23410).
- `llama` unified executable (#23296); add `batched-bench`, `fit-params`, `quantize`, `perplexity` subcommands (#23459); show version (#23426).
- `img_tool::resize` padding refactor (#23345); `fit_params` accounts for `mmproj` (#21489); WAV MIME-type variants + audio format detection (#23396).
- `pad` + `cpy` (#23354).
- (#23349).
- `IM2COL` shader (#22685).
- `ssm-conv` fix for large prompts (#23307); HMX quantized matmul rework (#23368).
- `isMobile` in viewport store (#23330); pointer-events fix on hidden div wrapper (#23390); text attachments before message content in chat-completions payload (#23406); improved UI dev git hooks (#23403).
Test plan
- Build against `52fb93a2b` (forced full rebuild on macOS / Metal backend).
- `mix test` → 89 passed, 4 skipped.
- Run `LlamaCppEx.MTP.stream/3` against a Qwen 3.6 MTP GGUF locally (optional: internal MTP changes are additive but worth a sanity check before release).