Merged
327 commits
ceaf47c
fix: rpc-server cache may not work in Windows environments (#22394)
unraido Apr 27, 2026
4414c04
Additional test for common/gemma4 : handle parsing edge cases (#22420)
hextriclosan Apr 27, 2026
665abc6
add fast mat-vec kernels for i-quants (#22344)
SharmaRithik Apr 27, 2026
983ca89
server: (router) Forward form-data to model server (Fixes #22044) (#2…
tha80 Apr 27, 2026
434b2a1
ggml-webgpu: add Q1_0 support (#22374)
SharmaRithik Apr 27, 2026
516e8d7
server: use pos_next instead of n_tokens for m-rope (#22439)
am17an Apr 28, 2026
14e733e
spec : refactor params (#22397)
ggerganov Apr 28, 2026
c3e08f4
CANN: add new ops, optimize existing ops (#21204)
hipudding Apr 28, 2026
d530d6e
ggml : revert to -lm linking instead of find_library (#22355)
angt Apr 28, 2026
50494a2
ggml : skip already registered backends and devices (#22296)
angt Apr 28, 2026
698d19b
ggml: improve SPIR-V headers detection with __has_include (#21918)
EmilAskerov Apr 28, 2026
1982117
vulkan: add barrier after writetimestamp (#21865)
jeffbolznv Apr 28, 2026
f42e29f
webui: Server tools (#21237)
allozaur Apr 28, 2026
98bb579
ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing l…
reeselevine Apr 28, 2026
f9f3365
vulkan: Coalesce Q4_K/Q5_K scale loads (#21751)
TheBlueMatt Apr 28, 2026
52e5f0a
common : re-arm reasoning budget after DONE on new <think> (#22323)
BruceJillis Apr 28, 2026
5d56eff
convert : add support for Nemotron Nano 3 Omni (#22481)
danbev Apr 28, 2026
7b8443a
ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (…
lnigam Apr 28, 2026
fc2b005
ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196)
michaelw9999 Apr 28, 2026
739393b
TP: fix delayed AllReduce + zero-sized slices (#22489)
JohannesGaessler Apr 29, 2026
bdc9c74
ggml : add sve tuned code for gemm_q8_0_4x8_q8_0() kernel (#21916)
hrushitfujitsu Apr 29, 2026
7b95ea5
common: Intentionally leak logger instance to fix hanging on Windows …
rillomas Apr 29, 2026
d6a5094
ggml-webgpu: Fix bug in FlashAttention support check (#22492)
reeselevine Apr 29, 2026
b5c4227
ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (#22317)
qiurui144 Apr 29, 2026
3142f1d
ggml-cuda: refactor fusion code (#22468)
am17an Apr 29, 2026
1cbc846
ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault …
shalinib-ibm Apr 29, 2026
59237bf
webui: fix slow mic stop and WAV encode (#22480)
ServeurpersoCom Apr 29, 2026
4b221b7
ggml : bump version to 0.10.1 (ggml/1469)
ggerganov Apr 29, 2026
b1d5f5b
sync : ggml
ggerganov Apr 29, 2026
683c5ac
spec : disacard last drafted token with low prob (#22506)
ggerganov Apr 29, 2026
098705a
CUDA: fuse SSM_CONV + ADD(bias) + SILU (#22478)
anavp-nvidia Apr 29, 2026
41a63be
hexagon: make vmem and buffer-size configurable (#22487)
max-krasnyansky Apr 29, 2026
d775992
common : do not pass prompt tokens to reasoning budget sampler (#22488)
aldehir Apr 29, 2026
b42c7fa
spec : fix vocab compat checks in spec example (#22426)
petersid2022 Apr 30, 2026
80afa33
spec : fix draft model checkpoints (#22521)
ggerganov Apr 30, 2026
4515559
add fast matmul iquants (#22504)
SharmaRithik Apr 30, 2026
27aef3d
scripts : add wc2wt.sh - create worktree from current HEAD (#22513)
ggerganov Apr 30, 2026
e82aaf2
CUDA: fix tile FA kernel on Pascal (#22541)
JohannesGaessler Apr 30, 2026
5f0ab72
vendor : update cpp-httplib to 0.43.2 (#22548)
angt Apr 30, 2026
6118c04
ci : bump ty to 0.0.33 (#22535)
CISC Apr 30, 2026
c20c445
spec: fix argument typo (#22552)
barnjamin Apr 30, 2026
660b1b4
vulkan: add get/set tensor 2d functions (#22514)
0cc4m Apr 30, 2026
beb42ff
common : check for null getpwuid in hf-cache (#22550)
angt Apr 30, 2026
5cbfb18
Update llama-mmap to use ftello/fseeko (#22497)
reeselevine Apr 30, 2026
a95a11e
ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_I…
yomaytk Apr 30, 2026
aab6821
ggml-webgpu: add the upscale shader (#22419)
Constannnnnt May 1, 2026
05e141a
vulkan: Support asymmetric FA in coopmat2 path (#21753)
jeffbolznv May 1, 2026
c3c1505
ggml-webgpu: Fix vectorized handling in mul-mat and mul-mat-id (#22578)
yomaytk May 1, 2026
ab6120c
webui: Spring Cleaning Refactor v1 (#22505)
allozaur May 1, 2026
2098fd6
hexagon: enable non-contiguous row tensor support for unary ops (#22574)
aparmp-quic May 1, 2026
b97ebdc
llama-quant : fix `--tensor-type` when default `qtype` is overriden (…
ddh0 May 1, 2026
1a03cf4
hexagon: hmx flash attention (#22347)
njsyw1997 May 2, 2026
e8ec7ab
ggml : try fix win32 build (whisper/0)
ggerganov May 1, 2026
457e228
sync : ggml
ggerganov May 1, 2026
ed23489
ggml : bump version to 0.10.2 (ggml/1474)
ggerganov May 2, 2026
228e836
sync : ggml
ggerganov May 2, 2026
9dbb372
Github: update issue templates (#22594)
JohannesGaessler May 2, 2026
c5a3bc3
opencl: Adreno optimization for MoE - MxFP4 (#22301)
shawngu-quic May 2, 2026
63d93d1
convert : disable uint types (#18908)
csabakecskemeti May 2, 2026
0929436
ggml-virtgpu: fix circular dependency in headers (#22557)
Juste-Leo2 May 2, 2026
0754b7b
server : avoid checkpoint data host copies (#22558)
ggerganov May 2, 2026
d05fe1d
fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus ent…
lucyknada May 2, 2026
db44417
convert : apply Q/K RoPE permutation in NVFP4 repack path (#22611)
jmrobles May 3, 2026
048a490
convert : Mistral format yarn apply_scale support (#22612)
juliendenize May 3, 2026
e48034d
common : determine generation prompt using longest common prefix (#22…
aldehir May 3, 2026
d4b0c22
ggml-webgpu: add layer norm ops (#22406)
Constannnnnt May 4, 2026
6dcd824
vulkan: delete dead GGML_VK_MAX_NODES def (#22621)
Atomic-Germ May 4, 2026
846262d
docs : update speculative decoding parameters after refactor (#22397)…
ggerganov May 4, 2026
fa8feae
webui: restore missing settings (#22666)
ntowle May 4, 2026
c84e6d6
server: Add a simple get_datetime server tool (#22649)
eapache May 4, 2026
994118a
model: move `load_hparams` and `load_tensors` to per-model definition…
ngxson May 4, 2026
a4701c9
common/autoparser: fixes for newline handling / forced tool calls (#2…
pwilkin May 4, 2026
36a694c
webui : fix circular dependency between chat.service.ts and models.sv…
Juste-Leo2 May 4, 2026
d8794ee
examples: refactor diffusion generation (#22590)
Sailaukan May 4, 2026
935a340
server: implement /models?reload=1 (#21848)
ngxson May 4, 2026
e77056f
CUDA: use fastdiv for batch index split in get_rows (#22650)
leonardHONG May 4, 2026
eff0670
kleidiai : update to v1.24.0 and use release archive (#22549)
chaxu01 May 4, 2026
a817a22
ggml : implement fast walsh-hadamard transform for kv rotation (#2135…
AlrIsmail May 5, 2026
fa59546
graph : handle non-contiguous Q/K/V in mul_mat_aux (#22630)
CISC May 5, 2026
d6e7b03
llama : add option to save memory in device buffers (#22679)
ggerganov May 5, 2026
2bacb1e
server : validate --tools CLI argument against known tool names (#22538)
ggerganov May 5, 2026
a09a00e
vendor : update cpp-httplib to 0.43.3 (#22686)
cabelo May 5, 2026
bf76ac7
common : only load backends when required (#22290)
angt May 5, 2026
c91faf9
ggml : bump version to 0.11.0 (ggml/1478)
ggerganov May 5, 2026
70a8309
sync : ggml
ggerganov May 5, 2026
2635ac7
common : fix missing-noreturn warnings when compiling with clang 21 (…
angt May 5, 2026
d5003b6
rpc : use graph uid instead of graph cache (#22701)
rgerganov May 5, 2026
ff806a1
opencl: refactor Adreno q4_0 (#22335)
lhez May 5, 2026
bbeb89d
Hexagon: Process M-tail rows on HMX instead of HVX (#22724)
trivikram-reddy1 May 5, 2026
2ca1161
ggml : use `CL_DEVICE_GLOBAL_MEM_SIZE` as memory estimate for OpenCL …
fl0rianr May 6, 2026
74d6248
convert : add filter_tensors method to pre-filter tensors (#22597)
CISC May 6, 2026
07eaf91
add tabindex and aria-hidden (#22699)
vignesh191 May 6, 2026
f08f20a
ggml-cpu: fuse RMS_NORM + MUL on CPU backend (#22423)
zzzzwc May 6, 2026
e3e3f8e
webui: Remove Google Favicons & Improve MCP Information logic & UI (#…
allozaur May 6, 2026
a736e6c
convert : ignore non-language tensors for Gemma4Model (#22753)
danbev May 6, 2026
7501419
feat: migrate to PEP 621 and add uv support (#21907)
dhdaines May 6, 2026
a00e47e
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) …
ReinforcedKnowledge May 6, 2026
a290ce6
gguf-py : bump version to 0.19.0 (#22664)
ggerganov May 6, 2026
a010122
common: do not fit to unknown device memory (#22614)
fl0rianr May 6, 2026
5207d12
model : don't crash on unsupported architecture (#22742)
giladgd May 6, 2026
2496f9c
mtmd : support MiniCPM-V 4.6 (#22529)
tc-mb May 6, 2026
3980e04
llama : add missing call to ggml_backend_load_all() (#22752)
angt May 7, 2026
cfff1fc
sycl : fix test script (#22737)
dogunbound May 7, 2026
e358d75
webui: fix flicker issue on dismiss animation on overlay primitives (…
vignesh191 May 7, 2026
97f06e9
codeowners : add ZenDNN backend codeowner (#22772)
z-vishal May 7, 2026
f4b5a2e
webui: fix ?model= URL param race in router mode (#22771)
ServeurpersoCom May 7, 2026
8e52631
model: Add Mimo v2.5 model support (#22493)
AesSedai May 7, 2026
cc97e45
mtmd: fix whisper audio tail truncation by exposing padded buffer to …
ServeurpersoCom May 7, 2026
68380ae
ggml-cpu: Optimized risc-v cpu q1_0 dot
pl752 May 7, 2026
803627f
llama : remove unnecessary seq_id check during state restore (#22797)
ggerganov May 7, 2026
b9afc19
Write a readme on Multi-GPU usage in llama.cpp (#22729)
gaugarg-nv May 7, 2026
ad09224
sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET (#…
aicss-genai May 7, 2026
deab41e
tests: add long-sequence cases and fix inputs for gated_delta_net (#2…
Neroued May 7, 2026
093be62
common/chat : preserve media markers for typed-content templates (#22…
aldehir May 7, 2026
ceb7e14
opencl: add opfilter regex for debugging (#22782)
shaofeiqi May 7, 2026
e43431b
llama : fix device state save/load (#22805)
ggerganov May 7, 2026
aaf4a4d
webui: add option for LLM title generation (#22265)
smugman-dot May 7, 2026
05ff59c
CUDA: batch out_prod inner loop with cublasSgemmStridedBatched (#22651)
leonardHONG May 7, 2026
44dbe8c
model: Support sarashina2.2-vision-3b model (#22103)
samuraieng May 7, 2026
6a2a251
fix script error (#22795sycl : )
arthw May 8, 2026
1d72d87
convert : fix RuntimeError when stripping FP8 KV-cache scales (#22818)
pich May 8, 2026
f3e8d14
opencl: add q4_0 MoE GEMM for Adreno (#22731)
shawngu-quic May 8, 2026
3e941b8
ggml: update SCHED_DEBUG output to use ggml_op_desc() (#22825)
max-krasnyansky May 8, 2026
6d57a49
vulkan: fix spv shadowing (#22760)
miyanyan May 8, 2026
a8fd165
CUDA: lower-case PCI bus id, standardize for ggml (#22820)
JohannesGaessler May 8, 2026
9b2925e
webui: Add Import/Export of Settings configuration + improve architec…
allozaur May 8, 2026
58e68df
cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)
ServeurpersoCom May 8, 2026
9dcf835
server: (router) expose child model info from router's /v1/models (#2…
ngxson May 8, 2026
29debb3
server: support Vertex AI compatible API (#22545)
ngxson May 8, 2026
5d6f18a
webui: fix LLM title generation for agentic conversations (#22840)
smugman-dot May 8, 2026
f9cd456
common : revert reasoning budget +inf logit bias (#22740)
aldehir May 8, 2026
9f5f0e6
model : support Gemma4_26B_A4B_NVFP4 (#22804)
ynankani May 8, 2026
4995604
common : do not wrap raw strings in schema parser for tagged parsers …
aldehir May 8, 2026
b46812d
Feature hexagon l2 norm (#22816)
pdhinaka May 8, 2026
c5703e0
sycl: support non-contiguous input in PAD op (#22148)
aicss-genai May 9, 2026
6600172
hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837)
wyanzhao May 9, 2026
046e284
Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812)
AesSedai May 9, 2026
4a4f819
sycl: Battlemage AOT build via spir64_gen + MMQ subgroup annotations …
aicss-genai May 9, 2026
6048993
sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path (#22152)
aicss-genai May 9, 2026
fd89556
[SYCL] Add BF16 support to GET_ROWS operation (#21391)
devedse May 9, 2026
e20b839
SYCL: reduce allocation overhead during flash attention (#22732)
sanmai May 9, 2026
5757c4d
cmake : update BoringSSL to 0.20260508.0 (#22839)
cabelo May 9, 2026
00d56b1
docker : upgraded the default intel compute-runtime version (#22567)
WizardlyBump17 May 9, 2026
65d7a8b
devops : updated Nix systems (#22869)
yuannan May 9, 2026
1e5ad35
model : add sarvam_moe architecture support (#20275)
sumitchatterjee13 May 9, 2026
5755a10
model : fix model type check for granite/llama3 and deepseek2/glm4.7 …
CISC May 10, 2026
f3c3e0e
internal AllReduce kernel for CUDA provider (#22299)
scutler-nv May 10, 2026
efbada9
ggml : bump version to 0.11.1 (ggml/1484)
ggerganov May 10, 2026
0b04728
sync : ggml
ggerganov May 10, 2026
2b2babd
ggml-virtgpu : include missing mutex header (#22810)
olliewalsh May 10, 2026
5d5d2e1
vendor : update cpp-httplib to 0.43.4 (#22888)
cabelo May 10, 2026
2e97c5f
backend sampling: support returning post-sampling probs (#22622)
TimNN May 10, 2026
389ff61
server : print warning when HTTP timeout exceeded (#22907)
ggerganov May 10, 2026
7d442ab
[SYCL] Add OP im2col_3d (#22903)
arthw May 11, 2026
8383743
vendor : update cpp-httplib to 0.44.0 (#22919)
cabelo May 11, 2026
f5636f8
convert : add image break token fallback (#22914)
danbev May 11, 2026
8cef820
CUDA: directly include cuda/iterator (#22936)
ORippler May 11, 2026
dd9280a
vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (#22589)
jeffbolznv May 11, 2026
7dbb0e9
examples : update args speculative-simple README.md [no ci] (#22938)
danbev May 11, 2026
928b486
ggml-virtgpu: Add a GHA build check (#22943)
kpouget May 11, 2026
68e7ea3
spec : parallel drafting support (#22838)
ggerganov May 11, 2026
ef22b3e
docs: fix metrics endpoint description in server README (#22879)
willjoha May 11, 2026
e936660
Ggml/cuda snake fusion hardening (#22912)
ServeurpersoCom May 11, 2026
8e1f9d0
CUDA: handle OW > 65535 in im2col (2D and 3D) (#22944)
CrispStrobe May 11, 2026
1ec7ba0
opencl: add q4_1 MoE for Adreno (#22856)
shawngu-quic May 11, 2026
da44953
metal : promote mul_mv/mul_mm batch divisors to function constants (#…
guyfischman May 12, 2026
78fbbc2
convert : add split() to LoraTorchTensor in LoRA converter (#22832)
jesus-talavera-ibm May 12, 2026
4178259
mtmd: add MiMo v2.5 vision (#22883)
AesSedai May 12, 2026
fa62042
ci : bump ty to 0.0.35 (#22961)
CISC May 12, 2026
706fbd8
vulkan: Check shared memory size for mmq shaders (#22693)
jeffbolznv May 12, 2026
ef93e98
vulkan: Fix Windows performance regression on Intel GPU BF16 workload…
rillomas May 12, 2026
fde69a3
examples : add llama-eval (#21152)
ggerganov May 12, 2026
89730c8
model-conversion : add causal-convert-mmproj target [no ci] (#22969)
danbev May 12, 2026
239a497
ggml-webgpu: address precision issues for multimodal (#22808)
Constannnnnt May 12, 2026
927dada
ggml-webgpu: Enables running gpt-oss-20b (#22906)
yomaytk May 12, 2026
7bfe120
mtmd, server, common: expose modalities to /v1/models (#22952)
ngxson May 12, 2026
dded58b
webui: Fix Chat Screen Form box disappearing + autoscroll issues on W…
allozaur May 12, 2026
cce09f0
convert : fix Pixtral 12B --mistral-format conversion (3 bugs) (#22981)
fredzillman May 12, 2026
a9883db
opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755)
happyyzy May 12, 2026
856c3ad
hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993)
trivikram-reddy1 May 13, 2026
61af07c
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes …
z-sachin May 13, 2026
bcfe63f
llama-eval : enable type check (#22988)
CISC May 13, 2026
634275f
spec : update CLI arguments for better consistency (#22964)
ggerganov May 13, 2026
3796c94
ci: validate model naming convention (#22680)
ngxson May 13, 2026
5d44db6
server, webui: support continue generation on reasoning models (#22727)
ServeurpersoCom May 13, 2026
e75cd5e
download: do not exit() on error (#23008)
ngxson May 13, 2026
ad96bb8
hexagon: add unary tanh op (#22999)
max-krasnyansky May 13, 2026
7e16646
docs : Update OPENVINO.md (#22959)
ravi9 May 13, 2026
46be24d
webui: preserve system message on edit cancel (#22911)
ServeurpersoCom May 13, 2026
2dfeca3
webui: Deduplicate model aliases in data + handle single/multiple ali…
allozaur May 13, 2026
527045b
flush the gpu profile timestamp before the queryset is overflowed (#2…
yomaytk May 13, 2026
1e4579f
opencl: fix crash when warming up MoE on Adreno (#22876)
lhez May 13, 2026
95d469a
server, webui: accept continue_final_message flag for vLLM API compat…
ServeurpersoCom May 13, 2026
ec562eb
opencl: add q5_0 and q5_1 MoE for Adreno (#22985)
shaofeiqi May 13, 2026
7f3f843
Fix for issue #22974. Cast intermediate results to float before addin…
scutler-nv May 13, 2026
4c1c3ac
ggml-webgpu: only use subgroup-matrix path when head dims are divisib…
ArberSephirotheca May 13, 2026
9ed6e19
SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocat…
PMZFX May 14, 2026
320a6a4
fix: Autoscroll detection (#23026)
allozaur May 14, 2026
dbe7901
vulkan: fix matmul integer pipeline selection (#23005)
0cc4m May 14, 2026
42532af
unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr…
Kabir08 May 14, 2026
0f45f1a
docker : revert stable version of intel compute-runtime (#22968)
arthw May 14, 2026
81b0d88
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (#22863)
alex-spacemit May 14, 2026
67b2b7f
logs : reduce (#23021)
ggerganov May 14, 2026
253ba11
webui: Move static build output from repo code to HF Bucket (#22937)
allozaur May 14, 2026
97b658c
contributing: new contributors should not submit trivial fixes (#23045)
am17an May 14, 2026
0c3e4fc
fix: Propagate version tag to WebUI asset download in self-hosted CI …
allozaur May 14, 2026
5ec717d
ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040)
ArberSephirotheca May 14, 2026
834a243
ggml-webgpu: Enable NVIDIA self-hosted CI (#22976)
reeselevine May 14, 2026
d81e63d
CI : support IOT device (IQ9) (#22987)
zhiyuan8 May 14, 2026
3e037f3
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880)
JohannesGaessler May 14, 2026
5c0e946
ggml-hexagon: cpy: add contiguous fast-path in reshape copy (#23076)
pdhinaka May 14, 2026
7155a49
readme : update bindings (#23063)
KitaitiMakoto May 15, 2026
91e84fe
Support for Codex CLI by skipping unsupported Responses tools (#23041)
SidShaytay May 15, 2026
d528444
webui: preserve partial response on streaming error (#23090)
ServeurpersoCom May 15, 2026
ac33f03
reasoning-budget: clone should do a deep-copy (#23095)
am17an May 15, 2026
d5dc2e0
llama-eval : add AIME 2026 dataset support (#23058)
ggerganov May 15, 2026
769cc93
ci : fix transform of top . entry in release archive (#23080)
CISC May 15, 2026
cc7200b
Refactor: convert_hf_to_gguf.py (#17114)
pwilkin May 15, 2026
18d1717
convert : fix Qwen3 ASR conversion (#23081)
CISC May 15, 2026
8be1786
webui: fix theme from --webui-config-file not applied on first load (…
ServeurpersoCom May 15, 2026
72e60f5
mtmd: add chunks and fix preproc for qwen3a (#23073)
ngxson May 15, 2026
6831fe4
docs: document `usage` object in server timings response (#23110)
julien-c May 15, 2026
cfabeb1
tests: add BF16 non-contig coverage for MUL_MAT permutations (#22689)
ServeurpersoCom May 15, 2026
1348f67
webui: Use lowercase hash for HF checksum check (#23107)
ozars May 15, 2026
49d1701
ci : fix release symlinks (#23119)
CISC May 15, 2026
59778f0
ui: Restructure repo to use `tools/ui` folder and `ui` / `UI` / `llam…
allozaur May 16, 2026
42928bc
model : NvFP4 quantized LM head support (#23046)
ynankani May 16, 2026
1d9f99a
fix: Add build step using build workflow to publish workflow (#23134)
allozaur May 16, 2026
366c5e2
ui: untrack settings sync in props effect to prevent reactive loop (#…
ServeurpersoCom May 16, 2026
1428004
webui : [ChatFormActionAdd][a11y] fix accessibility issues in add men…
vignesh191 May 16, 2026
b81c2cd
ui: Fix handling of MCP resource template parameters (#23117)
kubawoo May 16, 2026
2555826
llama + spec: MTP Support (#22673)
am17an May 16, 2026
18675b6
vendor : update cpp-httplib to 0.45.0 (#23103)
cabelo May 16, 2026
25b1bc9
ui: Correct links in `tools/ui/README.md` [no ci] (#23139)
howlger May 16, 2026
2eb3e6b
ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480)
robUx4 May 10, 2026
560445b
metal : tighten input-position loop in kernel_conv_transpose_1d (ggml…
CrispStrobe May 10, 2026
e6c37a1
ggml : bump version to 0.12.0 (ggml/1494)
ggerganov May 16, 2026
3a92bc9
sync : ggml
ggerganov May 16, 2026
0253fb2
ui: Add request timeout for MCP tool calls (#23138)
allozaur May 16, 2026
6049906
vulkan: removed duplicate #include <memory> in headers (#23144)
winstonma May 16, 2026
64b38b5
server: skip device enumeration in router mode to avoid creating CUDA…
ServeurpersoCom May 16, 2026
b64739e
server: (router) alloc tmp buffer on heap (#23159)
ngxson May 16, 2026
4f13cb7
webui: support video files as input (#22830)
foldl May 17, 2026
e30bbcf
merge: upstream/master @ 4f13cb742 (b9190) into feature/turboquant-kv…
TheTom May 17, 2026
7f23aba
fix(metal): set ne12/ne13/r2/r3 function constants in mul_mm_tq_rotat…
TheTom May 17, 2026
bf590c7
fix(turbo-quant): add forward declaration for turbo_cpu_fwht_inverse
TheTom May 17, 2026
2191d70
fix(ggml-cuda): HIP nodiscard + MUSA cudaMemcpyToSymbol alias
TheTom May 17, 2026
4e35153
cmake : fix LLAMA_BUILD_UI logic (#23190)
aldehir May 17, 2026
c654c4c
fix(ci): 4 cross-vendor -Werror failures + defensive xxd.cmake
TheTom May 17, 2026
eef2db4
fix(xxd.cmake): handle missing input file (not just empty)
TheTom May 17, 2026
22 changes: 14 additions & 8 deletions .devops/intel.Dockerfile
@@ -1,12 +1,19 @@
-ARG ONEAPI_VERSION=2025.3.2-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04

## Build Image

FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build

ARG GGML_SYCL_F16=OFF
+ARG LEVEL_ZERO_VERSION=1.28.2
+ARG LEVEL_ZERO_UBUNTU_VERSION=u24.04
RUN apt-get update && \
-apt-get install -y git libssl-dev
+apt-get install -y git libssl-dev wget ca-certificates && \
+cd /tmp && \
+wget -q "https://github.com/oneapi-src/level-zero/releases/download/v${LEVEL_ZERO_VERSION}/level-zero_${LEVEL_ZERO_VERSION}%2B${LEVEL_ZERO_UBUNTU_VERSION}_amd64.deb" -O level-zero.deb && \
+wget -q "https://github.com/oneapi-src/level-zero/releases/download/v${LEVEL_ZERO_VERSION}/level-zero-devel_${LEVEL_ZERO_VERSION}%2B${LEVEL_ZERO_UBUNTU_VERSION}_amd64.deb" -O level-zero-devel.deb && \
+apt-get -o Dpkg::Options::="--force-overwrite" install -y ./level-zero.deb ./level-zero-devel.deb && \
+rm -f /tmp/level-zero.deb /tmp/level-zero-devel.deb

WORKDIR /app

@@ -33,11 +40,11 @@ RUN mkdir -p /app/full \

FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base

-ARG IGC_VERSION=v2.30.1
-ARG IGC_VERSION_FULL=2_2.30.1+20950
-ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
-ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
-ARG IGDGMM_VERSION=22.9.0
+ARG IGC_VERSION=v2.20.5
+ARG IGC_VERSION_FULL=2_2.20.5+19972
+ARG COMPUTE_RUNTIME_VERSION=25.40.35563.10
+ARG COMPUTE_RUNTIME_VERSION_FULL=25.40.35563.10-0
+ARG IGDGMM_VERSION=22.8.2
RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
@@ -109,4 +116,3 @@ WORKDIR /app
HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
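
The Level Zero packages in the first hunk are fetched from URLs assembled out of the two new build arguments. A minimal shell sketch of that composition follows; the `level_zero_url` helper is hypothetical and does not appear in the Dockerfile, and the values are simply the `ARG` defaults shown in the diff. Note that `%2B` is the URL-encoded `+` in the release asset name.

```shell
#!/bin/sh
# Defaults copied from the ARG lines in the diff above.
LEVEL_ZERO_VERSION=1.28.2
LEVEL_ZERO_UBUNTU_VERSION=u24.04

# Hypothetical helper (not part of the Dockerfile): print the release asset
# URL for a package name such as "level-zero" or "level-zero-devel".
# "%%2B" in the printf format emits the literal "%2B" (an encoded '+').
level_zero_url() {
    printf 'https://github.com/oneapi-src/level-zero/releases/download/v%s/%s_%s%%2B%s_amd64.deb\n' \
        "$LEVEL_ZERO_VERSION" "$1" "$LEVEL_ZERO_VERSION" "$LEVEL_ZERO_UBUNTU_VERSION"
}

level_zero_url level-zero
level_zero_url level-zero-devel
```

Because both values are declared as `ARG`, a different release could presumably be selected at build time with `docker build --build-arg LEVEL_ZERO_VERSION=<version>`, provided assets with matching names exist for that release.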

1 change: 0 additions & 1 deletion .devops/nix/package.nix
@@ -147,7 +147,6 @@ effectiveStdenv.mkDerivation (finalAttrs: {
ninja
pkg-config
git
-spirv-headers
]
++ optionals useCuda [
cudaPackages.cuda_nvcc
50 changes: 48 additions & 2 deletions .devops/openvino.Dockerfile
@@ -2,7 +2,19 @@ ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

-# Optional proxy build arguments - empty by default
+# Intel GPU driver versions. https://github.com/intel/compute-runtime/releases
+ARG IGC_VERSION=v2.30.1
+ARG IGC_VERSION_FULL=2_2.30.1+20950
+ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
+ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
+ARG IGDGMM_VERSION=22.9.0
+
+# Intel NPU driver versions. https://github.com/intel/linux-npu-driver/releases
+ARG NPU_DRIVER_VERSION=v1.32.0
+ARG NPU_DRIVER_FULL=v1.32.0.20260402-23905121947
+ARG LIBZE1_VERSION=1.27.0-1~24.04~ppa2
+
+# Optional proxy build arguments
ARG http_proxy=
ARG https_proxy=

@@ -78,13 +90,47 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
-&& apt-get install -y libgomp1 libtbb12 curl \
+&& apt-get install -y libgomp1 libtbb12 curl wget ocl-icd-libopencl1 \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

+# Install GPU drivers
+ARG IGC_VERSION
+ARG IGC_VERSION_FULL
+ARG COMPUTE_RUNTIME_VERSION
+ARG COMPUTE_RUNTIME_VERSION_FULL
+ARG IGDGMM_VERSION
+RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
+&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+&& dpkg --install *.deb \
+&& rm -rf /tmp/neo/
+
+# Install NPU drivers
+ARG NPU_DRIVER_VERSION
+ARG NPU_DRIVER_FULL
+ARG LIBZE1_VERSION
+RUN mkdir /tmp/npu/ && cd /tmp/npu/ \
+&& wget https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
+&& tar -xf linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
+&& dpkg --install *.deb \
+&& rm -rf /tmp/npu/
+
+RUN cd /tmp \
+&& wget https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb \
+&& dpkg --install libze1_${LIBZE1_VERSION}_amd64.deb \
+&& rm libze1_${LIBZE1_VERSION}_amd64.deb
+
COPY --from=build /app/lib/ /app/

### Full (all binaries)
10 changes: 1 addition & 9 deletions .editorconfig
@@ -45,15 +45,7 @@ insert_final_newline = unset
trim_trailing_whitespace = unset
insert_final_newline = unset

-[tools/server/webui/**]
-indent_style = unset
-indent_size = unset
-end_of_line = unset
-charset = unset
-trim_trailing_whitespace = unset
-insert_final_newline = unset
-
-[tools/server/public/**]
+[tools/ui/**]
indent_style = unset
indent_size = unset
end_of_line = unset
4 changes: 0 additions & 4 deletions .gitattributes

This file was deleted.

2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -12,6 +12,8 @@ body:
after recreating the CMake build directory and with `-DGGML_CCACHE=OFF`.
If the compilation succeeds with ccache disabled you should be able to permanently fix the issue
by clearing `~/.cache/ccache` (on Linux).
+
+Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: commit
attributes:
4 changes: 3 additions & 1 deletion .github/ISSUE_TEMPLATE/011-bug-results.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
name: Bug (model use)
-description: Something goes wrong when using a model (in general, not specific to a single llama.cpp module).
+description: Something goes wrong when running a model (crashes, garbled outputs, etc.).
title: "Eval bug: "
labels: ["bug-unconfirmed", "model evaluation"]
body:
@@ -12,6 +12,8 @@ body:
If you encountered the issue while using an external UI (e.g. ollama),
please reproduce your issue using one of the examples/binaries in this repository.
The `llama-completion` binary can be used for simple and reproducible model inference.
+
+Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: version
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/019-bug-misc.yml
@@ -10,6 +10,8 @@ body:
This issue template is intended for miscellaneous bugs that don't fit into any other category.
If you encountered the issue while using an external UI (e.g. ollama),
please reproduce your issue using one of the examples/binaries in this repository.
+
+Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: version
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/020-enhancement.yml
@@ -8,6 +8,8 @@ body:
value: |
[Please post your idea first in Discussion if there is not yet a consensus for this enhancement request. This will help to keep this issue tracker focused on enhancements that the community has agreed needs to be implemented.](https://github.com/ggml-org/llama.cpp/discussions/categories/ideas)

+Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
+
- type: checkboxes
id: prerequisites
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/030-research.yml
@@ -8,6 +8,8 @@ body:
value: |
Don't forget to check for any [duplicate research issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3A%22research+%F0%9F%94%AC%22)

+Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
+
- type: checkboxes
id: research-stage
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/040-refactor.yml
@@ -9,6 +9,8 @@ body:
         Don't forget to [check for existing refactor issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3Arefactoring) in case it's already covered.
         Also you may want to check [Pull request refactor label as well](https://github.com/ggml-org/llama.cpp/pulls?q=is%3Aopen+is%3Apr+label%3Arefactoring) for duplicates too.

+        Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
+
   - type: textarea
     id: background-description
     attributes:
5 changes: 2 additions & 3 deletions .github/labeler.yml
@@ -73,11 +73,10 @@ android:
     - changed-files:
         - any-glob-to-any-file:
             - examples/llama.android/**
-server/webui:
+server/ui:
     - changed-files:
         - any-glob-to-any-file:
-            - tools/server/webui/**
-            - tools/server/public/**
+            - tools/ui/**
 server:
     - changed-files:
         - any-glob-to-any-file:
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -6,7 +6,7 @@

<!-- You can provide more details and link related discussions here. Delete this section if not applicable -->

-# Requirements
+## Requirements

<!-- IMPORTANT: Please do NOT delete this section, otherwise your PR may be rejected -->

148 changes: 148 additions & 0 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -0,0 +1,148 @@
+name: CI (snapdragon)
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - master
+    paths:
+      - '.github/workflows/build-and-test-snapdragon.yml'
+      - 'ggml/include/ggml-hexagon.h'
+      - 'ggml/src/ggml-hexagon/**'
+      - 'docs/backend/snapdragon/**'
+      - 'scripts/snapdragon/**'
+      - 'CMakePresets.json'
+
+  pull_request:
+    types: [opened, synchronize, reopened]
+    paths:
+      - '.github/workflows/build-and-test-snapdragon.yml'
+      - 'ggml/include/ggml-hexagon.h'
+      - 'ggml/src/ggml-hexagon/**'
+      - 'docs/backend/snapdragon/**'
+      - 'scripts/snapdragon/**'
+      - 'CMakePresets.json'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  android-ndk-snapdragon:
+    runs-on: ubuntu-latest
+    container:
+      image: 'ghcr.io/snapdragon-toolchain/arm64-android:v0.3'
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+          lfs: false
+
+      - name: Build Llama.CPP for Snapdragon Android
+        id: build_llama_cpp_snapdragon_android
+        run: |
+          cp docs/backend/snapdragon/CMakeUserPresets.json .
+          cmake --preset arm64-android-snapdragon-release -B build
+          cmake --build build
+          cmake --install build --prefix pkg-snapdragon/llama.cpp
+
+      - name: Upload Llama.CPP Snapdragon Android Build Artifact
+        if: ${{ always() && steps.build_llama_cpp_snapdragon_android.outcome == 'success' }}
+        uses: actions/upload-artifact@v6
+        with:
+          name: llama-cpp-android-arm64-snapdragon
+          path: pkg-snapdragon/llama.cpp
+
+  linux-iot-snapdragon:
+    runs-on: ubuntu-latest
+    container:
+      image: 'ghcr.io/snapdragon-toolchain/arm64-linux:v0.1'
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+          lfs: false
+
+      - name: Build Llama.CPP for Snapdragon Linux IoT
+        id: build_llama_cpp_snapdragon_linux
+        run: |
+          cp docs/backend/snapdragon/CMakeUserPresets.json .
+          cmake --preset arm64-linux-snapdragon-release -B build-snapdragon -DGGML_OPENCL=ON
+          cmake --build build-snapdragon -j $(nproc)
+          cmake --install build-snapdragon --prefix pkg-snapdragon/llama.cpp
+
+      - name: Upload Llama.CPP Snapdragon Linux IoT Build Artifact
+        if: ${{ always() && steps.build_llama_cpp_snapdragon_linux.outcome == 'success' }}
+        uses: actions/upload-artifact@v6
+        with:
+          name: llama-cpp-linux-arm64-snapdragon
+          path: pkg-snapdragon/llama.cpp
+
+  test-snapdragon-qdc:
+    name: Test on QDC Device (${{ matrix.device }})
+    needs: [android-ndk-snapdragon, linux-iot-snapdragon]
+    runs-on: ubuntu-24.04-arm
+    timeout-minutes: 90
+    strategy:
+      fail-fast: false
+      matrix:
+        device: [SM8750, SM8850, QCS9075M]
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Download build artifact
+        uses: actions/download-artifact@v7
+        with:
+          name: ${{ startsWith(matrix.device, 'QCS') && 'llama-cpp-linux-arm64-snapdragon' || 'llama-cpp-android-arm64-snapdragon' }}
+          path: pkg-snapdragon/llama.cpp
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.x'
+          cache: pip
+
+      - name: Install system dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y curl unzip
+
+      - name: Install QDC SDK wheel
+        run: |
+          curl -fSL -o qdc_sdk.zip https://softwarecenter.qualcomm.com/api/download/software/tools/Qualcomm_Device_Cloud_SDK/All/0.2.3/qualcomm_device_cloud_sdk-0.2.3.zip
+          unzip qdc_sdk.zip -d qdc_sdk
+          pip install qdc_sdk/qualcomm_device_cloud_sdk-0.2.3-py3-none-any.whl
+
+      - name: Check QDC API key
+        id: check_secret
+        env:
+          QDC_API_KEY: ${{ secrets.QDC_API_KEY }}
+        run: echo "has-qdc-key=${{ env.QDC_API_KEY != '' }}" >> "$GITHUB_OUTPUT"
+
+      - name: Run QDC tests (${{ matrix.device }})
+        if: steps.check_secret.outputs.has-qdc-key == 'true'
+        run: |
+          python scripts/snapdragon/qdc/run_qdc_jobs.py \
+            --test all \
+            --pkg-dir pkg-snapdragon/llama.cpp \
+            --model-url "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_0.gguf" \
+            --device ${{ matrix.device }} \
+            ${{ startsWith(matrix.device, 'QCS') && '--retries 2 --retry-delay 300' || '' }}
+        env:
+          QDC_API_KEY: ${{ secrets.QDC_API_KEY }}
+
+      - name: Cleanup
+        if: always()
+        run: rm -rf pkg-snapdragon qdc_sdk qdc_sdk.zip
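
Note on the workflow above: the artifact download step and the retry flags both use the `cond && a || b` idiom that GitHub Actions expressions offer in place of a ternary operator. The Python sketch below mirrors that selection for the three matrix devices. It is only an analogy built on Python's `and`/`or`; the helper names (`gha_ternary`, `is_linux_iot`, `artifact_name`, `retry_flags`) are hypothetical and do not exist in the repository.

```python
def gha_ternary(cond, if_true, if_false):
    """Mimic the GitHub Actions expression `cond && if_true || if_false`.

    Like the Actions idiom, this falls through to `if_false` whenever
    `if_true` is itself falsy (for example an empty string).
    """
    return (cond and if_true) or if_false


def is_linux_iot(device: str) -> bool:
    # Stand-in for startsWith(matrix.device, 'QCS'); the Actions function
    # ignores case, so the comparison is lower-cased here.
    return device.lower().startswith("qcs")


def artifact_name(device: str) -> str:
    # QCS* devices take the Linux IoT build, everything else the Android build.
    return gha_ternary(
        is_linux_iot(device),
        "llama-cpp-linux-arm64-snapdragon",
        "llama-cpp-android-arm64-snapdragon",
    )


def retry_flags(device: str) -> str:
    # Only QCS* devices get extra retry arguments; others get an empty string.
    return gha_ternary(is_linux_iot(device), "--retries 2 --retry-delay 300", "")


if __name__ == "__main__":
    for device in ("SM8750", "SM8850", "QCS9075M"):
        print(device, artifact_name(device), repr(retry_flags(device)))
```

The same fall-through rule explains the concurrency group: `github.head_ref` is empty on pushes, so the group key ends in the unique `github.run_id` and nothing is cancelled, while pull request runs share a key derived from `github.ref`.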