Open
4497 commits
9b26511
ggml-cpu: implement MXFP4 SIMD for s390x (#16193)
taronaeo Sep 26, 2025
4710dd3
build : fix build-ios-device (#16257)
angt Sep 26, 2025
b995a10
common : use cpp-httplib as a cURL alternative for downloads (#16185)
angt Sep 26, 2025
54dbc37
metal : report OOM errors (#16274)
ggerganov Sep 26, 2025
cc1cfa2
mtmd : fix uninitialized variable in bicubic_resize (#16275)
AlekseiNikiforovIBM Sep 26, 2025
d12a983
codeowners : add rgerganov as owner of RPC [no ci] (#16279)
rgerganov Sep 26, 2025
5d0a40f
Always show message actions for mobile UI + improvements for user mes…
allozaur Sep 26, 2025
e0539eb
webui: switch to hash-based routing (alternative of #16079) (#16157)
isaac-mcfadyen Sep 26, 2025
1a18927
Allow viewing conversations even when llama server is down (#16255)
allozaur Sep 26, 2025
807e8c6
Enhance text file detection logic for file attachments (#16199)
allozaur Sep 26, 2025
624207e
devops: add s390x & ppc64le CI (#15925)
taronaeo Sep 26, 2025
72b24d9
model : make minicpm embedding_scale, residual_scale and logit_scale …
vinkal-chudgar Sep 26, 2025
ace6a54
build : add LLAMA_OPENSSL option (#16287)
angt Sep 27, 2025
3f81b4e
vulkan: support GET_ROWS for k-quants (#16235)
jeffbolznv Sep 27, 2025
234e2ff
server : remove old LLAMA_SERVER_SSL (#16290)
angt Sep 27, 2025
0499b29
vulkan: throw system error instead of SIGABRT during init on older de…
DmyMi Sep 27, 2025
75a3a6c
CUDA: refactor and deduplicate vector FA kernels (#16208)
JohannesGaessler Sep 27, 2025
c0bfc57
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…
am17an Sep 27, 2025
4807e8f
Show message actions by default (#16289)
allozaur Sep 27, 2025
8656f5d
vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…
Acly Sep 27, 2025
e6d65fb
vulkan: support arbitrary KV dimension in flash attention (#16160)
jeffbolznv Sep 27, 2025
1384abf
vulkan: handle mat_mul with A matrix > 4GB (#16176)
jeffbolznv Sep 28, 2025
3b53634
metal : fuse non-sequential nodes (#16102)
ggerganov Sep 28, 2025
6a2c614
metal : extend mat-mat multiplication support (#16225)
ggerganov Sep 28, 2025
d8359f5
vulkan: 64-bit im2col (#16135)
jeffbolznv Sep 28, 2025
2811c65
Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] …
ImadSaddik Sep 28, 2025
0124ac9
devops: switch to using ubuntu-22.04-s390x image (#16302)
taronaeo Sep 28, 2025
d9e0e7c
ci : fix musa docker build (#16306)
yeahdongcn Sep 28, 2025
bd0af02
common : fix reasoning before forced tool call via tool_choice = requ…
crat0z Sep 28, 2025
b887d2f
ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307)
CISC Sep 28, 2025
92cd103
vulkan: Fix validation failure in quantized flash attention (#16292)
jeffbolznv Sep 29, 2025
a4a0aa5
ggml : fix dependencies for ggml_set_rows (#16318)
ggerganov Sep 29, 2025
3ffd0fa
perplexity : show more kl-divergence data (#16321)
ddh0 Sep 29, 2025
2f61c0f
llama-cli: prevent spurious assistant token (#16202)
vinkal-chudgar Sep 29, 2025
66bb798
fix: preserved zero values in chat settings inputs and textareas by s…
ServeurpersoCom Sep 29, 2025
3a2bdcd
Improve Mobile UI for dialogs and action dropdowns (#16222)
allozaur Sep 29, 2025
adc7634
ggml : check cuda and metal argsort limits and add test (#16323)
CISC Sep 29, 2025
02463ab
ggml-backend : add root cause in error message if loading backend lib…
rlewczuk Sep 29, 2025
2db78c7
ggml : bump version to 0.9.1
ggerganov Sep 20, 2025
b6dff20
ggml : prepare for development of 0.9.2-dev
ggerganov Sep 20, 2025
b6ae75a
ggml : bump version to 0.9.3 (ggml/1353)
danbev Sep 25, 2025
c9b1c06
ggml : remove -dev suffix from release version (ggml/1355)
danbev Sep 26, 2025
4d3d455
sync : whisper.cpp (ggml/1359)
ggerganov Sep 29, 2025
2ddd3f2
sync : ggml
ggerganov Sep 29, 2025
b77e6c1
ggml: riscv: add riscv spacemit backend (#15288)
alex-spacemit Sep 29, 2025
d72f5f7
ci : add AMD runners and workflows (#16249)
ggerganov Sep 29, 2025
5f7e166
Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` b…
ServeurpersoCom Sep 29, 2025
a74a0d6
tests: override test_set_rows::max_nmse_err to allow for occasional r…
jeffbolznv Sep 30, 2025
de41f2b
codeowners: add codeowners for opencl backend (#16344)
lhez Sep 30, 2025
f1eb1cb
kleidiai : fix work size and threads sync for fp16 (#16246)
chaxu01 Sep 30, 2025
3c62aed
common : simplify etag tracking by removing json (#16342)
angt Sep 30, 2025
35fb824
metal : dynamic simdgroups for MV kernels (#16340)
ggerganov Sep 30, 2025
a014310
cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328)
anavp-nvidia Sep 30, 2025
075c015
ggml : bump version to 0.9.4 (ggml/1363)
ggerganov Sep 30, 2025
2df5bcf
ci : disable ccache for android (#16348)
CISC Sep 30, 2025
364a7a6
common : remove common_has_curl() (#16351)
angt Sep 30, 2025
d1c84a6
opencl: support ne3 in get_rows (#15866)
lhez Sep 30, 2025
8d78cd2
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
reeselevine Sep 30, 2025
16b0ca0
Chatapi ignore empty sampling (#16330)
ServeurpersoCom Sep 30, 2025
7c156df
opencl: support pad_ext (#15888)
lhez Sep 30, 2025
bf6f3b3
common : disable progress bar without a tty (#16352)
angt Sep 30, 2025
b2ba81d
ci : fix ccache key for ubuntu-cpu-cmake (#16355)
CISC Sep 30, 2025
e74c92e
model : support GLM 4.6 (make a few NextN/MTP tensors not required) (…
bartowski1182 Sep 30, 2025
aa9538a
webui: Remove running `llama-server` within WebUI `dev.sh` script (#1…
allozaur Oct 1, 2025
132d673
vulkan: make ggml_vk_default_dispatcher support older vulkan headers …
netrunnereve Oct 1, 2025
4f15759
Add optional setting for showing "Model used:" information (#16337)
allozaur Oct 1, 2025
1104ca1
ci : use registry cache for docker builds (#16366)
CISC Oct 1, 2025
2a9b633
Improve code block color theming (#16325)
allozaur Oct 1, 2025
7647992
Conversation action dialogs as singletons from Chat Sidebar + apply c…
allozaur Oct 1, 2025
4201dea
common: introduce http.h for httplib-based client (#16373)
angt Oct 1, 2025
1fe4e38
ci: Properly install rocwmma for hip builds (#16305)
IMbackK Oct 1, 2025
ded67b9
llama : parameter conversion and loading fixes for PLaMo2 variants (#…
mitmul Oct 1, 2025
e95fec6
HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.…
IMbackK Oct 1, 2025
c8dedc9
CI: reenable cdna in rocm docker builds (#16376)
IMbackK Oct 1, 2025
95ce098
HIP: add IMbackK to codeowner (#16375)
IMbackK Oct 2, 2025
2be72c2
SYCL: Update to oneAPI 2025.2 (#16371)
NeoZhangJianyu Oct 2, 2025
bbd32bc
ci : fix clean-up of old logs (#16381)
ggerganov Oct 2, 2025
f09aefa
ci: update vulkan ci (#16294)
netrunnereve Oct 2, 2025
72ee736
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388)
CISC Oct 2, 2025
91a2a56
musa: update compile flags (#16265)
yeahdongcn Oct 2, 2025
34fcc5a
model : Apertus model implementation (#15852)
pwilkin Oct 2, 2025
ef07a40
ggml webgpu: add support for soft_max, optimize rms_norm (#16357)
reeselevine Oct 2, 2025
d64c810
test-barrier : do not use more threads than physically available (#16…
CISC Oct 2, 2025
5113efd
fix: track viewportHeight via window.innerHeight to avoid unwanted sc…
ServeurpersoCom Oct 3, 2025
136bda7
webui : Fix messages payload sent to chat completions (#16402)
allozaur Oct 3, 2025
e308efd
vulkan: in flash attention, bounds check against nem1 (don't rely on …
jeffbolznv Oct 3, 2025
7723327
Capture model name only after first token (streaming) or completed re…
allozaur Oct 3, 2025
ad12647
ci : change macos-13 to macos-15-intel (#16401)
danbev Oct 3, 2025
0e1f838
vulkan: Fix FA coopmat1 invalid array indexing (#16365)
jeffbolznv Oct 3, 2025
2aaf0a2
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#1…
jeffbolznv Oct 3, 2025
84c8e30
Fix missing messages on sibling navigation (#16408)
allozaur Oct 3, 2025
638d330
ggml : fix graph reallocation with multiple chunks (#16396)
Acly Oct 3, 2025
946f71e
llama : fix shapes for bert/mpt q/k norm (#16409)
CISC Oct 3, 2025
606a73f
metal : fix loop bound in ggml_mem_ranges (#16412)
ggerganov Oct 3, 2025
f6dcda3
server : context checkpointing for hybrid and recurrent models (#16382)
ddh0 Oct 3, 2025
128d522
chat : support Magistral thinking (#16413)
ServeurpersoCom Oct 3, 2025
e29acf7
vulkan : incremental shader builds (#16341)
Acly Oct 4, 2025
898acba
rpc : add support for multiple devices (#16276)
rgerganov Oct 4, 2025
f392839
rpc : check src buffer when copying tensor (#16421)
rgerganov Oct 4, 2025
86df2c9
vulkan: use a more appropriate amount of threads when generating shad…
netrunnereve Oct 4, 2025
3526657
ggml webgpu: actually add softmax, fix rms_norm offset (#16400)
reeselevine Oct 5, 2025
ca71fb9
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
gabe-l-hart Oct 5, 2025
c5fef0f
server: update readme to mention n_past_max metric (#16436)
okuvshynov Oct 6, 2025
1d49ca3
nix : removed metal for nix (#16118)
yuannan Oct 6, 2025
a80ff18
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443)
danbev Oct 6, 2025
04e632a
ci : remove missing reranker model files (#16444)
danbev Oct 6, 2025
a23b9bd
ggml : fix unaligned access in AMX code (#16315)
ggerganov Oct 6, 2025
3a002af
ci : refactor sdk caching to minimize storage (#16414)
CISC Oct 6, 2025
c08002a
chat : Granite Docling stopping (#16438)
gabe-l-hart Oct 6, 2025
3df2244
llama : add --no-host to disable host buffers (#16310)
Gadflyii Oct 6, 2025
8ae32dc
metal : various optimizations + refactoring (#16446)
ggerganov Oct 7, 2025
1d6092f
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
ggerganov Oct 7, 2025
0a319bb
metal : add support for non-padded FA KV (#16148)
ggerganov Oct 7, 2025
0123ff3
memory : use sequential equal splits for recurrent modules (#16442)
ggerganov Oct 7, 2025
c61ae20
rpc : update documentation (#16441)
rgerganov Oct 7, 2025
ef4c5b8
presets : fix pooling param for embedding models (#16455)
ggerganov Oct 7, 2025
4e0388a
webui : added download action (#13552) (#16282)
srogmann Oct 7, 2025
df1b612
server : add `/v1/health` endpoint (#16461)
ggerganov Oct 7, 2025
aeaf8a3
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
tdakhran Oct 7, 2025
74b8fc1
ggml webgpu: profiling, CI updates, reworking of command submission (…
reeselevine Oct 7, 2025
7fdd16b
server : improve context checkpoint logic (#16440)
ggerganov Oct 8, 2025
b2c08c9
metal : mark FA blocks (#16372)
ggerganov Oct 8, 2025
d2ee056
server : fix cancel pending task (#16467)
issixx Oct 8, 2025
9d08828
Disable CUDA host buffers on integrated GPUs (#16308)
ai-fonsi Oct 8, 2025
12bbc3f
refactor: centralize CoT parsing in backend for streaming mode (#16394)
ServeurpersoCom Oct 8, 2025
e08db42
model: EmbeddingGemma Adding Support for SentenceTransformers Dense M…
sfallah Oct 9, 2025
b260213
[SYCL] refactor soft_max, add soft_max_back (#16472)
NeoZhangJianyu Oct 9, 2025
d80d6d2
kleidiai: kernel interface refactoring (#16460)
chaxu01 Oct 9, 2025
aa4711d
CANN: Improve ACL graph matching (#16166)
noemotiovon Oct 9, 2025
2c0d875
ci: add ARM64 Kleidiai build and test support (#16462)
sudhiarm Oct 9, 2025
56b4795
model-conversion : add support for SentenceTransformers (#16387)
danbev Oct 9, 2025
8328fd4
No markdown in cot (#16483)
ServeurpersoCom Oct 9, 2025
d00cbea
server : host-memory prompt caching (#16391)
ggerganov Oct 9, 2025
1deee0f
cpu : optimize the ggml NORM operation (#15953)
duduta Oct 9, 2025
1faa13a
webui: updated the chat service to only include max_tokens in the req…
ServeurpersoCom Oct 9, 2025
6d69ab3
cmake : Dont define XOPENSOURCE on AIX (#16481)
mehendarkarprajwal Oct 10, 2025
cdb6da4
server : log requests to /v1/completions (#16495)
rgerganov Oct 10, 2025
68ee98a
server : return HTTP 400 if prompt exceeds context length (#16486)
rgerganov Oct 10, 2025
81086cd
vocab : mark EOT token for Granite models (#16499)
ggerganov Oct 10, 2025
e60f01d
server : fix division by zero when reporting stats (#16501)
ggerganov Oct 10, 2025
477a66b
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
amirai21 Oct 11, 2025
97870e6
cuda : avoid initializing unused devices (#16510)
slaren Oct 11, 2025
31d0ff1
server / ranking : add sorting and management of top_n (#16403)
YannFollet Oct 11, 2025
4a8fbe0
feat: render user content as markdown option (#16358)
ServeurpersoCom Oct 11, 2025
a3cb047
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
ggerganov Oct 11, 2025
11f0af5
CUDA: faster tile FA, add oob checks, more HSs (#16492)
JohannesGaessler Oct 11, 2025
20cc625
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
sirus20x6 Oct 12, 2025
a2fba89
hparams : add check for layer index in is_recurrent (#16511)
danbev Oct 12, 2025
41aac5c
ggml : Fix FP16 ELU positive branch (#16519)
sirus20x6 Oct 12, 2025
4b2dae3
common : update presets (#16504)
ggerganov Oct 12, 2025
2c301e9
common : handle unicode during partial json parsing (#16526)
aldehir Oct 12, 2025
8415f61
ci : add Vulkan on Ubuntu with default packages build (#16532)
mbaudier Oct 12, 2025
c7be9fe
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521)
NeoZhangJianyu Oct 12, 2025
81d54bb
webui: remove client-side context pre-check and rely on backend for l…
ServeurpersoCom Oct 12, 2025
a31cf36
metal : add opt_step_adamw and op_sum (#16529)
cern1710 Oct 12, 2025
f9bc66c
CANN: Update several operators to support FP16 data format (#16251)
hipudding Oct 13, 2025
c515fc5
ggml : fix scalar path for computing norm (#16558)
ggerganov Oct 13, 2025
3f750f8
metal: add support for opt_step_sgd (#16539)
cern1710 Oct 13, 2025
1fb9504
fix: add remark plugin to render raw HTML as literal text (#16505)
ServeurpersoCom Oct 13, 2025
56fc38b
CANN: fix CPU memory leak in CANN backend (#16549)
noemotiovon Oct 13, 2025
01d2bdc
ggml : fix build broken with -march=armv9-a on MacOS (#16520)
DamonFool Oct 13, 2025
7049736
CUDA: fix numerical issues in tile FA kernel (#16540)
JohannesGaessler Oct 13, 2025
5016b72
opencl: fix build targeting CL 2 (#16554)
lhez Oct 13, 2025
e38b7c6
graph : support cacheless embeddings with FA and iSWA (#16528)
ggerganov Oct 13, 2025
e60f241
metal : FA support F32 K and V and head size = 32 (#16531)
ggerganov Oct 13, 2025
bc07349
server : dynamic token limit for prompt cache (#16560)
ggerganov Oct 14, 2025
5b6913c
cuda : remove legacy copy-op pointer indirection code (#16485)
anavp-nvidia Oct 14, 2025
48e2fa9
CUDA: add fp kernel for larger batch size MoE (#16512)
am17an Oct 14, 2025
1ee9d0b
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)
am17an Oct 14, 2025
9c7185d
CUDA: enable FA for FP32 KV cache (#16546)
JohannesGaessler Oct 14, 2025
7ea15bb
vulkan: Improve build time for MSVC (#16545)
jeffbolznv Oct 14, 2025
4258e0c
vulkan: Support FA with K/V in F32 (#16543)
jeffbolznv Oct 14, 2025
120bf70
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …
am17an Oct 14, 2025
ffa0590
vulkan: Add ACC_TYPE_VEC2 implementation (#16203)
SavicStefan Oct 14, 2025
fa882fd
metal : avoid using Metal's gpuAddress property (#16576)
ggerganov Oct 14, 2025
554fd57
server : fix mtmd checkpoints (#16591)
ggerganov Oct 15, 2025
5acd455
CUDA: Changing the CUDA scheduling strategy to spin (#16585)
JTischbein Oct 15, 2025
3e3cb19
llama-quant: add support for mmproj (#16592)
ngxson Oct 15, 2025
17304cb
server : fix img token logs (#16595)
ggerganov Oct 15, 2025
f4ce81c
metal: optimise `GGML_OP_SUM` (#16559)
cern1710 Oct 15, 2025
f9fb33f
Add server-driven parameter defaults and syncing (#16515)
allozaur Oct 15, 2025
d93f843
opencl: fix FA for f32 (#16584)
lhez Oct 15, 2025
0cb7a06
opencl: add q8_0 mm support (#16469)
lhez Oct 15, 2025
466c191
cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083)
safranowith Oct 15, 2025
7adc79c
gguf-py : add support for endian conversion of BF16 data (#16594)
AlekseiNikiforovIBM Oct 15, 2025
ee50ee1
SYCL: Add GGML_OP_MEAN operator support (#16009)
yael-works Oct 16, 2025
adc9b60
ggml-cpu: replace putenv with setenv for const-correctness (#16573)
otegami Oct 16, 2025
6f5d924
common : Update the docs on -t --threads (#16236)
takasurazeem Oct 16, 2025
7a50cf3
CANN: format code using .clang-format (#15863)
noemotiovon Oct 16, 2025
b22572e
sycl : add ARANGE operator (#16362)
GittyBurstein Oct 16, 2025
683fa6b
fix: added a normalization step for MathJax-style \[\] and \(\) delim…
ServeurpersoCom Oct 16, 2025
1bb4f43
mtmd : support home-cooked Mistral Small Omni (#14928)
ngxson Oct 16, 2025
ceff6bb
SYCL SET operator optimized for F32 tensors (#16350)
GittyBurstein Oct 17, 2025
79967ec
grammar : use int64_t to avoid int overflows in int schema to grammar…
ochafik Oct 17, 2025
9ad4f19
metal : add `CONV_TRANSPOSE_2D` (#16542)
iliailmer Oct 17, 2025
b194915
vulkan: fix debug build (add_rms_len/data not found) (#16624)
jeffbolznv Oct 17, 2025
ababae7
webui: reorganize settings layout (#16607)
ServeurpersoCom Oct 17, 2025
342c728
ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)
muggle-stack Oct 17, 2025
3d4e86b
vulkan: Add State Space Model (SSM) Operations Support (#16463)
giuseppe Oct 17, 2025
41386cf
rpc : report actual free memory (#16616)
rgerganov Oct 17, 2025
66b0dbc
llama-model: fix insonsistent ctxs <-> bufs order (#16581)
JohannesGaessler Oct 17, 2025
8138785
opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)
shawngu-quic Oct 18, 2025
38355c6
CUDA: use registers instead of smem in topk-moe (#16647)
am17an Oct 18, 2025
e56abd2
vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)
jeffbolznv Oct 18, 2025
ee09828
HIP: fix GPU_TARGETS (#16642)
JohannesGaessler Oct 18, 2025
55754be
CODEOWNERS: update for ggml-cuda/mmf (#16660)
am17an Oct 19, 2025
fcb235b
ci: include s390x release binaries (#16648)
taronaeo Oct 19, 2025
cec5edb
ci : avoid manual updates of docs/ops.md (#16663)
CISC Oct 19, 2025
4f73d0a
ci : fix binaries release failure for s390x (binaries may not work ye…
taronaeo Oct 19, 2025
0398752
model : add Granite Hybrid types (#16635)
giuseppe Oct 19, 2025
7062dd8
llama-context: only warn on pooling_type when user specified (#16674)
otegami Oct 20, 2025
2330de7
SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16…
safranowith Oct 20, 2025
72d53e6
readme: update bindings (#16651)
deadprogram Oct 20, 2025
06332e2
llama-batch: fix build fails with `-Werror=missing-braces` (#16614)
otegami Oct 20, 2025
13f2cfa
Enable per-conversation loading states to allow having parallel conve…
allozaur Oct 20, 2025
0e4a0cf
Import/Export UX improvements (#16619)
allozaur Oct 20, 2025
7906850
Prevent premature submission on IME input (#16673)
allozaur Oct 20, 2025
b617cfd
ggml-alloc : fix leak when reusing a tensor with a larger size (#16679)
slaren Oct 20, 2025
c9c1972
Handle legacy 'context' attachments (#16687)
allozaur Oct 20, 2025
84bf3c6
model : add BailingMoeV2 support (#16063)
CISC Oct 20, 2025
6de8ed7
sycl : add PAD_REFLECT_D1 operator support (#16145)
ye-NX Oct 20, 2025
fb34984
vulkan: Handle FA with all -inf mask values (#16447)
jeffbolznv Oct 21, 2025
6ea37f5
opencl: fix warnings and clean up profiling (#16688)
lhez Oct 21, 2025
4926419
ggml: add ggml_can_fuse_subgraph (#16662)
am17an Oct 21, 2025
51d1a8c
CUDA: better error for FA kernel with 0 occupancy (#16643)
JohannesGaessler Oct 21, 2025
03792ad
CUDA: topk-moe: add optional parameter for gpt-oss (#16649)
am17an Oct 21, 2025
9285325
CUDA: fix bug in topk-moe softmax (#16711)
am17an Oct 22, 2025
d8eaa26
tests : fix test-thread-safety when compiling with multiple backends …
Acly Oct 22, 2025
19a5a3e
ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…
sirus20x6 Oct 22, 2025
9b9201f
webui: introduce OpenAI-compatible model selector in JSON payload (#1…
ServeurpersoCom Oct 22, 2025
a2e0088
Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectoriz…
slaren Oct 22, 2025
63d2fc4
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
max-krasnyansky Oct 22, 2025
9de9672
sycl: use async memory allocation to fix crashes during graph recordi…
mmichel11 Oct 23, 2025
8cf6b42
server : send partial stop string when <EOG> is reached (#15007)
matteoserva Oct 23, 2025
061f0ef
ggml-cuda: use passed ops instead of hardcoded ops (#16712)
am17an Oct 23, 2025
fe6a988
Manually link -lbsd to resolve flock symbol on AIX (#16610)
mehendarkarprajwal Oct 23, 2025
d0660f2
mtmd-cli : allow using --jinja (#16718)
ngxson Oct 23, 2025
dd62dcf
convert : Make mistral-common dependency optional (#16738)
juliendenize Oct 23, 2025
0bf47a1
server: add memory breakdown print (#16740)
JohannesGaessler Oct 23, 2025
f8f071f
convert : handle pre-quantized models (#14810)
compilade Oct 23, 2025
5a91109
model-conversion : add trust_remote_code for orig model run [no ci] (…
danbev Oct 24, 2025
69e9ff0
webui: support q URL parameter (#16728)
odrling Oct 24, 2025
0bcb40b
CUDA: use CUB for arbitary size argsort (#16754)
am17an Oct 24, 2025
55945d2
ggml: fix CUDA grid launch condition for large block_nums.y in binbca…
leejet Oct 24, 2025
5cca254
convert : avoid dequantizing mxfp4 for GPT-OSS (#16756)
compilade Oct 25, 2025
8423d01
vulkan: Optimize SSM_SCAN (#16645)
jeffbolznv Oct 25, 2025
f90b4a8
vulkan: delete dead code (#16732)
giuseppe Oct 25, 2025
226f295
model : set res->t_embd in PLaMo2 models (#16766)
mitmul Oct 25, 2025
5d195f1
convert : handle mmproj filename/path properly (#16760)
Galunid Oct 25, 2025
3cfa9c3
vulkan: deduplicate Microsoft Direct3D12 devices (#16689)
giladgd Oct 26, 2025
171 changes: 171 additions & 0 deletions .clang-format
@@ -0,0 +1,171 @@
---
Language: Cpp
AlignAfterOpenBracket: Align
AlignArrayOfStructures: Left
AlignConsecutiveAssignments: AcrossComments
AlignConsecutiveBitFields: AcrossComments
AlignConsecutiveDeclarations: AcrossComments
AlignConsecutiveMacros: AcrossComments
# AlignConsecutiveShortCaseStatements: AcrossComments
AlignEscapedNewlines: Left # LeftWithLastLine
AlignOperands: Align
AlignTrailingComments:
  Kind: Always
  OverEmptyLines: 1
AllowAllArgumentsOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: false
# AllowBreakBeforeNoexceptSpecifier: OnlyWithParen
AllowShortBlocksOnASingleLine: Never
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Inline
AllowShortIfStatementsOnASingleLine: Never
AllowShortLambdasOnASingleLine: Inline
AllowShortLoopsOnASingleLine: false
AlwaysBreakBeforeMultilineStrings: true
# Treat CUDA keywords/attributes as "attribute macros" and avoid breaking lines inside them
AttributeMacros:
  - __host__
  - __device__
  - __global__
  - __forceinline__
  - __launch_bounds__
BinPackArguments: true
BinPackParameters: false # OnePerLine
BitFieldColonSpacing: Both
BreakBeforeBraces: Custom # Attach
BraceWrapping:
  AfterCaseLabel: true
  AfterClass: false
  AfterControlStatement: false
  AfterEnum: false
  AfterFunction: false
  AfterNamespace: false
  AfterObjCDeclaration: false
  AfterStruct: false
  AfterUnion: false
  AfterExternBlock: false
  BeforeCatch: false
  BeforeElse: false
  BeforeLambdaBody: false
  BeforeWhile: false
  IndentBraces: false
  SplitEmptyFunction: false
  SplitEmptyRecord: false
  SplitEmptyNamespace: false
# BreakAdjacentStringLiterals: true
BreakAfterAttributes: Never
BreakBeforeBinaryOperators: None
BreakBeforeInlineASMColon: OnlyMultiline
BreakBeforeTernaryOperators: false
# BreakBinaryOperations: Never
BreakConstructorInitializers: AfterColon
# BreakFunctionDefinitionParameters: false
BreakInheritanceList: AfterComma
BreakStringLiterals: true
# BreakTemplateDeclarations: Yes
ColumnLimit: 120
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat: false
EmptyLineBeforeAccessModifier: Leave
EmptyLineAfterAccessModifier: Never
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
IncludeBlocks: Regroup
IncludeCategories:
  - Regex: '".*"'
    Priority: 1
    SortPriority: 0
  - Regex: '^<.*\.h>'
    Priority: 2
    SortPriority: 0
  - Regex: '^<.*'
    Priority: 3
    SortPriority: 0
  - Regex: '.*'
    Priority: 4
    SortPriority: 0
IncludeIsMainRegex: '([-_](test|unittest))?$'
IncludeIsMainSourceRegex: ''
IndentAccessModifiers: false
IndentCaseBlocks: true
IndentCaseLabels: true
IndentExternBlock: NoIndent
IndentGotoLabels: false
IndentPPDirectives: AfterHash
IndentWidth: 4
IndentWrappedFunctionNames: false
InsertBraces: true # NOTE: may lead to incorrect formatting
InsertNewlineAtEOF: true
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
LambdaBodyIndentation: Signature
LineEnding: LF
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Auto
ObjCBlockIndentWidth: 4
ObjCSpaceAfterProperty: true
ObjCSpaceBeforeProtocolList: true
PPIndentWidth: -1
PackConstructorInitializers: CurrentLine
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 200
PointerAlignment: Middle
QualifierAlignment: Left
#QualifierOrder: ['static', 'inline', 'friend', 'constexpr', 'const', 'volatile', 'type', 'restrict']
RawStringFormats:
  - Language: Cpp
    Delimiters:
      - cc
      - CC
      - cpp
      - Cpp
      - CPP
      - 'c++'
      - 'C++'
    CanonicalDelimiter: ''
ReferenceAlignment: Middle
ReflowComments: false # IndentOnly
SeparateDefinitionBlocks: Always
SortIncludes: CaseInsensitive
SortUsingDeclarations: LexicographicNumeric
SpaceAfterCStyleCast: true
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: Never
SpacesInContainerLiterals: true
SpacesInLineCommentPrefix:
  Minimum: 1
  Maximum: -1
SpacesInParentheses: false
SpacesInSquareBrackets: false
SpaceBeforeSquareBrackets: false
Standard: c++17
TabWidth: 4
UseTab: Never
WhitespaceSensitiveMacros: ['STRINGIZE']
...
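
As a rough illustration of what a few of the options above ask for, the sketch below is hand-formatted C++ that aims to follow `PointerAlignment: Middle`, `ReferenceAlignment: Middle`, `QualifierAlignment: Left`, `AlignConsecutiveDeclarations`, `AllowShortFunctionsOnASingleLine: Inline`, `InsertBraces: true` and `SpaceBeforeParens: ControlStatements`. The names `vec_view` and `sum_scaled` are hypothetical and do not come from the repository, and the layout was not produced by running clang-format, so the tool's actual output may differ in detail.

```cpp
#include <cstddef>

// Hypothetical type used only to show the layout; not part of the code base.
struct vec_view {
    // Qualifier on the left, '*' spaced on both sides, member names aligned.
    const float * data;
    size_t        size;

    // A function defined inside a class may stay on a single line.
    bool empty() const { return size == 0; }
};

// Hypothetical helper: sums the elements of v after multiplying each by scale.
static float sum_scaled(const vec_view & v, float scale) {
    // Braces are kept even around a single statement.
    if (v.empty()) {
        return 0.0f;
    }

    float acc = 0.0f;
    // A space separates the control keyword from '(', but not a call from '('.
    for (size_t i = 0; i < v.size; ++i) {
        acc += v.data[i] * scale;
    }
    return acc;
}
```

Checking a real file against the configuration is the more reliable test; the snippet only gives a feel for the intended style.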

5 changes: 5 additions & 0 deletions .clang-tidy
@@ -12,12 +12,17 @@ Checks: >
    -readability-implicit-bool-conversion,
    -readability-magic-numbers,
    -readability-uppercase-literal-suffix,
    -readability-simplify-boolean-expr,
    -readability-math-missing-parentheses,
    clang-analyzer-*,
    -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
    performance-*,
    -performance-enum-size,
    portability-*,
    -portability-simd-intrinsics,
    misc-*,
    -misc-const-correctness,
    -misc-non-private-member-variables-in-classes,
    -misc-no-recursion,
    -misc-use-anonymous-namespace,
FormatStyle: none
130 changes: 130 additions & 0 deletions .devops/cann.Dockerfile
@@ -0,0 +1,130 @@
# ==============================================================================
# ARGUMENTS
# ==============================================================================

# Define the CANN base image for easier version updates later
ARG CANN_BASE_IMAGE=quay.io/ascend/cann:8.1.rc1-910b-openeuler22.03-py3.10

# ==============================================================================
# BUILD STAGE
# Compile all binary files and libraries
# ==============================================================================
FROM ${CANN_BASE_IMAGE} AS build

# Define the Ascend chip model for compilation. Default is Ascend910B3
ARG ASCEND_SOC_TYPE=Ascend910B3

# -- Install build dependencies --
RUN yum install -y gcc g++ cmake make git libcurl-devel python3 python3-pip && \
    yum clean all && \
    rm -rf /var/cache/yum

# -- Set the working directory --
WORKDIR /app

# -- Copy project files --
COPY . .

# -- Set CANN environment variables (required for compilation) --
# Using ENV instead of `source` allows environment variables to persist across the entire image layer
ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${LD_LIBRARY_PATH}
ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${PATH}
ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/runtime/lib64/stub:$LD_LIBRARY_PATH
# ... You can add other environment variables from the original file as needed ...
# For brevity, only core variables are listed here. You can paste the original ENV list here.

# -- Build llama.cpp --
# Use the passed ASCEND_SOC_TYPE argument and add general build options
RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh --force \
    && \
    cmake -B build \
        -DGGML_CANN=ON \
        -DCMAKE_BUILD_TYPE=Release \
        -DSOC_TYPE=${ASCEND_SOC_TYPE} \
        . && \
    cmake --build build --config Release -j$(nproc)

# -- Organize build artifacts for copying in later stages --
# Create a lib directory to store all .so files
RUN mkdir -p /app/lib && \
    find build -name "*.so" -exec cp {} /app/lib \;

# Create a full directory to store all executables and Python scripts
RUN mkdir -p /app/full && \
    cp build/bin/* /app/full/ && \
    cp *.py /app/full/ && \
    cp -r gguf-py /app/full/ && \
    cp -r requirements /app/full/ && \
    cp requirements.txt /app/full/
# If you have a tools.sh script, make sure it is copied here
# cp .devops/tools.sh /app/full/tools.sh

# ==============================================================================
# BASE STAGE
# Create a minimal base image with CANN runtime and common libraries
# ==============================================================================
FROM ${CANN_BASE_IMAGE} AS base

# -- Install runtime dependencies --
RUN yum install -y libgomp curl && \
    yum clean all && \
    rm -rf /var/cache/yum

# -- Set CANN environment variables (required for runtime) --
ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ENV LD_LIBRARY_PATH=/app:${ASCEND_TOOLKIT_HOME}/lib64:${LD_LIBRARY_PATH}
ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${PATH}
ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
# ... You can add other environment variables from the original file as needed ...

WORKDIR /app

# Copy compiled .so files from the build stage
COPY --from=build /app/lib/ /app

# ==============================================================================
# FINAL STAGES (TARGETS)
# ==============================================================================

### Target: full
# Complete image with all tools, Python bindings, and dependencies
# ==============================================================================
FROM base AS full

COPY --from=build /app/full /app

# Install Python dependencies
RUN yum install -y git python3 python3-pip && \
    pip3 install --no-cache-dir --upgrade pip setuptools wheel && \
    pip3 install --no-cache-dir -r requirements.txt && \
    yum clean all && \
    rm -rf /var/cache/yum

# You need to provide a tools.sh script as the entrypoint
ENTRYPOINT ["/app/tools.sh"]
# If there is no tools.sh, you can set the default to start the server
# ENTRYPOINT ["/app/llama-server"]

### Target: light
# Lightweight image containing only llama-cli
# ==============================================================================
FROM base AS light

COPY --from=build /app/full/llama-cli /app

ENTRYPOINT [ "/app/llama-cli" ]

### Target: server
# Dedicated server image containing only llama-server
# ==============================================================================
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app

HEALTHCHECK --interval=5m CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
22 changes: 0 additions & 22 deletions .devops/cloud-v-pipeline

This file was deleted.
