Merged
1349 commits
531d7b6
ggml: new backend for Virglrenderer API Remoting acceleration (v2) (l…
kpouget Jan 28, 2026
dda7d9c
vulkan: handle device dedup on MacOS + Vega II Duo cards (llama/19058)
okuvshynov Jan 28, 2026
cc0c103
ggml-sycl: remove unused syclcompat header (llama/19140)
PatKamin Jan 28, 2026
33148bb
Vulkan Flash Attention Coopmat1 Refactor (llama/19075)
0cc4m Jan 28, 2026
f0e85bb
sycl: fix norm kernels: l2_norm, group_norm, rms_norm by remove asser…
arthw Jan 29, 2026
62ba8b5
CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.…
am17an Jan 29, 2026
e0a2182
ggml-zendnn : resolve ZenDNN backend cross-module symbol dependency (…
z-vishal Jan 29, 2026
34a3e28
HIP: add mmf for CDNA (llama/18896)
zhang-hui-yulo Jan 29, 2026
b997e69
cuda : fix nkvo, offload and cuda graph node properties matching (lla…
ggerganov Jan 29, 2026
2a89a3f
hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama…
tboinovski1 Jan 29, 2026
829e700
ggml-webgpu: improve flastAttention performance by software pipelinin…
ArberSephirotheca Jan 29, 2026
1b3c27e
sycl: implement GGML_OP_TRI (llama/19089)
RachelMantel Jan 30, 2026
2a16e7a
sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114)
s8322 Jan 30, 2026
5dca0db
add tensor type checking as part of cuda graph properties (llama/19186)
bssrdf Jan 30, 2026
b529c06
sync : ggml
ggerganov Jan 30, 2026
953e503
talk-llama : sync llama.cpp
ggerganov Jan 30, 2026
acbace0
cuda : fix compile warnings (#0)
ggerganov Jan 30, 2026
bf422cb
scripts : Fix dSYMs path case for macOS xcframework build (#3630)
friederbluemle Jan 30, 2026
aa1bc0d
ruby : add `VAD::Context#segments_from_samples`, allow Pathname, etc.…
KitaitiMakoto Jan 30, 2026
941bdab
ruby : add `Whisper::Context::Params`, fix token memory management (#…
KitaitiMakoto Feb 4, 2026
fc1a3e5
cmake : remove unused file (ggml/1419)
ggerganov Jan 30, 2026
06e3750
ggml : bump version to 0.9.6 (ggml/1423)
ggerganov Feb 7, 2026
efd6344
Correctly fetch q8_1 quantize pipeline in test as needed by 8a3519b (…
sredman Jan 30, 2026
db9c887
opencl: add optimized q8_0 mm kernel for adreno (llama/18871)
shaofeiqi Jan 30, 2026
9b927dd
ggml-hexagon: flash-attention and reduce-sum optimizations (llama/19141)
chraac Jan 31, 2026
aca5953
Bump cmake max version (needed for Windows on Snapdragon builds) (lla…
max-krasnyansky Feb 1, 2026
a0256b8
Remove pipeline cache mutexes (llama/19195)
nikhilJain17 Feb 2, 2026
0e219eb
docs : Minor cleanups (llama/19252)
ckastner Feb 2, 2026
625c8d8
ggml-backend: fix async set/get fallback sync (llama/19179)
JohannesGaessler Feb 2, 2026
73e0455
metal : support virtual devices (llama/18919)
ggerganov Feb 2, 2026
74353e9
sycl: implement GGML_OP_TOP_K (llama/19242)
tdevelope Feb 2, 2026
c4003da
Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nv…
arthw Feb 2, 2026
8710630
ggml-cpu: FA split across kv for faster TG (llama/19209)
am17an Feb 2, 2026
591072f
opencl: refactor some ops, concat, repeat, tanh and scale (llama/19226)
lhez Feb 2, 2026
6ec362d
cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated (l…
gaugarg-nv Feb 3, 2026
57107b2
ggml: added cleanups in ggml_quantize_free (llama/19278)
noctrex Feb 3, 2026
698265d
CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_fixup (llama/19…
ORippler Feb 3, 2026
ce8a2da
metal : minor cleanup (llama/19251)
ggerganov Feb 3, 2026
8eede80
CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
am17an Feb 3, 2026
aa34558
vulkan: disable coopmat1 fa on Nvidia Turing (llama/19290)
0cc4m Feb 3, 2026
5dda94d
metal : add solve_tri (llama/19302)
ggerganov Feb 3, 2026
4685ec9
ggml-cpu: use LUT for converting e8->f32 scales on x86 (llama/19288)
am17an Feb 4, 2026
2763054
ggml-virtgpu: make the code thread safe (llama/19204)
kpouget Feb 4, 2026
eecc9bf
metal : add missing includes (llama/19348)
will-lms Feb 5, 2026
e0a3f39
vulkan: fix non-contig rope (llama/19299)
jeffbolznv Feb 5, 2026
5a786f7
vulkan: Set k_load_shmem to false when K is too large (llama/19301)
jeffbolznv Feb 5, 2026
932def3
vulkan: fix GPU deduplication logic. (llama/19222)
okuvshynov Feb 5, 2026
0781df2
metal : add diag (llama/19330)
ggerganov Feb 5, 2026
a567c14
vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (llama…
jeffbolznv Feb 5, 2026
34d332a
metal : adaptive CPU/GPU interleave based on number of nodes (llama/1…
ggerganov Feb 5, 2026
2a7d549
cuda : cuda graphs now compare all node params (llama/19383)
ggerganov Feb 6, 2026
776cf61
metal : skip loading all-zero mask (llama/19337)
ggerganov Feb 6, 2026
c1b6335
vulkan: make FA mask/softcap enables spec constants (llama/19309)
jeffbolznv Feb 6, 2026
cea22b3
vulkan: For coopmat2 FA, use fp16 accumulators for the final result (…
jeffbolznv Feb 6, 2026
f2f7320
sycl: add F16 support for GGML_OP_CEIL (llama/19306)
NechamaKrashinski Feb 6, 2026
1739af6
ggml-webgpu: JIT compile binary operators and handle binding overlaps…
abhijitramesh Feb 6, 2026
a9a0a51
metal : fix event synchronization in cpy_tensor_async (llama/19402)
ggerganov Feb 7, 2026
55d7cb2
metal : consolidate bin kernels (llama/19390)
ggerganov Feb 7, 2026
b0e81c1
sync : ggml
ggerganov Feb 7, 2026
4b23ff2
talk-llama : sync llama.cpp
ggerganov Feb 7, 2026
193f7cd
ci : try fix mirrors (#3655)
ggerganov Feb 9, 2026
eb27fa2
server : fix hardcoded /inference path in default HTML page (#3639)
sidmohan0 Feb 9, 2026
525be69
cmake: Drop obsolete build-time configuration of backends (#3649)
ckastner Feb 9, 2026
052066c
chore: Update outdated GitHub Actions versions (#3646)
pgoslatara Feb 9, 2026
764482c
ci: add vulkan docker image (#3644)
rare-magma Feb 9, 2026
8089042
CUDA: Fix non-contig rope (llama/19338)
ORippler Feb 8, 2026
a36210c
cuda : extend GGML_OP_PAD to work with non-cont src0 (llama/19429)
ggerganov Feb 10, 2026
6a74f56
CANN: implement quantized MUL_MAT_ID for MoE models (llama/19228)
hipudding Feb 10, 2026
2de2fc9
CANN: Remove unnecessary wrapper for `gml_backend_buft_is_cann` (llam…
rauletorresc Feb 10, 2026
b0fe2e8
ggml : use noexcept overload for is_regular_file in backend registrat…
k4ss4n Feb 10, 2026
d77265c
ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementati…
Alcpz Feb 10, 2026
562255f
Plug memory leaks and free resources on shutdown (llama/19315)
nikhilJain17 Feb 10, 2026
57c620b
CUDA : Update CCCL-tag for 3.2 to final release from RC (llama/19486)
ORippler Feb 10, 2026
de949fb
metal : consolidate unary ops (llama/19490)
ggerganov Feb 11, 2026
3504358
ggml : extend bin bcast for permuted src1 (llama/19484)
ggerganov Feb 11, 2026
09587ce
hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (llama/19406)
max-krasnyansky Feb 11, 2026
3ffa1fd
metal : extend l2_norm support for non-cont src0 (llama/19502)
ggerganov Feb 11, 2026
f3e7898
ggml : unary ops support non-cont src0 + metal F16 unary ops (llama/1…
ggerganov Feb 11, 2026
0326fd3
opencl: add general Q6_K mm and Q4_K mv (llama/19347)
lhez Feb 11, 2026
3042056
hexagon: further optimization and tuning of matmul and dot kernels (l…
max-krasnyansky Feb 12, 2026
39b5f41
Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (llama/1…
superm1 Feb 12, 2026
d8e3e2e
metal : update sum_rows kernel to support float4 (llama/19524)
ggerganov Feb 12, 2026
9f87eec
opencl: add basic support for q4_1 (llama/19534)
lhez Feb 12, 2026
195af60
hexagon: fix typo in vtcm_needs_release (llama/19545)
FanShupei Feb 12, 2026
c5325e5
metal : support GGML_OP_SET (llama/19548)
ggerganov Feb 13, 2026
0e94faa
metal : improve concurrency (llama/19555)
ggerganov Feb 13, 2026
3eb4905
CUDA: Do not mutate cgraph for fused ADDs (llama/19566)
ORippler Feb 13, 2026
58e3d5a
CUDA: loop over ne2*ne3 in case it overflows (llama/19538)
am17an Feb 13, 2026
628b545
fix vulkan ggml_acc only works in 3d but not 4d (llama/19426)
ymcki Feb 13, 2026
e8a2565
Fix wrong memcpy length for block_interleave == 4 (llama/19575)
Alcpz Feb 13, 2026
ec57bf4
vulkan: restore -inf check in FA shaders (llama/19582)
jeffbolznv Feb 13, 2026
e6476d4
hexagon: further optimizations and refactoring for flash attention (l…
max-krasnyansky Feb 14, 2026
fc6bbab
vulkan: Add vendor id for Qualcomm drivers (llama/19569)
strongtz Feb 14, 2026
197e9ab
vulkan: support GGML_OP_SET (llama/19584)
jeffbolznv Feb 14, 2026
cc448de
vulkan: support L2_NORM with contiguous rows (llama/19604)
jeffbolznv Feb 14, 2026
fbdac51
metal : fix ACC op (llama/19427)
ggerganov Feb 14, 2026
226e8c0
ggml : fix GGML_DEBUG with OpenMP (llama/19599)
angt Feb 14, 2026
4ac70ce
models : optimize qwen3next graph (llama/19375)
ggerganov Feb 14, 2026
83f2ed1
sync : ggml
ggerganov Feb 15, 2026
364c77f
talk-llama : sync llama.cpp
ggerganov Feb 15, 2026
21411d8
docs : fix duplicate word typo in VAD section (#3670)
cluster2600 Feb 19, 2026
cec1dd9
examples : update miniaudio library to 0.11.24 (#3672)
data-man Feb 27, 2026
4bea3cd
ggml : bump version to 0.9.7 (ggml/1425)
ggerganov Feb 15, 2026
7ee772a
cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (ll…
ssam18 Feb 15, 2026
76f769d
ggml-cpu: FA add GEMM microkernel (llama/19422)
am17an Feb 15, 2026
7b5a1eb
ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (llama/19399)
taronaeo Feb 15, 2026
22f0861
ggml : avoid UB in gemm ukernel (llama/19642)
ggerganov Feb 15, 2026
df2f8d3
cmake : check if KleidiAI API has been fetched (llama/19640)
danbev Feb 15, 2026
02a9f66
cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (llama/19624)
dfriehs Feb 15, 2026
f8f7c1d
ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (llama/19132)
abhijain1204fujitsu Feb 16, 2026
5d9d72e
Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm veresions…
superm1 Feb 16, 2026
5ee5748
ggml : make `ggml_is_view` as API (llama/19539)
foldl Feb 16, 2026
cf4bd07
cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (llama/19645)
ggerganov Feb 17, 2026
58855d0
ggml: ggml-cpu: force-no-lto-for-cpu-feats (llama/19609)
talhaHavadar Feb 17, 2026
6fadc74
opencl: optimize mean and sum_row kernels (llama/19614)
shaofeiqi Feb 17, 2026
51ce7de
opencl: refactor expm1 and softplus (llama/19404)
shaofeiqi Feb 17, 2026
f1da0a2
vulkan: split mul_mat into multiple dispatches to avoid overflow (lla…
jeffbolznv Feb 18, 2026
fc7a78f
ggml webgpu: shader library organization (llama/19530)
reeselevine Feb 25, 2026
8b3a52b
ggml webgpu: Fix bug in dispatching large matrix-vector multiplicatio…
reeselevine Feb 18, 2026
cc9e5cf
llamafile: powerpc: add FP16 MMA path for Q4/Q8 matmul (llama/19709)
shalinib-ibm Feb 19, 2026
ade724f
CUDA: fix kernel selection logic for tile FA (llama/19686)
JohannesGaessler Feb 19, 2026
3f68f30
vulkan: fix MMQ shader push constants and multi-dispatch (llama/19732)
0cc4m Feb 19, 2026
0158795
ggml-webgpu: Add unary op (SQR, SQRT, SIN, COS) support. (llama/19700)
yomaytk Feb 19, 2026
0c10a15
ggml-cpu: add RVV vec dot kernels for quantization types (llama/18784)
taimur-10x Feb 20, 2026
98915f8
Improve CUDA graph capture (llama/19754)
gaugarg-nv Feb 21, 2026
06fbd9c
ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementati…
Alcpz Feb 23, 2026
53b571a
hexagon refactor all Ops to use local context struct (llama/19819)
max-krasnyansky Feb 24, 2026
344eae3
vulkan: fix data race in mul_mat_id shader (llama/19790)
jeffbolznv Feb 24, 2026
dcc8776
vulkan: fix coopmat1 without bf16 support (llama/19793)
jeffbolznv Feb 24, 2026
90800b5
Vulkan Scalar Flash Attention Refactor (llama/19625)
0cc4m Feb 24, 2026
279be33
ggml/gguf : prevent integer overflows (llama/19856)
ggerganov Feb 24, 2026
fb55b26
vulkan: check for memory overlap before doing fusion (llama/19768)
jeffbolznv Feb 25, 2026
4cac408
support permuted, remove check s0/s10 (llama/19889)
arthw Feb 26, 2026
f877e1b
ggml-virtgpu: improve the reliability of the code (llama/19846)
kpouget Feb 26, 2026
e722ee1
vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below (llam…
0cc4m Feb 26, 2026
316d921
ggml : fix AMX and add batched support (llama/19925)
angt Feb 26, 2026
9c1fd5c
ggml-zendnn: update code for latest ZenDNN API (llama/19923)
z-vishal Feb 27, 2026
64f4860
replace the magic nunber 768 by max work group size to support iGPU (…
arthw Feb 27, 2026
4734056
sync : ggml
ggerganov Feb 27, 2026
84f8db7
talk-llama : sync llama.cpp
ggerganov Feb 27, 2026
aaf8bdf
scripts : sync gguf
ggerganov Feb 27, 2026
9453b4b
gguf : sync (ggml/0)
ggerganov Feb 27, 2026
30c5194
ruby : null-check (#3689)
KitaitiMakoto Mar 5, 2026
b524b5a
ggml-cpu: add repack for mxfp4 (llama/19738)
am17an Feb 27, 2026
699eaf3
CUDA: add CDNA3 MFMA support for flash attention MMA kernel (llama/19…
Jayluci4 Feb 27, 2026
ca3f6bb
cuda: cap grid.y at 65535 in non-contiguous dequantize/convert kernel…
oobabooga Mar 1, 2026
2a9649c
vulkan: improve partial offloading performance on AMD (llama/19976)
0cc4m Mar 1, 2026
e2be9ed
ggml-cpu: optimise s390x multiply extend instructions (llama/20032)
taronaeo Mar 2, 2026
923a292
vulkan: tune MMVQ for Intel Windows (llama/19988)
0cc4m Mar 2, 2026
de686fa
ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1…
yomaytk Mar 2, 2026
22034a5
ggml webgpu: Clean up per-thread parameter buffer pool and job submis…
nikhilJain17 Mar 2, 2026
3145384
ggml webgpu: fix workgroup dispatch limit for large batch sizes (llam…
abhijitramesh Mar 3, 2026
3a96680
opencl: add optimized q4_1 mm kernel for adreno (llama/19840)
shaofeiqi Mar 3, 2026
169d723
kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (llama/…
chaxu01 Mar 3, 2026
b1b018d
ggml : use a simple std::thread in AMX without OpenMP (llama/20074)
angt Mar 4, 2026
5d25427
ggml: fix ggml_is_contiguous_n for ne == 1 (llama/20092)
JohannesGaessler Mar 4, 2026
8d78d40
Add concat op to webgpu. (llama/20068)
yomaytk Mar 4, 2026
4834971
Fix wait logic for inflight jobs (llama/20096)
nikhilJain17 Mar 4, 2026
2c50962
opencl: add `SET`, support i32 for `CPY`, minor refactor for cpy (lla…
lhez Mar 5, 2026
2e79b85
hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and M…
max-krasnyansky Mar 5, 2026
67abc63
chore : correct typos [no ci] (llama/20041)
marcelpetrick Mar 5, 2026
51f397c
CUDA: Improve performance via less synchronizations between token (ll…
aendk Mar 5, 2026
f56fb1b
hexagon: add fp16 support for binary ops: add,sub,mul,div (llama/20139)
YardenTal44 Mar 6, 2026
1d94b0b
opencl: add neg, exp and diag (llama/20127)
lhez Mar 6, 2026
596b655
ggml-cpu: fix data race for debug asserts (llama/20148)
JohannesGaessler Mar 6, 2026
d2d235f
CUDA: use shared mem for ssm_conv (llama/20128)
am17an Mar 6, 2026
548f2e5
ggml-cpu: Fix gcc 15 ICE on ppc64le (ggml/20083) (llama/20130)
shalinib-ibm Mar 6, 2026
5d9b73d
ggml: update comments for backends which have no memory to report (ll…
taronaeo Mar 6, 2026
d658720
ggml-cuda: add mem check for fusion (llama/19916)
am17an Mar 6, 2026
247ec20
cpu: skip redudant ROPE cache updates (llama/20149)
max-krasnyansky Mar 6, 2026
78b3801
hexagon: add f32 ssm_conv op (llama/20122)
tboinovski1 Mar 6, 2026
6e063fa
quants : Add memsets and other fixes for IQ quants (llama/19861)
bartowski1182 Mar 6, 2026
910034d
opencl: add l2_norm (llama/20160)
lhez Mar 7, 2026
49489bf
ggml: add GATED_DELTA_NET op (llama/19504)
am17an Mar 7, 2026
8a9b0ba
supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (llama/20190)
arthw Mar 8, 2026
4b0653a
vulkan: Fix data races in coopmat1 mul_mat(_id) (llama/20084)
jeffbolznv Mar 8, 2026
8d97f59
ggml-vulkan: Add ELU op support (llama/20183)
GiantPrince Mar 8, 2026
f099ed2
cuda : display total and free VRAM capacity during device initializat…
tehsiuhuang Mar 9, 2026
890c047
vulkan: skip zero size tensors in backend copies (llama/20233)
0cc4m Mar 9, 2026
65dbf3c
ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (l…
bertaye Mar 9, 2026
3984ae3
ggml-cuda: disable gdn for musa (llama/20278)
am17an Mar 9, 2026
d19c65e
metal : add upscale (llama/20284)
ggerganov Mar 9, 2026
ae21974
metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (llama/20250)
arkavo-com Mar 9, 2026
cabe3d9
metal: handle command buffer failures gracefully in synchronize (llam…
JulianPscheid Mar 10, 2026
bd64b8a
ggml-cpu: add RVV repack GEMM and GEMV for quantization types (llama/…
taimur-10x Mar 10, 2026
dfa6858
kleidiai : support for concurrent sme and neon kernel execution (llam…
chaxu01 Mar 10, 2026
fddedc5
ggml webgpu: faster normal quant and some k-quant matrix operations, …
reeselevine Mar 10, 2026
1e05b10
ggml : bump RPC version (llama/20330)
ggerganov Mar 10, 2026
72c7a25
fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (llam…
arthw Mar 11, 2026
286387e
fix op rope, add rope_back (llama/20293)
arthw Mar 11, 2026
7c9a16c
cuda/hip: fix loop unrolling in ssm-conv (llama/20369)
IMbackK Mar 11, 2026
8b33555
ggml-cuda: gdn use shared mem for HIP (llama/20366)
IMbackK Mar 11, 2026
c2e384f
metal : add env var to trigger graph capture (llama/20398)
ggerganov Mar 11, 2026
0e1e76f
metal : fix q5_k mul_mv register spill (llama/20399)
ggerganov Mar 11, 2026
e2aa5c7
metal : fix capture_compute counter logic (llama/20410)
ggerganov Mar 11, 2026
5d3a544
llama : add support for Nemotron 3 Super (llama/20411)
danbev Mar 11, 2026
e4021d4
ggml : add NVFP4 quantization type support (llama/19769)
richarddd Mar 11, 2026
d73fe25
llama : enable chunked fused GDN path (llama/20340)
ggerganov Mar 11, 2026
5267523
ggml-webgpu: Add supports for `GGML_OP_REPEAT` (llama/20230)
yomaytk Mar 11, 2026
f5ba865
hip: compile debug builds with -O2 on hip to avoid a compiler bug (ll…
IMbackK Mar 12, 2026
193781c
opencl: add cumsum op (llama/18981)
shaofeiqi Mar 12, 2026
d5772cf
opencl: use larger workgroup size for get_rows (llama/20316)
lhez Mar 12, 2026
26ee4f7
vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large mode…
rillomas Mar 12, 2026
6c5e3aa
vulkan: fix OOB check in flash_attn_mask_opt (llama/20296)
jeffbolznv Mar 12, 2026
86e312d
vulkan: fix l2_norm epsilon handling (llama/20350)
jeffbolznv Mar 12, 2026
7ccebd5
sync : ggml
ggerganov Mar 16, 2026
b48ffe2
metal : avoid divisions in bin kernel (llama/20426)
ggerganov Mar 16, 2026
7e816a9
sync : ggml
ggerganov Mar 16, 2026
44c12c6
vulkan: fix SSM_CONV PP scaling with large ubatch sizes (llama/20379)
ProgenyAlpha Mar 12, 2026
2450919
vulkan: add GATED_DELTA_NET op support (llama/20334)
ProgenyAlpha Mar 12, 2026
2ed6dc0
llama : disable graph reuse with pipeline parallelism (llama/20463)
ggerganov Mar 12, 2026
f1f5f43
metal : fix l2 norm scale (llama/20493)
ggerganov Mar 13, 2026
9bfa81d
ggml : fix typo gmml (llama/20512)
angt Mar 13, 2026
5905e87
ggml-cpu: add RVV vec dot kernels for quantization types (llama/18859)
rehan-10xengineer Mar 13, 2026
c7abcd5
graph : remove redundant GDN state transposes (llama/20443)
ggerganov Mar 13, 2026
a31600d
opencl: fix l2_norm (llama/20480)
lhez Mar 14, 2026
46aad76
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT ope…
Exile333 Mar 14, 2026
96b163e
ggml : add OpenVINO backend (llama/15307)
wine99 Mar 14, 2026
8ad5cb1
Use fp32 in cuBLAS V100 to avoid overflows, env variables to override…
wallentri88 Mar 14, 2026
93d09fd
ggml : add native AVX512-FP16 support for F16 operations (llama/20529)
angt Mar 14, 2026
c5f9a49
add op gated_delta_net (llama/20455)
arthw Mar 14, 2026
55f8cfd
hexagon: Q4_0 and MXFP4 repack fixes (llama/20527)
max-krasnyansky Mar 14, 2026
b312018
metal : add FA specialization for HSK = 320, HSV = 256 (llama/20549)
ggerganov Mar 14, 2026
cd02195
vulkan: use graphics queue on AMD (llama/20551)
0cc4m Mar 15, 2026
55c6610
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (llama…
JoursBleu Mar 15, 2026
6770239
ggml : guard against sumq2 being 0 in IQ4_NL (llama/20460)
bartowski1182 Mar 15, 2026
b327a32
ggml/hip: fix APU compatibility - soft error handling for hipMemAdvis…
moonshadow-25 Mar 15, 2026
2fb6aea
ggml: avoid creating CUDA context during device init (llama/20595)
ServeurpersoCom Mar 15, 2026
d7926e6
CUDA: limit number of FA stream-k CUDA blocks (llama/20586)
JohannesGaessler Mar 15, 2026
81ea958
common : add nvfp4 (ggml/0)
ggerganov Mar 15, 2026
d4bc312
ggml : extend im2col f16 (ggml/1434)
David366AI Mar 15, 2026
ab1252c
sync : ggml
ggerganov Mar 16, 2026
2bc630f
talk-llama : sync llama.cpp
ggerganov Mar 16, 2026
27fa207
ggml : try fix arm build (#0)
ggerganov Mar 16, 2026
136dc2e
server: return proper HTTP status codes for error responses (#3707)
dearlordylord Mar 16, 2026
21665ea
examples : Allow max_len to be used for any output format (#3679)
gaelj Mar 16, 2026
975b979
py : replace deprecated openvino-dev with openvino>=2023.3.0 (#3678)
Aiudadadadf Mar 16, 2026
79218f5
go : handle EOF correctly in model download (#3671)
Lumberj3ck Mar 16, 2026
dc96116
fix: VAD time mapping timestamp drift caused by overlap samples (#3711)
lohopupa Mar 17, 2026
945d315
ggml : restore ggml_type_sizef() to aboid major version bump (ggml/1441)
ggerganov Mar 16, 2026
b2be162
ggml : bump version to 0.9.8 (ggml/1442)
ggerganov Mar 16, 2026
f5b477a
sync : ggml
ggerganov Mar 18, 2026
4bbce1e
benches : update
ggerganov Mar 18, 2026
ef3463b
ci : update workflows
ggerganov Mar 18, 2026
9386f23
release : v1.8.4
ggerganov Mar 19, 2026
7ce31d4
Add seed parameter for reproducible sampling
Nov 10, 2025
2cc2313
add_codeowners file
Nov 11, 2025
6befb6f
added approval check worker
Nov 13, 2025
2a94ba2
DEVOPS-916: Add ai-runtime-merge to CODEOWNERS
Proletter Dec 16, 2025
8519283
Merge branch 'master' into rebase-v1.8.4
sharmaraju352 Mar 30, 2026
8 changes: 4 additions & 4 deletions .devops/main-cuda.Dockerfile
@@ -1,6 +1,6 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
-ARG CUDA_VERSION=12.3.1
+ARG CUDA_VERSION=13.0.0
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the CUDA runtime image
@@ -20,12 +20,12 @@
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

# Ref: https://stackoverflow.com/a/53464012
-ENV CUDA_MAIN_VERSION=12.3
+ENV CUDA_MAIN_VERSION=13.0
ENV LD_LIBRARY_PATH /usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH

Check warning on line 24 in .devops/main-cuda.Dockerfile
GitHub Actions / Push Docker image to Docker Hub (main-cuda, .devops/main-cuda.Dockerfile, linux/amd64)
Legacy key/value format with whitespace separator should not be used
LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format. More info: https://docs.docker.com/go/dockerfile/rule/legacy-key-value-format/

COPY .. .
# Enable cuBLAS
-RUN make base.en CMAKE_ARGS="-DGGML_CUDA=1"
+RUN make base.en CMAKE_ARGS="-DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES='75;80;86;90'"

RUN find /app/build -name "*.o" -delete && \
find /app/build -name "*.a" -delete && \
@@ -34,8 +34,8 @@
rm -rf /app/build/_deps

FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime
-ENV CUDA_MAIN_VERSION=12.3
+ENV CUDA_MAIN_VERSION=13.0
ENV LD_LIBRARY_PATH /usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH

Check warning on line 38 in .devops/main-cuda.Dockerfile
GitHub Actions / Push Docker image to Docker Hub (main-cuda, .devops/main-cuda.Dockerfile, linux/amd64)
Legacy key/value format with whitespace separator should not be used
LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format. More info: https://docs.docker.com/go/dockerfile/rule/legacy-key-value-format/
WORKDIR /app

RUN apt-get update && \
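The LegacyKeyValueFormat check flags the `ENV LD_LIBRARY_PATH ...` lines (lines 24 and 38 of the new main-cuda.Dockerfile). As a minimal sketch, assuming only the separator needs to change, the same `ENV` instructions written in the `key=value` form the check asks for would read:

```dockerfile
# Same values as in the main-cuda.Dockerfile diff, using the key=value form
ENV CUDA_MAIN_VERSION=13.0
# ${CUDA_MAIN_VERSION} is substituted from the ENV instruction above
ENV LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH
```

Both forms set the same variables; the `=` form additionally allows several pairs on one `ENV` line. This rewrite is illustrative and is not part of the merged diff.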
13 changes: 7 additions & 6 deletions .devops/main-musa.Dockerfile
@@ -1,10 +1,10 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc4.0.1
+ARG MUSA_VERSION=rc4.2.0
# Target the MUSA build image
-ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-devel-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}-amd64
# Target the MUSA runtime image
-ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-mudnn-runtime-ubuntu${UBUNTU_VERSION}
+ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}-amd64

FROM ${BASE_MUSA_DEV_CONTAINER} AS build
WORKDIR /app
@@ -32,8 +32,9 @@ RUN apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* /tmp/* /var/tmp/*

-COPY --from=build /app /app
-RUN du -sh /app/*
-RUN find /app -type f -size +100M
+COPY --from=build /app/build/bin /app/build/bin
+COPY --from=build /app/samples /app/samples
+COPY --from=build /app/models /app/models
+
ENV PATH=/app/build/bin:$PATH
ENTRYPOINT [ "bash", "-c" ]
20 changes: 20 additions & 0 deletions .devops/main-vulkan.Dockerfile
@@ -0,0 +1,20 @@
FROM ubuntu:24.04 AS build
WORKDIR /app

RUN apt-get update && \
apt-get install -y build-essential wget cmake git libvulkan-dev glslc \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

COPY .. .
RUN make base.en CMAKE_ARGS="-DGGML_VULKAN=1"

FROM ubuntu:24.04 AS runtime
WORKDIR /app

RUN apt-get update && \
apt-get install -y curl ffmpeg libsdl2-dev wget cmake git libvulkan1 mesa-vulkan-drivers \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

COPY --from=build /app /app
ENV PATH=/app/build/bin:$PATH
ENTRYPOINT [ "bash", "-c" ]
4 changes: 2 additions & 2 deletions .github/workflows/bindings-go.yml
@@ -13,10 +13,10 @@ jobs:
   ubuntu-22:
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/setup-go@v5
+      - uses: actions/setup-go@v6
         with:
           go-version: '^1.23'
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - run: |
           cd bindings/go
           make test
2 changes: 1 addition & 1 deletion .github/workflows/bindings-ruby.yml
@@ -17,5 +17,5 @@ jobs:
       - uses: ruby/setup-ruby@v1
         with:
           ruby-version: '3.2'
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - run: rake test