Closed
251 commits
1335dfa
sycl : fix for untransposed GDA recurrent state (llama/20583)
CISC Mar 15, 2026
dae7781
CUDA: GDN hide memory latency (llama/20537)
am17an Mar 16, 2026
724ea71
vulkan: fix flash attention dot product precision (llama/20589)
0cc4m Mar 16, 2026
9232af5
kleidiai: add data type check to get_tensor_traits (llama/20639)
martin-klacer-arm Mar 16, 2026
6494251
enhance UPSCALE to support all UT cases (llama/20637)
arthw Mar 17, 2026
49adc8b
vulkan: allow graphics queue only through env var (llama/20599)
0cc4m Mar 17, 2026
ab7d305
kleidiai : fix MUL_MAT support for batched (3D) inputs (llama/20620)
jabr Mar 17, 2026
0ad6cee
vulkan: async and event fixes (llama/20518)
0cc4m Mar 17, 2026
c890a9d
ggml-cpu: fix RVV checks in quants and repacking (llama/20682)
taimur-10x Mar 17, 2026
906aef3
ggml-blas: set mkl threads from thread context (llama/20602)
kannon92 Mar 17, 2026
16ca5e6
vulkan: disable mmvq on Intel Windows driver (llama/20672)
0cc4m Mar 17, 2026
e222814
hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops (llama…
srikris-sridhar Mar 17, 2026
61c7cd0
HIP : ignore return of hipMemAdvise [no ci] (llama/20696)
IMbackK Mar 18, 2026
14caedf
ggml-cpu/x86: fix unused changemask warning in repack (llama/20692)
mrshaw01 Mar 18, 2026
d6a0f0d
Move to no timeout for WaitAny in graph submission to avoid deadlocks…
reeselevine Mar 18, 2026
dfba84c
CANN: support flash attention for head dim not multiple of 16, fix AL…
noemotiovon Mar 19, 2026
12015a2
ggml-webgpu: Add supports for `DIAG` and `TRI` (llama/20664)
yomaytk Mar 19, 2026
3d004fb
ggml-webgpu: Update the `RMS_NORM` preprocessor and add `L2_NORM` (ll…
yomaytk Mar 28, 2026
2a6de29
CANN: handle in-place ROPE on non-contiguous f32 tensors (llama/20274)
noemotiovon Mar 19, 2026
fea629d
cmake : fix build warning when kleidiai is enabled (llama/20457)
chaxu01 Mar 19, 2026
43c7c0f
vulkan: dequantize iq4_xs 4 at a time (llama/20657)
netrunnereve Mar 19, 2026
551bb82
ggml webgpu: ops support for qwen3.5 (SET, TRI_SOLVE, SSM_CONV, GATED…
reeselevine Mar 19, 2026
081dc77
ci : add hip quality check (llama/20430)
IMbackK Mar 19, 2026
15f6b6a
hexagon: add Matrix Extensions (HMX) for Hexagon NPU backend (llama/2…
njsyw1997 Mar 19, 2026
e1cdce4
hip: Avoid compiler bug in RDNA code generation during debug builds o…
Exile333 Mar 19, 2026
65d820a
ggml: guard KleidiAI DOWNLOAD_EXTRACT_TIMESTAMP for cmake < 3.24 (lla…
sundaram123krishnan Mar 19, 2026
46dcb35
CANN: add BF16 support for core operators (llama/20152)
hipudding Mar 20, 2026
49b505b
vulkan: change gated_delta_net to shard a column across a subgroup (l…
jeffbolznv Mar 20, 2026
ca5d565
ggml-cpu: add always_inline to tinyBLAS_PPC accumulator saves (llama/…
shalinib-ibm Mar 20, 2026
22710fd
Add shader count for Intel Arc Pro B60 (llama/20818)
TheBlueMatt Mar 21, 2026
5f34282
fix(rpc): prevent division by zero in deserialize_tensor (llama/20712)
y198nt Mar 21, 2026
77b635e
Increase number of output elements per-thread block if the K-dimensio…
gaugarg-nv Mar 22, 2026
69f0d90
ggml-cuda: native bf16 flash attention for vec kernel (llama/20525)
eous Mar 22, 2026
1d0f028
support bf16 and quantized type (llama/20803)
arthw Mar 22, 2026
607c924
CUDA: fix BF16 FA compilation (llama/20865)
JohannesGaessler Mar 22, 2026
c976b22
opencl: add flattened Q4_K mv and general Q4_K mm (llama/20773)
shaofeiqi Mar 23, 2026
a0e41ec
fix(openvino): explicit memset in buffer_context allocation (llama/20…
thedanhoffman Mar 23, 2026
54f5c02
CANN: add RoPE cache preload before ACL graph capture (llama/20747)
noemotiovon Mar 23, 2026
c589dd7
metal: add CONV_3D (llama/19927)
Ra5hidIslam Mar 23, 2026
37c0a52
rpc : RCE patch (llama/20908)
las7 Mar 23, 2026
624be93
opencl: add q6_K gemm and gemv kernels for Adreno (llama/20089)
lhez Mar 23, 2026
116a9f6
hexagon: general DMA and Binary Op fixes for large strides (llama/20918)
max-krasnyansky Mar 23, 2026
eef7422
metal : add FA instantiations for HSK=512, HSV=512 (llama/20902)
ggerganov Mar 24, 2026
9e4e4c2
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops (llama/20930)
nuri-yoo Mar 24, 2026
f2a8e65
sycl : fix wrong variable check by assert (llama/20903)
arthw Mar 25, 2026
3987857
llama: fix llama-model-saver (llama/20503)
JohannesGaessler Mar 25, 2026
495b77a
mtmd: Add DeepSeekOCR Support (llama/17400)
sfallah Mar 25, 2026
a050c7d
CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (llama/17…
AgainstEntropy Mar 26, 2026
eb747f3
ggml-cuda: Add NVFP4 dp4a kernel (llama/20644)
michaelw9999 Mar 26, 2026
07237ff
fix(ggml): correct RISC-V ISA string canonical ordering for RVV in CM…
ihb2032 Mar 26, 2026
1848f99
opencl: allow large buffer for adreno (llama/20997)
lhez Mar 26, 2026
45a7083
hip: use fnuz fp8 for conversion on CDNA3 (llama/21040)
IMbackK Mar 26, 2026
b564a99
metal : Fix dimension constraint violation in matmul2d descriptor (ll…
lathrys-at Mar 27, 2026
7f466e2
rpc : proper handling of data pointers to CPU buffers (llama/21030)
rgerganov Mar 27, 2026
52699f6
hexagon: support for IQ4_NL and MXFP4 (llama/21018)
njsyw1997 Mar 27, 2026
759f008
vulkan: add noncontiguous GLU support (llama/21081)
0cc4m Mar 28, 2026
95ea8f9
sync : ggml
ggerganov Mar 29, 2026
166c20b
whisper : add stateless VAD detect + explicit state reset for streami…
danielbodart Apr 17, 2026
fc67457
bench : sync submit-results URL to ggml-org (#3769)
jinweihan-ai Apr 20, 2026
763a454
ggml : bump version to 0.9.9 (ggml/1449)
ggerganov Mar 30, 2026
9e96d39
hexagon: dma optimizations (mostly fixing regressions) (llama/21137)
max-krasnyansky Mar 29, 2026
6b67c91
Optimize MOE GEMV kernel for BS > 1. (llama/20905)
gaugarg-nv Mar 29, 2026
40ddc5a
rpc : fix misleading error log (llama/21184)
rgerganov Mar 30, 2026
75b9543
CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (lla…
ORippler Mar 30, 2026
6ac5a50
opencl: add q4_K gemm and gemv kernels for Adreno (llama/20919)
shaofeiqi Mar 30, 2026
952c662
sycl : enhance fattn perf (llama/21185)
arthw Mar 31, 2026
5ffe588
CANN: fix multi-thread set_tensor race conditions (llama/20151)
hipudding Mar 31, 2026
21b9dd6
ggml-webgpu: port all AOT operators to JIT (llama/20728)
abhijitramesh Apr 1, 2026
78f54d1
ggml webgpu: quantized buffers to u32 + wider browser/device support …
reeselevine Apr 1, 2026
933bd1f
CUDA: Add Flash Attention Support for Head Dimension 512 (llama/20998)
anavp-nvidia Apr 1, 2026
1b95f84
ggml-cpu: fix fallback for RVV kernels without zvfh (llama/21157)
taimur-10x Apr 1, 2026
5c5b88e
ggml : fix RWKV ops thread assignment (llama/21226)
ggerganov Apr 1, 2026
1971a36
CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host sele…
IMbackK Apr 1, 2026
ace95aa
ggml : bump version to 0.9.10 (ggml/1454)
ggerganov Apr 1, 2026
981195b
ggml-cuda: Add generic NVFP4 MMQ kernel (llama/21074)
michaelw9999 Apr 1, 2026
fab70d2
sycl : support nvfp4 type in mul_mat (llama/21227)
arthw Apr 1, 2026
9a40dd9
hexagon: improve RMS_NORM and DIV accuracy (llama/21251)
aparmp-quic Apr 1, 2026
82bb26f
CUDA: fix FA kernel selection logic (llama/21271)
JohannesGaessler Apr 1, 2026
0810851
opencl: fix leak in Adreno q8_0 path (llama/21212)
lhez Apr 1, 2026
444662b
hexagon : add cumsum op support (llama/21246)
tboinovski1 Apr 2, 2026
514eabc
ggml : bump version to 0.9.11 (ggml/1456)
ggerganov Apr 2, 2026
7f6c0ac
sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (llama/21283)
arthw Apr 2, 2026
c5a5e65
ggml-webgpu: add vectorized flash attention (llama/20709)
ArberSephirotheca Apr 2, 2026
321f628
rpc : reuse compute graph buffers (llama/21299)
rgerganov Apr 3, 2026
3f51176
ggml-zendnn : add MUL_MAT_ID op support for MoE models (llama/21315)
z-vishal Apr 3, 2026
d6cfdc6
ggml-webgpu: move from parameter buffer pool to single buffer with of…
reeselevine Apr 3, 2026
c031045
hexagon: slight optimization for argsort output init (llama/21463)
YardenTal44 Apr 6, 2026
42e4a28
sycl : handle other FA case (llama/21377)
arthw Apr 6, 2026
7b19b94
Write an optimized flash_attn_stream_k_fixup kernel (llama/21159)
gaugarg-nv Apr 6, 2026
0c2fbd4
ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273)
khosravipasha Apr 6, 2026
9cbc4b3
ggml-webgpu: Add the support of `MUL_MAT_ID` (llama/21147)
yomaytk Apr 6, 2026
1ebf3ca
Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (llama/21…
PMZFX Apr 7, 2026
a1f76fb
ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) …
aviallon Apr 7, 2026
18c98ff
vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (llama/21029)
mkoker Apr 7, 2026
78b4fd8
ggml: Vulkan build, Linux -- output error string for errno on fork fa…
tomoverlund Apr 7, 2026
f1d2b83
ggml : deprecate GGML_OP_ADD1 (llama/21363)
ggerganov Apr 7, 2026
5ef7aaf
CUDA: check for buffer overlap before fusing (llama/21566)
am17an Apr 7, 2026
d145643
ggml-webgpu: parameterize submission size and add iOS specific limits…
reeselevine Apr 7, 2026
d91d1e8
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (llama/21168)
iacopPBK Apr 7, 2026
fa2eaa4
CUDA: make cuda graphs props check faster (llama/21472)
am17an Apr 8, 2026
15deafa
metal: Q1_0 backend (llama/21528)
khosravipasha Apr 8, 2026
e70c0d4
webgpu : Query for adapter support when registering WebGPU backend (l…
reeselevine Apr 8, 2026
16dd171
fix: free ctx_copy in ggml_opt_free to plug per-training-session leak…
RealOrko Apr 8, 2026
2c74729
CUDA: also store `node->src->data` ptrs for equality check (llama/21635)
am17an Apr 8, 2026
1d55551
vulkan: unify type macros to use Vx instead of _VECx (llama/21605)
0cc4m Apr 9, 2026
4598eb0
sycl : add flash-attn support for head size 512 (llama/21654)
qnixsynapse Apr 9, 2026
f0ee409
metal : add missing mm-id specializations for q1_0 (llama/21662)
ggerganov Apr 9, 2026
c4c6e14
ggml : check return value of CUB calls used in argsort and top-k (the…
fairydreaming Apr 9, 2026
bb895c8
ggml: backend-agnostic tensor parallelism (experimental) (llama/19378)
JohannesGaessler Apr 9, 2026
c77a33d
HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (llama…
andyluo7 Apr 9, 2026
2834720
CUDA: fuse muls (llama/21665)
am17an Apr 10, 2026
458ad1d
vulkan: Support Q1_0 (llama/21539)
jeffbolznv Apr 10, 2026
3fc738a
ggml-webgpu: address quantization precision and backend lifecycle man…
Constannnnnt Apr 10, 2026
2580cfc
ggml-webgpu: support non-square subgroup matrix configs for Intel GPU…
SharmaRithik Apr 10, 2026
28ce072
hexagon: improved Op queuing, buffer and cache management (llama/21705)
max-krasnyansky Apr 10, 2026
3af7c87
CUDA: also store node->src ne/nb for graph equality (llama/21736)
am17an Apr 11, 2026
34381b0
ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (llama/21716)
CISC Apr 11, 2026
e0c8e50
opencl: add basic support for q5_k (llama/21593)
shaofeiqi Apr 11, 2026
c0b46c2
CUDA: skip compilation of superfluous FA kernels (llama/21768)
JohannesGaessler Apr 11, 2026
b907207
mtmd: add Gemma 4 audio conformer encoder support (llama/21421)
stephencox-ict Apr 12, 2026
655072c
sycl: disable Q1_0 in backend and cleanup unused variables (llama/21807)
qnixsynapse Apr 13, 2026
36b7bb3
Remove extra conditional check on debug mode. (llama/21798)
yomaytk Apr 13, 2026
d9ed371
CUDA: Limit DeviceSegmentedSort to immediate mode (llama/21718)
ORippler Apr 13, 2026
0f99a47
vulkan: Flash Attention DP4A shader for quantized KV cache (llama/20797)
0cc4m Apr 13, 2026
cdeaa34
vulkan: Support GGML_TYPE_NVFP4 (llama/21455)
jeffbolznv Apr 14, 2026
b732f4d
ggml-webgpu: Update register tiling matmul to use f32 accumulation (l…
reeselevine Apr 14, 2026
bfdcd4a
cmake: fix CMP0194 warning on Windows with MSVC (llama/21630)
texasich Apr 14, 2026
80f7be7
ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (llama/2…
richarddd Apr 14, 2026
691b1d0
metal : add XIELU unary op (llama/20802)
seyoungjeong Apr 14, 2026
7024f7e
ci : re-enable mac workflows (llama/21894)
ggerganov Apr 14, 2026
45365fa
vulkan: Programmatically add RoundingModeRTE to all shaders when the …
jeffbolznv Apr 14, 2026
08e412c
metal : fix FA support logic (llama/21898)
ggerganov Apr 14, 2026
44d86c4
ggml : remove ggml-ext.h (llama/21869)
ngxson Apr 14, 2026
24cc89e
hexagon: optimization for HMX mat_mul (llama/21554)
njsyw1997 Apr 14, 2026
86d94cd
docs: more extensive RoPE documentation [no ci] (llama/21953)
ngxson Apr 15, 2026
182db04
rpc : add native RDMA transport for RPC backend (RoCEv2) (llama/20590)
dvv101111 Apr 15, 2026
7e57b20
CUDA: manage NCCL communicators in context (llama/21891)
JohannesGaessler Apr 15, 2026
9638e29
CUDA: require explicit opt-in for P2P access (llama/21910)
JohannesGaessler Apr 15, 2026
2a785c5
ggml-webgpu: Fix dequantization helpers to not pass in pointers (llam…
reeselevine Apr 15, 2026
c6d1fbf
cuda: Q1_0 initial backend (llama/21629)
khosravipasha Apr 15, 2026
7fe6b8e
vulkan: optimize im2col (llama/21713)
0cc4m Apr 15, 2026
f62bb13
Fix Q8_0 reorder: garbage on 2nd prompt + crash on full VRAM (llama/2…
PMZFX Apr 16, 2026
092330b
ggml-webgpu: compute pass batching and removing profiling overhead (l…
reeselevine Apr 16, 2026
07c181b
ggml : implemented simd_gemm kernel for riscv vector extension (llama…
rehan-10xengineer Apr 16, 2026
94d6d0b
ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot …
rehan-10xengineer Apr 16, 2026
655c075
metal: Implement ROLL op (llama/21946)
kushagharahi Apr 16, 2026
820438a
ggml: add graph_reused (llama/21764)
am17an Apr 16, 2026
57a48a4
opencl: add q5_K gemm and gemv kernels for Adreno (llama/21595)
shaofeiqi Apr 16, 2026
b25d5d0
hexagon: optimize HMX matmul operations (llama/21071)
chraac Apr 16, 2026
77c0630
opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for A…
lhez Apr 17, 2026
918e0ad
CUDA: use LRU based eviction for cuda graphs (llama/21611)
am17an Apr 17, 2026
cbbe935
ggml-webgpu: fix compiler warnings and refactor FlashAttention encodi…
reeselevine Apr 17, 2026
a899e4b
ggml-backend-meta: add multi-segment read support in get_tensor (llam…
ssam18 Apr 18, 2026
32789b9
rpc : refactor the RPC transport (llama/21998)
rgerganov Apr 19, 2026
171f037
cmake: remove CMP0194 policy to restore MSVC builds (llama/21934)
texasich Apr 19, 2026
671fd15
ggml : reduce CPU overhead in meta backend (llama/22041)
gaugarg-nv Apr 19, 2026
945746b
HIP: Remove unnecessary NCCL_CHECK (llama/21914)
IMbackK Apr 19, 2026
b8f57c9
CUDA: refactor mma data loading for AMD (llama/22051)
JohannesGaessler Apr 19, 2026
931cf2f
Fix reorder MMVQ assert on unaligned vocab sizes (llama/22035)
PMZFX Apr 20, 2026
5f21fdc
ggml-webgpu: updated matrix-vector multiplication (llama/21738)
neha-ha Apr 20, 2026
2b9fb0b
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (llama/2…
pl752 Apr 20, 2026
6429023
TP: fix 0-sized tensor slices, AllReduce fallback (llama/21808)
JohannesGaessler Apr 20, 2026
239c5c8
Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (llama/22129)
gaugarg-nv Apr 20, 2026
b13deaa
ggml-cuda: flush legacy pool on OOM and retry (llama/22155)
leonardHONG Apr 20, 2026
e7cffdb
ggml : bump version to 0.10.0 (ggml/1463)
ggerganov Apr 21, 2026
85bbc82
vulkan: Support F16 OP_FILL (llama/22177)
jeffbolznv Apr 21, 2026
150cef5
metal : workaround macOS GPU interactivity watchdog (llama/22216)
ggerganov Apr 21, 2026
3a73f9c
openvino: driver setup, CI split, thread safety, and NPU optimization…
wine99 Apr 21, 2026
e2014d6
hexagon: fix missing v79 entry in libggml-htp.inf (llama/22194)
mengshengwu Apr 21, 2026
84a6b5c
Hexagon: DAIG op (llama/22195)
shreyajn Apr 21, 2026
2e5eb6e
ggml-webgpu: reset CPU/GPU profiling time when freeing context (llama…
yomaytk Apr 21, 2026
d6a4174
hexagon: add support for FILL op (llama/22198)
aparmp-quic Apr 21, 2026
447be52
ggml-webgpu(shader): support conv2d kernels. (llama/21964)
Constannnnnt Apr 22, 2026
c5bb7c0
sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (ll…
qnixsynapse Apr 22, 2026
0fbe4c4
ggml-webgpu: Add fused RMS_NORM + MUL (llama/21983)
yomaytk Apr 22, 2026
d2a26dc
Implement async tensor api and event api (llama/22099)
nikhilJain17 Apr 22, 2026
393fdff
HIP: flip GGML_HIP_GRAPHS to default on (llama/22254)
IMbackK Apr 23, 2026
b6b5478
CUDA: fuse relu + sqr (llama/22249)
anavp-nvidia Apr 23, 2026
df528c4
ggml-webgpu: add support for im2col (llama/22259)
Constannnnnt Apr 23, 2026
b938c50
sycl : fused MoE mul_mat_vec_q for TG (llama/21920)
abotsis Apr 23, 2026
1aba061
ggml-base: use MATH_LIBRARY variable instead of hardcoded 'm' (llama/…
ggerganov Apr 23, 2026
682ee99
metal : fix event synchronization (llama/22260)
ggerganov Apr 23, 2026
71b1ab3
hexagon: add support for basic and extended Op profiling (llama/22269)
max-krasnyansky Apr 23, 2026
641998f
fix(shader): handle the buffer aliasing for rms fuse (llama/22266)
Constannnnnt Apr 23, 2026
23921d5
hexagon: add SOLVE_TRI op (llama/21974)
mengshengwu Apr 24, 2026
dfb8b68
ggml : minor coding style (llama/22308)
ggerganov Apr 24, 2026
07d6db3
metal : print GPU description (llama/22318)
ggerganov Apr 24, 2026
6576c4d
hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (llama/2…
mengshengwu Apr 24, 2026
35d679a
ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix…
ArberSephirotheca Apr 24, 2026
c546b0b
Hexagon: Bump HMX Frequency to Max Corner (llama/22334)
trivikram-reddy1 Apr 24, 2026
c235b05
ggml-webgpu: support for SSM_SCAN and disable set_rows error checking…
reeselevine Apr 25, 2026
6296fd5
Optimize Q4_0 mul_mat for Arc770, add scripts (llama/22291)
arthw Apr 25, 2026
21da843
metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (llama/20…
Developer-Ecosystem-Engineering Apr 25, 2026
da738a7
CUDA: reduce MMQ stream-k overhead (llama/22298)
JohannesGaessler Apr 25, 2026
1be2adf
hexagon: guard HMX clock request for v75+ platforms (llama/22377)
trivikram-reddy1 Apr 26, 2026
93a3f37
opencl: add iq4_nl support (llama/22272)
lhez Apr 26, 2026
4e11277
ggml-cpu: optimize avx2 q6_k (llama/22345)
netrunnereve Apr 26, 2026
2f3df42
ggml-cpu : re-enable fast gelu_quick_f16 (llama/22339)
CISC Apr 26, 2026
9bf6c3c
CUDA: better coalesce data-access for contiguous concat (llama/22330)
ORippler Apr 26, 2026
7296b9c
Fix recurrent state serialization for partial reads and writes (llama…
gaugarg-nv Apr 26, 2026
1478450
add performance-portable tuning for register-tile and subgroup matmul…
SharmaRithik Apr 26, 2026
f5c3ce1
ggml : use 64 bytes aligned tile buffers (llama/21058)
angt Apr 27, 2026
c9ba413
fix: rpc-server cache may not work in Windows environments (llama/22394)
unraido Apr 27, 2026
f675a8c
add fast mat-vec kernels for i-quants (llama/22344)
SharmaRithik Apr 27, 2026
9c233f1
ggml-webgpu: add Q1_0 support (llama/22374)
SharmaRithik Apr 27, 2026
70e4c0a
CANN: add new ops, optimize existing ops (llama/21204)
hipudding Apr 28, 2026
ca624d8
ggml : revert to -lm linking instead of find_library (llama/22355)
angt Apr 28, 2026
6fceff2
ggml : skip already registered backends and devices (llama/22296)
angt Apr 28, 2026
0fa31f9
ggml: improve SPIR-V headers detection with __has_include (llama/21918)
EmilAskerov Apr 28, 2026
35fa508
vulkan: add barrier after writetimestamp (llama/21865)
jeffbolznv Apr 28, 2026
4ea5b6f
ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing l…
reeselevine Apr 28, 2026
e69c109
vulkan: Coalesce Q4_K/Q5_K scale loads (llama/21751)
TheBlueMatt Apr 28, 2026
b553e17
ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (…
lnigam Apr 28, 2026
c200b58
ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (llama/22196)
michaelw9999 Apr 28, 2026
5301139
TP: fix delayed AllReduce + zero-sized slices (llama/22489)
JohannesGaessler Apr 29, 2026
3076725
ggml : add sve tuned code for gemm_q8_0_4x8_q8_0() kernel (llama/21916)
hrushitfujitsu Apr 29, 2026
fa20229
ggml-webgpu: Fix bug in FlashAttention support check (llama/22492)
reeselevine Apr 29, 2026
6119537
ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (llama/22317)
qiurui144 Apr 29, 2026
44e7803
ggml-cuda: refactor fusion code (llama/22468)
am17an Apr 29, 2026
ad67018
ggml : bump version to 0.10.1 (ggml/1469)
ggerganov Apr 29, 2026
320c048
sync : ggml
ggerganov Apr 30, 2026
c59a773
examples : update to Q1_0
ggerganov May 1, 2026
9f2cec1
ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault …
shalinib-ibm Apr 29, 2026
aec8e69
CUDA: fuse SSM_CONV + ADD(bias) + SILU (llama/22478)
anavp-nvidia Apr 29, 2026
66392cf
hexagon: make vmem and buffer-size configurable (llama/22487)
max-krasnyansky Apr 29, 2026
d74c568
add fast matmul iquants (llama/22504)
SharmaRithik Apr 30, 2026
582d256
CUDA: fix tile FA kernel on Pascal (llama/22541)
JohannesGaessler Apr 30, 2026
0c7c3ba
vulkan: add get/set tensor 2d functions (llama/22514)
0cc4m Apr 30, 2026
b34a9f3
ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_I…
yomaytk Apr 30, 2026
ccd0452
ggml-webgpu: add the upscale shader (llama/22419)
Constannnnnt May 1, 2026
e100253
sync : ggml
ggerganov May 1, 2026
35cb684
ggml : try fix win32 build (#0)
ggerganov May 1, 2026
95053f6
vulkan: Support asymmetric FA in coopmat2 path (llama/21753)
jeffbolznv May 1, 2026
9623c12
ggml-webgpu: Fix vectorized handling in mul-mat and mul-mat-id (llama…
yomaytk May 1, 2026
f2ce24f
hexagon: enable non-contiguous row tensor support for unary ops (llam…
aparmp-quic May 1, 2026
4861a3e
hexagon: hmx flash attention (llama/22347)
njsyw1997 May 2, 2026
28f8534
ggml : bump version to 0.10.2 (ggml/1474)
ggerganov May 2, 2026
a5a8496
ggml : remove obsolete wgsl templates (ggml/0)
ggerganov May 2, 2026
bbdaa21
ggml : remove obsolete rms_norm.wgsl (ggml/0)
ggerganov May 2, 2026
8384aa8
sync : ggml
ggerganov May 2, 2026
18162bc
cmake : add FindNCCL.cmake (ggml/0)
ggerganov May 2, 2026
4bf7336
talk-llama : sync llama.cpp
ggerganov May 2, 2026
bcbaaae
Merge upstream ggml-org/whisper.cpp master into v1.8.5 prep
reichert-dev May 4, 2026
d537f54
fix(cmake/coreml): join whisper-targets export set; PRIVATE include dir
reichert-dev May 4, 2026
1318aee
fix(bindings/java): sync WhisperFullParams JNA layout with whisper.h
reichert-dev May 4, 2026
9ead0b7
merge tetherto/master into upstream-sync-v1.8.4.3 (pull in tts-cpp/pa…
Zbig9000 May 18, 2026
47784b9
test: cover whisper_vad streaming API added by upstream PR #3677
Zbig9000 May 18, 2026
eb63b2b
ggml : allow GGML_BACKEND_DL with a static core (QVAC-18993)
Zbig9000 May 19, 2026
3683de4
ggml-backend : android per-arch CPU variant dlopen fallback (QVAC-18993)
Zbig9000 May 19, 2026
14620c8
tts-cpp : add missing <atomic> include in chatterbox_tts.cpp
Zbig9000 May 19, 2026
@@ -224,6 +224,16 @@ public void suppressNonSpeechTokens(boolean enable) {
/** No speech threshold. */
public float no_speech_thold;

/**
* RNG seed for reproducible sampling (when temperature &gt; 0).
* Each decoder uses {@code seed + decoder_index} so concurrent decoders get
* unique seeds. Maps to the {@code seed} field added at
* {@code include/whisper.h:553}; without this field declared here the
* subsequent {@code greedy} / {@code beam_search} struct offsets shift by
* 4 bytes and JNA reads garbage from the C-side defaults.
*/
public int seed;

/** Greedy decoding parameters. */
public GreedyParams greedy;
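
The layout problem described in the Javadoc above can be reproduced in plain C. The sketch below is only an illustration: `params_old`, `params_new` and `greedy_params` are hypothetical, heavily simplified stand-ins, not the real `whisper_full_params`. It compares where the `greedy` member lands with and without an `int seed` in front of it, assuming the usual 4-byte `int` and `float`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the nested greedy parameters. */
struct greedy_params {
    int best_of;
};

/* Layout a binding assumes when it does not know about `seed`. */
struct params_old {
    float no_speech_thold;
    struct greedy_params greedy;
};

/* Layout the C side uses once `seed` has been added. */
struct params_new {
    float no_speech_thold;
    int seed;
    struct greedy_params greedy;
};

/* Number of bytes by which `greedy` moves when `seed` is inserted. */
size_t greedy_offset_shift(void) {
    return offsetof(struct params_new, greedy) -
           offsetof(struct params_old, greedy);
}
```

A binding that still uses the old layout would read `seed` where it expects `greedy`, and every later member would be off by the same amount, which is why the field and its `getFieldOrder()` entry have to be added together.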

@@ -331,6 +341,21 @@ public void setLogitsFilterCallback(WhisperLogitsFilterCallback callback) {
public long i_start_rule;
public float grammar_penalty;

// Voice Activity Detection (VAD) params -- added by upstream after v1.8.4.
// Without these three fields declared here the C struct's tail is missing
// from the JNA layout, which is fine for read-only callers but corrupts
// the trailing memory whenever a Java caller passes WhisperFullParams
// back into the C ABI (e.g. whisper_full).

/** Enable VAD pre-filtering inside whisper_full. (default = false) */
public CBool vad;

/** Path to the Silero VAD model (only used when {@link #vad} is true). */
public String vad_model_path;

/** VAD tuning knobs, mirrors {@code whisper_vad_params}. */
public WhisperVadParams vad_params;

@Override
protected List<String> getFieldOrder() {
return Arrays.asList("strategy", "n_threads", "n_max_text_ctx",
@@ -343,13 +368,15 @@ protected List<String> getFieldOrder() {
"prompt_tokens", "prompt_n_tokens", "language", "detect_language",
"suppress_blank", "suppress_nst", "temperature",
"max_initial_ts", "length_penalty", "temperature_inc",
- "entropy_thold", "logprob_thold", "no_speech_thold", "greedy",
- "beam_search", "new_segment_callback", "new_segment_callback_user_data",
- "progress_callback", "progress_callback_user_data",
- "encoder_begin_callback", "encoder_begin_callback_user_data",
- "abort_callback", "abort_callback_user_data",
- "logits_filter_callback", "logits_filter_callback_user_data",
- "grammar_rules", "n_grammar_rules", "i_start_rule", "grammar_penalty");
+ "entropy_thold", "logprob_thold", "no_speech_thold", "seed",
+ "greedy", "beam_search", "new_segment_callback",
+ "new_segment_callback_user_data", "progress_callback",
+ "progress_callback_user_data", "encoder_begin_callback",
+ "encoder_begin_callback_user_data", "abort_callback",
+ "abort_callback_user_data", "logits_filter_callback",
+ "logits_filter_callback_user_data", "grammar_rules",
+ "n_grammar_rules", "i_start_rule", "grammar_penalty",
+ "vad", "vad_model_path", "vad_params");
}

public static class ByValue extends WhisperFullParams implements Structure.ByValue {
@@ -0,0 +1,61 @@
package io.github.ggerganov.whispercpp.params;

import com.sun.jna.Pointer;
import com.sun.jna.Structure;

import java.util.Arrays;
import java.util.List;

/**
* Voice Activity Detection (VAD) parameters.
* Mirrors {@code struct whisper_vad_params} in include/whisper.h.
*/
public class WhisperVadParams extends Structure {

public WhisperVadParams() {
super();
}

public WhisperVadParams(Pointer p) {
super(p);
}

/** Probability threshold to consider as speech. */
public float threshold;

/** Min duration for a valid speech segment. */
public int min_speech_duration_ms;

/** Min silence duration to consider speech as ended. */
public int min_silence_duration_ms;

/** Max duration of a speech segment before forcing a new segment. */
public float max_speech_duration_s;

/** Padding added before and after speech segments. */
public int speech_pad_ms;

/** Overlap in seconds when copying audio samples from speech segment. */
public float samples_overlap;

@Override
protected List<String> getFieldOrder() {
return Arrays.asList(
"threshold",
"min_speech_duration_ms",
"min_silence_duration_ms",
"max_speech_duration_s",
"speech_pad_ms",
"samples_overlap");
}

public static class ByValue extends WhisperVadParams implements Structure.ByValue {
public ByValue() {
super();
}

public ByValue(Pointer p) {
super(p);
}
}
}
2 changes: 2 additions & 0 deletions bindings/ruby/README.md
@@ -202,6 +202,8 @@ whisper.transcribe("path/to/audio.wav", params, n_processors: Etc.nprocessors)

Note that transcription accuracy may occasionally be lower when running in parallel.

If `n_processors` is greater than 1, you cannot set any callbacks, including `new_segment_callback`, `progress_callback`, `encoder_begin_callback`, `abort_callback`, and the `log_callback` set by `Whisper.log_set`.

### Segments ###

Once `Whisper::Context#transcribe` has been called, you can retrieve segments with `#each_segment`:
16 changes: 13 additions & 3 deletions bindings/ruby/ext/ruby_whisper.c
@@ -112,6 +112,10 @@ ruby_whisper_log_callback(enum ggml_log_level level, const char * buffer, void *
return;
}
VALUE log_callback = rb_iv_get(mWhisper, "log_callback");
if (NIL_P(log_callback)) {
return;
}

VALUE udata = rb_iv_get(mWhisper, "user_data");
rb_funcall(log_callback, id_call, 3, INT2NUM(level), rb_str_new2(buffer), udata);
}
@@ -129,10 +133,16 @@ static VALUE ruby_whisper_s_log_set(VALUE self, VALUE log_callback, VALUE user_d
rb_iv_set(self, "log_callback", log_callback);
rb_iv_set(self, "user_data", user_data);

- VALUE finalize_log_callback = rb_funcall(mWhisper, rb_intern("method"), 1, rb_str_new2("finalize_log_callback"));
- rb_define_finalizer(log_callback, finalize_log_callback);
+ if (!NIL_P(log_callback)) {
+ VALUE finalize_log_callback = rb_funcall(mWhisper, rb_intern("method"), 1, rb_str_new2("finalize_log_callback"));
+ rb_define_finalizer(log_callback, finalize_log_callback);
+ }

- whisper_log_set(ruby_whisper_log_callback, NULL);
+ if (NIL_P(log_callback)) {
+ whisper_log_set(NULL, NULL);
+ } else {
+ whisper_log_set(ruby_whisper_log_callback, NULL);
+ }

return Qnil;
}
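
The hunk above stops routing log output through Ruby once the callback is set to nil. A minimal sketch of that reset-on-NULL pattern in plain C follows. Every name here (`demo_log_set`, `demo_log`, `demo_counting_log`, and so on) is hypothetical, and falling back to a built-in sink when NULL is passed is an assumption of the sketch, not a statement about how `whisper_log_set` behaves.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback signature, loosely modelled on a log hook. */
typedef void (*demo_log_fn)(int level, const char *text, void *user_data);

/* Built-in sink used whenever no custom callback is installed. */
static void demo_default_log(int level, const char *text, void *user_data) {
    (void)user_data;
    fprintf(stderr, "[level %d] %s\n", level, text);
}

static demo_log_fn g_log = demo_default_log;
static void *g_user_data = NULL;

/* Install a custom callback; NULL means "go back to the built-in sink". */
void demo_log_set(demo_log_fn fn, void *user_data) {
    g_log = fn ? fn : demo_default_log;
    g_user_data = fn ? user_data : NULL;
}

/* Forward one message to whichever sink is currently installed. */
void demo_log(int level, const char *text) {
    g_log(level, text, g_user_data);
}

/* Non-zero when the built-in sink is active. */
int demo_log_is_default(void) {
    return g_log == demo_default_log;
}

/* Example custom sink: counts messages through user_data. */
void demo_counting_log(int level, const char *text, void *user_data) {
    (void)level;
    (void)text;
    ++*(int *)user_data;
}
```

Clearing the stored user data together with the callback means a later message cannot reach a sink with a stale pointer.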
1 change: 1 addition & 0 deletions bindings/ruby/ext/ruby_whisper.h
@@ -2,6 +2,7 @@
#define RUBY_WHISPER_H

#include <ruby.h>
#include <ruby/util.h>
#include <ruby/memory_view.h>
#include "whisper.h"

6 changes: 3 additions & 3 deletions bindings/ruby/ext/ruby_whisper_context.c
@@ -22,7 +22,7 @@ extern const rb_data_type_t ruby_whisper_context_params_type;
extern VALUE ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self);
extern VALUE rb_whisper_model_s_new(VALUE context);
extern VALUE rb_whisper_segment_s_new(VALUE context, int index);
- extern void prepare_transcription(ruby_whisper_params *rwp, VALUE *context);
+ extern void prepare_transcription(ruby_whisper_params *rwp, VALUE *context, int n_processors);

ID transcribe_option_names[1];

@@ -436,7 +436,7 @@ full_body(VALUE rb_args)
GetContext(*args->context, rw);
TypedData_Get_Struct(*args->params, ruby_whisper_params, &ruby_whisper_params_type, rwp);

- prepare_transcription(rwp, args->context);
+ prepare_transcription(rwp, args->context, 1);
int result = whisper_full(rw->context, rwp->params, args->samples, args->n_samples);

return INT2NUM(result);
@@ -487,7 +487,7 @@ full_parallel_body(VALUE rb_args)
GetContext(*args->context, rw);
TypedData_Get_Struct(*args->params, ruby_whisper_params, &ruby_whisper_params_type, rwp);

- prepare_transcription(rwp, args->context);
+ prepare_transcription(rwp, args->context, args->n_processors);
int result = whisper_full_parallel(rw->context, rwp->params, args->samples, args->n_samples, args->n_processors);

return INT2NUM(result);
79 changes: 73 additions & 6 deletions bindings/ruby/ext/ruby_whisper_params.c
@@ -29,6 +29,7 @@

extern VALUE cParams;
extern VALUE cVADParams;
extern VALUE mWhisper;

extern ID id_call;

@@ -186,6 +187,35 @@ static bool abort_callback(void * user_data) {
return false;
}

+static void
+check_thread_safety(ruby_whisper_params *rwp, VALUE *context, int n_processors)
+{
+if (n_processors == 1) {
+return;
+}
+
+if (!NIL_P(rwp->new_segment_callback_container->callback) || 0 != RARRAY_LEN(rwp->new_segment_callback_container->callbacks)) {
+rb_raise(rb_eRuntimeError, "new segment callback not supported on parallel transcription");
+}
+
+if (!NIL_P(rwp->progress_callback_container->callback) || 0 != RARRAY_LEN(rwp->progress_callback_container->callbacks)) {
+rb_raise(rb_eRuntimeError, "progress callback not supported on parallel transcription");
+}
+
+if (!NIL_P(rwp->encoder_begin_callback_container->callback) || 0 != RARRAY_LEN(rwp->encoder_begin_callback_container->callbacks)) {
+rb_raise(rb_eRuntimeError, "encoder begin callback not supported on parallel transcription");
+}
+
+if (!NIL_P(rwp->abort_callback_container->callback) || 0 != RARRAY_LEN(rwp->abort_callback_container->callbacks)) {
+rb_raise(rb_eRuntimeError, "abort callback not supported on parallel transcription");
+}
+
+VALUE log_callback = rb_iv_get(mWhisper, "log_callback");
+if (!NIL_P(log_callback)) {
+rb_raise(rb_eRuntimeError, "log callback not supported for parallel transcription");
+}
+}
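
Reviewer sketch (not part of the patch): the guard added above rejects any registered callback when more than one processor is requested. The same decision logic can be tried outside Ruby with plain C. Everything in this block is hypothetical and for illustration only: `callback_set`, `noop_callback`, and `parallel_conflict` are not names from the binding, a NULL function pointer stands in for an unset callback container, and the log callback check is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the binding's callback containers:
   a NULL function pointer means "no callback registered". */
typedef struct {
    void (*new_segment)(void);
    void (*progress)(void);
    void (*encoder_begin)(void);
    void (*abort_requested)(void);
} callback_set;

/* Example callback used only to populate a slot. */
static void noop_callback(void) {}

/* Returns NULL when the configuration is acceptable, otherwise the name
   of the first callback that blocks parallel transcription. Like the
   patch, only n_processors == 1 skips the checks. */
static const char *
parallel_conflict(const callback_set *cbs, int n_processors)
{
    assert(cbs != NULL);
    if (n_processors == 1) {
        return NULL; /* single worker: callbacks are allowed */
    }
    if (cbs->new_segment != NULL)     return "new segment callback";
    if (cbs->progress != NULL)        return "progress callback";
    if (cbs->encoder_begin != NULL)   return "encoder begin callback";
    if (cbs->abort_requested != NULL) return "abort callback";
    return NULL;
}
```

Returning a message instead of raising keeps the sketch free of the Ruby C API; the patch itself raises `rb_eRuntimeError` at the first conflict.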

static void register_callbacks(ruby_whisper_params * rwp, VALUE * context) {
if (!NIL_P(rwp->new_segment_callback_container->callback) || 0 != RARRAY_LEN(rwp->new_segment_callback_container->callbacks)) {
rwp->new_segment_callback_container->context = context;
@@ -219,9 +249,13 @@ static void set_vad_params(ruby_whisper_params *rwp)
rwp->params.vad_params = rwvp->params;
}

+/*
+TODO: Set abort callback to trap SIGINT and SIGTERM
+*/
void
-prepare_transcription(ruby_whisper_params *rwp, VALUE *context)
+prepare_transcription(ruby_whisper_params *rwp, VALUE *context, int n_processors)
{
+check_thread_safety(rwp, context, n_processors);
register_callbacks(rwp, context);
set_vad_params(rwp);
}
@@ -240,6 +274,20 @@ rb_whisper_params_mark(void *p)
void
ruby_whisper_params_free(ruby_whisper_params *rwp)
{
+if (rwp->params.language) {
+ruby_xfree((void *)rwp->params.language);
+}
+if (rwp->params.initial_prompt) {
+ruby_xfree((void *)rwp->params.initial_prompt);
+}
+if (rwp->params.vad_model_path) {
+ruby_xfree((void *)rwp->params.vad_model_path);
+}
+
+xfree(rwp->new_segment_callback_container);
+xfree(rwp->progress_callback_container);
+xfree(rwp->encoder_begin_callback_container);
+xfree(rwp->abort_callback_container);
}

void
@@ -248,7 +296,7 @@ rb_whisper_params_free(void *p)
ruby_whisper_params *rwp = (ruby_whisper_params *)p;
// How to free user_data and callback only when not referred to by others?
ruby_whisper_params_free(rwp);
-free(rwp);
+xfree(rwp);
}

static size_t
@@ -276,6 +324,15 @@ ruby_whisper_params_allocate(VALUE klass)
ruby_whisper_params *rwp;
VALUE obj = TypedData_Make_Struct(klass, ruby_whisper_params, &ruby_whisper_params_type, rwp);
rwp->params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
+if (rwp->params.language != NULL) {
+rwp->params.language = ruby_strdup(rwp->params.language);
+}
+if (rwp->params.initial_prompt != NULL) {
+rwp->params.initial_prompt = ruby_strdup(rwp->params.initial_prompt);
+}
+if (rwp->params.vad_model_path != NULL) {
+rwp->params.vad_model_path = ruby_strdup(rwp->params.vad_model_path);
+}
rwp->diarize = false;
rwp->vad_params = TypedData_Wrap_Struct(cVADParams, &ruby_whisper_vad_params_type, (void *)&rwp->params.vad_params);
rwp->new_segment_callback_container = rb_whisper_callback_container_allocate();
@@ -296,10 +353,12 @@ ruby_whisper_params_set_language(VALUE self, VALUE value)
{
ruby_whisper_params *rwp;
TypedData_Get_Struct(self, ruby_whisper_params, &ruby_whisper_params_type, rwp);
+ruby_xfree((void *)rwp->params.language);
+rwp->params.language = NULL;
if (value == Qfalse || value == Qnil) {
-rwp->params.language = "auto";
+rwp->params.language = ruby_strdup("auto");
} else {
-rwp->params.language = StringValueCStr(value);
+rwp->params.language = ruby_strdup(StringValueCStr(value));
}
return value;
}
@@ -608,7 +667,13 @@ ruby_whisper_params_set_initial_prompt(VALUE self, VALUE value)
{
ruby_whisper_params *rwp;
TypedData_Get_Struct(self, ruby_whisper_params, &ruby_whisper_params_type, rwp);
-rwp->params.initial_prompt = StringValueCStr(value);
+ruby_xfree((void *)rwp->params.initial_prompt);
+rwp->params.initial_prompt = NULL;
+if (NIL_P(value)) {
+rwp->params.initial_prompt = NULL;
+} else {
+rwp->params.initial_prompt = ruby_strdup(StringValueCStr(value));
+}
return value;
}
/*
@@ -1103,12 +1168,14 @@ ruby_whisper_params_set_vad_model_path(VALUE self, VALUE value)
{
ruby_whisper_params *rwp;
TypedData_Get_Struct(self, ruby_whisper_params, &ruby_whisper_params_type, rwp);
+ruby_xfree((void *)rwp->params.vad_model_path);
+rwp->params.vad_model_path = NULL;
if (NIL_P(value)) {
rwp->params.vad_model_path = NULL;
return value;
}
VALUE path = ruby_whisper_normalize_model_path(value);
-rwp->params.vad_model_path = StringValueCStr(path);
+rwp->params.vad_model_path = ruby_strdup(StringValueCStr(path));
return value;
}
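
Reviewer sketch (not part of the patch): the setters in this file now store a private copy of each string and release the previous copy, instead of keeping the pointer returned by `StringValueCStr`, which points into a Ruby string that the params struct does not own. A minimal plain-C version of that ownership pattern is shown here. `set_owned_string` is a hypothetical helper name, and `malloc`/`free` stand in for `ruby_strdup`/`ruby_xfree`, which is an assumption made so the block compiles without the Ruby headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace an owned C string with a private copy.
   *slot must be NULL or a pointer previously returned by malloc.
   Passing value == NULL clears the slot.
   Returns 0 on success, -1 if allocation fails (old value is kept). */
static int
set_owned_string(char **slot, const char *value)
{
    char *copy = NULL;

    assert(slot != NULL);
    if (value != NULL) {
        size_t len = strlen(value) + 1; /* include the terminating NUL */
        copy = malloc(len);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, value, len);
    }
    /* Copy first, free second: this stays safe even if value aliases *slot. */
    free(*slot); /* free(NULL) is a no-op */
    *slot = copy;
    return 0;
}
```

Because the slot always holds either NULL or heap memory it owns, a destructor can free it unconditionally, which is the role `ruby_whisper_params_free` plays in the patch.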

4 changes: 2 additions & 2 deletions bindings/ruby/ext/ruby_whisper_transcribe.cpp
@@ -16,7 +16,7 @@ extern ID id_to_path;
extern ID transcribe_option_names[1];

extern void
-prepare_transcription(ruby_whisper_params * rwp, VALUE * self);
+prepare_transcription(ruby_whisper_params * rwp, VALUE * self, int n_processors);

/*
* transcribe a single file
@@ -73,7 +73,7 @@ ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self) {
// rwp->params.encoder_begin_callback_user_data = &is_aborted;
// }

-prepare_transcription(rwp, &self);
+prepare_transcription(rwp, &self, n_processors);

if (whisper_full_parallel(rw->context, rwp->params, pcmf32.data(), pcmf32.size(), n_processors) != 0) {
fprintf(stderr, "failed to process audio\n");
8 changes: 7 additions & 1 deletion bindings/ruby/sig/whisper.rbs
@@ -37,7 +37,7 @@ module Whisper
def self.lang_id: (string name) -> Integer
def self.lang_str: (Integer id) -> String
def self.lang_str_full: (Integer id) -> String
-def self.log_set: (log_callback, Object? user_data) -> log_callback
+def self.log_set: (log_callback?, Object? user_data) -> log_callback
def self.system_info_str: () -> String

class Context
@@ -52,6 +52,9 @@ module Whisper
# puts text
# end
#
+# If n_processors is greater than 1, you cannot set any callbacks including
+# new_segment_callback, progress_callback, encoder_begin_callback, abort_callback,
+# and log_callback set by Whisper.log_set
def transcribe: (path, Params, ?n_processors: Integer) -> self
| (path, Params, ?n_processors: Integer) { (String) -> void } -> self

@@ -129,6 +132,9 @@ module Whisper
# It seems this approach can offer some speedup in some cases.
# However, the transcription accuracy can be worse at the beginning and end of each chunk.
#
+# If n_processors is greater than 1, you cannot set any callbacks including
+# new_segment_callback, progress_callback, encoder_begin_callback, abort_callback,
+# and log_callback set by Whisper.log_set
def full_parallel: (Params, Array[Float], ?Integer n_samples) -> self
| (Params, _Samples, ?Integer n_samples) -> self
| (Params, _Samples, ?Integer? n_samples, Integer n_processors) -> self