Merged
73 commits
07854cd
opencl: Adreno optimization for MoE - MxFP4 (llama/22301)
shawngu-quic May 2, 2026
ac514b3
ggml-virtgpu: fix circular dependency in headers (llama/22557)
Juste-Leo2 May 2, 2026
4a51f36
fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus ent…
lucyknada May 2, 2026
0f4a073
ggml-webgpu: add layer norm ops (llama/22406)
Constannnnnt May 4, 2026
e271e4c
vulkan: delete dead GGML_VK_MAX_NODES def (llama/22621)
Atomic-Germ May 4, 2026
5bbef31
CUDA: use fastdiv for batch index split in get_rows (llama/22650)
leonardHONG May 4, 2026
fd184cf
kleidiai : update to v1.24.0 and use release archive (llama/22549)
chaxu01 May 4, 2026
3d3cc92
ggml : implement fast walsh-hadamard transform for kv rotation (#2135…
AlrIsmail May 5, 2026
c34a8d1
llama : add option to save memory in device buffers (llama/22679)
ggerganov May 5, 2026
93539a3
ggml : bump version to 0.11.0 (ggml/1478)
ggerganov May 5, 2026
91dd659
rpc : use graph uid instead of graph cache (llama/22701)
rgerganov May 5, 2026
5a88384
opencl: refactor Adreno q4_0 (llama/22335)
lhez May 10, 2026
1da6d86
Hexagon: Process M-tail rows on HMX instead of HVX (llama/22724)
trivikram-reddy1 May 5, 2026
aa94261
ggml : use `CL_DEVICE_GLOBAL_MEM_SIZE` as memory estimate for OpenCL …
fl0rianr May 6, 2026
be0e7ec
ggml-cpu: fuse RMS_NORM + MUL on CPU backend (llama/22423)
zzzzwc May 6, 2026
62ef80e
ggml-cpu: Optimized risc-v cpu q1_0 dot
pl752 May 7, 2026
900210c
sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET (l…
aicss-genai May 7, 2026
f8e8cf0
opencl: add opfilter regex for debugging (llama/22782)
shaofeiqi May 7, 2026
51e6dd5
llama : fix device state save/load (llama/22805)
ggerganov May 7, 2026
fb4fc65
CUDA: batch out_prod inner loop with cublasSgemmStridedBatched (llama…
leonardHONG May 7, 2026
d3b5f4d
opencl: add q4_0 MoE GEMM for Adreno (llama/22731)
shawngu-quic May 8, 2026
b1aaea6
ggml: update SCHED_DEBUG output to use ggml_op_desc() (llama/22825)
max-krasnyansky May 8, 2026
0d871bd
vulkan: fix spv shadowing (llama/22760)
miyanyan May 8, 2026
acb484d
CUDA: lower-case PCI bus id, standardize for ggml (llama/22820)
JohannesGaessler May 8, 2026
a0c421f
cuda: fuse snake activation (mul, sin, sqr, mul, add) (llama/22667)
ServeurpersoCom May 8, 2026
334fd63
Feature hexagon l2 norm (llama/22816)
pdhinaka May 8, 2026
723c888
sycl: support non-contiguous input in PAD op (llama/22148)
aicss-genai May 9, 2026
90a5a0b
hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (llama/22837)
wyanzhao May 9, 2026
a188040
Add flash attention MMA / Tiles to support MiMo-V2.5 (llama/22812)
AesSedai May 9, 2026
acd6a60
sycl: Battlemage AOT build via spir64_gen + MMQ subgroup annotations …
aicss-genai May 9, 2026
571ce99
sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path (llama/22152)
aicss-genai May 9, 2026
644cae7
Add BF16 support to GET_ROWS operation (llama/21391)
devedse May 9, 2026
2ce3031
SYCL: reduce allocation overhead during flash attention (llama/22732)
sanmai May 9, 2026
a1a69e0
internal AllReduce kernel for CUDA provider (llama/22299)
scutler-nv May 10, 2026
1a98574
ggml : bump version to 0.11.1 (ggml/1484)
ggerganov May 10, 2026
3b255e4
sync : ggml
ggerganov May 10, 2026
1b0922a
talk-llama : sync llama.cpp
ggerganov May 10, 2026
a4d9176
try to fix window cublas CI failure
danbev May 11, 2026
ca1bfc7
Revert "try to fix window cublas CI failure"
danbev May 11, 2026
be867ea
try using CCCL 12.4.127 with cuda 11.8.0 to fix CI failure
danbev May 11, 2026
1cb173b
Revert "try using CCCL 12.4.127 with cuda 11.8.0 to fix CI failure"
danbev May 11, 2026
4babfd4
devops : add spirv-headers to vulkan dockerfile
danbev May 12, 2026
a2839b4
ggml-cuda : add explicit casts to -INFINITY for float and half2 types
danbev May 12, 2026
5cd2284
ggml-cuda : add ar_add() to avoid ambiguous operator+ for half/bfloat…
danbev May 12, 2026
6ff712b
ci : update ONEAPI version to 2025.3.3-0-devel-ubuntu24.04
danbev May 12, 2026
3a067db
squash! ci : update ONEAPI version to 2025.3.3-0-devel-ubuntu24.04
danbev May 12, 2026
a72e70d
Revert "ggml-cuda : add ar_add() to avoid ambiguous operator+ for hal…
danbev May 14, 2026
28cebf5
Revert "ggml-cuda : add explicit casts to -INFINITY for float and hal…
danbev May 14, 2026
c8d3679
ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480)
robUx4 May 10, 2026
2b5783a
metal : tighten input-position loop in kernel_conv_transpose_1d (ggml…
CrispStrobe May 10, 2026
046ce9e
ggml-virtgpu : include missing mutex header (llama/22810)
olliewalsh May 10, 2026
96c321e
Add OP im2col_3d (llama/22903)
arthw May 11, 2026
ec2b0ce
CUDA: directly include cuda/iterator (llama/22936)
ORippler May 11, 2026
fcc6d72
vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (llama/22589)
jeffbolznv May 11, 2026
c754510
Ggml/cuda snake fusion hardening (llama/22912)
ServeurpersoCom May 11, 2026
3377f3a
CUDA: handle OW > 65535 in im2col (2D and 3D) (llama/22944)
CrispStrobe May 11, 2026
aaccdcf
opencl: add q4_1 MoE for Adreno (llama/22856)
shawngu-quic May 11, 2026
82c8a86
metal : promote mul_mv/mul_mm batch divisors to function constants (l…
guyfischman May 12, 2026
24d0ce6
vulkan: Check shared memory size for mmq shaders (llama/22693)
jeffbolznv May 12, 2026
3baccb0
vulkan: Fix Windows performance regression on Intel GPU BF16 workload…
rillomas May 12, 2026
513341c
ggml-webgpu: address precision issues for multimodal (llama/22808)
Constannnnnt May 12, 2026
392c225
ggml-webgpu: Enables running gpt-oss-20b (llama/22906)
yomaytk May 12, 2026
3eaef95
opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (llama/22755)
happyyzy May 12, 2026
a3e6591
hexagon: eliminate scalar VTCM loads via HVX splat helpers (llama/22993)
trivikram-reddy1 May 13, 2026
e2011d8
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes …
z-sachin May 13, 2026
a3b84f4
hexagon: add unary tanh op (llama/22999)
max-krasnyansky May 13, 2026
858b9de
flush the gpu profile timestamp before the queryset is overflowed (ll…
yomaytk May 13, 2026
a3129c6
opencl: fix crash when warming up MoE on Adreno (llama/22876)
lhez May 13, 2026
9741d32
opencl: add q5_0 and q5_1 MoE for Adreno (llama/22985)
shaofeiqi May 13, 2026
a591708
Fix for issue #22974. Cast intermediate results to float before addin…
scutler-nv May 13, 2026
39e459f
ggml-webgpu: only use subgroup-matrix path when head dims are divisib…
ArberSephirotheca May 13, 2026
eb06cc8
sync : ggml
ggerganov May 14, 2026
b273e14
talk-llama : sync llama.cpp
ggerganov May 14, 2026
6 changes: 3 additions & 3 deletions .devops/main-intel.Dockerfile
@@ -1,6 +1,6 @@
-ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04
 
-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build
 WORKDIR /app
 
 RUN apt-get update && \
@@ -16,7 +16,7 @@ RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
 fi && \
 make base.en CMAKE_ARGS="-DGGML_SYCL=1 -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16}"
 
-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS runtime
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS runtime
 WORKDIR /app
 
 RUN apt-get update && \
2 changes: 1 addition & 1 deletion .devops/main-vulkan.Dockerfile
@@ -2,7 +2,7 @@ FROM ubuntu:24.04 AS build
 WORKDIR /app
 
 RUN apt-get update && \
-apt-get install -y build-essential wget cmake git libvulkan-dev glslc \
+apt-get install -y build-essential wget cmake git libvulkan-dev spirv-headers glslc \
 && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
 
 COPY .. .
1 change: 1 addition & 0 deletions examples/talk-llama/llama-arch.cpp
@@ -232,6 +232,7 @@ static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
 { LLM_KV_ATTENTION_SLIDING_WINDOW_PATTERN, "%s.attention.sliding_window_pattern" },
 { LLM_KV_ATTENTION_SCALE, "%s.attention.scale" },
 { LLM_KV_ATTENTION_OUTPUT_SCALE, "%s.attention.output_scale" },
+{ LLM_KV_ATTENTION_VALUE_SCALE, "%s.attention.value_scale" },
 { LLM_KV_ATTENTION_TEMPERATURE_LENGTH, "%s.attention.temperature_length" },
 { LLM_KV_ATTENTION_TEMPERATURE_SCALE, "%s.attention.temperature_scale" },
 { LLM_KV_ATTENTION_KEY_LENGTH_MLA, "%s.attention.key_length_mla" },
1 change: 1 addition & 0 deletions examples/talk-llama/llama-arch.h
@@ -236,6 +236,7 @@ enum llm_kv {
 LLM_KV_ATTENTION_SLIDING_WINDOW_PATTERN,
 LLM_KV_ATTENTION_SCALE,
 LLM_KV_ATTENTION_OUTPUT_SCALE,
+LLM_KV_ATTENTION_VALUE_SCALE,
 LLM_KV_ATTENTION_TEMPERATURE_LENGTH,
 LLM_KV_ATTENTION_TEMPERATURE_SCALE,
 LLM_KV_ATTENTION_KEY_LENGTH_MLA,
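The new `LLM_KV_ATTENTION_VALUE_SCALE` entry follows the same pattern as its neighbours: an enum value in `llama-arch.h` paired with a `"%s.attention.value_scale"` template in `llama-arch.cpp`. The sketch below is a minimal, self-contained illustration of how such a template can be expanded into a per-architecture GGUF metadata key. It assumes that the architecture name (for example `llama`) is what replaces `%s`. The trimmed enum and the `kv_name` helper are hypothetical stand-ins written for this example, not the llama.cpp API.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical, trimmed-down mirror of llm_kv: only the attention-scale
// keys next to the line added in this diff are reproduced.
enum llm_kv {
    LLM_KV_ATTENTION_SCALE,
    LLM_KV_ATTENTION_OUTPUT_SCALE,
    LLM_KV_ATTENTION_VALUE_SCALE,
};

// Key templates; "%s" is a placeholder for the architecture name.
static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
    { LLM_KV_ATTENTION_SCALE,        "%s.attention.scale"        },
    { LLM_KV_ATTENTION_OUTPUT_SCALE, "%s.attention.output_scale" },
    { LLM_KV_ATTENTION_VALUE_SCALE,  "%s.attention.value_scale"  },
};

// Hypothetical helper (assumption, not llama.cpp code): substitute the
// architecture name into the template registered for `kv`.
static std::string kv_name(llm_kv kv, const std::string & arch) {
    const char * tmpl = LLM_KV_NAMES.at(kv); // throws if kv has no template
    char buf[256];
    std::snprintf(buf, sizeof(buf), tmpl, arch.c_str());
    return std::string(buf);
}
```

With these definitions, `kv_name(LLM_KV_ATTENTION_VALUE_SCALE, "llama")` yields `"llama.attention.value_scale"`. Keeping one template per key means that adding a new metadata field, as this diff does, only requires one enum value and one map entry.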