Merged
50 commits
a919001
hexagon: minor refresh for HMX FA and MM (#23796)
max-krasnyansky May 28, 2026
0b24686
server: minor tweaks to use more cpp features (#23785)
mfuntowicz May 28, 2026
bc81d47
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#2…
jadenmach2 May 28, 2026
d7be461
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for …
yaohengxu May 28, 2026
30af6e2
ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007)
fl0rianr May 28, 2026
d374e71
test-llama-archs: fix table format [no release] (#23810)
JohannesGaessler May 28, 2026
7fb1e70
arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-fi…
kucharskim May 28, 2026
dd15579
ci : change Vulkan builds to Release to reduce ccache (#23820)
ggerganov May 28, 2026
d6be315
mtmd: fix gemma 4 audio rms norm eps (#23815)
ngxson May 28, 2026
0b56d28
mtmd: n_head_kv defaults to n_head (#23782)
sfallah May 28, 2026
479a9a1
app : improve help output (#23805)
angt May 28, 2026
445b7ce
ci : releases use Github-hosted builds for the UI (#23823)
ggerganov May 28, 2026
2f6c815
ui: fix audio and video modality detection (#23756)
ValdikSS May 28, 2026
3ef2369
ci : run ui publish on ubuntu-slim (#23818)
CISC May 28, 2026
408ae2b
opencl: move backend info printing into its own function (#23702)
lhez May 28, 2026
c8914ad
mtmd: fix gemma 4 projector pre_norm (#23822)
ngxson May 28, 2026
751ebd1
mtmd-debug: add color and rainbow mode (#23829)
ngxson May 28, 2026
19e92c3
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23…
max-krasnyansky May 28, 2026
33c718d
meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear …
TheBlueMatt May 29, 2026
241cbd4
cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825)
aendk May 29, 2026
98e480a
app : move licences to llama-app (#23824)
angt May 29, 2026
eef59a7
llama: add llm_graph_input_mtp (#23643)
am17an May 29, 2026
b000431
ngram-mod : Add missing include (#23857)
oazizi000 May 29, 2026
ea02bc3
ggml : bump version to 0.13.1 (ggml/1523)
ggerganov May 29, 2026
fe12e42
sync : ggml
ggerganov May 29, 2026
031ddb2
llama: use f16 mask for FA to save VRAM (#23764)
am17an May 29, 2026
1f0aa2a
model : support for DeepseekV32ForCausalLM with generic DeepSeek Spar…
fairydreaming May 29, 2026
cb47092
server: bump timeout to 3600s (#23842)
ngxson May 29, 2026
6ed481e
CUDA: Check PTX version on host side to guard PDL dispatch (#23530)
ORippler May 29, 2026
da3f990
mtmd: Add DeepSeekOCR 2 Support (#20975)
sfallah May 29, 2026
06d26df
download: add option to skip_download (#23059)
ngxson May 29, 2026
dc71236
ci : update macos release to use macos-26 runner (#23878)
ggerganov May 29, 2026
b5f5228
server: remove obsolete scripts (#23870)
ngxson May 29, 2026
764f1e6
graph : ensure DS32 kq_mask_lid is F32 (#23864)
CISC May 29, 2026
2084434
vocab : support tokenizer for LFM2.5-8B-A1B (#23826)
tdakhran May 29, 2026
22d66b5
ui: handle audio/vnd.wave as audio WAV file (#23754)
ValdikSS May 29, 2026
5a46b46
app: add llama update self updater (#23865)
ServeurpersoCom May 29, 2026
689a9a4
server-bench : add speed-bench for speculative decoding benchmarking …
ruixiang63 May 29, 2026
b22da25
ggml-webgpu: add q4_0/q8_0 SET_ROWS (#23760)
reeselevine May 29, 2026
151f3a9
ggml-webgpu: Check earlier for WebGPU required features (#23879)
reeselevine May 29, 2026
0821c5f
server: in SSE mode, send HTTP headers when slot starts (#23884)
ngxson May 29, 2026
1738129
llama : do not skip iGPU when only RPC devices are present (#23868)
rgerganov May 30, 2026
d4204b0
ci : clear cache instead of "no timestamp" keys + fix macos (#23895)
ggerganov May 30, 2026
3375285
ci : fix s390x release job (#23898)
ggerganov May 30, 2026
6e093b8
vulkan: add Flash Attention support for BFloat16 KV cache (#23420)
0cc4m May 30, 2026
d48a56e
ggml : add some lsx support (#23798)
MQ-mengqing May 30, 2026
4c4e91b
ci : update ios-xcode release job to macos-26 (#23906)
ggerganov May 30, 2026
e674b12
test: (test-llama-archs) log the config name first (#23885)
ngxson May 30, 2026
2d9b7c8
metal : restore im2col implementation for large kernels (#23901)
ggerganov May 30, 2026
8b0e0db
TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs (#23843)
JohannesGaessler May 30, 2026
22 changes: 22 additions & 0 deletions .github/actions/ccache-clear/action.yml
@@ -0,0 +1,22 @@
+name: "ccache-clear"
+description: "Delete all GitHub Actions caches matching a key prefix"
+inputs:
+  key:
+    description: "Cache key prefix to match and delete"
+    required: true
+
+runs:
+  using: "composite"
+  steps:
+    - name: Clear caches
+      shell: bash
+      run: |
+        CACHES=$(gh cache list --key "ccache-${{ inputs.key }}" --json id,key --jq '.[] | "\(.id) \(.key)"' 2>/dev/null)
+        if [ -z "$CACHES" ]; then
+          echo "No caches found with key prefix: ${{ inputs.key }}"
+          exit 0
+        fi
+        while read -r id key; do
+          echo "Deleting cache: $id ($key)"
+          gh cache delete "$id"
+        done <<< "$CACHES"
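The loop in the new action relies on `read -r id key` splitting each `<id> <key>` line at the first run of whitespace. A minimal dry-run sketch of that parsing follows. `plan_cache_deletions` and the sample IDs and keys are hypothetical, and no `gh` command is invoked.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the deletion loop: prints what would be deleted.
# plan_cache_deletions is a hypothetical helper, not part of the PR.
plan_cache_deletions() {
  local caches="$1"
  # Mirror the action's early exit when the listing is empty.
  if [ -z "$caches" ]; then
    echo "No caches found"
    return 0
  fi
  # read assigns the first field to id and the rest of the line to key.
  while read -r id key; do
    echo "Would delete cache: $id ($key)"
  done <<< "$caches"
}

# Made-up input in the "<id> <key>" shape emitted by the jq filter.
SAMPLE="101 ccache-release-windows-2022-x64-cuda-12.4-a
102 ccache-release-windows-2022-x64-cuda-12.4-b"

plan_cache_deletions "$SAMPLE"
```

The `-z` guard matters because a here-string always supplies a trailing newline, so without it the loop body would run once with an empty `id`.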
20 changes: 18 additions & 2 deletions .github/workflows/build-cuda-windows.yml
@@ -13,6 +13,7 @@ concurrency:
   queue: max

 env:
+  GH_TOKEN: ${{ github.token }}
   GGML_NLOOP: 3
   GGML_N_THREADS: 1
   LLAMA_ARG_LOG_COLORS: 1
@@ -23,6 +24,9 @@ jobs:
   cuda:
     runs-on: windows-2022

+    permissions:
+      actions: write
+
     strategy:
       matrix:
         cuda: ['12.4', '13.3']
@@ -36,7 +40,6 @@ jobs:
         uses: ggml-org/ccache-action@v1.2.21
         with:
           key: release-windows-2022-x64-cuda-${{ matrix.cuda }}
-          append-timestamp: false # note: use this only with non-concurrent jobs!

       - name: Install Cuda Toolkit
         uses: ./.github/actions/windows-setup-cuda
@@ -67,9 +70,17 @@ jobs:
           cmake --build build --config Release -j %NINJA_JOBS% -t ggml
           cmake --build build --config Release

+      - name: ccache-clear
+        uses: ./.github/actions/ccache-clear
+        with:
+          key: release-windows-2022-x64-cuda-${{ matrix.cuda }}
+
   hip:
     runs-on: windows-2022

+    permissions:
+      actions: write
+
     env:
       # Make sure this is in sync with build-cache.yml
       HIPSDK_INSTALLER_VERSION: "26.Q1"
@@ -125,7 +136,6 @@ jobs:
           # to populate the ccache for the release with manual runs of this workflow
           #key: release-windows-2022-x64-hip-${{ env.HIPSDK_INSTALLER_VERSION }}-${{ matrix.name }}
           key: cuda-windows-2022-x64-hip-${{ env.HIPSDK_INSTALLER_VERSION }}-${{ matrix.name }}
-          append-timestamp: false # note: use this only with non-concurrent jobs!

       - name: Build
         id: cmake_build
@@ -144,3 +154,9 @@ jobs:
             -DGPU_TARGETS="gfx1100" `
             -DGGML_RPC=ON
           cmake --build build -j ${env:NUMBER_OF_PROCESSORS}
+
+      - name: ccache-clear
+        uses: ./.github/actions/ccache-clear
+        with:
+          #key: release-windows-2022-x64-hip-${{ env.HIPSDK_INSTALLER_VERSION }}-${{ matrix.name }}
+          key: cuda-windows-2022-x64-hip-${{ env.HIPSDK_INSTALLER_VERSION }}-${{ matrix.name }}
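Each job passes the same key to `ccache-action` and to `ccache-clear`, and the clear action searches for `ccache-<key>`, which `gh cache list --key` treats as a prefix. The sketch below imitates that prefix match locally. It assumes (this diff does not confirm it) that saved caches are named `ccache-<key>-<suffix>` once `append-timestamp: false` is dropped. `filter_by_prefix` and the sample keys are hypothetical.

```bash
#!/usr/bin/env bash
# Local imitation of prefix matching on cache keys; no GitHub API calls.
# filter_by_prefix is a hypothetical helper, not part of the PR.
filter_by_prefix() {
  local prefix="$1"
  shift
  local k
  for k in "$@"; do
    case "$k" in
      # Quoting $prefix keeps it literal; the trailing * matches any suffix.
      "$prefix"*) echo "$k" ;;
    esac
  done
}

# Made-up keys; the suffix format is an assumption.
KEYS=(
  "ccache-release-windows-2022-x64-cuda-12.4-run1"
  "ccache-release-windows-2022-x64-cuda-12.4-run2"
  "ccache-release-windows-2022-x64-cuda-13.3-run1"
)

# Selects only the entries for the CUDA 12.4 matrix leg.
filter_by_prefix "ccache-release-windows-2022-x64-cuda-12.4" "${KEYS[@]}"
```

Because the match is by prefix, including `matrix.cuda` in the key keeps one matrix leg from deleting the caches of another.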
34 changes: 16 additions & 18 deletions .github/workflows/build-vulkan.yml
@@ -52,14 +52,6 @@ jobs:
         id: checkout
         uses: actions/checkout@v6

-      - name: ccache
-        uses: ggml-org/ccache-action@v1.2.21
-        with:
-          key: vulkan-${{ matrix.os }}
-          variant: ccache
-          evict-old-files: 1d
-          save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
-
       - name: Dependencies
         id: depends
         run: |
@@ -68,14 +60,20 @@ jobs:
           echo "CC=gcc-14" >> "$GITHUB_ENV"
           echo "CXX=g++-14" >> "$GITHUB_ENV"

+      - name: ccache
+        uses: ggml-org/ccache-action@v1.2.21
+        with:
+          key: vulkan-${{ matrix.os }}-new
+          variant: ccache
+          evict-old-files: 1d
+          save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
+
       - name: Configure
         id: cmake_configure
         run: |
           cmake -B build \
             -G "Ninja" \
-            -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-            -DGGML_BACKEND_DL=ON \
-            -DGGML_CPU_ALL_VARIANTS=ON \
+            -DCMAKE_BUILD_TYPE=Release \
             -DGGML_VULKAN=ON

       - name: Build
@@ -91,13 +89,6 @@ jobs:
         id: checkout
         uses: actions/checkout@v6

-      - name: ccache
-        uses: ggml-org/ccache-action@v1.2.21
-        with:
-          key: vulkan-ubuntu-24.04-llvmpipe
-          evict-old-files: 1d
-          save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
-
       - name: Dependencies
         id: depends
         run: |
@@ -124,6 +115,13 @@ jobs:
           path: ./vulkan_sdk
           version: ${{ env.VULKAN_SDK_VERSION }}

+      - name: ccache
+        uses: ggml-org/ccache-action@v1.2.21
+        with:
+          key: vulkan-ubuntu-24.04-llvmpipe
+          evict-old-files: 1d
+          save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
+
       - name: Build
         id: cmake_build
         run: |
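The relocated `ccache` steps keep the `save:` expression, which restricts cache uploads to pushes on `master`. As a rough illustration of that condition outside of Actions, the sketch below evaluates the same two comparisons in bash. `should_save_cache` is a hypothetical helper, and its arguments stand in for `github.event_name` and `github.ref`.

```bash
#!/usr/bin/env bash
# Sketch of the `save:` condition; should_save_cache is hypothetical.
# Returns 0 (true) only for a push event on refs/heads/master.
should_save_cache() {
  local event_name="$1"
  local ref="$2"
  [ "$event_name" = "push" ] && [ "$ref" = "refs/heads/master" ]
}

# Try a few made-up event/ref combinations.
for pair in "push refs/heads/master" "pull_request refs/heads/master" "push refs/heads/feature"; do
  # Intentional word splitting: each pair is "<event> <ref>".
  set -- $pair
  if should_save_cache "$1" "$2"; then
    echo "$1 $2: save"
  else
    echo "$1 $2: restore only"
  fi
done
```

Under this condition, pull request builds would only read the cache populated by `master`, which limits how many cache entries accumulate.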