metal: template GLU kernels to support f16/f32 #23882

Merged: ggerganov merged 1 commit into ggml-org:master from shrivasshankar:metal-glu-f16 on Jun 1, 2026
@shrivasshankar
Contributor

Overview

Part of #14909. Drops the hardcoded f32 GLU kernels in favor of a single template. Loads and stores now use the native tensor type (half or float) to save memory bandwidth, while the actual ALU compute stays in float to avoid numerical blow-ups from half-precision math in geglu/swiglu. The dispatch gate now also accepts f16 inputs.
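
For readers who want the pattern spelled out, here is a minimal CPU-side C++ sketch of the idea, not the Metal source from this PR. The names `TensorType`, `glu_type_supported`, `ReluOp`, `SiluOp`, and `glu_row` are hypothetical and exist only for illustration. The sketch assumes the gate simply checks for f32 or f16, and that reglu and swiglu compute `relu(x0) * x1` and `silu(x0) * x1` respectively.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical type tags standing in for the backend's tensor types.
enum class TensorType { F32, F16, Other };

// Hypothetical dispatch gate: previously only F32 would pass, now F16 does too.
inline bool glu_type_supported(TensorType t) {
    return t == TensorType::F32 || t == TensorType::F16;
}

// Gate activations, always evaluated in float.
struct ReluOp {
    static float apply(float x) { return x > 0.0f ? x : 0.0f; }
};
struct SiluOp {
    static float apply(float x) { return x / (1.0f + std::exp(-x)); }
};

// One templated row kernel instead of one hardcoded f32 kernel per op.
// T is the storage type (half or float in the Metal kernels); Op selects the activation.
template <typename T, typename Op>
void glu_row(const T* src0, const T* src1, T* dst, std::size_t n) {
    assert(n == 0 || (src0 != nullptr && src1 != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        // Load in the native type, then widen to float.
        const float x0 = static_cast<float>(src0[i]);
        const float x1 = static_cast<float>(src1[i]);
        // All arithmetic happens in float.
        const float y = Op::apply(x0) * x1;
        // Narrow once, on store.
        dst[i] = static_cast<T>(y);
    }
}
```

Under these assumptions, `glu_row<float, SiluOp>(a, b, out, n)` covers the f32 case; an f16 instantiation would only change `T`, so the memory traffic halves while the float math in the loop body is unchanged.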

Additional information

Tested on Apple M3 Max

./build/bin/test-backend-ops -o REGLU
./build/bin/test-backend-ops -o GEGLU
./build/bin/test-backend-ops -o SWIGLU
./build/bin/test-backend-ops -o SWIGLU_OAI
./build/bin/test-backend-ops -o GEGLU_ERF
./build/bin/test-backend-ops -o GEGLU_QUICK

All of the above pass on Metal for both f16 and f32.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure:
    YES: I authored the reglu template and precision logic; AI assisted in mechanically applying my pattern to the remaining five kernels.

shrivasshankar requested a review from a team as a code owner on May 29, 2026, 19:10
github-actions bot added the ggml and Apple Metal labels on May 29, 2026
ggerganov self-assigned this on May 29, 2026
ggerganov added the merge ready label on Jun 1, 2026
ggerganov merged commit 95b8b8e into ggml-org:master on Jun 1, 2026
34 of 36 checks passed
shrivasshankar deleted the metal-glu-f16 branch on June 1, 2026, 16:03
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 1, 2026
* origin/master: (36 commits)
vendor : update cpp-httplib to 0.46.1 (ggml-org#23980)
llama: limit max outputs of `llama_context` (ggml-org#23861)
metal: template GLU kernels to support f16/f32 (ggml-org#23882)
vulkan: don't hold the device mutex while compiling pipelines (ggml-org#23641)
vulkan: reduce host memory lock contention (ggml-org#23376)
vocab: add normalizer.lowercase support to WPM (ggml-org#23899)
TP: quantized KV cache support (ggml-org#23792)
security : disable private disclosures (ggml-org#23963)
model: Add EXAONE 4.5 implementations (ggml-org#21733)
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (ggml-org#23056)
vulkan: Removed unused functions (ggml-org#23175)
common : support manually triggering the reasoning budget end sequence (ggml-org#23949)
ci : add missing Linux label to cpu-x64-high-perf runner (ggml-org#23958)
[SYCL] Support Q4_1, Q5_0, Q5_1 in Flash-attention (ggml-org#23812)
[SYCL] Add more types in GET_ROWS OP (ggml-org#23710)
sycl : Optimize Q3_K mul_mat by reorder (ggml-org#23725)
ci: remove redundant or duplicate jobs (ggml-org#23927)
server : handle If-None-Match weak ETags (ggml-org#23916)
ci : limit trigger paths for the CPU workflow (ggml-org#23938)
vocab : add tokenizer support for jina-embeddings-v2-base-zh (ggml-org#18756)
...
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
Labels

  • Apple Metal: https://en.wikipedia.org/wiki/Metal_(API)
  • ggml: changes relating to the ggml tensor library for machine learning
  • merge ready: A maintainer can use this label to indicate that they consider the changes final and ready to merge.
