metal: template GLU kernels to support f16/f32 by shrivasshankar · Pull Request #23882 · ggml-org/llama.cpp

shrivasshankar · 2026-05-29T19:10:50Z

Overview

Part of #14909. Drops the hardcoded f32 GLU kernels in favor of a single template. Loads and stores now use the native tensor type (half or float) to save memory bandwidth, while the actual ALU compute stays in float to avoid overflow and precision loss in geglu/swiglu. The dispatch gate is also opened up to allow f16 inputs.
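The load/store versus compute split described above can be illustrated with a small CPU-side C++ sketch. This is not the Metal kernel from the PR: `half_t`, `glu_rows`, `reglu`, and `swiglu` are hypothetical names, and `half_t` is a simplified stand-in for Metal's `half` that truncates the mantissa, flushes subnormals to zero, and maps NaN to infinity. It assumes the ggml convention that the activation is applied to `x` and the result is multiplied by the gate `g`.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-bit storage type standing in for Metal's `half`.
// Simplified on purpose: truncating conversion, subnormals flushed to zero,
// NaN collapsed to infinity.
struct half_t {
    std::uint16_t bits;

    half_t() : bits(0) {}

    explicit half_t(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        const std::uint32_t sign = (u >> 16) & 0x8000u;
        const std::int32_t  exp  = static_cast<std::int32_t>((u >> 23) & 0xFFu) - 127 + 15;
        const std::uint32_t man  = (u >> 13) & 0x3FFu;
        if (exp <= 0) {
            bits = static_cast<std::uint16_t>(sign);            // too small: signed zero
        } else if (exp >= 31) {
            bits = static_cast<std::uint16_t>(sign | 0x7C00u);  // too large: infinity
        } else {
            bits = static_cast<std::uint16_t>(sign | (static_cast<std::uint32_t>(exp) << 10) | man);
        }
    }

    explicit operator float() const {
        const std::uint32_t sign = static_cast<std::uint32_t>(bits & 0x8000u) << 16;
        const std::uint32_t exp  = (bits >> 10) & 0x1Fu;
        const std::uint32_t man  = bits & 0x3FFu;
        std::uint32_t u;
        if (exp == 0) {
            u = sign;                                            // zero
        } else if (exp == 31) {
            u = sign | 0x7F800000u | (man << 13);                // infinity / NaN
        } else {
            u = sign | ((exp - 15 + 127) << 23) | (man << 13);   // normal number
        }
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }
};

// One templated loop for every GLU variant: T is the storage type used for
// loads and stores, while all arithmetic is carried out in float.
template <typename T, typename Act>
void glu_rows(const T * x, const T * g, T * dst, std::size_t n, Act act) {
    for (std::size_t i = 0; i < n; ++i) {
        const float xi = static_cast<float>(x[i]);  // load in native type, widen
        const float gi = static_cast<float>(g[i]);
        dst[i] = static_cast<T>(act(xi) * gi);      // compute in float, narrow on store
    }
}

inline float relu_f32(float v) { return v > 0.0f ? v : 0.0f; }
inline float silu_f32(float v) { return v / (1.0f + std::exp(-v)); }

template <typename T>
void reglu(const T * x, const T * g, T * dst, std::size_t n) {
    glu_rows(x, g, dst, n, relu_f32);
}

template <typename T>
void swiglu(const T * x, const T * g, T * dst, std::size_t n) {
    glu_rows(x, g, dst, n, silu_f32);
}
```

Instantiating `reglu<float>` and `reglu<half_t>` from the same template mirrors the idea of one kernel body serving both tensor types; only the width of the values moved to and from memory changes.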

Additional information

Tested on Apple M3 Max

./build/bin/test-backend-ops -o REGLU
./build/bin/test-backend-ops -o GEGLU
./build/bin/test-backend-ops -o SWIGLU
./build/bin/test-backend-ops -o SWIGLU_OAI
./build/bin/test-backend-ops -o GEGLU_ERF
./build/bin/test-backend-ops -o GEGLU_QUICK

Tests passing on Metal for f16 + f32

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:
YES: I authored the reglu template and precision logic; AI assisted in mechanically applying my pattern to the remaining five kernels.

shrivasshankar requested a review from a team as a code owner May 29, 2026 19:10

github-actions bot added the ggml and Apple Metal labels May 29, 2026

ggerganov self-assigned this May 29, 2026

ggerganov approved these changes Jun 1, 2026

ggerganov added the merge ready label Jun 1, 2026

ggerganov merged commit 95b8b8e into ggml-org:master Jun 1, 2026
34 of 36 checks passed

shrivasshankar deleted the metal-glu-f16 branch June 1, 2026 16:03

ggerganov merged 1 commit into ggml-org:master from shrivasshankar:metal-glu-f16