UPSTREAM PR #19226: opencl: refactor some ops, concat, repeat, tanh and scale by loci-dev · Pull Request #1097 · auroralabs-loci/llama.cpp

loci-dev · 2026-01-31T05:45:30Z

Note

Source pull request: ggml-org/llama.cpp#19226

Gemma-3n-E2B and Gemma-3n-E4B have been producing odd output: not quite gibberish, but clearly incorrect. Refactoring these ops (concat, repeat, tanh, and scale) fixed the issue. The refactor also improves performance, as the measurements below show.

On X Elite, with gemma-3n-E2B-it-Q8_0:

Before:

common_perf_print: prompt eval time =    2522.36 ms /   235 tokens (   10.73 ms per token,    93.17 tokens per second)
common_perf_print:        eval time =   24209.42 ms /   256 runs   (   94.57 ms per token,    10.57 tokens per second)

After:

common_perf_print: prompt eval time =    1473.28 ms /   235 tokens (    6.27 ms per token,   159.51 tokens per second)
common_perf_print:        eval time =   15944.91 ms /   256 runs   (   62.28 ms per token,    16.06 tokens per second)
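
For reference, the speedup implied by these timings can be worked out with a few lines of Python. This is only a minimal sketch: the `timings_ms` table and the `speedup` helper are names made up for illustration, and the numbers are copied from the `common_perf_print` lines above.

```python
# Timings in milliseconds, copied from the common_perf_print output above.
# Each entry is (before, after); the names here are illustrative only.
timings_ms = {
    "prompt eval": (2522.36, 1473.28),   # 235 tokens
    "eval": (24209.42, 15944.91),        # 256 runs
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


for phase, (before, after) in timings_ms.items():
    print(f"{phase}: {speedup(before, after):.2f}x faster")
# prints:
# prompt eval: 1.71x faster
# eval: 1.52x faster
```

Since the token and run counts are identical before and after, the same ratios follow from the tokens-per-second figures (159.51 / 93.17 and 16.06 / 10.57).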

loci-review · 2026-01-31T06:38:48Z

No meaningful performance changes were detected across 115327 analyzed functions in the following binaries: build.bin.llama-cvector-generator, build.bin.llama-tts, build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-tokenize, build.bin.llama-qwen2vl-cli, build.bin.llama-bench.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

lhez added 6 commits January 30, 2026 20:57

opencl: refactor concat

93b642e

opencl: refactor repeat

90cfdf0

opencl: refactor tanh

2dd8f10

opencl: enable fp16 for tanh

8d25fb2

opencl: refactor scale

e0310d7

opencl: fix unused variables

45b5fec

loci-dev temporarily deployed to PROD__AL_DEMO with GitHub Actions, January 31, 2026 05:45 (deployment now inactive)

loci-dev force-pushed the main branch from dbad616 to 7d57416, January 31, 2026 06:18

loci-dev force-pushed the main branch 21 times, most recently from 6515559 to 343bad8, February 1, 2026 04:50

loci-dev force-pushed the main branch 30 times, most recently from 7ff3e7f to 99b11e9, February 3, 2026 09:20

loci-dev wants to merge 6 commits into main from loci/pr-19226-lh-concat-refactor
