UPSTREAM PR #19226: opencl: refactor some ops, concat, repeat, tanh and scale#1097

Open
loci-dev wants to merge 6 commits into main from
loci/pr-19226-lh-concat-refactor

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#19226

Gemma-3n-E2B and Gemma-3n-E4B have been producing odd output (not quite gibberish, but evidently incorrect). I ended up refactoring these ops, and the issue is now fixed. The refactor also improves performance.

On X Elite,

gemma-3n-E2B-it-Q8_0,

before,

common_perf_print: prompt eval time =    2522.36 ms /   235 tokens (   10.73 ms per token,    93.17 tokens per second)
common_perf_print:        eval time =   24209.42 ms /   256 runs   (   94.57 ms per token,    10.57 tokens per second)

after,

common_perf_print: prompt eval time =    1473.28 ms /   235 tokens (    6.27 ms per token,   159.51 tokens per second)
common_perf_print:        eval time =   15944.91 ms /   256 runs   (   62.28 ms per token,    16.06 tokens per second)
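
The change can be quantified directly from the two logs above. The following is a minimal sketch in Python, not part of the PR. The helper names `tokens_per_second` and `speedup` and the `runs` dictionary are hypothetical, introduced only for illustration. The timings and token counts are copied from the `common_perf_print` lines.

```python
# Recompute throughput and speedup from the timings reported above.
# Helper names and the `runs` layout are illustrative assumptions, not llama.cpp APIs.

def tokens_per_second(total_ms: float, tokens: int) -> float:
    """Throughput given a total time in milliseconds and a token count."""
    return tokens / (total_ms / 1000.0)

def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of time before to time after, for the same amount of work."""
    return before_ms / after_ms

# (total time in ms, token count), taken from the logs above
runs = {
    "prompt eval": {"before": (2522.36, 235), "after": (1473.28, 235)},
    "eval": {"before": (24209.42, 256), "after": (15944.91, 256)},
}

for phase, data in runs.items():
    before_ms, n_before = data["before"]
    after_ms, n_after = data["after"]
    print(
        f"{phase}: "
        f"{tokens_per_second(before_ms, n_before):.2f} -> "
        f"{tokens_per_second(after_ms, n_after):.2f} tokens/s, "
        f"speedup {speedup(before_ms, after_ms):.2f}x"
    )
```

For this single model on X Elite, the ratios come out to roughly 1.7x for prompt eval and 1.5x for eval.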

loci-review bot commented Jan 31, 2026

No meaningful performance changes were detected across 115327 analyzed functions in the following binaries: build.bin.llama-cvector-generator, build.bin.llama-tts, build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-tokenize, build.bin.llama-qwen2vl-cli, build.bin.llama-bench.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

loci-dev force-pushed the main branch 21 times, most recently from 6515559 to 343bad8, on February 1, 2026 04:50
loci-dev force-pushed the main branch 30 times, most recently from 7ff3e7f to 99b11e9, on February 3, 2026 09:20