UPSTREAM PR #19286: completion : simplify batch (embd) processing #1151
This commit simplifies the processing of embd by removing the for loop that steps through it in increments of params.n_batch. It also removes the clamping of n_eval, since the size of embd is always at most params.n_batch. The motivation is clarity: read in isolation, the for loop suggests that it can process multiple batches, when in practice it only ever runs a single iteration.
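The change described above can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: `DecodeLog`, `process_with_loop`, and `process_simplified` are hypothetical names introduced only for illustration, and `DecodeLog::decode` is a stand-in that records chunk sizes instead of calling into the library. The sketch assumes the invariant stated in the description, namely that embd never holds more than n_batch tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a decode call: it only records how many
// tokens each call received, so the two variants can be compared.
struct DecodeLog {
    std::vector<int> chunk_sizes;
    void decode(const int * /*tokens*/, int n_eval) { chunk_sizes.push_back(n_eval); }
};

// "Before" shape (assumed): step through embd in increments of n_batch
// and clamp n_eval to n_batch on every iteration.
int process_with_loop(const std::vector<int> & embd, int n_batch, int n_past, DecodeLog & log) {
    for (int i = 0; i < (int) embd.size(); i += n_batch) {
        int n_eval = (int) embd.size() - i;
        if (n_eval > n_batch) {
            n_eval = n_batch; // never taken when embd.size() <= n_batch
        }
        log.decode(&embd[i], n_eval);
        n_past += n_eval;
    }
    return n_past;
}

// "After" shape (assumed): a single decode over all of embd, relying on
// the invariant embd.size() <= n_batch.
int process_simplified(const std::vector<int> & embd, int n_batch, int n_past, DecodeLog & log) {
    assert((int) embd.size() <= n_batch); // invariant taken from the PR description
    if (!embd.empty()) {
        const int n_eval = (int) embd.size();
        log.decode(embd.data(), n_eval);
        n_past += n_eval;
    }
    return n_past;
}
```

Under that invariant the loop body executes at most once and the clamp branch is never taken, so both functions issue the same decode calls and return the same n_past. The assert in the simplified variant is a choice made for this sketch to make the assumption explicit; whether the actual change asserts it is not stated in the description.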
No meaningful performance changes were detected across 115468 analyzed functions in the following binaries: build.bin.llama-cvector-generator, build.bin.llama-tts, build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-gemma3-cli, build.bin.libggml.so, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.llama-tokenize, build.bin.llama-qwen2vl-cli. Full breakdown: Loci Inspector.
Note
Source pull request: ggml-org/llama.cpp#19286