llama-bench: add -d depth arg#13096
Conversation
JohannesGaessler left a comment:
This is fine too. Please fix the trailing whitespaces and I'll merge.
@JohannesGaessler Can you merge this?
Yes, I was just waiting for the CI to finish.
I think there is a problem with the test statistics for non-zero depths: `./bin/llama-bench -m ../models/llama-3.2-1b-instruct/ggml-model-q8_0.gguf -fa 1 -p 1,2,3,4,4,4,4,5,6,7,8 -d 0,1024 -n 32 -t 1`

build: f9cd683 (5503). Notice how the uncertainty of the results for …
If I remember correctly, we are currently calculating the means and standard deviations of the t/s values rather than of the runtimes. As long as the differences between runs are small I think this is fine, but for large differences between runs (such as when individual runs are very short) this is not quite correct and could lead to bad estimates of the uncertainty. If you want to be fancy you could also do Rao-Blackwellization to get a tighter estimate of the uncertainty, but I think this is not needed.
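To illustrate the concern with synthetic numbers (an assumption for illustration, not real llama-bench output): averaging per-run t/s values is not the same as deriving t/s from the mean runtime, and the two diverge exactly when per-run runtimes differ a lot.

```python
# Sketch with made-up runtimes: two runs that each generate the same
# 32 tokens but take very different amounts of time.
n_tokens = 32
runtimes = [0.5, 2.0]  # seconds per run (hypothetical)

# Averaging per-run t/s values (the approach described above):
tps = [n_tokens / t for t in runtimes]        # [64.0, 16.0]
mean_of_tps = sum(tps) / len(tps)             # 40.0 t/s

# Deriving t/s from the mean runtime instead:
mean_runtime = sum(runtimes) / len(runtimes)  # 1.25 s
tps_from_runtime = n_tokens / mean_runtime    # 25.6 t/s
```

The gap between 40.0 and 25.6 t/s shows why the uncertainty estimate can be off when individual runs are very short and therefore noisy; for runs of similar length the two estimators nearly coincide.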
* add depth param
* update llama-bench README and add depth param
* llama-bench: default params for depth arg for faster execution
* Update examples/llama-bench/README.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* fix buffer print ub
* use user provided args
* remove extra whitespaces

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Add a `-d` or `--n-depth` arg to llama-bench to run tests with a prefilled KV cache context.

Relevant discussion: #12874
Sample output