llama-bench : add model sizes #2771
Conversation
Maybe change `model_size` to just `size` and `model_n_params` to `params`. Also `GiB` -> `G` to make the table a bit more compact.

Never mind, this is all calculated.
How about something like this? Tried to make it a bit more compact, while still keeping the units.

The size unit is incorrect. We want to report Gibibytes, so it's better to use the shorter shorthand `GiB`.

I just have never seen `GiB`.
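To make the unit question concrete, here is a minimal sketch of rendering a byte count as gibibytes (1 GiB = 1024^3 bytes, as opposed to 10^9 for GB). This is not the PR's actual code; `format_size_gib` is a hypothetical helper for illustration:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not from the patch): format a byte count as GiB,
// where 1 GiB = 1024 * 1024 * 1024 bytes (gibibytes, not gigabytes).
static void format_size_gib(uint64_t bytes, char * buf, size_t buf_size) {
    snprintf(buf, buf_size, "%.2f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
}

int main() {
    char buf[32];
    format_size_gib(3825065984ULL, buf, sizeof(buf));
    printf("%s\n", buf); // prints "3.56 GiB"
    return 0;
}
```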
* master: (773 commits)
  - server : add `/detokenize` endpoint (ggerganov#2802)
  - convert.py : advanced option (ggerganov#2753)
  - llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  - flake.nix : add rocm support and cleanup (ggerganov#2808)
  - llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  - main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  - llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  - k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  - Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  - Fix HellaSwag (ggerganov#2805)
  - flake : build llama.cpp on Intel with nix (ggerganov#2795)
  - Handle null rope scaling value (ggerganov#2793)
  - Fix spm whitespaces (ggerganov#2806)
  - examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  - llama : fix struct decl (ggerganov#2790)
  - Faster perplexity computation (ggerganov#2786)
  - llama : add llama_beam_search() (ggerganov#2267)
  - convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  - llama-bench : add model sizes (ggerganov#2771)
  - convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  - ...
* llama-bench : add model sizes
* more compact markdown output
* back to GiB
* adjust column sizes
Renames the `llama_model_type` API to `llama_model_desc`, and adds `llama_model_size` and `llama_model_n_params` APIs to llama.cpp. Currently, the sizes are always shown in the markdown output. I am ok with that, but if it adds too much clutter, I could make them optional.
Example output with markdown:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6
build: d0f77b1 (1055)