
llama-bench : add model sizes #2771

Merged
slaren merged 4 commits into master from llama-bench-model-size on Aug 25, 2023
Conversation

@slaren (Collaborator) commented Aug 24, 2023

Renames the `llama_model_type` API to `llama_model_desc`, and adds `llama_model_size` and `llama_model_n_params` APIs to llama.cpp.

Currently, the sizes are always shown in the markdown output. I am ok with that, but if it adds too much clutter, I could make them optional.
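
As a minimal usage sketch of the new calls (assuming the llama.cpp C API of this era, with the `llama_load_model_from_file`/`llama_context_params` loader; `llama_model_desc` fills a short description string, `llama_model_size` returns the total tensor size in bytes, and `llama_model_n_params` the total weight count):

```cpp
#include "llama.h"

#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model file>\n", argv[0]);
        return 1;
    }

    llama_backend_init(false /* numa */);

    llama_context_params params = llama_context_default_params();
    llama_model * model = llama_load_model_from_file(argv[1], params);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    char desc[128];
    llama_model_desc(model, desc, sizeof(desc)); // e.g. "LLaMA 7B mostly Q4_0"

    // size in GiB (1024^3 bytes) and parameters in billions,
    // matching the llama-bench columns below
    printf("%s | %.2f GiB | %.2f B params\n", desc,
           llama_model_size(model)     / 1024.0 / 1024.0 / 1024.0,
           llama_model_n_params(model) / 1e9);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```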

Example output with markdown:

Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6

| model | model_size | model_n_params | backend | n_gpu_layers | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| LLaMA 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CUDA | 99 | pp 512 | 2235.89 ± 34.61 |
| LLaMA 13B mostly Q4_0 | 6.86 GiB | 13.02 B | CUDA | 99 | pp 512 | 1326.61 ± 100.20 |
| LLaMA 30B mostly Q4_0 | 17.09 GiB | 32.53 B | CUDA | 99 | pp 512 | 619.07 ± 2.03 |

build: d0f77b1 (1055)

@ggerganov (Owner) left a comment


Maybe change `model_size` to just `size` and `model_n_params` to `params`.
Also `GiB` -> `G` to make the table a bit more compact.

@SlyEcho (Collaborator) commented Aug 25, 2023

Is it possible to add this kind of metadata to GGUF?

Never mind, this is all calculated.
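
A sketch of the idea: both numbers fall out of the tensors themselves, so nothing extra needs to be stored in the GGUF file. The tensor container here is hypothetical; `ggml_nbytes` and `ggml_nelements` are existing ggml helpers.

```cpp
#include "ggml.h"

#include <cstdint>
#include <vector>

// Hypothetical helper: derive the reported totals from the loaded tensors.
static void model_totals(const std::vector<ggml_tensor *> & tensors,
                         uint64_t & size, uint64_t & n_params) {
    size     = 0;
    n_params = 0;
    for (const ggml_tensor * t : tensors) {
        size     += ggml_nbytes(t);    // bytes, accounting for the quantization type
        n_params += ggml_nelements(t); // raw element (weight) count
    }
}
```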

@slaren (Collaborator, Author) commented Aug 25, 2023

How about something like this? I tried to make it a bit more compact while still keeping the units.

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| LLaMA 7B mostly Q4_0 | 3.56 GB | 6.74 B | CUDA | 99 | pp 512 | 2239.86 ± 22.42 |
| LLaMA 13B mostly Q4_0 | 6.86 GB | 13.02 B | CUDA | 99 | pp 512 | 1379.74 ± 2.01 |
| LLaMA 30B mostly Q4_0 | 17.09 GB | 32.53 B | CUDA | 99 | pp 512 | 614.50 ± 2.52 |

@ggerganov (Owner) commented

The size unit is incorrect.

- 1 gibibyte is 1073741824 bytes and the shorthand is G or GiB
- 1 gigabyte is 1000000000 bytes and the shorthand is GB

We want to report gibibytes, so it's better to use the shorter shorthand G.
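
For concreteness, a quick illustration of the two prefixes (the byte count is a made-up figure close to the 7B Q4_0 size above, not taken from the PR):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // illustrative byte count, roughly the 7B Q4_0 model size from the tables
    const uint64_t bytes = 3822520894ull;

    printf("%.2f GiB\n", bytes / (1024.0 * 1024.0 * 1024.0)); // 3.56 (binary prefix)
    printf("%.2f GB\n",  bytes / 1e9);                        // 3.82 (decimal prefix)
    return 0;
}
```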

@slaren (Collaborator, Author) commented Aug 25, 2023

I just have never seen G used to refer to GiB. I have been looking for references for this usage of G and I couldn't find anything. Anyway, it's just two characters; I have left it as GiB for now, and it can be changed later if needed.

slaren merged commit 154725c into master on Aug 25, 2023 (4 of 24 checks passed)
slaren deleted the llama-bench-model-size branch on August 25, 2023 at 13:16
@ggerganov (Owner) commented

`ls -lh` reports sizes with G - that's where I picked it up:

```
$ ls -lh
total 41G
-rw-rw-r-- 1 ggerganov ggerganov  13G Jul 19 15:10 ggml-model-f16.bin
-rw-rw-r-- 1 ggerganov ggerganov  13G Aug 25 14:07 ggml-model-f16.gguf
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Jul 24 16:47 ggml-model-q4_0.bin
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Aug 25 14:08 ggml-model-q4_0.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.0G Aug 16 15:19 ggml-model-q4_1.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.8G Aug 14 10:56 ggml-model-q5_1.gguf
```

mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
```
* master: (773 commits)
  server : add `/detokenize` endpoint (ggerganov#2802)
  convert.py : advanced option (ggerganov#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  flake.nix : add rocm support and cleanup (ggerganov#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  Fix HellaSwag (ggerganov#2805)
  flake : build llama.cpp on Intel with nix (ggerganov#2795)
  Handle null rope scaling value (ggerganov#2793)
  Fix spm whitespaces (ggerganov#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  llama : fix struct decl (ggerganov#2790)
  Faster perplexity computation (ggerganov#2786)
  llama : add llama_beam_search() (ggerganov#2267)
  convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  llama-bench : add model sizes (ggerganov#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  ...
```
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* llama-bench : add model sizes

* more compact markdown output

* back to GiB

* adjust column sizes