
llama-bench : add model sizes #2771

Merged
slaren merged 4 commits into master from llama-bench-model-size on Aug 25, 2023
Conversation

@slaren (Collaborator) commented Aug 24, 2023

Renames the `llama_model_type` API to `llama_model_desc`, and adds `llama_model_size` and `llama_model_n_params` APIs to llama.cpp.

Currently, the sizes are always shown in the markdown output. I am ok with that, but if it adds too much clutter, I could make them optional.
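
As a minimal usage sketch of the new calls (assuming the llama.cpp C API of this era, with the `llama_load_model_from_file`/`llama_context_params` loader; `llama_model_desc` fills a short description string, `llama_model_size` returns the total tensor size in bytes, and `llama_model_n_params` the total weight count):

```cpp
#include "llama.h"

#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model file>\n", argv[0]);
        return 1;
    }

    llama_backend_init(false /* numa */);

    llama_context_params params = llama_context_default_params();
    llama_model * model = llama_load_model_from_file(argv[1], params);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    char desc[128];
    llama_model_desc(model, desc, sizeof(desc)); // e.g. "LLaMA 7B mostly Q4_0"

    // size in GiB (1024^3 bytes) and parameters in billions,
    // matching the llama-bench columns below
    printf("%s | %.2f GiB | %.2f B params\n", desc,
           llama_model_size(model)     / 1024.0 / 1024.0 / 1024.0,
           llama_model_n_params(model) / 1e9);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```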

Example output with markdown:

Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6

| model | model_size | model_n_params | backend | n_gpu_layers | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| LLaMA 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CUDA | 99 | pp 512 | 2235.89 ± 34.61 |
| LLaMA 13B mostly Q4_0 | 6.86 GiB | 13.02 B | CUDA | 99 | pp 512 | 1326.61 ± 100.20 |
| LLaMA 30B mostly Q4_0 | 17.09 GiB | 32.53 B | CUDA | 99 | pp 512 | 619.07 ± 2.03 |

build: d0f77b1 (1055)

@ggerganov (Owner) left a comment


Maybe change `model_size` to just `size` and `model_n_params` to `params`.
Also `GiB` -> `G` to make the table a bit more compact.

@SlyEcho (Collaborator) commented Aug 25, 2023

Is it possible to add this kind of metadata to GGUF?

Never mind, this is all calculated.
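
A sketch of the idea: both numbers fall out of the tensors themselves, so nothing extra needs to be stored in the GGUF file. The tensor container here is hypothetical; `ggml_nbytes` and `ggml_nelements` are existing ggml helpers.

```cpp
#include "ggml.h"

#include <cstdint>
#include <vector>

// Hypothetical helper: derive the reported totals from the loaded tensors.
static void model_totals(const std::vector<ggml_tensor *> & tensors,
                         uint64_t & size, uint64_t & n_params) {
    size     = 0;
    n_params = 0;
    for (const ggml_tensor * t : tensors) {
        size     += ggml_nbytes(t);    // bytes, accounting for the quantization type
        n_params += ggml_nelements(t); // raw element (weight) count
    }
}
```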

@slaren (Collaborator, Author) commented Aug 25, 2023

How about something like this? I tried to make it a bit more compact while still keeping the units.

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| LLaMA 7B mostly Q4_0 | 3.56 GB | 6.74 B | CUDA | 99 | pp 512 | 2239.86 ± 22.42 |
| LLaMA 13B mostly Q4_0 | 6.86 GB | 13.02 B | CUDA | 99 | pp 512 | 1379.74 ± 2.01 |
| LLaMA 30B mostly Q4_0 | 17.09 GB | 32.53 B | CUDA | 99 | pp 512 | 614.50 ± 2.52 |

@ggerganov (Owner) commented

The size unit is incorrect.

- 1 gibibyte is 1073741824 bytes and the shorthand is G or GiB
- 1 gigabyte is 1000000000 bytes and the shorthand is GB

We want to report gibibytes, so it's better to use the shorter shorthand G.
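
For concreteness, a quick illustration of the two prefixes (the byte count is a made-up figure close to the 7B Q4_0 size above, not taken from the PR):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // illustrative byte count, roughly the 7B Q4_0 model size from the tables
    const uint64_t bytes = 3822520894ull;

    printf("%.2f GiB\n", bytes / (1024.0 * 1024.0 * 1024.0)); // 3.56 (binary prefix)
    printf("%.2f GB\n",  bytes / 1e9);                        // 3.82 (decimal prefix)
    return 0;
}
```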

@slaren (Collaborator, Author) commented Aug 25, 2023

I just have never seen G used to refer to GiB. I have been looking for references for this usage of G and I couldn't find anything. Anyway, it's just two characters; I have left it as GiB for now, and it can be changed later if needed.

slaren merged commit 154725c into master on Aug 25, 2023 (4 of 24 checks passed)
slaren deleted the llama-bench-model-size branch on August 25, 2023 at 13:16
@ggerganov (Owner) commented

`ls -lh` reports sizes with G - that's where I picked it up:

```
$ ls -lh
total 41G
-rw-rw-r-- 1 ggerganov ggerganov  13G Jul 19 15:10 ggml-model-f16.bin
-rw-rw-r-- 1 ggerganov ggerganov  13G Aug 25 14:07 ggml-model-f16.gguf
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Jul 24 16:47 ggml-model-q4_0.bin
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Aug 25 14:08 ggml-model-q4_0.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.0G Aug 16 15:19 ggml-model-q4_1.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.8G Aug 14 10:56 ggml-model-q5_1.gguf
```

mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
```
* master: (773 commits)
  server : add `/detokenize` endpoint (ggerganov#2802)
  convert.py : advanced option (ggerganov#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  flake.nix : add rocm support and cleanup (ggerganov#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  Fix HellaSwag (ggerganov#2805)
  flake : build llama.cpp on Intel with nix (ggerganov#2795)
  Handle null rope scaling value (ggerganov#2793)
  Fix spm whitespaces (ggerganov#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  llama : fix struct decl (ggerganov#2790)
  Faster perplexity computation (ggerganov#2786)
  llama : add llama_beam_search() (ggerganov#2267)
  convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  llama-bench : add model sizes (ggerganov#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  ...
```
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* llama-bench : add model sizes

* more compact markdown output

* back to GiB

* adjust column sizes