
examples : add debug utility/example#18464

Merged
danbev merged 12 commits into ggml-org:master from danbev:llama-debug
Jan 7, 2026

Conversation

@danbev
Member

danbev commented Dec 29, 2025

This commit introduces a new example named llama-debug, a utility intended to assist with developing and debugging a converted model.

The motivation for this utility is to assist in model conversion work by verifying that the model produces the expected outputs. It is intended to replace logits.cpp in examples/model-conversion.

Example usage:

```console
./build/bin/llama-debug \
    -m models/Qwen2.5-0.5B-Instruct.gguf \
    --prompt "Hello, my name is" \
    --save-logits
...
Model add_bos: false
Input prompt: "Hello, my name is"
Token ids (5):
Hello(9707) ,(11)  my(847)  name(829)  is(374)
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.bin
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.txt
Prompt saved to data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
Tokens saved to data/llamacpp-Qwen2.5-0.5B-Instruct-tokens.bin
```

For more details about the options available for this example, please refer to examples/debug/README.md.
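
As a rough illustration of what can be done with the saved files, a minimal Python sketch for loading them back is shown below. The binary layout is an assumption here (flat little-endian float32 logits and int32 token ids); examples/debug/README.md is the authoritative reference.

```python
# Minimal sketch for inspecting the files written by --save-logits.
# Assumption: the .bin files are raw little-endian arrays (float32 for the
# logits, int32 for the token ids); see examples/debug/README.md for the
# actual layout.
import numpy as np

base = "data/llamacpp-Qwen2.5-0.5B-Instruct"

logits = np.fromfile(f"{base}.bin", dtype=np.float32)
tokens = np.fromfile(f"{base}-tokens.bin", dtype=np.int32)

print(f"{tokens.size} prompt tokens: {tokens.tolist()}")
print(f"{logits.size} logit values, first 5: {logits[:5].tolist()}")
```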


This was suggested/discussed in the following PR: #18281 (comment)

This commit introduces a new example named llama-debug, a utility
intended to assist with developing and debugging a converted model.

The motivation for this utility is to assist in model conversion work
to verify that the model produces the expected outputs. It is intended
to replace logits.cpp in examples/model-conversion.

Example usage:
```console
./build/bin/llama-debug \
    -m models/Qwen2.5-0.5B-Instruct.gguf \
    --prompt "Hello, my name is" \
    --save-logits
...
Model add_bos: false
Input prompt: "Hello, my name is"
Token ids (5):
Hello(9707) ,(11)  my(847)  name(829)  is(374)
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.bin
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.txt
Prompt saved to data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
Tokens saved to data/llamacpp-Qwen2.5-0.5B-Instruct-tokens.bin
```

For more details about the options available for this example, please
refer to examples/debug/README.md.
@pwilkin
Collaborator

pwilkin commented Dec 29, 2025

Does this integrate the llama-eval-callback functionality? Seems like it might be a good idea to phase that one out.

@danbev
Member Author

danbev commented Jan 2, 2026

> Does this integrate the llama-eval-callback functionality?

Yes, this uses the eval-callback feature similar to llama-eval-callback, and we could indeed phase out llama-eval-callback and use this as the reference for that functionality instead.

@github-actions bot added the python (python script changes) label Jan 2, 2026
danbev added 5 commits January 2, 2026 10:03
This was missed in the previous commit.
This commit adds support for storing the prompt and the token ids for the
prompt when running the original models.

The motivation for this is that it will allow us to compare the prompt
and the tokens generated for the prompt when verifying the converted
model. Currently, even if the same prompt is used, the generated tokens
can differ if the tokenization differs between the original and the
converted model, and this would go unnoticed (the verification will most
likely fail, but it might not be obvious why).
This commit adds a script to compare token outputs between original and
converted models.

Example usage:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```
And there is a verbose flag that will also print out the prompts:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16 -v

Original model prompt (pytorch-gemma-3-270m-it):
  prompt: Hello, my name is
n_tokens: 6
token ids: 2, 9259, 236764, 1041, 1463, 563

Converted model prompt (llamacpp-gemma-3-270m-it-bf16):
  prompt: Hello, my name is
n_tokens: 6
token ids: 2, 9259, 236764, 1041, 1463, 563

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```
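
For illustration, the core check such a script performs might look roughly like the following (a sketch only, not the actual scripts/utils/compare_tokens.py):

```python
# Sketch of the core token comparison (not the actual compare_tokens.py).
# Assumption: the token ids have already been read into plain Python lists,
# e.g. from the *-tokens.bin files.
def compare_tokens(original: list[int], converted: list[int]) -> bool:
    if len(original) != len(converted):
        print(f"Token count mismatch: {len(original)} vs {len(converted)}")
        return False
    for i, (a, b) in enumerate(zip(original, converted)):
        if a != b:
            print(f"Token mismatch at position {i}: {a} vs {b}")
            return False
    print(f"All {len(original)} tokens match!")
    return True

# Token ids from the gemma-3-270m-it example above.
compare_tokens([2, 9259, 236764, 1041, 1463, 563],
               [2, 9259, 236764, 1041, 1463, 563])
```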
This commit adds calls to the compare_tokens function in
compare-logits.py and semantic_check.py to ensure that the token ids
that the tokenizers produce are the same before proceeding with
verifying the logits/embeddings.

Placing them in the existing scripts instead of calling them separately
ensures that the token comparison is always done prior to the
logit/embedding verifications.

A follow-up commit/PR could refactor the causal logits verification into
a single script instead of the two that exist now. This would reduce the
code and make it consistent with the embeddings verification, which only
has a single script.
@danbev marked this pull request as ready for review January 5, 2026 10:08
@danbev requested a review from ggerganov as a code owner January 5, 2026 10:08

const bool add_bos = llama_vocab_get_add_bos(vocab);

std::vector<llama_token> tokens = common_tokenize(ctx, params.prompt, add_bos);
Collaborator

can we allow entering raw tokens? maybe reuse the -p option:

-p "hello" --> treat as text input
-p "tokens:12,34" --> treat as 2 input tokens

we can skip adding bos/eos in such case. this use case is mostly to isolate testing inference and tokenizer
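
To make the proposed convention concrete, here is a minimal sketch of the parsing rule (illustrative Python only; it is not existing llama.cpp code, and the actual example is C++):

```python
# Sketch of the proposed -p parsing rule (illustration only, not llama.cpp code).
def parse_prompt(p: str):
    """Return ("tokens", [ids]) for a "tokens:" prefix, otherwise ("text", p)."""
    prefix = "tokens:"
    if p.startswith(prefix):
        ids = [int(t) for t in p[len(prefix):].split(",") if t.strip()]
        return "tokens", ids  # raw token ids: skip tokenization and BOS/EOS
    return "text", p          # plain text: tokenize as usual

print(parse_prompt("hello"))         # ('text', 'hello')
print(parse_prompt("tokens:12,34"))  # ('tokens', [12, 34])
```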

Member Author

I'll take a look and see what would be involved.

Member Author

@ngxson I just wanted to ask if you saw that we recently (in this PR) added support for also saving the prompt and token ids when using the --save-logits flag.
For example, running the following command:

```console
$ ./build/bin/llama-debug -m ../llama.cpp/models/Qwen2.5-0.5B-Instruct.gguf -p "Hello world today" --save-logits
...
Model add_bos: false
Input prompt: "Hello world today"
Token ids (3):
Hello(9707)  world(1879)  today(3351)
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.bin
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.txt
Prompt saved to data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
Tokens saved to data/llamacpp-Qwen2.5-0.5B-Instruct-tokens.bin
```

The -prompt.txt file would then contain the following information:

```console
(venv) $ cat data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
prompt: Hello world today
n_tokens: 3
token ids: 9707, 1879, 3351
```

And the -tokens.bin file contains just the token ids, which we then use in the first step of the logits/embeddings verification to make sure that we are using the same inputs.

Would this be enough, or do you think that being able to specify the actual input token ids would be useful to have?

Member Author

I'm going to merge this to continue on #18658, but I'm happy to add this in a new PR if needed.

danbev added 3 commits January 7, 2026 06:46
This commit updates the debug example to use the new function
llama_model_n_embd_out instead of llama_model_n_embd.

The motivation for this change is to support late interaction retriever
models, like LFM2-ColBert-350M, where the output embeddings are
down-projected to a lower dimension.
This commit adds a print_usage function that is passed to
common_params_parse.

The motivation for this is that it enables an example-specific usage
message, which will be printed after all the options, for example:
```console
example usage:

  Print tensors:

  ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --verbose

  The tensors to be printed can be filtered with --tensor-filter option.

  Save logits/embeddings:

  ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --save-logits

  Add --embedding to save embeddings
```
danbev added a commit to danbev/llama.cpp that referenced this pull request Jan 7, 2026
This commit adds a Python script to automatically detect the pooling
configuration from a sentence-transformers model directory.

The motivation for this change is that I made a mistake when adding the
sentence-transformers support: I incorrectly assumed that if an
embedding model uses sentence-transformers, it always uses pooling. This
assumption does not hold for late interaction models, which were
recently added and can have a down-projection but do not use pooling
(like LFM2-ColBert-350M).

This commit builds upon ggml-org#18464
which needs to be merged first.

Refs: ggml-org#18607 (comment)
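
A rough sketch of how such a detection could work is shown below; it assumes the standard sentence-transformers layout (modules.json listing the modules, with a Pooling module pointing at a directory such as 1_Pooling containing a config.json) and is not the actual script from the commit.

```python
# Sketch: detect whether a sentence-transformers model directory uses pooling,
# and if so which mode. Assumes the standard layout (modules.json plus a
# Pooling module config); this is not the actual script from the commit.
import json
from pathlib import Path

def detect_pooling(model_dir: str) -> str | None:
    root = Path(model_dir)
    modules_file = root / "modules.json"
    if not modules_file.exists():
        return None
    for module in json.loads(modules_file.read_text()):
        if module.get("type", "").endswith("models.Pooling"):
            cfg = json.loads((root / module["path"] / "config.json").read_text())
            # keys look like pooling_mode_mean_tokens, pooling_mode_cls_token, ...
            for key, enabled in cfg.items():
                if key.startswith("pooling_mode_") and enabled:
                    return key.removeprefix("pooling_mode_")
    return None  # no pooling module, e.g. a late interaction (ColBERT-style) model

print(detect_pooling("models/LFM2-ColBert-350M"))
```
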
@danbev merged commit ffba4f2 into ggml-org:master Jan 7, 2026
73 of 74 checks passed
@danbev deleted the llama-debug branch January 14, 2026 08:04

Labels

examples, python (python script changes)
