
UPSTREAM PR #18464: examples : add debug utility/example #840

Open
loci-dev wants to merge 12 commits into main from upstream-PR18464-branch_danbev-llama-debug

Conversation

loci-dev commented Jan 7, 2026

Mirrored from ggml-org/llama.cpp#18464

This commit introduces a new example named llama-debug, a utility intended to assist with developing and debugging a converted model.

The motivation for this utility is to assist in model conversion work by verifying that the model produces the expected outputs. It is intended to replace logits.cpp in examples/model-conversion.

Example usage:

```console
./build/bin/llama-debug \
    -m models/Qwen2.5-0.5B-Instruct.gguf \
    --prompt "Hello, my name is" \
    --save-logits
...
Model add_bos: false
Input prompt: "Hello, my name is"
Token ids (5):
Hello(9707) ,(11)  my(847)  name(829)  is(374)
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.bin
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.txt
Prompt saved to data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
Tokens saved to data/llamacpp-Qwen2.5-0.5B-Instruct-tokens.bin
```

For more details about the options available for this example, please refer to examples/debug/README.md.
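To quickly sanity-check the saved data from Python, a minimal sketch like the following can load the binary output. It assumes the .bin file is a flat array of raw float32 logits; that layout is an assumption for illustration, not a documented format:

```python
import numpy as np

# Assumed layout: raw float32 values, as written by --save-logits.
# The file name follows the pattern shown in the output above.
logits = np.fromfile("data/llamacpp-Qwen2.5-0.5B-Instruct.bin", dtype=np.float32)

print(f"values: {logits.size}")
print(f"first 5 logits: {logits[:5]}")
```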


This was suggested/discussed in the following pr: ggml-org/llama.cpp#18281 (comment)

This commit removes logits.cpp in favor of using llama-debug for generating logits and embeddings.

This was missed in the previous commit.
This commit adds support for storing the prompt and the token ids for the prompt when running the original models.

The motivation for this is that it allows us to compare the prompt and the tokens generated for it when verifying the converted model. Currently, even if the same prompt is used, the generated tokens can differ when the tokenization of the original and converted models differs, and this would go unnoticed (the verification will most likely fail, but it might not be obvious why).
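A minimal sketch of what storing the prompt and token ids could look like on the original-model side. The helper name and the file layout (raw little-endian int32 token ids) are assumptions for illustration, not the actual script:

```python
import numpy as np

def save_prompt_and_tokens(base_name: str, prompt: str, token_ids: list[int]) -> None:
    # Save the prompt as plain text so it can be compared verbatim.
    with open(f"data/{base_name}-prompt.txt", "w") as f:
        f.write(prompt)
    # Save the token ids as raw int32 so both sides can read them back.
    np.asarray(token_ids, dtype=np.int32).tofile(f"data/{base_name}-tokens.bin")

save_prompt_and_tokens("pytorch-gemma-3-270m-it", "Hello, my name is",
                       [2, 9259, 236764, 1041, 1463, 563])
```
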
This commit adds a script to compare token outputs between original and
converted models.

Example usage:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```
There is also a verbose flag that will additionally print out the prompts:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16 -v

Original model prompt (pytorch-gemma-3-270m-it):
  prompt: Hello, my name is
n_tokens: 6
token ids: 2, 9259, 236764, 1041, 1463, 563

Converted model prompt (llamacpp-gemma-3-270m-it-bf16):
  prompt: Hello, my name is
n_tokens: 6
token ids: 2, 9259, 236764, 1041, 1463, 563

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```
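A minimal sketch of the comparison itself, assuming the token files are raw int32 arrays as in the saving sketch above (the actual compare_tokens.py may differ):

```python
import numpy as np

def compare_tokens(original: str, converted: str) -> bool:
    orig = np.fromfile(f"data/{original}-tokens.bin", dtype=np.int32)
    conv = np.fromfile(f"data/{converted}-tokens.bin", dtype=np.int32)

    print("Comparing tokens between:")
    print(f"  Original : {original} ({orig.size} tokens)")
    print(f"  Converted: {converted} ({conv.size} tokens)")

    if not np.array_equal(orig, conv):
        # Report the first position where the ids diverge.
        n = min(orig.size, conv.size)
        pos = next((i for i in range(n) if orig[i] != conv[i]), n)
        print(f"Token mismatch at position {pos}")
        return False

    print(f"All {orig.size} tokens match!")
    return True
```
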
This commit adds calls to the compare_tokens function in compare-logits.py and semantic_check.py to ensure that the token ids that the tokenizers produce are the same before proceeding with verifying the logits/embeddings.

Placing the calls in the existing scripts, instead of invoking them separately, ensures that the token comparison is always done prior to the logit/embedding verifications.

A follow-up commit/PR could refactor the causal logits verification into a single script instead of the two that exist now. This would reduce the code and make it consistent with the embeddings verification, which only has a single script.
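The integration could then be as simple as gating the existing verification on the comparison result; a sketch using the hypothetical compare_tokens helper from above:

```python
import sys

# Hypothetical import path; the real scripts live in
# examples/model-conversion and may be structured differently.
from compare_tokens import compare_tokens

def verify_logits(original: str, converted: str) -> None:
    # Abort early when the tokenizers disagree, so a logits mismatch
    # is never misattributed to the converted weights.
    if not compare_tokens(original, converted):
        sys.exit("token ids differ; fix tokenization before comparing logits")
    # ... the existing logits comparison would continue here ...
```
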
This commit updates the debug example to use the new function llama_model_n_embd_out instead of llama_model_n_embd.

The motivation for this change is to support late interaction retriever models, like LFM2-ColBert-350M, where the output embeddings are down-projected to a lower dimension.
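To illustrate why the distinction matters on the verification side, here is a minimal Python sketch using made-up dimensions and assuming the saved embeddings are a flat float32 array: reshaping with the model's hidden size instead of the output size fails for down-projected models.

```python
import numpy as np

# Hypothetical dimensions for a late interaction retriever:
# the hidden size differs from the down-projected output size.
n_embd = 1024      # model hidden size (llama_model_n_embd)
n_embd_out = 128   # output embedding size (llama_model_n_embd_out)
n_tokens = 6

# Simulate the flat float32 buffer the debug utility would save.
data = np.zeros(n_tokens * n_embd_out, dtype=np.float32)

# Reshaping with the hidden size raises, since 6 * 1024 != 6 * 128.
try:
    data.reshape(n_tokens, n_embd)
except ValueError as e:
    print(f"reshape with n_embd fails: {e}")

# Reshaping with the output size succeeds.
embeddings = data.reshape(n_tokens, n_embd_out)
print(embeddings.shape)  # (6, 128)
```
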
This commit adds a print_usage function that is passed to common_params_parse.

The motivation for this is that it enables an example-specific usage message, which will be printed after all the options, for example:
```console
example usage:

  Print tensors:

  ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --verbose

  The tensors to be printed can be filtered with --tensor-filter option.

  Save logits/embeddings:

  ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --save-logits

  Add --embedding to save embeddings
```
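The C++ side achieves this by passing the print_usage callback to common_params_parse; as a language-neutral illustration of the same pattern, here is a Python argparse sketch that appends an example-usage epilog after the generated options help (an analogy only, not the actual implementation):

```python
import argparse

EXAMPLE_USAGE = """\
example usage:

  Print tensors:

    ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --verbose

  Save logits/embeddings:

    ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --save-logits
"""

# The epilog is printed after the option descriptions, mirroring how the
# print_usage callback appends example usage after all the options.
parser = argparse.ArgumentParser(
    prog="llama-debug",
    epilog=EXAMPLE_USAGE,
    formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("-m", "--model", help="path to the GGUF model")
parser.print_help()
```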
loci-review bot commented Jan 7, 2026

Explore the complete analysis inside the Version Insights

I've successfully retrieved the summary report for your project. The report shows performance analysis for llama.cpp (Pull Request #840) comparing base version b85ec3b1 to new version 70805711.

Key Highlights:

  • Top performers: std::vector::end and _M_const_cast functions in llama-tts show dramatic improvements with over 200% increases in both response time and throughput
  • Affected binaries: Changes impact both llama-tts (6 functions) and llama-cvector-generator (4 functions)
  • Pattern: Most affected functions are C++ STL operations, with consistent increases in both response time and throughput metrics

The report indicates that these functions are experiencing higher workload or more intensive usage patterns in the new version, which could be due to more frequent calls, larger data structures, or additional functionality.

Would you like more detailed analysis on any specific function or aspect of this performance report?

loci-dev force-pushed the main branch 15 times, most recently from 5e94637 to bc696ac on January 8, 2026 at 22:09
loci-dev force-pushed the main branch 22 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026 at 13:32
loci-dev force-pushed the main branch 8 times, most recently from 823244c to bab7d39 on February 19, 2026 at 02:17