
Conversation

@loci-dev commented Dec 4, 2025

Mirrored from ggml-org/llama.cpp#17749

Continuation of ggml-org/llama.cpp#17712

Some of the new Ministral-3 Instruct variants ship their own local chat templates, unlike previous Mistral models, which relied on community-provided templates.

Previously, converted Ministral-3 models were given the Unsloth Devstral chat template. This PR ensures that Ministral-3 models converted with --mistral-format use their own provided chat templates instead.

This should also apply to Mistral-Large-3, since it ships its own chat template as well, but I don't have the resources to test converting it.

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #424

Overview

PR #424 modifies the Python conversion script convert_hf_to_gguf.py to prioritize local chat_template.jinja files for Mistral-format models. This is a metadata-only change affecting the model conversion workflow, not runtime inference code.

Performance Impact

Binary Analysis:
All 16 binaries show zero performance change:

  • libllama.so, libggml-cpu.so, libggml-base.so, libggml.so, libmtmd.so
  • llama-run, llama-cvector-generator, llama-bench, llama-quantize, llama-tts
  • llama-gguf-split, llama-tokenize, llama-gemma3-cli, llama-llava-cli, llama-minicpmv-cli, llama-qwen2vl-cli

Power consumption changes: 0.0% across all binaries (variations < 1 nJ, within measurement noise).

Function-Level Analysis:
No functions show response time or throughput changes. Core inference functions remain unaffected:

  • llama_decode: No change
  • llama_encode: No change
  • llama_tokenize: No change
  • ggml_backend_graph_compute: No change

Tokens Per Second Impact:
Zero impact on inference throughput. The changes modify conversion-time template selection logic only. Chat templates are embedded as metadata in GGUF files and do not affect token processing during inference.
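
To make that concrete: the chat template ends up as a single string key in the GGUF metadata (tokenizer.chat_template), entirely separate from the tensor data. Below is a minimal sketch of embedding a template, assuming the gguf-py package bundled with llama.cpp; the file names are placeholders and the exact writer call sequence may differ across gguf-py versions.

```python
# Sketch assuming gguf-py's GGUFWriter API; file names are placeholders.
from gguf import GGUFWriter

template = open("chat_template.jinja", encoding="utf-8").read()

writer = GGUFWriter("example.gguf", "llama")
# Stored under the metadata key "tokenizer.chat_template": a plain string KV,
# written alongside (not inside) the tensor data, which is why inference
# throughput is unaffected.
writer.add_chat_template(template)
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```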

Code Changes:

  • Added an existence check for a local template file (chat_template.jinja)
  • Refactored get_community_chat_template() to return a path instead of the template contents
  • Unified the file-reading logic for local and community templates
  • Updated the conditional logic to prioritize local templates for Mistral-format models (see the sketch after this list)
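
For orientation, here is a minimal sketch of the selection flow this list describes. The helper names (select_chat_template, get_community_chat_template_path) are illustrative stand-ins, not the exact functions in convert_hf_to_gguf.py.

```python
from pathlib import Path
from typing import Optional


def get_community_chat_template_path(model_name: str, templates_dir: Path) -> Optional[Path]:
    # Hypothetical helper mirroring the refactor described above: it returns
    # the path of a community-maintained template rather than its contents.
    candidate = templates_dir / f"{model_name}.jinja"
    return candidate if candidate.is_file() else None


def select_chat_template(model_dir: Path, model_name: str, templates_dir: Path) -> Optional[str]:
    # Prefer the template shipped alongside the model; otherwise fall back
    # to a community-provided template.
    local = model_dir / "chat_template.jinja"
    template_path = local if local.is_file() else get_community_chat_template_path(model_name, templates_dir)
    if template_path is None:
        return None
    # Unified read path: local and community templates go through the same code.
    return template_path.read_text(encoding="utf-8")
```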

This PR enables Ministral-3 and Mistral-Large-3 models to use their official templates while maintaining full backward compatibility with existing conversion workflows.

@loci-dev loci-dev force-pushed the main branch 25 times, most recently from df48f9e to cb46586 (December 6, 2025 12:13)
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 1daebfe to 75a97fd (December 10, 2025 23:07)