Conversation


@loci-dev loci-dev commented Dec 3, 2025

Mirrored from ggml-org/llama.cpp#17712

Fixes #17691

mistral-common updated _filter_valid_tokenizer_files to return additional data we don't need or expect, causing the conversion to crash when --mistral-format is used.

This change just maps the output back into the format originally expected.
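For illustration, here is a minimal sketch of the kind of mapping this does. The helper name is mine, and I'm assuming the filename is the first element of each tuple in the newer output; the actual patch lives in MistralVocab.__init__:

```python
# Minimal sketch of the compatibility mapping, not the actual patch.
# Assumption: newer mistral-common returns (filename, ...) tuples where
# older versions returned plain filename strings.
from typing import Any


def normalize_tokenizer_files(valid_files: list[Any]) -> list[str]:
    # Map tuple entries back to the list-of-filenames shape the conversion
    # code originally expected; pass old-style string entries through as-is.
    return [f[0] if isinstance(f, tuple) else f for f in valid_files]
```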

Tested on mistral-common==1.8.3 and mistral-common==1.8.6

... that said, there's a different issue with the new Ministral models, which is partially related to how --mistral-format works:

With --mistral-format, get_community_chat_templates() is invoked, and its logic ends up assigning the "unsloth-mistral-Devstral-Small-2507.jinja" template to the model instead of Ministral's local chat_template.jinja; the local template is only picked up for SpecialVocab models, not MistralVocab.
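If anyone wants to check which template actually got embedded in a converted model, it can be read back from the GGUF metadata. A rough sketch using the gguf Python package follows; the model path is a placeholder, and tokenizer.chat_template is the metadata key llama.cpp uses:

```python
# Rough sketch: inspect the chat template embedded in a converted GGUF.
# The model path below is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("./Ministral-8B-Instruct.gguf")
field = reader.get_field("tokenizer.chat_template")
if field is None:
    print("no chat template embedded")
else:
    # For string fields, field.data holds the index of the part that
    # contains the value bytes.
    print(bytes(field.parts[field.data[0]]).decode("utf-8")[:200])
```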

To work around this, users currently have to specify the chat template manually when running the model, e.g. on llama-server with --jinja --chat-template-file "./chat_template.jinja".

As far as I can tell, this issue exists regardless of this PR, whenever --mistral-format is used.

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #410

Overview

This PR addresses a compatibility issue in the Python-based model conversion tooling (gguf-py/gguf/vocab.py), specifically in the MistralVocab.__init__() method. The change adds 5 lines to handle a breaking API change in the mistral-common library (versions 1.8.3+), where _filter_valid_tokenizer_files() now returns tuples instead of strings.

Analysis Results

Performance Impact: None. This modification occurs in offline model conversion scripts, not in the C++ inference binaries. The change executes once during model conversion with O(n) complexity over typically 1-3 tokenizer files, adding approximately 1-5 microseconds of overhead.

Binary Analysis: No changes detected in compiled binaries. The power consumption analysis shows:

  • Core inference libraries (libllama.so, libmtmd.so) report a 100% reduction, indicating structural changes unrelated to this PR
  • GGML backend libraries (libggml-base.so: 59,082 nJ, libggml-cpu.so: 116,808 nJ) remain unchanged
  • Utility tools (llama-bench, llama-quantize, llama-tokenize) show 0% change in all metrics

Function-Level Metrics: No modifications to performance-critical functions. The analyzed functions (main in llama-bench, llama-quantize, llama-gguf-split, llama-tokenize) show 0% change in both Response Time and Throughput, with is_modified: false flags.

Tokens Per Second Impact: Zero. The change does not affect inference functions (llama_decode, llama_encode, llama_tokenize) or their execution paths. Model conversion is a one-time operation performed before deployment.

Code Change Nature: Defensive programming to maintain backward compatibility. The implementation uses runtime type checking to detect tuple format and extracts filenames via list comprehension, preserving existing behavior for both old and new library versions.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 3e4b499 to e81a7eb on December 5, 2025 13:17
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 87eb8b6 to 118039a on January 6, 2026 04:23
