Conversation
Add new kimi-k2.5 keys to mtmd convert:
- Update V_MMPROJ tensor mapping for the new mm_projector.proj keys
- Update V_M_IMP_NORM for the new mm_projector.pre_norm key
**Overview**

Analysis of 115,396 functions (35 modified, 69 new, 4 removed) across 15 binaries shows minimal performance impact from adding Kimi-K2.5 vision model support. Only build.bin.libmtmd.so exhibits measurable change, with a +0.77% power consumption increase (180,399.2 nJ vs 179,022.4 nJ). All other binaries remain unchanged: build.bin.libllama.so (249,105.8 nJ), build.bin.libggml-cpu.so (157,685.9 nJ), build.bin.libggml-base.so (73,208.7 nJ), build.bin.llama-tts (360,000.0 nJ), build.bin.llama-cvector-generator (354,510.6 nJ), build.bin.llama-bench (60,119.5 nJ), build.bin.llama-quantize (43,714.7 nJ), build.bin.llama-gguf-split (40,060.0 nJ), build.bin.llama-tokenize (38,524.7 nJ), build.bin.libggml.so (5,124.4 nJ), and CLI tools (277.2 nJ each). Core inference libraries show zero performance change, confirming no impact on critical paths.

**Function Analysis**

Six static initialization functions show the expected startup overhead increases of 450-560 ns response time (+1.9% to +2.4%) and 42-52 ns throughput time (+10.1% to +12.5%) from adding one PROJECTOR_TYPE_KIMIK25 map entry. These are one-time costs during program startup with zero runtime impact. STL utility functions demonstrate compiler optimization benefits: _M_const_cast improved -68.4% response time (-181 ns), begin improved -51.2% response time (-88 ns), and clip_projector_type_from_string improved -11.8% response time (-92 ns) despite adding functionality. The string_replace_all function gained -7.8% response time (-27 ns) from compiler optimizations. All changes occur in non-critical model loading and initialization code. Matrix operations, attention mechanisms, KV cache management, and token generation paths remain completely unmodified.

**Additional Findings**

Zero impact on GPU backends (CUDA, Metal, HIP, Vulkan, SYCL) and inference hot paths. The 0.77% power increase in libmtmd.so represents a negligible one-time startup cost, fully justified by new model architecture support.
Changes demonstrate effective isolation with no propagation to performance-critical inference operations. 🔎 Full breakdown: Loci Inspector.
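The headline +0.77% figure above is simple arithmetic on the two quoted energy measurements, which can be sanity-checked directly:

```python
def pct_change(new_nj: float, old_nj: float) -> float:
    """Percent change between two energy measurements (in nJ)."""
    return (new_nj - old_nj) / old_nj * 100.0

# Figures quoted for build.bin.libmtmd.so in the analysis above.
delta = pct_change(180_399.2, 179_022.4)
print(f"{delta:+.2f}%")  # rounds to the reported +0.77%
```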
**Overview**

Analysis of 115,396 functions across 14 binaries reveals minimal performance impact from the addition of Kimi-K2.5 multimodal model support. Modified: 35 functions (0.03%), New: 69 (0.06%), Removed: 4 (0.003%), Unchanged: 115,288 (99.91%). Power Consumption Changes:
**Function Analysis**

Static Initialization Functions (5 functions in cogvlm.cpp, glm4v.cpp, internvl.cpp, minicpmv.cpp, clip.cpp):
STL Utility Functions (improvements despite no code changes):
String Processing Functions:
Other analyzed functions showed negligible changes.

**Additional Findings**

Changes are architecturally isolated to the MTMD (multimodal) subsystem. Core inference libraries (libllama.so, libggml.so, all GPU backends) show zero change, confirming no impact on performance-critical paths: matrix operations, attention computation, KV cache management, or token generation. The 1.1% power increase in libmtmd.so represents ~250 ns of one-time initialization overhead, fully offset by ~329 ns of compiler-optimization gains across other functions. Net startup performance is neutral to positive. Five commits added Kimi-K2.5 vision model support with proper stability fixes (an assert crash fix and a selective revert for backward compatibility). The implementation follows established patterns for projector type additions. 🔎 Full breakdown: Loci Inspector.
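The "neutral to positive" conclusion above follows from back-of-envelope arithmetic on the two approximate figures the analysis quotes (nothing is measured here):

```python
# Approximate figures quoted in the analysis above.
init_overhead_ns = 250       # one-time cost of the new projector-type map entry
optimization_gains_ns = 329  # compiler-optimization savings in other functions

net_startup_ns = init_overhead_ns - optimization_gains_ns
print(net_startup_ns)  # a negative value means net startup time slightly improved
```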
Force-pushed from 40ccb9a to d9cffb7
Force-pushed from 048ad94 to 6c1fde6
Force-pushed from 823244c to bab7d39
Force-pushed from a92fe2a to 6495042
Force-pushed from ef246cc to 8c889a6
Note
Source pull request: ggml-org/llama.cpp#19170
Adding support for https://huggingface.co/moonshotai/Kimi-K2.5
Since this model includes compressed-tensors (INT4 for the conditional experts), I moved the `dequant_model` to the `prepare_tensors` call at @compilade's suggestion. The model conversion fails otherwise because the `quantization_config` is nested under the `text_config` in the config.json.

Additionally, this model adds some new keys for the vision tower, prefixed as `vt_`, and the preprocessor_config.json has the expected fields nested in the `media_proc_cfg` key.

This PR does not include the "hacked" Q4_0 changes by @jukofyork, referred to in this comment.
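To make the nesting problem concrete, here is a minimal sketch of a lookup that tolerates both a top-level `quantization_config` and one nested under `text_config`, as in Kimi-K2.5's config.json. The helper name and structure are illustrative assumptions, not the PR's actual code:

```python
import json

def get_quant_config(config: dict):
    """Return quantization_config whether it sits at the top level or is
    nested under text_config (hypothetical helper for illustration)."""
    if "quantization_config" in config:
        return config["quantization_config"]
    return config.get("text_config", {}).get("quantization_config")

# Kimi-K2.5-style config.json: quantization_config nested under text_config.
cfg = json.loads('{"text_config": {"quantization_config": {"format": "int4"}}}')
print(get_quant_config(cfg))  # {'format': 'int4'}
```

A converter that only checks the top level would see no quantization config at all for such a model, which is consistent with the conversion failure described above.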
I have added a first pass at vision support, heavily aided by LLM assistance. I entirely expect @ngxson to tear it to shreds or call me a dummy and show me an easier way to add that vision support :)