UPSTREAM PR #19170: Add Kimi-K2.5 support#1119

Open
loci-dev wants to merge 5 commits into main from loci/pr-19170-kimi-k2.5

Conversation

@loci-dev loci-dev commented Feb 1, 2026

Note

Source pull request: ggml-org/llama.cpp#19170

Adding support for https://huggingface.co/moonshotai/Kimi-K2.5

Since this model uses compressed-tensors (INT4 for the conditional experts), I moved the dequant_model call into prepare_tensors at @compilade's suggestion. Model conversion fails otherwise, because the quantization_config is nested under text_config in the config.json.
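To illustrate the nesting issue described above, here is a minimal sketch of a lookup that tolerates both layouts. The helper name and the sample config structure are illustrative assumptions, not the converter's actual API:

```python
def get_quant_config(config: dict):
    """Return quantization_config whether it sits at the top level (the
    common case) or nested under text_config (as in Kimi-K2.5's config.json).
    Hypothetical helper for illustration only."""
    if "quantization_config" in config:
        return config["quantization_config"]
    return config.get("text_config", {}).get("quantization_config")

# Kimi-K2.5-style nesting resolves the same as a flat layout would:
cfg = {"text_config": {"quantization_config": {"format": "compressed-tensors", "bits": 4}}}
print(get_quant_config(cfg))  # {'format': 'compressed-tensors', 'bits': 4}
```

A converter that only checks the top level sees no quantization_config at all for this model, which is why the dequantization step has to run after the full config is available.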

Additionally, this model adds some new keys for the vision tower, prefixed as vt_, and the preprocessor_config.json has the expected fields nested in the media_proc_cfg key.
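A small sketch of handling that preprocessor nesting, assuming the usual flat layout as the fallback; the function name is hypothetical:

```python
def load_preprocessor_config(raw: dict) -> dict:
    """Return the image-preprocessing fields from preprocessor_config.json,
    unwrapping the media_proc_cfg key that Kimi-K2.5 nests them under.
    Illustrative sketch, not the converter's actual API."""
    return raw.get("media_proc_cfg", raw)

# Both the nested (Kimi-K2.5) and flat (typical) layouts resolve:
nested = {"media_proc_cfg": {"image_mean": [0.5, 0.5, 0.5]}}
flat = {"image_mean": [0.48, 0.46, 0.41]}
print(load_preprocessor_config(nested)["image_mean"])  # [0.5, 0.5, 0.5]
print(load_preprocessor_config(flat)["image_mean"])    # [0.48, 0.46, 0.41]
```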

This PR does not include the "hacked" Q4_0 changes by @jukofyork, referred to in this comment.

I have added a first pass at vision support, heavily aided by LLM assistance. I entirely expect @ngxson to tear it to shreds or call me a dummy and show me an easier way to add that vision support :)

Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_MM_INP_NORM for new mm_projector.pre_norm key
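The tensor-mapping commits above amount to prefix renames from the Hugging Face names to GGUF names. A hypothetical sketch in the style of gguf-py's TensorNameMap, where the HF-side prefixes come from the PR description but the GGUF-side names are illustrative placeholders, not the actual constants:

```python
# HF-name prefix -> GGUF-name prefix (GGUF side is illustrative only)
KIMI_K25_MMPROJ_MAP = {
    "mm_projector.proj": "mm.model.fc",       # mapped via V_MMPROJ
    "mm_projector.pre_norm": "mm.input_norm", # mapped via the pre-norm tensor
}

def map_tensor_name(hf_name: str) -> str:
    """Rewrite an HF tensor name to its GGUF equivalent by prefix match,
    preserving the suffix (e.g. '.weight', '.bias')."""
    for prefix, gguf_prefix in KIMI_K25_MMPROJ_MAP.items():
        if hf_name == prefix or hf_name.startswith(prefix + "."):
            return gguf_prefix + hf_name[len(prefix):]
    raise KeyError(f"unmapped tensor name: {hf_name}")
```

For example, `map_tensor_name("mm_projector.proj.weight")` yields `"mm.model.fc.weight"` under these placeholder names.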

loci-review bot commented Feb 1, 2026

Overview

Analysis of 115,396 functions (35 modified, 69 new, 4 removed) across 15 binaries shows minimal performance impact from adding Kimi-K2.5 vision model support. Only build.bin.libmtmd.so exhibits a measurable change, with a +0.77% power consumption increase (180,399.2 nJ vs 179,022.4 nJ). All other binaries remain unchanged:

  • build.bin.libllama.so: 249,105.8 nJ
  • build.bin.libggml-cpu.so: 157,685.9 nJ
  • build.bin.libggml-base.so: 73,208.7 nJ
  • build.bin.llama-tts: 360,000.0 nJ
  • build.bin.llama-cvector-generator: 354,510.6 nJ
  • build.bin.llama-bench: 60,119.5 nJ
  • build.bin.llama-quantize: 43,714.7 nJ
  • build.bin.llama-gguf-split: 40,060.0 nJ
  • build.bin.llama-tokenize: 38,524.7 nJ
  • build.bin.libggml.so: 5,124.4 nJ
  • CLI tools: 277.2 nJ each

Core inference libraries show zero performance change, confirming no impact on critical paths.

Function Analysis

Six static initialization functions show expected startup overhead increases of 450-560ns response time (+1.9% to +2.4%) and 42-52ns throughput time (+10.1% to +12.5%) from adding one PROJECTOR_TYPE_KIMIK25 map entry. These are one-time costs during program startup with zero runtime impact.

STL utility functions demonstrate compiler optimization benefits: _M_const_cast improved response time by 68.4% (-181ns), begin by 51.2% (-88ns), and clip_projector_type_from_string by 11.8% (-92ns) despite gaining functionality. string_replace_all likewise improved response time by 7.8% (-27ns) from compiler optimizations.

All changes occur in non-critical model loading and initialization code. Matrix operations, attention mechanisms, KV cache management, and token generation paths remain completely unmodified.

Additional Findings

Zero impact on GPU backends (CUDA, Metal, HIP, Vulkan, SYCL) and inference hot paths. The 0.77% power increase in libmtmd.so represents negligible one-time startup cost, fully justified by new model architecture support. Changes demonstrate effective isolation with no propagation to performance-critical inference operations.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.


loci-review bot commented Feb 1, 2026

Overview

Analysis of 115,396 functions across 14 binaries reveals minimal performance impact from Kimi-K2.5 multimodal model support addition. Modified: 35 functions (0.03%), New: 69 (0.06%), Removed: 4 (0.003%), Unchanged: 115,288 (99.91%).

Power Consumption Changes:

  • build.bin.libmtmd.so: +1,973 nJ (+1.1%) - only affected binary
  • build.bin.libllama.so: +0.72 nJ (0.0%)
  • build.bin.llama-tts: -0.57 nJ (0.0%)
  • build.bin.llama-cvector-generator: -0.61 nJ (0.0%)
  • build.bin.llama-bench: 0 nJ (0.0%)
  • build.bin.llama-tokenize: 0 nJ (0.0%)
  • build.bin.llama-quantize: 0 nJ (0.0%)
  • build.bin.llama-qwen2vl-cli: 0 nJ (0.0%)
  • build.bin.libggml.so: 0 nJ (0.0%)
  • build.bin.libggml-base.so: 0 nJ (0.0%)
  • build.bin.libggml-cpu.so: 0 nJ (0.0%)
  • build.bin.llama-gemma3-cli: 0 nJ (0.0%)
  • build.bin.llama-gguf-split: 0 nJ (0.0%)
  • build.bin.llama-llava-cli: 0 nJ (0.0%)
  • build.bin.llama-minicpmv-cli: 0 nJ (0.0%)

Function Analysis

Static Initialization Functions (5 functions in cogvlm.cpp, glm4v.cpp, internvl.cpp, minicpmv.cpp, clip.cpp):

  • __static_initialization_and_destruction_0 variants show consistent regressions
  • Response time: +549-558ns (+2.33-2.37%)
  • Throughput time: +44-53ns (+10.36-12.80%)
  • Cause: Added PROJECTOR_TYPE_KIMIK25 entry to PROJECTOR_TYPE_NAMES static map (30→31 entries)
  • Context: One-time startup initialization before main(), zero inference impact

STL Utility Functions (improvements despite no code changes):

  • std::_Rb_tree_const_iterator::_M_const_cast: -181ns response (-68.4%), -182ns throughput (-74.0%)
  • std::vector<llava_uhd::slice_coordinates>::begin: -88ns response (-51.2%), -88ns throughput (-58.5%)
  • std::vector<mobilenetv5_block>::_M_check_len: -55ns response (-3.5%), -56ns throughput (-26.0%)
  • Cause: Compiler optimizations (GCC 13, ARM64)

String Processing Functions:

  • clip_projector_type_from_string: -92ns response (-11.8%), -92ns throughput (-38.6%)
  • string_replace_all: -27ns response (-7.8%), -27ns throughput (-10.6%)
  • Cause: Compiler optimizations in string operations

Other analyzed functions showed negligible changes.

Additional Findings

Changes are architecturally isolated to MTMD (multimodal) subsystem. Core inference libraries (libllama.so, libggml.so, all GPU backends) show zero change, confirming no impact on performance-critical paths: matrix operations, attention computation, KV cache management, or token generation. The 1.1% power increase in libmtmd.so represents ~250ns one-time initialization overhead, fully offset by ~329ns in compiler optimization gains across other functions. Net startup performance is neutral to positive. Five commits added Kimi-K2.5 vision model support with proper stability fixes (assert crash fix, selective revert for backward compatibility). Implementation follows established patterns for projector type additions.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev loci-dev force-pushed the main branch 19 times, most recently from 40ccb9a to d9cffb7 on February 2, 2026 08:22
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026 13:32
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from 823244c to bab7d39 on February 19, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 10 times, most recently from a92fe2a to 6495042 on February 27, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from ef246cc to 8c889a6 on March 2, 2026 02:17