[Models] Kimi-K2.5#33131

Merged

youkaichao merged 41 commits intomainfrom

feat-k2.5-support

Jan 27, 2026

Member

ywang96 commented Jan 27, 2026 •

edited by github-actions bot

Loading

Purpose

Kimi-K2.5 model support - see recipe at https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

wangln19 and others added 30 commits

January 4, 2026 07:25


          feat: support K2.5 model

c9d5ff0

Add support for K2.5 model (K2VL) in vllm.

Key changes:
- Add K2VL model implementation and configuration.
- Add Kimi K2 reasoning parser.
- Update Chat API and multimodal inputs to support video chunks.
- Add K2.5 specific multimodal processing logic and parsing.
- Register K2VL model and configuration.

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          delete enable_thinking and training para

120e059

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          Update vllm/model_executor/models/k2vl_vit.py

c9243a0

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>


          Update vllm/model_executor/models/k2vl_vit.py

f28234d

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>


          Update vllm/entrypoints/chat_utils.py

26898aa

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>


          Update vllm/entrypoints/chat_utils.py

4502ffc

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>


          refactor: split K2VLConfig vision params into K2VLVisionConfig

256bc09

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          fix: rename mm_projection_auto to mm_projector_forward

a6db17b

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          Address code review feedback: refactor tpool_patch_merger and fix imp…

dabbad7

…orts

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          refactor: address code review feedback

ffb3ecf

- k2vl_vit.py: Use get_act_fn('gelu_pytorch_tanh') instead of PytorchGELUTanh
- k2vl_vit.py: Remove VisionTowerConfig and ProjectorConfig wrapper classes
- k2vl.py: Use TensorSchema (K2VLMediaPixelInputs) instead of BatchFeature
- chat_utils.py: Add assert isinstance(item, dict) for type narrowing

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          docs: add K2VL model to registry and supported models

bd5bb72

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          [Model] Refactor K2VL ViT to use MMEncoderAttention

a0a0fb6

- Replace custom multihead_attention/eager_attention with MMEncoderAttention
- Add tensor parallel support using QKVParallelLinear and RowParallelLinear
- MLP2 now uses ColumnParallelLinear/RowParallelLinear with TP/DP support
- Pass multimodal_config through MoonViT3dPretrainedModel to encoder layers
- Rename vision_tower_forward_auto to vision_tower_forward

This aligns with PR #31738 pattern for unified multi-platform attention backends.

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          refactor(k2vl): remove unused video-chunk config params

875d970

Remove temporal_merge_kernel_size, sample_fps, and timestamp_mode
from K2VLConfig as they are not used by vLLM.

These parameters are loaded from preprocessor_config.json via
media_processor and should not be duplicated in the model config.

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          Rename K2VL to KimiK25

5f54da5

- Rename files: k2vl.py -> kimi_k25.py, k2vl_vit.py -> kimi_k25_vit.py
- Rename classes: K2VLConfig -> KimiK25Config, K2VLForConditionalGeneration -> KimiK25ForConditionalGeneration, etc.
- Update registry, configs/__init__.py, config.py mappings
- Update docs and comments to reference Kimi-K2.5 instead of K2VL
- Pass vision_config to KimiK25MultiModalProjector per reference diff

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          docs: Update Kimi-K2.5 model name and remove video support marker

c2bfb1d

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          perf(chat_utils): use .media instead of .original_bytes to avoid redu…

0bbbd38

…ndant bytes→PIL conversion

When processing vision_chunk items, pass the already-decoded PIL.Image
via .media instead of .original_bytes. This avoids an unnecessary
bytes→PIL.Image conversion in media_processor since the image was
already decoded in load_bytes().

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          Address code review feedback for KimiK25 model

a69f0d5

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          style: fix ruff format - remove extra blank lines

1043ba3

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          refactor(kimi_k25): address code review feedback

009dff4

- Move PIL.Image import out of TYPE_CHECKING in inputs.py
- Remove duplicate get_dummy_image, use parent _get_dummy_images instead
- Add get_expert_mapping method following DeepseekV2 pattern
- Use SharedFusedMoE for proper shared experts handling

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          refactor(kimi_k25_vit): use run_dp_sharded_mrope_vision_model for ViT DP

389aa5a

Replace manual sequential batching loop with vLLM's standard DP utility.
This enables true data parallelism across GPUs for vision features.

- Add run_dp_sharded_mrope_vision_model import
- Remove KIMIV_VT_INFER_MAX_PATCH_NUM constant
- Simplify vision_tower_forward to use DP sharding
- Add self.config to MoonViT3dPretrainedModel for DP compatibility

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>


          Merge remote-tracking branch 'upstream/main' into k2.5-render

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          move video chunk rebuild to render

d1553ad

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fix legacy import

d27e665

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fix outdate

f79ba64

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fix vit tp mode

62be6ad

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

0ed896d

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fix reason parser

69c8721

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          Merge remote-tracking branch 'vllm-kv/main' into k2.5-data-parser

0ce7a60

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          use vision chunk uuid validation through data parser

e50f62c

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fix missing uuids

fcb1983

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

ywang96 requested review from mgoin, pavanimajety and tjtanaa as code owners

January 27, 2026 02:15

ywang96 added the ready label

youkaichao approved these changes

View reviewed changes

Member

youkaichao left a comment

🔥

cursor bot reviewed

View reviewed changes

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

vllm/model_executor/models/kimi_k25.py

+                              width=MaxImageTokenMeta.width,
+                              num_images=1,
+                          )[0],
+                      )

cursor bot Jan 27, 2026

Incomplete TypedDict instances missing required fields

Medium Severity

The VisionChunkVideo and VisionChunkImage instances created in get_dummy_mm_items are missing required TypedDict fields. VisionChunkVideo on line 191-193 is missing uuid, prompt, and video_idx, while VisionChunkImage on line 198-205 is missing uuid. Since these items are passed to media_tokens_calculator (external code loaded via trust_remote_code), if that code accesses these missing keys directly instead of using .get(), it will raise a KeyError at runtime during model profiling.

vllm/entrypoints/chat_utils.py

+                                      else:
+                                          video_data = data
+                                      video_chunks = mm_processor.split_video_chunks(video_data)
+                                      for i, vc in enumerate(video_chunks):

cursor bot Jan 27, 2026

Loop variable shadowing causes confusing code

Low Severity

The inner loop at line 707 uses i as its loop variable (for i, vc in enumerate(video_chunks)), which shadows the outer loop's i variable from line 679 (for i, (idx, item) in enumerate(filtered_items)). While Python reassigns loop variables on each iteration so this doesn't cause incorrect behavior currently, it creates confusing code and could lead to subtle bugs if the code is modified in the future.

Additional Locations (1)

vllm/entrypoints/chat_utils.py#L678-L679

Isotr0py approved these changes

View reviewed changes

Member

Isotr0py left a comment

🚀

DarkLight1337 reviewed

View reviewed changes

vllm/transformers_utils/configs/kimi_k25.py Outdated



		class KimiK25VisionConfig(PretrainedConfig):
		"""Vision configuration for Kimi-K2.5 (vision tower + mm projector).

Member

DarkLight1337 Jan 27, 2026

Is this just so that users don't have to install dev version of transformers? Otherwise trust_remote_code=True should be able to load the config (assuming it's on HF Hub)

DarkLight1337 reviewed

View reviewed changes

vllm/multimodal/video.py

Comment on lines +245 to +246

		NOTE: This is temporary for Kimi-K2.5 testing. Remember to change back
		to opencv before release if needed.

Member

DarkLight1337 Jan 27, 2026

Resolve this NOTE?

DarkLight1337 reviewed

View reviewed changes

vllm/entrypoints/chat_utils.py

                   if "video" in items_by_modality:
                       mm_data["video"] = [data for data, uuid in items_by_modality["video"]]
                       mm_uuids["video"] = [uuid for data, uuid in items_by_modality["video"]]
+                  if "vision_chunk" in items_by_modality:

Member

DarkLight1337 Jan 27, 2026

This looks quite complicated, I feel that we should have unit tests for this

Member

DarkLight1337 Jan 27, 2026

And preferably we should separate this out into another function

DarkLight1337 reviewed

View reviewed changes

vllm/model_executor/layers/fused_moe/layer.py Outdated

                           "CompressedTensorsWNA16MoEMethod",
                       ):
-                          loaded_weight = loaded_weight.t().contiguous()
+                          loaded_weight = loaded_weight.t()

Member

DarkLight1337 Jan 27, 2026

Is this intended?

Member

DarkLight1337 Jan 27, 2026

cc @tjtanaa please check this doesn't break on AMD

Member

Isotr0py Jan 27, 2026

This is used to speedup the int4 weight loading, we can revert this in case of corner case.

DarkLight1337 reviewed

View reviewed changes

vllm/entrypoints/chat_utils.py Outdated Show resolved Hide resolved

DarkLight1337 reviewed

View reviewed changes

vllm/entrypoints/chat_utils.py Outdated Show resolved Hide resolved

Isotr0py and others added 6 commits

January 27, 2026 11:17


          fix vit dtype

3c8c7ca

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fuse moe contiguous

d19264e

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          clean mm tracker adding

bfd8e63

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          fix image embeds test

01d3aa7

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>


          override doc

61a45f5

Signed-off-by: Roger Wang <hey@rogerw.io>


          Merge branch 'main' into feat-k2.5-support

b98ddea

youkaichao merged commit b539f98 into main

61 checks passed

youkaichao deleted the feat-k2.5-support branch

January 27, 2026 06:50

ywang96 restored the feat-k2.5-support branch

January 27, 2026 07:35

chaunceyjiang reviewed

View reviewed changes

vllm/reasoning/kimi_k2_reasoning_parser.py

		logger = init_logger(__name__)


		class KimiK2ReasoningParser(ReasoningParser):

Collaborator

chaunceyjiang Jan 28, 2026

@ywang96 @youkaichao Sorry, I was on leave for the past couple of days.

We already have a Holo2ReasoningParser that provides exactly the same functionality, so there’s no need to duplicate the code.

I’ve removed it: #33221 PTAL.

khluu pushed a commit that referenced this pull request


          [Models] Kimi-K2.5 (#33131)

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f98)

hjjq pushed a commit to djmmoss/vllm that referenced this pull request


          [Models] Kimi-K2.5 (vllm-project#33131)

d9ce752

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

apd10 pushed a commit to apd10/vllm that referenced this pull request


          [Models] Kimi-K2.5 (vllm-project#33131)

e96b587

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

ehfd mentioned this pull request

[Bug]: The content of response from Kimi-K2.5 is empty. #33654

Open

1 task

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request


          [Models] Kimi-K2.5 (vllm-project#33131)

69f4057

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

chaunceyjiang chaunceyjiang left review comments

Isotr0py Isotr0py approved these changes

DarkLight1337 DarkLight1337 left review comments

cursor[bot] cursor[bot] left review comments

youkaichao youkaichao approved these changes

aarnphm Awaiting requested review from aarnphm aarnphm is a code owner

NickLucche Awaiting requested review from NickLucche NickLucche is a code owner

tjtanaa Awaiting requested review from tjtanaa tjtanaa is a code owner

mgoin Awaiting requested review from mgoin

pavanimajety Awaiting requested review from pavanimajety

+1 more reviewer

gemini-code-assist[bot] gemini-code-assist[bot] left review comments

Labels

documentation frontend multi-modality new-model ready