
[Models] Kimi-K2.5#33131

Merged
youkaichao merged 41 commits into main from feat-k2.5-support on Jan 27, 2026

Conversation


@ywang96 (Member) commented Jan 27, 2026

Purpose

Kimi-K2.5 model support - see recipe at https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

wangln19 and others added 30 commits January 4, 2026 07:25
Add support for K2.5 model (K2VL) in vllm.

Key changes:
- Add K2VL model implementation and configuration.
- Add Kimi K2 reasoning parser.
- Update Chat API and multimodal inputs to support video chunks.
- Add K2.5 specific multimodal processing logic and parsing.
- Register K2VL model and configuration.

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
…orts

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- k2vl_vit.py: Use get_act_fn('gelu_pytorch_tanh') instead of PytorchGELUTanh
- k2vl_vit.py: Remove VisionTowerConfig and ProjectorConfig wrapper classes
- k2vl.py: Use TensorSchema (K2VLMediaPixelInputs) instead of BatchFeature
- chat_utils.py: Add assert isinstance(item, dict) for type narrowing

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- Replace custom multihead_attention/eager_attention with MMEncoderAttention
- Add tensor parallel support using QKVParallelLinear and RowParallelLinear
- MLP2 now uses ColumnParallelLinear/RowParallelLinear with TP/DP support
- Pass multimodal_config through MoonViT3dPretrainedModel to encoder layers
- Rename vision_tower_forward_auto to vision_tower_forward

This aligns with PR #31738 pattern for unified multi-platform attention backends.

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Remove temporal_merge_kernel_size, sample_fps, and timestamp_mode
from K2VLConfig as they are not used by vLLM.

These parameters are loaded from preprocessor_config.json via
media_processor and should not be duplicated in the model config.

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- Rename files: k2vl.py -> kimi_k25.py, k2vl_vit.py -> kimi_k25_vit.py
- Rename classes: K2VLConfig -> KimiK25Config, K2VLForConditionalGeneration -> KimiK25ForConditionalGeneration, etc.
- Update registry, configs/__init__.py, config.py mappings
- Update docs and comments to reference Kimi-K2.5 instead of K2VL
- Pass vision_config to KimiK25MultiModalProjector per reference diff

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
…ndant bytes→PIL conversion

When processing vision_chunk items, pass the already-decoded PIL.Image
via .media instead of .original_bytes. This avoids an unnecessary
bytes→PIL.Image conversion in media_processor since the image was
already decoded in load_bytes().
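A minimal sketch of this optimization, with plain Python stand-ins (the `decode`/`process` functions and a string in place of `PIL.Image` are illustrative, not vLLM's API): the consumer prefers the already-decoded object and only falls back to decoding raw bytes.

```python
# Hypothetical sketch: pass the already-decoded object via `media` so
# the consumer skips a second bytes -> image decode.
decode_calls = 0

def decode(raw: bytes) -> str:
    """Stand-in for the bytes -> PIL.Image conversion."""
    global decode_calls
    decode_calls += 1
    return raw.decode("utf-8")

def process(media=None, original_bytes=None):
    # Prefer the decoded media when the caller already has it.
    return media if media is not None else decode(original_bytes)

img = decode(b"pixels")            # decoded once, as in load_bytes()
process(media=img)                 # no re-decode on this path
assert decode_calls == 1
process(original_bytes=b"pixels")  # fallback path decodes again
assert decode_calls == 2
```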

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- Move PIL.Image import out of TYPE_CHECKING in inputs.py
- Remove duplicate get_dummy_image, use parent _get_dummy_images instead
- Add get_expert_mapping method following DeepseekV2 pattern
- Use SharedFusedMoE for proper shared experts handling

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Replace manual sequential batching loop with vLLM's standard DP utility.
This enables true data parallelism across GPUs for vision features.

- Add run_dp_sharded_mrope_vision_model import
- Remove KIMIV_VT_INFER_MAX_PATCH_NUM constant
- Simplify vision_tower_forward to use DP sharding
- Add self.config to MoonViT3dPretrainedModel for DP compatibility

Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 27, 2026

@youkaichao (Member) left a comment


🔥


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


width=MaxImageTokenMeta.width,
num_images=1,
)[0],
)


Incomplete TypedDict instances missing required fields

Medium Severity

The VisionChunkVideo and VisionChunkImage instances created in get_dummy_mm_items are missing required TypedDict fields. VisionChunkVideo on line 191-193 is missing uuid, prompt, and video_idx, while VisionChunkImage on line 198-205 is missing uuid. Since these items are passed to media_tokens_calculator (external code loaded via trust_remote_code), if that code accesses these missing keys directly instead of using .get(), it will raise a KeyError at runtime during model profiling.
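The failure mode is easy to reproduce in isolation. Below is a hedged sketch with a stand-in TypedDict (the field names mirror the report but are not vLLM's exact schema): the missing key is invisible at construction time and only surfaces when downstream code indexes it directly.

```python
from typing import TypedDict

# Hypothetical stand-in for the VisionChunkImage TypedDict discussed
# above; field names are illustrative, not vLLM's exact schema.
class VisionChunkImage(TypedDict):
    uuid: str
    image: bytes

# TypedDicts are plain dicts at runtime, so a missing required key is
# only flagged by the static type checker, never at construction time:
item: VisionChunkImage = {"image": b"\x00"}  # type: ignore[typeddict-item]

# Downstream code that indexes the key directly raises KeyError...
try:
    item["uuid"]
except KeyError as e:
    missing = e.args[0]  # "uuid"

# ...while .get() degrades gracefully:
fallback = item.get("uuid", "<missing>")
```

This is why the report singles out external `media_tokens_calculator` code: if it uses `item["uuid"]` rather than `item.get("uuid")`, profiling crashes at runtime.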


else:
video_data = data
video_chunks = mm_processor.split_video_chunks(video_data)
for i, vc in enumerate(video_chunks):


Loop variable shadowing causes confusing code

Low Severity

The inner loop at line 707 uses i as its loop variable (for i, vc in enumerate(video_chunks)), which shadows the outer loop's i variable from line 679 (for i, (idx, item) in enumerate(filtered_items)). While Python reassigns loop variables on each iteration so this doesn't cause incorrect behavior currently, it creates confusing code and could lead to subtle bugs if the code is modified in the future.
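A minimal sketch of the shadowing pattern (the data and names `filtered_items`/`video_chunks` are placeholders mirroring the snippet): after the inner loop finishes, `i` silently holds the inner index, not the outer one.

```python
# Placeholder data standing in for the real structures.
filtered_items = [("a", "item-a"), ("b", "item-b")]
video_chunks = ["chunk-0", "chunk-1", "chunk-2"]

results = []
for i, (idx, item) in enumerate(filtered_items):  # outer loop index
    for i, vc in enumerate(video_chunks):         # inner loop rebinds i
        pass
    # After the inner loop, i is the *inner* index (2), not the outer
    # one; any later use of i in the outer body reads the wrong value.
    results.append(i)

print(results)  # [2, 2] -- not [0, 1]
```

Renaming the inner variable (e.g. `for chunk_idx, vc in enumerate(video_chunks)`) removes the hazard.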




@Isotr0py (Member) left a comment


🚀



class KimiK25VisionConfig(PretrainedConfig):
"""Vision configuration for Kimi-K2.5 (vision tower + mm projector).


Is this just so that users don't have to install dev version of transformers? Otherwise trust_remote_code=True should be able to load the config (assuming it's on HF Hub)

Comment on lines +245 to +246
NOTE: This is temporary for Kimi-K2.5 testing. Remember to change back
to opencv before release if needed.

Resolve this NOTE?

if "video" in items_by_modality:
mm_data["video"] = [data for data, uuid in items_by_modality["video"]]
mm_uuids["video"] = [uuid for data, uuid in items_by_modality["video"]]
if "vision_chunk" in items_by_modality:

This looks quite complicated; I feel we should have unit tests for it.
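For what it's worth, a hedged unit-test sketch of the grouping logic quoted above. The `items_by_modality` shape ((data, uuid) pairs per modality) mirrors the snippet, but the values are synthetic, not vLLM's real structures.

```python
# Synthetic input mirroring the snippet's (data, uuid) pairs.
items_by_modality = {
    "video": [("frames-0", "uuid-a"), ("frames-1", None)],
    "vision_chunk": [("chunk-0", "uuid-b")],
}

mm_data: dict[str, list] = {}
mm_uuids: dict[str, list] = {}
for modality, items in items_by_modality.items():
    # Split each (data, uuid) pair into parallel lists, as in the PR.
    mm_data[modality] = [data for data, uuid in items]
    mm_uuids[modality] = [uuid for data, uuid in items]

assert mm_data["video"] == ["frames-0", "frames-1"]
assert mm_uuids["video"] == ["uuid-a", None]  # missing uuids survive as None
assert mm_uuids["vision_chunk"] == ["uuid-b"]
```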


And preferably we should separate this out into another function

"CompressedTensorsWNA16MoEMethod",
):
loaded_weight = loaded_weight.t().contiguous()
loaded_weight = loaded_weight.t()

Is this intended?


cc @tjtanaa please check this doesn't break on AMD


This is used to speed up int4 weight loading; we can revert it if a corner case turns up.
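The speedup comes from skipping the copy: a transpose alone is a zero-copy view, while forcing a contiguous layout materializes the whole buffer. A sketch of the distinction, using NumPy in place of torch (the semantics of `.T` vs. an explicit contiguous copy are analogous):

```python
import numpy as np

w = np.zeros((3, 4), dtype=np.int8)

view = w.T                        # zero-copy view of w's buffer
copy = np.ascontiguousarray(w.T)  # allocates and fills a new buffer

assert np.shares_memory(view, w)      # the view aliases the original weights
assert not np.shares_memory(copy, w)  # the contiguous copy does not
```

For large int4 checkpoints, dropping the copy on every expert weight can add up; the trade-off is that downstream kernels must tolerate non-contiguous strides.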

Isotr0py and others added 6 commits January 27, 2026 11:17
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
@youkaichao youkaichao merged commit b539f98 into main Jan 27, 2026
61 checks passed
@youkaichao youkaichao deleted the feat-k2.5-support branch January 27, 2026 06:50
@ywang96 ywang96 restored the feat-k2.5-support branch January 27, 2026 07:35
logger = init_logger(__name__)


class KimiK2ReasoningParser(ReasoningParser):
A collaborator left a comment:

@ywang96 @youkaichao Sorry, I was on leave for the past couple of days.

We already have a Holo2ReasoningParser that provides exactly the same functionality, so there’s no need to duplicate the code.

I’ve removed it: #33221 PTAL.

khluu pushed a commit that referenced this pull request Jan 28, 2026
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f98)
hjjq pushed a commit to djmmoss/vllm that referenced this pull request Jan 30, 2026
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

  • documentation: Improvements or additions to documentation
  • frontend
  • multi-modality: Related to multi-modality (#4194)
  • new-model: Requests to new models
  • ready: ONLY add when PR is ready to merge/full CI is needed


7 participants