Add support for the K2.5 model (K2VL) in vLLM. Key changes:
- Add K2VL model implementation and configuration.
- Add Kimi K2 reasoning parser.
- Update Chat API and multimodal inputs to support video chunks.
- Add K2.5-specific multimodal processing logic and parsing.
- Register K2VL model and configuration.
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
…orts Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- k2vl_vit.py: Use get_act_fn('gelu_pytorch_tanh') instead of PytorchGELUTanh
- k2vl_vit.py: Remove VisionTowerConfig and ProjectorConfig wrapper classes
- k2vl.py: Use TensorSchema (K2VLMediaPixelInputs) instead of BatchFeature
- chat_utils.py: Add assert isinstance(item, dict) for type narrowing
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- Replace custom multihead_attention/eager_attention with MMEncoderAttention
- Add tensor-parallel support using QKVParallelLinear and RowParallelLinear
- MLP2 now uses ColumnParallelLinear/RowParallelLinear with TP/DP support
- Pass multimodal_config through MoonViT3dPretrainedModel to encoder layers
- Rename vision_tower_forward_auto to vision_tower_forward
This aligns with the pattern from PR #31738 for unified multi-platform attention backends.
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Remove temporal_merge_kernel_size, sample_fps, and timestamp_mode from K2VLConfig as they are not used by vLLM. These parameters are loaded from preprocessor_config.json via media_processor and should not be duplicated in the model config. Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- Rename files: k2vl.py -> kimi_k25.py, k2vl_vit.py -> kimi_k25_vit.py
- Rename classes: K2VLConfig -> KimiK25Config, K2VLForConditionalGeneration -> KimiK25ForConditionalGeneration, etc.
- Update registry, configs/__init__.py, config.py mappings
- Update docs and comments to reference Kimi-K2.5 instead of K2VL
- Pass vision_config to KimiK25MultiModalProjector per reference diff
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
…ndant bytes→PIL conversion
When processing vision_chunk items, pass the already-decoded PIL.Image via .media instead of .original_bytes. This avoids an unnecessary bytes→PIL.Image conversion in media_processor, since the image was already decoded in load_bytes().
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
- Move PIL.Image import out of TYPE_CHECKING in inputs.py
- Remove duplicate get_dummy_image, use parent _get_dummy_images instead
- Add get_expert_mapping method following DeepseekV2 pattern
- Use SharedFusedMoE for proper shared experts handling
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Replace manual sequential batching loop with vLLM's standard DP utility. This enables true data parallelism across GPUs for vision features.
- Add run_dp_sharded_mrope_vision_model import
- Remove KIMIV_VT_INFER_MAX_PATCH_NUM constant
- Simplify vision_tower_forward to use DP sharding
- Add self.config to MoonViT3dPretrainedModel for DP compatibility
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
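When a vision batch is sharded across data-parallel ranks, images with very different patch counts can leave some GPUs idle while others work. The sketch below shows one generic way to balance that load: a greedy longest-first assignment of image indices to the least-loaded rank. It is an illustration under that assumption only; `assign_to_ranks` is a hypothetical helper and is not the implementation of `run_dp_sharded_mrope_vision_model`, whose balancing strategy is not described in this PR text.

```python
import heapq
from typing import List


def assign_to_ranks(patch_counts: List[int], world_size: int) -> List[List[int]]:
    """Hypothetical helper: greedily assign image indices to DP ranks.

    Images are visited largest-first; each goes to the rank with the smallest
    total patch count so far (ties broken by the lower rank id).
    """
    # Min-heap of (current_load, rank) so the least-loaded rank pops first.
    heap = [(0, rank) for rank in range(world_size)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(world_size)]

    # Sort indices by patch count, largest first (stable for equal counts).
    order = sorted(range(len(patch_counts)), key=lambda i: patch_counts[i], reverse=True)
    for idx in order:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (load + patch_counts[idx], rank))
    return assignment


shards = assign_to_ranks([100, 40, 60, 30], world_size=2)
```

Because each rank receives indices out of their original order, the per-rank outputs must be gathered and scattered back to the original image order before they are merged into the language model's input.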
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
width=MaxImageTokenMeta.width,
num_images=1,
)[0],
)
Incomplete TypedDict instances missing required fields
Medium Severity
The VisionChunkVideo and VisionChunkImage instances created in get_dummy_mm_items are missing required TypedDict fields. VisionChunkVideo on lines 191-193 is missing uuid, prompt, and video_idx, while VisionChunkImage on lines 198-205 is missing uuid. These items are passed to media_tokens_calculator (external code loaded via trust_remote_code), so if that code indexes these keys directly instead of using .get(), it will raise a KeyError at runtime during model profiling.
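The failure mode above comes from the fact that `TypedDict` is only checked statically: an incomplete dict is accepted when the program runs, and the error surfaces later at the first direct key access. The sketch below assumes a simplified stand-in for `VisionChunkVideo` that contains only the three fields named in the comment (their types are assumed), and two hypothetical calculator functions standing in for the remote `media_tokens_calculator`.

```python
from typing import Optional, TypedDict


# Simplified, hypothetical stand-in for VisionChunkVideo; only the field
# names come from the review comment, the types are assumptions.
class VisionChunkVideo(TypedDict):
    uuid: Optional[str]
    prompt: str
    video_idx: int


def read_prompt_strict(item: VisionChunkVideo) -> str:
    # Direct indexing: raises KeyError if "prompt" is absent.
    return item["prompt"]


def read_prompt_lenient(item: VisionChunkVideo) -> str:
    # .get() tolerates a missing key and returns a default instead.
    return item.get("prompt", "")


# Accepted at runtime even though a type checker would reject it.
incomplete: VisionChunkVideo = {"video_idx": 0}  # type: ignore[typeddict-item]

# Filling every required field avoids relying on the remote code's style.
complete: VisionChunkVideo = {"uuid": None, "prompt": "", "video_idx": 0}

# __required_keys__ lets a test assert that dummy items are complete.
missing = sorted(set(VisionChunkVideo.__required_keys__) - set(incomplete))
```

Checking `__required_keys__` in a unit test is one way to catch incomplete dummy items before they reach code the repository does not control.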
else:
    video_data = data
video_chunks = mm_processor.split_video_chunks(video_data)
for i, vc in enumerate(video_chunks):
Loop variable shadowing causes confusing code
Low Severity
The inner loop at line 707 uses i as its loop variable (for i, vc in enumerate(video_chunks)), which shadows the outer loop's i variable from line 679 (for i, (idx, item) in enumerate(filtered_items)). While Python reassigns loop variables on each iteration so this doesn't cause incorrect behavior currently, it creates confusing code and could lead to subtle bugs if the code is modified in the future.
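A minimal sketch of the rename the comment suggests, giving the outer and inner loop indices distinct names. `split_video_chunks` here is a hypothetical stand-in that cuts a frame list into fixed-size chunks, and the `filtered_items` shape and `item["frames"]` key are assumptions for illustration; the real `mm_processor.split_video_chunks` may chunk differently.

```python
from typing import Any, Dict, List, Tuple


def split_video_chunks(frames: List[Any], chunk_size: int = 4) -> List[List[Any]]:
    # Hypothetical stand-in for mm_processor.split_video_chunks:
    # cut the frame list into consecutive fixed-size chunks.
    return [frames[start:start + chunk_size] for start in range(0, len(frames), chunk_size)]


def enumerate_chunks(
    filtered_items: List[Tuple[int, Dict[str, Any]]],
) -> List[Tuple[int, int, int, int]]:
    rows = []
    # Outer index renamed from `i` to `item_pos`.
    for item_pos, (idx, item) in enumerate(filtered_items):
        video_chunks = split_video_chunks(item["frames"])
        # The inner index has its own name, so code after this loop still
        # sees the outer item's position in `item_pos`.
        for chunk_idx, vc in enumerate(video_chunks):
            rows.append((item_pos, idx, chunk_idx, len(vc)))
    return rows


rows = enumerate_chunks([(7, {"frames": list(range(10))}), (9, {"frames": list(range(3))})])
```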
class KimiK25VisionConfig(PretrainedConfig):
    """Vision configuration for Kimi-K2.5 (vision tower + mm projector).
Is this just so that users don't have to install the dev version of transformers? Otherwise, trust_remote_code=True should be able to load the config (assuming it's on the HF Hub).
NOTE: This is temporary for Kimi-K2.5 testing. Remember to change back
to opencv before release if needed.
if "video" in items_by_modality:
    mm_data["video"] = [data for data, uuid in items_by_modality["video"]]
    mm_uuids["video"] = [uuid for data, uuid in items_by_modality["video"]]
if "vision_chunk" in items_by_modality:
This looks quite complicated; I think we should have unit tests for it.
Preferably, we should also separate this out into its own function.
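One way to act on both suggestions is to pull the per-modality splitting of `(data, uuid)` pairs into a small pure function that can be unit-tested on plain dicts. The sketch below is hypothetical (`group_by_modality` is not an existing vLLM function) and covers only the simple modalities shown in the excerpt; the `vision_chunk` branch is truncated in the diff and would need its own handling.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Each media item is a (data, uuid) pair, as in the excerpt above.
MediaItem = Tuple[Any, Optional[str]]


def group_by_modality(
    items_by_modality: Dict[str, List[MediaItem]],
    modalities: Iterable[str] = ("image", "video"),
) -> Tuple[Dict[str, List[Any]], Dict[str, List[Optional[str]]]]:
    """Hypothetical helper: split (data, uuid) pairs into parallel lists."""
    mm_data: Dict[str, List[Any]] = {}
    mm_uuids: Dict[str, List[Optional[str]]] = {}
    for modality in modalities:
        if modality not in items_by_modality:
            continue
        pairs = items_by_modality[modality]
        mm_data[modality] = [data for data, _ in pairs]
        mm_uuids[modality] = [uuid for _, uuid in pairs]
    return mm_data, mm_uuids


mm_data, mm_uuids = group_by_modality(
    {"video": [("v0", "u0"), ("v1", None)], "audio": [("a0", "x")]}
)
```

Because the function has no side effects, a test only needs to build a dict and compare the two returned mappings.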
    "CompressedTensorsWNA16MoEMethod",
):
-    loaded_weight = loaded_weight.t().contiguous()
+    loaded_weight = loaded_weight.t()
This is used to speed up int4 weight loading; we can revert it if it breaks a corner case.
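The difference between the two lines in the diff is that `Tensor.t()` returns a view (new strides over the same storage), while calling `.contiguous()` on that non-contiguous view allocates new memory and copies every element. The sketch below uses a small, hypothetical `int32` tensor standing in for a packed int4 weight; the likely corner case is any downstream consumer that assumes a contiguous layout, which this sketch only assumes and does not verify for vLLM's MoE kernels.

```python
import torch

# Hypothetical packed int4 weight stored as int32 words (shape is illustrative).
packed = torch.arange(12, dtype=torch.int32).reshape(3, 4)

# .t() only swaps shape and strides; no data is copied.
view = packed.t()

# .contiguous() on the transposed view allocates new storage and copies.
copy = packed.t().contiguous()

# The view still points at the original storage.
shares_storage = view.data_ptr() == packed.data_ptr()
```

Both tensors hold the same values, so the change only matters for speed and memory layout, not for correctness of the loaded numbers.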
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
logger = init_logger(__name__)


class KimiK2ReasoningParser(ReasoningParser):
@ywang96 @youkaichao Sorry, I was on leave for the past couple of days.
We already have a Holo2ReasoningParser that provides exactly the same functionality, so there’s no need to duplicate the code.
I’ve removed it in #33221. PTAL.
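For readers unfamiliar with what such a parser does, the sketch below splits a complete (non-streaming) model output into reasoning and final content. It assumes the reasoning is delimited by `<think>` and `</think>` tags and that the opening tag may be absent because a chat template can emit it in the prompt. `split_reasoning` is a hypothetical function written for illustration; it is neither the `Holo2ReasoningParser` code nor vLLM's `ReasoningParser` interface, and it ignores streaming entirely.

```python
from typing import Optional, Tuple


def split_reasoning(
    text: str, start_tag: str = "<think>", end_tag: str = "</think>"
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (reasoning, content) for a full output."""
    head, sep, tail = text.partition(end_tag)
    if not sep:
        # No end tag: in this sketch the whole output is treated as content.
        return None, text
    # The start tag is optional, since a template may place it in the prompt.
    if head.startswith(start_tag):
        head = head[len(start_tag):]
    reasoning = head.strip() or None
    content = tail.strip() or None
    return reasoning, content


reasoning, content = split_reasoning("<think>plan</think>answer")
```

How a missing end tag is handled is a policy choice; some parsers treat an unterminated block as reasoning rather than content.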
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> (cherry picked from commit b539f98)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Purpose
Kimi-K2.5 model support. See the recipe at https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.