Merged
Changes from all commits
79 commits
c4e7e8f
finish add kimi vl config to sglang
liwenju0 Apr 13, 2025
59961c3
use deepseek config of sglang
Apr 13, 2025
ed9ff11
fix version_v9
Apr 14, 2025
f2502ef
add embedding getting method
liwenju0 Apr 14, 2025
a308b0c
finish first version
liwenju0 Apr 14, 2025
e3fa4ca
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 15, 2025
998af0c
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 15, 2025
b35ece2
clean code
Apr 15, 2025
c590225
format code
Apr 15, 2025
5099cd3
fix hard coded token id
liwenju0 Apr 16, 2025
e730f6c
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 16, 2025
de35867
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 16, 2025
68f1047
add test_vision_openai_server of kimi-vl
liwenju0 Apr 17, 2025
a976538
Merge branch 'feature-add-support-kimivl-model' of github.com:liwenju…
liwenju0 Apr 17, 2025
57b8487
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 17, 2025
a2461ce
print detailed information when assert fails in test_vision_openai_ser…
liwenju0 Apr 17, 2025
bd8306d
format code
liwenju0 Apr 17, 2025
1f5f003
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 17, 2025
e4bd1b0
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 17, 2025
c39d157
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 18, 2025
18a1188
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 18, 2025
47a1772
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 21, 2025
7458bf8
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 21, 2025
e9a024b
improve readability and performance
liwenju0 Apr 21, 2025
128e044
Merge branch 'main' into feature-add-support-kimivl-model
BBuf Apr 21, 2025
05e459e
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 22, 2025
0e7ced3
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 22, 2025
2120c70
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 23, 2025
725cbe1
Fix: In DeepseekV2AttentionMLA, update the processing of k_nope to en…
Apr 24, 2025
79c712b
fix init_tts argument error
Apr 24, 2025
5764a3e
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 24, 2025
669d32a
fix lint error
Apr 24, 2025
38b5384
Rename the parameter prompt to input_text and update the related call…
Apr 24, 2025
1d9fb47
[Doc] add kimi-vl model document
Apr 24, 2025
782f225
add comments
Apr 24, 2025
9d502fb
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 25, 2025
45769f2
fix lint
Apr 25, 2025
1df0335
fix import vllm scaled_fp8_quant error
Apr 25, 2025
bd2d294
recover import vllm scaled_fp8_quant
Apr 25, 2025
1087387
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 25, 2025
0a5f9fb
add blobfile dependency
liwenju0 Apr 25, 2025
de3fd33
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 25, 2025
fb7841e
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 26, 2025
268c48a
Merge branch 'main' into feature-add-support-kimivl-model
yizhang2077 Apr 26, 2025
50248ed
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 26, 2025
6bddb59
remove \n and whitespace in regex to avoid xgrammar errors
Apr 28, 2025
686c288
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 28, 2025
b153234
Enhance the test cases to add checks for “vehicle” and “car” in the t…
Apr 28, 2025
b166c40
fix lint
Apr 28, 2025
76e51de
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 28, 2025
32da098
Feat: add support for thinking mode via chat_template_kwargs.enable_t…
minleminzui Apr 28, 2025
9ea328f
fix: fix the error where the content is None when reasoning and tool …
minleminzui Apr 28, 2025
fcc44a2
feat: Add fused moe triton config for qwen3 moe on h100 (#5833)
JustinTong0323 Apr 28, 2025
759eebe
fused moe triton tuning script support qwen3 (#5842)
BBuf Apr 28, 2025
d7e0740
feat: Add fused moe triton config for qwen3bf16 moe on h20 (#5839)
yhyang201 Apr 28, 2025
13071e4
[PD] support pd fake transfer for warmup (#5726)
whybeyoung Apr 28, 2025
c48f47b
[config] qwen3moe_tune_h20 fp8 tp4 (#5846)
whybeyoung Apr 28, 2025
b73a285
[Doc] Recover history of server_arguments.md (#5851)
Fridge003 Apr 28, 2025
7bc416d
feat: Add fused moe triton config for qwen3-30b-fp8 moe on h20 (#5850)
GeLee-Q Apr 28, 2025
791c409
[CI] test chunked prefill more (#5798)
merrymercy Apr 28, 2025
0b5bf81
ROCm: update AITER (#5816)
HaiShaw Apr 28, 2025
a154f9f
[Feat] QWen-1M context support[1/2]: Update block sparse attention ba…
FlamingoPg Apr 28, 2025
b1802ae
[Fix] Missing bootstrap_port field (#5823)
xutianyi1999 Apr 28, 2025
b8b2d9c
feat: update is_fa3_default_architecture (#5854)
zhyncs Apr 28, 2025
f385fb8
add fused moe config for qwen3moe fp8/bf16 (#5849)
yizhang2077 Apr 28, 2025
b0e2d5c
chore: bump v0.4.6.post1 (#5845)
zhyncs Apr 28, 2025
abe98a6
Support `max_completion_tokens` for OpenAIChatCompletions (#5857)
CatherineSue Apr 28, 2025
96bf47c
simplify fused_moe config logging (#5801)
BBuf Apr 29, 2025
16b07c3
[CI] tune the test order to warmup the server (#5860)
merrymercy Apr 29, 2025
bd7cada
Cutlass MLA decode - fix dtype error (#5868)
trevor-m Apr 29, 2025
d3445ce
cutlass 3.9 supported to improve fp8_blockwise_gemm (#5820)
BBuf Apr 29, 2025
7be88d1
[Feature] support auto chat template (#4949)
woodx9 Apr 29, 2025
ac9dd2c
Feat: support cuda graph for LoRA (#4115)
Qiaolin-Yu Apr 29, 2025
6783f3b
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 29, 2025
ca0a0ce
register chat template
Apr 29, 2025
8d8d7a4
Merge branch 'main' into feature-add-support-kimivl-model
zhaochenyang20 Apr 30, 2025
863d8b0
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 30, 2025
6f03d62
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 30, 2025
80f38a3
Merge branch 'main' into feature-add-support-kimivl-model
liwenju0 Apr 30, 2025
3 changes: 2 additions & 1 deletion docs/supported_models/vision_language_models.md
@@ -28,4 +28,5 @@ python3 -m sglang.launch_server \
| **LLaVA** (v1.5 & v1.6) | *e.g.* `liuhaotian/llava-v1.5-13b` | `vicuna_v1.1` | Open vision-chat models that add an image encoder to LLaMA/Vicuna (e.g. LLaMA2 13B) for following multimodal instruction prompts. |
| **LLaVA-NeXT** (8B, 72B) | `lmms-lab/llava-next-72b` | `chatml-llava` | Improved LLaVA models (with an 8B Llama3 version and a 72B version) offering enhanced visual instruction-following and accuracy on multimodal benchmarks. |
| **LLaVA-OneVision** | `lmms-lab/llava-onevision-qwen2-7b-ov` | `chatml-llava` | Enhanced LLaVA variant integrating Qwen as the backbone; supports multiple images (and even video frames) as inputs via an OpenAI Vision API-compatible format. |
| **Gemma 3 (Multimodal)** | `google/gemma-3-4b-it` | `gemma-it` | Gemma 3’s larger models (4B, 12B, 27B) accept images (each image encoded as 256 tokens) alongside text in a combined 128K-token context. |
| **Kimi-VL** (A3B) | `moonshotai/Kimi-VL-A3B-Instruct` | `kimi-vl` | Kimi-VL pairs a MoonViT vision encoder with a DeepSeek-V2-style MoE language model (the A3B variant activates about 3B parameters) to understand images and generate text. |
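For context, a minimal end-to-end sketch of serving and querying this model; the launch flags, port, and image URL are assumptions for illustration, not taken from this PR:

```python
# Assumed launch command (flags illustrative):
#   python3 -m sglang.launch_server --model-path moonshotai/Kimi-VL-A3B-Instruct \
#       --chat-template kimi-vl --trust-remote-code --port 30000
import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="moonshotai/Kimi-VL-A3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```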
1 change: 1 addition & 0 deletions python/pyproject.toml
@@ -42,6 +42,7 @@ runtime_common = [
"uvicorn",
"uvloop",
"xgrammar==0.1.17",
"blobfile==3.0.0"
]

srt = [
4 changes: 4 additions & 0 deletions python/sglang/srt/configs/__init__.py
@@ -3,11 +3,15 @@
from sglang.srt.configs.deepseekvl2 import DeepseekVL2Config
from sglang.srt.configs.exaone import ExaoneConfig
from sglang.srt.configs.janus_pro import MultiModalityConfig
from sglang.srt.configs.kimi_vl import KimiVLConfig
from sglang.srt.configs.kimi_vl_moonvit import MoonViTConfig

__all__ = [
"ExaoneConfig",
"ChatGLMConfig",
"DbrxConfig",
"DeepseekVL2Config",
"MultiModalityConfig",
"KimiVLConfig",
"MoonViTConfig",
]
38 changes: 38 additions & 0 deletions python/sglang/srt/configs/kimi_vl.py
@@ -0,0 +1,38 @@
# SPDX-License-Identifier: Apache-2.0
# Adapted from https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct/blob/main/configuration_kimi_vl.py
from typing import Optional, Union

from transformers.configuration_utils import PretrainedConfig

from sglang.srt.configs.deepseekvl2 import DeepseekV2Config
from sglang.srt.configs.kimi_vl_moonvit import MoonViTConfig


class KimiVLConfig(PretrainedConfig):
model_type = "kimi_vl"

def __init__(
self,
vision_config: Optional[Union[dict, MoonViTConfig]] = None,
text_config: Optional[Union[dict, DeepseekV2Config]] = None,
ignore_index: int = -100,
media_placeholder_token_id: int = 163605,
pad_token_id: int = 0,
**kwargs
):
if vision_config is None:
vision_config = MoonViTConfig()
elif isinstance(vision_config, dict):
vision_config = MoonViTConfig(**vision_config)
self.vision_config = vision_config

if text_config is None:
text_config = DeepseekV2Config()
elif isinstance(text_config, dict):
text_config = DeepseekV2Config(**text_config)
self.text_config = text_config

self.ignore_index = ignore_index
self.media_placeholder_token_id = media_placeholder_token_id

super().__init__(pad_token_id=pad_token_id, **kwargs)
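A minimal sketch of the dict-to-config coercion implemented above (field values are illustrative, not the released checkpoint's):

```python
cfg = KimiVLConfig(vision_config={"patch_size": 14}, text_config={})
assert isinstance(cfg.vision_config, MoonViTConfig)   # dict was coerced
assert isinstance(cfg.text_config, DeepseekV2Config)  # dict was coerced
print(cfg.media_placeholder_token_id)                 # 163605 (default)
```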
32 changes: 32 additions & 0 deletions python/sglang/srt/configs/kimi_vl_moonvit.py
@@ -0,0 +1,32 @@
# SPDX-License-Identifier: Apache-2.0
# Adapted from https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct/blob/main/configuration_kimi_vl.py
from transformers.configuration_utils import PretrainedConfig


class MoonViTConfig(PretrainedConfig):
model_type = "moonvit"

def __init__(
self,
patch_size: int = 14,
init_pos_emb_height: int = 64,
init_pos_emb_width: int = 64,
num_attention_heads: int = 16,
num_hidden_layers: int = 27,
hidden_size: int = 1152,
intermediate_size: int = 4304,
merge_kernel_size: tuple[int, int] = (2, 2),
**kwargs,
):
super().__init__(**kwargs)
self.patch_size = patch_size
# Positional embedding config
self.init_pos_emb_height = init_pos_emb_height
self.init_pos_emb_width = init_pos_emb_width
# Transformer config
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
# Patch merger config
self.merge_kernel_size = merge_kernel_size
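As a back-of-the-envelope illustration (my own arithmetic, not from the PR): with 14-pixel patches and a (2, 2) patch merger, each merged token covers roughly a 28×28-pixel area, so a 448×448 image yields about 256 tokens:

```python
import math

def approx_image_tokens(h: int, w: int, patch: int = 14,
                        merge: tuple[int, int] = (2, 2)) -> int:
    """Rough post-merge token count; ignores any model-specific resizing."""
    return math.ceil(math.ceil(h / patch) / merge[0]) * \
           math.ceil(math.ceil(w / patch) / merge[1])

print(approx_image_tokens(448, 448))  # -> 256
```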
8 changes: 8 additions & 0 deletions python/sglang/srt/configs/model_config.py
@@ -176,6 +176,13 @@ def __init__(
self.attention_arch = AttentionArch.MLA
self.kv_lora_rank = self.hf_text_config.kv_lora_rank
self.qk_rope_head_dim = self.hf_text_config.qk_rope_head_dim
elif "KimiVLForConditionalGeneration" in self.hf_config.architectures:
self.head_dim = 256
self.attention_arch = AttentionArch.MLA
self.kv_lora_rank = self.hf_text_config.kv_lora_rank
self.qk_rope_head_dim = self.hf_text_config.qk_rope_head_dim
self.v_head_dim = self.hf_text_config.v_head_dim
self.qk_nope_head_dim = self.hf_text_config.qk_nope_head_dim
else:
self.attention_arch = AttentionArch.MHA

@@ -530,6 +537,7 @@ def is_generation_model(model_architectures: List[str], is_embedding: bool = False):
"Qwen2VLForConditionalGeneration",
"Qwen2_5_VLForConditionalGeneration",
"CLIPModel",
"KimiVLForConditionalGeneration",
]


25 changes: 25 additions & 0 deletions python/sglang/srt/conversation.py
@@ -806,6 +806,24 @@ def generate_chat_conv(
)
)

# Reference: https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct/blob/main/chat_template.jinja
register_conv_template(
Conversation(
name="kimi-vl",
system_message="You are a helpful assistant",
system_template="<|im_system|>system<|im_middle|>{system_message}",
roles=(
"<|im_user|>user<|im_middle|>",
"<|im_assistant|>assistant<|im_middle|>",
),
messages=[],
sep="<|im_end|>",
sep_style=SeparatorStyle.NO_COLON_SINGLE,
stop_str="<|im_end|>",
image_token="<|media_start|>image<|media_content|><|media_pad|><|media_end|>",
)
)
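For reference, a single-turn prompt rendered under this template should look roughly like the following (a sketch assembled from the fields above; exact whitespace depends on how SeparatorStyle.NO_COLON_SINGLE joins the parts):

```python
prompt = (
    "<|im_system|>system<|im_middle|>You are a helpful assistant<|im_end|>"
    "<|im_user|>user<|im_middle|>"
    "<|media_start|>image<|media_content|><|media_pad|><|media_end|>"
    "What is in this image?<|im_end|>"
    "<|im_assistant|>assistant<|im_middle|>"
)
```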


@register_conv_template_matching_function
def match_deepseek_janus_pro(model_path: str):
@@ -888,3 +906,10 @@ def match_openbmb_minicpm(model_path: str):
return "minicpmv"
elif "minicpm-o" in model_path:
return "minicpmo"


@register_conv_template_matching_function
def match_moonshot_kimivl(model_path: str):
model_path = model_path.lower()
if "kimi" in model_path and "vl" in model_path:
return "kimi-vl"
2 changes: 2 additions & 0 deletions python/sglang/srt/hf_transformers_utils.py
@@ -35,6 +35,7 @@
DbrxConfig,
DeepseekVL2Config,
ExaoneConfig,
KimiVLConfig,
MultiModalityConfig,
)
from sglang.srt.connector import create_remote_connector
@@ -46,6 +47,7 @@
ExaoneConfig.model_type: ExaoneConfig,
DeepseekVL2Config.model_type: DeepseekVL2Config,
MultiModalityConfig.model_type: MultiModalityConfig,
KimiVLConfig.model_type: KimiVLConfig,
}

for name, cls in _CONFIG_REGISTRY.items():
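The loop body is collapsed in this diff; its presumed effect, consistent with how the other custom configs are wired in, is to register each class with transformers (a sketch, not the PR's exact code):

```python
from transformers import AutoConfig

for name, cls in _CONFIG_REGISTRY.items():
    # Lets AutoConfig.from_pretrained resolve model_type "kimi_vl", etc.
    AutoConfig.register(name, cls, exist_ok=True)
```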
73 changes: 73 additions & 0 deletions python/sglang/srt/managers/multimodal_processors/kimi_vl.py
@@ -0,0 +1,73 @@
import asyncio
import math
from typing import List, Union

import torch
from PIL import Image

from sglang.srt.managers.multimodal_processors.base_processor import (
BaseMultimodalProcessor as SGLangBaseProcessor,
)
from sglang.srt.managers.multimodal_processors.base_processor import (
MultimodalSpecialTokens,
)
from sglang.srt.managers.schedule_batch import Modality, MultimodalDataItem
from sglang.srt.models.kimi_vl import KimiVLForConditionalGeneration


# Compatible with KimiVLForConditionalGeneration
class KimiVLImageProcessor(SGLangBaseProcessor):
models = [KimiVLForConditionalGeneration]

def __init__(self, hf_config, server_args, _processor):
super().__init__(hf_config, server_args, _processor)
self.IMAGE_TOKEN = "<|media_pad|>"
self.im_token_id = _processor.tokenizer.convert_tokens_to_ids(self.IMAGE_TOKEN)

self.im_start = "<|media_start|>"
self.im_start_id = _processor.tokenizer.convert_tokens_to_ids(self.im_start)

self.im_end = "<|media_end|>"
self.im_end_id = _processor.tokenizer.convert_tokens_to_ids(self.im_end)

self.im_content = "<|media_content|>"
self.im_content_id = _processor.tokenizer.convert_tokens_to_ids(self.im_content)

async def process_mm_data_async(
self,
image_data: List[Union[str, bytes]],
input_text,
request_obj,
max_req_input_len,
*args,
**kwargs,
):
if not image_data:
return None
if isinstance(image_data, str):
image_data = [image_data]

base_output = self.load_mm_data(
prompt=input_text,
image_data=image_data,
multimodal_tokens=MultimodalSpecialTokens(image_token=self.IMAGE_TOKEN),
max_req_input_len=max_req_input_len,
)
ret = self.process_mm_data(
input_text=base_output.input_text,
images=base_output.images,
)
return {
"input_ids": ret["input_ids"].flatten().tolist(),
"mm_items": [
MultimodalDataItem(
pixel_values=ret["pixel_values"],
image_grid_thws=ret["image_grid_hws"],
modality=Modality.IMAGE,
)
],
"im_token_id": self.im_token_id,
"im_start_id": self.im_start_id,
"im_end_id": self.im_end_id,
"im_content_id": self.im_content_id,
}
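Downstream consumers locate image positions via the returned token ids; a minimal usage sketch (the URL and surrounding plumbing are assumptions):

```python
# Inside an async context, with `processor` already constructed:
out = await processor.process_mm_data_async(
    image_data=["https://example.com/cat.png"],   # placeholder URL
    input_text="<|media_pad|>What is in this image?",
    request_obj=None,
    max_req_input_len=4096,
)
pad_positions = [
    i for i, t in enumerate(out["input_ids"]) if t == out["im_token_id"]
]
```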
5 changes: 4 additions & 1 deletion python/sglang/srt/models/deepseek_v2.py
@@ -752,7 +752,7 @@ def forward_absorb(
q_nope_out = q_nope_out.transpose(0, 1)

k_nope = latent_cache[..., : self.kv_lora_rank]
k_nope = self.kv_a_layernorm(k_nope).unsqueeze(1)
k_nope = self.kv_a_layernorm(k_nope.contiguous()).unsqueeze(1)
k_pe = latent_cache[..., self.kv_lora_rank :].unsqueeze(1)

q_pe, k_pe = self.rotary_emb(positions, q_pe, k_pe)
@@ -1391,6 +1391,9 @@ def __init__(

self.dp_size = get_attention_dp_size()

def get_input_embeddings(self) -> torch.Tensor:
return self.embed_tokens

def forward(
self,
input_ids: torch.Tensor,
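The new get_input_embeddings hook lets the Kimi-VL wrapper embed text itself and then splice vision features into the <|media_pad|> slots; a sketch of the presumed consumer (names assumed, not the PR's exact code):

```python
# Illustrative only: merge text and image embeddings for the VL wrapper.
inputs_embeds = model.get_input_embeddings()(input_ids)  # token embeddings
image_mask = input_ids == im_token_id                    # <|media_pad|> slots
inputs_embeds[image_mask] = image_features.to(inputs_embeds.dtype)
```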