[Model] Add MiniCPM-V 4.6 support#24855

Merged
AgainstEntropy merged 2 commits into sgl-project:main from AgainstEntropy:feat/minicpm-v
May 10, 2026
Conversation

@AgainstEntropy
Collaborator

@AgainstEntropy AgainstEntropy commented May 9, 2026

Motivation

Add support for MiniCPM-V 4.6, the next iteration of the MiniCPM-V series. Compared to 4.5 (Qwen3 + SigLip + Perceiver-style resampler) the architecture changes are:

  • Mid-ViT compression: a 2x2 window-attention + 2x2 spatial fold inserted inside the SigLip encoder at config.insert_layer_id.
  • Post-encoder MLP merger (MiniCPMV_Merger) replacing the legacy Perceiver resampler.
  • Qwen3.5 hybrid backbone in the LLM tower.
  • config.downsample_mode toggles "16x" (mid-ViT + post merger, default) vs "4x" (skip mid-ViT, keep 4x more visual tokens).
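The 2x2 spatial fold in the mid-ViT compression step can be pictured as merging each 2x2 neighborhood of visual tokens into the channel dimension, cutting the token count 4x. A minimal NumPy sketch (my own illustration of the idea, not sglang's actual implementation; shapes and the function name are assumptions):

```python
import numpy as np

def spatial_fold_2x2(x: np.ndarray, h: int, w: int) -> np.ndarray:
    # x: (batch, h*w, c) sequence of visual tokens laid out on an h x w grid.
    # Each 2x2 neighborhood is folded into the channel dim, so the token
    # count drops 4x while the channel dim grows 4x.
    b, n, c = x.shape
    assert n == h * w and h % 2 == 0 and w % 2 == 0
    x = x.reshape(b, h // 2, 2, w // 2, 2, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # bring each 2x2 window together
    return x.reshape(b, (h // 2) * (w // 2), 4 * c)

tokens = np.arange(2 * 16 * 8).reshape(2, 16, 8)  # 4x4 grid, 8 channels
folded = spatial_fold_2x2(tokens, h=4, w=4)
print(folded.shape)  # (2, 4, 32)
```

Under `downsample_mode="4x"` this step is skipped, which is why that mode keeps 4x more visual tokens.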

Usage

Cookbook PR: #24876

sglang serve --model-path openbmb/MiniCPM-V-4_6 \
  --trust-remote-code \
  --dtype bfloat16 \
  --mem-fraction-static 0.15 \
  --mamba-scheduler-strategy extra_buffer \
  --reasoning-parser qwen3 \
  --host 0.0.0.0 --port 30000

Note: --dtype bfloat16 is required here because the ckpt's config.json may not declare a torch_dtype / dtype; without --dtype bfloat16, sglang's default dtype-resolution path triggers a bf16/fp16 conditional-load mismatch in the GDN causal_conv1d Triton kernel. Reproduced on the current sgl-kernel 0.4.2.post1.

Example image+text request (OpenAI-compatible):

curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM-V-4_6",
    "messages": [{"role": "user", "content": [
      {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
      {"type": "text", "text": "What is in this image?"}
    ]}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'

chat_template_kwargs.enable_thinking toggles 4.6's reasoning mode: true (default) emits a <think>...</think> block routed to reasoning_content; false skips reasoning and routes the answer straight into content. Multi-image and video are passed as additional image_url / video_url parts in the same content list.

See more in the MiniCPM-V 4.6 cookbook.
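For multi-image requests, the extra parts just extend the same content list. A small sketch that assembles the OpenAI-compatible payload shown above (the helper name is mine; only the payload shape comes from this PR):

```python
import json

def build_payload(prompt: str, image_urls: list[str],
                  enable_thinking: bool = False) -> dict:
    # Assemble the OpenAI-compatible request body used in the curl
    # example; additional image_url (or video_url) parts are appended
    # to the same content list ahead of the text part.
    parts = [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    parts.append({"type": "text", "text": prompt})
    return {
        "model": "openbmb/MiniCPM-V-4_6",
        "messages": [{"role": "user", "content": parts}],
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

payload = build_payload(
    "Compare these two images.",
    ["https://example.com/cat.jpg", "https://example.com/dog.jpg"],
)
print(json.dumps(payload, indent=2))
```

POST the resulting dict to /v1/chat/completions as in the curl example; with enable_thinking=True, the reasoning block comes back in reasoning_content rather than content.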

Accuracy Tests

python benchmark/mmmu/bench_sglang.py --port 30000 --concurrency 32 --max-new-tokens 2048

MMMU val (900): T=0.6 sampling, #24084 extractor, ~37-39% across trials (consistent with the reference number from the MiniCPM team on the same ckpt).

Speed Tests and Profiling

sglang.bench_serving on the open test ckpt, 1× H200, BF16, --mem-fraction-static 0.5, no TP/DP.

  • Text-only (random 1000 input / 1000 output tokens):

| Concurrency | Req/s | Input tok/s | Output tok/s | Median TTFT | Median TPOT | Median E2E |
|---|---|---|---|---|---|---|
| 1 | 1.34 | 816 | 565 | 104 ms | 1.44 ms | 590 ms |
| 100 | 21.24 | 10,675 | 10,628 | 185 ms | 9.16 ms | 4.3 s |

  • Vision (random 720p image + 128 input / 1024 output tokens, --chunked-prefill-size -1):

| Concurrency | Req/s | Input tok/s | Output tok/s | Median TTFT | Median TPOT | Median E2E |
|---|---|---|---|---|---|---|
| 1 | 0.97 | 75 | 411 | 403 ms | 1.44 ms | 898 ms |
| 100 | 2.78 | 222 | 1,419 | 34.3 s | 1.45 ms | 35.3 s |

Full commands and numbers will be in the MiniCPM-V 4.6 cookbook.

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@AgainstEntropy
Collaborator Author

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label May 9, 2026
Collaborator

@JustinTong0323 JustinTong0323 left a comment


Great Job!!

@AgainstEntropy
Collaborator Author

AgainstEntropy commented May 9, 2026

/rerun-failed-ci, try again

@AgainstEntropy AgainstEntropy merged commit 9150e77 into sgl-project:main May 10, 2026
119 of 130 checks passed
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py