[Model] Add MiMo-V2-Flash support (#30836)
Conversation
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Code Review
This pull request introduces support for the MiMo-V2-Flash model. The changes primarily involve adding the model definition in vllm/model_executor/models/mimo_v2_flash.py and making necessary adjustments in vllm/model_executor/layers/linear.py to accommodate features like variable value head sizes and FP8 block shape mismatches. My review has identified a critical issue in the Mixture-of-Experts (MoE) layer detection logic that could lead to incorrect model configurations, and a high-severity issue regarding inconsistent access to the model configuration. Addressing these points will improve the robustness and correctness of the new model implementation.
jeejeelee
left a comment
Some initial comments; thank you for the contribution.
Hi @Abatom, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then commit the changes and push to your branch.
To add a new LLM model, you also need to:
Documentation preview: https://vllm--30836.org.readthedocs.build/en/30836/

Done!

FYI: #28775 also uses different q and kv head dims; we can probably support this more naturally in a future follow-up PR.
```python
if getattr(layer, "allow_fp8_block_shape_mismatch", False):
    logger.debug(
        "Skipping FP8 block shape validation for layer %s due to detected"
        " mismatch allowance.",
        getattr(layer, "prefix", "<unknown>"),
    )
    return
```
I'm a bit worried that this will cause unexpected behavior in the FP8 kernel if we disable the block shape check.
Perhaps we should improve the block shape check for MiMo-V2's edge case instead of just skipping it.
Perhaps @mgoin can give more insights?
@Isotr0py We tried removing the code above and got the following error:

```
ValueError: Weight output_partition_size = 192 is not divisible by weight quantization block_n = 128.
```
Hmm, we handle weights that aren't divisible by 128 fine for other block-FP8 models, such as kv_a_proj in DeepSeek; I wonder if it is specific to a fused layer.
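For context, the validation being skipped boils down to a divisibility check on the output partition. A minimal sketch of that check (illustrative names and signature, not vLLM's actual implementation) shows how MiMo-V2-Flash's 192-wide value partition trips against the 128-wide FP8 quantization block:

```python
# Hedged sketch, not vLLM's actual code: names and signature are illustrative.
def check_block_shape(
    output_partition_size: int, block_n: int, allow_mismatch: bool = False
) -> None:
    """Raise if the weight's output partition is not a multiple of the
    FP8 quantization block width, unless the layer opts out."""
    if allow_mismatch:
        # Mirrors the allow_fp8_block_shape_mismatch escape hatch in the PR.
        return
    if output_partition_size % block_n != 0:
        raise ValueError(
            f"Weight output_partition_size = {output_partition_size} is not "
            f"divisible by weight quantization block_n = {block_n}."
        )


# MiMo-V2-Flash's value partition (192) vs. the 128-wide FP8 block:
try:
    check_block_shape(192, 128)
except ValueError as e:
    print(e)  # the error quoted above

check_block_shape(192, 128, allow_mismatch=True)  # check skipped, no error
```

A narrower fix, as suggested, would relax the check only for the trailing partial block of specific layers rather than disabling validation for the whole layer.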
Here are the GSM8K results from my local testing.
jeejeelee
left a comment
Thank you for the contribution. Let's land this model first and continue improving it in subsequent PRs.
```python
if param_name == "qkv_proj" and shard_id == "v":
    v_scale = (
        self.v_scale
        if self.v_scale is not None
        else getattr(self.config, "attention_value_scale", None)
    )
    if v_scale is not None and (
        name.endswith("weight_scale_inv") or name.endswith(".bias")
    ):
        loaded_weight *= float(v_scale)
```
I don't think this is valid. When I revert this and instead apply v = v * self.v_scale before attention in the forward pass, my GSM8K eval score improves from ~74% to 78%.
### What this PR does / why we need it?
Fix vLLM break introduced in [Add MiMo-V2-Flash support](vllm-project/vllm#30836).

Co-authored-by: zxwang <1476209578@qq.com>
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@5fbfa8d

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Purpose
Add support for MiMo-V2-Flash.
Examples
Example 1
Example 2
Accuracy
GSM8K