[Model] Add MiMo-V2-Flash support#30836

Merged
jeejeelee merged 20 commits into vllm-project:main from Abatom:mimo-v2-flash
Dec 19, 2025

Conversation

Contributor

@Abatom Abatom commented Dec 17, 2025

Purpose

Add support for MiMo-V2-Flash.

Examples

Example 1

vllm serve XiaomiMiMo/MiMo-V2-Flash \
    --host 0.0.0.0 \
    --port 9001 \
    --seed 1024 \
    --served-model-name base_model \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --generation-config vllm

Example 2

vllm serve XiaomiMiMo/MiMo-V2-Flash \
    --host 0.0.0.0 \
    --port 9001 \
    --seed 1024 \
    --served-model-name base_model \
    --data-parallel-size 2 \
    --tensor-parallel-size 4 \
    --enable-expert-parallel \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --generation-config vllm
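Once either server is up, a request can be sent against the standard OpenAI-compatible `/v1/chat/completions` endpoint that vLLM serves. A minimal sketch using only the standard library (the prompt and sampling parameters are illustrative; `base_model` matches `--served-model-name` from the commands above):

```python
import json
import urllib.request

# Chat-completions payload for the server started above.
payload = {
    "model": "base_model",  # must match --served-model-name
    "messages": [{"role": "user", "content": "What is 12 * 7?"}],
    "max_tokens": 64,
    "temperature": 0.0,
}

req = urllib.request.Request(
    "http://0.0.0.0:9001/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with a running server
print(payload["model"])  # base_model
```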

Accuracy

GSM8K

lm_eval \
    --model local-chat-completions \
    --tasks gsm8k \
    --num_fewshot 5 \
    --apply_chat_template \
    --model_args model=base_model,base_url=http://0.0.0.0:9001/v1/chat/completions,num_concurrent=100,max_retries=20,tokenized_requests=False,tokenizer_backend=none,max_gen_toks=256
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9128|±  |0.0078|
|     |       |strict-match    |     5|exact_match|↑  |0.9075|±  |0.0080|

Abatom and others added 2 commits December 17, 2025 11:39
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify bot added the new-model label (Requests to new models) Dec 17, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the MiMo-V2-Flash model. The changes primarily involve adding the model definition in vllm/model_executor/models/mimo_v2_flash.py and making necessary adjustments in vllm/model_executor/layers/linear.py to accommodate features like variable value head sizes and FP8 block shape mismatches. My review has identified a critical issue in the Mixture-of-Experts (MoE) layer detection logic that could lead to incorrect model configurations, and a high-severity issue regarding inconsistent access to the model configuration. Addressing these points will improve the robustness and correctness of the new model implementation.

Collaborator

@jeejeelee jeejeelee left a comment


Some initial comments. Thank you for the contribution!

@mergify

mergify bot commented Dec 17, 2025

Hi @Abatom, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@jeejeelee
Copy link
Collaborator

To add a new LLM model, you also need to:

Signed-off-by: Jumiar <liuanqim10@126.com>

Signed-off-by: Jumiar <liuanqim10@126.com>

Signed-off-by: Jumiar <liuanqim10@126.com>

Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
@mergify

mergify bot commented Dec 17, 2025

Documentation preview: https://vllm--30836.org.readthedocs.build/en/30836/

@mergify mergify bot added the documentation label (Improvements or additions to documentation) Dec 17, 2025
@Abatom
Contributor Author

Abatom commented Dec 17, 2025

To add a new LLM model, you also need to:

Done!

@LucasWilkinson
Collaborator

FYI: #28775 also uses different q and kv head dims; we can probably support this more naturally in a future follow-up PR.

Signed-off-by: Abatom <abzhonghua@gmail.com>
Comment on lines +1255 to +1261
if getattr(layer, "allow_fp8_block_shape_mismatch", False):
    logger.debug(
        "Skipping FP8 block shape validation for layer %s due to detected"
        " mismatch allowance.",
        getattr(layer, "prefix", "<unknown>"),
    )
    return
Member

@Isotr0py Isotr0py Dec 18, 2025


I'm a bit worried that this will cause unexpected behavior in the FP8 kernel if we disable the block shape check.

Perhaps we should improve the block shape check for MiMo-V2's edge case instead of just skipping it.

Perhaps @mgoin can give more insights?

Contributor Author


@Isotr0py We tried removing the code above and got the following error.

ValueError: Weight output_partition_size = 192 is not divisible by weight quantization block_n = 128.

Member


Hmm, we support weights that aren't divisible by 128 just fine for other block-FP8 models, such as kv_a_proj in DeepSeek; I wonder if it is specific to a fused layer.
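The shape conflict under discussion can be sketched numerically. Block-FP8 stores one scale per weight tile, so a 192-wide output partition does not line up with a block_n of 128; the last tile is ragged. A minimal illustration (the helper `block_scale_grid` is made up for this sketch, not a vLLM API):

```python
import math

def block_scale_grid(out_size: int, in_size: int, block_n: int = 128, block_k: int = 128):
    """Number of per-block scales a block-quantized FP8 weight needs.

    One scale covers a (block_n x block_k) tile, so the grid is
    ceil-divided; vLLM's strict validation instead requires the
    partition size to divide evenly by block_n.
    """
    return math.ceil(out_size / block_n), math.ceil(in_size / block_k)

# MiMo-V2-Flash's value-projection shard: 192 output channels per partition.
out_partition, block_n = 192, 128
print(out_partition % block_n)      # 64: not divisible, hence the ValueError
print(block_scale_grid(192, 1024))  # (2, 8): the last row tile covers only 64 rows
```

This is why removing the skip triggers `output_partition_size = 192 is not divisible by weight quantization block_n = 128`.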

@jeejeelee
Collaborator

Here are the GSM8K results from my local testing:

local-chat-completions (model=XiaomiMiMo/MiMo-V2-Flash,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=100,max_retries=20,tokenized_requests=False,tokenizer_backend=none,max_gen_toks=256), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9075|±  |0.0080|
|     |       |strict-match    |     5|exact_match|↑  |0.9022|±  |0.0082|

Signed-off-by: Abatom <abzhonghua@gmail.com>

Abatom and others added 2 commits December 19, 2025 12:16
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>

Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Collaborator

@jeejeelee jeejeelee left a comment


Thank you for the contribution. Let's land this model first and continue improving it in subsequent PRs.

@Isotr0py Isotr0py added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 19, 2025
Member

@Isotr0py Isotr0py left a comment


Unblocking the quantization tests in CI to see if other models are fine with the temporary hacky FP8 patch.

@jeejeelee jeejeelee enabled auto-merge (squash) December 19, 2025 15:17
@jeejeelee jeejeelee merged commit 969bbc7 into vllm-project:main Dec 19, 2025
63 checks passed
Comment on lines +609 to +618
if param_name == "qkv_proj" and shard_id == "v":
    v_scale = (
        self.v_scale
        if self.v_scale is not None
        else getattr(self.config, "attention_value_scale", None)
    )
    if v_scale is not None and (
        name.endswith("weight_scale_inv") or name.endswith(".bias")
    ):
        loaded_weight *= float(v_scale)
Member


I don't think this is valid. When I revert this to apply v = v * self.v_scale before attention in the forward pass, I see my GSM8K eval score improve from ~74% to ~78%.
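The alternative placement the reviewer describes can be sketched as follows: instead of folding v_scale into `weight_scale_inv`/`.bias` at load time, the scale is applied to the value activations right before attention. The helper name below is illustrative, not vLLM code:

```python
def scale_value_states(v, v_scale):
    """Apply the value scale at runtime: v = v * v_scale before attention.

    Folding v_scale into the FP8 weight scales instead also rescales the
    quantization grid the weights were calibrated against, so the two
    placements need not be numerically identical; scaling the activations
    leaves the quantized weights untouched.
    """
    return [x * v_scale for x in v]

v = [0.5, -1.0, 2.0]
print(scale_value_states(v, 1.5))  # [0.75, -1.5, 3.0]
```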

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Dec 24, 2025
### What this PR does / why we need it?
Fix the vLLM break introduced by:
1. [Add MiMo-V2-Flash support](vllm-project/vllm#30836)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Co-authored-by: zxwang <1476209578@qq.com>

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@5fbfa8d

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Co-authored-by: zxwang <1476209578@qq.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Labels

documentation: Improvements or additions to documentation
new-model: Requests to new models
ready: ONLY add when PR is ready to merge/full CI is needed


7 participants