[Model] Add MiMo-V2-Flash support (#30836)
Conversation
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Code Review
This pull request introduces support for the MiMo-V2-Flash model. The changes primarily involve adding the model definition in vllm/model_executor/models/mimo_v2_flash.py and making necessary adjustments in vllm/model_executor/layers/linear.py to accommodate features like variable value head sizes and FP8 block shape mismatches. My review has identified a critical issue in the Mixture-of-Experts (MoE) layer detection logic that could lead to incorrect model configurations, and a high-severity issue regarding inconsistent access to the model configuration. Addressing these points will improve the robustness and correctness of the new model implementation.
jeejeelee
left a comment
Some initial comments; thank you for the contribution.
Hi @Abatom, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then commit the changes and push to your branch.
To add a new LLM model, you also need to:
Documentation preview: https://vllm--30836.org.readthedocs.build/en/30836/

Done!

FYI: #28775 also uses different q and kv head dims; we can probably support this more naturally in a future follow-up PR.
```python
if getattr(layer, "allow_fp8_block_shape_mismatch", False):
    logger.debug(
        "Skipping FP8 block shape validation for layer %s due to detected"
        " mismatch allowance.",
        getattr(layer, "prefix", "<unknown>"),
    )
    return
```
I'm a bit worried that this will cause unexpected behavior in the FP8 kernel if we disable the block shape check.
Perhaps we should improve the block shape check for MiMo-V2's edge case instead of just skipping it.
Perhaps @mgoin can give more insights?
@Isotr0py We tried removing the code above and got the following error:

```
ValueError: Weight output_partition_size = 192 is not divisible by weight quantization block_n = 128.
```
Hmm, we handle weights that aren't divisible by 128 fine for other block-FP8 models, such as kv_a_proj in DeepSeek; I wonder if it is specific to a fused layer.
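For context, the validation being skipped boils down to a divisibility check on the output partition. A minimal sketch of that check (illustrative names and signature, not vLLM's actual implementation) shows how MiMo-V2-Flash's 192-wide value partition trips against the 128-wide FP8 quantization block:

```python
# Hedged sketch, not vLLM's actual code: names and signature are illustrative.
def check_block_shape(
    output_partition_size: int, block_n: int, allow_mismatch: bool = False
) -> None:
    """Raise if the weight's output partition is not a multiple of the
    FP8 quantization block width, unless the layer opts out."""
    if allow_mismatch:
        # Mirrors the allow_fp8_block_shape_mismatch escape hatch in the PR.
        return
    if output_partition_size % block_n != 0:
        raise ValueError(
            f"Weight output_partition_size = {output_partition_size} is not "
            f"divisible by weight quantization block_n = {block_n}."
        )


# MiMo-V2-Flash's value partition (192) vs. the 128-wide FP8 block:
try:
    check_block_shape(192, 128)
except ValueError as e:
    print(e)  # the error quoted above

check_block_shape(192, 128, allow_mismatch=True)  # check skipped, no error
```

A narrower fix, as suggested, would relax the check only for the trailing partial block of specific layers rather than disabling validation for the whole layer.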
Here are the GSM8K results from my local testing.
jeejeelee
left a comment
Thank you for the contribution. Let's land this model first and continue improving it in subsequent PRs.
```python
if param_name == "qkv_proj" and shard_id == "v":
    v_scale = (
        self.v_scale
        if self.v_scale is not None
        else getattr(self.config, "attention_value_scale", None)
    )
    if v_scale is not None and (
        name.endswith("weight_scale_inv") or name.endswith(".bias")
    ):
        loaded_weight *= float(v_scale)
```
I don't think this is valid. When I revert this and instead apply v = v * self.v_scale before attention in the forward pass, my GSM8K eval score improves from ~74% to 78%.
### What this PR does / why we need it?
Fix vLLM break introduced in [Add MiMo-V2-Flash support](vllm-project/vllm#30836).

Co-authored-by: zxwang <1476209578@qq.com>
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@5fbfa8d

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Purpose
Add support for MiMo-V2-Flash.
Examples
Example 1
Example 2
Accuracy
GSM8K