【Hackathon 9th No.93】Add Minimax-m1 for FastDeploy 非对齐版 #4629

ZhijunLStudio · 2025-10-28T12:31:32Z

背景

本次 Pull Request 的目标是为 FastDeploy 新增对 MiniMax-M1 模型的初步支持。该模型采用了一种混合架构，结合了线性和注意力（Linear Attention）层与标准的分组查询注意力（Grouped-Query Attention, GQA）层。本次提交包含了模型的定义、用于线性和注意力的 Triton 自定义核函数，以及在 Model Runner 中的集成。

目前的主要工作是完成模型的功能实现，并与 vLLM 的实现进行精度对齐。

修改内容

模型定义: 新增 fastdeploy/model_executor/models/minimax_m1.py 文件，用于定义模型结构和前向传播逻辑。
自定义 Kernel: 在 ops/triton_ops/ 目录下新增了 minimax_mamba_ops.py 和 minimax_mamba_kernels.py，用于支持类 Mamba 结构的线性和注意力机制。
RoPE 集成: 修改了 rotary_embedding.py，为 MiniMax-M1 模型中的 GQA 层应用 GLM 风格的旋转位置编码（RoPE）。
状态缓存: 更新了 gpu_model_runner.py 和 forward_meta.py，以支持和管理线性和注意力层所需的状态缓存（linear_attn_caches）。
配置更新: 修改了 config.py，加入了模型相关的配置项。

使用方法

模型在单机 8 卡环境下进行测试。

from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "/home/aistudio/config_folder"

# 超参设置
sampling_params = SamplingParams(temperature=0.1, max_tokens=30)
llm = LLM(model=model_name_or_path, tensor_parallel_size=8, load_choices="default_v1")
output = llm.generate(prompts="who are you？", use_tqdm=True, sampling_params=sampling_params)

print(output)

精度测试

逐层与 vLLM 进行精度对齐。该模型共有 80 层，本次调试重点关注前 8 层（7 层线性和注意力 + 1 层 GQA）。

第一部分：线性和注意力层（0-7层）精度对齐

前 7 个线性和注意力层的输出与 vLLM 表现出高度的精度一致性。以下是第 7 层（最后一个GQA）的日志，证明了在注意力计算之前，Q、K、V 张量的数值与 vLLM 基本吻合。

第 7 层：RoPE/Attention 前的 QKV 张量对比

框架	张量名称	均值	标准差	形状	备注
FastDeploy	`After_QKV_Proj_Combined`	`-0.092525`	`1.627345`	`[4, 2560]`	✅ 精度对齐
vLLM	`After_QKV_Proj_Combined`	`-0.092362`	`1.624597`	`[4, 2560]`	(基准)
FastDeploy	`Q_BeforeRoPE`	`-0.096660`	`1.136759`	`[4, 2048]`	✅ 精度对齐
vLLM	`Q_BeforeRoPE`	`-0.096524`	`1.135294`	`[4, 2048]`	(基准)
FastDeploy	`K_BeforeRoPE`	`-0.151537`	`4.017039`	`[4, 256]`	✅ 精度对齐
vLLM	`K_BeforeRoPE`	`-0.151061`	`4.009206`	`[4, 256]`	(基准)

这证实了线性和注意力的实现以及之前所有层的计算是正确的。

第二部分：GQA 层（第8层）精度不一致

问题出现在第 8 层，这是模型中的第一个 GQA 层。

问题描述: 在第 8 层的 QKV 投影（Projection）之后，张量的值是正确的、非零的。然而，在对 Q 和 K 张量应用 GLM 风格的旋转位置编码（RoPE）后，这两个张量的值全部变成了 0。这直接导致了后续注意力计算的输出错误。
初步推断: 问题很可能出在 GlmRotaryEmbedding 的具体实现逻辑中，当它被应用于 MiniMax-M1 的 GQA 层的特定条件下时出现了错误。进入 RoPE 函数的输入张量是正确的，但输出不正确。

paddle-bot · 2025-10-28T12:31:40Z

Thanks for your contribution!

paddle-bot bot added the contributor External developers label Oct 28, 2025

luotao1 mentioned this pull request Oct 29, 2025

【Hackathon 9th】开源贡献个人挑战赛 PaddlePaddle/Paddle#74773

Open

ZhijunLStudio added 10 commits October 31, 2025 16:21

first commit

40268db

remove large file

bec4fbe

remove some print

3b64896

remove rotary print

b6516b6

add 2_After_Attention print

e7ec2ab

Delete Chinese comments

a2598e9

modify model_name_or_path

6997389

Delete a space

92c71d4

Executable code

891544c

Temporary code storage

e612bd6

ZhijunLStudio force-pushed the minimax-1023 branch from f41fd23 to e612bd6 Compare October 31, 2025 08:23

luotao1 added the PaddlePaddle Hackathon label Nov 3, 2025

luotao1 assigned luotao1 and chang-wenbin Nov 3, 2025

ZhijunLStudio added 6 commits November 4, 2025 11:25

feat: Replace einops with paddle ops in mamba ops

3b18acc

attention is not 0

f66ebac

add print

e47f441

Merge branch 'develop' into minimax-1023

0e7deb7

Merge into the new version and run successfully

bd1ea18

Linear Attention Alignment at 1121

a30233b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

【Hackathon 9th No.93】Add Minimax-m1 for FastDeploy 非对齐版 #4629

【Hackathon 9th No.93】Add Minimax-m1 for FastDeploy 非对齐版 #4629

ZhijunLStudio commented Oct 28, 2025 •

edited

Loading

Uh oh!

paddle-bot bot commented Oct 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

【Hackathon 9th No.93】Add Minimax-m1 for FastDeploy 非对齐版 #4629

Are you sure you want to change the base?

【Hackathon 9th No.93】Add Minimax-m1 for FastDeploy 非对齐版 #4629

Conversation

ZhijunLStudio commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

背景

修改内容

使用方法

精度测试

第一部分：线性和注意力层（0-7层）精度对齐

第二部分：GQA 层（第8层）精度不一致

Uh oh!

paddle-bot bot commented Oct 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ZhijunLStudio commented Oct 28, 2025 •

edited

Loading