[Feat] Support MLP_TP feature, exclude MOE layer#4999
jianzs merged 2 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for fine-grained tensor parallelism (TP) for MLP, o_proj, lm_head, and embedding layers, refactoring the configuration into a finegrained_tp_config object. It also adds logic to exclude MoE layers from MLP TP. The changes are generally well-structured, but I've identified a few critical issues, including a copy-paste error in process group initialization and potential crashes in the is_moe_layer function due to lack of error handling and incorrect logic. Addressing these will be crucial for the stability and correctness of this new feature.
vllm_ascend/distributed/parallel_state.py (150-151)
There appears to be a copy-paste error here. mlp_tp_size is being assigned the value of embedding_tensor_parallel_size instead of mlp_tensor_parallel_size. This will cause the MLP tensor parallelism to be configured with the wrong size, leading to incorrect behavior or errors.
```python
mlp_tp_size = get_ascend_config(
).finegrained_tp_config.mlp_tensor_parallel_size
```
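To illustrate the copy-paste hazard, here is a minimal, hypothetical sketch: the dataclass below only mimics the field names of the PR's `finegrained_tp_config` (it is not the real Ascend config object), and shows how reading the embedding field for the MLP group silently yields the wrong size.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the PR's finegrained_tp_config; only the
# field names are taken from the review, the class itself is illustrative.
@dataclass
class FinegrainedTPConfig:
    mlp_tensor_parallel_size: int = 4
    embedding_tensor_parallel_size: int = 2


cfg = FinegrainedTPConfig()

# Buggy copy-paste: the MLP process group gets the embedding TP size.
mlp_tp_size_buggy = cfg.embedding_tensor_parallel_size  # 2, wrong group size

# Fixed: each parallel group reads its own config field.
mlp_tp_size = cfg.mlp_tensor_parallel_size  # 4, as intended
```

With defaults of 4 and 2 the buggy assignment would build the MLP TP group over 2 ranks instead of 4, which is exactly the kind of silent misconfiguration the review flags.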
vllm_ascend/ops/linear_op.py (690-691)
The code does not handle the case where re.search returns None. If the prefix string does not contain the pattern r'layers\.([0-9]+)\.', match will be None, and the subsequent call to match.group(1) will raise an AttributeError, causing a crash. You should add a check to handle this case gracefully, for example by returning False if no match is found.
```python
match = re.search(r'layers\.([0-9]+)\.', prefix)
if not match:
    return False
layer_idx = int(match.group(1))
```
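A self-contained sketch of the guarded pattern (the helper name `layer_index` is hypothetical, not from the PR): `re.search` returns `None` when the pattern is absent, so the index extraction must check the match before calling `.group()`.

```python
import re


def layer_index(prefix: str):
    """Return the layer index embedded in a weight prefix, or None.

    Checking for a failed match avoids the AttributeError that would be
    raised by calling .group(1) on None for prefixes such as
    'model.embed_tokens' that contain no 'layers.<n>.' segment.
    """
    match = re.search(r'layers\.([0-9]+)\.', prefix)
    if not match:
        return None
    return int(match.group(1))


print(layer_index("model.layers.12.mlp.gate_proj"))  # 12
print(layer_index("model.embed_tokens"))             # None
```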
vllm_ascend/ops/linear_op.py (698-700)
The condition n_routed_experts is not None is likely incorrect for checking whether a layer is a MoE layer. Since getattr is used with a default of 0, n_routed_experts will be 0 (not None) when the attribute is missing, so the is not None check always passes, even for dense models. A more robust check is n_routed_experts > 0, since a model is only MoE if it has one or more routed experts.
```python
is_moe_layer = (n_routed_experts > 0
                and layer_idx >= first_k_dense_replace
                and layer_idx % moe_layer_freq == 0)
```
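The corrected condition can be exercised in isolation. The standalone function below is a hypothetical mirror of the reviewed logic (the real code reads these values from the model config via getattr); the parameter defaults are chosen to mimic a missing attribute.

```python
def is_moe_layer(layer_idx: int, n_routed_experts: int = 0,
                 first_k_dense_replace: int = 0,
                 moe_layer_freq: int = 1) -> bool:
    """Hypothetical helper mirroring the reviewed check.

    A layer counts as MoE only when the model actually has routed
    experts (n_routed_experts > 0), the layer index is past the
    leading dense layers, and it lands on the MoE layer frequency.
    """
    return (n_routed_experts > 0
            and layer_idx >= first_k_dense_replace
            and layer_idx % moe_layer_freq == 0)


# DeepSeek-style setup: 3 leading dense layers, MoE everywhere after.
print(is_moe_layer(1, n_routed_experts=64, first_k_dense_replace=3))  # False
print(is_moe_layer(5, n_routed_experts=64, first_k_dense_replace=3))  # True
# Dense model (attribute missing, getattr default 0): never MoE.
print(is_moe_layer(5))  # False
```

With the original is not None check, the dense-model case above would wrongly report True whenever the index conditions held, since 0 is not None.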
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Co-authored-by: 子潜 <ziqian@U-DMKXH32D-2015.local>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
What this PR does / why we need it?
#4257 This PR implements dense FFN TP for the first three layers of the DeepSeek model. I refactored the original PR and used very little code to support this feature.
This PR adds an `is_moe_layer` function to mlp_tp, which supports MLP TP in models that mix dense MLP and MoE layers, such as DeepSeek or ChatGLM.
Does this PR introduce any user-facing change?
How was this patch tested?