
[Feat] Support MLP_TP feature, exclude MOE layer #4999

Merged: jianzs merged 2 commits into vllm-project:main from zzhx1:dense_TP on Dec 18, 2025

Conversation

@zzhx1 (Contributor) commented Dec 14, 2025

What this PR does / why we need it?

#4257 This PR implements dense FFN tensor parallelism (TP) for the first three layers of the DeepSeek model. The earlier PR has been refactored so that very little code is needed to support this feature.
This PR also adds an `is_moe_layer` function to mlp_tp, which enables MLP TP in models that contain both dense MLP and MoE layers, such as DeepSeek or ChatGLM.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions (bot) commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for fine-grained tensor parallelism (TP) for MLP, o_proj, lm_head, and embedding layers, refactoring the configuration into a finegrained_tp_config object. It also adds logic to exclude MoE layers from MLP TP. The changes are generally well-structured, but I've identified a few critical issues, including a copy-paste error in process group initialization and potential crashes in the is_moe_layer function due to lack of error handling and incorrect logic. Addressing these will be crucial for the stability and correctness of this new feature.
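For orientation, the `finegrained_tp_config` object described here might look roughly like the following sketch. Only `mlp_tensor_parallel_size` and `embedding_tensor_parallel_size` appear verbatim in the review snippets in this thread; the other field names are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch of the finegrained_tp_config object from the review
# summary. Only mlp_tensor_parallel_size and embedding_tensor_parallel_size
# are confirmed by the review snippets; the o_proj and lm_head field names
# are assumed.
@dataclass
class FinegrainedTPConfig:
    mlp_tensor_parallel_size: int = 1
    embedding_tensor_parallel_size: int = 1
    oproj_tensor_parallel_size: int = 1    # assumed name
    lmhead_tensor_parallel_size: int = 1   # assumed name
```

The copy-paste bug flagged below is exactly the kind of error this kind of flat config invites: two fields with near-identical names read in adjacent lines.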


vllm_ascend/distributed/parallel_state.py (150-151)

critical

There appears to be a copy-paste error here. mlp_tp_size is being assigned the value of embedding_tensor_parallel_size instead of mlp_tensor_parallel_size. This will cause the MLP tensor parallelism to be configured with the wrong size, leading to incorrect behavior or errors.

    mlp_tp_size = get_ascend_config(
    ).finegrained_tp_config.mlp_tensor_parallel_size

vllm_ascend/ops/linear_op.py (690-691)

critical

The code does not handle the case where re.search returns None. If the prefix string does not contain the pattern r'layers\.([0-9]+)\.', match will be None, and the subsequent call to match.group(1) will raise an AttributeError, causing a crash. You should add a check to handle this case gracefully, for example by returning False if no match is found.

    match = re.search(r'layers\.([0-9]+)\.', prefix)
    if not match:
        return False
    layer_idx = int(match.group(1))

vllm_ascend/ops/linear_op.py (698-700)

high

The condition n_routed_experts is not None is likely incorrect for checking if a layer is a MoE layer. Since getattr is used with a default of 0, n_routed_experts will be 0 if the attribute is missing, which is not None. A more robust check would be n_routed_experts > 0, as a model is only considered MoE if it has one or more experts.

    is_moe_layer = (n_routed_experts > 0
                    and layer_idx >= first_k_dense_replace
                    and layer_idx % moe_layer_freq == 0)

Comment thread vllm_ascend/ops/linear_op.py
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: 子潜 <ziqian@U-DMKXH32D-2015.local>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
@jianzs jianzs merged commit a74a119 into vllm-project:main Dec 18, 2025
14 checks passed
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
vllm-project#4257 This PR implements the dense_ffn TP of the first three layers of
the deepseek model, I have refactored this PR and used very little code
to support the implementation of this feature.
This PR adds a function `is_moe_layer` to mlp_tp, which supports MLP TP
in models with both mlp and moe, such as deepseek or chat GLM.


- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: 子潜 <ziqian@U-DMKXH32D-2015.local>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

Labels

documentation (Improvements or additions to documentation), module:core, module:ops, module:tests, ready (read for review), ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants