[Feat] Support MLP_TP feature, exclude MOE layer#4999
jianzs merged 2 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for fine-grained tensor parallelism (TP) for MLP, o_proj, lm_head, and embedding layers, refactoring the configuration into a finegrained_tp_config object. It also adds logic to exclude MoE layers from MLP TP. The changes are generally well-structured, but I've identified a few critical issues, including a copy-paste error in process group initialization and potential crashes in the is_moe_layer function due to lack of error handling and incorrect logic. Addressing these will be crucial for the stability and correctness of this new feature.
vllm_ascend/distributed/parallel_state.py (150-151)
There appears to be a copy-paste error here. mlp_tp_size is being assigned the value of embedding_tensor_parallel_size instead of mlp_tensor_parallel_size. This will cause the MLP tensor parallelism to be configured with the wrong size, leading to incorrect behavior or errors.
```python
mlp_tp_size = get_ascend_config(
).finegrained_tp_config.mlp_tensor_parallel_size
```
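To illustrate the copy-paste hazard, here is a minimal, hypothetical sketch: the dataclass below only mimics the field names of the PR's `finegrained_tp_config` (it is not the real Ascend config object), and shows how reading the embedding field for the MLP group silently yields the wrong size.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the PR's finegrained_tp_config; only the
# field names are taken from the review, the class itself is illustrative.
@dataclass
class FinegrainedTPConfig:
    mlp_tensor_parallel_size: int = 4
    embedding_tensor_parallel_size: int = 2


cfg = FinegrainedTPConfig()

# Buggy copy-paste: the MLP process group gets the embedding TP size.
mlp_tp_size_buggy = cfg.embedding_tensor_parallel_size  # 2, wrong group size

# Fixed: each parallel group reads its own config field.
mlp_tp_size = cfg.mlp_tensor_parallel_size  # 4, as intended
```

With defaults of 4 and 2 the buggy assignment would build the MLP TP group over 2 ranks instead of 4, which is exactly the kind of silent misconfiguration the review flags.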
vllm_ascend/ops/linear_op.py (690-691)
The code does not handle the case where re.search returns None. If the prefix string does not contain the pattern r'layers\.([0-9]+)\.', match will be None, and the subsequent call to match.group(1) will raise an AttributeError, causing a crash. You should add a check to handle this case gracefully, for example by returning False if no match is found.
```python
match = re.search(r'layers\.([0-9]+)\.', prefix)
if not match:
    return False
layer_idx = int(match.group(1))
```
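A self-contained sketch of the guarded pattern (the helper name `layer_index` is hypothetical, not from the PR): `re.search` returns `None` when the pattern is absent, so the index extraction must check the match before calling `.group()`.

```python
import re


def layer_index(prefix: str):
    """Return the layer index embedded in a weight prefix, or None.

    Checking for a failed match avoids the AttributeError that would be
    raised by calling .group(1) on None for prefixes such as
    'model.embed_tokens' that contain no 'layers.<n>.' segment.
    """
    match = re.search(r'layers\.([0-9]+)\.', prefix)
    if not match:
        return None
    return int(match.group(1))


print(layer_index("model.layers.12.mlp.gate_proj"))  # 12
print(layer_index("model.embed_tokens"))             # None
```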
vllm_ascend/ops/linear_op.py (698-700)
The condition n_routed_experts is not None is likely incorrect for checking whether a layer is a MoE layer. Since getattr is used with a default of 0, n_routed_experts will be 0 (not None) when the attribute is missing, so the is not None check always passes, even for dense models. A more robust check is n_routed_experts > 0, since a model is only MoE if it has one or more routed experts.
```python
is_moe_layer = (n_routed_experts > 0
                and layer_idx >= first_k_dense_replace
                and layer_idx % moe_layer_freq == 0)
```
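The corrected condition can be exercised in isolation. The standalone function below is a hypothetical mirror of the reviewed logic (the real code reads these values from the model config via getattr); the parameter defaults are chosen to mimic a missing attribute.

```python
def is_moe_layer(layer_idx: int, n_routed_experts: int = 0,
                 first_k_dense_replace: int = 0,
                 moe_layer_freq: int = 1) -> bool:
    """Hypothetical helper mirroring the reviewed check.

    A layer counts as MoE only when the model actually has routed
    experts (n_routed_experts > 0), the layer index is past the
    leading dense layers, and it lands on the MoE layer frequency.
    """
    return (n_routed_experts > 0
            and layer_idx >= first_k_dense_replace
            and layer_idx % moe_layer_freq == 0)


# DeepSeek-style setup: 3 leading dense layers, MoE everywhere after.
print(is_moe_layer(1, n_routed_experts=64, first_k_dense_replace=3))  # False
print(is_moe_layer(5, n_routed_experts=64, first_k_dense_replace=3))  # True
# Dense model (attribute missing, getattr default 0): never MoE.
print(is_moe_layer(5))  # False
```

With the original is not None check, the dense-model case above would wrongly report True whenever the index conditions held, since 0 is not None.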
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Co-authored-by: 子潜 <ziqian@U-DMKXH32D-2015.local>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
What this PR does / why we need it?
#4257 This PR implements dense FFN TP for the first three layers of the DeepSeek model. I refactored the original PR and used very little code to support this feature.
This PR adds an `is_moe_layer` function to mlp_tp, which supports MLP TP in models that mix dense MLP and MoE layers, such as DeepSeek or ChatGLM.
Does this PR introduce any user-facing change?
How was this patch tested?