[Dist][EP] Remove ETP/EP maintained in vllm-ascend #1681
wangxiyuan merged 1 commit into vllm-project:main
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1681 +/- ##
==========================================
- Coverage 54.18% 53.41% -0.77%
==========================================
Files 74 72 -2
Lines 9235 9053 -182
==========================================
- Hits 5004 4836 -168
+ Misses 4231 4217 -14
Force-pushed from 0e03620 to 5a32cb8
This pull request has conflicts, please resolve those before we can evaluate the pull request.
```python
distributed_executor_backend="mp",
enforce_eager=False,
additional_config=additional_config,
enable_expert_parallel=True,
```
We must set `enable_expert_parallel` to `True` when using pangu with the EP in vLLM.

We should do this until disabling EP is supported in pangu. Please help review this, thanks! cc @Angazenn
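The constraint discussed above can be sketched as a small validation helper. This is a hypothetical illustration, not part of vllm-ascend: the function name and the set of models requiring EP are assumptions for the example.

```python
# Hypothetical sketch of the constraint above: until disabling EP is
# supported for pangu, configs for such MoE models must enable expert
# parallelism. Function and model names are illustrative only.
def validate_moe_config(model_type: str, enable_expert_parallel: bool) -> None:
    models_requiring_ep = {"pangu"}  # assumption for illustration
    if model_type in models_requiring_ep and not enable_expert_parallel:
        raise ValueError(
            f"{model_type} currently requires enable_expert_parallel=True")

validate_moe_config("pangu", enable_expert_parallel=True)  # passes silently
```

A real check of this kind would live in the model runner's config validation, so a misconfiguration fails fast at startup rather than at inference time.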
So maybe the related doc should be updated as well. For example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html

> So maybe the related doc should be updated as well. For example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html

Now the example is updated, thanks!
@jianzs @ttanzhiqiang could you help review this PR?
```python
tp_rank = get_tp_group().rank_in_group
tp_size = get_tp_group().world_size
```

I think it's fine to use the TP group here, since this is the weight loader for the linear layer.
Thanks for this info, but I'm not familiar with the different fused MoE states; maybe I need to read more code to understand points 1, 2, and 3 above.

I think we still have ep=1 when EP is disabled; you can refer to https://github.com/vllm-project/vllm/blob/235bfd5dfe0975e42b115cfb910e73eff5c670d8/vllm/model_executor/layers/fused_moe/config.py#L274-L281
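The linked vLLM config logic can be paraphrased as the following simplified sketch. This is a hand-written reduction for illustration, not the actual `FusedMoEParallelConfig` code, and the class name is an assumption.

```python
from dataclasses import dataclass

@dataclass
class MoEParallelSketch:
    tp_size: int
    use_ep: bool

    @property
    def ep_size(self) -> int:
        # With EP disabled, all TP ranks form a single EP group, so
        # ep_size stays 1 and experts are sharded by TP instead.
        return self.tp_size if self.use_ep else 1

    @property
    def moe_tp_size(self) -> int:
        # With EP enabled, the TP ranks are repurposed as EP ranks for
        # the MoE layers, so the MoE-layer TP size collapses to 1.
        return 1 if self.use_ep else self.tp_size

print(MoEParallelSketch(tp_size=4, use_ep=False).ep_size)  # 1
```

This matches the comment above: disabling EP does not remove the EP dimension, it just pins `ep_size` to 1.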
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@ttanzhiqiang maybe we should remove the best-practice example on A2 in #1101, WDYT?

I think you can remove these scripts, as we only need to maintain them internally on our side.

OK, I want to remove it mainly because it is a best-practice example for ETP, which is removed here. I'll remove it then.
Force-pushed from ae9f97d to cf0c584
Signed-off-by: MengqingCao <cmq0113@163.com>
Hi @jianzs @ttanzhiqiang @ApsarasX, your suggestions are addressed now, could you take a look again? Thanks!
Let's merge this first. Before the next release, we should do deep testing of TP and EP.

Do you have a timeline for the next release? Also, are there any temporary solutions for the bug described in #1396?
### What this PR does / why we need it?
Remove the ETP/EP implementation maintained in branch main. We drop this because there are no relevant scenarios for ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the experts need to be sliced.

This is a part of the #1422 backport.

Fixes #1396 #1154

### Does this PR introduce _any_ user-facing change?
We will no longer maintain ETP/EP in vllm-ascend; the TP/EP implementation in vLLM is used instead.

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@fe8a2c5

Signed-off-by: MengqingCao <cmq0113@163.com>