[Dist][EP] Remove ETP/EP maintained in vllm-ascend #1681
wangxiyuan merged 1 commit into vllm-project:main
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1681 +/- ##
==========================================
- Coverage 54.18% 53.41% -0.77%
==========================================
Files 74 72 -2
Lines 9235 9053 -182
==========================================
- Hits 5004 4836 -168
+ Misses 4231 4217 -14
Force-pushed from 0e03620 to 5a32cb8
This pull request has conflicts, please resolve those before we can evaluate the pull request.
```python
distributed_executor_backend="mp",
enforce_eager=False,
additional_config=additional_config,
enable_expert_parallel=True,
```
We must set `enable_expert_parallel` to `True` when using pangu with the EP in vLLM.

We should do this until disabling EP is supported in pangu. Please help review this, thanks! cc @Angazenn
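The constraint discussed above can be sketched as a small validation helper. This is a hypothetical illustration, not part of vllm-ascend: the function name and the set of models requiring EP are assumptions for the example.

```python
# Hypothetical sketch of the constraint above: until disabling EP is
# supported for pangu, configs for such MoE models must enable expert
# parallelism. Function and model names are illustrative only.
def validate_moe_config(model_type: str, enable_expert_parallel: bool) -> None:
    models_requiring_ep = {"pangu"}  # assumption for illustration
    if model_type in models_requiring_ep and not enable_expert_parallel:
        raise ValueError(
            f"{model_type} currently requires enable_expert_parallel=True")

validate_moe_config("pangu", enable_expert_parallel=True)  # passes silently
```

A real check of this kind would live in the model runner's config validation, so a misconfiguration fails fast at startup rather than at inference time.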
So maybe the related doc should be updated as well. For example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html

> So maybe the related doc should be updated as well. For example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html

Now the example is updated, thanks!
@jianzs @ttanzhiqiang could you help review this PR?
```python
tp_rank = get_tp_group().rank_in_group
tp_size = get_tp_group().world_size
```

I think it's fine to use the TP group here, since this is the weight loader for the linear layer.
Thanks for this info, but I'm not familiar with the different fused MoE states; maybe I need to read more code to understand points 1, 2, and 3 above.

I think we still have ep=1 when EP is disabled; you can refer to https://github.com/vllm-project/vllm/blob/235bfd5dfe0975e42b115cfb910e73eff5c670d8/vllm/model_executor/layers/fused_moe/config.py#L274-L281
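The linked vLLM config logic can be paraphrased as the following simplified sketch. This is a hand-written reduction for illustration, not the actual `FusedMoEParallelConfig` code, and the class name is an assumption.

```python
from dataclasses import dataclass

@dataclass
class MoEParallelSketch:
    tp_size: int
    use_ep: bool

    @property
    def ep_size(self) -> int:
        # With EP disabled, all TP ranks form a single EP group, so
        # ep_size stays 1 and experts are sharded by TP instead.
        return self.tp_size if self.use_ep else 1

    @property
    def moe_tp_size(self) -> int:
        # With EP enabled, the TP ranks are repurposed as EP ranks for
        # the MoE layers, so the MoE-layer TP size collapses to 1.
        return 1 if self.use_ep else self.tp_size

print(MoEParallelSketch(tp_size=4, use_ep=False).ep_size)  # 1
```

This matches the comment above: disabling EP does not remove the EP dimension, it just pins `ep_size` to 1.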
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@ttanzhiqiang maybe we should remove the best-practice example on A2 in #1101, WDYT?

I think you can remove these scripts, as we only need to maintain them internally on our side.

OK, I want to remove it mainly because it is a best-practice example for ETP, which is removed here. I'll remove it then.
Force-pushed from ae9f97d to cf0c584
Signed-off-by: MengqingCao <cmq0113@163.com>
Hi @jianzs @ttanzhiqiang @ApsarasX, your suggestions are addressed now, could you take a look again? Thanks!
Let's merge this first. Before the next release, we should do deep testing of TP and EP.

Do you have a timeline for the next release? Also, are there any temporary solutions for the bug described in #1396?
### What this PR does / why we need it?
Remove the ETP/EP implementation maintained in branch main. We drop this because there are no relevant scenarios for ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the experts need to be sliced.

This is a part of the #1422 backport.

Fixes #1396 #1154

### Does this PR introduce _any_ user-facing change?
We will no longer maintain ETP/EP in vllm-ascend; the TP/EP implementation in vLLM is used instead.

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@fe8a2c5

Signed-off-by: MengqingCao <cmq0113@163.com>