[refactor] Refactoring forward_context and model_runner_v1 #1422

ganyi1996ppo merged 2 commits into vllm-project:v0.9.1-dev from
Conversation
Signed-off-by: zzzzwwjj <1183291235@qq.com>
cfd63c5 to 48fd2a1
…raph_mode Signed-off-by: zzzzwwjj <1183291235@qq.com>
Do we need to add another version here corresponding to 310P?
We can do it in the future.
Don't we maintain ETP anymore?
Given the absence of relevant scenarios, employing EP or full TP is sufficient for now. We may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the number of nodes exceeds the number of experts.
However, we do have customer scenarios that require such configurations. While DeepSeek models might not need this, there are use cases involving large-scale MoE (Mixture of Experts) models that require splitting across both tensor parallelism (TP) and expert parallelism (EP), or sometimes just TP alone. This is exactly the case with the current Jieyue Xingchen models.
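For context, here is a minimal offline-inference sketch of the TP-plus-EP split described above. It assumes a vLLM version whose `EngineArgs` exposes `enable_expert_parallel`; the model name is a placeholder, not a recommendation:

```python
from vllm import LLM, SamplingParams

# Shard dense layers 8-way (TP) while distributing the MoE experts
# across the same ranks (EP). Assumes `enable_expert_parallel` is
# available in this vLLM version; the checkpoint is a placeholder.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    tensor_parallel_size=8,
    enable_expert_parallel=True,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```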
LGTM

1 similar comment

LGTM
This solution is not fully aligned with the current ETP solution. For example, EP and ETP cannot be supported at the same time.
### What this PR does / why we need it?
Remove ETP/EP maintained in branch main. We drop this as there are no relevant scenarios for ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the experts need to be sliced. This is a part of the #1422 backport.

Fixes #1396 #1154

### Does this PR introduce _any_ user-facing change?
We'll no longer maintain ETP/EP in vllm-ascend; use vLLM's TP/EP instead.

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@fe8a2c5

Signed-off-by: MengqingCao <cmq0113@163.com>
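As a migration hint for the user-facing change above, a hedged before/after sketch; the model name is a placeholder, and the old `additional_config` key is quoted from this PR's description:

```python
from vllm import LLM

# Before (removed by this PR): ETP sized via vllm-ascend's additional_config.
# llm = LLM(model="...", tensor_parallel_size=8,
#           additional_config={"expert_tensor_parallel_size": 8})

# After: rely on vLLM's own parallelism switches instead.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B",   # placeholder MoE model
    tensor_parallel_size=8,
    enable_expert_parallel=True,  # replaces the removed ETP knob
)
```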
### What this PR does / why we need it?
A refactoring of `forward_context` and `model_runner_v1`: add some context which is necessary for model inference into `forward_context`, and refactor the `dummy_run` logic to make it more reasonable. Some details for this PR (a sketch of the pattern follows the list):
- … `examples` dir;
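The following is a minimal, self-contained sketch of the forward-context pattern this refactor builds on; the names and fields (`ForwardContext`, `set_forward_context`, `with_prefill`) are illustrative assumptions, not the actual vllm-ascend implementation:

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ForwardContext:
    # Illustrative per-step state; the real field set is version specific.
    attn_metadata: Any   # per-step attention metadata
    num_tokens: int      # batch-wide token count for this step
    with_prefill: bool   # whether this step contains prefill requests

_forward_context: Optional[ForwardContext] = None

def get_forward_context() -> ForwardContext:
    assert _forward_context is not None, "called outside of a forward pass"
    return _forward_context

@contextmanager
def set_forward_context(ctx: ForwardContext):
    # Layers deep inside the model call get_forward_context() instead of
    # having this state threaded through every forward() signature.
    global _forward_context
    prev, _forward_context = _forward_context, ctx
    try:
        yield
    finally:
        _forward_context = prev

# Usage: the model runner wraps real and dummy runs the same way, which is
# what lets dummy_run share the normal inference path.
with set_forward_context(ForwardContext(attn_metadata=None,
                                        num_tokens=0, with_prefill=False)):
    _ = get_forward_context().num_tokens  # e.g., a dummy_run warm-up step
```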
### Does this PR introduce any user-facing change?
This PR removes `expert_tensor_parallel_size` from `additional_config`; we will use `enable_expert_parallel` to control whether expert parallelism is enabled, which is consistent with vLLM.

### How was this patch tested?