[perf] Support MOE Multi-stream in Deepseek #947

wangxiyuan merged 10 commits into vllm-project:main
Conversation
```python
moe_expert_num = len(expert_map)
# hidden_states = hidden_states.bfloat16()
kwargs = {
kwargs1 = {
```
Rename `kwargs1` to a readable name.
```python
lambda: bool(int(os.getenv("COMPILE_CUSTOM_KERNELS", "1"))),
"VLLM_ENABLE_MC2":
lambda: bool(int(os.getenv("VLLM_ENABLE_MC2", '0'))),
"VLLM_ENABLE_CV_PARALLEL":
```
Please use additional_config instead of an env var, since this change is only used for torchair GE mode, like #839 does; there are another three new config options coming.
How about:

```python
{
    "additional_config": {
        "torchair_graph_config": {
            "enable": True,
            "enable_cv_parallel": True,
            "batch_sizes": "12345",
            "batch_sizes_init": True
        }
    }
}
```
cc @zzzzwwjj
And don't forget to add an e2e test. The model weight is here: https://www.modelscope.cn/models/vllm-ascend/DeepSeek-V2-Lite-W8A8. Take https://github.com/vllm-project/vllm-ascend/blob/main/tests/multicard/test_offline_inference_distributed.py#L49 as an example.
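For illustration, here is a minimal sketch of how such a nested `additional_config` could be consumed, with a safe default when the section is absent. `torchair_option` is a hypothetical helper, not the actual vllm-ascend API:

```python
def torchair_option(additional_config, key, default=None):
    # Look up `key` inside the nested "torchair_graph_config" section,
    # tolerating a missing or None additional_config.
    graph_cfg = (additional_config or {}).get("torchair_graph_config", {})
    return graph_cfg.get(key, default)

config = {
    "torchair_graph_config": {
        "enable": True,
        "enable_cv_parallel": True,
    }
}
print(torchair_option(config, "enable_cv_parallel", False))  # True
print(torchair_option(None, "enable_cv_parallel", False))    # False
```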
Signed-off-by: David9857 <985700846@qq.com>

use additional_config to enable cv parallel

rename kwargs1 in fused_experts_with_mc2
This pull request has conflicts, please resolve those before we can evaluate the pull request.
```python
self.gate.e_score_correction_bias = None
```

```python
self.enable_cv_parallel = False
additional_config = get_current_vllm_config().additional_config
```
Please use ascend_config instead now. Note that the doc should be updated at the same time.
Signed-off-by: David9857 <985700846@qq.com>
Signed-off-by: David9857 <985700846@qq.com>

bugfix
### What this PR does / why we need it?

Support MOE inner Multi-stream for Deepseek.

This feature requires graph mode with mc2 enabled.
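Conceptually, the multi-stream optimization overlaps the shared-expert computation with the routed-expert path and then combines the results. A toy sketch of that overlap, using threads in place of device streams (all names and the placeholder math are illustrative, not the PR's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def shared_expert(x):
    # Placeholder for the shared-expert compute.
    return [v * 2 for v in x]

def routed_experts(x):
    # Placeholder for the routed-expert dispatch + compute.
    return [v + 1 for v in x]

def moe_forward(x):
    # Launch both branches concurrently ("two streams"), then sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        shared = pool.submit(shared_expert, x)
        routed = pool.submit(routed_experts, x)
        return [a + b for a, b in zip(shared.result(), routed.result())]

print(moe_forward([1, 2, 3]))  # [4, 7, 10]
```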
### Does this PR introduce any user-facing change?

### How was this patch tested?