[CI] Add multi-nodes longseq configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 #5381
Signed-off-by: daishixun <dsxsteven@sina.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces new multi-node long-sequence test configurations for DeepSeek-R1-W8A8 and Qwen3-235B-W8A8. The changes are straightforward additions of YAML configuration files. My review has identified two potential issues: one is a likely incorrect engine_id in the DeepSeek configuration, and the other is an incomplete benchmarks section in the Qwen3 configuration. Both issues could lead to incorrect or ineffective testing and should be addressed.
'{"kv_connector": "MooncakeConnectorV1",
"kv_role": "kv_consumer",
"kv_port": "30200",
"engine_id": "2",
The engine_id for the consumer node is set to "2", which appears to be incorrect for a 2-node setup. In a producer-consumer configuration, engine IDs are typically sequential, starting from "0". The accompanying Qwen3-235B-W8A8-longseq.yaml config uses "0" and "1", which is the expected pattern. Please correct this to "1" to ensure proper node communication.
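The sequential-ID expectation described in this comment can be checked mechanically. The following is a minimal sketch and not part of the PR: the helper name `engine_ids_are_sequential` is hypothetical, and the producer entry (including its `engine_id` of "0") is assumed for illustration. Only the consumer fields are taken from the diff.

```python
import json


def engine_ids_are_sequential(kv_transfer_configs):
    """Return True if the engine_id values are exactly 0, 1, ..., n-1 (any order)."""
    ids = sorted(int(json.loads(cfg)["engine_id"]) for cfg in kv_transfer_configs)
    return ids == list(range(len(ids)))


# Assumed producer entry; the real producer config is not shown in this thread.
producer = (
    '{"kv_connector": "MooncakeConnectorV1", '
    '"kv_role": "kv_producer", "engine_id": "0"}'
)

# Consumer fields as they appear in the reviewed diff.
consumer_as_submitted = (
    '{"kv_connector": "MooncakeConnectorV1", "kv_role": "kv_consumer", '
    '"kv_port": "30200", "engine_id": "2"}'
)

# Same consumer with the reviewer's suggested value.
consumer_as_suggested = consumer_as_submitted.replace(
    '"engine_id": "2"', '"engine_id": "1"'
)

print(engine_ids_are_sequential([producer, consumer_as_submitted]))  # False
print(engine_ids_are_sequential([producer, consumer_as_suggested]))  # True
```

A check like this only tests the numbering pattern the reviewer describes. It does not establish whether the connector itself requires sequential IDs.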
Suggested change:
"engine_id": "1",

}
}
}'
benchmarks:
--max-num-seqs 4
--max-model-len 32768
--max-num-batched-tokens 16384
--trust-remote-code
Follow-up test cases covering TP asymmetry still need to be added.
--trust-remote-code
--no-enable-prefix-caching
--gpu-memory-utilization 0.9
--compilation_config '{"cudagraph_capture_sizes":[1,2,4,8,16,32], "cudagraph_mode": "FULL_DECODE_ONLY"}'
Why do we need to specify "cudagraph_capture_sizes":[1,2,4,8,16,32] here?
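To see which values the question refers to, the flag's JSON can be parsed and its capture sizes compared with a sequence limit. This is a sketch only: the helper `capture_sizes_above` is hypothetical, and pairing this flag with `--max-num-seqs 4` (which appears in a different hunk of this PR) is an assumption, not something the thread confirms.

```python
import json


def capture_sizes_above(compilation_config, max_num_seqs):
    """List the capture sizes in the JSON config that exceed max_num_seqs."""
    cfg = json.loads(compilation_config)
    return [s for s in cfg.get("cudagraph_capture_sizes", []) if s > max_num_seqs]


# Value passed to --compilation_config in the reviewed hunk.
config = (
    '{"cudagraph_capture_sizes":[1,2,4,8,16,32], '
    '"cudagraph_mode": "FULL_DECODE_ONLY"}'
)

# Assumed limit, borrowed from the --max-num-seqs 4 line elsewhere in the PR.
print(capture_sizes_above(config, 4))  # [8, 16, 32]
```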
…ven/vllm-ascend_dsx into 12_26_add_longseq_nightly
…-W8A8 (vllm-project#5381) ### What this PR does / why we need it? add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs in multi-nodes and longseq (PCP&DCP) scenario - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@bc0a5a0 --------- Signed-off-by: daishixun <dsxsteven@sina.com>
…-W8A8 (vllm-project#5381) ### What this PR does / why we need it? add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs in multi-nodes and longseq (PCP&DCP) scenario - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@bc0a5a0 --------- Signed-off-by: daishixun <dsxsteven@sina.com> Signed-off-by: wjunLu <wjunlu217@gmail.com>
…-W8A8 (vllm-project#5381) ### What this PR does / why we need it? add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs in multi-nodes and longseq (PCP&DCP) scenario - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@bc0a5a0 --------- Signed-off-by: daishixun <dsxsteven@sina.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

What this PR does / why we need it?
Add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs for the multi-node, long-sequence (PCP & DCP) scenario.
Does this PR introduce any user-facing change?
NO
How was this patch tested?