[Doc] Update DeepSeek V3.1/R1 2P1D doc#5387
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request updates the documentation for DeepSeek V3.1 and R1 models, primarily focusing on the 2P1D deployment configurations. The changes involve updating command-line parameters, replacing an embedded script with a link for better maintainability, and removing hardcoded version numbers. My review identified a critical issue in the data parallelism configuration for prefill nodes in DeepSeek-V3.1.md. The provided commands are inconsistent and would result in a non-functional deployment. I have included a detailed comment with a suggested fix for this issue. The other documentation changes appear correct and improve clarity.
```diff
  python launch_online_dp.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.1 --dp-rpc-port 12321 --vllm-start-port 7100
  # p1
- python launch_dp_program.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.2 --dp-rpc-port 12321 --vllm-start-port 7100
+ python launch_online_dp.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.2 --dp-rpc-port 12321 --vllm-start-port 7100
```
The data parallelism configuration for the prefill nodes (p0, p1) appears to be incorrect and will likely lead to errors.
Here are the inconsistencies:

- `dp-size` vs. number of workers: `dp-size` is set to 2, but with `dp-size-local=2` on two nodes (p0 and p1), you are attempting to launch 4 workers in total. The total number of workers should match `dp-size`.
- Rank conflict: `dp-rank-start` is 0 for both nodes. This will cause data parallel rank conflicts, as both nodes will try to launch workers with ranks 0 and 1.
- DP master address: `dp-address` is different for p0 and p1. For a single data parallel group, all workers should point to the same master address.
- Connector config: the `kv-transfer-config` in the `run_dp_template.sh` for prefill nodes specifies `"dp_size": 2`, which is inconsistent with launching 4 workers.
To fix this for a 2-node, 4-worker prefill setup, you should:

- Use `dp-size=4`.
- Assign unique ranks (e.g., `dp-rank-start=0` for p0, `dp-rank-start=2` for p1).
- Use a single master `dp-address`.
- Update the `dp_size` to 4 in the `kv-transfer-config` in the `run_dp_template.sh` scripts.
I've provided a code suggestion to correct the launch commands, assuming 141.xx.xx.1 is the master. You will also need to update the dp_size to 4 in the prefill section of kv_connector_extra_config within the run_dp_template.sh scripts for both prefill nodes.
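The rank arithmetic behind these fixes can be sketched as follows. This is an illustrative snippet, not part of the repository's scripts; the variable names are hypothetical, and it assumes ranks are assigned contiguously per node:

```shell
# Total DP workers must equal nodes * workers-per-node,
# and each node's starting rank is its node index * workers-per-node.
DP_SIZE_LOCAL=2   # workers launched per prefill node
NUM_NODES=2       # prefill nodes p0 and p1

DP_SIZE=$((NUM_NODES * DP_SIZE_LOCAL))        # --dp-size for every node
P0_RANK_START=$((0 * DP_SIZE_LOCAL))          # --dp-rank-start on p0
P1_RANK_START=$((1 * DP_SIZE_LOCAL))          # --dp-rank-start on p1

echo "dp-size=$DP_SIZE p0-rank-start=$P0_RANK_START p1-rank-start=$P1_RANK_START"
```

This yields `dp-size=4`, `dp-rank-start=0` for p0, and `dp-rank-start=2` for p1, matching the suggested commands.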
Suggested change:

```diff
- python launch_online_dp.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.1 --dp-rpc-port 12321 --vllm-start-port 7100
- # p1
- python launch_dp_program.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.2 --dp-rpc-port 12321 --vllm-start-port 7100
- python launch_online_dp.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.2 --dp-rpc-port 12321 --vllm-start-port 7100
+ python launch_online_dp.py --dp-size 4 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address 141.xx.xx.1 --dp-rpc-port 12321 --vllm-start-port 7100
+ # p1
+ python launch_online_dp.py --dp-size 4 --tp-size 8 --dp-size-local 2 --dp-rank-start 2 --dp-address 141.xx.xx.1 --dp-rpc-port 12321 --vllm-start-port 7100
```
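The review also calls for the matching `dp_size` update in `run_dp_template.sh`. A hypothetical excerpt of that change is sketched below; the surrounding fields and connector name are assumptions, since the real script's full `--kv-transfer-config` JSON is not shown here. Only the `dp_size` value (2 → 4) is the point:

```shell
# Hypothetical prefill-node excerpt from run_dp_template.sh; field layout is
# illustrative. dp_size must match the total number of prefill DP workers (4).
--kv-transfer-config '{
  "kv_connector_extra_config": {
    "prefill": {"dp_size": 4, "tp_size": 8}
  }
}'
```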
…to eplb_refactor
* 'main' of https://github.com/vllm-project/vllm-ascend: (46 commits)
  - [Feature] Support to use fullgraph with eagle (vllm-project#5118)
  - [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy(depend on pr5285) (vllm-project#5311)
  - [Refactor]6/N Extract common code of class AscendMLAImpl (vllm-project#5314)
  - [Refactor] cache cos/sin in mla & remove parameter model in builder. (vllm-project#5277)
  - update vllm pin to 12.27 (vllm-project#5412)
  - [ReleaseNote] Add release note for v0.13.0rc1 (vllm-project#5334)
  - [Bugfix] Correctly handle the output shape in multimodal attention (vllm-project#5443)
  - Fix nightly (vllm-project#5413)
  - [bugfix] fix typo of _skip_all_reduce_across_dp_group (vllm-project#5435)
  - [Doc]modify pcp tutorial doc (vllm-project#5440)
  - [Misc] fast fail for exiting if tools/install_flash_infer_attention_score_ops_a2.sh (vllm-project#5422)
  - [Doc] Update DeepSeek V3.1/R1 2P1D doc (vllm-project#5387)
  - [DOC]Fix model weight download links (vllm-project#5436)
  - [Doc] Modify DeepSeek-R1/V3.1 documentation (vllm-project#5426)
  - Revert "[feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5300)" (vllm-project#5434)
  - [Bugfix] fix greedy temperature detection (vllm-project#5417)
  - [doc] Update Qwen3-235B doc for reproducing latest performance (vllm-project#5323)
  - [feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5300)
  - [Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (vllm-project#5419)
  - [Doc] add long_sequence feature user guide (vllm-project#5343)
  - ...
### What this PR does / why we need it?

This PR updates the documentation for DeepSeek-V3.1 and DeepSeek-R1 in the prefill-decode disaggregation scenario, updating PD-disaggregation-related parameters and optimal configurations. The script has been verified.

- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@bc0a5a0

Signed-off-by: chenmenglong <chenmenglong1@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>