[Doc] Refactor the DeepSeek-V3.1 tutorial. #4399

MengqingCao merged 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a comprehensive tutorial for deploying the DeepSeek-V3.1 model. While the document covers various deployment scenarios, I've found several critical errors in the provided code snippets and configurations, particularly for multi-node and prefill-decode disaggregation setups. These issues, including Python syntax errors, incorrect data parallel configurations, and inconsistent model naming, would likely prevent users from successfully following the instructions. My review provides specific corrections to address these critical problems and improve the tutorial's accuracy and usability.
```bash
local_ip="xxxx"

# [Optional] jemalloc
# if `libjemalloc.so` is installed on your machine, you can turn it on.
```
jemalloc is here for better performance; please add some description, otherwise readers may be a little confused. Thanks.
I have added a description: "jemalloc is for better performance; if `libjemalloc.so` is installed on your machine, you can turn it on."
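For readers who want to try it, a minimal sketch of enabling jemalloc, assuming the standard `LD_PRELOAD` approach; the library path below is an assumption that varies by distro, so check where `libjemalloc.so` lives on your machine:

```bash
# Assumed path; verify with `ldconfig -p | grep jemalloc` on your system.
JEMALLOC_SO=/usr/lib/aarch64-linux-gnu/libjemalloc.so
if [ -f "$JEMALLOC_SO" ]; then
    # Preload jemalloc so the vllm process uses it as its allocator.
    export LD_PRELOAD="$JEMALLOC_SO${LD_PRELOAD:+:$LD_PRELOAD}"
fi
```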
### Model Weight

- `DeepSeek-V3.1` (BF16 version): [Download model weight](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1)
- `DeepSeek-V3.1-w8a8` (quantized version): [Download model weight](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8). Note: modify `torch_dtype` from `float16` to `bfloat16` in `config.json`.
- Quantization method: [DeepSeek-V3.1 W8A8+MTP](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)
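As a concrete sketch of the `torch_dtype` note above (the weight directory is a hypothetical local download path):

```bash
MODEL_DIR=/path/to/DeepSeek-V3.1-w8a8  # wherever you downloaded the weights
# Switch torch_dtype from float16 to bfloat16, as the note above requires.
sed -i 's/"torch_dtype": "float16"/"torch_dtype": "bfloat16"/' "$MODEL_DIR/config.json"
```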
DeepSeek-V3.1 W8A8+MTP does not seem to have an available download URL. It would be better to upload it to ModelScope or another platform, since you mention DeepSeek-V3.1 W8A8+MTP below.
OK, we don't have MTP weights on ModelScope, so I put the quantization method here; maybe I should add more details.
```bash
export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
export DISABLE_L2_CACHE=1

vllm serve vllm-ascend/DeepSeek-V3.1_w8a8mix_mtp \
```
In fact, if you use xxx/xxx as a model name, vLLM will search for it on Hugging Face (or, if you set VLLM_USE_MODELSCOPE, on ModelScope). The vllm-ascend/xxx form usually indicates one of the models we publish on ModelScope under vllm-ascend, so it's better to change this to a local path.
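A minimal sketch of the suggested fix, assuming the weights have already been downloaded to a local directory; the path and served model name below are illustrative, not part of the tutorial:

```bash
# Serving from a local path avoids any lookup on Hugging Face or ModelScope.
MODEL_DIR=/data/models/DeepSeek-V3.1_w8a8mix_mtp  # hypothetical location
vllm serve "$MODEL_DIR" \
    --served-model-name DeepSeek-V3.1
```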
```bash
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=200
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export VLLM_ASCEND_ENABLE_MLAPO=1
```
@wangxiyuan Is VLLM_ASCEND_ENABLE_MLAPO=1 also needed for DeepSeek-V3.1? I am not sure it is OK here, since I remember it caused some issues with DeepSeek-V3.2-Exp in 0.11.0rc1.
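Until that question is settled, a conservative sketch of the environment block with the uncertain flag left off; this reflects the concern above, not a confirmed recommendation:

```bash
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=200
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
# Unverified for DeepSeek-V3.1: this reportedly caused issues with
# DeepSeek-V3.2-Exp on 0.11.0rc1, so keep it disabled until confirmed.
# export VLLM_ASCEND_ENABLE_MLAPO=1
```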
```bash
    --gpu-memory-utilization 0.92 \
    --speculative-config '{"num_speculative_tokens": 1, "method": "deepseek_mtp"}' \
    --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
    --additional-config '{"ascend_scheduler_config":{"enabled":false},"torchair_graph_config":{"enabled":false}}'
```
The Ascend scheduler is about to be dropped on main. Refer to #4498.
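Since the option is going away, a sketch of the same invocation with `ascend_scheduler_config` dropped; the model path placeholder is illustrative:

```bash
vllm serve /path/to/DeepSeek-V3.1_w8a8mix_mtp \
    --gpu-memory-utilization 0.92 \
    --speculative-config '{"num_speculative_tokens": 1, "method": "deepseek_mtp"}' \
    --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
    --additional-config '{"torchair_graph_config":{"enabled":false}}'
```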
LGTM, thanks for your contribution!
### What this PR does / why we need it?

Refactor the DeepSeek-V3.1 tutorial.

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

### Does this PR introduce any user-facing change?

### How was this patch tested?

Signed-off-by: 1092626063 <1092626063@qq.com>