Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md #7380
Summary of Changes
This pull request introduces comprehensive documentation for the DeepSeek-OCR-2 model, aiming to streamline its deployment and evaluation for users. The new guide covers everything from environment setup and model weight acquisition to single-node deployment, functional verification, and performance benchmarking, ensuring users have all the necessary information to effectively utilize the model.
Code Review
This PR adds documentation for the DeepSeek-OCR-2 model. I've found several critical issues: the example commands use incorrect model paths and will fail, the accuracy section contains misleading results from a different model, and the model is missing from the supported models list linked in the document. I've provided specific comments with fixes. Also, please update the PR title to follow the repository's style guide, for example: [Doc][Feature] Add documentation for DeepSeekOCR2.
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1

vllm serve /weights/DeepSeek-OCR-2 \
The model path /weights/DeepSeek-OCR-2 seems incorrect. The documentation instructs users to download the model to /root/.cache and the provided docker run command mounts the host's /root/.cache directory to /root/.cache inside the container. However, no volume is mounted to /weights. This command will fail. Please update the path to /root/.cache/DeepSeek-OCR-2 to match the download and volume mount instructions.
Suggested change:
- vllm serve /weights/DeepSeek-OCR-2 \
+ vllm serve /root/.cache/DeepSeek-OCR-2 \
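The reasoning in this comment (a path is only usable inside the container if it sits under a mounted directory) can be sketched as a small check. This is an illustrative sketch, not part of the PR: the helper name `is_visible_in_container` is hypothetical, and the mount list is taken from the review comment rather than from the actual `docker run` command. A path outside every mount could still exist if it is baked into the image, so a `False` result is a warning sign rather than proof of failure.

```python
from pathlib import PurePosixPath


def is_visible_in_container(model_path: str, container_mounts: list[str]) -> bool:
    """Return True if model_path is at or below one of the container mount targets.

    Hypothetical helper for illustration only.
    """
    path = PurePosixPath(model_path)
    return any(path.is_relative_to(PurePosixPath(mount)) for mount in container_mounts)


# Mount target as described in the review comment:
# host /root/.cache is mounted to /root/.cache inside the container.
mounts = ["/root/.cache"]

print(is_visible_in_container("/weights/DeepSeek-OCR-2", mounts))
print(is_visible_in_container("/root/.cache/DeepSeek-OCR-2", mounts))
```

The first path is not covered by any mount, while the second one is, which matches the reviewer's suggested fix.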
Take the `serve` as an example. Run the code as follows.

```shell
vllm bench serve --model /weights/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
```
The model path /weights/DeepSeek-OCR-2 is incorrect, similar to the vllm serve command earlier. The path should point to where the model is located inside the container, which is /root/.cache/DeepSeek-OCR-2 based on the instructions.
Suggested change:
- vllm bench serve --model /weights/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
+ vllm bench serve --model /root/.cache/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
## Supported Features

Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
The documentation for DeepSeek-OCR-2 is being added, but the model is not listed in the supported_models.md file this line links to. To avoid confusion and keep the documentation consistent, please add DeepSeek-OCR-2 to the supported models table in docs/source/user_guide/support_matrix/supported_models.md.
1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result, here is the result of `Kimi-K2.5-w8a8-mtp-QuaRot` in `vllm-ascend:0.11.0rc1` for reference only.
This section provides accuracy results for Kimi-K2.5-w8a8-mtp-QuaRot, but this document is for DeepSeek-OCR-2. This is misleading and likely a copy-paste error. Please provide the accuracy results for DeepSeek-OCR-2 or clarify that these are just example results if the actual ones are not available.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
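"""
NPU优化的4D Mask生成 - 向量化并行实现,替代原始循环实现
"""
(English translation of the docstring: "NPU-optimized 4D mask generation: a vectorized, parallel implementation that replaces the original loop-based implementation.")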
Please use English comments here.
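The docstring under review describes replacing a loop-based 4D mask builder with a vectorized one. The PR's actual masking rule is not shown in this conversation, so the sketch below uses a plain causal rule as a stand-in, with NumPy instead of NPU tensors. The function names are hypothetical; the point is only that broadcasting two index vectors produces the same mask as the nested loops.

```python
import numpy as np


def causal_mask_loop(batch: int, seq_len: int) -> np.ndarray:
    """Loop-based reference: entry [b, 0, q, k] is True when query q may attend to key k."""
    mask = np.zeros((batch, 1, seq_len, seq_len), dtype=bool)
    for b in range(batch):
        for q in range(seq_len):
            for k in range(seq_len):
                mask[b, 0, q, k] = k <= q
    return mask


def causal_mask_vectorized(batch: int, seq_len: int) -> np.ndarray:
    """Vectorized version: compare a row vector of key positions with a column of query positions."""
    pos = np.arange(seq_len)
    allowed = pos[None, :] <= pos[:, None]  # shape (seq_len, seq_len); [q, k] is k <= q
    # Broadcast over the batch and head axes, then copy to get a writable array.
    return np.broadcast_to(allowed, (batch, 1, seq_len, seq_len)).copy()


same = np.array_equal(causal_mask_loop(2, 5), causal_mask_vectorized(2, 5))
print(same)
```

The vectorized form does one comparison over whole arrays instead of `batch * seq_len * seq_len` Python-level assignments, which is the general idea behind the kind of speedup reported for this PR.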
pre_tokens=2147483647,
next_tokens=2147483647,
Please use a named constant instead; let's avoid the magic number.
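One way to address this comment is sketched below. The value 2147483647 is the largest signed 32-bit integer (2**31 - 1); the constant name `INT32_MAX` and the helper `full_window_kwargs` are hypothetical, and the sketch assumes the value is meant as "no limit" for both arguments, which this conversation does not state explicitly.

```python
# 2147483647 is the largest signed 32-bit integer (2**31 - 1).
# The name is an assumption for illustration; the PR may choose a different one.
INT32_MAX = 2**31 - 1


def full_window_kwargs() -> dict[str, int]:
    """Hypothetical helper: build the two keyword arguments from the named constant."""
    return {"pre_tokens": INT32_MAX, "next_tokens": INT32_MAX}


print(full_window_kwargs())
```

Defining the constant once at module level documents the intent and keeps the two call-site values from drifting apart.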
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
…r and add doc for DeepSeekOCR2.md (vllm-project#7380)

### What this PR does / why we need it?
Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vllm 0.19.0
- vllm-ascend main

1. `_create_custom_4d_mask` took 141ms49us620ns; its replacement `_create_npu_optimized_mask` takes 1ms227us780ns
2. conv2d: 27ms --> matmul: <1ms
3. RelPosAttention: sdpa --> prompt_flash_attention

---------

Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
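The commit message reports replacing a conv2d with a matmul but does not say which convolution was changed or how. As a sketch under a stated assumption (a non-overlapping convolution whose kernel size equals its stride, with no padding, as in a typical patch embedding), the two operations are numerically equivalent. The NumPy code below checks that equivalence; the function names are hypothetical and this is not the PR's implementation.

```python
import numpy as np


def conv2d_nonoverlap_loop(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Reference conv with kernel k, stride k, no padding. x: (C, H, W), w: (O, C, k, k)."""
    out_ch, _, k, _ = w.shape
    _, height, width = x.shape
    out = np.zeros((out_ch, height // k, width // k))
    for o in range(out_ch):
        for i in range(height // k):
            for j in range(width // k):
                patch = x[:, i * k:(i + 1) * k, j * k:(j + 1) * k]
                out[o, i, j] = np.sum(patch * w[o])
    return out


def conv2d_nonoverlap_matmul(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Same result as a single matrix multiply over flattened patches."""
    out_ch, in_ch, k, _ = w.shape
    _, height, width = x.shape
    grid_h, grid_w = height // k, width // k
    # Split H and W into (grid, k) pairs, then move the grid axes to the front.
    patches = (
        x.reshape(in_ch, grid_h, k, grid_w, k)
        .transpose(1, 3, 0, 2, 4)
        .reshape(grid_h * grid_w, in_ch * k * k)
    )
    flat_w = w.reshape(out_ch, in_ch * k * k)
    out = patches @ flat_w.T  # shape (grid_h * grid_w, out_ch)
    return out.T.reshape(out_ch, grid_h, grid_w)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((4, 3, 2, 2))
print(np.allclose(conv2d_nonoverlap_loop(x, w), conv2d_nonoverlap_matmul(x, w)))
```

The sketch requires the input height and width to be divisible by the kernel size. Whether the matmul form is faster depends on the hardware and kernels involved; the timings quoted in the commit message come from the PR author, not from this sketch.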
…r and add doc for DeepSeekOCR2.md (vllm-project#7380) ### What this PR does / why we need it? Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vllm 0.19.0 - vllm-ascend main 1. _create_custom_4d_mask during 141ms49us620ns --> _create_npu_optimized_mask during 1ms227us780ns 2. convd2d : 27ms --> matmul <1ms 3. relposattention:sdpa->prompt_flash_attention --------- Signed-off-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: Wangbei25 <wangbei41@huawei.com> Co-authored-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: 01267596 <xiongkai123@cmbchina.com>
…r and add doc for DeepSeekOCR2.md (vllm-project#7380) ### What this PR does / why we need it? Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vllm 0.19.0 - vllm-ascend main 1. _create_custom_4d_mask during 141ms49us620ns --> _create_npu_optimized_mask during 1ms227us780ns 2. convd2d : 27ms --> matmul <1ms 3. relposattention:sdpa->prompt_flash_attention --------- Signed-off-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: Wangbei25 <wangbei41@huawei.com> Co-authored-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: cvSoldier <610496306@qq.com>
What this PR does / why we need it?
Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md
Does this PR introduce any user-facing change?
How was this patch tested?
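The commit messages in this thread quote the mask-generation timings as `141ms49us620ns` before and `1ms227us780ns` after. A short sketch for turning those strings into a speedup factor follows; the parser `parse_duration_ns` is a hypothetical helper written for this duration format only.

```python
import re

# Nanoseconds per unit for the three units that appear in the quoted timings.
_UNIT_TO_NS = {"ms": 1_000_000, "us": 1_000, "ns": 1}


def parse_duration_ns(text: str) -> int:
    """Parse strings such as '141ms49us620ns' into a nanosecond count."""
    parts = re.findall(r"(\d+)(ms|us|ns)", text)
    return sum(int(value) * _UNIT_TO_NS[unit] for value, unit in parts)


before_ns = parse_duration_ns("141ms49us620ns")
after_ns = parse_duration_ns("1ms227us780ns")
speedup = before_ns / after_ns
print(f"{before_ns} ns -> {after_ns} ns, about {speedup:.1f}x faster")
```

The ratio describes only the mask-generation step as measured by the PR author; it says nothing about end-to-end latency.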