
Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md #7380

Merged
wangxiyuan merged 43 commits into vllm-project:main from Wangbei25:main
Mar 28, 2026

Conversation

Collaborator

@Wangbei25 Wangbei25 commented Mar 17, 2026

What this PR does / why we need it?

Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md

Does this PR introduce any user-facing change?

How was this patch tested?

  • vllm 0.19.0
  • vllm-ascend main
  1. `_create_custom_4d_mask`: 141.05 ms --> `_create_npu_optimized_mask`: 1.23 ms
  2. conv2d: 27 ms --> matmul: <1 ms
  3. RelPosAttention: SDPA --> prompt_flash_attention
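The first optimization above, replacing a per-element Python loop with vectorized mask construction, can be sketched as follows. This is a hypothetical NumPy illustration of the technique, not the PR's actual code: the real implementation presumably builds torch tensors on the NPU, and the function names and the exact mask semantics (a per-sequence block mask) are assumptions.

```python
import numpy as np

def create_custom_4d_mask_loop(seq_lens, max_len):
    """Loop-style baseline (assumed): fill a (batch, 1, L, L) boolean
    attention mask one element at a time. Slow for large L."""
    batch = len(seq_lens)
    mask = np.zeros((batch, 1, max_len, max_len), dtype=bool)
    for b, n in enumerate(seq_lens):
        for i in range(n):
            for j in range(n):
                mask[b, 0, i, j] = True
    return mask

def create_npu_optimized_mask(seq_lens, max_len):
    """Vectorized equivalent: one broadcasted outer product per batch,
    no Python-level loops."""
    pos = np.arange(max_len)
    # valid[b, i] is True when position i is inside sequence b.
    valid = pos[None, :] < np.asarray(seq_lens)[:, None]
    # Outer product via broadcasting gives the (batch, 1, L, L) mask.
    return valid[:, None, :, None] & valid[:, None, None, :]
```

Both functions produce identical masks; the vectorized one pushes the work into array kernels, which is the kind of change behind the ~100x speedup reported above.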

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request introduces comprehensive documentation for the DeepSeek-OCR-2 model, aiming to streamline its deployment and evaluation for users. The new guide covers everything from environment setup and model weight acquisition to single-node deployment, functional verification, and performance benchmarking, ensuring users have all the necessary information to effectively utilize the model.

Highlights

  • New Model Documentation: Added comprehensive documentation for the DeepSeek-OCR-2 model, detailing its introduction, supported features, and usage within the vllm-ascend framework.
  • Environment Setup: Provided detailed instructions for environment preparation, including model weight download from Hugging Face and setting up the vllm-ascend Docker image.
  • Deployment and Verification: Outlined single-node deployment steps with a sample inference script, along with methods for functional verification, accuracy evaluation using AISBench, and performance benchmarking using both AISBench and vLLM Benchmark.

Changelog
  • docs/source/tutorials/models/DeepSeekOCR2.md
    • Added new documentation for the DeepSeek-OCR-2 model.
Activity
  • No human activity (comments, reviews) has been recorded yet for this pull request.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR adds documentation for the DeepSeek-OCR-2 model. I've found several critical issues: the example commands use incorrect model paths and will fail, the accuracy section contains misleading results from a different model, and the model is missing from the supported models list linked in the document. I've provided specific comments with fixes. Also, please update the PR title to follow the repository's style guide, for example: [Doc][Feature] Add documentation for DeepSeekOCR2.

```shell
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1

vllm serve /weights/DeepSeek-OCR-2 \
```

critical

The model path /weights/DeepSeek-OCR-2 seems incorrect. The documentation instructs users to download the model to /root/.cache and the provided docker run command mounts the host's /root/.cache directory to /root/.cache inside the container. However, no volume is mounted to /weights. This command will fail. Please update the path to /root/.cache/DeepSeek-OCR-2 to match the download and volume mount instructions.

Suggested change

```diff
-vllm serve /weights/DeepSeek-OCR-2 \
+vllm serve /root/.cache/DeepSeek-OCR-2 \
```
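The reviewer's point is that the model path passed to `vllm serve` must match where the docker volume mount puts the weights inside the container. A minimal shell sketch of keeping the two consistent; the paths come from the review comments, and the image name is a placeholder, not the real one:

```shell
# Use one variable for the cache dir so the volume mount and the serve
# path cannot drift apart. Paths per the review; image is a placeholder.
HOST_CACHE=/root/.cache
MODEL_PATH="${HOST_CACHE}/DeepSeek-OCR-2"

# Mount the host cache into the container at the same path...
echo "docker run -v ${HOST_CACHE}:${HOST_CACHE} <vllm-ascend-image>"
# ...so the serve command inside the container resolves the weights.
echo "vllm serve ${MODEL_PATH}"
```

With this pattern, a path like `/weights/DeepSeek-OCR-2` could not appear unless `/weights` were actually mounted, which is exactly the bug the review flags.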

Take `serve` as an example. Run the command as follows.

```shell
vllm bench serve --model /weights/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
```

critical

The model path /weights/DeepSeek-OCR-2 is incorrect, similar to the vllm serve command earlier. The path should point to where the model is located inside the container, which is /root/.cache/DeepSeek-OCR-2 based on the instructions.

Suggested change

```diff
-vllm bench serve --model /weights/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
+vllm bench serve --model /root/.cache/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
```


## Supported Features

Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

high

The documentation for DeepSeek-OCR-2 is being added, but the model is not listed in the supported_models.md file this line links to. To avoid confusion and keep the documentation consistent, please add DeepSeek-OCR-2 to the supported models table in docs/source/user_guide/support_matrix/supported_models.md.


1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result. Here is the result of `Kimi-K2.5-w8a8-mtp-QuaRot` on `vllm-ascend:0.11.0rc1`, for reference only.

high

This section provides accuracy results for Kimi-K2.5-w8a8-mtp-QuaRot, but this document is for DeepSeek-OCR-2. This is misleading and likely a copy-paste error. Please provide the accuracy results for DeepSeek-OCR-2 or clarify that these are just example results if the actual ones are not available.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 17, 2026
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@Wangbei25 Wangbei25 changed the title add doc for DeepSeekOCR2.md Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md Mar 23, 2026
Comment on lines +120 to +122
"""
NPU-optimized 4D mask generation: vectorized parallel implementation replacing the original loop-based implementation
"""
Collaborator

plz use english comment

Collaborator Author

done

Comment thread vllm_ascend/ops/rel_pos_attention.py Outdated
Comment on lines +53 to +54
pre_tokens=2147483647,
next_tokens=2147483647,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz use a constant variable instead, let's avoid the magic number
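A sketch of the suggested fix: give the magic number a name. The constant name and the helper below are illustrative, not the PR's actual code; the real kwargs are presumably passed to the NPU prompt flash attention call in `rel_pos_attention.py`.

```python
# 2147483647 is INT32_MAX. In this context (per the review thread) it is
# assumed to mean "no pre/next token window limit", so name it once
# instead of repeating the literal at every call site.
INT32_MAX = 2**31 - 1  # 2147483647

def rel_pos_attention_window_args():
    # Hypothetical helper collecting the window arguments in one place.
    return dict(pre_tokens=INT32_MAX, next_tokens=INT32_MAX)
```

Centralizing the sentinel also makes it obvious that both arguments are intended to be the same "unbounded" value, rather than two coincidentally equal numbers.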

Collaborator Author

done

@realliujiaxu realliujiaxu added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 25, 2026
Wangbei25 and others added 13 commits March 28, 2026 11:05
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
@wangxiyuan wangxiyuan merged commit 2c4eff5 into vllm-project:main Mar 28, 2026
39 checks passed
lihaokun-2026 pushed a commit to lihaokun-2026/vllm-ascend that referenced this pull request Mar 29, 2026
HF-001 pushed a commit to HF-001/vllm-ascend that referenced this pull request Mar 31, 2026
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Apr 1, 2026
cvSoldier pushed a commit to cvSoldier/vllm-ascend that referenced this pull request Apr 3, 2026

Labels

  • documentation: Improvements or additions to documentation
  • ready: read for review
  • ready-for-test: start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants