
Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md #7380

Merged
wangxiyuan merged 43 commits into vllm-project:main from Wangbei25:main
Mar 28, 2026

Conversation

Collaborator

@Wangbei25 Wangbei25 commented Mar 17, 2026

What this PR does / why we need it?

Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md

Does this PR introduce any user-facing change?

How was this patch tested?

  • vllm 0.19.0
  • vllm-ascend main
  1. `_create_custom_4d_mask`: 141.05 ms --> `_create_npu_optimized_mask`: 1.23 ms
  2. conv2d: 27 ms --> matmul: <1 ms
  3. RelPosAttention: SDPA --> prompt_flash_attention
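The first optimization above, replacing a per-element Python loop with vectorized mask construction, can be sketched as follows. This is a hypothetical NumPy illustration of the technique, not the PR's actual code: the real implementation presumably builds torch tensors on the NPU, and the function names and the exact mask semantics (a per-sequence block mask) are assumptions.

```python
import numpy as np

def create_custom_4d_mask_loop(seq_lens, max_len):
    """Loop-style baseline (assumed): fill a (batch, 1, L, L) boolean
    attention mask one element at a time. Slow for large L."""
    batch = len(seq_lens)
    mask = np.zeros((batch, 1, max_len, max_len), dtype=bool)
    for b, n in enumerate(seq_lens):
        for i in range(n):
            for j in range(n):
                mask[b, 0, i, j] = True
    return mask

def create_npu_optimized_mask(seq_lens, max_len):
    """Vectorized equivalent: one broadcasted outer product per batch,
    no Python-level loops."""
    pos = np.arange(max_len)
    # valid[b, i] is True when position i is inside sequence b.
    valid = pos[None, :] < np.asarray(seq_lens)[:, None]
    # Outer product via broadcasting gives the (batch, 1, L, L) mask.
    return valid[:, None, :, None] & valid[:, None, None, :]
```

Both functions produce identical masks; the vectorized one pushes the work into array kernels, which is the kind of change behind the ~100x speedup reported above.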

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request introduces comprehensive documentation for the DeepSeek-OCR-2 model, aiming to streamline its deployment and evaluation for users. The new guide covers everything from environment setup and model weight acquisition to single-node deployment, functional verification, and performance benchmarking, ensuring users have all the necessary information to effectively utilize the model.

Highlights

  • New Model Documentation: Added comprehensive documentation for the DeepSeek-OCR-2 model, detailing its introduction, supported features, and usage within the vllm-ascend framework.
  • Environment Setup: Provided detailed instructions for environment preparation, including model weight download from Hugging Face and setting up the vllm-ascend Docker image.
  • Deployment and Verification: Outlined single-node deployment steps with a sample inference script, along with methods for functional verification, accuracy evaluation using AISBench, and performance benchmarking using both AISBench and vLLM Benchmark.

Changelog
  • docs/source/tutorials/models/DeepSeekOCR2.md
    • Added new documentation for the DeepSeek-OCR-2 model.
Activity
  • No human activity (comments, reviews) has been recorded yet for this pull request.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR adds documentation for the DeepSeek-OCR-2 model. I've found several critical issues: the example commands use incorrect model paths and will fail, the accuracy section contains misleading results from a different model, and the model is missing from the supported models list linked in the document. I've provided specific comments with fixes. Also, please update the PR title to follow the repository's style guide, for example: [Doc][Feature] Add documentation for DeepSeekOCR2.

```shell
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1

vllm serve /weights/DeepSeek-OCR-2 \
```

critical

The model path /weights/DeepSeek-OCR-2 seems incorrect. The documentation instructs users to download the model to /root/.cache and the provided docker run command mounts the host's /root/.cache directory to /root/.cache inside the container. However, no volume is mounted to /weights. This command will fail. Please update the path to /root/.cache/DeepSeek-OCR-2 to match the download and volume mount instructions.

Suggested change

```diff
-vllm serve /weights/DeepSeek-OCR-2 \
+vllm serve /root/.cache/DeepSeek-OCR-2 \
```
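The reviewer's point is that the model path passed to `vllm serve` must match where the docker volume mount puts the weights inside the container. A minimal shell sketch of keeping the two consistent; the paths come from the review comments, and the image name is a placeholder, not the real one:

```shell
# Use one variable for the cache dir so the volume mount and the serve
# path cannot drift apart. Paths per the review; image is a placeholder.
HOST_CACHE=/root/.cache
MODEL_PATH="${HOST_CACHE}/DeepSeek-OCR-2"

# Mount the host cache into the container at the same path...
echo "docker run -v ${HOST_CACHE}:${HOST_CACHE} <vllm-ascend-image>"
# ...so the serve command inside the container resolves the weights.
echo "vllm serve ${MODEL_PATH}"
```

With this pattern, a path like `/weights/DeepSeek-OCR-2` could not appear unless `/weights` were actually mounted, which is exactly the bug the review flags.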

Take `serve` as an example. Run the command as follows.

```shell
vllm bench serve --model /weights/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
```

critical

The model path /weights/DeepSeek-OCR-2 is incorrect, similar to the vllm serve command earlier. The path should point to where the model is located inside the container, which is /root/.cache/DeepSeek-OCR-2 based on the instructions.

Suggested change

```diff
-vllm bench serve --model /weights/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
+vllm bench serve --model /root/.cache/DeepSeek-OCR-2 --dataset-name random --random-input 1024 --num-prompts 200 --request-rate 1 --save-result --result-dir ./
```


## Supported Features

Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

high

The documentation for DeepSeek-OCR-2 is being added, but the model is not listed in the supported_models.md file this line links to. To avoid confusion and keep the documentation consistent, please add DeepSeek-OCR-2 to the supported models table in docs/source/user_guide/support_matrix/supported_models.md.


1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the result. Here is the result of `Kimi-K2.5-w8a8-mtp-QuaRot` on `vllm-ascend:0.11.0rc1`, for reference only.

high

This section provides accuracy results for Kimi-K2.5-w8a8-mtp-QuaRot, but this document is for DeepSeek-OCR-2. This is misleading and likely a copy-paste error. Please provide the accuracy results for DeepSeek-OCR-2 or clarify that these are just example results if the actual ones are not available.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 17, 2026
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@Wangbei25 Wangbei25 changed the title add doc for DeepSeekOCR2.md Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md Mar 23, 2026
Comment on lines +120 to +122
"""
NPU-optimized 4D mask generation: vectorized parallel implementation replacing the original loop-based implementation
"""
Collaborator

plz use english comment

Collaborator Author

done

Comment thread vllm_ascend/ops/rel_pos_attention.py Outdated
Comment on lines +53 to +54
pre_tokens=2147483647,
next_tokens=2147483647,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz use a constant variable instead, let's avoid the magic number
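A sketch of the suggested fix: give the magic number a name. The constant name and the helper below are illustrative, not the PR's actual code; the real kwargs are presumably passed to the NPU prompt flash attention call in `rel_pos_attention.py`.

```python
# 2147483647 is INT32_MAX. In this context (per the review thread) it is
# assumed to mean "no pre/next token window limit", so name it once
# instead of repeating the literal at every call site.
INT32_MAX = 2**31 - 1  # 2147483647

def rel_pos_attention_window_args():
    # Hypothetical helper collecting the window arguments in one place.
    return dict(pre_tokens=INT32_MAX, next_tokens=INT32_MAX)
```

Centralizing the sentinel also makes it obvious that both arguments are intended to be the same "unbounded" value, rather than two coincidentally equal numbers.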

Collaborator Author

done

@realliujiaxu realliujiaxu added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 25, 2026
Wangbei25 and others added 13 commits March 28, 2026 11:05
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
@wangxiyuan wangxiyuan merged commit 2c4eff5 into vllm-project:main Mar 28, 2026
39 checks passed
lihaokun-2026 pushed a commit to lihaokun-2026/vllm-ascend that referenced this pull request Mar 29, 2026
HF-001 pushed a commit to HF-001/vllm-ascend that referenced this pull request Mar 31, 2026
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Apr 1, 2026
cvSoldier pushed a commit to cvSoldier/vllm-ascend that referenced this pull request Apr 3, 2026

Labels

  • documentation: Improvements or additions to documentation
  • ready: read for review
  • ready-for-test: start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants