Merged
171 changes: 171 additions & 0 deletions docs/source/tutorials/models/DeepSeekOCR2.md
@@ -0,0 +1,171 @@
# DeepSeek-OCR-2

## Introduction

DeepSeek-OCR-2 is a model that investigates the role of vision encoders from an LLM-centric viewpoint.

The `DeepSeek-OCR-2` model is first supported in `vllm-ascend:v0.16.0`.

This document describes the main verification steps for the model, including supported features, feature configuration, environment preparation, single-node deployment, and accuracy and performance evaluation.

## Supported Features

Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.

Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.

## Environment Preparation

### Model Weight

- `DeepSeek-OCR-2`: [Download model weight](https://huggingface.co/deepseek-ai/DeepSeek-OCR-2).

It is recommended to download the model weight to a directory shared across nodes, such as `/root/.cache/`.

### Verify Multi-node Communication (Optional)

If you want to deploy a multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).
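
Beyond the linked guide, a quick TCP reachability check between nodes can catch basic networking problems early. The sketch below is illustrative only (the host and port in a real check would be a peer node's IP and an open service port; the demo uses a local listener as a stand-in):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a local listener (a stand-in for a peer node's service port).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
print(can_connect("127.0.0.1", port))  # True
srv.close()
```

On a real cluster you would call `can_connect("<peer_node_ip>", <service_port>)` from each node in turn.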

### Installation

You can use our official docker image to run `DeepSeek-OCR-2` directly.

Select an image based on your machine type and start the docker image on your node, refer to [using docker](../../installation.md#set-up-using-docker).

```{code-block} bash
:substitutions:
# Update --device according to your device (Atlas A2: /dev/davinci[0-7], Atlas A3: /dev/davinci[0-15]).
# Update the vllm-ascend image according to your environment.
# Note: you should download the weight to /root/.cache in advance.
export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:|vllm_ascend_version|
export NAME=vllm-ascend

# Run the container using the defined variables
# Note: If you are running bridge network with docker, please expose available ports for multiple nodes communication in advance.
docker run --rm \
--name $NAME \
--net=host \
--shm-size=1g \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-it $IMAGE bash
```
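
Listing eight `--device` flags by hand is error-prone when moving between card counts. A small hypothetical helper (the function name is illustrative, not part of any official tooling) can generate them:

```shell
# Hypothetical helper: emit --device flags for the first N NPUs.
build_device_flags() {
  n=$1
  out=""
  for i in $(seq 0 $((n - 1))); do
    out="${out}--device /dev/davinci$i "
  done
  printf '%s' "$out"
}

build_device_flags 2
```

You could then write `docker run $(build_device_flags 8) ...` instead of repeating the flag eight times.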

If you want to deploy a multi-node environment, you need to set up the environment on each node.

## Deployment

### Single-node Deployment

- `DeepSeek-OCR-2` can be deployed on a single Atlas 800 A2 node.

Run the following script to execute online inference.

```shell
#!/bin/sh

export VLLM_USE_V1=1
export VLLM_ASCEND_ENABLE_NZ=0
export TOKENIZERS_PARALLELISM=false
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=1

vllm serve /root/.cache/DeepSeek-OCR-2 \
--served-model-name deepseekocr2 \
--trust-remote-code \
-tp 1 \
--port 1055 \
--max-model-len 8192 \
--no-enable-prefix-caching \
--gpu-memory-utilization 0.8 \
--allowed-local-media-path / \
--async-scheduling \
--additional-config '{
"enable_cpu_binding": true,
"multistream_overlap_shared_expert": true,
"ascend_compilation_config": {"fuse_qknorm_rope": false}
}' \
--mm-processor-cache-gb 0
```
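
The `--additional-config` value must parse as a single JSON object, and shell quoting mistakes are a common cause of startup failures. A quick standard-library check (a sketch for pre-validating the string, not part of vLLM itself):

```python
import json

# The same JSON object passed to --additional-config above.
additional_config = """
{
    "enable_cpu_binding": true,
    "multistream_overlap_shared_expert": true,
    "ascend_compilation_config": {"fuse_qknorm_rope": false}
}
"""

cfg = json.loads(additional_config)  # raises ValueError on malformed JSON
print(sorted(cfg.keys()))
```

If `json.loads` succeeds, the same string can be passed to the shell command unchanged (single-quoted, as in the script above).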

### Multi-node Deployment

Single-node deployment is recommended.

### Prefill-Decode Disaggregation

Prefill-Decode disaggregation is not needed for this model.

## Functional Verification

If your service starts successfully, you will see info like the following:

```bash
INFO: Started server process [87471]
INFO: Waiting for application startup.
INFO: Application startup complete.
```

Once your server is started, you can query the model with input prompts:

```shell
curl http://<node0_ip>:<port>/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseekocr2",
"prompt": "The future of AI is",
"max_completion_tokens": 50,
"temperature": 0
}'
```
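
The same query can be issued from Python using only the standard library. The helper below mirrors the curl payload; the host, port, and served model name must match your `vllm serve` flags, and the helper itself is a sketch rather than an official client:

```python
import json
import urllib.request

def build_completion_request(model: str, prompt: str,
                             max_completion_tokens: int = 50,
                             temperature: float = 0.0) -> bytes:
    """Serialize an OpenAI-style /v1/completions body, mirroring the curl example."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_completion_tokens": max_completion_tokens,
        "temperature": temperature,
    }).encode("utf-8")

body = build_completion_request("deepseekocr2", "The future of AI is")

# Sending the request requires a running server; <node0_ip> and <port> are placeholders:
# req = urllib.request.Request("http://<node0_ip>:<port>/v1/completions", data=body,
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```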

## Accuracy Evaluation

Here are two accuracy evaluation methods.

### Using AISBench

1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the results. Here are the results of `DeepSeek-OCR-2`, for reference only.

| dataset | version | metric | mode | vllm-api-general-chat | note |
|----- | ----- | ----- | ----- | -----| ----- |
| textvqa | - | accuracy | gen | 50.28 | 1 Atlas 800 A2 |
| omnidocbench | - | accuracy | gen | 66.86 | 1 Atlas 800 A2 |

### Using Language Model Evaluation Harness

Not tested yet.

## Performance

### Using AISBench

Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.

The performance result is:

**Hardware**: A2-313T, 1 node

**Input/Output**: 1080P/256

**Performance**: TTFT = 2 s, TPOT = 200 ms. The average per-card throughput is 864 TPS (tokens per second).
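
As a back-of-the-envelope consistency check, per-card throughput, TPOT, and concurrency are related by TPS ≈ concurrency / TPOT in a decode-bound steady state. A small sketch under that assumption:

```python
def estimate_concurrency(tps_per_card: float, tpot_s: float) -> float:
    """Invert TPS ~= concurrency / TPOT to estimate concurrent decode streams."""
    return tps_per_card * tpot_s

# With the reported figures (864 TPS per card, TPOT = 200 ms):
print(round(estimate_concurrency(864.0, 0.2)))  # -> 173
```

So the reported numbers correspond to roughly 173 concurrent decode streams per card.
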
1 change: 1 addition & 0 deletions docs/source/tutorials/models/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ Qwen3.5-397B-A17B.md
DeepSeek-V3.1.md
DeepSeek-V3.2.md
DeepSeek-R1.md
DeepSeekOCR2.md
GLM4.x.md
GLM5.md
Kimi-K2-Thinking.md
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ Get the latest info here: <https://github.com/vllm-project/vllm-ascend/issues/16
| Qwen2.5 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ |||||| [Qwen2.5-7B](../../tutorials/models/Qwen2.5-7B.md) |
| GLM-4.x | 🔵 | || A2/A3 |✅|✅|✅||✅|✅|✅||✅|✅|✅|✅|✅|198k||[GLM-4.x](../../tutorials/models/GLM4.x.md)|
| Kimi-K2-Thinking | 🔵 | || A2/A3 |||||||||||||||| [Kimi-K2-Thinking](../../tutorials/models/Kimi-K2-Thinking.md) |
| DeepSeekOCR2 | ✅ | | ✅ | A2/A3 ||✅||||✅|||||||||| [DeepSeekOCR2](../../tutorials/models/DeepSeekOCR2.md) |

#### Extended Compatible Models

Expand Down