Name	Name	Last commit message	Last commit date
parent directory ..
scripts	scripts
README.md	README.md
qwen2_5_vl_infer.py	qwen2_5_vl_infer.py

Qwen2.5-VL

1. 模型介绍

Qwen2.5-VL 是 Qwen 团队推出的一个专注于视觉与语言（Vision-Language, VL）任务的多模态大模型。它旨在通过结合图像和文本信息，提供强大的跨模态理解能力，可以处理涉及图像描述、视觉问答（VQA）、图文检索等多种任务。

Model
Qwen/Qwen2.5-VL-3B-Instruct
Qwen/Qwen2.5-VL-7B-Instruct
Qwen/Qwen2.5-VL-72B-Instruct

注意：与huggingface权重同名，但权重为paddle框架的Tensor，使用xxx.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")即可自动下载该权重文件夹到缓存目录。

2 环境准备

1）安装PaddlePaddle

python >= 3.10
paddlepaddle-gpu 要求develop版本

# Develop 版本安装示例
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/

2）安装PaddleMIX环境依赖包

# pip 安装示例，安装paddlemix、ppdiffusers、项目依赖
python -m pip install -e . --user
python -m pip install -e ppdiffusers --user
python -m pip install -r requirements.txt --user

# 安装PaddleNLP
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
python setup.py install
cd csrc
python setup_cuda.py install

3 高性能推理

a. fp16 高性能推理

cd PaddleMIX
export CUDA_VISIBLE_DEVICES=0
python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
    --model_name_or_path Qwen/Qwen2.5-VL-7B-Instruct \
    --question "Describe this image." \
    --image_file paddlemix/demo_images/examples_image1.jpg \
    --min_length 128 \
    --max_length 128 \
    --top_k 1 \
    --top_p 0.001 \
    --temperature 0.1 \
    --repetition_penalty 1.05 \
    --block_attn True \
    --inference_model True \
    --mode dynamic \
    --dtype bfloat16 \
    --benchmark True

b. wint8 高性能推理

export CUDA_VISIBLE_DEVICES=0
python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
    --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --question "Describe this image." \
    --image_file paddlemix/demo_images/examples_image1.jpg \
    --min_length 128 \
    --max_length 128 \
    --top_k 1 \
    --top_p 0.001 \
    --temperature 0.1 \
    --repetition_penalty 1.05 \
    --block_attn True \
    --inference_model True \
    --mode dynamic \
    --dtype bfloat16 \
    --quant_type "weight_only_int8" \
    --benchmark True

c. TP并行，多卡高性能推理

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus "0,1,2,3" deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
    --model_name_or_path Qwen/Qwen2.5-VL-72B-Instruct \
    --question "Describe this image." \
    --image_file paddlemix/demo_images/examples_image1.jpg \
    --min_length 128 \
    --max_length 128 \
    --top_k 1 \
    --top_p 0.001 \
    --temperature 0.1 \
    --repetition_penalty 1.05 \
    --block_attn True \
    --inference_model True \
    --mode dynamic \
    --append_attn 1 \
    --dtype bfloat16 \
    --benchmark True

4 一键推理 & 推理说明

cd PaddleMIX
sh deploy/qwen2_5_vl/scripts/qwen2_5_vl.sh

参数设定：默认情况下，使用model自带的generation_config.json中的参数。

parameter	Value
Top-K	1
Top-P	0.001
temperature	0.1
repetition_penalty	1.05

单一测试demo执行时，指定max_length=min_length=128，固定输出长度。

parameter	Value
min_length	128
min_length	128

在 NVIDIA A800-SXM4-80GB 上测试的性能如下：

下方表格中所示性能对应的输入输出大小。

parameter	Value
input_tokens_len	997 tokens
output_tokens_len	128 tokens

model	Paddle Inference wint8	Paddle Inference	PyTorch	VLLM
Qwen/Qwen2.5-VL-3B-Instruct	0.994 s	1.247 s	4.92 s	1.39s
Qwen/Qwen2.5-VL-7B-Instruct	1.244 s	1.768 s	3.89 s	1.92s
Qwen/Qwen2.5-VL-72B-Instruct	-	4.806 s	-	-

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qwen2_5_vl

qwen2_5_vl

README.md

Qwen2.5-VL

1. 模型介绍

2 环境准备

3 高性能推理

a. fp16 高性能推理

b. wint8 高性能推理

c. TP并行，多卡高性能推理

4 一键推理 & 推理说明

参数设定：默认情况下，使用model自带的generation_config.json中的参数。

单一测试demo执行时，指定max_length=min_length=128，固定输出长度。

在 NVIDIA A800-SXM4-80GB 上测试的性能如下：

下方表格中所示性能对应的输入输出大小。

Files

qwen2_5_vl

Directory actions

More options

Directory actions

More options

Latest commit

History

qwen2_5_vl

Folders and files

parent directory

README.md

Qwen2.5-VL

1. 模型介绍

2 环境准备

3 高性能推理

a. fp16 高性能推理

b. wint8 高性能推理

c. TP并行，多卡高性能推理

4 一键推理 & 推理说明

参数设定：默认情况下，使用model自带的generation_config.json中的参数。

单一测试demo执行时，指定max_length=min_length=128，固定输出长度。

在 NVIDIA A800-SXM4-80GB 上测试的性能如下：

下方表格中所示性能对应的输入输出大小。