[TEST] Add initial multi-modal cases of Qwen2.5-VL-32B-Instruct for nightly test #3707
wangxiyuan merged 5 commits into vllm-project:main
Conversation
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Code Review
This pull request adds a new nightly end-to-end test for the Qwen/Qwen2.5-VL-32B-Instruct multi-modal model and includes a fix in aisbench.py for configuration generation. The new test is comprehensive, covering an API check, a multi-modal request, and benchmarks. My review focuses on the correctness of the new test: I suggest using the chat completions API, which is more appropriate for the model under test, to make the test more robust.
```python
batch = await client.completions.create(
    model=model,
    prompt=prompts,
    **request_keyword_args,
)
choices: list[openai.types.CompletionChoice] = batch.choices
assert choices[0].text, "empty response"
```
The model Qwen/Qwen2.5-VL-32B-Instruct is an instruction-tuned chat model. For consistency with other parts of the test (such as send_image_request and the aisbench cases, which use chat endpoints) and to follow best practice, the initial smoke test should use the chat completions API (client.chat.completions.create) instead of the legacy completions API (client.completions.create). Using the correct API for the model type makes the test more robust and maintainable.
```diff
-batch = await client.completions.create(
-    model=model,
-    prompt=prompts,
-    **request_keyword_args,
-)
-choices: list[openai.types.CompletionChoice] = batch.choices
-assert choices[0].text, "empty response"
+chat_response = await client.chat.completions.create(
+    model=model,
+    messages=[{
+        "role": "user",
+        "content": prompts[0]
+    }],
+    **request_keyword_args,
+)
+choices = chat_response.choices
+assert choices[0].message.content, "empty response"
```
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
…ghtly test (vllm-project#3707)

### What this PR does / why we need it?
This PR adds the initial multi-modal model for the nightly test, including 2 cases for Qwen2.5-VL-32B acc/perf testing on A3, which need to run daily.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
… to `.yaml` (#6503)

### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test configurations from Python scripts to a more maintainable YAML-based format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](#3568) | `test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](#3631) | `test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](#5874) | `test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](#3908) | `test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](#5682) | `test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](#4111) | `test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml` |
| [#3733](#3733) | `test_prefix_cache_deepseek_r1_0528_w8a8.py` | `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](#3973) | `test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](#3757) | `test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](#5616) | `test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](#3541) | `test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](#5301) | `test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](#3707) | `test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](#3676) | `test_qwen3_32b_int8_a3_feature_stack3.py` | `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](#3709) | `test_prefix_cache_qwen3_32b_int8.py` | `Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](#5395) | `test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](#3474) | `test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: MrZ20 <2609716663@qq.com>
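A migrated per-model config might look roughly like the sketch below. All field names here are hypothetical illustrations of the idea (one declarative file per model instead of a test script); the real schema is defined by the nightly test harness and is not shown in this PR description.

```yaml
# Hypothetical sketch of a migrated nightly-test config; the field
# names are illustrative only and do not reflect the real schema.
model: Qwen/Qwen2.5-VL-32B-Instruct
hardware: A3
cases:
  - name: accuracy
    endpoint: /v1/chat/completions
  - name: performance
    endpoint: /v1/chat/completions
```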
What this PR does / why we need it?
This PR adds the initial multi-modal model for the nightly test, including 2 cases for Qwen2.5-VL-32B acc/perf testing on A3, which need to run daily.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By running the test.
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0