[Nightly][Test] Add Qwen3-Next-80B-A3B-Instruct-W8A8 nightly test#5616
Conversation
Signed-off-by: IncSec <1790766300@qq.com>
Code Review
This pull request adds a new nightly test for the Qwen3-Next-80B-A3B-Instruct-W8A8 model to monitor its accuracy. The new test file is well structured and follows existing patterns. However, I've identified a high-severity issue: the smoke test uses the legacy completions API for a chat model. I've provided suggestions that switch the test to the chat.completions API, which involves changing both the prompt format and the API call itself.
```python
prompts = [
    "San Francisco is a",
]
```
The model under test, Qwen3-Next-80B-A3B-Instruct-W8A8, is an instruction-tuned chat model. For correctness, it's better to use the chat completions API. This requires formatting the prompt as a list of messages with roles. This change should be made in conjunction with updating the API call to use client.chat.completions.create.
```diff
-prompts = [
-    "San Francisco is a",
-]
+prompts = [
+    {"role": "user", "content": "San Francisco is a"},
+]
```
```python
batch = await client.completions.create(
    model=model,
    prompt=prompts,
    **request_keyword_args,
)
choices: list[openai.types.CompletionChoice] = batch.choices
assert choices[0].text, "empty response"
```
To correctly test a chat model, you should use the client.chat.completions.create method instead of the legacy completions.create. This also requires updating how the response is accessed, from choices[0].text to choices[0].message.content. This change assumes the prompts variable has been updated to the chat message format as suggested in the other comment.
```diff
-batch = await client.completions.create(
-    model=model,
-    prompt=prompts,
-    **request_keyword_args,
-)
-choices: list[openai.types.CompletionChoice] = batch.choices
-assert choices[0].text, "empty response"
+batch = await client.chat.completions.create(
+    model=model,
+    messages=prompts,
+    **request_keyword_args,
+)
+choices: list[openai.types.chat.ChatCompletion.Choice] = batch.choices
+assert choices[0].message.content, "empty response"
```
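The suggested change boils down to wrapping each plain prompt string in a single-turn chat message so the server can apply the model's chat template. A minimal sketch of that conversion (the helper name `to_chat_messages` is hypothetical and not part of the test file):

```python
def to_chat_messages(prompts: list[str]) -> list[dict[str, str]]:
    """Wrap plain completion prompts as single-turn chat messages.

    chat.completions expects a list of {"role": ..., "content": ...}
    dicts rather than bare strings.
    """
    return [{"role": "user", "content": p} for p in prompts]


messages = to_chat_messages(["San Francisco is a"])
print(messages)
```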
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
```python
async def test_models(model: str) -> None:
    port = get_open_port()
    env_dict = {
        "OMP_NUM_THREADS": "10",
```
```diff
-"OMP_NUM_THREADS": "10",
+"OMP_NUM_THREADS": "1",
```
…lm-project#5616)

### What this PR does / why we need it?
There was an accuracy issue with the **Qwen3-Next-80B-A3B-Instruct-W8A8** model in the old version of **Triton-Ascend**, so we are adding a nightly test to track it.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@7157596

Signed-off-by: IncSec <1790766300@qq.com>
… to `.yaml` (#6503)

### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test configurations from Python scripts to a more maintainable `YAML-based` format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](#3568) | `test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](#3631) | `test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](#5874) | `test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](#3908) | `test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](#5682) | `test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](#4111) | `test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml` |
| [#3733](#3733) | `test_prefix_cache_deepseek_r1_0528_w8a8.py` | `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](#3973) | `test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](#3757) | `test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](#5616) | `test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](#3541) | `test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](#5301) | `test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](#3707) | `test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](#3676) | `test_qwen3_32b_int8_a3_feature_stack3.py` | `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](#3709) | `test_prefix_cache_qwen3_32b_int8.py` | `Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](#5395) | `test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](#3474) | `test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------
Signed-off-by: MrZ20 <2609716663@qq.com>
What this PR does / why we need it?
There was an accuracy issue with the Qwen3-Next-80B-A3B-Instruct-W8A8 model in the old version of Triton-Ascend, so we are adding a nightly test to track it.
Does this PR introduce any user-facing change?
N/A
How was this patch tested?