[Nightly][Test] Add Qwen3-Next-80B-A3B-Instruct-W8A8 nightly test#5616
Conversation
Signed-off-by: IncSec <1790766300@qq.com>
Code Review
This pull request adds a new nightly test for the Qwen3-Next-80B-A3B-Instruct-W8A8 model to monitor its accuracy. The new test file is well structured and follows existing patterns. However, I've identified a high-severity issue: the smoke test uses the legacy completions API for a chat model. I've provided suggestions that switch the test to the chat.completions API, which involves changing both the prompt format and the API call itself.
```python
prompts = [
    "San Francisco is a",
]
```
The model under test, Qwen3-Next-80B-A3B-Instruct-W8A8, is an instruction-tuned chat model. For correctness, it's better to use the chat completions API. This requires formatting the prompt as a list of messages with roles. This change should be made in conjunction with updating the API call to use client.chat.completions.create.
```diff
-prompts = [
-    "San Francisco is a",
-]
+prompts = [
+    {"role": "user", "content": "San Francisco is a"},
+]
```
```python
batch = await client.completions.create(
    model=model,
    prompt=prompts,
    **request_keyword_args,
)
choices: list[openai.types.CompletionChoice] = batch.choices
assert choices[0].text, "empty response"
```
To correctly test a chat model, you should use the client.chat.completions.create method instead of the legacy completions.create. This also requires updating how the response is accessed, from choices[0].text to choices[0].message.content. This change assumes the prompts variable has been updated to the chat message format as suggested in the other comment.
```diff
-batch = await client.completions.create(
-    model=model,
-    prompt=prompts,
-    **request_keyword_args,
-)
-choices: list[openai.types.CompletionChoice] = batch.choices
-assert choices[0].text, "empty response"
+batch = await client.chat.completions.create(
+    model=model,
+    messages=prompts,
+    **request_keyword_args,
+)
+choices: list[openai.types.chat.ChatCompletion.Choice] = batch.choices
+assert choices[0].message.content, "empty response"
```
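The suggested change boils down to wrapping each plain prompt string in a single-turn chat message so the server can apply the model's chat template. A minimal sketch of that conversion (the helper name `to_chat_messages` is hypothetical and not part of the test file):

```python
def to_chat_messages(prompts: list[str]) -> list[dict[str, str]]:
    """Wrap plain completion prompts as single-turn chat messages.

    chat.completions expects a list of {"role": ..., "content": ...}
    dicts rather than bare strings.
    """
    return [{"role": "user", "content": p} for p in prompts]


messages = to_chat_messages(["San Francisco is a"])
print(messages)
```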
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
```python
async def test_models(model: str) -> None:
    port = get_open_port()
    env_dict = {
        "OMP_NUM_THREADS": "10",
```
```diff
-"OMP_NUM_THREADS": "10",
+"OMP_NUM_THREADS": "1",
```
…lm-project#5616)

### What this PR does / why we need it?
There was an accuracy issue with the **Qwen3-Next-80B-A3B-Instruct-W8A8** model in the old version of **Triton-Ascend**, so we are adding a nightly test to track it.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@7157596

Signed-off-by: IncSec <1790766300@qq.com>
… to `.yaml` (#6503)

### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test configurations from Python scripts to a more maintainable `YAML-based` format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](#3568) | `test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](#3631) | `test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](#5874) | `test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](#3908) | `test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](#5682) | `test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](#4111) | `test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml` |
| [#3733](#3733) | `test_prefix_cache_deepseek_r1_0528_w8a8.py` | `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](#3973) | `test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](#3757) | `test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](#5616) | `test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](#3541) | `test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](#5301) | `test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](#3707) | `test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](#3676) | `test_qwen3_32b_int8_a3_feature_stack3.py` | `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](#3709) | `test_prefix_cache_qwen3_32b_int8.py` | `Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](#5395) | `test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](#3474) | `test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------
Signed-off-by: MrZ20 <2609716663@qq.com>
What this PR does / why we need it?
There was an accuracy issue with the Qwen3-Next-80B-A3B-Instruct-W8A8 model in the old version of Triton-Ascend, so we are adding a nightly test to track it.
Does this PR introduce any user-facing change?
N/A
How was this patch tested?