add qwq testcase #3757

Merged
wangxiyuan merged 7 commits into vllm-project:main from ck-hw-1018:main
Oct 25, 2025

@ck-hw-1018 ck-hw-1018 commented Oct 25, 2025

What this PR does / why we need it?

This PR adds a QwQ test case to the nightly tests for Qwen QwQ on A3. We need to test it daily.

Does this PR introduce any user-facing change?

no

How was this patch tested?

by running the test

Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: ckhw <cuikai1@huawei.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR adds a new end-to-end test for the Qwen/QwQ-32B model. The overall structure is good, but I've found an area for improvement in how the server arguments are constructed. The current implementation is brittle and hard to maintain. I've provided a suggestion to make it more robust and readable.

Comment on lines +84 to +101
    server_args = [
        "--tensor-parallel-size",
        str(tp_size), "--port",
        str(port), "--max-model-len", "36864", "--max-num-batched-tokens",
        "36864", "--block-size", "128", "--trust-remote-code",
        "--gpu-memory-utilization", "0.9", "--compilation_config",
        '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1, 8, 24, 48, 60]}',
        "--reasoning-parser", "deepseek_r1", "--distributed_executor_backend",
        "mp"
    ]
    if mode == "single":
        server_args.remove("--compilation_config")
        server_args.remove(
            '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1, 8, 24, 48, 60]}'
        )
        server_args.append("--additional-config")
        server_args.append('{"ascend_scheduler_config":{"enabled":true}}')
        server_args.append("--enforce-eager")

high

The current method of constructing server_args by defining a default list and then modifying it with list.remove() is brittle and can lead to runtime errors. If the initial list is changed, the remove() calls might fail with a ValueError. Additionally, the long JSON string for compilation_config is duplicated, making the code harder to maintain.
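
To make the failure mode concrete, here is a minimal sketch using a trimmed-down, hypothetical argument list (the port value and the shortened capture-size list are assumptions for illustration, not taken from the test). It shows what happens when the JSON string in the list is edited but the string passed to remove() is not:

```python
# Hypothetical, trimmed-down argument list. Someone has dropped the last
# capture size (60) from the config string stored in the list...
server_args = [
    "--port", "8000",
    "--compilation_config",
    '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1, 8, 24, 48]}',
]

# ...but the duplicated string used for removal still has the old value.
stale_value = '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1, 8, 24, 48, 60]}'

server_args.remove("--compilation_config")  # the flag itself is still found
try:
    server_args.remove(stale_value)  # no exact match, so list.remove() raises
    removal_failed = False
except ValueError:
    removal_failed = True
```

Note that the first remove() succeeds, so the list is left holding an orphaned JSON value with no flag in front of it when the second call fails.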

It's better to build the argument list conditionally from common and mode-specific parts. This approach is more robust, readable, and avoids duplicating configuration strings.

    server_args = [
        "--tensor-parallel-size",
        str(tp_size), "--port",
        str(port), "--max-model-len", "36864", "--max-num-batched-tokens",
        "36864", "--block-size", "128", "--trust-remote-code",
        "--gpu-memory-utilization", "0.9",
    ]
    if mode == "single":
        server_args.extend([
            "--additional-config",
            '{"ascend_scheduler_config":{"enabled":true}}',
            "--enforce-eager",
        ])
    else:  # aclgraph
        server_args.extend([
            "--compilation_config",
            '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1, 8, 24, 48, 60]}',
        ])
    server_args.extend([
        "--reasoning-parser", "deepseek_r1", "--distributed_executor_backend",
        "mp"
    ])
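
As a further sketch of the same idea, the construction could be wrapped in a small helper and the JSON values generated with json.dumps, so each config exists in exactly one place. The helper name build_server_args and the constant COMPILATION_CONFIG are hypothetical and not part of this PR; the flag values are copied from the snippet under review.

```python
import json

# Assumption: these values mirror the compilation config in the reviewed snippet.
COMPILATION_CONFIG = {
    "cudagraph_mode": "FULL_DECODE_ONLY",
    "cudagraph_capture_sizes": [1, 8, 24, 48, 60],
}


def build_server_args(mode: str, tp_size: int, port: int) -> list[str]:
    """Hypothetical helper: assemble args from common and mode-specific parts."""
    # Arguments shared by every mode.
    args = [
        "--tensor-parallel-size", str(tp_size),
        "--port", str(port),
        "--max-model-len", "36864",
        "--max-num-batched-tokens", "36864",
        "--block-size", "128",
        "--trust-remote-code",
        "--gpu-memory-utilization", "0.9",
    ]
    if mode == "single":
        # json.dumps emits spaces after ':'; the JSON is equivalent to the literal.
        args += [
            "--additional-config",
            json.dumps({"ascend_scheduler_config": {"enabled": True}}),
            "--enforce-eager",
        ]
    else:  # aclgraph
        args += ["--compilation_config", json.dumps(COMPILATION_CONFIG)]
    # Trailing arguments shared by every mode.
    args += [
        "--reasoning-parser", "deepseek_r1",
        "--distributed_executor_backend", "mp",
    ]
    return args
```

With this shape nothing is ever removed from the list, so a change to one config value cannot cause a ValueError elsewhere.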

Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: ckhw <cuikai1@huawei.com>
@jiangyunfan1

LGTM

@wangxiyuan wangxiyuan merged commit 7572939 into vllm-project:main Oct 25, 2025
6 checks passed
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
### What this PR does / why we need it?
This PR adds a QwQ test case to the nightly tests for Qwen QwQ on A3. We need
to test it daily.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by running the test


- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0

---------

Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: hwhaokun <haokun0405@163.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Signed-off-by: ckhw <cuikai1@huawei.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025
Signed-off-by: ckhw <cuikai1@huawei.com>
wangxiyuan pushed a commit that referenced this pull request Mar 3, 2026
… to `.yaml` (#6503)

### What this PR does / why we need it?
This PR refactors the nightly single-node model tests by migrating test
configurations from Python scripts to a more maintainable YAML-based
format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](#3568) | `test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](#3631) | `test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](#5874) | `test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](#3908) | `test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](#5682) | `test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](#4111) | `test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml` |
| [#3733](#3733) | `test_prefix_cache_deepseek_r1_0528_w8a8.py` | `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](#3973) | `test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](#3757) | `test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](#5616) | `test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](#3541) | `test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](#5301) | `test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](#3707) | `test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](#3676) | `test_qwen3_32b_int8_a3_feature_stack3.py` | `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](#3709) | `test_prefix_cache_qwen3_32b_int8.py` | `Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](#5395) | `test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](#3474) | `test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
… to `.yaml` (vllm-project#6503)
