Skip to content

[TEST] Add initial aisbench support and Qwen3 32B acc/perf test#3474

Merged
Yikun merged 1 commit intovllm-project:mainfrom
jiangyunfan1:main
Oct 20, 2025
Merged

[TEST] Add initial aisbench support and Qwen3 32B acc/perf test#3474
Yikun merged 1 commit intovllm-project:mainfrom
jiangyunfan1:main

Conversation

@jiangyunfan1
Copy link
Contributor

@jiangyunfan1 jiangyunfan1 commented Oct 15, 2025

What this PR does / why we need it?

This PR adds the first aisbench case for nightly test, it lays a foundation for following performance and accuracy tests in nightly test.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the test

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds new nightly tests using aisbench and a corresponding helper script. The overall approach is sound, but the new tools/aisbench.py script contains several critical and high-severity issues. These include potential UnboundLocalError exceptions due to incorrect variable scoping, a resource leak from an unmanaged subprocess, a risk of an infinite loop when monitoring the subprocess, and fragile logic that relies on hardcoded values or specific string formats. Addressing these issues is crucial for making the test runner robust and reliable.

Comment on lines +43 to +55
if self.task_type == "accuracy":
aisbench_cmd = [
'ais_bench', '--models', f'{self.request_conf}_custom',
'--datasets', f'{dataset_conf}', '--debug'
]
if self.task_type == "performance":
aisbench_cmd = [
'ais_bench', '--models', f'{self.request_conf}_custom',
'--datasets', f'{dataset_conf}_custom', '--debug', '--mode',
'perf'
]
if self.num_prompts:
aisbench_cmd.extend(['--num-prompts', str(self.num_prompts)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The aisbench_cmd variable is only defined within the if self.task_type == "accuracy": and if self.task_type == "performance": blocks. If self.task_type has any other value, aisbench_cmd will be unbound when used on line 56, causing an UnboundLocalError. It's better to use an if/elif/else structure to handle all cases, raising an error for unsupported task types.

        if self.task_type == "accuracy":
            aisbench_cmd = [
                'ais_bench', '--models', f'{self.request_conf}_custom',
                '--datasets', f'{dataset_conf}', '--debug'
            ]
        elif self.task_type == "performance":
            aisbench_cmd = [
                'ais_bench', '--models', f'{self.request_conf}_custom',
                '--datasets', f'{dataset_conf}_custom', '--debug', '--mode',
                'perf'
            ]
            if self.num_prompts:
                aisbench_cmd.extend(['--num-prompts', str(self.num_prompts)])
        else:
            raise ValueError(f"Unsupported task_type: {self.task_type}")

Comment on lines +150 to +158
while True:
line = self.proc.stdout.readline().strip()
print(line)
if result_msg in line:
self.result_line = line
return
if "ERROR" in line:
raise RuntimeError(
"Some errors happen to Aisbench task.") from None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The while True loop reading from self.proc.stdout can hang indefinitely if the subprocess closes its standard output without printing either the expected result_msg or an "ERROR" line. readline() will block, waiting for data that will never arrive. The loop should also check if the process has terminated. When an error does occur, including stderr in the exception message would be very helpful for debugging.

        while True:
            line = self.proc.stdout.readline().strip()
            if not line and self.proc.poll() is not None:
                # Process ended without finding the result message
                stderr = self.proc.stderr.read()
                raise RuntimeError(f"Aisbench task finished unexpectedly. Stderr: {stderr}")

            print(line)
            if result_msg in line:
                self.result_line = line
                return
            if "ERROR" in line:
                stderr = self.proc.stderr.read()
                raise RuntimeError(f"Some errors happen to Aisbench task. Stderr: {stderr}")

Comment on lines +163 to +164
result_csv_file = os.path.join(result_dir, "gsm8kdataset.csv")
result_json_file = os.path.join(result_dir, "gsm8kdataset.json")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The result filenames gsm8kdataset.csv and gsm8kdataset.json are hardcoded. This restricts this method to only work with the gsm8k dataset. To make this runner more flexible and reusable for other datasets, the dataset name should be derived dynamically, for instance from self.dataset_conf.

        dataset_name = self.dataset_conf.split('/')[0]
        result_csv_file = os.path.join(result_dir, f"{dataset_name}dataset.csv")
        result_json_file = os.path.join(result_dir, f"{dataset_name}dataset.json")

Copy link
Member

@Yikun Yikun Oct 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is validate review, pls resolve this. @jiangyunfan1 You should address this later

@github-actions
Copy link
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@wangxiyuan wangxiyuan added the ready read for review label Oct 15, 2025
Copy link
Member

@Yikun Yikun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments inline, remove unrelated unready file.

Comment on lines +163 to +164
result_csv_file = os.path.join(result_dir, "gsm8kdataset.csv")
result_json_file = os.path.join(result_dir, "gsm8kdataset.json")
Copy link
Member

@Yikun Yikun Oct 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is validate review, pls resolve this. @jiangyunfan1 You should address this later

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@Yikun Yikun changed the title Add aisbench nightly test cases [TEST] Add initial aisbench support and Qwen3 32B acc test Oct 19, 2025
@Yikun Yikun changed the title [TEST] Add initial aisbench support and Qwen3 32B acc test [TEST] Add initial aisbench support and Qwen3 32B acc perf test Oct 19, 2025
@Yikun Yikun changed the title [TEST] Add initial aisbench support and Qwen3 32B acc perf test [TEST] Add initial aisbench support and Qwen3 32B acc/perf test Oct 19, 2025
@Yikun
Copy link
Member

Yikun commented Oct 19, 2025

@Yikun Yikun merged commit 9e59fc1 into vllm-project:main Oct 20, 2025
14 checks passed
wxsIcey pushed a commit to wxsIcey/vllm-ascend that referenced this pull request Oct 20, 2025
…-project#3474)

### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
ZYang6263 pushed a commit to rjg-lyh/vllm-ascend that referenced this pull request Oct 23, 2025
…-project#3474)

### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
…-project#3474)

### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
…-project#3474)

### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: hwhaokun <haokun0405@163.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…-project#3474)

### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
…-project#3474)

### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
wangxiyuan pushed a commit that referenced this pull request Mar 3, 2026
… to `.yaml` (#6503)

### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test
configurations from Python scripts to a more maintainable `YAML-based`
format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](#3568) |
`test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](#3631) |
`test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](#5874) |
`test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](#3908) |
`test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](#5682) |
`test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](#4111) |
`test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml`
|
| [#3733](#3733) |
`test_prefix_cache_deepseek_r1_0528_w8a8.py` |
`Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](#6543) |
`test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](#6543) |
`test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](#3973) |
`test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](#3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](#3757) |
`test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](#5616) |
`test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](#3541) |
`test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](#5301) |
`test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](#3707) |
`test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](#3676) |
`test_qwen3_32b_int8_a3_feature_stack3.py` |
`Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](#3709) |
`test_prefix_cache_qwen3_32b_int8.py` |
`Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](#5395) |
`test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](#3474) |
`test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](#3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
… to `.yaml` (vllm-project#6503)

### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test
configurations from Python scripts to a more maintainable `YAML-based`
format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [vllm-project#3568](vllm-project#3568) |
`test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [vllm-project#3631](vllm-project#3631) |
`test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [vllm-project#5874](vllm-project#5874) |
`test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [vllm-project#3908](vllm-project#3908) |
`test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [vllm-project#5682](vllm-project#5682) |
`test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [vllm-project#4111](vllm-project#4111) |
`test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml`
|
| [vllm-project#3733](vllm-project#3733) |
`test_prefix_cache_deepseek_r1_0528_w8a8.py` |
`Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [vllm-project#6543](vllm-project#6543) |
`test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [vllm-project#6543](vllm-project#6543) |
`test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [vllm-project#3973](vllm-project#3973) |
`test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [vllm-project#3541](vllm-project#3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [vllm-project#3757](vllm-project#3757) |
`test_qwq_32b.py` | `QwQ-32B.yaml` |
| [vllm-project#5616](vllm-project#5616) |
`test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [vllm-project#3541](vllm-project#3541) |
`test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [vllm-project#5301](vllm-project#5301) |
`test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [vllm-project#3707](vllm-project#3707) |
`test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [vllm-project#3676](vllm-project#3676) |
`test_qwen3_32b_int8_a3_feature_stack3.py` |
`Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [vllm-project#3709](vllm-project#3709) |
`test_prefix_cache_qwen3_32b_int8.py` |
`Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [vllm-project#5395](vllm-project#5395) |
`test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [vllm-project#3474](vllm-project#3474) |
`test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [vllm-project#3541](vllm-project#3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants