[TEST]Add initial prefix cache case for nightly test by jiangyunfan1 · Pull Request #3709 · vllm-project/vllm-ascend

jiangyunfan1 · 2025-10-24T03:52:20Z

What this PR does / why we need it?

This PR adds the initial prefix cache case for nightly test for Qwen3-32b-int8 on A3, we need test them daily.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the test

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

gemini-code-assist

Code Review

This pull request introduces a new end-to-end test for prefix caching and modifies the aisbench tool to support it. The changes are generally good, but I have a couple of suggestions to improve the test's efficiency and the robustness of the result parsing logic.

In the new test file, the vLLM server is started twice to run two separate test cases. This is inefficient and can be done within a single server instance, which will significantly speed up the test.

In tools/aisbench.py, the new get_TTFT function uses string slicing to extract a numeric value, which is brittle. I've suggested a more robust implementation using regular expressions.

gemini-code-assist · 2025-10-24T03:54:21Z

+    with RemoteOpenAIServer(model,
+                            server_args,
+                            server_port=port,
+                            env_dict=env_dict,
+                            auto_port=False):
+        run_aisbench_cases(model, port, aisbench_warm_up)
+        result = run_aisbench_cases(model, port, aisbench_cases0)
+        TTFT0 = get_TTFT(result)
+    with RemoteOpenAIServer(model,
+                            server_args,
+                            server_port=port,
+                            env_dict=env_dict,
+                            auto_port=False):
+        run_aisbench_cases(model, port, aisbench_warm_up)
+        result = run_aisbench_cases(model, port, aisbench_cases75)
+        TTFT75 = get_TTFT(result)


Starting the RemoteOpenAIServer twice is inefficient and significantly slows down the test. The server can be started once to run both test cases, and the warm-up also only needs to be run once. This will make the test execute much faster.

with RemoteOpenAIServer(model, server_args, server_port=port, env_dict=env_dict, auto_port=False): run_aisbench_cases(model, port, aisbench_warm_up) result = run_aisbench_cases(model, port, aisbench_cases0) TTFT0 = get_TTFT(result) result = run_aisbench_cases(model, port, aisbench_cases75) TTFT75 = get_TTFT(result)

gemini-code-assist · 2025-10-24T03:54:21Z

+def get_TTFT(result):
+    TTFT = result[0][0].loc["TTFT", "Average"][:-3]
+    return float(TTFT)


Slicing with [:-3] is brittle because it assumes the units will always be 3 characters long (e.g., ' ms'). If the units change (e.g., to 's') or are formatted differently, this will break or produce incorrect results. It's more robust to parse the numeric value from the string using a regular expression.

Suggested change

def get_TTFT(result):

TTFT = result[0][0].loc["TTFT", "Average"][:-3]

return float(TTFT)

def get_TTFT(result):

ttft_str = result[0][0].loc["TTFT", "Average"]

match = re.match(r"^\d+(\.\d*)?", ttft_str)

if not match:

raise ValueError(f"Could not parse TTFT value from '{ttft_str}'")

return float(match.group(0))

github-actions · 2025-10-24T04:03:00Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

jiangyunfan1 · 2025-10-24T06:51:59Z

CI passed: https://github.com/vllm-project/vllm-ascend/actions/runs/18769047634/job/53550002427

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>

### What this PR does / why we need it? This PR adds the initial prefix cache case for nightly test for Qwen3-32b-int8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: luolun <luolun1995@cmbchina.com>

### What this PR does / why we need it? This PR adds the initial prefix cache case for nightly test for Qwen3-32b-int8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: hwhaokun <haokun0405@163.com>

### What this PR does / why we need it? This PR adds the initial prefix cache case for nightly test for Qwen3-32b-int8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: nsdie <yeyifan@huawei.com>

### What this PR does / why we need it? This PR adds the initial prefix cache case for nightly test for Qwen3-32b-int8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>

gemini-code-assist bot reviewed Oct 24, 2025

View reviewed changes

github-actions bot added module:tests module:tools labels Oct 24, 2025

jiangyunfan1 force-pushed the new_branch4 branch from dc128ee to 12613d7 Compare October 24, 2025 04:37

jiangyunfan1 changed the title ~~add prefixcache~~ [TEST]Add initial prefix cache case for nightly test Oct 24, 2025

add prefix cache qwen3-32b-int8

28a9e32

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>

jiangyunfan1 force-pushed the new_branch4 branch from 12613d7 to 28a9e32 Compare October 24, 2025 07:41

wangxiyuan approved these changes Oct 24, 2025

View reviewed changes

wangxiyuan merged commit ec9ec78 into vllm-project:main Oct 24, 2025
22 checks passed

MrZ20 mentioned this pull request Mar 2, 2026

[Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml #6503

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TEST]Add initial prefix cache case for nightly test#3709

[TEST]Add initial prefix cache case for nightly test#3709
wangxiyuan merged 1 commit intovllm-project:mainfrom
jiangyunfan1:new_branch4

jiangyunfan1 commented Oct 24, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Oct 24, 2025

Uh oh!

gemini-code-assist bot Oct 24, 2025

Uh oh!

github-actions bot commented Oct 24, 2025

Uh oh!

jiangyunfan1 commented Oct 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-def get_TTFT(result):
-    TTFT = result[0][0].loc["TTFT", "Average"][:-3]
-    return float(TTFT)
+def get_TTFT(result):
+    ttft_str = result[0][0].loc["TTFT", "Average"]
+    match = re.match(r"^\d+(\.\d*)?", ttft_str)
+    if not match:
+        raise ValueError(f"Could not parse TTFT value from '{ttft_str}'")
+    return float(match.group(0))

Conversation

jiangyunfan1 commented Oct 24, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Oct 24, 2025

Uh oh!

jiangyunfan1 commented Oct 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jiangyunfan1 commented Oct 24, 2025 •

edited by github-actions bot

Loading