[TEST] add a qwen3-30b acc case with mooncake mempool #6244
wangxiyuan merged 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a new end-to-end test for the Qwen3-30B-A3B-W8A8 model. My review identifies a potential issue with test robustness due to the use of a hardcoded file name for configuration. I've provided a suggestion to use pytest's tmp_path fixture to ensure tests are isolated and do not leave artifacts, which is a standard practice for writing robust tests.
```python
async def test_models(model: str, tp_size: int) -> None:
    port = get_open_port()
    mooncake_port = get_open_port()
    mooncake_metrics_port = get_open_port()
    mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
    with open("mooncake.json", "w") as f:
        json.dump(mooncake_json, f)
    env_dict = {
        "PYTHONHASHSEED": "0",
        "ASCEND_CONNECT_TIMEOUT": "10000",
        "ASCEND_TRANSFER_TIMEOUT": "10000",
        "ASCEND_BUFFER_POOL": "4:8",
        "VLLM_USE_V1": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "HCCL_BUFFSIZE": "1024",
        "OMP_NUM_THREADS": "1",
        "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
        "VLLM_ASCEND_ENABLE_NZ": "2",
        "MOONCAKE_CONFIG_PATH": "mooncake.json"
```
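The test reserves ports via `get_open_port`. As context, helpers like this are commonly implemented by binding to port 0 so the OS assigns a free ephemeral port; the following is a generic sketch, not vLLM's actual implementation:

```python
import socket


def get_open_port() -> int:
    """Ask the OS for a currently free TCP port (sketch implementation)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the kernel pick an unused ephemeral port.
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Note that the port is only reserved while the socket is bound; there is a small window between this call and the server actually binding the port, which is acceptable for tests.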
Creating a file with a hardcoded name (mooncake.json) in the current working directory is not a good practice for tests. It can lead to race conditions if tests are run in parallel and can leave artifacts if a test fails. It's better to use the tmp_path fixture provided by pytest to create temporary files in a managed way. This suggestion refactors the code to use tmp_path for creating the mooncake.json configuration file, which will make the test more robust and isolated.
Suggested change:

```diff
-async def test_models(model: str, tp_size: int) -> None:
+async def test_models(model: str, tp_size: int, tmp_path) -> None:
     port = get_open_port()
     mooncake_port = get_open_port()
     mooncake_metrics_port = get_open_port()
     mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
-    with open("mooncake.json", "w") as f:
-        json.dump(mooncake_json, f)
+    mooncake_config_path = tmp_path / "mooncake.json"
+    mooncake_config_path.write_text(json.dumps(mooncake_json))
     env_dict = {
         "PYTHONHASHSEED": "0",
         "ASCEND_CONNECT_TIMEOUT": "10000",
         "ASCEND_TRANSFER_TIMEOUT": "10000",
         "ASCEND_BUFFER_POOL": "4:8",
         "VLLM_USE_V1": "1",
         "OMP_PROC_BIND": "false",
         "HCCL_OP_EXPANSION_MODE": "AIV",
         "HCCL_BUFFSIZE": "1024",
         "OMP_NUM_THREADS": "1",
         "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
         "VLLM_ASCEND_ENABLE_NZ": "2",
-        "MOONCAKE_CONFIG_PATH": "mooncake.json"
+        "MOONCAKE_CONFIG_PATH": str(mooncake_config_path)
```
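To illustrate the pattern behind this suggestion, here is a minimal, self-contained sketch of writing a per-test config file with pytest's `tmp_path` fixture. The helper name and config keys below are illustrative, not the real mooncake schema:

```python
import json


def write_config(tmp_path, master_port: int) -> str:
    """Write a JSON config into a per-test temporary directory.

    tmp_path is the pytest-provided pathlib.Path fixture; each test
    gets its own directory, so parallel runs cannot collide and no
    artifacts are left in the working directory on failure.
    """
    config = {"master_server_address": f"127.0.0.1:{master_port}"}
    config_path = tmp_path / "mooncake.json"
    config_path.write_text(json.dumps(config))
    return str(config_path)


def test_config_is_isolated(tmp_path):
    path = write_config(tmp_path, 50051)
    data = json.loads((tmp_path / "mooncake.json").read_text())
    assert data["master_server_address"] == "127.0.0.1:50051"
    assert path.endswith("mooncake.json")
```

Because the resulting path differs per test invocation, it must be passed to the subprocess explicitly (here via the `MOONCAKE_CONFIG_PATH` environment variable) rather than relying on a fixed file name in the current directory.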
This pull request has conflicts, please resolve those before we can evaluate the pull request.
### What this PR does / why we need it?
This PR adds an accuracy test case for qwen3-30b w8a8 with the mooncake mempool, so that it is tested regularly.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test.

- vLLM version: v0.14.1
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: mikequan0425 <mikequan0425@foxmail.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>