
[TEST]add a qwen3-30b acc case with mooncake mempool#6244

Merged
wangxiyuan merged 1 commit into vllm-project:main from jiangyunfan1:mooncake
Feb 10, 2026

Conversation

Contributor

@jiangyunfan1 jiangyunfan1 commented Jan 26, 2026

What this PR does / why we need it?

This PR adds a test case for qwen3-30b w8a8 with the Mooncake mempool; we need to test it regularly.

Does this PR introduce any user-facing change?

No

How was this patch tested?

by running the test

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand it.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a new end-to-end test for the Qwen3-30B-A3B-W8A8 model. My review identifies a potential issue with test robustness due to the use of a hardcoded file name for configuration. I've provided a suggestion to use pytest's tmp_path fixture to ensure tests are isolated and do not leave artifacts, which is a standard practice for writing robust tests.

Comment on lines +58 to +77
async def test_models(model: str, tp_size: int) -> None:
    port = get_open_port()
    mooncake_port = get_open_port()
    mooncake_metrics_port = get_open_port()
    mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
    with open("mooncake.json", "w") as f:
        json.dump(mooncake_json, f)
    env_dict = {
        "PYTHONHASHSEED": "0",
        "ASCEND_CONNECT_TIMEOUT": "10000",
        "ASCEND_TRANSFER_TIMEOUT": "10000",
        "ASCEND_BUFFER_POOL": "4:8",
        "VLLM_USE_V1": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "HCCL_BUFFSIZE": "1024",
        "OMP_NUM_THREADS": "1",
        "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
        "VLLM_ASCEND_ENABLE_NZ": "2",
        "MOONCAKE_CONFIG_PATH": "mooncake.json"
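For context, `get_open_port` in the snippet above is assumed to behave like the common socket trick of binding to port 0 so the OS assigns a currently free ephemeral port; the helper's actual implementation in the test utilities may differ. A minimal sketch under that assumption:

```python
import socket

def get_open_port() -> int:
    # Binding to port 0 asks the OS to pick a free ephemeral port;
    # getsockname() then reports which port was assigned.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Note that a port obtained this way is only free at the moment of the call; the server must bind it promptly, since another process could grab it in the meantime.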
Contributor


high

Creating a file with a hardcoded name (mooncake.json) in the current working directory is not a good practice for tests. It can lead to race conditions if tests are run in parallel and can leave artifacts if a test fails. It's better to use the tmp_path fixture provided by pytest to create temporary files in a managed way. This suggestion refactors the code to use tmp_path for creating the mooncake.json configuration file, which will make the test more robust and isolated.

Suggested change
Before:

async def test_models(model: str, tp_size: int) -> None:
    port = get_open_port()
    mooncake_port = get_open_port()
    mooncake_metrics_port = get_open_port()
    mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
    with open("mooncake.json", "w") as f:
        json.dump(mooncake_json, f)
    env_dict = {
        "PYTHONHASHSEED": "0",
        "ASCEND_CONNECT_TIMEOUT": "10000",
        "ASCEND_TRANSFER_TIMEOUT": "10000",
        "ASCEND_BUFFER_POOL": "4:8",
        "VLLM_USE_V1": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "HCCL_BUFFSIZE": "1024",
        "OMP_NUM_THREADS": "1",
        "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
        "VLLM_ASCEND_ENABLE_NZ": "2",
        "MOONCAKE_CONFIG_PATH": "mooncake.json"

After:

async def test_models(model: str, tp_size: int, tmp_path) -> None:
    port = get_open_port()
    mooncake_port = get_open_port()
    mooncake_metrics_port = get_open_port()
    mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
    mooncake_config_path = tmp_path / "mooncake.json"
    mooncake_config_path.write_text(json.dumps(mooncake_json))
    env_dict = {
        "PYTHONHASHSEED": "0",
        "ASCEND_CONNECT_TIMEOUT": "10000",
        "ASCEND_TRANSFER_TIMEOUT": "10000",
        "ASCEND_BUFFER_POOL": "4:8",
        "VLLM_USE_V1": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "HCCL_BUFFSIZE": "1024",
        "OMP_NUM_THREADS": "1",
        "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
        "VLLM_ASCEND_ENABLE_NZ": "2",
        "MOONCAKE_CONFIG_PATH": str(mooncake_config_path)
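The isolation idea behind the suggestion can be shown outside pytest with the standard library's `tempfile` (inside a test, the `tmp_path` fixture provides such a per-test directory automatically). The `mooncake_json` value here is a stand-in, not the test's real config:

```python
import json
import tempfile
from pathlib import Path

mooncake_json = {"master_server_address": "127.0.0.1:50051"}  # stand-in config

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the config into an isolated directory that is removed
    # automatically, mirroring what pytest's tmp_path fixture gives
    # each test: no artifacts left behind, no clashes between
    # parallel test runs sharing a working directory.
    config_path = Path(tmp_dir) / "mooncake.json"
    config_path.write_text(json.dumps(mooncake_json))
    loaded = json.loads(config_path.read_text())
```

After the `with` block exits, the directory and the config file are gone, which is exactly the cleanup guarantee the hardcoded `mooncake.json` in the current working directory lacks.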

@jiangyunfan1 jiangyunfan1 changed the title [TEST]add a mooncake case [TEST]add a qwen3-30b acc case with mooncake mempool Jan 26, 2026
@jiangyunfan1 jiangyunfan1 reopened this Jan 26, 2026
@jiangyunfan1 jiangyunfan1 reopened this Jan 27, 2026
@jiangyunfan1 jiangyunfan1 reopened this Jan 27, 2026
@jiangyunfan1 jiangyunfan1 force-pushed the mooncake branch 2 times, most recently from 00f2542 to dab246e on January 29, 2026 12:40
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
@wangxiyuan wangxiyuan merged commit 1eb0798 into vllm-project:main Feb 10, 2026
26 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 11, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Feat] 310p support MoE W8A8 quantizaition (vllm-project#6641)
  [TEST]add a qwen3-30b acc case with mooncake mempool (vllm-project#6244)
  [MOE Refactor] Remove QuantType in prepare_finalize.py (vllm-project#6534)
  [EPLB] Avoiding eplb's dependency on a specified model (vllm-project#6528)
  [Doc][Misc] Restructure tutorial documentation (vllm-project#6501)
  implement batch invariant with ascendc (vllm-project#6590)
  [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (vllm-project#6629)
  [Misc] upgrade to vllm main (vllm-project#6646)
  [main][Docs] Fix spelling errors across documentation (vllm-project#6649)
  [bugfix]Fix no attribute 'data' when MLAPO is enable  (vllm-project#6601)
  [DOC]Add Memcache Usage Guide (vllm-project#6476)
  [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (vllm-project#6606)
  [Test][LoRA] Add e2e test for base model inference (vllm-project#6624)
  [refactor]Optimized the kvcache usage of Deepseek v3.2 (vllm-project#6610)
  [Feat](sfa,dcp) support dcp for sfa (vllm-project#6563)
  [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (vllm-project#6581)
  [fix bug] fix tensor mismatch bug in sigmoid operate test case (vllm-project#6619)
  [Kernel]: Optimize DispatchFFNCombine performance (vllm-project#6468)
  [MISC] Clean up useless env USE_OPTIMIZED_MODEL (vllm-project#6618)
mikequan0425 pushed a commit to taoyao1221/vllm-ascend that referenced this pull request Feb 11, 2026
### What this PR does / why we need it?
This PR adds a test case for qwen3-30b w8a8 with the Mooncake mempool; we
need to test it regularly.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test.
- vLLM version: v0.14.1
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: mikequan0425 <mikequan0425@foxmail.com>
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
Signed-off-by: momochenchuw <chenchuw@huawei.com>
banxiaduhuo pushed a commit to banxiaduhuo/vllm-ascend that referenced this pull request Feb 26, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

2 participants