[TEST] add a qwen3-30b acc case with mooncake mempool #6244
wangxiyuan merged 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a new end-to-end test for the Qwen3-30B-A3B-W8A8 model. My review identifies a potential issue with test robustness due to the use of a hardcoded file name for configuration. I've provided a suggestion to use pytest's tmp_path fixture to ensure tests are isolated and do not leave artifacts, which is a standard practice for writing robust tests.
```python
async def test_models(model: str, tp_size: int) -> None:
    port = get_open_port()
    mooncake_port = get_open_port()
    mooncake_metrics_port = get_open_port()
    mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
    with open("mooncake.json", "w") as f:
        json.dump(mooncake_json, f)
    env_dict = {
        "PYTHONHASHSEED": "0",
        "ASCEND_CONNECT_TIMEOUT": "10000",
        "ASCEND_TRANSFER_TIMEOUT": "10000",
        "ASCEND_BUFFER_POOL": "4:8",
        "VLLM_USE_V1": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "HCCL_BUFFSIZE": "1024",
        "OMP_NUM_THREADS": "1",
        "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
        "VLLM_ASCEND_ENABLE_NZ": "2",
        "MOONCAKE_CONFIG_PATH": "mooncake.json"
```
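The test reserves ports via `get_open_port`. As context, helpers like this are commonly implemented by binding to port 0 so the OS assigns a free ephemeral port; the following is a generic sketch, not vLLM's actual implementation:

```python
import socket


def get_open_port() -> int:
    """Ask the OS for a currently free TCP port (sketch implementation)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the kernel pick an unused ephemeral port.
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Note that the port is only reserved while the socket is bound; there is a small window between this call and the server actually binding the port, which is acceptable for tests.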
Creating a file with a hardcoded name (mooncake.json) in the current working directory is not a good practice for tests. It can lead to race conditions if tests are run in parallel and can leave artifacts if a test fails. It's better to use the tmp_path fixture provided by pytest to create temporary files in a managed way. This suggestion refactors the code to use tmp_path for creating the mooncake.json configuration file, which will make the test more robust and isolated.
Suggested change:

```diff
-async def test_models(model: str, tp_size: int) -> None:
+async def test_models(model: str, tp_size: int, tmp_path) -> None:
     port = get_open_port()
     mooncake_port = get_open_port()
     mooncake_metrics_port = get_open_port()
     mooncake_json["master_server_address"] = f"127.0.0.1:{mooncake_port}"
-    with open("mooncake.json", "w") as f:
-        json.dump(mooncake_json, f)
+    mooncake_config_path = tmp_path / "mooncake.json"
+    mooncake_config_path.write_text(json.dumps(mooncake_json))
     env_dict = {
         "PYTHONHASHSEED": "0",
         "ASCEND_CONNECT_TIMEOUT": "10000",
         "ASCEND_TRANSFER_TIMEOUT": "10000",
         "ASCEND_BUFFER_POOL": "4:8",
         "VLLM_USE_V1": "1",
         "OMP_PROC_BIND": "false",
         "HCCL_OP_EXPANSION_MODE": "AIV",
         "HCCL_BUFFSIZE": "1024",
         "OMP_NUM_THREADS": "1",
         "PYTORCH_NPU_ALLOC_CONF": "expandable_segments:True",
         "VLLM_ASCEND_ENABLE_NZ": "2",
-        "MOONCAKE_CONFIG_PATH": "mooncake.json"
+        "MOONCAKE_CONFIG_PATH": str(mooncake_config_path)
```
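To illustrate the pattern behind this suggestion, here is a minimal, self-contained sketch of writing a per-test config file with pytest's `tmp_path` fixture. The helper name and config keys below are illustrative, not the real mooncake schema:

```python
import json


def write_config(tmp_path, master_port: int) -> str:
    """Write a JSON config into a per-test temporary directory.

    tmp_path is the pytest-provided pathlib.Path fixture; each test
    gets its own directory, so parallel runs cannot collide and no
    artifacts are left in the working directory on failure.
    """
    config = {"master_server_address": f"127.0.0.1:{master_port}"}
    config_path = tmp_path / "mooncake.json"
    config_path.write_text(json.dumps(config))
    return str(config_path)


def test_config_is_isolated(tmp_path):
    path = write_config(tmp_path, 50051)
    data = json.loads((tmp_path / "mooncake.json").read_text())
    assert data["master_server_address"] == "127.0.0.1:50051"
    assert path.endswith("mooncake.json")
```

Because the resulting path differs per test invocation, it must be passed to the subprocess explicitly (here via the `MOONCAKE_CONFIG_PATH` environment variable) rather than relying on a fixed file name in the current directory.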
This pull request has conflicts, please resolve those before we can evaluate the pull request.
### What this PR does / why we need it?
This PR adds an accuracy test case for qwen3-30b w8a8 with the mooncake mempool, so that it is tested regularly.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test.

- vLLM version: v0.14.1
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: mikequan0425 <mikequan0425@foxmail.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>