[TEST] Add Qwen3-32b-w8a8 acc/perf A2/A3 test #3541
Merged
Commits (81):
- 8edcacf add a case (jiangyunfan1)
- 9b55223 add a job (jiangyunfan1)
- 798ca2e fix some issues (jiangyunfan1)
- 1543c62 fix some issues (jiangyunfan1)
- b4e6a7a fix some issues (jiangyunfan1)
- 3eb5cfe fix some issues (jiangyunfan1)
- 065ddf5 fix some issues (jiangyunfan1)
- ac0c471 fix some issues (jiangyunfan1)
- ce98fe8 fix some issues (jiangyunfan1)
- 23f767a fix some issues (jiangyunfan1)
- e87e849 fix some issues (jiangyunfan1)
- a3b1970 fix some issues (jiangyunfan1)
- 6dca606 add changes (jiangyunfan1)
- 4d10b7e delete old files (jiangyunfan1)
- cc3c4a1 add a case of qwen3-32b-int8 (jiangyunfan1)
- 9ef536b rebase (Potabk)
- 22fe4d7 fix (Potabk)
- 93a7749 add workflow (Potabk)
- 3eb536a fix path (Potabk)
- cc25561 fix (Potabk)
- 6a4003b just for test (Potabk)
- ba518fb fix (Potabk)
- 799ebb3 add workflow (Potabk)
- f852e30 fix (Potabk)
- 26762c8 rm vllm_use_v1 env (Potabk)
- 8dbb2ed add port (Potabk)
- 24320bb add trigger (Potabk)
- 93d8dfc test (Potabk)
- 45bfb9c test (Potabk)
- fae574f revert (Potabk)
- 5261d96 revert (Potabk)
- 4b6c11b fix (Potabk)
- ce6354f fix (Potabk)
- bbab015 fix (Potabk)
- 3bc091c fix (Potabk)
- 346118b add test (Potabk)
- 670e4a9 fix lint (Yikun)
- c9b5986 Merge branch 'vllm-project:main' into main (jiangyunfan1)
- 5058c08 Merge branch 'vllm-project:main' into main (jiangyunfan1)
- 4a23a11 add nightly test aisbench cases (jiangyunfan1)
- e467d24 fix issues (jiangyunfan1)
- 4a60fb7 fix import (jiangyunfan1)
- aa474b1 fix import (jiangyunfan1)
- cfbf3c9 ignore modelscope check (jiangyunfan1)
- 3a76502 add aisbench workflow (jiangyunfan1)
- ceeb404 fix model param (jiangyunfan1)
- 54618a0 fix model param (jiangyunfan1)
- dae2574 fix input param (jiangyunfan1)
- 3e9cdf0 relax acc test threshold (jiangyunfan1)
- b373f85 acc to gsm8k-lite (jiangyunfan1)
- 98285c6 stream_chat, not ignore_eos (jiangyunfan1)
- d30dd3e remove acc case (jiangyunfan1)
- 95b14ff acc debug test (jiangyunfan1)
- a90f960 suqash (Yikun)
- 78e91c0 Merge remote-tracking branch 'origin' (jiangyunfan1)
- e3a994d Merge branch 'vllm-project:main' into main (jiangyunfan1)
- ab95206 Merge remote-tracking branch 'origin/main' (jiangyunfan1)
- bd6051e add qwen25vl (jiangyunfan1)
- 6bc3cb5 add qwen25vl-7b workflow (jiangyunfan1)
- 6e54d0e Merge branch 'vllm-project:main' into main (jiangyunfan1)
- 12be156 qwen25vl-7b a2 (jiangyunfan1)
- 91f25f2 bf16 acc and perf (jiangyunfan1)
- 54d2b09 Add initial aisbench and Qwen3 32B (Yikun)
- 25c2541 Merge branch 'vllm-project:main' into main (jiangyunfan1)
- dd6dcd3 merge conflict (jiangyunfan1)
- 389629d test qwen25vl (jiangyunfan1)
- cad97d5 qwen25vl-7b a3 (jiangyunfan1)
- 5002483 add qwen25vl-7b perf (jiangyunfan1)
- c0ae66c add a3 qwen3-32b (jiangyunfan1)
- adf1555 merge a2a3 qwen3-32b-int8 (jiangyunfan1)
- 05d9a81 remove a2 qwen3-32b-int8 (jiangyunfan1)
- 7b00198 fix index (jiangyunfan1)
- e04eb92 fix index (jiangyunfan1)
- 58cafeb mod a3 image (jiangyunfan1)
- d653d71 rm qwen3-32b a3 (jiangyunfan1)
- 06114b9 Merge branch 'vllm-project:main' into main (jiangyunfan1)
- 3b9e94a fix qwen3-32b-int8 (jiangyunfan1)
- 2368840 Merge remote-tracking branch 'origin/main' (jiangyunfan1)
- 09845bb add qwen3-32b-int8 8 cases (jiangyunfan1)
- a1eed85 fix yaml (jiangyunfan1)
- a4e3948 mod workflow (jiangyunfan1)
New file (+110 lines) — e2e test for Qwen2.5-VL-7B-Instruct:

```python
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
from typing import Any

import openai
import pytest
from vllm.utils import get_open_port

from tests.e2e.conftest import RemoteOpenAIServer
from tools.aisbench import run_aisbench_cases
from tools.send_mm_request import send_image_request

MODELS = [
    "Qwen/Qwen2.5-VL-7B-Instruct",
]

TENSOR_PARALLELS = [4]

prompts = [
    "San Francisco is a",
]

api_keyword_args = {
    "max_tokens": 10,
}

aisbench_cases = [{
    "case_type": "accuracy",
    "dataset_path": "vllm-ascend/textvqa-lite",
    "request_conf": "vllm_api_stream_chat",
    "dataset_conf": "textvqa/textvqa_gen_base64",
    "max_out_len": 2048,
    "batch_size": 128,
    "baseline": 81,
    "threshold": 5
}, {
    "case_type": "performance",
    "dataset_path": "vllm-ascend/textvqa-perf-1080p",
    "request_conf": "vllm_api_stream_chat",
    "dataset_conf": "textvqa/textvqa_gen_base64",
    "num_prompts": 512,
    "max_out_len": 256,
    "batch_size": 128,
    "request_rate": 0,
    "baseline": 1,
    "threshold": 0.97
}]


@pytest.mark.asyncio
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("tp_size", TENSOR_PARALLELS)
async def test_models(model: str, tp_size: int) -> None:
    port = get_open_port()
    env_dict = {
        "TASK_QUEUE_ENABLE": "1",
        "VLLM_ASCEND_ENABLE_NZ": "0",
        "HCCL_OP_EXPANSION_MODE": "AIV"
    }
    server_args = [
        "--no-enable-prefix-caching",
        "--disable-mm-preprocessor-cache",
        "--tensor-parallel-size",
        str(tp_size),
        "--port",
        str(port),
        "--max-model-len",
        "30000",
        "--max-num-batched-tokens",
        "40000",
        "--max-num-seqs",
        "400",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.8",
    ]
    request_keyword_args: dict[str, Any] = {
        **api_keyword_args,
    }
    with RemoteOpenAIServer(model,
                            server_args,
                            server_port=port,
                            env_dict=env_dict,
                            auto_port=False) as server:
        client = server.get_async_client()
        batch = await client.completions.create(
            model=model,
            prompt=prompts,
            **request_keyword_args,
        )
        choices: list[openai.types.CompletionChoice] = batch.choices
        assert choices[0].text, "empty response"
        print(choices)
        send_image_request(model, server)
        # aisbench test
        run_aisbench_cases(model, port, aisbench_cases)
```
New file (+118 lines) — acc/perf test for Qwen3-32B-W8A8:

```python
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
import os
from typing import Any

import openai
import pytest
from vllm.utils import get_open_port

from tests.e2e.conftest import RemoteOpenAIServer
from tools.aisbench import run_aisbench_cases

MODELS = [
    "vllm-ascend/Qwen3-32B-W8A8",
]

MODES = [
    "aclgraph",
    "single",
]

TENSOR_PARALLELS = [4]

prompts = [
    "San Francisco is a",
]

api_keyword_args = {
    "max_tokens": 10,
}

batch_size_dict = {
    "linux-aarch64-a2-4": 44,
    "linux-aarch64-a3-4": 46,
}
VLLM_CI_RUNNER = os.getenv("VLLM_CI_RUNNER", "linux-aarch64-a2-4")
performance_batch_size = batch_size_dict.get(VLLM_CI_RUNNER, 1)

aisbench_cases = [{
    "case_type": "performance",
    "dataset_path": "vllm-ascend/GSM8K-in3500-bs400",
    "request_conf": "vllm_api_stream_chat",
    "dataset_conf": "gsm8k/gsm8k_gen_0_shot_cot_str_perf",
    "num_prompts": 4 * performance_batch_size,
    "max_out_len": 1500,
    "batch_size": performance_batch_size,
    "baseline": 1,
    "threshold": 0.97
}, {
    "case_type": "accuracy",
    "dataset_path": "vllm-ascend/aime2024",
    "request_conf": "vllm_api_general_chat",
    "dataset_conf": "aime2024/aime2024_gen_0_shot_chat_prompt",
    "max_out_len": 32768,
    "batch_size": 32,
    "baseline": 83.33,
    "threshold": 17
}]


@pytest.mark.asyncio
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("mode", MODES)
@pytest.mark.parametrize("tp_size", TENSOR_PARALLELS)
async def test_models(model: str, mode: str, tp_size: int) -> None:
    port = get_open_port()
    env_dict = {
        "TASK_QUEUE_ENABLE": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "PAGED_ATTENTION_MASK_LEN": "5500"
    }
    server_args = [
        "--quantization", "ascend", "--no-enable-prefix-caching",
        "--tensor-parallel-size",
        str(tp_size), "--port",
        str(port), "--max-model-len", "36864", "--max-num-batched-tokens",
        "36864", "--block-size", "128", "--trust-remote-code",
        "--gpu-memory-utilization", "0.9", "--additional-config",
        '{"enable_weight_nz_layout":true}'
    ]
    if mode == "single":
        server_args.append("--enforce-eager")
    request_keyword_args: dict[str, Any] = {
        **api_keyword_args,
    }
    with RemoteOpenAIServer(model,
                            server_args,
                            server_port=port,
                            env_dict=env_dict,
                            auto_port=False) as server:
        client = server.get_async_client()
        batch = await client.completions.create(
            model=model,
            prompt=prompts,
            **request_keyword_args,
        )
        choices: list[openai.types.CompletionChoice] = batch.choices
        assert choices[0].text, "empty response"
        print(choices)
        if mode == "single":
            return
        # aisbench test
        run_aisbench_cases(model, port, aisbench_cases)
```
New file (+49 lines) — the `tools/send_mm_request` helper:

```python
import base64
import os

import requests
from modelscope import snapshot_download  # type: ignore

mm_dir = snapshot_download("vllm-ascend/mm_request", repo_type='dataset')
image_path = os.path.join(mm_dir, "test_mm2.jpg")
with open(image_path, 'rb') as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

data = {
    "messages": [{
        "role": "user",
        "content": [{
            "type": "text",
            "text": "What is the content of this image?"
        }, {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_data}"
            }
        }]
    }],
    "eos_token_id": [1, 106],
    "pad_token_id": 0,
    "top_k": 64,
    "top_p": 0.95,
    "max_tokens": 8192,
    "stream": False
}

headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}


def send_image_request(model, server):
    data["model"] = model
    url = server.url_for("v1", "chat", "completions")
    response = requests.post(url, headers=headers, json=data)
    print("Status Code:", response.status_code)
    response_json = response.json()
    print("Response:", response_json)
    assert response_json["choices"][0]["message"]["content"], "empty response"
```
Review comment:
This module performs I/O operations (downloading and reading a file) at the module level, which causes side effects upon import. Additionally, the `send_image_request` function modifies a global `data` dictionary. This is not a safe practice, especially in a testing environment where concurrency can lead to race conditions.

To improve this, all I/O operations and data preparation should be encapsulated within the `send_image_request` function. This makes the function self-contained, eliminates import-time side effects, and avoids mutating global state. The suggested change refactors the code to follow these best practices.
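One possible shape for that refactor (a sketch, not the reviewer's exact suggestion): a pure helper builds a fresh payload on every call, and all downloading and file reading happens inside `send_image_request` rather than at import time. The helper name `build_image_payload` is hypothetical, and the sampling parameters shown are trimmed relative to the original module.

```python
import base64


def build_image_payload(model: str, image_bytes: bytes) -> dict:
    # Hypothetical helper: returns a fresh request payload per call, so no
    # module-level `data` dict is shared or mutated between concurrent tests.
    image_data = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "text",
                "text": "What is the content of this image?"
            }, {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_data}"
                }
            }]
        }],
        "max_tokens": 8192,
        "stream": False,
    }


def send_image_request(model, server):
    # All I/O is deferred to call time: importing this module no longer
    # downloads the dataset or reads the image from disk.
    import os

    import requests
    from modelscope import snapshot_download  # type: ignore

    mm_dir = snapshot_download("vllm-ascend/mm_request", repo_type="dataset")
    with open(os.path.join(mm_dir, "test_mm2.jpg"), "rb") as f:
        payload = build_image_payload(model, f.read())
    url = server.url_for("v1", "chat", "completions")
    response = requests.post(url, json=payload)
    response_json = response.json()
    assert response_json["choices"][0]["message"]["content"], "empty response"
```

Because the payload builder is pure, it can be unit-tested without a server or a network connection, which the original module-level design did not allow.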