Merged

81 commits
8edcacf
add a case
jiangyunfan1 Sep 17, 2025
9b55223
add a job
jiangyunfan1 Sep 17, 2025
798ca2e
fix some issues
jiangyunfan1 Sep 17, 2025
1543c62
fix some issues
jiangyunfan1 Sep 17, 2025
b4e6a7a
fix some issues
jiangyunfan1 Sep 17, 2025
3eb5cfe
fix some issues
jiangyunfan1 Sep 17, 2025
065ddf5
fix some issues
jiangyunfan1 Sep 17, 2025
ac0c471
fix some issues
jiangyunfan1 Sep 17, 2025
ce98fe8
fix some issues
jiangyunfan1 Sep 17, 2025
23f767a
fix some issues
jiangyunfan1 Sep 17, 2025
e87e849
fix some issues
jiangyunfan1 Sep 17, 2025
a3b1970
fix some issues
jiangyunfan1 Sep 17, 2025
6dca606
add changes
jiangyunfan1 Sep 18, 2025
4d10b7e
delete old files
jiangyunfan1 Oct 10, 2025
cc3c4a1
add a case of qwen3-32b-int8
jiangyunfan1 Oct 10, 2025
9ef536b
rebase
Potabk Oct 11, 2025
22fe4d7
fix
Potabk Oct 11, 2025
93a7749
add workflow
Potabk Oct 11, 2025
3eb536a
fix path
Potabk Oct 11, 2025
cc25561
fix
Potabk Oct 11, 2025
6a4003b
just for test
Potabk Oct 11, 2025
ba518fb
fix
Potabk Oct 11, 2025
799ebb3
add workflow
Potabk Oct 11, 2025
f852e30
fix
Potabk Oct 11, 2025
26762c8
rm vllm_use_v1 env
Potabk Oct 11, 2025
8dbb2ed
add port
Potabk Oct 11, 2025
24320bb
add trigger
Potabk Oct 11, 2025
93d8dfc
test
Potabk Oct 11, 2025
45bfb9c
test
Potabk Oct 11, 2025
fae574f
revert
Potabk Oct 11, 2025
5261d96
revert
Potabk Oct 11, 2025
4b6c11b
fix
Potabk Oct 11, 2025
ce6354f
fix
Potabk Oct 11, 2025
bbab015
fix
Potabk Oct 11, 2025
3bc091c
fix
Potabk Oct 11, 2025
346118b
add test
Potabk Oct 11, 2025
670e4a9
fix lint
Yikun Oct 12, 2025
c9b5986
Merge branch 'vllm-project:main' into main
jiangyunfan1 Oct 15, 2025
5058c08
Merge branch 'vllm-project:main' into main
jiangyunfan1 Oct 15, 2025
4a23a11
add nightly test aisbench cases
jiangyunfan1 Oct 15, 2025
e467d24
fix issues
jiangyunfan1 Oct 15, 2025
4a60fb7
fix import
jiangyunfan1 Oct 15, 2025
aa474b1
fix import
jiangyunfan1 Oct 15, 2025
cfbf3c9
ignore modelscope check
jiangyunfan1 Oct 15, 2025
3a76502
add aisbench workflow
jiangyunfan1 Oct 16, 2025
ceeb404
fix model param
jiangyunfan1 Oct 16, 2025
54618a0
fix model param
jiangyunfan1 Oct 16, 2025
dae2574
fix input param
jiangyunfan1 Oct 16, 2025
3e9cdf0
relax acc test threshold
jiangyunfan1 Oct 16, 2025
b373f85
acc to gsm8k-lite
jiangyunfan1 Oct 16, 2025
98285c6
stream_chat, not ignore_eos
jiangyunfan1 Oct 16, 2025
d30dd3e
remove acc case
jiangyunfan1 Oct 17, 2025
95b14ff
acc debug test
jiangyunfan1 Oct 17, 2025
a90f960
suqash
Yikun Oct 17, 2025
78e91c0
Merge remote-tracking branch 'origin'
jiangyunfan1 Oct 17, 2025
e3a994d
Merge branch 'vllm-project:main' into main
jiangyunfan1 Oct 18, 2025
ab95206
Merge remote-tracking branch 'origin/main'
jiangyunfan1 Oct 18, 2025
bd6051e
add qwen25vl
jiangyunfan1 Oct 18, 2025
6bc3cb5
add qwen25vl-7b workflow
jiangyunfan1 Oct 18, 2025
6e54d0e
Merge branch 'vllm-project:main' into main
jiangyunfan1 Oct 18, 2025
12be156
qwen25vl-7b a2
jiangyunfan1 Oct 18, 2025
91f25f2
bf16 acc and perf
jiangyunfan1 Oct 18, 2025
54d2b09
Add initial aisbench and Qwen3 32B
Yikun Oct 19, 2025
25c2541
Merge branch 'vllm-project:main' into main
jiangyunfan1 Oct 20, 2025
dd6dcd3
merge conflict
jiangyunfan1 Oct 20, 2025
389629d
test qwen25vl
jiangyunfan1 Oct 20, 2025
cad97d5
qwen25vl-7b a3
jiangyunfan1 Oct 20, 2025
5002483
add qwen25vl-7b perf
jiangyunfan1 Oct 20, 2025
c0ae66c
add a3 qwen3-32b
jiangyunfan1 Oct 20, 2025
adf1555
merge a2a3 qwen3-32b-int8
jiangyunfan1 Oct 20, 2025
05d9a81
remove a2 qwen3-32b-int8
jiangyunfan1 Oct 20, 2025
7b00198
fix index
jiangyunfan1 Oct 20, 2025
e04eb92
fix index
jiangyunfan1 Oct 20, 2025
58cafeb
mod a3 image
jiangyunfan1 Oct 20, 2025
d653d71
rm qwen3-32b a3
jiangyunfan1 Oct 21, 2025
06114b9
Merge branch 'vllm-project:main' into main
jiangyunfan1 Oct 21, 2025
3b9e94a
fix qwen3-32b-int8
jiangyunfan1 Oct 21, 2025
2368840
Merge remote-tracking branch 'origin/main'
jiangyunfan1 Oct 21, 2025
09845bb
add qwen3-32b-int8 8 cases
jiangyunfan1 Oct 21, 2025
a1eed85
fix yaml
jiangyunfan1 Oct 21, 2025
a4e3948
mod workflow
jiangyunfan1 Oct 21, 2025
1 change: 1 addition & 0 deletions .github/workflows/_e2e_nightly.yaml
@@ -109,6 +109,7 @@ jobs:
env:
  VLLM_WORKER_MULTIPROC_METHOD: spawn
  VLLM_USE_MODELSCOPE: True
  VLLM_CI_RUNNER: ${{ inputs.runner }}
run: |
  # TODO: enable more tests
  pytest -sv ${{ inputs.tests }}
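The added VLLM_CI_RUNNER variable forwards the workflow's runner input into the test environment; the Qwen3-32B-W8A8 nightly test below reads it to pick a per-runner performance batch size. A minimal sketch of that lookup (mirroring the test code, nothing beyond it):

import os

# Runner label -> performance batch size used by the int8 nightly test
batch_size_dict = {
    "linux-aarch64-a2-4": 44,
    "linux-aarch64-a3-4": 46,
}
runner = os.getenv("VLLM_CI_RUNNER", "linux-aarch64-a2-4")
performance_batch_size = batch_size_dict.get(runner, 1)  # unknown runners fall back to 1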
21 changes: 20 additions & 1 deletion .github/workflows/vllm_ascend_test_nightly.yaml
@@ -41,7 +41,7 @@ defaults:
# and ignore the lint / 1 card / 4 cards test type
concurrency:
  group: ascend-nightly-${{ github.ref }}
  cancel-in-progress: true
  #cancel-in-progress: true

jobs:
  qwen3-32b:
@@ -56,3 +56,22 @@ jobs:
      vllm: v0.11.0
      runner: ${{ matrix.os }}
      tests: tests/e2e/nightly/models/test_qwen3_32b.py
  qwen3-32b-in8-a3:
    strategy:
      matrix:
        os: [linux-aarch64-a3-4]
    uses: ./.github/workflows/_e2e_nightly.yaml
    with:
      vllm: v0.11.0
      runner: ${{ matrix.os }}
      image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.2.rc1-a3-ubuntu22.04-py3.11
      tests: tests/e2e/nightly/models/test_qwen3_32b_int8.py
  qwen3-32b-in8-a2:
    strategy:
      matrix:
        os: [linux-aarch64-a2-4]
    uses: ./.github/workflows/_e2e_nightly.yaml
    with:
      vllm: v0.11.0
      runner: ${{ matrix.os }}
      tests: tests/e2e/nightly/models/test_qwen3_32b_int8.py
110 changes: 110 additions & 0 deletions tests/e2e/nightly/models/test_qwen2_5_vl_7b.py
@@ -0,0 +1,110 @@
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
from typing import Any

import openai
import pytest
from vllm.utils import get_open_port

from tests.e2e.conftest import RemoteOpenAIServer
from tools.aisbench import run_aisbench_cases
from tools.send_mm_request import send_image_request

MODELS = [
    "Qwen/Qwen2.5-VL-7B-Instruct",
]

TENSOR_PARALLELS = [4]

prompts = [
    "San Francisco is a",
]

api_keyword_args = {
    "max_tokens": 10,
}

aisbench_cases = [{
    "case_type": "accuracy",
    "dataset_path": "vllm-ascend/textvqa-lite",
    "request_conf": "vllm_api_stream_chat",
    "dataset_conf": "textvqa/textvqa_gen_base64",
    "max_out_len": 2048,
    "batch_size": 128,
    "baseline": 81,
    "threshold": 5
}, {
    "case_type": "performance",
    "dataset_path": "vllm-ascend/textvqa-perf-1080p",
    "request_conf": "vllm_api_stream_chat",
    "dataset_conf": "textvqa/textvqa_gen_base64",
    "num_prompts": 512,
    "max_out_len": 256,
    "batch_size": 128,
    "request_rate": 0,
    "baseline": 1,
    "threshold": 0.97
}]


@pytest.mark.asyncio
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("tp_size", TENSOR_PARALLELS)
async def test_models(model: str, tp_size: int) -> None:
    port = get_open_port()
    env_dict = {
        "TASK_QUEUE_ENABLE": "1",
        "VLLM_ASCEND_ENABLE_NZ": "0",
        "HCCL_OP_EXPANSION_MODE": "AIV"
    }
    server_args = [
        "--no-enable-prefix-caching",
        "--disable-mm-preprocessor-cache",
        "--tensor-parallel-size",
        str(tp_size),
        "--port",
        str(port),
        "--max-model-len",
        "30000",
        "--max-num-batched-tokens",
        "40000",
        "--max-num-seqs",
        "400",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.8",
    ]
    request_keyword_args: dict[str, Any] = {
        **api_keyword_args,
    }
    with RemoteOpenAIServer(model,
                            server_args,
                            server_port=port,
                            env_dict=env_dict,
                            auto_port=False) as server:
        client = server.get_async_client()
        batch = await client.completions.create(
            model=model,
            prompt=prompts,
            **request_keyword_args,
        )
        choices: list[openai.types.CompletionChoice] = batch.choices
        assert choices[0].text, "empty response"
        print(choices)
        send_image_request(model, server)
        # aisbench test
        run_aisbench_cases(model, port, aisbench_cases)
118 changes: 118 additions & 0 deletions tests/e2e/nightly/models/test_qwen3_32b_int8.py
@@ -0,0 +1,118 @@
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
import os
from typing import Any

import openai
import pytest
from vllm.utils import get_open_port

from tests.e2e.conftest import RemoteOpenAIServer
from tools.aisbench import run_aisbench_cases

MODELS = [
    "vllm-ascend/Qwen3-32B-W8A8",
]

MODES = [
    "aclgraph",
    "single",
]

TENSOR_PARALLELS = [4]

prompts = [
    "San Francisco is a",
]

api_keyword_args = {
    "max_tokens": 10,
}

batch_size_dict = {
    "linux-aarch64-a2-4": 44,
    "linux-aarch64-a3-4": 46,
}
VLLM_CI_RUNNER = os.getenv("VLLM_CI_RUNNER", "linux-aarch64-a2-4")
performance_batch_size = batch_size_dict.get(VLLM_CI_RUNNER, 1)

aisbench_cases = [{
    "case_type": "performance",
    "dataset_path": "vllm-ascend/GSM8K-in3500-bs400",
    "request_conf": "vllm_api_stream_chat",
    "dataset_conf": "gsm8k/gsm8k_gen_0_shot_cot_str_perf",
    "num_prompts": 4 * performance_batch_size,
    "max_out_len": 1500,
    "batch_size": performance_batch_size,
    "baseline": 1,
    "threshold": 0.97
}, {
    "case_type": "accuracy",
    "dataset_path": "vllm-ascend/aime2024",
    "request_conf": "vllm_api_general_chat",
    "dataset_conf": "aime2024/aime2024_gen_0_shot_chat_prompt",
    "max_out_len": 32768,
    "batch_size": 32,
    "baseline": 83.33,
    "threshold": 17
}]


@pytest.mark.asyncio
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("mode", MODES)
@pytest.mark.parametrize("tp_size", TENSOR_PARALLELS)
async def test_models(model: str, mode: str, tp_size: int) -> None:
    port = get_open_port()
    env_dict = {
        "TASK_QUEUE_ENABLE": "1",
        "OMP_PROC_BIND": "false",
        "HCCL_OP_EXPANSION_MODE": "AIV",
        "PAGED_ATTENTION_MASK_LEN": "5500"
    }
    server_args = [
        "--quantization", "ascend", "--no-enable-prefix-caching",
        "--tensor-parallel-size",
        str(tp_size), "--port",
        str(port), "--max-model-len", "36864", "--max-num-batched-tokens",
        "36864", "--block-size", "128", "--trust-remote-code",
        "--gpu-memory-utilization", "0.9", "--additional-config",
        '{"enable_weight_nz_layout":true}'
    ]
    if mode == "single":
        server_args.append("--enforce-eager")
    request_keyword_args: dict[str, Any] = {
        **api_keyword_args,
    }
    with RemoteOpenAIServer(model,
                            server_args,
                            server_port=port,
                            env_dict=env_dict,
                            auto_port=False) as server:
        client = server.get_async_client()
        batch = await client.completions.create(
            model=model,
            prompt=prompts,
            **request_keyword_args,
        )
        choices: list[openai.types.CompletionChoice] = batch.choices
        assert choices[0].text, "empty response"
        print(choices)
        if mode == "single":
            return
        # aisbench test
        run_aisbench_cases(model, port, aisbench_cases)
11 changes: 9 additions & 2 deletions tools/aisbench.py
@@ -101,6 +101,9 @@ def _init_dataset_conf(self):
if self.task_type == "performance":
conf_path = os.path.join(DATASET_CONF_DIR,
f'{self.dataset_conf}.py')
if self.dataset_conf.startswith("textvqa"):
self.dataset_path = os.path.join(self.dataset_path,
"textvqa_val.jsonl")
with open(conf_path, 'r', encoding='utf-8') as f:
content = f.read()
content = re.sub(r'path=.*', f'path="{self.dataset_path}",',
@@ -180,9 +183,13 @@ def _wait_for_task(self):
    def _get_result_performance(self):
        result_dir = re.search(r'Performance Result files locate in (.*)',
                               self.result_line).group(1)[:-1]
        result_csv_file = os.path.join(result_dir, "gsm8kdataset.csv")
        result_json_file = os.path.join(result_dir, "gsm8kdataset.json")
        dataset_type = self.dataset_conf.split('/')[0]
        result_csv_file = os.path.join(result_dir,
                                       f"{dataset_type}dataset.csv")
        result_json_file = os.path.join(result_dir,
                                        f"{dataset_type}dataset.json")
        self.result_csv = pd.read_csv(result_csv_file)
        print("Getting performance results from file: ", result_json_file)
        with open(result_json_file, 'r', encoding='utf-8') as f:
            self.result_json = json.load(f)
49 changes: 49 additions & 0 deletions tools/send_mm_request.py
@@ -0,0 +1,49 @@
import base64
import os

import requests
from modelscope import snapshot_download  # type: ignore

mm_dir = snapshot_download("vllm-ascend/mm_request", repo_type='dataset')
image_path = os.path.join(mm_dir, "test_mm2.jpg")
with open(image_path, 'rb') as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

data = {
    "messages": [{
        "role":
        "user",
        "content": [{
            "type": "text",
            "text": "What is the content of this image?"
        }, {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_data}"
            }
        }]
    }],
    "eos_token_id": [1, 106],
    "pad_token_id":
    0,
    "top_k":
    64,
    "top_p":
    0.95,
    "max_tokens":
    8192,
    "stream":
    False
}

headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}


def send_image_request(model, server):
    data["model"] = model
    url = server.url_for("v1", "chat", "completions")
    response = requests.post(url, headers=headers, json=data)
    print("Status Code:", response.status_code)
    response_json = response.json()
    print("Response:", response_json)
    assert response_json["choices"][0]["message"]["content"], "empty response"
Comment on lines +7 to +49
Severity: high

This module performs I/O operations (downloading and reading a file) at the module level, which causes side effects upon import. Additionally, the send_image_request function modifies a global data dictionary. This is not a safe practice, especially in a testing environment where concurrency can lead to race conditions.

To improve this, all I/O operations and data preparations should be encapsulated within the send_image_request function. This makes the function self-contained, eliminates import-time side effects, and avoids mutating global state. The suggested change refactors the code to follow these best practices.

headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}


def send_image_request(model, server):
    mm_dir = snapshot_download("vllm-ascend/mm_request", repo_type='dataset')
    image_path = os.path.join(mm_dir, "test_mm2.jpg")
    with open(image_path, 'rb') as image_file:
        image_data = base64.b64encode(image_file.read()).decode('utf-8')

    data = {
        "model": model,
        "messages": [{
            "role":
            "user",
            "content": [{
                "type": "text",
                "text": "What is the content of this image?"
            }, {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_data}"
                }
            }]
        }],
        "eos_token_id": [1, 106],
        "pad_token_id":
        0,
        "top_k":
        64,
        "top_p":
        0.95,
        "max_tokens":
        8192,
        "stream":
        False
    }

    url = server.url_for("v1", "chat", "completions")
    response = requests.post(url, headers=headers, json=data)
    print("Status Code:", response.status_code)
    response.raise_for_status()
    response_json = response.json()
    print("Response:", response_json)
    assert response_json["choices"][0]["message"]["content"], "empty response"
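If the suggestion is adopted, the call site in test_qwen2_5_vl_7b.py stays the same: the test still calls send_image_request(model, server) inside the RemoteOpenAIServer context. The dataset download, base64 encoding, and request payload are then built per call instead of at import time, so importing the module no longer triggers network I/O or mutates shared state.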
