Merged
102 changes: 102 additions & 0 deletions .github/workflows/vllm_ascend_doctest.yaml
@@ -0,0 +1,102 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

name: 'e2e test / doctest'

on:
  workflow_dispatch:
  pull_request:
    branches:
      - 'main'
      - '*-dev'
    paths:
      # If we are changing the doctest we should do a PR test
      - '.github/workflows/vllm_ascend_doctest.yaml'
      - 'tests/e2e/doctests/**'
      - 'tests/e2e/common.sh'
      - 'tests/e2e/run_doctests.sh'
  schedule:
    # Runs every 4 hours
    - cron: '0 */4 * * *'

# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
  run:
    shell: bash -el {0}

jobs:
  test:
    strategy:
      # Each version should be tested
      fail-fast: false
      matrix:
        vllm_version: [main, v0.7.3-dev, main-openeuler, v0.7.3-dev-openeuler]
    name: vLLM Ascend test
    runs-on: linux-arm64-npu-1
    container:
      image: m.daocloud.io/quay.io/ascend/vllm-ascend:${{ matrix.vllm_version }}
    steps:
      - name: Check NPU/CANN and git info
        run: |
          echo "====> Print NPU/CANN info"
          npu-smi info
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info

          echo "====> Print vllm-ascend git info"
          cd /vllm-workspace/vllm-ascend
          git --no-pager log -1 || true
          echo "====> Print vllm git info"
          cd /vllm-workspace/vllm
          git --no-pager log -1 || true

      - name: Config OS mirrors - Ubuntu
        if: ${{ !endsWith(matrix.vllm_version, '-openeuler') }}
        run: |
          sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
          apt-get update -y
          apt install git curl -y

      - name: Config OS mirrors - openEuler
        if: ${{ endsWith(matrix.vllm_version, '-openeuler') }}
        run: |
          yum update -y
          yum install git curl -y

      - name: Config pip mirrors
        run: |
          pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

      - name: Checkout vllm-project/vllm-ascend repo
        uses: actions/checkout@v4

      - name: Run vllm-ascend/tests/e2e/run_doctests.sh
        run: |
          # PWD: /__w/vllm-ascend/vllm-ascend
          # Handle old branches such as v0.7.3 whose image lacks tests/e2e:
          if [ ! -d /vllm-workspace/vllm-ascend/tests/e2e ]; then
            echo "Warning: the doctest path doesn't exist, copying it now"
            cp -r tests/e2e /vllm-workspace/vllm-ascend/tests/
          fi

          # Simulate container to enter directory
          cd /workspace

          # Run real test
          echo "Test:"
          /vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
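
The workflow above picks the Ubuntu or openEuler mirror step by checking whether the image tag ends in `-openeuler`. As a rough illustration only, the same suffix test could be written in bash as below. The function name `pkg_manager_for_tag` is hypothetical and is not part of this PR.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): mirrors the workflow's
# endsWith(<tag>, '-openeuler') condition with a shell glob.
pkg_manager_for_tag() {
  case "$1" in
    *-openeuler) echo "yum" ;;      # openEuler images
    *)           echo "apt-get" ;;  # everything else is treated as Ubuntu
  esac
}

# Walk the four image tags listed in the workflow matrix.
for tag in main v0.7.3-dev main-openeuler v0.7.3-dev-openeuler; do
  echo "${tag} -> $(pkg_manager_for_tag "$tag")"
done
```

A suffix match keeps the matrix flat: adding another `*-openeuler` tag needs no new condition.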
11 changes: 8 additions & 3 deletions docs/source/quick_start.md
@@ -68,6 +68,7 @@ The default workdir is `/workspace`, vLLM and vLLM Ascend code are placed in `/v

You can use the ModelScope mirror to speed up downloads:

<!-- When this block changes, tests/e2e/doctests/001-quickstart-test.sh should be updated as well -->
```bash
export VLLM_USE_MODELSCOPE=true
```
@@ -81,6 +82,7 @@ With vLLM installed, you can start generating texts for list of input prompts (i

Run the Python script below directly, or use the `python3` shell, to generate text:

<!-- When this block changes, tests/e2e/doctests/001-quickstart-test.sh should be updated as well -->
```python
from vllm import LLM, SamplingParams

@@ -108,6 +110,7 @@ vLLM can also be deployed as a server that implements the OpenAI API protocol. R
the following command to start the vLLM server with the
[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:

<!-- When this block changes, tests/e2e/doctests/001-quickstart-test.sh should be updated as well -->
```bash
# Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
@@ -125,12 +128,14 @@ Congratulations, you have successfully started the vLLM server!

You can query the list of models:

<!-- When this block changes, tests/e2e/doctests/001-quickstart-test.sh should be updated as well -->
```bash
curl http://localhost:8000/v1/models | python3 -m json.tool
```

You can also query the model with input prompts:

<!-- When this block changes, tests/e2e/doctests/001-quickstart-test.sh should be updated as well -->
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
@@ -145,10 +150,10 @@ curl http://localhost:8000/v1/completions \
vLLM is running as a background process. You can use `kill -2 $VLLM_PID` to stop it gracefully;
this is equivalent to pressing `Ctrl-C` to stop a foreground vLLM process:

<!-- When this block changes, tests/e2e/doctests/001-quickstart-test.sh should be updated as well -->
```bash
-ps -ef | grep "/.venv/bin/vllm serve" | grep -v grep
-VLLM_PID=`ps -ef | grep "/.venv/bin/vllm serve" | grep -v grep | awk '{print $2}'`
-kill -2 $VLLM_PID
+VLLM_PID=$(pgrep -f "vllm serve")
+kill -2 "$VLLM_PID"
```

You will see output as below:
51 changes: 51 additions & 0 deletions tests/e2e/common.sh
@@ -0,0 +1,51 @@
# bash font colors
cyan='\e[96m'
yellow='\e[33m'
red='\e[31m'
none='\e[0m'

_cyan() { echo -e "${cyan}$*${none}"; }
_yellow() { echo -e "${yellow}$*${none}"; }
_red() { echo -e "${red}$*${none}"; }

_info() { _cyan "Info: $*"; }
_warn() { _yellow "Warn: $*"; }
_err() { _red "Error: $*" && exit 1; }

CURL_TIMEOUT=1
CURL_COOLDOWN=5
CURL_MAX_TRIES=120

function wait_url_ready() {
  local serve_name="$1"
  local url="$2"
  i=0
  while true; do
    _info "===> Waiting for ${serve_name} to be ready...${i}s"
    i=$((i + CURL_COOLDOWN))
    set +e
    curl --silent --max-time "$CURL_TIMEOUT" "${url}" >/dev/null
    result=$?
    set -e
    if [ "$result" -eq 0 ]; then
      break
    fi
    # i advances by CURL_COOLDOWN per attempt, so this limit is measured in
    # (approximate) seconds of cooldown rather than in attempts
    if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
      _info "===> \$CURL_MAX_TRIES exceeded waiting for ${serve_name} to be ready"
      return 1
    fi
    sleep "$CURL_COOLDOWN"
  done
  _info "===> ${serve_name} is ready."
}

function wait_for_exit() {
  local VLLM_PID="$1"
  # kill -0 sends no signal; it only checks that the process still exists
  while kill -0 "$VLLM_PID" 2>/dev/null; do
    _info "===> Wait for ${VLLM_PID} to exit."
    sleep 1
  done
  _info "===> ${VLLM_PID} has exited."
}

SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
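
`wait_url_ready` compares a counter that grows by `CURL_COOLDOWN` each round against `CURL_MAX_TRIES`, so the limit behaves like a time budget rather than an attempt count. The sketch below shows one possible attempt-counting variant. It is not part of this PR, and `wait_until_ok`, `PROBE_MAX_TRIES`, and `PROBE_COOLDOWN` are hypothetical names. It takes the probe as a command so it can be tried without a running server.

```shell
#!/bin/bash
# Hypothetical attempt-counting variant of the polling loop (not in the PR).
PROBE_MAX_TRIES="${PROBE_MAX_TRIES:-3}"   # illustrative default
PROBE_COOLDOWN="${PROBE_COOLDOWN:-1}"     # seconds between attempts

wait_until_ok() {
  local name="$1"
  shift
  local attempt=1
  while true; do
    # "$@" is the probe command, e.g. curl --silent --max-time 1 <url>
    if "$@" >/dev/null 2>&1; then
      echo "${name} is ready after ${attempt} attempt(s)"
      return 0
    fi
    if [ "$attempt" -ge "$PROBE_MAX_TRIES" ]; then
      echo "${name} not ready after ${PROBE_MAX_TRIES} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$PROBE_COOLDOWN"
  done
}
```

Under these assumptions, a call for the quickstart server might look like `wait_until_ok "vllm serve" curl --silent --max-time 1 localhost:8000/v1/models`.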
55 changes: 55 additions & 0 deletions tests/e2e/doctests/001-quickstart-test.sh
@@ -0,0 +1,55 @@
#!/bin/bash

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

function simple_test() {
  # Do real import test
  python3 -c "import vllm; print(vllm.__version__)"
}

function quickstart_offline_test() {
  export VLLM_USE_MODELSCOPE=true
  # Do real script test
  python3 "${SCRIPT_DIR}/../../examples/offline_inference_npu.py"
}

function quickstart_online_test() {
  export VLLM_USE_MODELSCOPE=true
  vllm serve Qwen/Qwen2.5-0.5B-Instruct &
  wait_url_ready "vllm serve" "localhost:8000/v1/models"
  # Do real curl test
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Qwen/Qwen2.5-0.5B-Instruct",
      "prompt": "Beijing is a",
      "max_tokens": 5,
      "temperature": 0
    }' | python3 -m json.tool
  VLLM_PID=$(pgrep -f "vllm serve")
  _info "===> Try kill -2 ${VLLM_PID} to exit."
  kill -2 "$VLLM_PID"
  wait_for_exit "$VLLM_PID"
}

_info "====> Start simple_test"
simple_test
_info "====> Start quickstart_offline_test"
quickstart_offline_test
_info "====> Start quickstart_online_test"
quickstart_online_test
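
`quickstart_online_test` sends `kill -2` and then polls with `wait_for_exit`, which loops for as long as the process exists. A bounded variant is sketched below as one possible hardening, so that a server that never exits would fail the job instead of hanging it. It is not part of this PR, and the name `wait_for_exit_with_timeout` and its default timeout are assumptions.

```shell
#!/bin/bash
# Hypothetical bounded variant of wait_for_exit (not in the PR).
wait_for_exit_with_timeout() {
  local pid="$1"
  local timeout_s="${2:-30}"   # illustrative default, in seconds
  local waited=0
  # kill -0 sends no signal; it only checks whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "Timed out after ${timeout_s}s waiting for ${pid} to exit" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage sketch with a stand-in process. SIGTERM is used here because a plain
# background command in a non-interactive shell typically has SIGINT ignored
# unless the program installs its own handler.
#   sleep 30 &
#   pid=$!
#   kill -TERM "$pid"
#   wait_for_exit_with_timeout "$pid" 5
```

The caller can then decide whether a timeout should escalate to a stronger signal or simply fail the test.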
27 changes: 27 additions & 0 deletions tests/e2e/run_doctests.sh
@@ -0,0 +1,27 @@
#!/bin/bash

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

set -eo errexit

. "$(dirname "$0")/common.sh"

_info "====> Start Quickstart test"
. "${SCRIPT_DIR}/doctests/001-quickstart-test.sh"

_info "Doctest passed."