<p align="center">
  <picture>
    <!-- TODO: Replace tmp link to logo url after vllm-projects/vllm-ascend ready -->
    <source media="(prefers-color-scheme: dark)" srcset="https://github.com/user-attachments/assets/4a958093-58b5-4772-a942-638b51ced646">
    <img alt="vllm-ascend" src="https://github.com/user-attachments/assets/838afe2f-9a1d-42df-9758-d79b31556de0" width=55%>
  </picture>
</p>

<h3 align="center">
vLLM Ascend Plugin
</h3>

<p align="center">
| <a href="https://www.hiascend.com/en/"><b>About Ascend</b></a> | <a href="https://slack.vllm.ai"><b>Developer Slack (#sig-ascend)</b></a> |
</p>

<p align="center">
<a href="README.md"><b>English</b></a> | <a><b>中文</b></a>
</p>

---
*Latest News* 🔥

- [2024/12] We are working with the vLLM community to support [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162).
---
## Overview

The vLLM Ascend plugin (`vllm-ascend`) is a backend plugin that lets vLLM run seamlessly on Ascend NPUs.

This plugin is the recommended way to support the Ascend backend in the vLLM community. It follows the principles described in [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162), providing Ascend NPU support for vLLM in a decoupled way.

With the vLLM Ascend plugin, popular large language models, including Transformer-based, mixture-of-experts (MoE), embedding, and multimodal models, can run seamlessly on Ascend NPUs.

## Prerequisites
### Supported Devices
- Atlas A2 training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 inference series (Atlas 800I A2)

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------|---------------------|------|
| vLLM        | main              | main | Required by vllm-ascend |
| Python      | >= 3.9            | [3.10](https://www.python.org/downloads/) | Required by vllm |
| CANN        | >= 8.0.RC2        | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required by vllm-ascend and torch-npu |
| torch-npu   | >= 2.4.0          | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required by vllm-ascend |
| torch       | >= 2.4.0          | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required by torch-npu and vllm |

Find more information on how to set up your environment [here](docs/environment.zh.md).
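
Before installing, it can help to confirm that the NPU environment is visible to PyTorch. The following is a minimal sketch; it assumes the Ascend driver (`npu-smi`), CANN, and torch-npu have already been installed as described in the guide above:

```bash
# Query the Ascend devices via the driver tool
npu-smi info

# Check that torch and torch-npu are importable and that an NPU is available
python -c "import torch; import torch_npu; print(torch.__version__, torch_npu.__version__, torch.npu.is_available())"
```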

## Getting Started

> [!NOTE]
> Currently, we are actively collaborating with the vLLM community to support the Ascend backend plugin. Once it is supported, you will be able to complete the installation with a single command: `pip install vllm vllm-ascend`.

Install from source code:
```bash
# Install vllm from the main branch. Refer to:
# https://docs.vllm.ai/en/latest/getting_started/installation/cpu/index.html#build-wheel-from-source
git clone --depth 1 https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE=empty pip install .

# Install vllm-ascend from the main branch
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```
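
As a quick sanity check after installation, you can verify that both packages are importable (a minimal sketch; `vllm_ascend` is assumed here to be the Python module name provided by this repository):

```bash
python -c "import vllm; print(vllm.__version__)"
python -c "import vllm_ascend"  # assumed module name of the plugin package
```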

Run the following commands to start the vLLM server with the [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:

```bash
# Set the environment variable VLLM_USE_MODELSCOPE=true to speed up the model download
vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```
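
Once the server is up, you can also send a generation request against the OpenAI-compatible completions endpoint, for example (a minimal sketch following the standard vLLM quickstart):

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "Beijing is a",
        "max_tokens": 5,
        "temperature": 0
    }'
```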

See the [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.

## Building

#### Build the Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```

#### Build the container image
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```
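
To run the resulting image, the container needs access to the Ascend devices and the host driver. The exact flags depend on your host installation; the following is only a sketch, and the device nodes and mount paths are assumptions based on a typical Ascend driver setup:

```bash
# Device nodes and driver/tool mounts below are assumptions for a typical Ascend host
docker run --rm -it \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    vllm-ascend-dev-image bash
```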

See [Building and Testing](./CONTRIBUTING.zh.md) for more details, including a step-by-step guide to setting up a development environment, building, and testing.

## Feature Support Matrix
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ✗ | Planned for 2025 Q1 |
| Prompt adapter | ✅ ||
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ✗ | Planned for 2025 Q1 |
| Enc-dec | ✗ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ ||
| Prompt logProbs | ✅ ||
| Async output | ✅ ||
| Multi step scheduler | ✅ ||
| Best of | ✅ ||
| Beam search | ✅ ||
| Guided Decoding | ✗ | Planned for 2025 Q1 |

## Model Support Matrix

Here is a partial list of supported models. See [supported_models](docs/supported_models.md) for more details:
| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ ||
| Mistral | | Needs test |
| DeepSeek v2.5 | | Needs test |
| Llama 3.1/3.2 | ✅ ||
| Gemma-2 | | Needs test |
| Baichuan | | Needs test |
| MiniCPM | | Needs test |
| InternLM | ✅ ||
| ChatGLM | ✅ ||
| InternVL 2.5 | ✅ ||
| Qwen2-VL | ✅ ||
| GLM-4v | | Needs test |
| Molmo | ✅ ||
| LLaVA 1.5 | ✅ ||
| Mllama | | Needs test |
| LLaVA-Next | | Needs test |
| LLaVA-Next-Video | | Needs test |
| Phi-3-Vision/Phi-3.5-Vision | | Needs test |
| Ultravox | | Needs test |
| Qwen2-Audio | ✅ ||

## Contributing
We welcome and value any kind of contribution and collaboration:
- Please let us know about any bug you run into by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
- Please see the contribution guide in [CONTRIBUTING.zh.md](./CONTRIBUTING.zh.md).

## License

Apache License 2.0, as found in the [LICENSE](./LICENSE) file.