---
title: Index
---

# Welcome to vLLM!

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM's core features include (several of them are toggled through engine arguments; see the sketch after this list):

* State-of-the-art serving throughput

* Efficient management of attention key and value memory with **PagedAttention**

* Continuous batching of incoming requests

* Fast model execution with CUDA/HIP graphs

* Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8

* Optimized CUDA kernels, including integration with FlashAttention and FlashInfer

* Speculative decoding

* Chunked prefill

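As a minimal, hedged sketch of how some of the features above are enabled: the model ID below is illustrative, and argument names such as `enable_prefix_caching` and `enable_chunked_prefill` follow the "Engine Arguments" page and may vary across vLLM versions.

```python
from vllm import LLM

# Illustrative configuration only: the checkpoint must actually be
# AWQ-quantized for quantization="awq" to load, and flag availability
# depends on the vLLM version.
llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-AWQ",  # illustrative AWQ checkpoint
    quantization="awq",              # GPTQ/AWQ/INT4/INT8/FP8 (see Quantization docs)
    enable_prefix_caching=True,      # automatic prefix caching
    enable_chunked_prefill=True,     # chunked prefill
)
```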

vLLM is flexible and easy to use with (see the usage sketches after this list):

* Seamless integration with popular HuggingFace models

* High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more

* Tensor parallelism and pipeline parallelism support for distributed inference

* Streaming outputs

* An OpenAI-compatible API server

* Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron

* Prefix caching support

* Multi-LoRA support

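For offline inference, the `LLM` class wraps a HuggingFace model and generates text with configurable sampling. A minimal sketch (model and prompt are illustrative):

```python
from vllm import LLM, SamplingParams

# Load any supported HuggingFace model by its hub ID.
llm = LLM(model="facebook/opt-125m")

# n=2 requests two parallel samples per prompt.
params = SamplingParams(n=2, temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    for seq in out.outputs:  # one entry per sampled completion
        print(seq.text)
```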
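The OpenAI-compatible server can be queried with the official `openai` client. A sketch, assuming a server was started locally with `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m` (port and model are illustrative):

```python
from openai import OpenAI

# vLLM's server speaks the OpenAI API; the key is unused here but
# required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(resp.choices[0].text)
```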

For more information, check out the following:

* [vLLM announcing blog post](https://vllm.ai) (PagedAttention tutorial)

* [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023)

* [How continuous batching enables 23x throughput in LLM inference while reducing p50 latency](https://www.anyscale.com/blog/continuous-batching-llm-inference) by Cade Daniel et al.

* [vLLM Meetups](https://vllm.hyper.ai/docs/community/vllm-meetups)

## Documentation

### Getting Started

[Installation](https://vllm.hyper.ai/docs/getting-started/installation)

[Installation with ROCm](https://vllm.hyper.ai/docs/getting-started/installation-with-rocm)

[Installation with OpenVINO](https://vllm.hyper.ai/docs/getting-started/installation-with-openvino)

[Installation with CPU](https://vllm.hyper.ai/docs/getting-started/installation-with-cpu)

[Installation with Neuron](https://vllm.hyper.ai/docs/getting-started/installation-with-neuron)

[Installation with TPU](https://vllm.hyper.ai/docs/getting-started/installation-with-tpu)

[Installation with XPU](https://vllm.hyper.ai/docs/getting-started/installation-with-xpu)

[Quickstart](https://vllm.hyper.ai/docs/getting-started/quickstart)

[Debugging Tips](https://vllm.hyper.ai/docs/getting-started/debugging-tips)

[Examples](https://vllm.hyper.ai/docs/getting-started/examples/)

### Deployment

[OpenAI-Compatible Server](https://vllm.hyper.ai/docs/serving/openai-compatible-server)

[Deploying with Docker](https://vllm.hyper.ai/docs/serving/deploying-with-docker)

[Distributed Inference and Serving](https://vllm.hyper.ai/docs/serving/distributed-inference-and-serving)

[Production Metrics](https://vllm.hyper.ai/docs/serving/production-metrics)

[Environment Variables](https://vllm.hyper.ai/docs/serving/environment-variables)

[Usage Stats Collection](https://vllm.hyper.ai/docs/serving/usage-stats-collection)

[Integrations](https://vllm.hyper.ai/docs/serving/integrations/)

[Loading Models with CoreWeave's Tensorizer](https://vllm.hyper.ai/docs/serving/tensorizer)

[Compatibility Matrix](https://vllm.hyper.ai/docs/serving/compatibility%20matrix)

[Frequently Asked Questions](https://vllm.hyper.ai/docs/serving/frequently-asked-questions)

### Models

[Supported Models](https://vllm.hyper.ai/docs/models/supported-models)

[Adding a New Model](https://vllm.hyper.ai/docs/models/adding-a-new-model)

[Enabling Multimodal Inputs](https://vllm.hyper.ai/docs/models/enabling-multimodal-inputs)

[Engine Arguments](https://vllm.hyper.ai/docs/models/engine-arguments)

[Using LoRA Adapters](https://vllm.hyper.ai/docs/models/using-lora-adapters)

[Using VLMs](https://vllm.hyper.ai/docs/models/using-vlms)

[Speculative Decoding in vLLM](https://vllm.hyper.ai/docs/models/speculative-decoding-in-vllm)

[Performance and Tuning](https://vllm.hyper.ai/docs/models/performance-and-tuning)

### Quantization

[Supported Hardware for Quantization Kernels](https://vllm.hyper.ai/docs/quantization/supported_hardware)

[AutoAWQ](https://vllm.hyper.ai/docs/quantization/autoawq)

[BitsAndBytes](https://vllm.hyper.ai/docs/quantization/bitsandbytes)

[GGUF](https://vllm.hyper.ai/docs/quantization/gguf)

[INT8 W8A8](https://vllm.hyper.ai/docs/quantization/int8-w8a8)

[FP8 W8A8](https://vllm.hyper.ai/docs/quantization/fp8-w8a8)

[FP8 E5M2 KV Cache](https://vllm.hyper.ai/docs/quantization/fp8-e5m2-kv-cache)

[FP8 E4M3 KV Cache](https://vllm.hyper.ai/docs/quantization/fp8-e4m3-kv-cache)

### Automatic Prefix Caching

[Introduction](https://vllm.hyper.ai/docs/automatic-prefix-caching/introduction-apc)

[Implementation](https://vllm.hyper.ai/docs/automatic-prefix-caching/implementation)

[Generalized Caching Policy](https://vllm.hyper.ai/docs/automatic-prefix-caching/implementation)

### Performance Benchmarks

[Benchmark Suites of vLLM](https://vllm.hyper.ai/docs/performance-benchmarks/benchmark-suites-of-vllm)

### Developer Documentation

[Sampling Parameters](https://vllm.hyper.ai/docs/developer-documentation/sampling-parameters)

[Offline Inference](https://vllm.hyper.ai/docs/developer-documentation/offline-inference/)

- [LLM Class](https://vllm.hyper.ai/docs/developer-documentation/offline-inference/llm-class)

- [LLM Inputs](https://vllm.hyper.ai/docs/developer-documentation/offline-inference/llm-inputs)

[vLLM Engine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/)

- [LLMEngine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/llmengine)

- [AsyncLLMEngine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/asyncllmengine)

[vLLM Paged Attention](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention)

- [Inputs](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E8%BE%93%E5%85%A5)

- [Concepts](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E6%A6%82%E5%BF%B5)

- [Query](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E8%AF%A2%E9%97%AE-query)

- [Key](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E9%94%AE-key)

- [QK](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#qk)

- [Softmax](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#softmax)

- [Value](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E5%80%BC)

- [LV](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#lv)

- [Output](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E8%BE%93%E5%87%BA)

[Input Processing](https://vllm.hyper.ai/docs/developer-documentation/input-processing/model_inputs_index)

- [Guides](https://vllm.hyper.ai/docs/developer-documentation/input-processing/model_inputs_index#%E6%8C%87%E5%8D%97)

- [Module Contents](https://vllm.hyper.ai/docs/developer-documentation/input-processing/model_inputs_index#%E6%A8%A1%E5%9D%97%E5%86%85%E5%AE%B9)

[Multi-Modality](https://vllm.hyper.ai/docs/developer-documentation/multi-modality/)

- [Guides](https://vllm.hyper.ai/docs/developer-documentation/multi-modality/#%E6%8C%87%E5%8D%97)

- [Module Contents](https://vllm.hyper.ai/docs/developer-documentation/multi-modality/#%E6%A8%A1%E5%9D%97%E5%86%85%E5%AE%B9)

[Dockerfile](https://vllm.hyper.ai/docs/developer-documentation/dockerfile)

[Profiling vLLM](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm)

- [Example Commands and Usage](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm#%E5%91%BD%E4%BB%A4%E5%92%8C%E4%BD%BF%E7%94%A8%E7%A4%BA%E4%BE%8B)

- [Offline Inference](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm#%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86)

- [OpenAI Server](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm#openai-%E6%9C%8D%E5%8A%A1%E5%99%A8)

## Community

[vLLM Meetups](https://vllm.hyper.ai/docs/community/vllm-meetups)

[Sponsors](https://vllm.hyper.ai/docs/community/sponsors)

# [Indices and Tables](https://vllm.hyper.ai/docs/indices-and-tables/index)

* [Index](https://vllm.hyper.ai/docs/indices-and-tables/index)

* [Module Index](https://vllm.hyper.ai/docs/indices-and-tables/python-module-index)