
Commit 49dc86d

Merge pull request #3 from yuudiiii/patch-4
2 parents c143ae7 + e8d4816 commit 49dc86d

File tree

1 file changed: docs/index.md (+236 −0 lines changed)
@@ -0,0 +1,236 @@
---
title: Index
---

# Welcome to vLLM!

vLLM is a fast and easy-to-use library for inference and serving of large language models (LLMs).

vLLM's core features include:

* State-of-the-art serving throughput
* Efficient management of attention key and value memory with **PagedAttention**
* Continuous batching of incoming requests
* Fast model execution with CUDA/HIP graphs
* Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8
* Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
* Speculative decoding
* Chunked prefill
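
To make the list above concrete, here is a minimal offline-inference sketch using vLLM's public `LLM` API; the model name is an illustrative choice, and any supported HuggingFace model works (see the Quickstart linked below):

```python
# Minimal vLLM offline inference sketch; the model name is illustrative.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# PagedAttention and continuous batching are handled internally by the engine.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```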

vLLM is flexible and easy to use in that it offers:

* Seamless integration with popular HuggingFace models
* High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
* Distributed inference with tensor parallelism and pipeline parallelism
* Streaming outputs
* An OpenAI-compatible API server (see the client sketch after this list)
* Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPUs, and AWS Neuron
* Prefix caching support
* Multi-LoRA support
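
As a hedged sketch of the OpenAI-compatible server mentioned above: start the server (e.g. `vllm serve facebook/opt-125m`; the exact launch command may vary by vLLM version), then query it with the standard `openai` client pointed at the local endpoint:

```python
# Querying a locally running vLLM OpenAI-compatible server (sketch).
# Assumes a server started with, e.g., `vllm serve facebook/opt-125m`.
from openai import OpenAI

# vLLM does not require a real API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```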

For more information, see the following:

* [vLLM announcing blog post](https://vllm.ai) (PagedAttention tutorial)
* [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023)
* [How continuous batching enables 23x throughput in LLM inference while reducing p50 latency](https://www.anyscale.com/blog/continuous-batching-llm-inference) by Cade Daniel et al.
* [vLLM Meetups](https://vllm.hyper.ai/docs/community/vllm-meetups)

## Documentation

### Getting Started

[Installation](https://vllm.hyper.ai/docs/getting-started/installation)

[Installation with ROCm](https://vllm.hyper.ai/docs/getting-started/installation-with-rocm)

[Installation with OpenVINO](https://vllm.hyper.ai/docs/getting-started/installation-with-openvino)

[Installation with CPU](https://vllm.hyper.ai/docs/getting-started/installation-with-cpu)

[Installation with Neuron](https://vllm.hyper.ai/docs/getting-started/installation-with-neuron)

[Installation with TPU](https://vllm.hyper.ai/docs/getting-started/installation-with-tpu)

[Installation with XPU](https://vllm.hyper.ai/docs/getting-started/installation-with-xpu)

[Quickstart](https://vllm.hyper.ai/docs/getting-started/quickstart)

[Debugging Tips](https://vllm.hyper.ai/docs/getting-started/debugging-tips)

[Examples](https://vllm.hyper.ai/docs/getting-started/examples/)

### Deployment

[OpenAI-Compatible Server](https://vllm.hyper.ai/docs/serving/openai-compatible-server)

[Deploying with Docker](https://vllm.hyper.ai/docs/serving/deploying-with-docker)

[Distributed Inference and Serving](https://vllm.hyper.ai/docs/serving/distributed-inference-and-serving)

[Production Metrics](https://vllm.hyper.ai/docs/serving/production-metrics)

[Environment Variables](https://vllm.hyper.ai/docs/serving/environment-variables)

[Usage Stats Collection](https://vllm.hyper.ai/docs/serving/usage-stats-collection)

[Integrations](https://vllm.hyper.ai/docs/serving/integrations/)

[Loading Models with CoreWeave's Tensorizer](https://vllm.hyper.ai/docs/serving/tensorizer)

[Compatibility Matrix](https://vllm.hyper.ai/docs/serving/compatibility%20matrix)

[Frequently Asked Questions](https://vllm.hyper.ai/docs/serving/frequently-asked-questions)

### Models

[Supported Models](https://vllm.hyper.ai/docs/models/supported-models)

[Adding a New Model](https://vllm.hyper.ai/docs/models/adding-a-new-model)

[Enabling Multimodal Inputs](https://vllm.hyper.ai/docs/models/enabling-multimodal-inputs)

[Engine Arguments](https://vllm.hyper.ai/docs/models/engine-arguments)

[Using LoRA Adapters](https://vllm.hyper.ai/docs/models/using-lora-adapters)

[Using VLMs](https://vllm.hyper.ai/docs/models/using-vlms)

[Speculative Decoding in vLLM](https://vllm.hyper.ai/docs/models/speculative-decoding-in-vllm)

[Performance and Tuning](https://vllm.hyper.ai/docs/models/performance-and-tuning)

### Quantization

[Supported Hardware for Quantization Kernels](https://vllm.hyper.ai/docs/quantization/supported_hardware)

[AutoAWQ](https://vllm.hyper.ai/docs/quantization/autoawq)

[BitsAndBytes](https://vllm.hyper.ai/docs/quantization/bitsandbytes)

[GGUF](https://vllm.hyper.ai/docs/quantization/gguf)

[INT8 W8A8](https://vllm.hyper.ai/docs/quantization/int8-w8a8)

[FP8 W8A8](https://vllm.hyper.ai/docs/quantization/fp8-w8a8)

[FP8 E5M2 KV Cache](https://vllm.hyper.ai/docs/quantization/fp8-e5m2-kv-cache)

[FP8 E4M3 KV Cache](https://vllm.hyper.ai/docs/quantization/fp8-e4m3-kv-cache)
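
The pages above cover each method in depth. As a quick hedged sketch, loading a pre-quantized AWQ checkpoint only requires passing `quantization` to the `LLM` constructor (the checkpoint name below is an illustrative example, not a recommendation):

```python
# Running a pre-quantized AWQ model in vLLM (sketch; model name illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")
outputs = llm.generate(["Quantization reduces memory by"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```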

### Automatic Prefix Caching

[Introduction](https://vllm.hyper.ai/docs/automatic-prefix-caching/introduction-apc)

[Implementation](https://vllm.hyper.ai/docs/automatic-prefix-caching/implementation)

[Generalized Caching Policy](https://vllm.hyper.ai/docs/automatic-prefix-caching/implementation)

### Performance Benchmarks

[Benchmark Suites of vLLM](https://vllm.hyper.ai/docs/performance-benchmarks/benchmark-suites-of-vllm)

### Developer Documentation

[Sampling Parameters](https://vllm.hyper.ai/docs/developer-documentation/sampling-parameters)

[Offline Inference](https://vllm.hyper.ai/docs/developer-documentation/offline-inference/)

- [LLM Class](https://vllm.hyper.ai/docs/developer-documentation/offline-inference/llm-class)
- [LLM Inputs](https://vllm.hyper.ai/docs/developer-documentation/offline-inference/llm-inputs)

[vLLM Engine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/)

[LLM Engine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/)

- [LLMEngine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/llmengine)
- [AsyncLLMEngine](https://vllm.hyper.ai/docs/developer-documentation/vllm-engine/asyncllmengine)

[vLLM Paged Attention](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention)

- [Inputs](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E8%BE%93%E5%85%A5)
- [Concepts](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E6%A6%82%E5%BF%B5)
- [Query](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E8%AF%A2%E9%97%AE-query)
- [Key](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E9%94%AE-key)
- [QK](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#qk)
- [Softmax](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#softmax)
- [Value](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E5%80%BC)
- [LV](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#lv)
- [Output](https://vllm.hyper.ai/docs/developer-documentation/vllm-paged-attention#%E8%BE%93%E5%87%BA)

[Input Processing](https://vllm.hyper.ai/docs/developer-documentation/input-processing/model_inputs_index)

- [Guides](https://vllm.hyper.ai/docs/developer-documentation/input-processing/model_inputs_index#%E6%8C%87%E5%8D%97)
- [Module Contents](https://vllm.hyper.ai/docs/developer-documentation/input-processing/model_inputs_index#%E6%A8%A1%E5%9D%97%E5%86%85%E5%AE%B9)

[Multi-Modality](https://vllm.hyper.ai/docs/developer-documentation/multi-modality/)

- [Guides](https://vllm.hyper.ai/docs/developer-documentation/multi-modality/#%E6%8C%87%E5%8D%97)
- [Module Contents](https://vllm.hyper.ai/docs/developer-documentation/multi-modality/#%E6%A8%A1%E5%9D%97%E5%86%85%E5%AE%B9)

[Dockerfile](https://vllm.hyper.ai/docs/developer-documentation/dockerfile)

[Profiling vLLM](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm)

- [Example Commands and Usage](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm#%E5%91%BD%E4%BB%A4%E5%92%8C%E4%BD%BF%E7%94%A8%E7%A4%BA%E4%BE%8B)
- [Offline Inference](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm#%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86)
- [OpenAI Server](https://vllm.hyper.ai/docs/developer-documentation/profiling-vllm#openai-%E6%9C%8D%E5%8A%A1%E5%99%A8)

## Community

[vLLM Meetups](https://vllm.hyper.ai/docs/community/vllm-meetups)

[Sponsors](https://vllm.hyper.ai/docs/community/sponsors)

# [Indices and Tables](https://vllm.hyper.ai/docs/indices-and-tables/index)

* [Index](https://vllm.hyper.ai/docs/indices-and-tables/index)

* [Module Index](https://vllm.hyper.ai/docs/indices-and-tables/python-module-index)
