Merged
18 changes: 14 additions & 4 deletions docs/examples/README.md
@@ -1,7 +1,17 @@
 # Examples
 
-vLLM's examples are split into three categories:
+vLLM's examples are organized into the following categories:
 
-- If you are using vLLM from within Python code, see the [Offline Inference](.) section.
-- If you are using vLLM from an HTTP application or client, see the [Online Serving](.) section.
-- For examples of using some of vLLM's advanced features (e.g. LMCache or Tensorizer) which are not specific to either of the above use cases, see the [Others](.) section.
+- **[`basic/`](../../examples/basic)** – Minimal examples for offline inference and online serving.
+- **[`generate/`](../../examples/generate)** – Text generation examples, including multimodal models.
+- **[`pooling/`](../../examples/pooling)** – Examples for embedding, classification, scoring, reward, etc.
+- **[`speech_to_text/`](../../examples/speech_to_text)** – Speech transcription, translation and real-time audio examples.
+- **[`features/`](../../examples/features)** – Demonstrations of individual vLLM features: automatic prefix caching, speculative decoding, LoRA, structured outputs, prompt embedding, pause/resume, batch invariance, KV events, data parallelism, and more.
+- **[`reasoning/`](../../examples/reasoning)** – Examples for reasoning with vLLM.
+- **[`tool_calling/`](../../examples/tool_calling)** – Examples for function/tool calling with vLLM.
+- **[`applications/`](../../examples/applications)** – Application examples such as chatbots and RAG (Retrieval-Augmented Generation).
+- **[`rl/`](../../examples/rl)** – Reinforcement learning examples.
+- **[`deployment/`](../../examples/deployment)** – Examples for deploying vLLM in production.
+- **[`ray_serving/`](../../examples/ray_serving)** – Scalable serving using Ray.
+- **[`disaggregated/`](../../examples/disaggregated)** – Examples for disaggregated serving (separate prefill and decode), including various kv cache connectors (LMCache, Mooncake, FlexKV, P2P NCCL) and failure recovery.
+- **[`observability/`](../../examples/observability)** – Metrics, logging, tracing (OpenTelemetry), and dashboards (Grafana, Perses).