Migrate docs from Sphinx to MkDocs #18145
Changes from 100 commits
New file (`@@ -0,0 +1,51 @@`):

```yaml
nav:
  - Home:
    - vLLM: README.md
    - Getting Started:
      - getting_started/quickstart.md
      - getting_started/installation
      - Examples:
        - LMCache: getting_started/examples/lmcache
        - getting_started/examples/offline_inference
        - getting_started/examples/online_serving
        - getting_started/examples/other
    - Roadmap: https://roadmap.vllm.ai
    - Releases: https://github.com/vllm-project/vllm/releases
  - User Guide:
    - Inference and Serving:
      - serving/offline_inference.md
      - serving/openai_compatible_server.md
      - serving/*
      - serving/integrations
    - Training: training
    - Deployment:
      - deployment/*
      - deployment/frameworks
      - deployment/integrations
    - Performance: performance
    - Models:
      - models/supported_models.md
      - models/generative_models.md
      - models/pooling_models.md
      - models/extensions
    - Features:
      - features/compatibility_matrix.md
      - features/*
      - features/quantization
    - Other:
      - getting_started/*
  - Developer Guide:
    - contributing/overview.md
    - glob: contributing/*
      flatten_single_child_sections: true
    - contributing/model
  - Design Documents:
    - V0: design
    - V1: design/v1
  - API Reference:
    - api/README.md
    - glob: api/vllm/*
      preserve_directory_names: true
  - Community:
    - community/*
    - vLLM Blog: https://blog.vllm.ai
```
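A nav file like the one above is typically consumed by a navigation plugin registered in `mkdocs.yml`. As a minimal sketch of how that wiring usually looks (the plugin name, theme, and option placement are assumptions for illustration, not taken from this PR):

```yaml
# mkdocs.yml (sketch, assuming the mkdocs-awesome-nav plugin)
site_name: vLLM
theme:
  name: material
plugins:
  - search
  - awesome-nav  # resolves the globs and per-entry options shown above
```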
This file was deleted.

**Contributor:** Can we preserve some instructions on how to contribute to and test documentation changes after this migration to MkDocs?

**Author (Member):** Yeah, information about building the docs has moved to https://docs.vllm.ai/en/latest/contributing/overview.html#building-the-docs
Changed file (`@@ -1,43 +1,50 @@`) — the old build instructions are replaced by the new docs home page:

Removed:

````markdown
# vLLM documents

## Build the docs

- Make sure in `docs` directory

```bash
cd docs
```

- Install the dependencies:

```bash
pip install -r ../requirements/docs.txt
```

- Clean the previous build (optional but recommended):

```bash
make clean
```

- Generate the HTML documentation:

```bash
make html
```

## Open the docs with your browser

- Serve the documentation locally:

```bash
python -m http.server -d build/html/
```

This will start a local server at http://localhost:8000. You can now open your browser and view the documentation.

If port 8000 is already in use, you can specify a different port, for example:

```bash
python -m http.server 3000 -d build/html/
```
````

Added:

````markdown
# Welcome to vLLM

<figure markdown="span">
  { align="center" alt="vLLM" class="no-scaled-link" width="60%" }
</figure>

<p style="text-align:center">
<strong>Easy, fast, and cheap LLM serving for everyone</strong>
</p>

<p style="text-align:center">
<script async defer src="https://buttons.github.io/buttons.js"></script>
<a class="github-button" href="https://github.com/vllm-project/vllm" data-show-count="true" data-size="large" aria-label="Star">Star</a>
<a class="github-button" href="https://github.com/vllm-project/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
<a class="github-button" href="https://github.com/vllm-project/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
</p>

vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
- Speculative decoding
- Chunked prefill

vLLM is flexible and easy to use with:

- Seamless integration with popular HuggingFace models
- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
- Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, Gaudi® accelerators and GPUs, IBM Power CPUs, TPU, and AWS Trainium and Inferentia Accelerators
- Prefix caching support
- Multi-LoRA support

For more information, check out the following:

- [vLLM announcing blog post](https://vllm.ai) (intro to PagedAttention)
- [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023)
- [How continuous batching enables 23x throughput in LLM inference while reducing p50 latency](https://www.anyscale.com/blog/continuous-batching-llm-inference) by Cade Daniel et al.
- [vLLM Meetups][meetups]
````
New file (`@@ -0,0 +1,107 @@`):

````markdown
# Summary

[](){ #configuration }

## Configuration

API documentation for vLLM's configuration classes.

- [vllm.config.ModelConfig][]
- [vllm.config.CacheConfig][]
- [vllm.config.TokenizerPoolConfig][]
- [vllm.config.LoadConfig][]
- [vllm.config.ParallelConfig][]
- [vllm.config.SchedulerConfig][]
- [vllm.config.DeviceConfig][]
- [vllm.config.SpeculativeConfig][]
- [vllm.config.LoRAConfig][]
- [vllm.config.PromptAdapterConfig][]
- [vllm.config.MultiModalConfig][]
- [vllm.config.PoolerConfig][]
- [vllm.config.DecodingConfig][]
- [vllm.config.ObservabilityConfig][]
- [vllm.config.KVTransferConfig][]
- [vllm.config.CompilationConfig][]
- [vllm.config.VllmConfig][]

[](){ #offline-inference-api }

## Offline Inference

LLM Class.

- [vllm.LLM][]

LLM Inputs.

- [vllm.inputs.PromptType][]
- [vllm.inputs.TextPrompt][]
- [vllm.inputs.TokensPrompt][]

## vLLM Engines

Engine classes for offline and online inference.

- [vllm.LLMEngine][]
- [vllm.AsyncLLMEngine][]

## Inference Parameters

Inference parameters for vLLM APIs.

[](){ #sampling-params }
[](){ #pooling-params }

- [vllm.SamplingParams][]
- [vllm.PoolingParams][]

[](){ #multi-modality }

## Multi-Modality

vLLM provides experimental support for multi-modal models through the [vllm.multimodal][] package.

Multi-modal inputs can be passed alongside text and token prompts to [supported models][supported-mm-models]
via the `multi_modal_data` field in [vllm.inputs.PromptType][].

Looking to add your own multi-modal model? Please follow the instructions listed [here][supports-multimodal].

- [vllm.multimodal.MULTIMODAL_REGISTRY][]

### Inputs

User-facing inputs.

- [vllm.multimodal.inputs.MultiModalDataDict][]

Internal data structures.

- [vllm.multimodal.inputs.PlaceholderRange][]
- [vllm.multimodal.inputs.NestedTensors][]
- [vllm.multimodal.inputs.MultiModalFieldElem][]
- [vllm.multimodal.inputs.MultiModalFieldConfig][]
- [vllm.multimodal.inputs.MultiModalKwargsItem][]
- [vllm.multimodal.inputs.MultiModalKwargs][]
- [vllm.multimodal.inputs.MultiModalInputs][]

### Data Parsing

- [vllm.multimodal.parse][]

### Data Processing

- [vllm.multimodal.processing][]

### Memory Profiling

- [vllm.multimodal.profiling][]

### Registry

- [vllm.multimodal.registry][]

## Model Development

- [vllm.model_executor.models.interfaces_base][]
- [vllm.model_executor.models.interfaces][]
- [vllm.model_executor.models.adapters][]
````
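The bracketed entries in this summary use mkdocstrings-style identifier cross-references, and the empty-link `[](){ #... }` lines define named anchors via the `attr_list` extension. A minimal sketch of the two patterns (the page content here is illustrative, not from this PR):

```markdown
[](){ #my-anchor }          <!-- named anchor; link to it elsewhere with [text][my-anchor] -->

- [vllm.SamplingParams][]   <!-- autorefs-style link that resolves to that object's API page -->
```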
New file (`@@ -0,0 +1,2 @@`):

```yaml
search:
  boost: 0.5
```
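This snippet down-weights a directory of pages in search results. Assuming the Material for MkDocs search plugin (a reasonable guess given the rest of the PR, not confirmed by it), the same knob can also be set per page through front matter:

```yaml
---
search:
  boost: 0.5   # values below 1 demote a page in search ranking; above 1 promote it
---
```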
New file (`@@ -0,0 +1,23 @@`):

````markdown
---
title: Adding a New Model
---
[](){ #new-model }

This section provides more information on how to integrate a [PyTorch](https://pytorch.org/) model into vLLM.

Contents:

- [Basic](basic.md)
- [Registration](registration.md)
- [Tests](tests.md)
- [Multimodal](multimodal.md)

!!! note
    The complexity of adding a new model depends heavily on the model's architecture.
    The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
    However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.

!!! tip
    If you are encountering issues while integrating your model into vLLM, feel free to open a [GitHub issue](https://github.com/vllm-project/vllm/issues)
    or ask on our [developer slack](https://slack.vllm.ai).
    We will be happy to help you out!
````
**Reviewer:** Does this change replace the instructions for pulling and building the docs with an overview of what vLLM is? The previous file had instructions for building the docs locally, while the new file looks like the vLLM home page. Is that intentional?

**Author:** All the docs have been moved one level up, from `docs/source/*` to `docs/*`, so this README has become the home page for the docs. So yes, it was intentional. I have added a section in the contributing docs on how to build the docs.

**Author:** The reason to do this (as well as changing any `index.md` files to `README.md` files) is that the docs are nicer to browse.
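The path changes described above are mechanical, so they can be sketched as a small mapping helper (hypothetical code for illustration only; `mkdocs_page_name` is not part of the PR):

```python
from pathlib import PurePosixPath

def mkdocs_page_name(path: str) -> str:
    """Map a pre-migration doc path to its post-migration location:
    docs/source/* moves up one level to docs/*, and index.md becomes README.md."""
    p = PurePosixPath(path)
    if p.name == "index.md":
        p = p.with_name("README.md")  # README.md also renders in GitHub's file browser
    parts = list(p.parts)
    if parts[:2] == ["docs", "source"]:
        parts = ["docs"] + parts[2:]  # drop the Sphinx-era source/ level
    return str(PurePosixPath(*parts))

print(mkdocs_page_name("docs/source/models/index.md"))  # docs/models/README.md
```

The same GitHub-browsability motivation explains the `index.md` → `README.md` rename: GitHub shows `README.md` automatically when you open a directory.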