docs: fix doc-code mismatches from audit #40062
- CLI: remove phantom `--help=listgroup` and `--help=page` modes, add `launch` subcommand
- Plugins: clarify per-group loading scope and `VLLM_PLUGINS` cross-group filtering
- Features: fix beam-search x prompt-logprobs and prompt-embeds compatibility
- Env vars: correct `VLLM_` prefix claim, document non-`VLLM_` vars and port scanning
- Pooling: document scoring endpoint conditions and default task selection
- Auth: clarify `/v1`-only scope and Bearer header requirement

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
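The Plugins item says `VLLM_PLUGINS` filters every plugin group, not only general plugins. A minimal sketch of that behavior, assuming the variable is a comma-separated allow-list and that leaving it unset means "load everything". `parse_allowed_plugins` and `select_plugins` are hypothetical helpers, not vLLM's functions; they work on plain name lists so they run without any plugins installed, and the group names in the example are illustrative.

```python
from typing import Mapping, Optional


def parse_allowed_plugins(env: Mapping[str, str]) -> Optional[list[str]]:
    """Return the VLLM_PLUGINS allow-list, or None when the variable is unset.

    Assumption: the value is a comma-separated list of plugin names.
    """
    if "VLLM_PLUGINS" not in env:
        return None
    return env["VLLM_PLUGINS"].split(",")


def select_plugins(
    discovered: Mapping[str, list[str]],
    allowed: Optional[list[str]],
) -> dict[str, list[str]]:
    """Apply one allow-list to the plugin names found in every entry-point group.

    `discovered` maps a group name to the plugin names registered in it.
    """
    if allowed is None:
        # Unset variable: keep every discovered plugin in every group.
        return {group: list(names) for group, names in discovered.items()}
    # Set variable: the same filter applies to all groups, not only general plugins.
    return {
        group: [name for name in names if name in allowed]
        for group, names in discovered.items()
    }


if __name__ == "__main__":
    # Group names are illustrative; the "my_*" plugin names are made up.
    found = {
        "vllm.general_plugins": ["my_general", "other_general"],
        "vllm.platform_plugins": ["my_platform"],
    }
    allowed = parse_allowed_plugins({"VLLM_PLUGINS": "my_general,my_platform"})
    print(select_plugins(found, allowed))
    # {'vllm.general_plugins': ['my_general'], 'vllm.platform_plugins': ['my_platform']}
```

The point of the design is that a user who sets `VLLM_PLUGINS` to enable one general plugin also implicitly disables any platform plugin not named in the list.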
Documentation preview: https://vllm--40062.org.readthedocs.build/en/40062/
Code Review
This pull request updates the documentation to include the new launch CLI command, clarify environment variable usage (including non-VLLM_ prefixed ones), and specify plugin loading scopes. It also adds details on API key authentication, feature compatibility for beam search, and endpoint availability for pooling models. Review feedback recommends adding missing S3 environment variables for completeness and ensuring consistency in the feature compatibility matrix regarding beam search and prompt embeddings.
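To make the API key authentication rule concrete, here is a minimal, framework-free sketch of what the PR documents: only `/v1` routes are protected, the key must arrive as `Authorization: Bearer <key>`, and `OPTIONS` requests pass through. `is_request_allowed` is a hypothetical function, not vLLM's middleware; the constant-time comparison via `hmac.compare_digest` is a choice made for this sketch.

```python
import hmac
from typing import Mapping, Optional


def is_request_allowed(
    method: str,
    path: str,
    headers: Mapping[str, str],
    api_key: Optional[str],
) -> bool:
    """Return True if the request may proceed under the documented API-key rules."""
    if api_key is None:
        # No key configured: authentication is disabled entirely.
        return True
    if method.upper() == "OPTIONS":
        # CORS preflight requests are not checked.
        return True
    if not path.startswith("/v1"):
        # Only /v1 routes are protected (e.g. /health is not).
        return True
    # HTTP header names are case-insensitive; normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    provided = lowered.get("authorization", "")
    expected = f"Bearer {api_key}"
    # Compare as bytes in constant time so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(provided.encode(), expected.encode())
```

A bare key without the `Bearer ` prefix is rejected, which is the header requirement the quickstart change spells out.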
Please note that `VLLM_PORT` and `VLLM_HOST_IP` set the port and ip for vLLM's **internal usage**. It is not the port and ip for the API server. If you use `--host $VLLM_HOST_IP` and `--port $VLLM_PORT` to start the API server, it will not work. When `VLLM_PORT` is set, it is used as a base port; if multiple internal ports are needed, vLLM scans upward from that value to allocate them.
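A minimal sketch of the "base port, scan upward" behavior described above. `allocate_ports` and `port_is_free` are hypothetical names, not vLLM's implementation; the probe is injectable so the scanning logic can be exercised without opening real sockets, and the default probe simply tries to bind the port on localhost.

```python
import socket
from typing import Callable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Default probe: treat a port as free if we can bind to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def allocate_ports(
    base_port: int,
    count: int,
    is_free: Callable[[int], bool] = port_is_free,
    max_port: int = 65535,
) -> list[int]:
    """Return `count` free ports, scanning upward from `base_port`."""
    ports: list[int] = []
    port = base_port
    while len(ports) < count:
        if port > max_port:
            raise RuntimeError(f"could not find {count} free ports at or above {base_port}")
        if is_free(port):
            ports.append(port)
        # Busy ports are skipped; the scan always moves upward.
        port += 1
    return ports


if __name__ == "__main__":
    busy = {29501}  # pretend this port is already taken
    print(allocate_ports(29500, 3, is_free=lambda p: p not in busy))
    # [29500, 29502, 29503]
```

Because the scan only moves upward, two processes started with the same `VLLM_PORT` can still end up racing for the same ports; the sketch does not try to address that.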
All environment variables used by vLLM are prefixed with `VLLM_`. **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
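As an illustration of the `DO_NOT_TRACK` fallback, a minimal sketch assuming that the `VLLM_`-prefixed variable takes precedence when it is non-empty and that tracking is disabled only when the effective value is exactly `"1"`. `do_not_track` is a hypothetical helper that takes the environment as a mapping so it can be tested without modifying `os.environ`.

```python
import os
from typing import Mapping


def do_not_track(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if usage tracking should be disabled.

    A non-empty VLLM_DO_NOT_TRACK wins; otherwise the generic DO_NOT_TRACK
    convention is used as a fallback. Only the value "1" disables tracking.
    """
    value = env.get("VLLM_DO_NOT_TRACK") or env.get("DO_NOT_TRACK") or "0"
    return value == "1"
```

Under these assumptions an explicit `VLLM_DO_NOT_TRACK=0` overrides `DO_NOT_TRACK=1`, because the first non-empty value wins.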
The list of non-VLLM_ prefixed environment variables for S3 credentials is incomplete. While S3_ACCESS_KEY_ID is mentioned, vLLM also relies on S3_SECRET_ACCESS_KEY and S3_ENDPOINT_URL for S3 integration (as seen in vllm/envs.py). Including these provides a more complete and helpful guide for users configuring S3 storage.
Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `S3_ENDPOINT_URL`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
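To see why naming the service `vllm` is a problem, here is a small sketch of the service-discovery variables Kubernetes injects, following the naming rule in the linked Kubernetes docs (service name upper-cased, dashes turned into underscores, plus the Docker-links-style `<NAME>_PORT*` variables). `kubernetes_service_env_names` and `conflicts_with_vllm` are hypothetical helpers, and the list is a simplified subset of what Kubernetes actually sets.

```python
def kubernetes_service_env_names(service_name: str, port: int, proto: str = "tcp") -> list[str]:
    """Names of the variables Kubernetes injects for one service port (simplified).

    NAME is the upper-cased service name with dashes replaced by underscores.
    """
    name = service_name.upper().replace("-", "_")
    link = f"{name}_PORT_{port}_{proto.upper()}"
    return [
        f"{name}_SERVICE_HOST",
        f"{name}_SERVICE_PORT",
        f"{name}_PORT",  # Docker-links style, value looks like tcp://<ip>:<port>
        link,
        f"{link}_PROTO",
        f"{link}_PORT",
        f"{link}_ADDR",
    ]


def conflicts_with_vllm(service_name: str, port: int, vllm_env_names: set[str]) -> list[str]:
    """Return injected variable names that collide with variables vLLM reads."""
    injected = set(kubernetes_service_env_names(service_name, port))
    return sorted(injected & vllm_env_names)


if __name__ == "__main__":
    known = {"VLLM_PORT", "VLLM_HOST_IP"}  # tiny subset of vLLM's variables
    print(conflicts_with_vllm("vllm", 8000, known))    # ['VLLM_PORT']
    print(conflicts_with_vllm("my-llm", 8000, known))  # []
```

With a service named `vllm`, Kubernetes would set `VLLM_PORT` to a URL such as `tcp://10.0.0.11:8000`, which is not the integer base port vLLM expects.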
| | best-of | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ✅ | ✅ | | | | ||
| | beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | | | ||
| | [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | ❔ | ✅ | | ||
| | beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | | |
The compatibility matrix is inconsistent for the beam-search and prompt-embeds intersection. In line 55 (the prompt-embeds row), the beam-search column is correctly marked as ❌<sup>†</sup>. However, in line 54 (the beam-search row), the prompt-embeds column (the last cell) is left empty. To maintain consistency and accuracy in the matrix, this cell should also be marked with ❌<sup>†</sup>.
| | beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | ❌<sup>†</sup> | |
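The inconsistency flagged in this review (a pair marked in one row but left empty in the mirrored cell) comes from entering each pair by hand twice. One way to rule it out is to store each unordered feature pair once and derive both cells from it. A minimal sketch under that assumption; `COMPATIBILITY`, `lookup`, and `render_row` are hypothetical, the only entry is the beam-search × prompt-embeds pair discussed above, and the `†` footnote marker is omitted for simplicity.

```python
# Hypothetical data model: each unordered feature pair is stored exactly once.
COMPATIBILITY: dict[frozenset[str], str] = {
    # From this review thread: beam search does not support prompt embeddings.
    frozenset({"beam-search", "prompt-embeds"}): "❌",
}


def lookup(feature_a: str, feature_b: str) -> str:
    """Symmetric lookup: (a, b) and (b, a) hit the same entry; unknown pairs give ❔."""
    return COMPATIBILITY.get(frozenset({feature_a, feature_b}), "❔")


def render_row(row_feature: str, columns: list[str]) -> str:
    """Render one Markdown table row; every cell comes from the shared lookup."""
    cells = [lookup(row_feature, column) for column in columns]
    return "| " + " | ".join([row_feature, *cells]) + " |"


if __name__ == "__main__":
    print(render_row("beam-search", ["prompt-embeds"]))  # | beam-search | ❌ |
    print(render_row("prompt-embeds", ["beam-search"]))  # | prompt-embeds | ❌ |
```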
What's broken?
An automated documentation audit found 14+ places where docs diverge from current code behavior. Users following the docs may encounter phantom CLI modes, incorrect compatibility claims, and misleading env-var guidance.
What changed?
Docs-only fixes across 6 files — no code changes.
CLI (`docs/cli/README.md`)
- `--help=listgroup`: The parser has no dedicated `listgroup` handler; the default `--help` already lists groups.
- `--help=page`: No pager integration exists; `page` is treated as a search keyword.
- `launch` subcommand: The code registers `launch`, but it was missing from the CLI guide.

Plugin system (`docs/design/plugin_system.md`)
- `VLLM_PLUGINS` cross-group filtering: The env var filters all plugin groups, not just general plugins.

Feature matrix (`docs/features/README.md`)
- Beam search x prompt logprobs: … `prompt_logprobs=None`.
- Beam search x prompt embeds: … raises `NotImplementedError` for embedding prompts.

Environment variables (`docs/configuration/env_vars.md`)
- Non-`VLLM_` variables: Documents `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `CUDA_HOME`, `NO_COLOR`, `DO_NOT_TRACK`, `XDG_CACHE_HOME`, etc.
- `VLLM_PORT` port-scanning behavior: When set, it serves as a base port, and vLLM scans upward from it for additional internal ports.

Pooling models (`docs/models/pooling_models/README.md`)
- Scoring endpoint conditions: `/score` and rerank are only enabled for `embed`/`token_embed` tasks, or `classify` with `num_labels == 1`.

Auth (`docs/getting_started/quickstart.md`)
- API key authentication: Applies only to `/v1` routes, requires an `Authorization: Bearer` header, and skips `OPTIONS` requests.

How do we know it works?
Each doc change was verified by reading the corresponding source code to confirm the documented behavior matches:
- `vllm/utils/argparse_utils.py` for CLI help modes
- `vllm/entrypoints/cli/main.py` for the `launch` subcommand
- `vllm/plugins/__init__.py` for plugin loading and filtering
- `vllm/entrypoints/openai/engine/serving.py` for beam search behavior
- `vllm/envs.py` for environment variable definitions
- `vllm/entrypoints/pooling/utils.py` and `vllm/config/model.py` for pooling/scoring
- `vllm/entrypoints/openai/server_utils.py` for auth middleware

Fixes #39613
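The pooling condition in the change list (`/score` and rerank enabled only for `embed`/`token_embed` tasks, or `classify` with `num_labels == 1`) reduces to a small predicate. `score_endpoints_enabled` is a hypothetical name used for illustration; the real checks live in the pooling and model-config files listed above and may consider more than this.

```python
from typing import Optional


def score_endpoints_enabled(task: str, num_labels: Optional[int] = None) -> bool:
    """Return True if /score and rerank should be exposed for a pooling model."""
    if task in ("embed", "token_embed"):
        # Embedding tasks always get the scoring endpoints.
        return True
    if task == "classify":
        # Classifiers qualify only with a single label, i.e. one score per input pair.
        return num_labels == 1
    # Any other task: no scoring endpoints.
    return False
```

A multi-label classifier (for example `num_labels == 3`) would therefore not expose `/score`, since there is no single relevance value to return.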