docs: fix doc-code mismatches from audit by ianliuy · Pull Request #40062 · vllm-project/vllm

ianliuy · 2026-04-16T21:11:39Z

What's broken?

An automated documentation audit found 14+ places where docs diverge from current code behavior. Users following the docs may encounter phantom CLI modes, incorrect compatibility claims, and misleading env-var guidance.

What changed?

Docs-only fixes across 6 files — no code changes.

CLI (`docs/cli/README.md`)

Removed phantom --help=listgroup: The parser has no dedicated listgroup handler; the default --help already lists groups.
Removed phantom --help=page: No pager integration exists; page is treated as a search keyword.
Added launch subcommand: The code registers launch but it was missing from the CLI guide.

Plugin system (`docs/design/plugin_system.md`)

Clarified per-group loading scope: General and platform plugins load in all processes; IO processor and stat logger plugins load in process 0 only.
Documented VLLM_PLUGINS cross-group filtering: The env var filters all plugin groups, not just general plugins.
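The cross-group filtering described above can be illustrated with a small sketch. It is not the code in `vllm/plugins/__init__.py`; the helper `filter_plugins`, the plugin names, and the comma-separated format of the variable are assumptions made for illustration only.

```python
def filter_plugins(discovered, allowed_env):
    """Return the plugin names that would survive the allow-list.

    discovered:  plugin names found for one plugin group (hypothetical).
    allowed_env: raw value of VLLM_PLUGINS, or None when it is unset.
    """
    if allowed_env is None:
        # Assumption: an unset variable means no filtering at all.
        return list(discovered)
    # Assumption: the value is a comma-separated list of plugin names.
    allowed = {name.strip() for name in allowed_env.split(",") if name.strip()}
    return [name for name in discovered if name in allowed]


# Hypothetical plugins registered under three different groups.
groups = {
    "general": ["my_model_plugin", "other_plugin"],
    "platform": ["my_platform"],
    "stat_logger": ["my_logger"],
}

# The same allow-list is applied to every group, not only to "general".
allowed_env = "my_model_plugin,my_logger"
loaded = {group: filter_plugins(names, allowed_env) for group, names in groups.items()}
```

Under these assumptions the platform plugin is dropped because it is not named in the allow-list, which is the surprise the doc change is meant to warn about.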

Feature matrix (`docs/features/README.md`)

Fixed beam-search × prompt logprobs: Changed ✅ → ❌ — the serving path explicitly returns prompt_logprobs=None.
Fixed prompt-embeds × beam-search: Changed ❔ → ❌ — beam search raises NotImplementedError for embedding prompts.
Added footnote explaining beam search serving-path restrictions.

Environment variables (`docs/configuration/env_vars.md`)

Corrected "all VLLM_ prefixed" claim: vLLM also reads MAX_JOBS, NVCC_THREADS, CMAKE_BUILD_TYPE, CUDA_HOME, NO_COLOR, DO_NOT_TRACK, XDG_CACHE_HOME, etc.
Documented VLLM_PORT port-scanning behavior: When set, it serves as a base port and scans upward for additional internal ports.
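As a rough illustration of "base port, then scan upward", the sketch below tries successive ports until one can be bound. The function name `find_open_port`, the `max_tries` limit, and the bind-based availability check are assumptions of this sketch, not a description of how `vllm/envs.py` or its callers are written.

```python
import socket


def find_open_port(base_port, max_tries=100, host="127.0.0.1"):
    """Return the first port >= base_port that can be bound on host."""
    # Never probe past the highest valid TCP port.
    last = min(base_port + max_tries, 65536)
    for port in range(base_port, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is taken (or otherwise unusable); try the next one.
                continue
            return port
    raise RuntimeError(f"no free port found in [{base_port}, {last})")
```

A caller needing several internal ports could call this repeatedly, starting each search one above the previous result. This is also why the value is unsuitable as the API server's `--port`: the port actually used may differ from the one that was set.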

Pooling models (`docs/models/pooling_models/README.md`)

Documented scoring endpoint conditions: /score and rerank are only enabled for embed/token_embed tasks, or classify with num_labels == 1.
Documented default task selection: When no task is specified, a priority order is used to select the default pooling task.
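The scoring-endpoint condition stated above reduces to a small predicate. The sketch below encodes only what the description says; the function name `scoring_endpoints_enabled` and its signature are hypothetical and do not mirror `vllm/entrypoints/pooling/utils.py`.

```python
def scoring_endpoints_enabled(task, num_labels=None):
    """Return True when /score and rerank would be exposed for this task."""
    if task in ("embed", "token_embed"):
        return True
    if task == "classify":
        # Classification models qualify only when they emit a single label.
        return num_labels == 1
    return False
```

For example, a classifier with `num_labels=3` would not expose the scoring routes under this rule, while an embedding model always would.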

Auth (`docs/getting_started/quickstart.md`)

Clarified auth scope: API key auth applies only to /v1 routes, requires Authorization: Bearer header, and skips OPTIONS requests.
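The three conditions above can be read as a single decision function. The following is a minimal sketch under stated assumptions: `is_request_allowed` is a hypothetical name, the `/v1` check is modelled as a simple path-prefix test, and the constant-time comparison is a choice of this sketch rather than a claim about `vllm/entrypoints/openai/server_utils.py`.

```python
import hmac


def is_request_allowed(method, path, headers, api_key):
    """Decide whether a request passes the API key check."""
    if method == "OPTIONS":
        # Preflight requests are let through without a key.
        return True
    if not path.startswith("/v1"):
        # Routes outside /v1 are not covered by API key auth.
        return True
    expected = f"Bearer {api_key}"
    # Note: a plain dict lookup is case-sensitive; real HTTP header maps are not.
    supplied = headers.get("Authorization", "")
    # Constant-time comparison avoids leaking how much of the key matched.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

One practical consequence of this scope is that routes outside `/v1` remain reachable without a key, so they need separate protection if the server is exposed publicly.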

How do we know it works?

Each doc change was verified by reading the corresponding source code to confirm the documented behavior matches:

vllm/utils/argparse_utils.py for CLI help modes
vllm/entrypoints/cli/main.py for the launch subcommand
vllm/plugins/__init__.py for plugin loading and filtering
vllm/entrypoints/openai/engine/serving.py for beam search behavior
vllm/envs.py for environment variable definitions
vllm/entrypoints/pooling/utils.py and vllm/config/model.py for pooling/scoring
vllm/entrypoints/openai/server_utils.py for auth middleware

Fixes #39613

- CLI: remove phantom --help=listgroup and --help=page modes, add launch subcommand
- Plugins: clarify per-group loading scope and VLLM_PLUGINS cross-group filtering
- Features: fix beam-search x prompt-logprobs and prompt-embeds compatibility
- Env vars: correct VLLM_ prefix claim, document non-VLLM_ vars and port scanning
- Pooling: document scoring endpoint conditions and default task selection
- Auth: clarify /v1-only scope and Bearer header requirement

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>

mergify · 2026-04-16T21:12:20Z

Documentation preview: https://vllm--40062.org.readthedocs.build/en/40062/

gemini-code-assist

Code Review

This pull request updates the documentation to include the new launch CLI command, clarify environment variable usage (including non-VLLM_ prefixed ones), and specify plugin loading scopes. It also adds details on API key authentication, feature compatibility for beam search, and endpoint availability for pooling models. Review feedback recommends adding missing S3 environment variables for completeness and ensuring consistency in the feature compatibility matrix regarding beam search and prompt embeddings.

gemini-code-assist · 2026-04-16T21:14:30Z

+    Please note that `VLLM_PORT` and `VLLM_HOST_IP` set the port and ip for vLLM's **internal usage**. It is not the port and ip for the API server. If you use `--host $VLLM_HOST_IP` and `--port $VLLM_PORT` to start the API server, it will not work. When `VLLM_PORT` is set, it is used as a base port; if multiple internal ports are needed, vLLM scans upward from that value to allocate them.

-    All environment variables used by vLLM are prefixed with `VLLM_`. **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
+    Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).


The list of non-VLLM_ prefixed environment variables for S3 credentials is incomplete. While S3_ACCESS_KEY_ID is mentioned, vLLM also relies on S3_SECRET_ACCESS_KEY and S3_ENDPOINT_URL for S3 integration (as seen in vllm/envs.py). Including these provides a more complete and helpful guide for users configuring S3 storage.

Suggested change

-    Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).

+    Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `S3_ENDPOINT_URL`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
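The Kubernetes warning in the paragraph above is easier to see with concrete names. The sketch below models the naming scheme from the linked Kubernetes documentation; the helper `k8s_service_env`, the cluster IP, and the port are made up for illustration.

```python
def k8s_service_env(service_name, cluster_ip, port, protocol="TCP"):
    """Model the service environment variables Kubernetes injects into pods."""
    # Service names are upper-cased and dashes become underscores.
    prefix = service_name.upper().replace("-", "_")
    proto = protocol.lower()
    url = f"{proto}://{cluster_ip}:{port}"
    port_prefix = f"{prefix}_PORT_{port}_{protocol.upper()}"
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        # Docker-links-compatible variables:
        f"{prefix}_PORT": url,
        port_prefix: url,
        f"{port_prefix}_PROTO": proto,
        f"{port_prefix}_PORT": str(port),
        f"{port_prefix}_ADDR": cluster_ip,
    }


# A service named "vllm" (hypothetical IP and port) produces VLLM_* names.
env = k8s_service_env("vllm", "10.0.0.11", 8000)
collisions = sorted(name for name in env if name.startswith("VLLM_"))
```

With a service named `vllm`, the injected `VLLM_PORT` holds a URL-like string such as `tcp://10.0.0.11:8000` rather than a port number, which is the conflict with vLLM's own `VLLM_PORT` that the doc text warns about.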

gemini-code-assist · 2026-04-16T21:14:30Z

 | best-of | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ✅ | ✅ | | |
-| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |
-| [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | ❔ | ✅ |
+| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |


The compatibility matrix is inconsistent for the beam-search and prompt-embeds intersection. In line 55 (the prompt-embeds row), the beam-search column is correctly marked as ❌†. However, in line 54 (the beam-search row), the prompt-embeds column (the last cell) is left empty. To maintain consistency and accuracy in the matrix, this cell should also be marked with ❌†.

Suggested change

-| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |

+| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | ❌<sup>†</sup> |

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mergify · 2026-05-23T08:48:00Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ianliuy.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ianliuy requested a review from noooop as a code owner April 16, 2026 21:11

mergify Bot added the documentation Improvements or additions to documentation label Apr 16, 2026

gemini-code-assist Bot reviewed Apr 16, 2026

docs: address PR review nits

704f289

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kevglynn mentioned this pull request May 4, 2026

[Doc]: Docs audit: CLI, plugins, features, env vars, and auth mismatches #39613

Open

1 task

mergify Bot added the needs-rebase label May 23, 2026

This was referenced May 23, 2026

[Docs] Update deprecated VLLM_FLASHINFER_MOE_BACKEND reference #43487

Open

[Docs] Fix stale version number in token_embed.md #43488

Merged

[Docs] Fix stale version number in token_classify.md #43489

Merged

docs: fix doc-code mismatches from audit #40062
ianliuy wants to merge 2 commits into vllm-project:main from ianliuy:fix/issue-39613
