docs: fix doc-code mismatches from audit #40062

Open
ianliuy wants to merge 2 commits into vllm-project:main from ianliuy:fix/issue-39613

Conversation


@ianliuy ianliuy commented Apr 16, 2026

What's broken?

An automated documentation audit found 14+ places where docs diverge from current code behavior. Users following the docs may encounter phantom CLI modes, incorrect compatibility claims, and misleading env-var guidance.

What changed?

Docs-only fixes across 6 files — no code changes.

CLI (docs/cli/README.md)

  • Removed phantom --help=listgroup: The parser has no dedicated listgroup handler; the default --help already lists groups.
  • Removed phantom --help=page: No pager integration exists; page is treated as a search keyword.
  • Added launch subcommand: The code registers launch but it was missing from the CLI guide.

Plugin system (docs/design/plugin_system.md)

  • Clarified per-group loading scope: General and platform plugins load in all processes; IO processor and stat logger plugins load in process 0 only.
  • Documented VLLM_PLUGINS cross-group filtering: The env var filters all plugin groups, not just general plugins.
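
The cross-group filtering described above can be pictured with a small sketch. It assumes `VLLM_PLUGINS` holds a comma-separated allow-list of plugin names and that an unset variable means "load everything". The helper name `filter_plugins` and the sample plugin names are hypothetical and are not taken from `vllm/plugins/__init__.py`.

```python
import os


def filter_plugins(discovered, env=None):
    """Return the discovered plugin names allowed by VLLM_PLUGINS.

    Assumption: the variable is a comma-separated allow-list, and leaving
    it unset means every discovered plugin is loaded.
    """
    env = os.environ if env is None else env
    raw = env.get("VLLM_PLUGINS")
    if raw is None:
        return list(discovered)
    allowed = {name.strip() for name in raw.split(",") if name.strip()}
    return [name for name in discovered if name in allowed]


# The same allow-list is applied to every plugin group (hypothetical names).
groups = {
    "general": ["my_model_plugin", "other_plugin"],
    "stat_logger": ["my_stat_logger"],
}
env = {"VLLM_PLUGINS": "my_model_plugin"}
selected = {group: filter_plugins(names, env) for group, names in groups.items()}
```

Under these assumptions, a stat logger plugin that is not named in `VLLM_PLUGINS` is filtered out as well, which is the behavior the updated doc is meant to call out.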

Feature matrix (docs/features/README.md)

  • Fixed beam-search × prompt logprobs: Changed ✅ → ❌ — the serving path explicitly returns prompt_logprobs=None.
  • Fixed prompt-embeds × beam-search: Changed ❔ → ❌ — beam search raises NotImplementedError for embedding prompts.
  • Added footnote explaining beam search serving-path restrictions.

Environment variables (docs/configuration/env_vars.md)

  • Corrected "all VLLM_ prefixed" claim: vLLM also reads MAX_JOBS, NVCC_THREADS, CMAKE_BUILD_TYPE, CUDA_HOME, NO_COLOR, DO_NOT_TRACK, XDG_CACHE_HOME, etc.
  • Documented VLLM_PORT port-scanning behavior: When set, it serves as a base port and scans upward for additional internal ports.
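
A rough illustration of the base-port behavior follows. It is a sketch only: it assumes that "scanning upward" means trying `base`, `base + 1`, and so on, and skipping ports that are unavailable. The function `allocate_ports` and the `is_free` predicate are hypothetical names, not vLLM APIs.

```python
def allocate_ports(base_port, count, is_free):
    """Pick `count` ports, scanning upward from `base_port`.

    `is_free` is a caller-supplied predicate (hypothetical) that reports
    whether a port can be bound; a real check would try to bind a socket.
    """
    ports = []
    candidate = base_port
    while len(ports) < count:
        if candidate > 65535:
            raise RuntimeError("ran out of ports while scanning upward")
        if is_free(candidate):
            ports.append(candidate)
        candidate += 1
    return ports


# Pretend port 5001 is already taken by another process.
busy = {5001}
ports = allocate_ports(5000, 3, lambda port: port not in busy)
```

The practical point for users is that setting `VLLM_PORT` may reserve a range of ports above the given value, not just that single port.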

Pooling models (docs/models/pooling_models/README.md)

  • Documented scoring endpoint conditions: /score and rerank are only enabled for embed/token_embed tasks, or classify with num_labels == 1.
  • Documented default task selection: When no task is specified, a priority order is used to select the default pooling task.
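
The scoring-endpoint condition above can be written as a single predicate. This is a sketch of the documented rule only; the function name `scoring_endpoints_enabled` is hypothetical and the real check lives in the source files listed under verification.

```python
def scoring_endpoints_enabled(task, num_labels=None):
    """Return True when /score and rerank would be enabled.

    Encodes the rule as documented: embed or token_embed tasks always
    qualify, and classify qualifies only with exactly one label.
    """
    if task in ("embed", "token_embed"):
        return True
    return task == "classify" and num_labels == 1


enabled_for_embed = scoring_endpoints_enabled("embed")
enabled_for_binary_classifier = scoring_endpoints_enabled("classify", num_labels=1)
enabled_for_multiclass = scoring_endpoints_enabled("classify", num_labels=3)
```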

Auth (docs/getting_started/quickstart.md)

  • Clarified auth scope: API key auth applies only to /v1 routes, requires Authorization: Bearer header, and skips OPTIONS requests.
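
To make the auth scope concrete, here is a minimal sketch of the three rules stated above. It is not the middleware from `vllm/entrypoints/openai/server_utils.py`; the function `is_request_allowed`, the plain-dict `headers` argument, and the `/health` path used in the example are assumptions for illustration.

```python
import hmac


def is_request_allowed(method, path, headers, api_key):
    """Return True if the request may proceed.

    Sketch of the documented scope: OPTIONS requests are skipped, only
    /v1 routes are protected, and the key must arrive as a Bearer token.
    `headers` is assumed to be a plain dict with an exact-case key.
    """
    if method.upper() == "OPTIONS":
        return True
    if not path.startswith("/v1"):
        return True
    supplied = headers.get("Authorization", "").encode("utf-8")
    expected = f"Bearer {api_key}".encode("utf-8")
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied, expected)


ok_with_key = is_request_allowed(
    "POST", "/v1/completions", {"Authorization": "Bearer secret"}, "secret"
)
rejected_without_key = is_request_allowed("POST", "/v1/completions", {}, "secret")
unprotected_route = is_request_allowed("GET", "/health", {}, "secret")
```

Under this reading, any route outside `/v1` is reachable without a key, so deployments that need those routes protected would have to do so separately.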

How do we know it works?

Each doc change was verified by reading the corresponding source code to confirm the documented behavior matches:

  • vllm/utils/argparse_utils.py for CLI help modes
  • vllm/entrypoints/cli/main.py for the launch subcommand
  • vllm/plugins/__init__.py for plugin loading and filtering
  • vllm/entrypoints/openai/engine/serving.py for beam search behavior
  • vllm/envs.py for environment variable definitions
  • vllm/entrypoints/pooling/utils.py and vllm/config/model.py for pooling/scoring
  • vllm/entrypoints/openai/server_utils.py for auth middleware

Fixes #39613

- CLI: remove phantom --help=listgroup and --help=page modes, add launch subcommand
- Plugins: clarify per-group loading scope and VLLM_PLUGINS cross-group filtering
- Features: fix beam-search x prompt-logprobs and prompt-embeds compatibility
- Env vars: correct VLLM_ prefix claim, document non-VLLM_ vars and port scanning
- Pooling: document scoring endpoint conditions and default task selection
- Auth: clarify /v1-only scope and Bearer header requirement

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
@ianliuy ianliuy requested a review from noooop as a code owner April 16, 2026 21:11

mergify Bot commented Apr 16, 2026

Documentation preview: https://vllm--40062.org.readthedocs.build/en/40062/

@mergify mergify Bot added the documentation Improvements or additions to documentation label Apr 16, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the documentation to include the new launch CLI command, clarify environment variable usage (including non-VLLM_ prefixed ones), and specify plugin loading scopes. It also adds details on API key authentication, feature compatibility for beam search, and endpoint availability for pooling models. Review feedback recommends adding missing S3 environment variables for completeness and ensuring consistency in the feature compatibility matrix regarding beam search and prompt embeddings.

Comment thread docs/configuration/env_vars.md Outdated
Please note that `VLLM_PORT` and `VLLM_HOST_IP` set the port and ip for vLLM's **internal usage**. It is not the port and ip for the API server. If you use `--host $VLLM_HOST_IP` and `--port $VLLM_PORT` to start the API server, it will not work. When `VLLM_PORT` is set, it is used as a base port; if multiple internal ports are needed, vLLM scans upward from that value to allocate them.

All environment variables used by vLLM are prefixed with `VLLM_`. **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).

high

The list of non-VLLM_ prefixed environment variables for S3 credentials is incomplete. While S3_ACCESS_KEY_ID is mentioned, vLLM also relies on S3_SECRET_ACCESS_KEY and S3_ENDPOINT_URL for S3 integration (as seen in vllm/envs.py). Including these provides a more complete and helpful guide for users configuring S3 storage.

Suggested change
Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
Most vLLM-specific environment variables are prefixed with `VLLM_`. However, vLLM also reads several non-`VLLM_`-prefixed environment variables for build configuration (e.g., `MAX_JOBS`, `NVCC_THREADS`, `CMAKE_BUILD_TYPE`, `VERBOSE`), system integration (e.g., `CUDA_HOME`, `CUDA_VISIBLE_DEVICES`, `LD_LIBRARY_PATH`, `LOCAL_RANK`), path defaults (e.g., `XDG_CACHE_HOME`, `XDG_CONFIG_HOME`), S3 credentials (e.g., `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `S3_ENDPOINT_URL`), and standard conventions (e.g., `NO_COLOR`, `DO_NOT_TRACK` as a fallback for `VLLM_DO_NOT_TRACK`). **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).

Comment thread docs/features/README.md Outdated
| best-of | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ✅ | ✅ | | |
| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |
| [prompt-embeds](prompt_embeds.md) | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | | ✅ |
| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |

high

The compatibility matrix is inconsistent for the beam-search and prompt-embeds intersection. In line 55 (the prompt-embeds row), the beam-search column is correctly marked as ❌<sup>†</sup>. However, in line 54 (the beam-search row), the prompt-embeds column (the last cell) is left empty. To maintain consistency and accuracy in the matrix, this cell should also be marked with ❌<sup>†</sup>.

Suggested change
| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | |
| beam-search | ✅ | ✅ | ✅ | [❌](https://github.com/vllm-project/vllm/issues/6137) | ✅ | ❌ | ✅ | ✅ | ❌<sup>†</sup> | ❔ | [❌](https://github.com/vllm-project/vllm/issues/7968) | ❔ | ✅ | ✅ | ❌<sup>†</sup> |

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mergify Bot commented May 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ianliuy.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Labels

documentation (Improvements or additions to documentation), needs-rebase

Development

Successfully merging this pull request may close these issues.

[Doc]: Docs audit: CLI, plugins, features, env vars, and auth mismatches