83 changes: 82 additions & 1 deletion docs/contributing/README.md
@@ -29,10 +29,59 @@ See [LICENSE](../../LICENSE).
The first step of contributing to vLLM is to clone the GitHub repository:

```bash
git clone <your-fork-url>
cd vllm
```

If your fork was created from `github.com/vllm-project/vllm`, add the project remote and name it `upstream`:

```bash
git remote add upstream https://github.com/vllm-project/vllm.git
git remote -v
```

If your remotes are reversed, use `git remote rename` so that your fork is `origin` and the project repo is `upstream`.

If this checkout still has only `origin` pointing at the main vLLM repo, run:

```bash
git remote rename origin upstream
git remote add origin <your-fork-url>
```
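
To sanity-check the rewiring without touching your real checkout, you can replay it in a throwaway repository (the fork URL below is a placeholder, not a real fork):

```shell
# Throwaway demo of the remote rewiring; example-user stands in for your account.
tmp="$(mktemp -d)"
git init -q "$tmp"
cd "$tmp"
# Starting state: origin points at the main project repo.
git remote add origin https://github.com/vllm-project/vllm.git
# Rewire: the project repo becomes upstream, your fork becomes origin.
git remote rename origin upstream
git remote add origin https://github.com/example-user/vllm.git
git remote -v
```

The final `git remote -v` should list `origin` pointing at your fork and `upstream` pointing at `vllm-project/vllm`.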

### Contributor Workspace Setup

Before you begin, check your branch health:

```bash
scripts/contributor-workspace.sh status
```

If your branch is behind or missing `upstream/main`, sync your local `main` first:

```bash
scripts/contributor-workspace.sh sync-main
```

### Baseline Version Guidance

For day-to-day contributions to vLLM, work from `main` unless a maintainer explicitly asks for a stable release branch.

Find the latest stable tagged release in this repo:

```bash
git tag -l 'v[0-9]*.[0-9]*.[0-9]*' \
| grep -Ev 'rc|post|dev' \
| sort -V \
| tail -n 1
```
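
To see what the filter does, you can run the same pipeline over a hand-written tag list (the version numbers below are illustrative, not real vLLM releases):

```shell
# Sample tags, filtered the same way: pre-releases (rc) and post-releases
# (post) are dropped, then the highest remaining version wins.
printf '%s\n' v0.9.2 v0.10.0rc1 v0.10.0 v0.10.1.post1 v0.10.1 \
  | grep -Ev 'rc|post|dev' \
  | sort -V \
  | tail -n 1
# → v0.10.1
```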

Then create a short-lived branch when you need a stable snapshot:

```bash
git switch -c contrib/<area>-<date> vX.Y.Z
```

Then, configure your Python virtual environment.

--8<-- "docs/getting_started/installation/python_env_setup.inc.md"
@@ -98,6 +147,14 @@ vLLM's `pre-commit` hooks will now run automatically every time you commit.
pre-commit run --hook-stage manual mypy-3.10
```

If your environment has restricted access to `~/.cache`, use the repo-local helper:

```bash
scripts/run-pre-commit.sh run -a # runs on all files
```

Use this helper whenever you run pre-commit manually.
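
As a rough illustration (an assumption about the helper, not its actual contents), such a wrapper can point `pre-commit`'s cache inside the repository via the documented `PRE_COMMIT_HOME` variable before delegating:

```shell
# Hypothetical sketch of a repo-local cache wrapper (NOT the real
# scripts/run-pre-commit.sh): keep pre-commit's cache under the repo
# instead of ~/.cache, then hand off to the real tool.
export PRE_COMMIT_HOME="${PRE_COMMIT_HOME:-$PWD/.cache/pre-commit}"
echo "pre-commit cache: $PRE_COMMIT_HOME"
# exec pre-commit "$@"   # delegation step, commented out for illustration
```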

### Documentation

MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file, [mkdocs.yaml](../../mkdocs.yaml).
@@ -159,6 +216,16 @@ pytest -s -v tests/test_logger.py
platform to run unit tests locally, rely on the continuous integration system to run the tests for
now.

### Choosing PR Candidates

Use this order when selecting work:

- Start with `good first issue` labels, since they are curated for new contributors.
- Pick up issues a maintainer has tagged for component-specific work that you can reproduce locally.
- If there is no existing issue, open one first and wait for maintainer alignment before writing significant code.
- For larger refactors, ask for an RFC before implementation; `rfc-required` labels usually indicate this.
- Keep first PRs narrow: one behavior, one test area, one subsystem.

## Issues

If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
@@ -224,6 +291,20 @@ The PR needs to meet the following code quality standards:
- Please add documentation to `docs/` if the PR modifies the user-facing behaviors of vLLM.
It helps vLLM users understand and utilize the new features or changes.

## First PR Checklist

Before you open your first PR, use this checklist as a concrete execution path:

- Open or find a relevant issue and link it in your PR description.
- Keep your branch focused on one change only.
- Choose a PR title with the required type prefix (for example, `[Bugfix]`, `[Doc]`, or `[Model]`).
- Add the smallest sufficient test coverage for code changes; for docs-only changes, validate docs build.
- Run `pre-commit` locally before pushing.
- Include a short test plan and test results section in the PR description.
- Ensure your commit has DCO sign-off (`git commit -s`).

A small PR with every checklist item completed is easier to review and more likely to land quickly.
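
To see exactly what `git commit -s` adds, you can try it in a throwaway repository (the identity below is an example, not a required value):

```shell
# Throwaway repo demonstrating the DCO sign-off trailer added by `-s`.
tmp="$(mktemp -d)"
git init -q "$tmp"
cd "$tmp"
echo demo > file.txt
git add file.txt
git -c user.name='Dev Example' -c user.email='dev@example.com' \
  commit -q -s -m '[Doc] Demo sign-off'
# The commit message now ends with a "Signed-off-by:" trailer.
git log -1 --format=%B
```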

### Adding or Changing Kernels

When actively developing or modifying kernels, using the [Incremental Compilation Workflow](./incremental_build.md) is highly recommended for faster build times.
120 changes: 120 additions & 0 deletions docs/contributing/responses-harmony-state-plan.md
@@ -0,0 +1,120 @@
# Responses + Harmony Contribution Plan (Core GPT-OSS Focus)

## Purpose

This repo is intended as a focused working area for contributions to core vLLM, GPT-OSS models, and the Harmony/Responses API. The list below captures ten concrete tasks, grouped by impact.

## 1) Concrete contribution list (priority order)

### 1. Complete refusal and non-text handling in Harmony input conversion

**Why**: Open-ended chat/tool payloads are currently flattened to text-only assumptions in some branches, which drops refusal and non-text content.

**Scope**:

- [vllm/entrypoints/openai/parser/harmony_utils.py](../vllm/entrypoints/openai/parser/harmony_utils.py)
- [vllm/entrypoints/openai/responses/harmony.py](../vllm/entrypoints/openai/responses/harmony.py)

**Tasks**:

- Add support for `refusal` fields and non-text content blocks in `_parse_chat_format_message`/`_parse_harmony_format_message` and equivalent response parsing paths.
- Ensure round-trips preserve typed content in `Message.from_*` and serializer paths.

### 2. Wire tool-call detection through chat-output parsing

**Why**: `parse_chat_output()` currently always reports no tool call in the tuple return value, even when tool calls are present.

**Scope**: [vllm/entrypoints/openai/parser/harmony_utils.py](../vllm/entrypoints/openai/parser/harmony_utils.py)

**Tasks**:

- Detect commentary/recipient-marked tool call messages while parsing and set `is_tool_call` correctly.
- Add unit coverage for models that emit tool calls and partial tool calls.

### 3. Preserve MCP/tool-call context instead of forcing `mcp_call` fallback shape

**Why**: When converting parser state to response output, MCP metadata is currently overwritten and error handling is missing.

**Scope**: [vllm/entrypoints/openai/parser/responses_parser.py](../vllm/entrypoints/openai/parser/responses_parser.py)

**Tasks**:

- Store and emit tool-server label data when converting `ResponseFunctionToolCallOutputItem` to `McpCall`.
- Add support for error outputs in function-call result conversion rather than silently dropping metadata.

### 4. Add output annotations/logprob propagation to fallback output builders

**Why**: Parser fallback paths currently emit placeholder values for `annotations`/`logprobs`, which loses observability and parity with non-harmony parsing.

**Scope**:

- [vllm/entrypoints/openai/parser/responses_parser.py](../vllm/entrypoints/openai/parser/responses_parser.py)
- [vllm/entrypoints/openai/responses/harmony.py](../vllm/entrypoints/openai/responses/harmony.py)

**Tasks**:

- Thread through available token-level logprob structures where requested.
- Keep annotation structure stable and non-null in final output items.

### 5. Include tool output messages in streaming harmony message history

**Why**: `StreamingHarmonyContext.append_tool_output()` still has a TODO to add tool output messages; without it, state reconstruction can omit tool-result content in streamed runs.

**Scope**: [vllm/entrypoints/openai/responses/context.py](../vllm/entrypoints/openai/responses/context.py)

**Tasks**:

- Append parsed tool-result `Message` objects into `_messages`.
- Validate that the returned stream event order matches non-streaming output item conversion.

### 6. Clarify and fix previous-turn reconstruction in harmony continuation

**Why**: The slice-delete/reappend block for previous-response continuation appears to be redundant and can mis-handle turn boundaries.

**Scope**: [vllm/entrypoints/openai/responses/serving.py](../vllm/entrypoints/openai/responses/serving.py)

**Tasks**:

- Remove no-op behavior and implement an explicit final-channel turn trimming policy.
- Add regression tests for multi-turn conversations where the last message is `analysis`/`final`.

### 7. Add robust stateful response persistence and cleanup

**Why**: Response/message stores are explicit in-memory hacks with known leak risks.

**Scope**:

- [vllm/entrypoints/openai/responses/serving.py](../vllm/entrypoints/openai/responses/serving.py)
- [vllm/entrypoints/openai/responses/context.py](../vllm/entrypoints/openai/responses/context.py)

**Tasks**:

- Track TTL/size limits or explicit pruning for `response_store`, `msg_store`, and `event_store`.
- Ensure state used for `previous_response_id` survives normal use while preventing unbounded growth.

### 8. Harden stateful tool execution contract for streaming

**Why**: Tool execution + streaming paths still have known quirks (disconnect handling and per-request session behavior) that are not fully codified.

**Scope**: [vllm/entrypoints/openai/responses/serving.py](../vllm/entrypoints/openai/responses/serving.py)

**Tasks**:

- Address the `TODO` around disconnect handling in the stream generator.
- Add/extend tests around `previous_response_id` when `background=True`, including stream replay (`starting_after`).

### 9. Improve parser compatibility for nested JSON tool arguments

**Why**: Nested JSON tool arguments are known to fail in one parser path and are xfailed in streaming mode.

**Scope**: [tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py](../tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py), [vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py](../vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py)

**Tasks**:

- Implement nested-object parsing in tool-parser extraction.
- Remove remaining skip/xfail behavior and add focused regression tests.

### 10. Fix GPT-OSS MoE Triton routing-weight path

**Why**: `apply_router_weight_on_input` is currently ignored in the custom MoE kernel path, which can alter behavior versus the reference implementation.

**Scope**: [vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py](../vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py)

**Tasks**:

- Thread/consume `apply_router_weight_on_input` in `OAITritonExperts.apply`.
- Add kernel parity tests in [tests/kernels/moe/test_modular_oai_triton_moe.py](../tests/kernels/moe/test_modular_oai_triton_moe.py).

## 2) How Responses API statefulness currently works

### Core behavior

1. On request, `create_responses()` optionally loads prior response with `previous_response_id`.
2. `previous_response_id` is loaded from `self.response_store`; if missing, the request returns `invalid_request_error`.
3. For non-Harmony models, `construct_input_messages()` prepends previous chat messages (`msg_store`) and previous assistant outputs (`prev_response.output`) before appending current input.
4. For Harmony/GPT-OSS, `_construct_input_messages_with_harmony()` loads previous harmony messages from `msg_store[prev_response.id]` and appends parsed new input.
5. `msg_store`/`response_store` are only reliably populated when `store=True`.

### Practical caveats

- `msg_store` is in-memory only and has no eviction policy (`FIXME` comments mark this as a memory leak risk).
- `response_store` and `event_store` are also in-memory hacks with no retention policy.
- Request-level state is per `response_id`; streaming replay (`starting_after`) reads from `event_store` rather than recomputing.
- In Harmony, `context.messages` includes init state + generated messages; stateful reconstruction depends on correct `_construct_input_messages_with_harmony()` behavior.

## 3) TODOs affecting stateful correctness

### Direct TODOs in responses/harmony flow

- `vllm/entrypoints/openai/responses/harmony.py` and `.../parser/harmony_utils.py`: refusal/non-text support gaps.
- `.../parser/harmony_utils.py`: `parse_chat_output()` does not report `is_tool_call` yet.
- `.../responses/context.py`: add tool output messages in streaming harmony history.
- `.../responses/serving.py`: known streaming bug around tool session initialization/streaming path, plus disconnect TODO.
- `.../responses/serving.py`: previous-response continuation block is currently redundant and needs explicit final-turn handling.
- `.../responses/serving.py`: store/event maps include explicit FIXME about unbounded memory use.
- `.../protocol.py`: incomplete-details only covers max tokens; content_filter reason is still TODO.
- `.../protocol.py`: non-harmony previous_input message support is marked TODO.

### Parser/Output conversion TODOs that impact state replay

- `.../parser/responses_parser.py`: MCP server label and error-output conversion are incomplete.
- `.../parser/responses_parser.py` and `.../responses/harmony.py`: annotations/logprobs are placeholders in several conversion paths.
- `.../streaming_events.py`: TODOs around logprobs and web-search URL/ids in emitted events (impacting debugability and stream consumers).

## 4) Suggested near-term execution order

1. Address parser/harmony correctness items (1, 2, 4) together to stabilize message/output semantics.
2. Fix streaming tool output and state reconstruction (5, 6) to avoid multi-turn drift.
3. Add robust state persistence policy + regression tests (7, 8).
4. Improve tool-call robustness and kernel parity work (9, 10).
145 changes: 145 additions & 0 deletions scripts/contributor-workspace.sh
@@ -0,0 +1,145 @@
#!/usr/bin/env bash

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
UPSTREAM_REMOTE="${UPSTREAM_REMOTE:-upstream}"
BASE_BRANCH="${BASE_BRANCH:-main}"

cd "$REPO_ROOT"

resolve_upstream_remote() {
    if git remote get-url "$UPSTREAM_REMOTE" >/dev/null 2>&1; then
        printf '%s\n' "$UPSTREAM_REMOTE"
        return 0
    fi

    if git remote get-url origin >/dev/null 2>&1; then
        local origin_url
        origin_url="$(git remote get-url origin)"
        if printf '%s' "$origin_url" | grep -Eq 'vllm-project/vllm(\.git)?$'; then
            printf 'origin\n'
            return 0
        fi
    fi

    return 1
}

print_status() {
    local upstream_remote
    upstream_remote="$(resolve_upstream_remote || true)"

    local latest_stable
    latest_stable="$(git tag -l 'v[0-9]*.[0-9]*.[0-9]*' | grep -Ev 'rc|post|dev' | sort -V | tail -n 1)"

    echo
    echo "Contributor workspace status"
    echo "----------------------------"
    echo "Repo: ${REPO_ROOT}"
    echo "Branch: $(git branch --show-current)"
    echo
    echo "Remotes:"
    git remote -v
    echo

    if [ -n "$upstream_remote" ]; then
        if git rev-parse --verify --quiet "refs/remotes/$upstream_remote/$BASE_BRANCH" >/dev/null; then
            echo "Tracking base:"
            local local_ahead upstream_ahead
            read -r local_ahead upstream_ahead < <(git rev-list --left-right --count "${BASE_BRANCH}...$upstream_remote/$BASE_BRANCH")
            printf '  %s: upstream=%s\n' "$BASE_BRANCH" "$upstream_remote/$BASE_BRANCH"
            printf '  local commits not in upstream / upstream commits not in local: %s\n' "$local_ahead/$upstream_ahead"
        else
            echo "Remote branch not cached locally yet: $upstream_remote/$BASE_BRANCH (run: scripts/contributor-workspace.sh sync-main)"
        fi
    else
        echo "No upstream project remote found (expected remote name: upstream)."
        echo "Run: git remote add upstream https://github.com/vllm-project/vllm.git"
        echo "If your fork is in origin and upstream points to your fork, you can set UPSTREAM_REMOTE=origin."
    fi

    echo
    if [ -n "$latest_stable" ]; then
        echo "Latest stable tag (non-rc/post): $latest_stable"
        echo "  To create a fresh baseline branch:"
        echo "  git switch -c contrib/stable-base \"$latest_stable\""
    else
        echo "No stable version tag found by pattern: vN.N.N"
    fi
    echo
}

sync_main() {
    local current_branch
    current_branch="$(git symbolic-ref --short -q HEAD || true)"
    local upstream_remote
    upstream_remote="$(resolve_upstream_remote || true)"

    if [ -z "$upstream_remote" ]; then
        echo "Missing upstream remote. Add it first:"
        echo "git remote add $UPSTREAM_REMOTE https://github.com/vllm-project/vllm.git"
        exit 1
    fi

    if [ -n "$(git status --porcelain)" ]; then
        echo "Workspace has uncommitted changes. Commit/stash or reset before sync."
        git status --short
        exit 1
    fi

    if [ "$upstream_remote" != "$UPSTREAM_REMOTE" ]; then
        echo "Using upstream remote '$upstream_remote' (set UPSTREAM_REMOTE explicitly to override)."
        UPSTREAM_REMOTE="$upstream_remote"
    fi

    git fetch "$UPSTREAM_REMOTE" --prune --tags

    if git rev-parse --verify --quiet "refs/remotes/$UPSTREAM_REMOTE/$BASE_BRANCH" >/dev/null; then
        git switch "$BASE_BRANCH"
        git pull --ff-only "$UPSTREAM_REMOTE" "$BASE_BRANCH"
    else
        echo "Creating local ${BASE_BRANCH} from $UPSTREAM_REMOTE/${BASE_BRANCH}"
        git switch --track -c "$BASE_BRANCH" "$UPSTREAM_REMOTE/$BASE_BRANCH"
    fi

    if [ -n "$current_branch" ] && [ "$current_branch" != "$BASE_BRANCH" ]; then
        git switch "$current_branch"
        echo
        echo "Updated $BASE_BRANCH from $UPSTREAM_REMOTE/$BASE_BRANCH"
        echo "Returned to your working branch: $current_branch"
    else
        echo
        echo "Updated $BASE_BRANCH from $UPSTREAM_REMOTE/$BASE_BRANCH"
        echo "Working branch remains checked out: $BASE_BRANCH"
    fi
}

show_help() {
    cat <<'EOF'
Usage:
  scripts/contributor-workspace.sh status
  scripts/contributor-workspace.sh sync-main

Environment:
  UPSTREAM_REMOTE  Remote that points to https://github.com/vllm-project/vllm (default: upstream)
  BASE_BRANCH      Base branch to track and sync (default: main)
EOF
}

case "${1:-status}" in
    status)
        print_status
        ;;
    sync-main)
        sync_main
        ;;
    -h|--help|help)
        show_help
        ;;
    *)
        echo "Unknown mode: ${1:-}"
        show_help
        exit 1
        ;;
esac