83 changes: 82 additions & 1 deletion docs/contributing/README.md
@@ -29,10 +29,59 @@ See [LICENSE](../../LICENSE).
The first step of contributing to vLLM is to clone the GitHub repository:

```bash
git clone <your-fork-url>
cd vllm
```

If your fork was created from `github.com/vllm-project/vllm`, add the project remote and name it `upstream`:

```bash
git remote add upstream https://github.com/vllm-project/vllm.git
git remote -v
```

If your remotes are reversed, use `git remote rename` so that your fork is `origin` and the project repo is `upstream`.

If this checkout still has only `origin` pointing at the main vLLM repo, run:

```bash
git remote rename origin upstream
git remote add origin <your-fork-url>
```
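
To sanity-check the rewiring without touching your real checkout, you can replay it in a throwaway repository (the fork URL below is a placeholder, not a real fork):

```shell
# Throwaway demo of the remote rewiring; example-user stands in for your account.
tmp="$(mktemp -d)"
git init -q "$tmp"
cd "$tmp"
# Starting state: origin points at the main project repo.
git remote add origin https://github.com/vllm-project/vllm.git
# Rewire: the project repo becomes upstream, your fork becomes origin.
git remote rename origin upstream
git remote add origin https://github.com/example-user/vllm.git
git remote -v
```

The final `git remote -v` should list `origin` pointing at your fork and `upstream` pointing at `vllm-project/vllm`.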

### Contributor Workspace Setup

Before you begin, check your branch health:

```bash
scripts/contributor-workspace.sh status
```

If your branch is behind or missing `upstream/main`, sync your local `main` first:

```bash
scripts/contributor-workspace.sh sync-main
```

### Baseline Version Guidance

For day-to-day contributions to vLLM, work from `main` unless a maintainer explicitly asks for a stable release branch.

Find the latest stable tagged release in this repo:

```bash
git tag -l 'v[0-9]*.[0-9]*.[0-9]*' \
| grep -Ev 'rc|post|dev' \
| sort -V \
| tail -n 1
```
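
To see what the filter does, you can run the same pipeline over a hand-written tag list (the version numbers below are illustrative, not real vLLM releases):

```shell
# Sample tags, filtered the same way: pre-releases (rc) and post-releases
# (post) are dropped, then the highest remaining version wins.
printf '%s\n' v0.9.2 v0.10.0rc1 v0.10.0 v0.10.1.post1 v0.10.1 \
  | grep -Ev 'rc|post|dev' \
  | sort -V \
  | tail -n 1
# → v0.10.1
```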

Then create a short-lived branch when you need a stable snapshot:

```bash
git switch -c contrib/<area>-<date> vX.Y.Z
```

Then, configure your Python virtual environment.

--8<-- "docs/getting_started/installation/python_env_setup.inc.md"
@@ -98,6 +147,14 @@ vLLM's `pre-commit` hooks will now run automatically every time you commit.
pre-commit run --hook-stage manual mypy-3.10
```

If your environment has restricted access to `~/.cache`, use the repo-local helper:

```bash
scripts/run-pre-commit.sh run -a # runs on all files
```

Use this helper whenever you run pre-commit manually.
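
As a rough illustration (an assumption about the helper, not its actual contents), such a wrapper can point `pre-commit`'s cache inside the repository via the documented `PRE_COMMIT_HOME` variable before delegating:

```shell
# Hypothetical sketch of a repo-local cache wrapper (NOT the real
# scripts/run-pre-commit.sh): keep pre-commit's cache under the repo
# instead of ~/.cache, then hand off to the real tool.
export PRE_COMMIT_HOME="${PRE_COMMIT_HOME:-$PWD/.cache/pre-commit}"
echo "pre-commit cache: $PRE_COMMIT_HOME"
# exec pre-commit "$@"   # delegation step, commented out for illustration
```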

### Documentation

MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file, [mkdocs.yaml](../../mkdocs.yaml).
@@ -159,6 +216,16 @@ pytest -s -v tests/test_logger.py
platform to run unit tests locally, rely on the continuous integration system to run the tests for
now.

### Choosing PR Candidates

Use this order when selecting work:

- Start with `good first issue` labels, since they are curated for new contributors.
- Pick up issues a maintainer has tagged for component-specific work that you can reproduce locally.
- If there is no existing issue, open one first and wait for maintainer alignment before writing significant code.
- For larger refactors, ask for an RFC before implementation; `rfc-required` labels usually indicate this.
- Keep first PRs narrow: one behavior, one test area, one subsystem.

## Issues

If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
@@ -224,6 +291,20 @@ The PR needs to meet the following code quality standards:
- Please add documentation to `docs/` if the PR modifies the user-facing behaviors of vLLM.
It helps vLLM users understand and utilize the new features or changes.

## First PR Checklist

Before you open your first PR, use this checklist as a concrete execution path:

- Open or find a relevant issue and link it in your PR description.
- Keep your branch focused on one change only.
- Choose a PR title with the required type prefix (for example, `[Bugfix]`, `[Doc]`, or `[Model]`).
- Add the smallest sufficient test coverage for code changes; for docs-only changes, validate docs build.
- Run `pre-commit` locally before pushing.
- Include a short test plan and test results section in the PR description.
- Ensure your commit has DCO sign-off (`git commit -s`).

A small PR with every checklist item completed is easier to review and more likely to land quickly.
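
To see exactly what `git commit -s` adds, you can try it in a throwaway repository (the identity below is an example, not a required value):

```shell
# Throwaway repo demonstrating the DCO sign-off trailer added by `-s`.
tmp="$(mktemp -d)"
git init -q "$tmp"
cd "$tmp"
echo demo > file.txt
git add file.txt
git -c user.name='Dev Example' -c user.email='dev@example.com' \
  commit -q -s -m '[Doc] Demo sign-off'
# The commit message now ends with a "Signed-off-by:" trailer.
git log -1 --format=%B
```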

### Adding or Changing Kernels

When actively developing or modifying kernels, using the [Incremental Compilation Workflow](./incremental_build.md) is highly recommended for faster build times.
120 changes: 120 additions & 0 deletions docs/contributing/responses-harmony-state-plan.md
@@ -0,0 +1,120 @@
# Responses + Harmony Contribution Plan (Core GPT-OSS Focus)

## Purpose

This repo is intended as a focused working area for contributions to core vLLM, GPT-OSS models, and the Harmony/Responses API. The list below captures ten concrete tasks, grouped by impact.

## 1) Concrete contribution list (priority order)

### 1. Complete refusal and non-text handling in Harmony input conversion

**Why**: Open-ended chat/tool payloads are currently flattened to text-only assumptions in some branches, which drops refusal and non-text content.

**Scope**:

- [vllm/entrypoints/openai/parser/harmony_utils.py](../vllm/entrypoints/openai/parser/harmony_utils.py)
- [vllm/entrypoints/openai/responses/harmony.py](../vllm/entrypoints/openai/responses/harmony.py)

**Tasks**:

- Add support for `refusal` fields and non-text content blocks in `_parse_chat_format_message`/`_parse_harmony_format_message` and equivalent response parsing paths.
- Ensure round-trips preserve typed content in `Message.from_*` and serializer paths.

### 2. Wire tool-call detection through chat-output parsing

**Why**: `parse_chat_output()` currently always reports no tool call in the tuple return value, even when tool calls are present.

**Scope**: [vllm/entrypoints/openai/parser/harmony_utils.py](../vllm/entrypoints/openai/parser/harmony_utils.py)

**Tasks**:

- Detect commentary/recipient-marked tool call messages while parsing and set `is_tool_call` correctly.
- Add unit coverage for models that emit tool calls and partial tool calls.

### 3. Preserve MCP/tool-call context instead of forcing `mcp_call` fallback shape

**Why**: When converting parser state to response output, MCP metadata is currently overwritten and error handling is missing.

**Scope**: [vllm/entrypoints/openai/parser/responses_parser.py](../vllm/entrypoints/openai/parser/responses_parser.py)

**Tasks**:

- Store and emit tool-server label data when converting `ResponseFunctionToolCallOutputItem` to `McpCall`.
- Add support for error outputs in function-call result conversion rather than silently dropping metadata.

### 4. Add output annotations/logprob propagation to fallback output builders

**Why**: Parser fallback paths currently emit placeholder values for `annotations`/`logprobs`, which loses observability and parity with non-harmony parsing.

**Scope**:

- [vllm/entrypoints/openai/parser/responses_parser.py](../vllm/entrypoints/openai/parser/responses_parser.py)
- [vllm/entrypoints/openai/responses/harmony.py](../vllm/entrypoints/openai/responses/harmony.py)

**Tasks**:

- Thread through available token-level logprob structures where requested.
- Keep annotation structure stable and non-null in final output items.

### 5. Include tool output messages in streaming harmony message history

**Why**: `StreamingHarmonyContext.append_tool_output()` still has a TODO to add tool output messages; without it, state reconstruction can omit tool-result content in streamed runs.

**Scope**: [vllm/entrypoints/openai/responses/context.py](../vllm/entrypoints/openai/responses/context.py)

**Tasks**:

- Append parsed tool-result `Message` objects into `_messages`.
- Validate that the returned stream event order matches non-streaming output item conversion.

### 6. Clarify and fix previous-turn reconstruction in harmony continuation

**Why**: The slice-delete/reappend block for previous-response continuation appears to be redundant and can mis-handle turn boundaries.

**Scope**: [vllm/entrypoints/openai/responses/serving.py](../vllm/entrypoints/openai/responses/serving.py)

**Tasks**:

- Remove no-op behavior and implement an explicit final-channel turn trimming policy.
- Add regression tests for multi-turn conversations where the last message is `analysis`/`final`.

### 7. Add robust stateful response persistence and cleanup

**Why**: Response/message stores are explicit in-memory hacks with known leak risks.

**Scope**:

- [vllm/entrypoints/openai/responses/serving.py](../vllm/entrypoints/openai/responses/serving.py)
- [vllm/entrypoints/openai/responses/context.py](../vllm/entrypoints/openai/responses/context.py)

**Tasks**:

- Track TTL/size limits or explicit pruning for `response_store`, `msg_store`, and `event_store`.
- Ensure state used for `previous_response_id` survives normal use while preventing unbounded growth.

### 8. Harden stateful tool execution contract for streaming

**Why**: Tool execution + streaming paths still have known quirks (disconnect handling and per-request session behavior) that are not fully codified.

**Scope**: [vllm/entrypoints/openai/responses/serving.py](../vllm/entrypoints/openai/responses/serving.py)

**Tasks**:

- Address the `TODO` around disconnect handling in the stream generator.
- Add/extend tests around `previous_response_id` when `background=True`, including stream replay (`starting_after`).

### 9. Improve parser compatibility for nested JSON tool arguments

**Why**: Nested JSON tool arguments are known to fail in one parser path and are xfailed in streaming mode.

**Scope**: [tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py](../tests/entrypoints/openai/tool_parsers/test_hunyuan_a13b_tool_parser.py), [vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py](../vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py)

**Tasks**:

- Implement nested-object parsing in tool-parser extraction.
- Remove remaining skip/xfail behavior and add focused regression tests.

### 10. Fix GPT-OSS MoE Triton routing-weight path

**Why**: `apply_router_weight_on_input` is currently ignored in the custom MoE kernel path, which can alter behavior versus the reference implementation.

**Scope**: [vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py](../vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py)

**Tasks**:

- Thread/consume `apply_router_weight_on_input` in `OAITritonExperts.apply`.
- Add kernel parity tests in [tests/kernels/moe/test_modular_oai_triton_moe.py](../tests/kernels/moe/test_modular_oai_triton_moe.py).

## 2) How Responses API statefulness currently works

### Core behavior

1. On request, `create_responses()` optionally loads prior response with `previous_response_id`.
2. `previous_response_id` is loaded from `self.response_store`; if missing, the request returns `invalid_request_error`.
3. For non-Harmony models, `construct_input_messages()` prepends previous chat messages (`msg_store`) and previous assistant outputs (`prev_response.output`) before appending current input.
4. For Harmony/GPT-OSS, `_construct_input_messages_with_harmony()` loads previous harmony messages from `msg_store[prev_response.id]` and appends parsed new input.
5. `msg_store`/`response_store` are only reliably populated when `store=True`.

### Practical caveats

- `msg_store` is in-memory only and has no eviction policy (`FIXME` comments mark this as a memory leak risk).
- `response_store` and `event_store` are also in-memory hacks with no retention policy.
- Request-level state is per `response_id`; streaming replay (`starting_after`) reads from `event_store` rather than recomputing.
- In Harmony, `context.messages` includes init state + generated messages; stateful reconstruction depends on correct `_construct_input_messages_with_harmony()` behavior.

## 3) TODOs affecting stateful correctness

### Direct TODOs in responses/harmony flow

- `vllm/entrypoints/openai/responses/harmony.py` and `.../parser/harmony_utils.py`: refusal/non-text support gaps.
- `.../parser/harmony_utils.py`: `parse_chat_output()` does not report `is_tool_call` yet.
- `.../responses/context.py`: add tool output messages in streaming harmony history.
- `.../responses/serving.py`: known streaming bug around tool session initialization/streaming path, plus disconnect TODO.
- `.../responses/serving.py`: previous-response continuation block is currently redundant and needs explicit final-turn handling.
- `.../responses/serving.py`: store/event maps include explicit FIXME about unbounded memory use.
- `.../protocol.py`: incomplete-details only covers max tokens; content_filter reason is still TODO.
- `.../protocol.py`: non-harmony previous_input message support is marked TODO.

### Parser/Output conversion TODOs that impact state replay

- `.../parser/responses_parser.py`: MCP server label and error-output conversion are incomplete.
- `.../parser/responses_parser.py` and `.../responses/harmony.py`: annotations/logprobs are placeholders in several conversion paths.
- `.../streaming_events.py`: TODOs around logprobs and web-search URL/ids in emitted events (impacting debugability and stream consumers).

## 4) Suggested near-term execution order

1. Address parser/harmony correctness items (1, 2, 4) together to stabilize message/output semantics.
2. Fix streaming tool output and state reconstruction (5, 6) to avoid multi-turn drift.
3. Add robust state persistence policy + regression tests (7, 8).
4. Improve tool-call robustness and kernel parity work (9, 10).
145 changes: 145 additions & 0 deletions scripts/contributor-workspace.sh
@@ -0,0 +1,145 @@
#!/usr/bin/env bash

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
UPSTREAM_REMOTE="${UPSTREAM_REMOTE:-upstream}"
BASE_BRANCH="${BASE_BRANCH:-main}"

cd "$REPO_ROOT"

resolve_upstream_remote() {
    if git remote get-url "$UPSTREAM_REMOTE" >/dev/null 2>&1; then
        printf '%s\n' "$UPSTREAM_REMOTE"
        return 0
    fi

    if git remote get-url origin >/dev/null 2>&1; then
        local origin_url
        origin_url="$(git remote get-url origin)"
        if printf '%s' "$origin_url" | grep -Eq 'vllm-project/vllm(\.git)?$'; then
            printf 'origin\n'
            return 0
        fi
    fi

    return 1
}

print_status() {
    local upstream_remote
    upstream_remote="$(resolve_upstream_remote || true)"

    local latest_stable
    latest_stable="$(git tag -l 'v[0-9]*.[0-9]*.[0-9]*' | grep -Ev 'rc|post|dev' | sort -V | tail -n 1)"

    echo
    echo "Contributor workspace status"
    echo "----------------------------"
    echo "Repo: ${REPO_ROOT}"
    echo "Branch: $(git branch --show-current)"
    echo
    echo "Remotes:"
    git remote -v
    echo

    if [ -n "$upstream_remote" ]; then
        if git rev-parse --verify --quiet "refs/remotes/$upstream_remote/$BASE_BRANCH" >/dev/null; then
            echo "Tracking base:"
            local local_ahead upstream_ahead
            read -r local_ahead upstream_ahead < <(git rev-list --left-right --count "${BASE_BRANCH}...$upstream_remote/$BASE_BRANCH")
            printf '  %s: upstream=%s\n' "$BASE_BRANCH" "$upstream_remote/$BASE_BRANCH"
            printf '  local commits not in upstream / upstream commits not in local: %s\n' "$local_ahead/$upstream_ahead"
        else
            echo "Remote branch not cached locally yet: $upstream_remote/$BASE_BRANCH (run: scripts/contributor-workspace.sh sync-main)"
        fi
    else
        echo "No upstream project remote found (expected remote name: upstream)."
        echo "Run: git remote add upstream https://github.com/vllm-project/vllm.git"
        echo "If your fork is in origin and upstream points to your fork, you can set UPSTREAM_REMOTE=origin."
    fi

    echo
    if [ -n "$latest_stable" ]; then
        echo "Latest stable tag (non-rc/post): $latest_stable"
        echo "  To create a fresh baseline branch:"
        echo "  git switch -c contrib/stable-base \"$latest_stable\""
    else
        echo "No stable version tag found by pattern: vN.N.N"
    fi
    echo
}

sync_main() {
    local current_branch
    current_branch="$(git symbolic-ref --short -q HEAD || true)"
    local upstream_remote
    upstream_remote="$(resolve_upstream_remote || true)"

    if [ -z "$upstream_remote" ]; then
        echo "Missing upstream remote. Add it first:"
        echo "git remote add $UPSTREAM_REMOTE https://github.com/vllm-project/vllm.git"
        exit 1
    fi

    if [ -n "$(git status --porcelain)" ]; then
        echo "Workspace has uncommitted changes. Commit/stash or reset before sync."
        git status --short
        exit 1
    fi

    if [ "$upstream_remote" != "$UPSTREAM_REMOTE" ]; then
        echo "Using upstream remote '$upstream_remote' (set UPSTREAM_REMOTE explicitly to override)."
        UPSTREAM_REMOTE="$upstream_remote"
    fi

    git fetch "$UPSTREAM_REMOTE" --prune --tags

    if git rev-parse --verify --quiet "refs/remotes/$UPSTREAM_REMOTE/$BASE_BRANCH" >/dev/null; then
        git switch "$BASE_BRANCH"
        git pull --ff-only "$UPSTREAM_REMOTE" "$BASE_BRANCH"
    else
        echo "Creating local ${BASE_BRANCH} from $UPSTREAM_REMOTE/${BASE_BRANCH}"
        git switch --track -c "$BASE_BRANCH" "$UPSTREAM_REMOTE/$BASE_BRANCH"
    fi

    if [ -n "$current_branch" ] && [ "$current_branch" != "$BASE_BRANCH" ]; then
        git switch "$current_branch"
        echo
        echo "Updated $BASE_BRANCH from $UPSTREAM_REMOTE/$BASE_BRANCH"
        echo "Returned to your working branch: $current_branch"
    else
        echo
        echo "Updated $BASE_BRANCH from $UPSTREAM_REMOTE/$BASE_BRANCH"
        echo "Working branch remains checked out: $BASE_BRANCH"
    fi
}

show_help() {
    cat <<'EOF'
Usage:
  scripts/contributor-workspace.sh status
  scripts/contributor-workspace.sh sync-main

Environment:
  UPSTREAM_REMOTE  Remote that points to https://github.com/vllm-project/vllm (default: upstream)
  BASE_BRANCH      Base branch to track and sync (default: main)
EOF
}

case "${1:-status}" in
    status)
        print_status
        ;;
    sync-main)
        sync_main
        ;;
    -h|--help|help)
        show_help
        ;;
    *)
        echo "Unknown mode: ${1:-}"
        show_help
        exit 1
        ;;
esac