
feat(responses): stateless multi-turn via encrypted_content state carrier (RFC #26934)#35740

Closed
will-deines wants to merge 6 commits intovllm-project:mainfrom
will-deines:feat/stateless-responses-encrypted-content

Conversation

@will-deines will-deines commented Mar 2, 2026

Related Issues, PRs, and RFCs

Directly Addressed by This PR

| # | Title | Status | How This PR Relates |
| --- | --- | --- | --- |
| #26934 | [RFC] Separating State & Providing Flexibility for serving ResponsesAPI | Open | Primary motivation — implements the stateless inference path requested by @qandrew (Meta) and proposed by @grs |
| #33089 | [Feature] Support multi-turn conversation for OpenAI Response API | Open | Direct fix — users with OpenCode/Codex CLI agents fail on turn 2 because `previous_response_id` requires `VLLM_ENABLE_RESPONSES_API_STORE=1`; this PR provides a store-free alternative |
| #34738 | Fix memory leak in Responses API store (LRU eviction) | Open | Superseded — stateless `encrypted_content` avoids server-side storage entirely, eliminating the memory leak's root cause rather than bounding it |

Related RFCs

| # | Title | Status | Design Decision |
| --- | --- | --- | --- |
| #32850 | [RFC] Clarify policy for Open Responses API extensions in vLLM | Open | We reuse the existing `encrypted_content` field from the OpenAI spec (no new response-side protocol extensions), aligning with the conservative extension policy |
| #33381 | [RFC] Align with the openresponses.org spec | Open | The response wire format adds zero new fields: `encrypted_content` is an existing OpenAI spec field. The only addition is the purely additive `previous_response` request field (see decision 5 below) |

Companion PR

| # | Title | Status | Relevance |
| --- | --- | --- | --- |
| #35874 | feat(responses): pluggable ResponseStore abstraction | Open | Depends on this PR — extracts the in-memory store into a pluggable ABC; together they serve RFC #26934's two personas (stateless for production, pluggable store for researchers/small-scale) |

Decisions We Made That Can Be Debated

1. HMAC signing vs. actual encryption of conversation history

What we chose: The state carrier uses HMAC-SHA256 for tamper detection. The conversation history is base64-encoded but not encrypted — anyone who intercepts the encrypted_content field can decode and read it.

Alternative: Use authenticated encryption (AES-GCM or similar) so the history is both tamper-proof and confidential. The field is called encrypted_content after all.

Why we chose this: The field name comes from the OpenAI spec, not from us — we use it as an opaque signed blob consistent with the spec's intent. Real encryption adds key management complexity (key rotation, IV generation, padding) that isn't justified for the initial implementation. The content travels over TLS between client and server, so in-transit confidentiality is already handled. The RFC (#26934) scoped encryption as out-of-scope for the initial implementation.

What reviewers might disagree with: Users may assume encrypted_content means encrypted. If the response is logged, cached, or stored client-side, the conversation history is readable. A follow-up could add optional encryption behind a flag.
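To make this caveat concrete: the payload segment of a carrier is plain base64, readable by anyone holding the string. A stdlib-only sketch (the carrier value below is fabricated for illustration; the trailing HMAC tag is a dummy, since reading requires no verification):

```python
import base64
import json

# Fabricated carrier in the PR's vllm:1:<base64(json(messages))>:<hmac> shape.
history = [{"role": "user", "content": "My name is Alice"}]
carrier = ("vllm:1:"
           + base64.b64encode(json.dumps(history).encode()).decode()
           + ":" + "0" * 64)

# No key is needed to *read* the history; the HMAC only detects tampering:
payload = carrier.split(":")[2]
print(json.loads(base64.b64decode(payload)))
# → [{'role': 'user', 'content': 'My name is Alice'}]
```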

2. Single carrier (full Harmony history) vs. per-item state

What we chose: One synthetic ReasoningItem carries the full Harmony message history as a single signed blob. The carrier is appended to the response output and filtered out in utils.py before messages reach the LLM.

Alternative: Attach state to each reasoning item individually — each ReasoningItem carries its own context, and the history is reconstructed by collecting all items.

Why we chose this: Tool-call metadata (raised by @alecsolder in #26934) — tool call results and assistant metadata aren't expressible in per-item form but are captured in the Harmony message list. A single carrier is also simpler to implement, verify, and debug. The full message list is what the model actually needs on the next turn.

What reviewers might disagree with: The carrier grows linearly with conversation length. For very long conversations, this could become large. A per-item approach or delta-based encoding could be more efficient, but adds significant complexity.

3. Reuse encrypted_content field vs. a new vLLM-specific field

What we chose: We reuse the existing encrypted_content field on ResponseReasoningItem from the OpenAI spec, and the existing previous_response field on the request. Zero new wire protocol fields.

Alternative: Add a vLLM-specific field (e.g., vllm_state_carrier) or use the OpenResponses extension mechanism proposed in #33381.

Why we chose this: RFCs #32850 and #33381 both emphasize conservative extensions and alignment with existing specs. encrypted_content is already defined as an opaque blob for platform use — our usage is consistent with that intent. Clients using the standard OpenAI SDK can use this feature without any SDK modifications. @DanielMe's proposal in #32850 supports allowing extensions "when there is a documented need" following existing patterns.

4. Per-process random key (default) vs. requiring explicit key configuration

What we chose: When VLLM_RESPONSES_STATE_SIGNING_KEY is not set, a per-process random key is generated with a warning. This means stateless multi-turn works out of the box for single-node dev, but carriers are invalidated on restart and incompatible across nodes.

Alternative: Require the env var and fail hard if it's not set — forcing users to think about key management upfront.

Why we chose this: The zero-config default matches vLLM's general philosophy — things should work out of the box for the common single-node case. The warning makes the limitation visible. Production deployments with multi-node or restart requirements will naturally need to set the key, and the error messages guide them there.

What reviewers might disagree with: Silent key generation could lead to hard-to-debug issues in production (e.g., a rolling restart invalidates all in-flight carriers). A louder failure mode might be safer.

5. New previous_response field (full object) vs. extending previous_response_id for stateless

What we chose: A new vLLM-specific previous_response field that accepts the full ResponsesResponse object, with a model_validator enforcing mutual exclusion with previous_response_id. This is a protocol extension — the OpenAI spec only defines previous_response_id (a string for server-side lookup).

Alternative: Overload previous_response_id with a special sentinel or encoded value, or embed the carrier in a separate field alongside previous_response_id.

Why we chose this: previous_response_id implies a server-side lookup by design — overloading it for stateless use would be confusing. A separate field makes the two paths self-documenting: previous_response_id means "look it up in the store," previous_response means "here's the full response, no store needed." The mutual exclusion validator makes it impossible to mix the two.

What reviewers might disagree with: This is a vLLM-specific extension to the wire protocol, which #32850 and #33381 argue should be conservative. However, the field is purely additive (existing clients are unaffected) and the alternative — overloading previous_response_id — would be more confusing. If OpenResponses or the OpenAI spec adds a similar field in the future, we can align with it.

6. Stateless path as additive (keep the store) vs. replacing it

What we chose: The stateless path is purely additive. The existing previous_response_id + VLLM_ENABLE_RESPONSES_API_STORE=1 path is completely unchanged. Both paths coexist.

Alternative: Remove the in-memory store entirely and force all multi-turn through the stateless path.

Why we chose this: RFC #26934 identifies two user personas — production deployments that want stateless operation, and researchers/small-scale users who want an all-in-one server with state. Removing the store would break the second persona. The companion PR (#35874) makes the store pluggable for production use cases that need server-side state (e.g., background mode, retrieve_responses).

What reviewers might disagree with: Keeping both paths means more code to maintain and test. If the stateless path proves sufficient for most use cases, the store path could be deprecated in a future PR.


Summary

Implements stateless multi-turn Responses API conversations without server-side storage, using the existing encrypted_content field on ResponseReasoningItem as the state carrier. Proposed by @grs in #26934.

The three in-process dicts (response_store, msg_store, event_store) marked # HACK / # FIXME in serving.py are disabled by default because they leak memory, lose state on restart, and are incompatible with multi-node deployments. This PR provides a production-ready alternative for the multi-turn use case.


Design

```
Turn 1:  client → vLLM   { store: false, include: ["reasoning.encrypted_content"], input: "..." }
         vLLM   → client { output: [...real items..., ReasoningItem(encrypted_content="vllm:1:<b64>:<hmac>")] }

Turn 2:  client → vLLM   { store: false, previous_response: <full Turn 1 response>, input: "..." }
         vLLM extracts encrypted_content → verifies HMAC → deserialises Harmony history → no store touched
```

Wire format: `vllm:1:<base64(json(messages))>:<hmac-sha256-hex>`

- Content is signed, not encrypted (HMAC-SHA256). The field name comes from the OpenAI spec; vLLM uses it as an opaque signed blob for tamper detection, consistent with the spec's intent.
- `VLLM_RESPONSES_STATE_SIGNING_KEY` (64-char hex) enables multi-node / restart-safe operation. Without it, a per-process random key is generated with a warning.
- The state carrier `ReasoningItem` is filtered out in `utils.py` before messages reach the LLM — invisible to the model.
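The wire format round-trips with just the standard library. This is an illustrative sketch: the function names mirror the PR's `serialize_state` / `deserialize_state`, but the bodies are assumptions, not the actual implementation:

```python
import base64
import hashlib
import hmac
import json

def serialize_state(messages: list[dict], key: bytes) -> str:
    """Pack the Harmony history into vllm:1:<base64(json)>:<hmac-sha256-hex>."""
    payload = base64.b64encode(json.dumps(messages).encode()).decode()
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"vllm:1:{payload}:{tag}"

def deserialize_state(carrier: str, key: bytes) -> list[dict]:
    """Verify the HMAC and recover the message list; raise on tampering."""
    scheme, version, payload, tag = carrier.split(":")  # base64 never contains ':'
    if scheme != "vllm" or version != "1":
        raise ValueError("unrecognized state carrier format")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison, as in the PR, to resist timing attacks.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("state carrier failed HMAC verification")
    return json.loads(base64.b64decode(payload))

key = b"k" * 32
history = [{"role": "user", "content": "My name is Alice"}]
assert deserialize_state(serialize_state(history, key), key) == history
```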

Files Changed

| File | Change |
| --- | --- |
| `vllm/entrypoints/openai/responses/state.py` | NEW — `serialize_state` / `deserialize_state` / `is_state_carrier` / HMAC helpers |
| `vllm/entrypoints/openai/responses/protocol.py` | Add `previous_response: ResponsesResponse \| None` to `ResponsesRequest`; mutual-exclusion `model_validator` with `previous_response_id`; reject `background=True` with `previous_response`; `model_rebuild()` for forward ref |
| `vllm/entrypoints/openai/responses/serving.py` | Stateless prev-response resolution in `create_responses`; thread `prev_messages` through `_make_request` / `_make_request_with_harmony` / `_construct_input_messages_with_harmony`; inject state carrier in `responses_full_generator`; 400 guard when carrier missing from `previous_response`; 501 on `retrieve_responses` when store disabled; 404/400 on `cancel_responses` when store disabled |
| `vllm/entrypoints/openai/responses/utils.py` | `_construct_single_message_from_response_item` returns `None` for state-carrier items; filter `None` in `construct_chat_messages_with_tool_call` |
| `vllm/envs.py` | Register `VLLM_RESPONSES_STATE_SIGNING_KEY` |
| `tests/entrypoints/openai/responses/test_state.py` | NEW — 16 unit tests (round-trip, tamper detection, cross-key incompatibility, invalid hex, random key caching) |
| `tests/entrypoints/openai/responses/test_serving_stateless.py` | NEW — 15 unit tests (protocol validation, state carrier helpers, all error paths, utils skipping, cancel success path) |
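The `utils.py` change follows a return-None-then-filter pattern so the synthetic carrier never reaches the model. A simplified illustration (the item shapes and the `rs_state_` id prefix follow the example output later in this description; this is not the PR's code):

```python
def is_state_carrier(item: dict) -> bool:
    """Heuristic from the example output: carriers are reasoning items with an rs_state_ id."""
    return (item.get("type") == "reasoning"
            and str(item.get("id", "")).startswith("rs_state_"))

def construct_chat_messages(output_items: list[dict]) -> list[dict]:
    """Convert response output items to chat messages, dropping carrier items."""
    messages = []
    for item in output_items:
        if is_state_carrier(item):
            continue  # synthetic state carrier: invisible to the model
        messages.append({"role": "assistant", "content": item.get("content", "")})
    return messages

out = [
    {"type": "message", "id": "msg_1", "content": "Hello Alice"},
    {"type": "reasoning", "id": "rs_state_abc", "content": "vllm:1:..."},
]
assert construct_chat_messages(out) == [{"role": "assistant", "content": "Hello Alice"}]
```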

Test Results

```
tests/entrypoints/openai/responses/test_state.py             16/16 passed
tests/entrypoints/openai/responses/test_serving_stateless.py 15/15 passed
```

All pure-Python unit tests — no GPU required.


Usage

Turn 1:

```python
resp1 = client.responses.create(
    model="...",
    input="My name is Alice",
    store=False,
    include=["reasoning.encrypted_content"],
)
# resp1.output[-1] == ReasoningItem(encrypted_content="vllm:1:...", id="rs_state_...")
```

Turn 2:

```python
resp2 = client.responses.create(
    model="...",
    input="What is my name?",
    store=False,
    previous_response=resp1,   # full response object, not previous_response_id
    include=["reasoning.encrypted_content"],
)
# → "Your name is Alice."
```

Multi-node / restart-safe:

```shell
export VLLM_RESPONSES_STATE_SIGNING_KEY="$(openssl rand -hex 32)"
# same value across all vLLM nodes
```

Backward Compatibility

No breaking changes. The existing previous_response_id + VLLM_ENABLE_RESPONSES_API_STORE=1 path is completely unchanged. The new path requires explicit opt-in (store=false + include=["reasoning.encrypted_content"] + previous_response on turn 2).

previous_response_id with store disabled now returns a helpful 400 pointing users to the stateless path.


Test plan

  • Unit tests pass: pytest tests/entrypoints/openai/responses/test_state.py tests/entrypoints/openai/responses/test_serving_stateless.py -v
  • Pre-commit passes on all changed files
  • E2e with GPT-OSS model: verify stateless multi-turn produces correct conversation continuity
  • E2e with multi-node: verify VLLM_RESPONSES_STATE_SIGNING_KEY enables cross-node carrier compatibility

cc @qandrew @WoosukKwon @njhill @DanielMe @chaunceyjiang

garrio-1 and others added 3 commits March 2, 2026 07:00
Implements the @grs proposal for stateless multi-turn Responses API
conversations without server-side storage, using the standard OpenAI
`encrypted_content` field on a synthetic `ResponseReasoningItem` as the
state carrier.

**How it works:**
1. Client sets `store=false` + `include=["reasoning.encrypted_content"]`
2. vLLM serialises the Harmony message history into a signed blob
   (`vllm:1:<base64(json)>:<hmac-sha256>`) and appends it as a synthetic
   `ReasoningItem` to the response output
3. On the next turn the client passes `previous_response` (full response
   object) instead of `previous_response_id`
4. vLLM extracts, verifies, and deserialises the history from the carrier
   item — no in-memory store touched

**No breaking changes.** Existing `previous_response_id` + store-enabled
path is unchanged. New path requires explicit opt-in.

**Multi-node safe:** set `VLLM_RESPONSES_STATE_SIGNING_KEY` to the same
64-char hex value on all nodes so tokens validate across replicas.

Files changed:
- `vllm/entrypoints/openai/responses/state.py` (new) — serialise /
  deserialise / HMAC-verify state carriers
- `vllm/entrypoints/openai/responses/protocol.py` — add
  `previous_response` field + mutual-exclusion validator on
  `ResponsesRequest`; `model_rebuild()` for forward ref
- `vllm/entrypoints/openai/responses/serving.py` — stateless prev-response
  resolution; thread `prev_messages` through `_make_request*`; inject
  state carrier in `responses_full_generator`; 501 guards on
  `retrieve_responses` / `cancel_responses` when store disabled
- `vllm/entrypoints/openai/responses/utils.py` — skip state-carrier
  `ReasoningItem`s when reconstructing chat messages
- `vllm/envs.py` — register `VLLM_RESPONSES_STATE_SIGNING_KEY`
- `tests/entrypoints/openai/responses/test_state.py` (new) — 16 unit tests
- `tests/entrypoints/openai/responses/test_serving_stateless.py` (new) —
  14 unit tests

Closes #26934 (partial — non-streaming only; streaming carrier TBD)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…400 on success, info log

Per code review feedback:
- Return 404 (not 501) when response_id has no matching background task,
  consistent with the stateful path's _make_not_found_error behavior
- Return 400 BAD_REQUEST (not 501 NOT_IMPLEMENTED) when a task is found
  and cancelled — cancellation succeeded, but no stored response object
  can be returned; 501 was misleading
- Use logger.info instead of logger.exception for asyncio.CancelledError,
  since cancellation is the expected outcome of this call path

Update test to assert 404 for the unknown-id case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arrier guard, background invariant)

Fixes three issues found in review:

1. cancel_responses stateless mode (gemini-code-assist P1):
   - Return 404 (not 501) for unknown response_id — consistent with stateful path
   - Return 400 BAD_REQUEST (not 501) on successful cancellation — task was
     cancelled but no stored response is available; 501 was misleading
   - Use logger.info (not logger.exception) for expected CancelledError

2. Missing carrier guard in create_responses (codex P1):
   - When previous_response has no state carrier and store is disabled,
     return 400 with a clear message instead of falling through to
     msg_store[id] KeyError → 500

3. background/store invariant in protocol validator (codex P2):
   - Reject background=True + previous_response at validation time rather
     than silently producing an unretrievable background response

Tests:
   - Add test_cancel_without_store_active_task_returns_400: covers the
     success branch of the cancel fix; uses await asyncio.sleep(0) to
     start the task before cancelling (Python 3.12: unstarted tasks
     cancelled before first await never run their body)
   - Add test_previous_response_without_carrier_returns_400: regression
     for the KeyError → 500 bug
   - Add test_background_with_previous_response_raises: regression for
     the background/store invariant
   - Remove test_no_previous_response_preserves_store_true: passed
     regardless of our code (no new path exercised)
   - Remove test_full_stateless_roundtrip: duplicate of
     test_build_and_extract_roundtrip
   - Rename test_prev_messages_used_over_empty_msg_store →
     test_construct_input_messages_prepends_prev_msg (accurate name)

All 31 tests pass.
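The cancellation subtlety noted in that commit message can be demonstrated in a few lines: a task cancelled before its first chance to run never executes its body, so a test must yield once (`await asyncio.sleep(0)`) before cancelling. A standalone sketch, not the PR's test code:

```python
import asyncio

async def main() -> tuple[bool, bool]:
    started: list[bool] = []

    async def work() -> None:
        started.append(True)  # records that the body actually ran
        await asyncio.sleep(10)

    # Cancel before the event loop ever schedules the task: body never runs.
    t1 = asyncio.create_task(work())
    t1.cancel()
    try:
        await t1
    except asyncio.CancelledError:
        pass
    ran_unscheduled = bool(started)

    started.clear()
    # Yield control once so the task starts, then cancel mid-await.
    t2 = asyncio.create_task(work())
    await asyncio.sleep(0)
    t2.cancel()
    try:
        await t2
    except asyncio.CancelledError:
        pass
    ran_after_yield = bool(started)
    return ran_unscheduled, ran_after_yield

print(asyncio.run(main()))  # → (False, True)
```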
github-actions bot commented Mar 2, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the frontend label Mar 2, 2026
@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request introduces a significant and well-designed feature for stateless multi-turn conversations in the Responses API, using a signed state carrier in the encrypted_content field. The implementation is thorough, with comprehensive new tests and careful consideration for security aspects like using hmac.compare_digest to prevent timing attacks. The changes are well-structured across new and existing modules.

I found one critical issue in the implementation for the non-Harmony code path, where the state carrier was being generated without including the assistant's response from the current turn. This would cause the conversation history to be incomplete for subsequent turns. I've provided a detailed comment and a suggested fix for this.

@mergify
Copy link
Copy Markdown

mergify bot commented Mar 2, 2026

Hi @will-deines, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
```shell
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint
```

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d1be94bcb


…ey validation

Gemini critical — non-Harmony state carrier missing the assistant turn:
  carrier_messages was only the input messages, omitting the assistant
  response just generated. The next turn would see history without the
  last assistant message. Fix: append
  construct_chat_messages_with_tool_call(response.output) so the
  carrier contains the full turn (input + response).

Codex P1 — carrierless previous_response check gated on enable_store:
  The guard 'if prev_messages_from_state is None and not self.enable_store'
  was too narrow. previous_response always means stateless path; a
  server restart with enable_store=True and empty msg_store would still
  KeyError. Fix: drop the 'not self.enable_store' condition.

Codex P2 — any-length hex key accepted as signing key:
  bytes.fromhex('aa') produces 1 byte — a weak HMAC key. Fix: enforce
  len(key_bytes) >= 32 (64 hex chars) and raise ValueError if too short.

Tests:
  - test_previous_response_without_carrier_store_enabled_returns_400:
    regression for P1 (store=True path also returns 400, not KeyError)
  - test_short_key_raises: regression for P2 (4-byte key raises)

Run pre-commit --all-files; apply linter reformatting.

Signed-off-by: Will Deines <will@garr.io>

```python
# history from the encrypted_content state carrier embedded in the response
# output, so no server-side store is required.
# Cannot be set together with previous_response_id.
previous_response: "ResponsesResponse | None" = None
```
Collaborator:
Is this field one of the standard fields in the Open Response API?

Author:

No, previous_response is a vLLM extension — the standard OpenAI API only has previous_response_id (a string ID that requires server-side storage).

This extension is needed because previous_response_id depends on vLLM's in-process store (the response_store / msg_store / event_store dicts marked # HACK / # FIXME), which leaks memory (#34738), loses state on restart, and is incompatible with production multi-node deployments (RFC #26934, raised by @qandrew at Meta).

The approach follows @grs's proposal in #26934: the client sends back the full response object; vLLM extracts the signed state from the existing encrypted_content field on a ReasoningItem — so the wire-format response itself uses zero new fields.

This is consistent with vLLM's existing extension policy (RFC #32850) — the Responses API already ships non-standard request fields (top_k, request_id, priority, etc.), and the Open Responses spec explicitly allows implementation extensions.

Mutual exclusion with previous_response_id is enforced via a Pydantic model_validator.

@will-deines will-deines closed this Mar 3, 2026
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 3, 2026
Extract the three in-memory dicts (response_store, msg_store,
response_store_lock) from OpenAIServingResponses into a pluggable
ResponseStore ABC with an InMemoryResponseStore default.

Users can point VLLM_RESPONSES_STORE_BACKEND to a fully-qualified
class name to swap in their own backend (Redis, Postgres, etc.)
without patching vLLM.

- Add ResponseStore ABC with 5 abstract methods + close() hook
- Add InMemoryResponseStore wrapping current dict behavior with
  internal asyncio.Lock (removes external response_store_lock)
- Add create_response_store() factory reading env var
- Refactor ~15 call sites in serving.py to use self.store.*
- Add VLLM_RESPONSES_STORE_BACKEND env var to envs.py
- Update test helper to use InMemoryResponseStore
- Add unit + integration tests for store and serving interactions

Follows up on vllm-project#35740 (stateless multi-turn). Addresses RFC vllm-project#26934
(pluggable state backends) and supersedes vllm-project#34738 (LRU eviction).
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 4, 2026
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 4, 2026
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 12, 2026
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 18, 2026