feat(responses): stateless multi-turn via encrypted_content state carrier (RFC #26934) #35903
will-deines wants to merge 13 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a well-designed stateless multi-turn conversation mechanism for the Responses API. By using an HMAC-signed state carrier in the encrypted_content field, it effectively addresses the memory leak and multi-node deployment issues of the previous in-memory store. The implementation is robust, with new logic for state serialization, deserialization, and validation. The changes are well-tested with a comprehensive suite of new unit tests covering protocol validation, error paths, and state carrier logic. I have one suggestion to improve the robustness of message store access.
        for m in prev_messages
    ]
else:
    prev_msgs = self.msg_store[prev_response.id]
Direct dictionary access self.msg_store[prev_response.id] can raise a KeyError if the ID is not found, leading to an unhandled exception and a 500 Internal Server Error. This could happen if response_store and msg_store become inconsistent.
For improved robustness and to align with the non-Harmony path in _make_request which uses .get(), consider changing this to self.msg_store.get(prev_response.id) and then handling the potential None value for prev_msgs in the subsequent logic to prevent a TypeError.
Good catch — switched to .get() with an explicit ValueError when the ID is missing. This gives a clear error message instead of an unhandled KeyError. Fixed in 0d68f9d.
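In sketch form, the guarded lookup the reply describes might look like this (names mirror the diff excerpt above; this is an illustration, not the committed code):

```python
class ServingResponsesSketch:
    """Minimal stand-in for the serving class; only msg_store matters here."""

    def __init__(self) -> None:
        self.msg_store: dict[str, list[dict]] = {}

    def lookup_prev_messages(self, prev_response_id: str) -> list[dict]:
        # .get() instead of [] — a missing ID becomes an explicit, catchable
        # ValueError rather than an unhandled KeyError (-> 500).
        prev_msgs = self.msg_store.get(prev_response_id)
        if prev_msgs is None:
            raise ValueError(
                f"Previous response {prev_response_id} not found in message "
                "store; response_store and msg_store may be inconsistent."
            )
        return prev_msgs
```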
Force-pushed from bddeedd to 589f29a
This pull request has merge conflicts that must be resolved before it can be merged.
…-project#26934)

Implements the @grs proposal for stateless multi-turn Responses API conversations without server-side storage, using the standard OpenAI `encrypted_content` field on a synthetic `ResponseReasoningItem` as the state carrier.

**How it works:**
1. Client sets `store=false` + `include=["reasoning.encrypted_content"]`
2. vLLM serialises the Harmony message history into a signed blob (`vllm:1:<base64(json)>:<hmac-sha256>`) and appends it as a synthetic `ReasoningItem` to the response output
3. On the next turn the client passes `previous_response` (full response object) instead of `previous_response_id`
4. vLLM extracts, verifies, and deserialises the history from the carrier item — no in-memory store touched

**No breaking changes.** Existing `previous_response_id` + store-enabled path is unchanged. New path requires explicit opt-in.

**Multi-node safe:** set `VLLM_RESPONSES_STATE_SIGNING_KEY` to the same 64-char hex value on all nodes so tokens validate across replicas.

Files changed:
- `vllm/entrypoints/openai/responses/state.py` (new) — serialise / deserialise / HMAC-verify state carriers
- `vllm/entrypoints/openai/responses/protocol.py` — add `previous_response` field + mutual-exclusion validator on `ResponsesRequest`; `model_rebuild()` for forward ref
- `vllm/entrypoints/openai/responses/serving.py` — stateless prev-response resolution; thread `prev_messages` through `_make_request*`; inject state carrier in `responses_full_generator`; 501 guards on `retrieve_responses` / `cancel_responses` when store disabled
- `vllm/entrypoints/openai/responses/utils.py` — skip state-carrier `ReasoningItem`s when reconstructing chat messages
- `vllm/envs.py` — register `VLLM_RESPONSES_STATE_SIGNING_KEY`
- `tests/entrypoints/openai/responses/test_state.py` (new) — 16 unit tests
- `tests/entrypoints/openai/responses/test_serving_stateless.py` (new) — 14 unit tests

Closes vllm-project#26934 (partial — non-streaming only; streaming carrier TBD)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Will Deines <will@garr.io>
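The signed-blob format in step 2 can be sketched with the standard library alone. This is a minimal illustration of the `vllm:1:<base64(json)>:<hmac-sha256>` layout; `pack_state` / `unpack_state` are illustrative names, not the PR's actual helpers:

```python
import base64
import hashlib
import hmac
import json

PREFIX, VERSION = "vllm", "1"

def pack_state(messages: list[dict], key: bytes) -> str:
    """Serialise messages into 'vllm:1:<base64(json)>:<hmac-sha256-hex>'."""
    payload = base64.b64encode(json.dumps(messages).encode()).decode()
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{PREFIX}:{VERSION}:{payload}:{sig}"

def unpack_state(carrier: str, key: bytes) -> list[dict]:
    """Verify the HMAC and recover the message list; raise on any mismatch."""
    # Base64 and hex alphabets contain no ':', so a plain split is safe.
    prefix, version, payload, sig = carrier.split(":")
    if (prefix, version) != (PREFIX, VERSION):
        raise ValueError("not a vLLM state carrier")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        raise ValueError("HMAC mismatch: tampered blob or wrong signing key")
    return json.loads(base64.b64decode(payload))
```

A carrier signed on one replica validates on another only if both derive the same key, which is why the env var must be shared across nodes.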
…400 on success, info log

Per code review feedback:
- Return 404 (not 501) when response_id has no matching background task, consistent with the stateful path's _make_not_found_error behavior
- Return 400 BAD_REQUEST (not 501 NOT_IMPLEMENTED) when a task is found and cancelled — cancellation succeeded, but no stored response object can be returned; 501 was misleading
- Use logger.info instead of logger.exception for asyncio.CancelledError, since cancellation is the expected outcome of this call path

Update test to assert 404 for the unknown-id case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Will Deines <will@garr.io>
…arrier guard, background invariant)
Fixes three issues found in review:
1. cancel_responses stateless mode (gemini-code-assist P1):
- Return 404 (not 501) for unknown response_id — consistent with stateful path
- Return 400 BAD_REQUEST (not 501) on successful cancellation — task was
cancelled but no stored response is available; 501 was misleading
- Use logger.info (not logger.exception) for expected CancelledError
2. Missing carrier guard in create_responses (codex P1):
- When previous_response has no state carrier and store is disabled,
return 400 with a clear message instead of falling through to
msg_store[id] KeyError → 500
3. background/store invariant in protocol validator (codex P2):
- Reject background=True + previous_response at validation time rather
than silently producing an unretrievable background response
Tests:
- Add test_cancel_without_store_active_task_returns_400: covers the
success branch of the cancel fix; uses await asyncio.sleep(0) to
start the task before cancelling (Python 3.12: unstarted tasks
cancelled before first await never run their body)
- Add test_previous_response_without_carrier_returns_400: regression
for the KeyError → 500 bug
- Add test_background_with_previous_response_raises: regression for
the background/store invariant
- Remove test_no_previous_response_preserves_store_true: passed
regardless of our code (no new path exercised)
- Remove test_full_stateless_roundtrip: duplicate of
test_build_and_extract_roundtrip
- Rename test_prev_messages_used_over_empty_msg_store →
test_construct_input_messages_prepends_prev_msg (accurate name)
All 31 tests pass.
Signed-off-by: Will Deines <will@garr.io>
…ey validation
Gemini critical — non-Harmony state carrier missing the assistant turn:
carrier_messages was only the input messages, omitting the assistant
response just generated. The next turn would see history without the
last assistant message. Fix: append
construct_chat_messages_with_tool_call(response.output) so the
carrier contains the full turn (input + response).
Codex P1 — carrierless previous_response check gated on enable_store:
The guard 'if prev_messages_from_state is None and not self.enable_store'
was too narrow. previous_response always means stateless path; a
server restart with enable_store=True and empty msg_store would still
KeyError. Fix: drop the 'not self.enable_store' condition.
Codex P2 — any-length hex key accepted as signing key:
bytes.fromhex('aa') produces 1 byte — a weak HMAC key. Fix: enforce
len(key_bytes) >= 32 (64 hex chars) and raise ValueError if too short.
Tests:
- test_previous_response_without_carrier_store_enabled_returns_400:
regression for P1 (store=True path also returns 400, not KeyError)
- test_short_key_raises: regression for P2 (4-byte key raises)
Run pre-commit --all-files; apply linter reformatting.
Signed-off-by: Will Deines <will@garr.io>
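The key-length check from the Codex P2 fix above can be sketched as follows (the env-var name is from this PR; `load_signing_key` and the random fallback behaviour are illustrative, not the committed code):

```python
import os

MIN_KEY_BYTES = 32  # HMAC-SHA256 wants >= 32 bytes, i.e. >= 64 hex chars

def load_signing_key(env_var: str = "VLLM_RESPONSES_STATE_SIGNING_KEY") -> bytes:
    raw = os.environ.get(env_var)
    if raw is None:
        # Per-process fallback: works single-node, breaks across restarts
        # and replicas (the PR logs a warning in this case).
        return os.urandom(MIN_KEY_BYTES)
    key_bytes = bytes.fromhex(raw)  # raises ValueError on non-hex input
    if len(key_bytes) < MIN_KEY_BYTES:
        # bytes.fromhex('aa') is a 1-byte key — far too weak for HMAC.
        raise ValueError(
            f"{env_var} must be at least {2 * MIN_KEY_BYTES} hex chars "
            f"({MIN_KEY_BYTES} bytes); got {len(key_bytes)} bytes"
        )
    return key_bytes
```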
Signed-off-by: Will Deines <will@garr.io>
…t state Signed-off-by: Will Deines <will@garr.io>
Signed-off-by: Will Deines <will@garr.io>
Force-pushed from 589f29a to f5f030f
The non-streaming path caught GenerationError and called self._convert_generation_error_to_response() which doesn't exist on OpenAIServingResponses or its base class. The base class only has _convert_generation_error_to_streaming_response() for the streaming path. Non-streaming GenerationErrors are handled by the global FastAPI exception handler registered in api_server.py, so the explicit catch is unnecessary. Signed-off-by: Will Deines <will@garr.io>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a8a5379112
carrier_messages = messages + construct_chat_messages_with_tool_call(
    response.output
Stop duplicating the last assistant turn in stateless prompts
When store=false + previous_response is used on non-Harmony models, this branch serializes messages and response.output into the carrier. On the next turn _make_request() passes that deserialized list as prev_msg, but construct_input_messages() still appends prev_response.output again, so the previous assistant/tool output is injected twice. Every stateless follow-up on the simple/parsable path will therefore send a different conversation than the original one, which can change model behavior and break tool-call continuation.
Good catch — fixed in 7c35d29. _make_request() now passes prev_response_output=None when prev_messages is not None (i.e. the stateless path), since the carrier already contains the full history including the assistant turn. Added regression tests in TestNoDuplicateAssistantTurn confirming no duplication.
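The branching the reply describes can be sketched as an illustrative helper (the real logic lives inside `_make_request`):

```python
def resolve_prev_context(prev_messages, prev_response_output):
    """Pick the history source for the next turn without duplicating output.

    Stateless path: prev_messages (deserialised from the carrier) already
    contains the last assistant turn, so the stored output is suppressed.
    Stateful path: history starts empty and prev_response_output is
    appended later by construct_input_messages.
    """
    if prev_messages is not None:
        return prev_messages, None  # carrier wins; drop prev_response_output
    return [], prev_response_output

# Stateless: the carrier history is used and the output is not re-injected.
msgs, out = resolve_prev_context([{"role": "assistant", "content": "hi"}], ["item"])
assert out is None
```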
if prev_response is not None and request.previous_response is not None:
    prev_messages_from_state = self._extract_state_from_response(prev_response)
Handle invalid state carriers as request errors
_extract_state_from_response() raises ValueError for HMAC mismatches or malformed carriers, but this call now happens before the preprocessing try/except. If a client retries after a restart, lands on a different replica, or sends a tampered blob, create_responses() will propagate an uncaught exception to FastAPI and return a 500 instead of the intended 4xx validation error. Because cross-instance mismatches are expected unless VLLM_RESPONSES_STATE_SIGNING_KEY is shared, this will surface as a production-facing failure mode for normal stateless deployments.
Good catch — fixed in 7c35d29. Wrapped _extract_state_from_response() in try/except ValueError returning a 400 with a message about tampering and VLLM_RESPONSES_STATE_SIGNING_KEY consistency. Added regression test in TestTamperedCarrierReturns400.
if len(item.summary) == 1:
    reasoning_content = item.summary[0].text
elif item.content and len(item.content) == 1:
Preserve reasoning content when summary is also present
This changes the existing precedence so a reasoning item with both content and a one-entry summary now uses the summary text instead of the actual content. That is a regression for providers/models that populate both fields: downstream chat reconstruction loses the full reasoning text, even though the repo’s existing tests document that content should win when both are present.
Good catch — fixed in 7c35d29. Restored content-first precedence: item.content is checked before item.summary, with summary used only as a fallback (with warning). The existing TestReasoningItemContentPriority tests now pass again.
- Fix duplicate assistant turn: pass prev_response_output=None when using stateless path (prev_messages already contains the output)
- Handle ValueError from _extract_state_from_response: wrap in try/except returning 400 on HMAC mismatch instead of unhandled 500
- Fix content/summary precedence: check item.content before item.summary in reasoning items, restoring intended content-first behavior

Signed-off-by: Will Deines <will@garr.io>
Force-pushed from b56e429 to 87b43c6
Signed-off-by: Will Deines <will@garr.io>
The stateless encrypted_content carrier was using Pydantic's model_dump()/model_validate() to serialize/deserialize Harmony messages, but this produces dicts incompatible with the library's typed constructors. Switch to to_dict()/from_dict() so messages roundtrip correctly and remain renderable for completion. Signed-off-by: Will Deines <will@garr.io>
Related Issues, PRs, and RFCs
Directly Addressed by This PR

- `previous_response_id` requires `VLLM_ENABLE_RESPONSES_API_STORE=1`; this PR provides a store-free alternative
- `encrypted_content` avoids server-side storage entirely, eliminating the memory leak root cause rather than bounding it

Related RFCs

- `encrypted_content` field from the OpenAI spec (no new protocol extensions), aligning with the conservative extension policy
- `encrypted_content` and `previous_response` are both existing OpenAI spec fields

Companion PR
Decisions We Made That Can Be Debated
1. HMAC signing vs. actual encryption of conversation history

What we chose: The state carrier uses HMAC-SHA256 for tamper detection. The conversation history is base64-encoded but not encrypted — anyone who intercepts the `encrypted_content` field can decode and read it.

Alternative: Use authenticated encryption (AES-GCM or similar) so the history is both tamper-proof and confidential. The field is called `encrypted_content`, after all.

Why we chose this: The field name comes from the OpenAI spec, not from us — we use it as an opaque signed blob consistent with the spec's intent. Real encryption adds key management complexity (key rotation, IV generation, padding) that isn't justified for the initial implementation. The content travels over TLS between client and server, so in-transit confidentiality is already handled. The RFC (#26934) scoped encryption as out of scope for the initial implementation.

What reviewers might disagree with: Users may assume `encrypted_content` means encrypted. If the response is logged, cached, or stored client-side, the conversation history is readable. A follow-up could add optional encryption behind a flag.

2. Single carrier (full Harmony history) vs. per-item state
What we chose: One synthetic `ReasoningItem` carries the full Harmony message history as a single signed blob. The carrier is appended to the response output and filtered out in `utils.py` before messages reach the LLM.

Alternative: Attach state to each reasoning item individually — each `ReasoningItem` carries its own context, and the history is reconstructed by collecting all items.

Why we chose this: Tool-call metadata (raised by @alecsolder in #26934) — tool call results and assistant metadata aren't expressible in per-item form but are captured in the Harmony message list. A single carrier is also simpler to implement, verify, and debug. The full message list is what the model actually needs on the next turn.

What reviewers might disagree with: The carrier grows linearly with conversation length. For very long conversations, this could become large. A per-item approach or delta-based encoding could be more efficient, but adds significant complexity.

3. Reuse `encrypted_content` field vs. a new vLLM-specific field

What we chose: We reuse the existing `encrypted_content` field on `ResponseReasoningItem` from the OpenAI spec, and the existing `previous_response` field on the request. Zero new wire protocol fields.

Alternative: Add a vLLM-specific field (e.g., `vllm_state_carrier`) or use the OpenResponses extension mechanism proposed in #33381.

Why we chose this: RFCs #32850 and #33381 both emphasize conservative extensions and alignment with existing specs. `encrypted_content` is already defined as an opaque blob for platform use — our usage is consistent with that intent. Clients using the standard OpenAI SDK can use this feature without any SDK modifications. @DanielMe's proposal in #32850 supports allowing extensions "when there is a documented need" following existing patterns.

4. Per-process random key (default) vs. requiring explicit key configuration
What we chose: When `VLLM_RESPONSES_STATE_SIGNING_KEY` is not set, a per-process random key is generated with a warning. This means stateless multi-turn works out of the box for single-node dev, but carriers are invalidated on restart and incompatible across nodes.

Alternative: Require the env var and fail hard if it's not set — forcing users to think about key management upfront.

Why we chose this: The zero-config default matches vLLM's general philosophy — things should work out of the box for the common single-node case. The warning makes the limitation visible. Production deployments with multi-node or restart requirements will naturally need to set the key, and the error messages guide them there.

What reviewers might disagree with: Silent key generation could lead to hard-to-debug issues in production (e.g., a rolling restart invalidates all in-flight carriers). A louder failure mode might be safer.

5. New `previous_response` field (full object) vs. extending `previous_response_id` for stateless

What we chose: A new vLLM-specific `previous_response` field that accepts the full `ResponsesResponse` object, with a `model_validator` enforcing mutual exclusion with `previous_response_id`. This is a protocol extension — the OpenAI spec only defines `previous_response_id` (a string for server-side lookup).

Alternative: Overload `previous_response_id` with a special sentinel or encoded value, or embed the carrier in a separate field alongside `previous_response_id`.

Why we chose this: `previous_response_id` implies a server-side lookup by design — overloading it for stateless use would be confusing. A separate field makes the two paths self-documenting: `previous_response_id` means "look it up in the store," `previous_response` means "here's the full response, no store needed." The mutual exclusion validator makes it impossible to mix the two.

What reviewers might disagree with: This is a vLLM-specific extension to the wire protocol, which #32850 and #33381 argue should be conservative. However, the field is purely additive (existing clients are unaffected) and the alternative — overloading `previous_response_id` — would be more confusing. If OpenResponses or the OpenAI spec adds a similar field in the future, we can align with it.

6. Stateless path as additive (keep the store) vs. replacing it

What we chose: The stateless path is purely additive. The existing `previous_response_id` + `VLLM_ENABLE_RESPONSES_API_STORE=1` path is completely unchanged. Both paths coexist.

Alternative: Remove the in-memory store entirely and force all multi-turn through the stateless path.

Why we chose this: RFC #26934 identifies two user personas — production deployments that want stateless operation, and researchers/small-scale users who want an all-in-one server with state. Removing the store would break the second persona. The companion PR (#35874) makes the store pluggable for production use cases that need server-side state (e.g., `background` mode, `retrieve_responses`).

What reviewers might disagree with: Keeping both paths means more code to maintain and test. If the stateless path proves sufficient for most use cases, the store path could be deprecated in a future PR.
Summary
Implements stateless multi-turn Responses API conversations without server-side storage, using the existing `encrypted_content` field on `ResponseReasoningItem` as the state carrier. Proposed by @grs in #26934.

The three in-process dicts (`response_store`, `msg_store`, `event_store`) marked `# HACK` / `# FIXME` in `serving.py` are disabled by default because they leak memory, lose state on restart, and are incompatible with multi-node deployments. This PR provides a production-ready alternative for the multi-turn use case.

Design

- Wire format: `vllm:1:<base64(json(messages))>:<hmac-sha256-hex>`
- `VLLM_RESPONSES_STATE_SIGNING_KEY` (64-char hex) enables multi-node / restart-safe operation. Without it, a per-process random key is generated with a warning.
- `ReasoningItem` is filtered out in `utils.py` before messages reach the LLM — invisible to the model.

Files Changed
- `vllm/entrypoints/openai/responses/state.py` (new) — `serialize_state` / `deserialize_state` / `is_state_carrier` / HMAC helpers
- `vllm/entrypoints/openai/responses/protocol.py` — add `previous_response: ResponsesResponse | None` to `ResponsesRequest`; mutual-exclusion `model_validator` with `previous_response_id`; reject `background=True` with `previous_response`; `model_rebuild()` for forward ref
- `vllm/entrypoints/openai/responses/serving.py` — stateless prev-response resolution in `create_responses`; thread `prev_messages` through `_make_request` / `_make_request_with_harmony` / `_construct_input_messages_with_harmony`; inject state carrier in `responses_full_generator`; try/except `ValueError` around state extraction, returning 400 on HMAC mismatch; 400 guard when carrier missing from `previous_response`; 501 on `retrieve_responses` when store disabled; 404/400 on `cancel_responses` when store disabled; avoid duplicate assistant turn on stateless path by passing `prev_response_output=None` when `prev_messages` is set
- `vllm/entrypoints/openai/responses/utils.py` — `_construct_single_message_from_response_item` returns `None` for state-carrier items; filter `None` in `construct_chat_messages_with_tool_call`; fix content/summary precedence regression (content-first, summary as fallback with warning)
- `vllm/envs.py` — register `VLLM_RESPONSES_STATE_SIGNING_KEY`
- `tests/entrypoints/openai/responses/test_state.py` (new)
- `tests/entrypoints/openai/responses/test_serving_stateless.py` (new)

Test Results
All pure-Python unit tests — no GPU required.
Usage
Turn 1:
Turn 2:
Multi-node / restart-safe:
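The code blocks for the usage steps above were stripped in this rendering; the request shapes can be sketched from the PR description as follows (the model name and the exact embedding of `previous_response` in the request body are assumptions — the PR specifies passing the full response object):

```python
import json

# Turn 1: opt in to the stateless path — no server-side storage, and ask
# for the encrypted_content state carrier in the response output.
turn1 = {
    "model": "my-model",  # assumption: whatever model the server serves
    "input": "What is the capital of France?",
    "store": False,
    "include": ["reasoning.encrypted_content"],
}

# The turn-1 response output ends with a synthetic ReasoningItem whose
# encrypted_content is the signed blob "vllm:1:<base64(json)>:<hmac-sha256>".
turn1_response = {"id": "resp_1", "output": ["... includes the state carrier ..."]}

# Turn 2: send the full previous response object (not just its id); the
# server verifies the carrier and rebuilds the history — no store touched.
turn2 = {
    "model": "my-model",
    "input": "And its population?",
    "store": False,
    "include": ["reasoning.encrypted_content"],
    "previous_response": turn1_response,
}
print(json.dumps(turn2, indent=2))
```

For multi-node or restart-safe operation, set the same `VLLM_RESPONSES_STATE_SIGNING_KEY` (64-char hex, e.g. from `openssl rand -hex 32`) on every replica before starting the servers.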
Backward Compatibility
No breaking changes. The existing `previous_response_id` + `VLLM_ENABLE_RESPONSES_API_STORE=1` path is completely unchanged. The new path requires explicit opt-in (`store=false` + `include=["reasoning.encrypted_content"]` + `previous_response` on turn 2). `previous_response_id` with store disabled now returns a helpful 400 pointing users to the stateless path.

Test plan
- `pytest tests/entrypoints/openai/responses/test_state.py tests/entrypoints/openai/responses/test_serving_stateless.py -v`
- `VLLM_RESPONSES_STATE_SIGNING_KEY` enables cross-node carrier compatibility

cc @qandrew @WoosukKwon @njhill @DanielMe @chaunceyjiang