
server: add OpenAI-compatible /v1/responses endpoint#214

Open
krystophny wants to merge 4 commits into waybarrios:main from computor-org:feature/openai-responses-api

Conversation


@krystophny krystophny commented Mar 24, 2026

Summary

Add an OpenAI-compatible /v1/responses endpoint for local coding-agent workflows.

Scope

  • text messages, function tools, function call outputs
  • streaming and non-streaming Responses output
  • previous_response_id replay for persisted replayable input items
  • developer/instructions normalization onto one leading system prompt
  • request-level chat_template_kwargs forwarding
  • LRU-bounded response store (max 1000 entries, oldest evicted)
  • reasoning input items converted to assistant messages for model context
  • reasoning configuration gracefully ignored (not supported, no crash)
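As a rough illustration of the developer/instructions normalization above, folding everything into one leading system prompt could look like this minimal sketch (the function name and message shapes are assumptions, not the PR's actual code):

```python
def normalize_system_prompt(instructions, messages):
    """Fold request-level instructions and any system/developer-role messages
    into a single leading system message (illustrative sketch)."""
    system_parts = []
    if instructions:
        system_parts.append(instructions)
    rest = []
    for msg in messages:
        if msg["role"] in ("system", "developer"):
            system_parts.append(msg["content"])
        else:
            rest.append(msg)
    if system_parts:
        rest.insert(0, {"role": "system", "content": "\n\n".join(system_parts)})
    return rest
```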

What changed

  • server.py: full /v1/responses endpoint with streaming SSE
  • api/responses_models.py: Pydantic models for Responses API
  • _responses_store capped with OrderedDict LRU eviction (max 1000)
  • ResponseReasoningItem input converted to assistant messages (Codex sends these in multi-turn)
  • Reasoning config (request.reasoning) logged and ignored instead of crashing mid-stream
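On the streaming side, Responses API SSE frames pair an `event:` line with a JSON `data:` payload. A small formatting helper (hypothetical name; not taken from the PR) might look like:

```python
import json

def sse_event(event_type: str, data: dict) -> str:
    """Format one Server-Sent Events frame: an event line plus a JSON data
    line, terminated by a blank line (illustrative sketch)."""
    payload = dict(data, type=event_type)
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"
```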

Files

  • vllm_mlx/server.py
  • vllm_mlx/api/responses_models.py
  • tests/test_responses_api.py

Validation

$ python -m pytest tests/test_responses_api.py -v
33 passed

@krystophny krystophny changed the title Add OpenAI Responses API core server: add OpenAI-compatible /v1/responses endpoint Mar 24, 2026
@krystophny krystophny force-pushed the feature/openai-responses-api branch from c7f7364 to ad483cc on March 24, 2026 12:26
@krystophny krystophny changed the title server: add OpenAI-compatible /v1/responses endpoint server: add non-streaming OpenAI-compatible /v1/responses endpoint Mar 24, 2026
@krystophny krystophny changed the title server: add non-streaming OpenAI-compatible /v1/responses endpoint server: add OpenAI-compatible /v1/responses endpoint Mar 24, 2026
@krystophny krystophny force-pushed the feature/openai-responses-api branch from df4f9af to 05838da on March 25, 2026 22:52
…onse_object

Replace unbounded dict with OrderedDict (max 1000 entries) to prevent
memory leaks from accumulated stored responses. Evict oldest entries
on insert when the cap is exceeded.

Remove the _stream_response_object function (190 lines) which was
never called anywhere in the codebase.
… API

Codex sends ResponseReasoningItem in the Responses API input array
during multi-turn conversations. Convert reasoning content to assistant
messages so the model sees its prior chain-of-thought.

Previously this raised an HTTPException, but since the streaming
response had already started, this caused a RuntimeError that broke
the SSE stream mid-flight.

The reasoning input item rejection (HTTPException 400) was added in
PR #28 but conflicts with the earlier fix that converts reasoning
items to assistant messages in _responses_input_to_chat_messages.
The rejection ran first, crashing the SSE stream mid-flight.

Also downgrade reasoning config rejection to a debug log since
raising inside the streaming generator causes "response already
started" crashes.
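The conversion of reasoning items into assistant messages could be sketched like this (the item field names here are assumptions about the input shape, not the PR's actual code):

```python
def reasoning_item_to_assistant_message(item: dict) -> dict:
    """Flatten a reasoning input item into a plain assistant message so the
    model sees its prior chain-of-thought (sketch; field names assumed)."""
    parts = item.get("summary") or item.get("content") or []
    text = "\n".join(p.get("text", "") for p in parts if isinstance(p, dict))
    return {"role": "assistant", "content": text}
```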
