
server: add OpenAI Responses API compliance #19720

Open
riskywindow wants to merge 1 commit into ggml-org:master from riskywindow:openresponses-compliance

Conversation

@riskywindow

Fix the Response object schema, the streaming event schema, and multi-turn conversation input parsing for the /v1/responses endpoint.

  • Add 24 missing fields to Response object (tools, truncation, temperature, etc.)
  • Add sequence_number, output_index, content_index, item_id, logprobs to all streaming events
  • Add annotations and logprobs to content part objects
  • Fix completed_at and usage fields on response.created/in_progress events
  • Fix function_call output item structure (id vs call_id)
  • Fix multi-turn input parsing to handle both OutputMessage and AssistantMessageItemParam

Passes 5/6 compliance tests (the Image Input test requires an mmproj file, so it is not a code issue); see the client sketch below.

Fixes #19138
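
A minimal sketch of exercising the patched endpoint with the OpenAI Python SDK; the base URL, API key, and model alias below are illustrative assumptions, not values from this PR:

```python
# Minimal sketch: hit /v1/responses on a local llama.cpp server and read back
# a few of the Response fields this patch adds. base_url, api_key, and the
# model alias are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.responses.create(
    model="local-model",  # hypothetical alias for whatever model the server loaded
    input="Say hello in one sentence.",
)

# Previously missing from the schema, now echoed back on the Response object:
print(resp.temperature)  # sampling temperature
print(resp.truncation)   # truncation strategy, e.g. "disabled"
print(resp.tools)        # tool definitions (empty list when none were sent)
print(resp.usage)        # token usage, populated on the completed response
```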

@TheNexter

This is a very important merge request: it lets n8n agents work with llama.cpp using open-weight models.

Latest merged version:
[screenshot]

With this patch (working perfectly):
[screenshot]

@bartlettroscoe

NOTE: Given that Hugging Face supports the Open Responses standard, and that GGML has joined Hugging Face, one would expect llama.cpp to adopt the Open Responses standard.

Is this not ready to be reviewed because it does not yet pass all of the Open Responses compliance tests? Or is it just a matter of resources?

krystophny added a commit to krystophny/llama.cpp that referenced this pull request Mar 30, 2026
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- output_text convenience field in streaming and non-streaming responses

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to streaming events
- Accept input_text type and EasyInputMessage for multi-turn input

Verified: codex -p local and codex -p fast work against local
llama.cpp with Qwen3.5 models including native tool calling.

Refs: ggml-org#19138, ggml-org#19720
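
The streaming envelope fields named above (sequence_number, output_index, content_index) can be spot-checked from a client. A sketch, assuming a patched server on localhost:8080 and the OpenAI Python SDK; server address and model alias are assumptions:

```python
# Sketch: consume /v1/responses streaming events and check the envelope fields
# (sequence_number, output_index, content_index, item_id) added by this work.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

stream = client.responses.create(
    model="local-model",  # hypothetical alias
    input="Count to three.",
    stream=True,
)

last_seq = -1
for event in stream:
    # Every event type should now carry a monotonically increasing sequence_number.
    seq = getattr(event, "sequence_number", None)
    if seq is not None:
        assert seq > last_seq, "sequence_number must increase across events"
        last_seq = seq
    if event.type == "response.output_text.delta":
        # Text deltas additionally carry output_index, content_index, and item_id.
        print(event.output_index, event.content_index, event.item_id, event.delta)
```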
krystophny added a commit to krystophny/llama.cpp that referenced this pull request Mar 30, 2026
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- Restore refusal content type handling

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to ALL streaming events
- Full response object in response.created/in_progress events
- Accept input_text type and EasyInputMessage for multi-turn input
- output_text convenience field, output_tokens_details

14 pytest tests, E2E tested with async OpenAI SDK and Codex CLI.

Refs: ggml-org#19138, ggml-org#19720, ggml-org#21174
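
The multi-turn input shapes this commit accepts can be illustrated with a raw request. A sketch, assuming a patched server; the URL and model name are placeholder assumptions:

```python
# Sketch: the multi-turn input shapes the parser now accepts — a plain
# EasyInputMessage, an assistant message item (as a prior response returned it),
# and an input_text content part.
import requests

payload = {
    "model": "local-model",
    "input": [
        # EasyInputMessage: bare role + string content
        {"role": "user", "content": "What is 2 + 2?"},
        # Assistant message item, echoing a previous turn's output
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "4."}],
        },
        # Follow-up turn using an explicit input_text content part
        {"role": "user", "content": [{"type": "input_text", "text": "And times 3?"}]},
    ],
}

r = requests.post("http://localhost:8080/v1/responses", json=payload)
r.raise_for_status()
print(r.json()["output"][0]["content"][0]["text"])
```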

Development

Successfully merging this pull request may close these issues.

Feature Request: Support OpenAI Responses API (/v1/responses) in llama.cpp server
