
server: add OpenAI Responses API compliance #19720

Open
riskywindow wants to merge 1 commit into ggml-org:master from riskywindow:openresponses-compliance

Conversation

@riskywindow

Fix the Response object schema, the streaming event schema, and multi-turn conversation input parsing for the /v1/responses endpoint.

  • Add 24 missing fields to Response object (tools, truncation, temperature, etc.)
  • Add sequence_number, output_index, content_index, item_id, logprobs to all streaming events
  • Add annotations and logprobs to content part objects
  • Fix completed_at and usage fields on response.created/in_progress events
  • Fix function_call output item structure (id vs call_id)
  • Fix multi-turn input parsing to handle both OutputMessage and AssistantMessageItemParam

Passes 5/6 compliance tests (the Image Input test requires an mmproj file, so it is not a code issue); see the client sketch below.

Fixes #19138
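
A minimal sketch of exercising the patched endpoint with the OpenAI Python SDK; the base URL, API key, and model alias below are illustrative assumptions, not values from this PR:

```python
# Minimal sketch: hit /v1/responses on a local llama.cpp server and read back
# a few of the Response fields this patch adds. base_url, api_key, and the
# model alias are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.responses.create(
    model="local-model",  # hypothetical alias for whatever model the server loaded
    input="Say hello in one sentence.",
)

# Previously missing from the schema, now echoed back on the Response object:
print(resp.temperature)  # sampling temperature
print(resp.truncation)   # truncation strategy, e.g. "disabled"
print(resp.tools)        # tool definitions (empty list when none were sent)
print(resp.usage)        # token usage, populated on the completed response
```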

@TheNexter

This is a very important merge request: it lets n8n agents work with llama.cpp using open-weight models.

Latest merged version:
[screenshot]

With this patch (working perfectly):
[screenshot]

@bartlettroscoe

NOTE: Given that Hugging Face supports the Open Responses standard, and that GGML has joined Hugging Face, one would expect llama.cpp to adopt the Open Responses standard.

Is this not ready to be reviewed because it does not yet pass all of the Open Responses compliance tests? Or is it just a matter of resources?

krystophny added a commit to krystophny/llama.cpp that referenced this pull request Mar 30, 2026
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- output_text convenience field in streaming and non-streaming responses

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to streaming events
- Accept input_text type and EasyInputMessage for multi-turn input

Verified: codex -p local and codex -p fast work against local
llama.cpp with Qwen3.5 models including native tool calling.

Refs: ggml-org#19138, ggml-org#19720
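
The streaming envelope fields named above (sequence_number, output_index, content_index) can be spot-checked from a client. A sketch, assuming a patched server on localhost:8080 and the OpenAI Python SDK; server address and model alias are assumptions:

```python
# Sketch: consume /v1/responses streaming events and check the envelope fields
# (sequence_number, output_index, content_index, item_id) added by this work.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

stream = client.responses.create(
    model="local-model",  # hypothetical alias
    input="Count to three.",
    stream=True,
)

last_seq = -1
for event in stream:
    # Every event type should now carry a monotonically increasing sequence_number.
    seq = getattr(event, "sequence_number", None)
    if seq is not None:
        assert seq > last_seq, "sequence_number must increase across events"
        last_seq = seq
    if event.type == "response.output_text.delta":
        # Text deltas additionally carry output_index, content_index, and item_id.
        print(event.output_index, event.content_index, event.item_id, event.delta)
```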
krystophny added a commit to krystophny/llama.cpp that referenced this pull request Mar 30, 2026
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- Restore refusal content type handling

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to ALL streaming events
- Full response object in response.created/in_progress events
- Accept input_text type and EasyInputMessage for multi-turn input
- output_text convenience field, output_tokens_details

14 pytest tests, E2E tested with async OpenAI SDK and Codex CLI.

Refs: ggml-org#19138, ggml-org#19720, ggml-org#21174
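
The multi-turn input shapes this commit accepts can be illustrated with a raw request. A sketch, assuming a patched server; the URL and model name are placeholder assumptions:

```python
# Sketch: the multi-turn input shapes the parser now accepts — a plain
# EasyInputMessage, an assistant message item (as a prior response returned it),
# and an input_text content part.
import requests

payload = {
    "model": "local-model",
    "input": [
        # EasyInputMessage: bare role + string content
        {"role": "user", "content": "What is 2 + 2?"},
        # Assistant message item, echoing a previous turn's output
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "4."}],
        },
        # Follow-up turn using an explicit input_text content part
        {"role": "user", "content": [{"type": "input_text", "text": "And times 3?"}]},
    ],
}

r = requests.post("http://localhost:8080/v1/responses", json=payload)
r.raise_for_status()
print(r.json()["output"][0]["content"][0]["text"])
```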

Development

Successfully merging this pull request may close these issues.

Feature Request: Support OpenAI Responses API (/v1/responses) in llama.cpp server
