server: add OpenAI Responses API compliance#19720
Open

riskywindow wants to merge 1 commit into ggml-org:master from
Conversation
Fix Response object schema, streaming event schema, and multi-turn conversation input parsing for the /v1/responses endpoint.

- Add 24 missing fields to the Response object (tools, truncation, temperature, etc.)
- Add sequence_number, output_index, content_index, item_id, and logprobs to all streaming events
- Add annotations and logprobs to content part objects
- Fix completed_at and usage fields on response.created/in_progress events
- Fix function_call output item structure (id vs call_id)
- Fix multi-turn input parsing to handle both OutputMessage and AssistantMessageItemParam

Passes 5/6 compliance tests (Image Input requires mmproj, not a code issue).

Fixes ggml-org#19138
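To illustrate the id vs call_id fix described above: per the OpenAI Responses API, `id` names the output item itself, while `call_id` links the later function_call_output back to the call. A minimal sketch (the field values here are made up, and this is not llama.cpp's actual C++ representation):

```python
# Sketch of a Responses API function_call output item after the fix.
# "id" identifies the output item; "call_id" is echoed by the client
# in the follow-up function_call_output. Values are illustrative.
function_call_item = {
    "type": "function_call",
    "id": "fc_abc123",          # output item id
    "call_id": "call_xyz789",   # id used to correlate the tool result
    "name": "get_weather",
    "arguments": '{"city": "Berlin"}',
    "status": "completed",
}

# The tool result on the next turn references call_id, not id:
function_call_output = {
    "type": "function_call_output",
    "call_id": function_call_item["call_id"],
    "output": '{"temperature_c": 14}',
}
```

Conflating the two fields is exactly the kind of mismatch that breaks multi-turn tool calling against spec-compliant clients.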
This was referenced Mar 24, 2026
NOTE: With Hugging Face supporting the Open Responses standard, and with GGML joining Hugging Face, one would expect llama.cpp to adopt the Open Responses standard. Is this not ready to be reviewed because it does not yet pass all of the Open Responses compliance tests? Otherwise, is this just a matter of resources?
krystophny added a commit to krystophny/llama.cpp that referenced this pull request on Mar 30, 2026:
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- output_text convenience field in streaming and non-streaming responses

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to streaming events
- Accept input_text type and EasyInputMessage for multi-turn input

Verified: codex -p local and codex -p fast work against local llama.cpp with Qwen3.5 models, including native tool calling.

Refs: ggml-org#19138, ggml-org#19720
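The "merge developer/system messages into position 0" step above can be sketched as follows. This is a hedged illustration of the idea for chat templates (e.g. some Qwen templates) that accept only a single leading system message; the function name and message shapes are assumptions, not llama.cpp's actual implementation:

```python
# Collapse all system/developer messages into one system message at
# position 0, preserving the order of the remaining conversation.
def merge_system_messages(messages):
    system_parts = [m["content"] for m in messages
                    if m["role"] in ("system", "developer")]
    rest = [m for m in messages if m["role"] not in ("system", "developer")]
    if system_parts:
        return [{"role": "system", "content": "\n".join(system_parts)}] + rest
    return rest

msgs = [
    {"role": "developer", "content": "Prefer concise answers."},
    {"role": "user", "content": "Hi"},
    {"role": "system", "content": "You are helpful."},
]
merged = merge_system_messages(msgs)
# merged now starts with a single combined system message
```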
krystophny added a commit to krystophny/llama.cpp that referenced this pull request on Mar 30, 2026:
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- Restore refusal content type handling

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to ALL streaming events
- Full response object in response.created/in_progress events
- Accept input_text type and EasyInputMessage for multi-turn input
- output_text convenience field, output_tokens_details

14 pytest tests; E2E tested with the async OpenAI SDK and Codex CLI.

Refs: ggml-org#19138, ggml-org#19720, ggml-org#21174
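The "sequence_number, output_index, content_index on ALL streaming events" change above amounts to stamping every event with a monotonically increasing sequence number plus positional indices. A minimal sketch, assuming simplified stand-in event payloads (field names follow the OpenAI Responses API; the event structures are not llama.cpp's actual ones):

```python
# Stamp sequence_number on every streaming event in emission order,
# and default output_index/content_index to 0 when absent.
def number_events(events):
    for seq, event in enumerate(events):
        stamped = dict(event)
        stamped["sequence_number"] = seq
        stamped.setdefault("output_index", 0)
        stamped.setdefault("content_index", 0)
        yield stamped

raw = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hel"},
    {"type": "response.output_text.delta", "delta": "lo"},
    {"type": "response.completed"},
]
stamped = list(number_events(raw))
# stamped[2]["sequence_number"] == 2
```

Spec-compliant SDKs use sequence_number to detect dropped or reordered events, which is why it must appear on every event, not only deltas.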