server: add /v1/responses support #1184
Conversation
I don't think these tests are well maintained. Just need to test in

Is this still a draft?

Tool calling doesn't work perfectly, but I checked mainline llama.cpp and it has the same problems, so this has feature parity. Regular text completion passes both the streaming and non-streaming tests.
Launch command:
Test:

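For reference, the streaming and non-streaming test requests differ only in the `stream` flag. A minimal sketch of the two payloads, assuming the server follows the OpenAI Responses API field names (`model`, `input`, `stream`); the helper name is illustrative, not from the PR:

```python
import json

def build_responses_request(prompt, model=None, stream=False):
    """Build a request body for POST /v1/responses.

    Field names ("model", "input", "stream") are assumed from the
    OpenAI Responses API; the exact fields this server accepts are
    an assumption based on the PR description.
    """
    body = {"input": prompt, "stream": stream}
    if model:  # omit "model" entirely to exercise the server's default
        body["model"] = model
    return body

# Non-streaming test payload:
print(json.dumps(build_responses_request("Hello")))
# Streaming test payload:
print(json.dumps(build_responses_request("Hello", stream=True)))
```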
The model name defaults to "gpt-3.5-turbo-0613" when I leave model empty in the request message. Mainline returns the correct model name. Otherwise, the responses look good.
Nice catch, thanks. If model is empty or missing, it now uses the loaded model's name instead of defaulting to "gpt-3.5-turbo-0613".

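The fix described above amounts to a simple fallback. A sketch of that logic; the function and parameter names are illustrative, not taken from the PR:

```python
def resolve_model_name(requested, loaded_model):
    """Return the model name to report in the response.

    If the request's "model" field is empty or missing (None), fall
    back to the name of the loaded model rather than a hard-coded
    "gpt-3.5-turbo-0613".
    """
    if requested:          # non-empty string from the request body
        return requested
    return loaded_model    # fall back to whatever the server loaded
```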
Yeah, it's a bug unrelated to this PR; I'll fix it later.

It is fine to merge now. |
Summary
Testing
Notes
Currently a draft while I do more testing; the endpoint works via regular curl.
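A curl test can be checked by pulling the generated text out of the reply. A hypothetical sketch, assuming the reply follows the OpenAI Responses API shape (an `output` list of message items holding `output_text` content parts); the exact fields this server returns are an assumption:

```python
import json

# Example non-streaming reply, shaped like an OpenAI Responses API
# result (assumed, not copied from this server's actual output):
raw = json.dumps({
    "object": "response",
    "model": "llama-3-8b",
    "output": [
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Hello there"}]}
    ],
})

def extract_output_text(response_json):
    """Concatenate all output_text parts from a /v1/responses reply."""
    parts = []
    for item in response_json.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part.get("text", ""))
    return "".join(parts)

print(extract_output_text(json.loads(raw)))  # → Hello there
```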
Based off ggml-org/llama.cpp#18486
Edit: I used an AI agent to help create the draft, as this seems like a good use case for one: the change doesn't involve a new model or architecture, which is where agents tend to yield bad results. I didn't see any contributing rules against it; if there are, feel free to close this.