feat: add /completion endpoint #275
Walkthrough
Adds a POST /completion route that reuses the existing OpenAI-compatible proxy flow.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Client
participant P as ProxyManager (Gin)
participant H as proxyOAIHandler
participant R as Responder/Upstream
Note over C,P: llama-server compatible route
C->>P: POST /completion { model, ... }
P->>H: mm middleware → proxyOAIHandler
H->>R: Forward completion request
R-->>H: JSON { response, usage, ... }
H-->>P: Response passthrough
P-->>C: 200 OK + JSON
rect rgb(230,245,255)
Note over P,H: New route wired to existing OAI proxy flow
end
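As a concrete illustration of the proxied flow in the first diagram, here is a minimal client-side sketch in Go. The listen address and model id are placeholders, and the body only assumes a "model" field (which llama-swap uses for routing) plus a prompt; nothing else is taken from this PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// llama-swap routes the request based on the "model" field, so include it in the body.
	// Address and model id below are placeholders for illustration only.
	body, _ := json.Marshal(map[string]any{
		"model":  "my-model",
		"prompt": "Hello",
	})

	resp, err := http.Post("http://localhost:8080/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```

The proxy forwards such a request to the selected upstream llama-server instance and passes the JSON response back unchanged, as shown in the diagram.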
sequenceDiagram
autonumber
participant C as Client
participant S as Simple Responder
C->>S: POST /completion
S-->>C: 200 OK { responseMessage, usage }
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Actionable comments posted: 1
🧹 Nitpick comments (2)
README.md (1)
26-26: Clarify model requirement for llama-swap’s /completion.
Unlike llama.cpp, llama-swap requires a "model" field in the request body to route the call. Document this to prevent 400s from clients that omit it.
- - `/completion` - for completion endpoint
+ - `/completion` - for completion endpoint (requires {"model": "<model_id>"} in the JSON body when used via llama-swap)

misc/simple-responder/simple-responder.go (1)
156-168: Add streaming support to /completion for parity with llama.cpp behavior.
Optional but useful for clients sending ?stream=true; mirrors your existing chat streaming path.

-// llama-server compatibility: /completion
-r.POST("/completion", func(c *gin.Context) {
-	c.Header("Content-Type", "application/json")
-	c.JSON(http.StatusOK, gin.H{
-		"responseMessage": *responseMessage,
-		"usage": gin.H{
-			"completion_tokens": 10,
-			"prompt_tokens":     25,
-			"total_tokens":      35,
-		},
-	})
-})
+// llama-server compatibility: /completion
+r.POST("/completion", func(c *gin.Context) {
+	// Support optional streaming like /v1/chat/completions
+	if c.Query("stream") == "true" {
+		c.Header("Content-Type", "text/event-stream")
+		c.Header("Cache-Control", "no-cache")
+		c.Header("Connection", "keep-alive")
+		c.Header("Transfer-Encoding", "chunked")
+
+		// optional wait to simulate slower responses
+		if wait, err := time.ParseDuration(c.Query("wait")); err == nil {
+			time.Sleep(wait)
+		}
+		for i := 0; i < 10; i++ {
+			c.SSEvent("message", gin.H{
+				"created": time.Now().Unix(),
+				"choices": []gin.H{{"index": 0, "text": "asdf", "finish_reason": nil}},
+			})
+			c.Writer.Flush()
+		}
+		c.SSEvent("message", gin.H{
+			"usage":   gin.H{"completion_tokens": 10, "prompt_tokens": 25, "total_tokens": 35},
+			"timings": gin.H{"prompt_n": 25, "prompt_ms": 13, "predicted_n": 10, "predicted_ms": 17, "predicted_per_second": 10},
+		})
+		c.Writer.Flush()
+		c.SSEvent("message", "[DONE]")
+		c.Writer.Flush()
+		return
+	}
+
+	c.Header("Content-Type", "application/json")
+	c.JSON(http.StatusOK, gin.H{
+		"responseMessage": *responseMessage,
+		"usage": gin.H{
+			"completion_tokens": 10,
+			"prompt_tokens":     25,
+			"total_tokens":      35,
+		},
+	})
+})
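If the streaming variant suggested above were adopted, a client could consume it roughly as sketched below. This is an illustration only: the address, model id, and the line-based SSE parsing are assumptions, not part of the suggested change.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Placeholder address; assumes the streaming /completion behavior sketched above.
	resp, err := http.Post("http://localhost:8080/completion?stream=true", "application/json",
		strings.NewReader(`{"model":"my-model","prompt":"Hello"}`))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		// gin's c.SSEvent emits "event:" and "data:" lines; keep only the data payloads.
		if data, ok := strings.CutPrefix(scanner.Text(), "data:"); ok {
			data = strings.TrimSpace(data)
			if strings.Contains(data, "[DONE]") {
				break
			}
			fmt.Println("chunk:", data)
		}
	}
}
```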
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
README.md (1 hunks)
misc/simple-responder/simple-responder.go (1 hunks)
proxy/proxymanager.go (1 hunks)
proxy/proxymanager_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
proxy/proxymanager_test.go (3)
proxy/config.go (3)
AddDefaultGroupToConfig (364-406)
Config (149-169)
ModelConfig (20-40)
proxy/proxymanager.go (1)
New (47-130)
proxy/process.go (1)
StopWaitForInflightRequest(36-36)
🪛 GitHub Actions: Linux CI
proxy/proxymanager_test.go
[error] 1-1: gofmt formatting check failed for proxy/proxymanager_test.go. Command that triggered the failure: gofmt -l . | grep -v 'event/.*_test.go' | wc -l. Run 'gofmt -w .' to fix formatting.
🔇 Additional comments (1)
proxy/proxymanager.go (1)
206-208: Route registration looks correct and consistent.
Good: uses MetricsMiddleware and the same proxy handler as other llama-server endpoints, ensuring filtering and model rewriting apply uniformly.
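For context, the registration pattern being reviewed looks roughly like the self-contained sketch below. The middleware and handler names come from this review; their signatures and the stand-in bodies are assumptions, not the actual code in proxy/proxymanager.go.

```go
package main

import "github.com/gin-gonic/gin"

// Stand-ins for illustration only: the real middleware and handler live on the
// ProxyManager in proxy/proxymanager.go.
func metricsMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) { c.Next() }
}

func proxyOAIHandler(c *gin.Context) {
	c.JSON(200, gin.H{"ok": true})
}

func main() {
	r := gin.New()
	mm := metricsMiddleware()

	// Existing llama-server compatible routes share the same middleware and handler...
	r.POST("/v1/completions", mm, proxyOAIHandler)
	r.POST("/v1/chat/completions", mm, proxyOAIHandler)
	// ...and the new /completion route is wired the same way, so model-based
	// filtering and request rewriting apply uniformly.
	r.POST("/completion", mm, proxyOAIHandler)

	r.Run(":8080")
}
```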
mostlygeek
left a comment
Hi,
Please revert the changes to event/default_test.go and event/event_test.go.
Everything else looks good.
Ah, sorry, just saw the formatting made that very messy - apologies.
This pull request adds support for the llama-server /completion endpoint, e.g. for multimodal transcription via Voxtral or other custom chat template purposes.

Updates for the llama-server /completion endpoint:
README.md documents the /completion endpoint as a supported llama-server API.
A /completion POST handler was added to misc/simple-responder/simple-responder.go for compatibility with llama-server, returning a sample response.

Proxy manager integration:
The /completion endpoint is registered in proxy/proxymanager.go.

Testing:
A test in proxy/proxymanager_test.go verifies that the /completion endpoint correctly proxies requests and returns expected results (a rough sketch of such a test appears at the end of this page).

Summary by CodeRabbit
New Features
Documentation
Tests
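Relating to the Testing item in the pull request description above (not part of CodeRabbit's summary): a rough, self-contained sketch of how such a proxy test can be shaped with httptest and Gin. The inline forwarder is a stand-in for the real ProxyManager wiring; the actual proxy/proxymanager_test.go constructs the manager via New and the config helpers referenced earlier.

```go
package proxy_test

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"

	"github.com/gin-gonic/gin"
)

// Stand-in test: spins up a fake upstream and a router that forwards /completion,
// then checks the response is passed through unchanged.
func TestCompletionEndpointProxies(t *testing.T) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"responseMessage":"ok"}`))
	}))
	defer upstream.Close()

	gin.SetMode(gin.TestMode)
	r := gin.New()
	r.POST("/completion", func(c *gin.Context) {
		// Forward the request body to the fake upstream and stream its reply back.
		resp, err := http.Post(upstream.URL, "application/json", c.Request.Body)
		if err != nil {
			c.AbortWithStatus(http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		c.DataFromReader(resp.StatusCode, resp.ContentLength, "application/json", resp.Body, nil)
	})

	req := httptest.NewRequest(http.MethodPost, "/completion", strings.NewReader(`{"model":"model1","prompt":"hi"}`))
	w := httptest.NewRecorder()
	r.ServeHTTP(w, req)

	if w.Code != http.StatusOK || !strings.Contains(w.Body.String(), "responseMessage") {
		t.Fatalf("unexpected response: %d %s", w.Code, w.Body.String())
	}
}
```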