feat: anthropic endpoint support and translation for openai backend #1878
nacx merged 12 commits into envoyproxy:main
Conversation
Signed-off-by: Chang Min <changminbark@gmail.com>
Signed-off-by: Chang Min <changminbark@gmail.com>
Is it possible to review whether tool calling works in various models? I think this would be the key potential bug point.

Sure, what would be a simple setup that would help emulate this?

I think it would be using Claude Code on several kinds of models (Kimi-K2, GLM-4.7/5, MiniMax-M2 series, Qwen3, etc.). It is apparent when it fails.

@ehfd I do not have access to any compute or GPUs, so I'm not sure how to test this.

I think we can test it; we'll take a look.

@ehfd Thank you! Let me know how it goes.
…voyproxy#1878) Co-authored-by: Cursor <cursoragent@cursor.com>
cmd/extproc/mainlib/main.go
server.Register(path.Join(flags.rootPrefix, endpointPrefixes.Anthropic, "/v1/messages"), extproc.NewFactory(
	messagesMetricsFactory, tracing.MessageTracer(), endpointspec.MessagesEndpointSpec{}))
// These are for OpenAI schema backends that support the /v1/messages endpoint (no endpoint prefix, as the OpenAI prefix is '/')
server.Register(path.Join(flags.rootPrefix, endpointPrefixes.OpenAI, "/v1/messages"), extproc.NewFactory(
The root path / is intended for unified API endpoints, which work for heterogeneous backends by translating between API schemas. For /v1/messages we currently only support pass-through, hence the anthropic prefix; you could configure this root path on the client side to be compatible with the gateway path.
Ah, I see — you want Claude Code to work with models using the OpenAI API, but I think vLLM actually supports /v1/messages natively now.
Please follow through the conversations here.
@yuzisun Yes, I discussed the implementation details in the tagged issue. Please take a look and let me know what you think.
Hi @changminbark and @yuzisun
I work with @ehfd; I just tested the PR in our cluster with curl and Claude Code, and it works perfectly! Looks great!
I believe the /v1/messages -> /v1/chat/completions translation will work with the existing registered endpoint, and the current change in this file is not required:
server.Register(path.Join(flags.rootPrefix, endpointPrefixes.Anthropic, "/v1/messages"), extproc.NewFactory(
    messagesMetricsFactory, tracing.MessageTracer(), endpointspec.MessagesEndpointSpec{}))
The anthropic prefix can be set to an empty string (anthropic: "" in values.yaml) if you don't need it in the access URL.
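A hypothetical values.yaml fragment illustrating this (the key layout is guessed from the comment above, not copied from the chart — check the chart's actual schema):

```yaml
# Hypothetical sketch: endpoint path prefixes used by the extproc route
# registration. Setting anthropic to "" would expose /v1/messages at the
# root instead of /anthropic/v1/messages.
endpointPrefixes:
  openAI: "/"
  anthropic: ""
```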
ai-gateway-extproc time=2026-02-14T07:39:50.432Z level=ERROR msg="error processing request message" error="rpc error: code = Internal desc = cannot set backend: failed to create translator for backend nrp-llm/envoy-ai-gateway-nrp-glm-4/route/envoy-ai-gateway-nrp-glm/rule/1/ref/0: /v1/messages endpoint only supports backends that return native Anthropic format (Anthropic, GCPAnthropic, AWSAnthropic). Backend OpenAI uses different model format"
At the core, we want to get rid of this and not have to duplicate APISchemaAnthropic, BackendSecurityPolicyAnthropicAPIKey, and AnthropicAPIKey alongside the OpenAI schema. So the correct implementation is a translator like those in https://github.com/envoyproxy/ai-gateway/tree/main/internal/translator?
Yes, adding an OpenAI translator for the existing anthropic /v1/messages endpoint is the correct approach.
So I should get rid of registering the route with the OpenAI prefix?
Available here. I think this is resolved.
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
##             main    #1878      +/-   ##
==========================================
+ Coverage   84.20%   84.30%   +0.10%
==========================================
  Files         126      128       +2
  Lines       17075    17545     +470
==========================================
+ Hits        14378    14792     +414
- Misses       1803     1827      +24
- Partials      894      926      +32
☔ View full report in Codecov by Sentry.
Signed-off-by: Chang Min <changminbark@gmail.com>

Thanks! Can you please add all the relevant cases to that, so that we are confident this is properly tested e2e?

After further validation, the conversion from …

Signed-off-by: Chang Min <changminbark@gmail.com>

@ehfd I just fixed a bug while creating the e2e tests. Basically, some of the OpenAI SSE events were being passed by Envoy back to the client even though they had no Anthropic SSE equivalent. Do you mind trying it now? Example of the malformed streaming response (tool call) before: … Example of the correct streaming response (tool call) after: …
@changminbark Looking into it, thanks.

With MiniMax-M2, it is only reproduced through the PR and not when calling vLLM directly.

@ehfd I'll try looking into this more, but do you have access to Claude Code's internal logs? That may be helpful for debugging this.

I tried doing this on my local machine. vLLM: directly calling /v1/messages using vLLM. Streaming: AI Gateway.
@ehfd It seems that the format Envoy is outputting is identical to the format vLLM is outputting. I am not too sure why your tool call is not working. Isn't Claude Code also executing the tool call in the screenshot you showed? Doesn't that mean it is properly processing the tool call request from the model? Maybe it's a problem with Claude Code? My thoughts: I think the web search tool in Claude Code requires a specific streaming SSE format, as seen below. However, other tool calls (like ones you define yourself) should work as intended. You can see that the type of the content block in the SSE for the web search tool is
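For context on the distinction being made here: per Anthropic's public streaming documentation, the built-in web search tool opens a content block of type server_tool_use (later followed by a web_search_tool_result block), which has no counterpart in a generic OpenAI tool-call delta. A representative event (the id is truncated) looks like:

```
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"server_tool_use","id":"srvtoolu_01...","name":"web_search","input":{}}}
```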
@ehfd It looks like …

@changminbark Okay, I ran some tests. You're 100% right: this is a Claude Code specific issue. Web search is failing on both vLLM and Envoy. IMO, the PR looks great to merge. You can always ask CC to use a DDG MCP server and disable the web search tool, and it works flawlessly. One thing that is happening: on vLLM, web search fails and it moves on to different tools (fetch), while via Envoy it remains stuck in a loop of failing.
This part does look worth addressing, though. Thanks for all your efforts troubleshooting! @changminbark @groundsada
It might be better to open a new issue for this. I think this PR is getting quite big.
Thank you!
OK, if possible we could get this merged first. Thank you!
Signed-off-by: Chang Min <changminbark@gmail.com>
nacx
left a comment
Thanks!
Overall LGTM. Just a couple minor comments and one comment about the test completeness.
@johnugeorge @yuzisun would you wanna do another review?
…ropic-openai-local.yaml Signed-off-by: Chang Min <changminbark@gmail.com>
@nacx I think the failing e2e tests are not related to my changes. Please correct me if I'm wrong.

/retest

LGTM
Description
This commit adds a translator that converts requests sent to the /anthropic/v1/messages and /v1/messages endpoints for OpenAI schema backends. It does not matter whether the OpenAI schema backend natively supports the endpoint (e.g. vLLM), as translating should be a light/fast enough process. This approach is also more versatile and future-proof than just passing through the Anthropic Message Request to a backend that natively supports it. It also follows the already-existing structure for adding translators, path processor factories, and schema translation.

A major example use case would be using AI Gateway to route requests from Claude Code to several AI backends, such as locally hosted vLLM models with LoRA adapters.

NOTE: vLLM is only used for local testing, as I do not have access to compute. The intended goal of this PR is to support any OpenAI-compatible backend/service behind an Anthropic interface.
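The request-translation direction described above can be illustrated with a stripped-down sketch. All types here are hypothetical stand-ins; the real implementation lives under internal/translator and additionally handles tool definitions, multi-part content, and streaming:

```go
package main

import "fmt"

// Minimal stand-ins for the two request schemas. Field names loosely
// follow the public Anthropic Messages and OpenAI Chat Completions JSON
// shapes, but these are illustrative, not the gateway's actual types.
type anthropicMessage struct {
	Role    string
	Content string
}

type anthropicRequest struct {
	Model     string
	System    string // Anthropic carries the system prompt as a top-level field
	MaxTokens int
	Messages  []anthropicMessage
}

type openAIMessage struct {
	Role    string
	Content string
}

type openAIRequest struct {
	Model     string
	MaxTokens int
	Messages  []openAIMessage
}

// translate converts an Anthropic /v1/messages request into an OpenAI
// /v1/chat/completions request. The key structural difference shown here:
// OpenAI expects the system prompt as the first chat message.
func translate(in anthropicRequest) openAIRequest {
	out := openAIRequest{Model: in.Model, MaxTokens: in.MaxTokens}
	if in.System != "" {
		out.Messages = append(out.Messages, openAIMessage{Role: "system", Content: in.System})
	}
	for _, m := range in.Messages {
		out.Messages = append(out.Messages, openAIMessage{Role: m.Role, Content: m.Content})
	}
	return out
}

func main() {
	req := translate(anthropicRequest{
		Model:     "some-local-model", // placeholder model name
		System:    "You are terse.",
		MaxTokens: 64,
		Messages:  []anthropicMessage{{Role: "user", Content: "hello"}},
	})
	fmt.Println(len(req.Messages), req.Messages[0].Role)
}
```

The response and SSE directions follow the same pattern in reverse, which is where most of the PR's complexity (tool-call deltas, stop reasons) lives.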
Related Issues/PRs (if applicable)
Fixes #1372
Fixes #1867
Special notes for reviewers (if applicable)
Claude Code was used to write most of the tests, but they were verified. It would also be nice if the maintainers could review the other PR #1843, as some of the Anthropic apischema here can be updated once #1843 is merged.
Functional Test Results
Test for anthropic endpoints for OpenAI schema backends that natively support it:
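A request body for such a test would look roughly like the following, sent to /anthropic/v1/messages on the port-forwarded gateway (the model name is a placeholder for whatever the vLLM backend serves):

```json
{
  "model": "some-local-model",
  "max_tokens": 128,
  "messages": [
    {"role": "user", "content": "Say hello."}
  ]
}
```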
Port Forward logs
vLLM Logs (for both requests)