feat: anthropic endpoint support and translation for openai backend #1878

Merged

nacx merged 12 commits into envoyproxy:main from changminbark:anthropic-support-for-openai on Mar 2, 2026

Conversation

changminbark (Contributor) commented Feb 19, 2026

Description

This commit adds a translator that converts requests sent to the /anthropic/v1/messages and /v1/messages endpoints for OpenAI schema backends. It does not matter whether the OpenAI schema backend natively supports the endpoint (e.g. vLLM does), since translation is a light and fast process. This approach is also more versatile and future-proof than simply passing the Anthropic Messages request through to a backend that natively supports it, and it follows the existing structure for adding translators, path processor factories, and schema translation.

A major example use case is using AI Gateway to route requests from Claude Code to multiple AI backends, such as locally hosted vLLM models with LoRA adapters.

NOTE: vLLM is only used for local testing, as I do not have access to compute. The goal of this PR is to support any OpenAI-compatible backend or service through an Anthropic interface.
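At a high level, the request-side translation the description refers to can be sketched as follows. This is a simplified illustration under stated assumptions: the struct shapes and the `translate` helper below are hypothetical, not the PR's actual apischema types, and the real translator also rewrites the path to /v1/chat/completions and handles tools, streaming, and stop-reason mapping.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical shapes -- the PR defines its own apischema types.
type anthropicMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type anthropicRequest struct {
	Model     string             `json:"model"`
	MaxTokens int                `json:"max_tokens"`
	Messages  []anthropicMessage `json:"messages"`
}

type openAIMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type openAIRequest struct {
	Model     string          `json:"model"`
	MaxTokens int             `json:"max_tokens"`
	Messages  []openAIMessage `json:"messages"`
}

// translate maps an Anthropic Messages request onto an OpenAI
// chat-completions request. Only the common fields are shown here;
// a real translator must also cover tools, system prompts, etc.
func translate(in anthropicRequest) openAIRequest {
	out := openAIRequest{Model: in.Model, MaxTokens: in.MaxTokens}
	for _, m := range in.Messages {
		out.Messages = append(out.Messages, openAIMessage(m))
	}
	return out
}

func main() {
	raw := `{"model":"Qwen/Qwen2.5-0.5B-Instruct","max_tokens":100,"messages":[{"role":"user","content":"Say hello!"}]}`
	var req anthropicRequest
	if err := json.Unmarshal([]byte(raw), &req); err != nil {
		panic(err)
	}
	b, _ := json.Marshal(translate(req))
	fmt.Println(string(b))
}
```

Because the two message shapes are structurally identical here, the per-message copy is a direct struct conversion; the cost of the whole translation is one decode and one encode, which is the "light/fast enough" claim above.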

Related Issues/PRs (if applicable)

Fixes #1372
Fixes #1867

Special notes for reviewers (if applicable)
Claude Code was used to write most of the tests, but they were verified. It would also be nice if the maintainers could review PR #1843, as some of the Anthropic apischema here can be updated once it is merged.

Functional Test Results

Test of the Anthropic endpoints against an OpenAI schema backend that natively supports them:

$ curl -v http://localhost:8080/v1/messages   -H "Content-Type: application/json"   -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [
      {"role": "user", "content": "Say hello!"}
    ],
    "max_tokens": 100
  }'
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> POST /v1/messages HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.5.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 143
> 
< HTTP/1.1 200 OK
< date: Fri, 20 Feb 2026 18:46:04 GMT
< server: uvicorn
< content-type: application/json
< content-length: 331
< 
* Connection #0 to host localhost left intact
{"id":"chatcmpl-36ec5b3d-4273-41e9-966b-ed742f7a93d1","type":"message","role":"assistant","content":[{"type":"text","text":"Hello! How can I assist you today?"}],"model":"Qwen/Qwen2.5-0.5B-Instruct","stop_reason":"end_turn","usage":{"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"input_tokens":32,"output_tokens":10}}
$ curl -v http://localhost:8080/anthropic/v1/messages   -H "Content-Type: application/json"   -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [
      {"role": "user", "content": "Say hello!"}
    ],
    "max_tokens": 100
  }'
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> POST /anthropic/v1/messages HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.5.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 143
> 
< HTTP/1.1 200 OK
< date: Fri, 20 Feb 2026 18:46:44 GMT
< server: uvicorn
< content-type: application/json
< content-length: 331
< 
* Connection #0 to host localhost left intact
{"id":"chatcmpl-f639ff32-4f89-48c5-b5b1-56878e641da6","type":"message","role":"assistant","content":[{"type":"text","text":"Hello! How can I assist you today?"}],"model":"Qwen/Qwen2.5-0.5B-Instruct","stop_reason":"end_turn","usage":{"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"input_tokens":32,"output_tokens":10}}

Port Forward logs

$ kubectl port-forward -n envoy-gateway-system svc/$ENVOY_SERVICE 8080:80
Forwarding from 127.0.0.1:8080 -> 10080
Forwarding from [::1]:8080 -> 10080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080

vLLM Logs (for both requests)

(APIServer pid=141923) INFO:     Started server process [141923]
(APIServer pid=141923) INFO:     Waiting for application startup.
(APIServer pid=141923) INFO:     Application startup complete.
(APIServer pid=141923) INFO:     172.18.0.2:46854 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=141923) INFO 02-20 13:46:05 [loggers.py:257] Engine 000: Avg prompt throughput: 3.2 tokens/s, Avg generation throughput: 1.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=141923) INFO 02-20 13:46:15 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=141923) INFO:     172.18.0.2:47216 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=141923) INFO 02-20 13:46:45 [loggers.py:257] Engine 000: Avg prompt throughput: 3.2 tokens/s, Avg generation throughput: 1.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 25.0%
(APIServer pid=141923) INFO 02-20 13:46:55 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 25.0%

@changminbark changminbark requested a review from a team as a code owner February 19, 2026 20:48
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 19, 2026
@changminbark changminbark marked this pull request as draft February 19, 2026 20:48
Signed-off-by: Chang Min <changminbark@gmail.com>
@changminbark changminbark force-pushed the anthropic-support-for-openai branch from b3b16e3 to 50907fb Compare February 19, 2026 20:52
@changminbark changminbark marked this pull request as ready for review February 20, 2026 19:06
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Feb 20, 2026
Signed-off-by: Chang Min <changminbark@gmail.com>
ehfd commented Feb 21, 2026

Is it possible to check whether tool calling works across various models? I think this would be the key potential bug point.

changminbark (Contributor Author)

Sure, what would be a simple setup that would help emulate this?

ehfd commented Feb 21, 2026

I think it would be the usage of Claude Code on several kinds of models (Kimi-K2, GLM-4.7/5, MiniMax-M2 series, Qwen3, etc.). It is apparent when it fails.

changminbark (Contributor Author)

@ehfd I do not have access to any compute or GPUs, so I'm not sure how to test this.

ehfd commented Feb 21, 2026

I think we can test it, we'll take a look.

changminbark (Contributor Author)

@ehfd Thank you! Let me know how it goes.

groundsada added a commit to groundsada/ai-gateway that referenced this pull request Feb 23, 2026
server.Register(path.Join(flags.rootPrefix, endpointPrefixes.Anthropic, "/v1/messages"), extproc.NewFactory(
messagesMetricsFactory, tracing.MessageTracer(), endpointspec.MessagesEndpointSpec{}))
// These are for OpenAI schema backends that support /v1/messages endpoint (no endpoint prefix as OpenAI prefix is '/')
server.Register(path.Join(flags.rootPrefix, endpointPrefixes.OpenAI, "/v1/messages"), extproc.NewFactory(
The root path / is intended for unified API endpoints, which work for heterogeneous backends by translating between API schemas. For /v1/messages we currently only support pass-through, hence the anthropic prefix; you could configure this root path on the client side to be compatible with the gateway path.

yuzisun (Contributor) commented Feb 23, 2026
Ah, I see you want Claude Code to work with models using the OpenAI API, but I think vLLM actually supports /v1/messages natively now.


#1867

Please follow through the conversations here.

changminbark (Contributor Author)

@yuzisun Yes, I discussed the implementation details in the tagged issue. Please take a look and let me know what you think.


Hi @changminbark and @yuzisun
I work with @ehfd, just tested the PR in our cluster with curl and claude code, works perfectly! Looks great!

gavrissh (Contributor) commented Feb 25, 2026

I believe the /v1/messages -> /v1/chat/completions translation will work with the existing registered endpoint, so the current change in this file is not required:

	server.Register(path.Join(flags.rootPrefix, endpointPrefixes.Anthropic, "/v1/messages"), extproc.NewFactory(
		messagesMetricsFactory, tracing.MessageTracer(), endpointspec.MessagesEndpointSpec{}))

https://github.com/envoyproxy/ai-gateway/pull/1878/changes#diff-33280abdb776271e54b6d3fea30758bc5ea3f458cdf3cce1b4d57e3be8caa853R310

The anthropic prefix can be set to an empty string (anthropic: "") in values.yaml if you don't need it in the access URL.

ehfd commented Feb 25, 2026

ai-gateway-extproc time=2026-02-14T07:39:50.432Z level=ERROR msg="error processing request message" error="rpc error: code = Internal desc = cannot set backend: failed to create translator for backend nrp-llm/envoy-ai-gateway-nrp-glm-4/route/envoy-ai-gateway-nrp-glm/rule/1/ref/0: /v1/messages endpoint only supports backends that return native Anthropic format (Anthropic, GCPAnthropic, AWSAnthropic). Backend OpenAI uses different model format"

At its core, we want to get rid of this and not have to duplicate APISchemaAnthropic, BackendSecurityPolicyAnthropicAPIKey, and AnthropicAPIKey with the OpenAI schema. So the correct implementation is a translator like those in https://github.com/envoyproxy/ai-gateway/tree/main/internal/translator?


Yes, adding an OpenAI translator for the existing anthropic /v1/messages endpoint is the correct approach.

changminbark (Contributor Author)

So I should get rid of registering the route with the OpenAI prefix?

ehfd commented Feb 25, 2026

dd9b9a9

Available here. I think this is resolved.

codecov-commenter commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 86.62420% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.30%. Comparing base (c07e7a8) to head (1be9091).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
internal/translator/openai_helper.go 86.68% 24 Missing and 21 partials ⚠️
internal/translator/anthropic_openai.go 86.15% 9 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1878      +/-   ##
==========================================
+ Coverage   84.20%   84.30%   +0.10%     
==========================================
  Files         126      128       +2     
  Lines       17075    17545     +470     
==========================================
+ Hits        14378    14792     +414     
- Misses       1803     1827      +24     
- Partials      894      926      +32     


nacx (Member) commented Feb 24, 2026

Thanks!
This PR will need some end-to-end tests that validate that the translation happens properly. The testupstream_test.go file contains many end-to-end tests that verify both the input request and the request sent to the upstream, confirming that translation and extproc behave properly e2e.

Can you please add all the relevant cases to that, so that we are confident this is properly tested e2e?

ehfd commented Feb 25, 2026

After further validation, the conversion from /v1/messages on the frontend to /v1/chat/completions on the backend doesn't work properly for tool calling in Claude Code. Almost no models work with tool calling.

Signed-off-by: Chang Min <changminbark@gmail.com>
changminbark (Contributor Author) commented Feb 25, 2026

@ehfd I just fixed a bug while creating e2e tests. Basically, some OpenAI SSE events were being passed back to the client by Envoy even though they had no Anthropic SSE equivalent. Do you mind trying it now?

Example of malformed streaming response (tool call) before:

Response body: event: message_start
data: {"type":"message_start","message":{"stop_sequence":null,"id":"chatcmpl-stream","type":"message","role":"assistant","content":[],"model":"gpt-4o","usage":{"input_tokens":0,"output_tokens":0},"stop_reason":null}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hi"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" there!"}}

data: {"id":"chatcmpl-stream","model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":null}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_sequence":null,"stop_reason":"end_turn"},"usage":{"output_tokens":3}}

event: message_stop
data: {"type":"message_stop"}

data: [DONE]

Example of correct streaming response (tool call) after:

Response body: event: message_start
data: {"type":"message_start","message":{"id":"chatcmpl-tool","type":"message","role":"assistant","content":[],"model":"gpt-4o","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":0,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"call_abc","name":"get_weather","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\"location\":"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"\"Paris\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use","stop_sequence":null},"usage":{"output_tokens":15}}

event: message_stop
data: {"type":"message_stop"}
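The fix amounts to suppressing upstream SSE frames that have no Anthropic equivalent, such as the raw chatcmpl chunk and the [DONE] sentinel visible in the malformed output. The sketch below is a rough illustration of that kind of filtering, not the PR's actual extproc code; it assumes frames arrive as individual `data:` lines and uses a crude string heuristic where a real implementation would parse the JSON.

```go
package main

import (
	"fmt"
	"strings"
)

// filterStream drops SSE data lines that have no Anthropic equivalent:
// raw OpenAI chunks (recognizable here by their "choices" field) and the
// OpenAI-only [DONE] sentinel. Illustrative only -- production code should
// decode each frame instead of substring-matching.
func filterStream(events []string) []string {
	var out []string
	for _, ev := range events {
		data := strings.TrimPrefix(ev, "data: ")
		if data == "[DONE]" || strings.Contains(data, `"choices"`) {
			continue // OpenAI-only frame: swallow instead of forwarding
		}
		out = append(out, ev)
	}
	return out
}

func main() {
	in := []string{
		`data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hi"}}`,
		`data: {"id":"chatcmpl-stream","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}`,
		`data: [DONE]`,
	}
	for _, ev := range filterStream(in) {
		fmt.Println(ev)
	}
}
```

With this kind of filtering in place, the client only ever sees well-formed Anthropic event pairs (event: line plus data: line), which is what the corrected stream above shows.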

ehfd commented Feb 25, 2026

@changminbark Looking into it, thanks.

ehfd commented Feb 25, 2026

With MiniMax-M2; this was only reproduced through the PR, not when calling vLLM directly.

⏺ Web Search("Super Bowl LX winner")
  ⎿  Did 0 searches in 2s                                                         
                                                          
⏺ The search isn't returning any results. This could be because:                  
  1. The web search tool isn't functioning properly                                           
  2. The information isn't available online               
                                                                                              
  Given that the system date is February 25, 2026, and Super Bowl 60 (LX) is typically played 
  in early February, it would have just happened recently. However, I don't have reliable     
  search results to provide an answer.                                                        
                  
  Let me be honest with the user about this limitation.


  I'm having trouble retrieving the search results. Based on today's date (February 25, 2026),
   the most recent Super Bowl would be Super Bowl LX (60), which is typically played in early
  February. However, I'm not getting search results to confirm the winner.

  Based on my knowledge, Super Bowl 60 (played February 2026) would have just concluded
  recently. Without working search results, I cannot provide definitive information on who
  won.

  Sources: None available due to search failure

✻ Cooked for 5m 1s

changminbark (Contributor Author)

@ehfd I'll try looking into this more, but do you have access to Claude Code's internal logs? They may be helpful for debugging this.

changminbark (Contributor Author) commented Feb 25, 2026

I tried doing this on my local machine.

vLLM

Directly calling /v1/messages using vLLM

$ curl http://localhost:8000/v1/messages   -H "Content-Type: application/json"   -d '{"model":"Qwen/Qwen3-0.6B","max_tokens":1024,"messages":[{"role":"user","content":"search the web for the year america was founded"}],"tools":[{"name":"search","description":"search the web","input_schema":{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]}}]}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1519  100  1231  100   288   1510    353 --:--:-- --:--:-- --:--:--  1863
{
  "id": "chatcmpl-b0753bfdda7893af",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "<think>\nOkay, the user is asking to search the web for the year America was founded. Let me think about how to approach this.\n\nFirst, I need to figure out the correct historical date. America was founded on July 4, 1776. But the user wants to know the year, not the date. So, the answer should be 1776. However, the user is asking for a web search, so I should use the search function provided.\n\nI should construct a query to search for \"year america was founded\" to get relevant information. The function requires a \"query\" parameter, so I'll input that. Let me make sure there are no typos. The function name is \"search\", and the parameters are a JSON object with \"query\". \n\nI think that's all. Just need to format the tool call correctly within the XML tags. Let me double-check the syntax to ensure it's valid JSON. Once confirmed, the tool call should return the result.\n</think>\n\n"
    },
    {
      "type": "tool_use",
      "id": "chatcmpl-tool-a02c6f995701b000",
      "name": "search",
      "input": {
        "query": "year america was founded"
      }
    }
  ],
  "model": "Qwen/Qwen3-0.6B",
  "stop_reason": "tool_use",
  "usage": {
    "input_tokens": 152,
    "output_tokens": 229
  }
}

Streaming

$ curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen3-0.6B","max_tokens":1024,"stream":true,"messages":[{"role":"user","content":"search the web for the year america was founded"}],"tools":[{"name":"search","description":"search the web","input_schema":{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]}}]}'
event: message_start
data: {"type":"message_start","message":{"id":"chatcmpl-854bb730eed1434f","content":[],"model":"Qwen/Qwen3-0.6B","usage":{"input_tokens":152,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","content_block":{"type":"text","text":""},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"<think>"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"\n"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Okay"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":","},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" the"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" user"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" wants"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" to"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" know"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" the"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" year"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" America"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" was"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" founded"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"."},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" Let"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" me"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" think"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"."},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" I"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" need"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" to"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" search"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" the"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" web"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" for"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" that"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" information"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"."},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" The"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" available"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" tool"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" is"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" the"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" '"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"search"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"'"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" function"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":","},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" which"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" takes"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" a"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" query"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" parameter"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"."},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" The"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" query"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" here"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" would"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" be"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" \""},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"year"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" america"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" was"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" founded"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"\"."},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" I"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" should"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" make"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" sure"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" to"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" use"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" the"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" correct"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" function"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" name"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" and"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" parameters"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"."},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" Let"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" me"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" construct"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" the"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" tool"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" call"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" properly"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":".\n"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"</think>"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"\n\n"},"index":0}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","content_block":{"type":"tool_use","id":"chatcmpl-tool-b5b2602282b39cb5","name":"search","input":{}},"index":1}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":"{\"query\": \""},"index":1}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":"year"},"index":1}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":" america"},"index":1}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":" was"},"index":1}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":" founded"},"index":1}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":"\"}"},"index":1}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"input_tokens":152,"output_tokens":101}}

event: message_stop
data: {"type":"message_stop"}

data: [DONE]

AI Gateway

$ curl http://localhost:8000/v1/messages   -H "Content-Type: application/json"   -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","max_tokens":1024,"messages":[{"role":"user","content":"search the web for the year america was founded"}],"tools":[{"name":"search","description":"search the web","input_schema":{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]}}]}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   636  100   337  100   299   2378   2109 --:--:-- --:--:-- --:--:--  4510
{
  "id": "chatcmpl-be92bf95eb4f5d6d",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": ""
    },
    {
      "type": "tool_use",
      "id": "chatcmpl-tool-8f41db8c125e33eb",
      "name": "search",
      "input": {
        "query": "year america was founded"
      }
    }
  ],
  "model": "Qwen/Qwen2.5-0.5B-Instruct",
  "stop_reason": "tool_use",
  "usage": {
    "input_tokens": 168,
    "output_tokens": 22
  }
}
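The non-streaming response above is the result of translating an OpenAI chat-completion choice into an Anthropic message. A minimal sketch of that mapping (in Python for illustration; the actual PR implements this in the gateway's Go translator, and the input dict below is a hypothetical OpenAI response shaped like the one this request would produce):

```python
import json

# Hypothetical OpenAI chat-completion choice, mirroring the tool call
# shown in the Anthropic-format response above.
openai_choice = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "id": "chatcmpl-tool-8f41db8c125e33eb",
            "type": "function",
            "function": {"name": "search",
                         "arguments": "{\"query\": \"year america was founded\"}"},
        }],
    },
    "finish_reason": "tool_calls",
}

content = []
msg = openai_choice["message"]
if msg.get("content"):
    content.append({"type": "text", "text": msg["content"]})
for tc in msg.get("tool_calls", []):
    content.append({
        "type": "tool_use",
        "id": tc["id"],
        "name": tc["function"]["name"],
        # OpenAI sends arguments as a JSON string; Anthropic expects an object.
        "input": json.loads(tc["function"]["arguments"]),
    })

# OpenAI finish_reason "tool_calls" maps to Anthropic stop_reason "tool_use".
stop_reason = {"tool_calls": "tool_use", "stop": "end_turn"}.get(
    openai_choice["finish_reason"], "end_turn")
print(stop_reason, content)
```

This sketch skips empty text blocks, whereas the response above includes one; the core mapping (JSON-string arguments decoded into an `input` object, `finish_reason` remapped) is the same.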
$ curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","max_tokens":1024,"stream":true,"messages":[{"role":"user","content":"search the web for the year america was founded"}],"tools":[{"name":"search","description":"search the web","input_schema":{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]}}]}'
event: message_start
data: {"type":"message_start","message":{"id":"chatcmpl-a2a79c35b0a88cfe","content":[],"model":"Qwen/Qwen2.5-0.5B-Instruct","usage":{"input_tokens":168,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","content_block":{"type":"tool_use","id":"chatcmpl-tool-b846494e0a70f02e","name":"search","input":{}},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":"{\"query\": \""},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":"America"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":" was"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":" founded"},"index":0}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"input_json_delta","partial_json":"\"}"},"index":0}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"input_tokens":168,"output_tokens":21}}

event: message_stop
data: {"type":"message_stop"}

data: [DONE]
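A client consumes the streaming tool call above by concatenating the `input_json_delta` fragments for a block index and parsing the result at `content_block_stop`. A minimal sketch, using event dicts copied from the stream shown above (this is illustrative client-side logic, not code from this PR):

```python
import json

# Events mirroring the SSE stream above, already decoded from their
# "data:" lines.
events = [
    {"type": "content_block_start",
     "content_block": {"type": "tool_use", "id": "chatcmpl-tool-b846494e0a70f02e",
                       "name": "search", "input": {}}, "index": 0},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": "{\"query\": \""}, "index": 0},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": "America"}, "index": 0},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": " was"}, "index": 0},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": " founded"}, "index": 0},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": "\"}"}, "index": 0},
    {"type": "content_block_stop", "index": 0},
]

blocks = {}  # index -> {"name": ..., "json": accumulated fragments}
for ev in events:
    if ev["type"] == "content_block_start" and ev["content_block"]["type"] == "tool_use":
        blocks[ev["index"]] = {"name": ev["content_block"]["name"], "json": ""}
    elif ev["type"] == "content_block_delta" and ev["delta"]["type"] == "input_json_delta":
        blocks[ev["index"]]["json"] += ev["delta"]["partial_json"]
    elif ev["type"] == "content_block_stop" and ev["index"] in blocks:
        # Only at stop is the accumulated string guaranteed to be valid JSON.
        blocks[ev["index"]]["input"] = json.loads(blocks[ev["index"]]["json"])

print(blocks[0]["name"], blocks[0]["input"])
# -> search {'query': 'America was founded'}
```

The key point is that individual `partial_json` fragments are not valid JSON on their own; only the concatenation at `content_block_stop` is.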

@changminbark
Contributor Author

changminbark commented Feb 25, 2026

@ehfd It seems that the format Envoy is outputting is identical to the format that vLLM is outputting, so I am not sure why your tool call is not working. Isn't Claude Code also executing the tool call in the screenshot that you showed? Doesn't that mean it is properly processing the tool call request from the model? Maybe it's a problem with Claude Code?

My thoughts: I think the web search tool in Claude Code requires a specific streaming SSE format, as seen below. However, other tool calls (like ones that you define yourself) should work as intended. You can see that the type of the content block in the SSE for the web search tool is server_tool_use instead of tool_use here.

Note the type being server_tool_use instead of tool_use, and the last content_block_start having type web_search_tool_result; compare this to the output for non-web-search tool usage here.

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"server_tool_use","id":"srvtoolu_014hJH82Qum7Td6UV8gDXThB","name":"web_search","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"query"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"\":"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":" \"weather"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":" NY"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"C to"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"day\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1 }

event: content_block_start
data: {"type":"content_block_start","index":2,"content_block":{"type":"web_search_tool_result","tool_use_id":"srvtoolu_014hJH82Qum7Td6UV8gDXThB","content":[{"type":"web_search_result","title":"Weather in New York City in May 2025 (New York) - detailed Weather Forecast for a month","url":"https://world-weather.info/forecast/usa/new_york/may-2025/","encrypted_content":"Ev0DCioIAxgCIiQ3NmU4ZmI4OC1k...","page_age":null},...]}}
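The distinction above matters because the client dispatches on the content block's `type`: `server_tool_use` blocks (and their `web_search_tool_result` follow-ups) are produced server-side by Anthropic models only, while plain `tool_use` blocks are handed back to the client to execute. A hypothetical sketch of that dispatch, not code from Claude Code or this PR:

```python
# Illustrative only: classify Anthropic content blocks the way a client
# like Claude Code conceptually distinguishes them.
def classify_block(content_block: dict) -> str:
    block_type = content_block.get("type")
    if block_type == "server_tool_use":
        # Executed on Anthropic's side (e.g. web_search); an OpenAI backend
        # never emits this type, which is why web search cannot work here.
        return "executed server-side by Anthropic"
    if block_type == "tool_use":
        return "returned to the client for local execution"
    return "other content"

print(classify_block({"type": "server_tool_use", "name": "web_search", "input": {}}))
print(classify_block({"type": "tool_use", "name": "search", "input": {}}))
```

Since a translated OpenAI backend can only ever emit `tool_use`, the `server_tool_use` path is unreachable, matching the behavior discussed below.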

@changminbark
Contributor Author

changminbark commented Feb 25, 2026

@ehfd It looks like the web search tool is an Anthropic-specific tool that only Claude models can generate as part of their output SSE, so even if the request from Claude Code says this tool is available, the OpenAI backend cannot generate the web-search-specific SSE response that Claude Code needs to perform the search. Claude Code can only conduct web searches with this specific tool defined by Anthropic. The way LiteLLM works around this is by intercepting a user-defined (not Anthropic-defined) web search tool and executing it server-side, as described here. Please correct me if my understanding is wrong.

@groundsada
Contributor

@changminbark Okay, I ran some tests. You're 100% right: this is a Claude Code-specific issue, and web search fails on both vLLM and Envoy. IMO, the PR looks great to merge. You can always ask CC to use a DDG MCP server and disable the web search tool, and it works flawlessly. One thing that is happening: on vLLM, web search fails and Claude Code moves on to different tools (fetch), but via Envoy it remains stuck in a loop of failing.

@ehfd

ehfd commented Feb 26, 2026

One thing that is happening: on vLLM, web search fails and Claude Code moves on to different tools (fetch), but via Envoy it remains stuck in a loop of failing.

This part does look worth addressing, though.

Thanks for all your efforts troubleshooting! @changminbark @groundsada

@changminbark
Contributor Author

One thing that is happening: on vLLM, web search fails and Claude Code moves on to different tools (fetch), but via Envoy it remains stuck in a loop of failing.

This part does look worth addressing, though.

It might be better to open up a new issue for this. I think this PR is getting quite big.

Thanks for all your efforts troubleshooting! @changminbark @groundsada

Thank you!

@ehfd

ehfd commented Feb 26, 2026

It might be better to open up a new issue for this. I think this PR is getting quite big.

OK, if possible we could get this merged first. Thank you!

@changminbark
Contributor Author

@ehfd #1843 still has to get merged first, because I introduced some updates to the Anthropic messages apischema there and need to validate the JSON marshaling/unmarshaling.

@ehfd

ehfd commented Feb 26, 2026

xref vllm-project/vllm#34887

Member

@nacx nacx left a comment

Thanks!

Overall LGTM. Just a couple minor comments and one comment about the test completeness.

@johnugeorge @yuzisun would you wanna do another review?

…ropic-openai-local.yaml

Signed-off-by: Chang Min <changminbark@gmail.com>
@changminbark changminbark requested a review from nacx February 26, 2026 23:41
@changminbark
Contributor Author

@nacx I think the failing e2e tests are not related to my changes. Please correct me if I'm wrong.

@nacx
Member

nacx commented Feb 28, 2026

/retest

@johnugeorge
Contributor

LGTM

Copy link
Contributor

@gavrissh gavrissh left a comment

lgtm

@nacx nacx enabled auto-merge (squash) March 2, 2026 17:01
@nacx nacx merged commit 4ad94b0 into envoyproxy:main Mar 2, 2026
34 checks passed

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.

8 participants