
feat: add InvokeModel API support for claude models in aws bedrock #1648

Merged
yuzisun merged 16 commits into envoyproxy:main from hustxiayang:aws-anthropic
Feb 7, 2026

Conversation

@hustxiayang
Contributor

@hustxiayang hustxiayang commented Dec 11, 2025

Description
Add InvokeModel API support for Claude models in AWS Bedrock. The motivation is to provide consistent service across providers; see #1644 for more details.

Other Changes:
I moved the common Anthropic-related code into anthropic_helper.go so that the AWS and GCP translators can share it.
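For readers skimming the thread, here is a minimal sketch of the sharing idea, not the actual contents of anthropic_helper.go. All type and function names below are hypothetical. It assumes both providers accept the same Messages-style body and differ mainly in the `anthropic_version` string and the URL path, with the model carried in the path rather than the body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Version strings each provider expects in the request body.
const (
	bedrockAnthropicVersion = "bedrock-2023-05-31"
	vertexAnthropicVersion  = "vertex-2023-10-16"
)

// anthropicMessage and anthropicRequest are simplified, hypothetical types;
// the real helper covers many more fields (tools, system prompt, thinking).
type anthropicMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type anthropicRequest struct {
	AnthropicVersion string             `json:"anthropic_version"`
	MaxTokens        int                `json:"max_tokens"`
	Messages         []anthropicMessage `json:"messages"`
}

// buildAnthropicBody is the shared part: the caller passes only the
// provider-specific version string.
func buildAnthropicBody(version string, maxTokens int, msgs []anthropicMessage) ([]byte, error) {
	return json.Marshal(anthropicRequest{
		AnthropicVersion: version,
		MaxTokens:        maxTokens,
		Messages:         msgs,
	})
}

// awsInvokeModelPath returns the Bedrock InvokeModel path for a model ID.
func awsInvokeModelPath(modelID string) string {
	return "/model/" + url.PathEscape(modelID) + "/invoke"
}

func main() {
	msgs := []anthropicMessage{{Role: "user", Content: "hi"}}
	for _, v := range []string{bedrockAnthropicVersion, vertexAnthropicVersion} {
		body, err := buildAnthropicBody(v, 16, msgs)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(body))
	}
	fmt.Println(awsInvokeModelPath("my-model"))
}
```

Keeping the body construction in one place means a fix to message or reasoning handling applies to both translators at once.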

@hustxiayang hustxiayang requested a review from a team as a code owner December 11, 2025 18:15
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Dec 11, 2025
@hustxiayang hustxiayang marked this pull request as draft December 11, 2025 18:16
@codecov-commenter

codecov-commenter commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 84.15366% with 132 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.48%. Comparing base (d851c0c) to head (0bc0bec).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/translator/anthropic_helper.go | 85.49% | 74 Missing and 29 partials ⚠️ |
| internal/translator/openai_awsanthropic.go | 77.58% | 14 Missing and 12 partials ⚠️ |
| internal/translator/openai_gcpanthropic.go | 40.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1648    +/-   ##
========================================
  Coverage   83.48%   83.48%            
========================================
  Files         123      124     +1     
  Lines       16335    16438   +103     
========================================
+ Hits        13637    13723    +86     
+ Misses       1818     1814     -4     
- Partials      880      901    +21     

@hustxiayang hustxiayang changed the title feat: Add InvokeModel API support for claude models in aws bedrock feat: add InvokeModel API support for claude models in aws bedrock Dec 11, 2025
@yuzisun
Contributor

yuzisun commented Jan 20, 2026

@hustxiayang can you help resolve the conflicts?

@hustxiayang hustxiayang force-pushed the aws-anthropic branch 3 times, most recently from 467446a to 7d4ec3f Compare January 29, 2026 22:50
@hustxiayang hustxiayang marked this pull request as ready for review January 29, 2026 22:53
@hustxiayang hustxiayang force-pushed the aws-anthropic branch 2 times, most recently from cd4cf90 to e5b0450 Compare February 2, 2026 16:19
hustxiayang and others added 12 commits February 5, 2026 11:22
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
…nvoyproxy#1607)

**Description**
This PR fixes the following issues:
1. Reasoning content was not added to the request for GCP Anthropic.
2. The reasoning output of GCP Anthropic was not parsed out.

With these fixes, reasoning Claude models have a unified interface.

Other issues:
1. The assistant message handling for GCP Anthropic did not cover the
array case. Fixed.

---------

Signed-off-by: yxia216 <yxia216@bloomberg.net>
**Description**

This replaces encoding/json with bytedance/sonic for faster JSON
operations. The data plane benchmark results improve drastically,
especially on the translation code path. We could use goccy/go-json
instead, but it comes with some incompatibilities (no omitzero tag
support, etc.). bytedance/sonic, on the other hand, is 99% compatible
with the current behavior except for field order, which doesn't matter
semantically in practice.

```
goos: linux
goarch: amd64
pkg: github.com/envoyproxy/ai-gateway/tests/data-plane
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
                                             │   old.txt    │               new.txt               │
                                             │    sec/op    │   sec/op     vs base                │
ChatCompletions/OpenAI_-_small-16               135.0µ ± 1%   107.7µ ± 2%  -20.24% (p=0.000 n=10)
ChatCompletions/OpenAI_-_medium-16              2.546m ± 1%   1.456m ± 1%  -42.83% (p=0.000 n=10)
ChatCompletions/OpenAI_-_large-16               28.50m ± 7%   17.41m ± 4%  -38.91% (p=0.000 n=10)
ChatCompletions/OpenAI_-_xlarge-16             141.24m ± 3%   71.07m ± 7%  -49.68% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_small-16          155.5µ ± 1%   120.4µ ± 1%  -22.57% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_medium-16         3.475m ± 1%   2.004m ± 1%  -42.33% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_large-16          37.01m ± 2%   24.08m ± 3%  -34.93% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_xlarge-16         346.1m ± 3%   100.2m ± 5%  -71.04% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_small-16         172.3µ ± 1%   141.9µ ± 2%  -17.63% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_medium-16        4.371m ± 1%   2.827m ± 1%  -35.31% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_large-16         51.21m ± 3%   32.67m ± 3%  -36.19% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_xlarge-16        344.9m ± 2%   102.3m ± 2%  -70.33% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_small-16      232.4µ ± 1%   198.4µ ± 1%  -14.64% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_medium-16     9.098m ± 1%   7.513m ± 2%  -17.43% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_large-16     102.14m ± 4%   83.27m ± 3%  -18.48% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_xlarge-16      2.014 ± 3%    1.450 ± 2%  -27.99% (p=0.000 n=10)
ChatCompletionsStreaming/OpenAI_Streaming-16    13.11m ± 0%   13.12m ± 1%        ~ (p=0.190 n=10)
ChatCompletionsStreaming/AWS_Streaming-16       13.11m ± 0%   13.10m ± 0%        ~ (p=0.353 n=10)
geomean                                         11.33m        7.424m       -34.50%
```
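On the field-order point above: a small sketch, using only the standard library, of how a test could compare JSON output semantically instead of byte-for-byte, so that switching encoders does not break assertions. The helper name `jsonEqual` is hypothetical and not taken from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual reports whether two JSON documents decode to the same value,
// ignoring object key order and insignificant whitespace.
func jsonEqual(a, b []byte) (bool, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	x := []byte(`{"model":"m","max_tokens":8}`)
	y := []byte(`{"max_tokens":8,"model":"m"}`)
	ok, err := jsonEqual(x, y)
	fmt.Println(ok, err) // prints "true <nil>"
}
```

Array order is still significant in this comparison; only object key order is ignored.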

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
**Description**

Anthropic prices cache writes differently from cache reads, so the cost
calculation should account for writes and reads separately. This adds a
new cost type for cache writes, with the same update for AWS.

https://platform.claude.com/docs/en/build-with-claude/prompt-caching

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_TokenUsage.html

Vertex AI and OpenAI themselves do not report cache writes in the
response, so cache writes will be set to 0 for them.

**Changes**
Dynamic metadata now includes cache writes.
Cache reads and cache writes are tracked separately.
The usage returned to the user now includes cache writes.

Updated tests to match -- hopefully I caught them all. Updated wherever
I saw CachedInputTokens.
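
To illustrate why a separate cost type is needed, here is a minimal sketch of a cost calculation that prices cache reads and cache writes independently. The struct names, field names, and prices are hypothetical and do not reflect the gateway's actual cost API or any provider's real pricing.

```go
package main

import "fmt"

// tokenUsage splits input tokens by how they were served (hypothetical type).
type tokenUsage struct {
	UncachedInputTokens uint64 // input tokens neither read from nor written to cache
	CacheReadTokens     uint64 // input tokens served from an existing cache entry
	CacheWriteTokens    uint64 // input tokens written into a new cache entry
	OutputTokens        uint64
}

// pricing holds a per-token price for each category in arbitrary integer
// units, which avoids floating-point rounding in the sketch.
type pricing struct {
	Input      uint64
	CacheRead  uint64
	CacheWrite uint64
	Output     uint64
}

// cost sums each token category at its own rate.
func cost(u tokenUsage, p pricing) uint64 {
	return u.UncachedInputTokens*p.Input +
		u.CacheReadTokens*p.CacheRead +
		u.CacheWriteTokens*p.CacheWrite +
		u.OutputTokens*p.Output
}

func main() {
	u := tokenUsage{UncachedInputTokens: 100, CacheReadTokens: 50, CacheWriteTokens: 20, OutputTokens: 10}
	p := pricing{Input: 10, CacheRead: 1, CacheWrite: 12, Output: 50} // made-up prices
	fmt.Println(cost(u, p)) // prints 1790
}
```

For providers that never report cache writes, `CacheWriteTokens` stays 0 and the term drops out.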

---------

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
…hing (envoyproxy#1721)

**Description**
Include cache creation and cache hit tokens in the total input tokens,
while keeping separate fields for cache miss/hit accounting. This unifies
the usage returned to the user for both implicit and explicit caching,
since the input tokens reported for GPT and Gemini already include the
cache tokens.
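
A minimal sketch of that normalization, assuming an Anthropic-style usage block whose `input_tokens` excludes cache tokens. The JSON field names follow Anthropic's usage object; the Go type and function names are hypothetical.

```go
package main

import "fmt"

// anthropicUsage mirrors the usage block of an Anthropic response, where
// input_tokens does not include cache creation or cache read tokens.
type anthropicUsage struct {
	InputTokens              int `json:"input_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens"`
	CacheReadInputTokens     int `json:"cache_read_input_tokens"`
	OutputTokens             int `json:"output_tokens"`
}

// unifiedUsage is a hypothetical provider-neutral shape in which
// InputTokens includes the cache tokens.
type unifiedUsage struct {
	InputTokens        int // uncached + cache writes + cache reads
	CachedInputTokens  int // cache reads (hits)
	CacheWriteTokens   int // cache creation
	OutputTokens       int
	TotalTokens        int
}

// unify folds the cache tokens into the input total and keeps the
// separate read/write counters for accounting.
func unify(u anthropicUsage) unifiedUsage {
	input := u.InputTokens + u.CacheCreationInputTokens + u.CacheReadInputTokens
	return unifiedUsage{
		InputTokens:       input,
		CachedInputTokens: u.CacheReadInputTokens,
		CacheWriteTokens:  u.CacheCreationInputTokens,
		OutputTokens:      u.OutputTokens,
		TotalTokens:       input + u.OutputTokens,
	}
}

func main() {
	got := unify(anthropicUsage{InputTokens: 10, CacheCreationInputTokens: 20, CacheReadInputTokens: 30, OutputTokens: 5})
	fmt.Printf("%+v\n", got)
}
```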

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
@hustxiayang
Contributor Author

/retest

// Used for Gemini models hosted on Google Cloud Vertex AI.
APISchemaGCPVertexAI APISchemaName = "GCPVertexAI"
// APISchemaGCPAnthropic represents the Google Cloud Anthropic API schema.
// APISchemaGCPAnthropic represents the schema from OpenAI API to Google Cloud Anthropic API.
Contributor

This is not correct. It is the same schema when you use the Messages API directly; it is not always translated from the OpenAI API.

Contributor Author

updated

Signed-off-by: yxia216 <yxia216@bloomberg.net>
@hustxiayang
Contributor Author

/retest

@yuzisun yuzisun merged commit 4347d17 into envoyproxy:main Feb 7, 2026
34 checks passed
changminbark pushed a commit to changminbark/ai-gateway that referenced this pull request Feb 9, 2026
…nvoyproxy#1648)

**Description**
Add InvokeModel API support for Claude models in AWS Bedrock. The
motivation is to provide consistent service across providers; see
envoyproxy#1644 for more details.

Other Changes:
I moved the common Anthropic-related code into `anthropic_helper.go` so
that the AWS and GCP translators can share it.

---------

Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Co-authored-by: Aaron Choo <achoo30@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

5 participants