feat: add InvokeModel API support for claude models in aws bedrock#1648
Merged
yuzisun merged 16 commits into envoyproxy:main (Feb 7, 2026)
Conversation
Codecov Report: additional details and impacted files:
@@ Coverage Diff @@
## main #1648 +/- ##
========================================
Coverage 83.48% 83.48%
========================================
Files 123 124 +1
Lines 16335 16438 +103
========================================
+ Hits 13637 13723 +86
+ Misses 1818 1814 -4
- Partials 880 901 +21
Contributor
@hustxiayang can you help resolve the conflicts?
Force-pushed from 467446a to 7d4ec3f
yuzisun reviewed Feb 1, 2026
Force-pushed from cd4cf90 to e5b0450
…nvoyproxy#1607) **Description** This PR fixes the following issues:
1. Reasoning content in the request was also missing for GCP Anthropic.
2. The reasoning output of GCP Anthropic was not parsed out.
With these fixes, reasoning Claude models share a unified interface.
Other fixes: the assistant message handling for GCP Anthropic did not cover the array case. Fixed.
---------
Signed-off-by: yxia216 <yxia216@bloomberg.net>
**Description**
This replaces encoding/json with bytedance/sonic for faster json
operations. This makes the data plane benchmark results drastically
better, especially for the translation code path. One thing to note is
that we could have used goccy/go-json instead, but it comes with some
incompatibilities (no omitzero tag support, etc.). On the other hand,
bytedance/sonic is 99% compatible with the current behavior except for
the struct field order in the output, which semantically doesn't matter
in practice.
```
goos: linux
goarch: amd64
pkg: github.com/envoyproxy/ai-gateway/tests/data-plane
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
ChatCompletions/OpenAI_-_small-16 135.0µ ± 1% 107.7µ ± 2% -20.24% (p=0.000 n=10)
ChatCompletions/OpenAI_-_medium-16 2.546m ± 1% 1.456m ± 1% -42.83% (p=0.000 n=10)
ChatCompletions/OpenAI_-_large-16 28.50m ± 7% 17.41m ± 4% -38.91% (p=0.000 n=10)
ChatCompletions/OpenAI_-_xlarge-16 141.24m ± 3% 71.07m ± 7% -49.68% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_small-16 155.5µ ± 1% 120.4µ ± 1% -22.57% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_medium-16 3.475m ± 1% 2.004m ± 1% -42.33% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_large-16 37.01m ± 2% 24.08m ± 3% -34.93% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_xlarge-16 346.1m ± 3% 100.2m ± 5% -71.04% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_small-16 172.3µ ± 1% 141.9µ ± 2% -17.63% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_medium-16 4.371m ± 1% 2.827m ± 1% -35.31% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_large-16 51.21m ± 3% 32.67m ± 3% -36.19% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_xlarge-16 344.9m ± 2% 102.3m ± 2% -70.33% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_small-16 232.4µ ± 1% 198.4µ ± 1% -14.64% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_medium-16 9.098m ± 1% 7.513m ± 2% -17.43% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_large-16 102.14m ± 4% 83.27m ± 3% -18.48% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_xlarge-16 2.014 ± 3% 1.450 ± 2% -27.99% (p=0.000 n=10)
ChatCompletionsStreaming/OpenAI_Streaming-16 13.11m ± 0% 13.12m ± 1% ~ (p=0.190 n=10)
ChatCompletionsStreaming/AWS_Streaming-16 13.11m ± 0% 13.10m ± 0% ~ (p=0.353 n=10)
geomean 11.33m 7.424m -34.50%
```
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
**Description** Anthropic cache writes cost differently from cache reads, so the cost calculation is updated to account for writes vs reads by adding a new cost type. Updated similarly for AWS. Vertex AI and OpenAI do not report cache writes in their responses, so cache writes are set to 0 for them.
https://platform.claude.com/docs/en/build-with-claude/prompt-caching
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_TokenUsage.html
**Changes** Dynamic metadata now includes cache writes, separating cache reads from cache writes, and the user is returned the new cached-writes usage field. Tests were updated to match, as was every place CachedInputTokens appeared.
---------
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
…hing (envoyproxy#1721) **Description** Include cache-creation and cache-hit tokens in total input tokens, while keeping separate fields for cache miss/hit accounting. This unifies the usage response to the user for both implicit and explicit caching, since the input tokens reported for GPT and Gemini already include the cache tokens.
---------
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
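The accounting described in the two commits above can be sketched as follows. The `usage` struct and rates are hypothetical illustrations of the idea, not the PR's actual types: cache reads and writes carry distinct per-token rates, and the total input count folds both back in to match GPT/Gemini reporting conventions.

```go
package main

import "fmt"

// usage is a hypothetical mirror of an Anthropic-style usage payload:
// InputTokens counts only cache misses, while cache reads and writes
// are reported separately.
type usage struct {
	InputTokens         int // cache-miss input tokens
	CacheReadTokens     int // prompt-cache hits (cheaper)
	CacheCreationTokens int // prompt-cache writes (more expensive)
}

// totalInput unifies the report with GPT/Gemini conventions, where
// reported input tokens already include cached tokens.
func totalInput(u usage) int {
	return u.InputTokens + u.CacheReadTokens + u.CacheCreationTokens
}

// cost applies distinct per-token rates for cache reads vs writes,
// the distinction the commit adds; the rates here are made-up examples.
func cost(u usage, base, readRate, writeRate float64) float64 {
	return float64(u.InputTokens)*base +
		float64(u.CacheReadTokens)*readRate +
		float64(u.CacheCreationTokens)*writeRate
}

func main() {
	u := usage{InputTokens: 100, CacheReadTokens: 400, CacheCreationTokens: 50}
	fmt.Println(totalInput(u)) // 550
	fmt.Printf("%.2f\n", cost(u, 1.0, 0.1, 1.25)) // 100 + 40 + 62.5 = 202.50
}
```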
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Force-pushed from 3f4ea09 to cc439e2
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Force-pushed from cc439e2 to 0b36399
Contributor (Author)
/retest
yuzisun
reviewed
Feb 5, 2026
internal/filterapi/filterconfig.go (Outdated)
// Used for Gemini models hosted on Google Cloud Vertex AI.
APISchemaGCPVertexAI APISchemaName = "GCPVertexAI"
// APISchemaGCPAnthropic represents the Google Cloud Anthropic API schema.
// APISchemaGCPAnthropic represents the schema from OpenAI API to Google Cloud Anthropic API.
Contributor
This is not correct: it is the same schema when the Messages API is used directly, not always translated from the OpenAI API.
Force-pushed from fca0997 to bfdcb40
Contributor (Author)
/retest

1 similar comment

Contributor (Author)
/retest
yuzisun
approved these changes
Feb 7, 2026
changminbark pushed a commit to changminbark/ai-gateway that referenced this pull request on Feb 9, 2026:
…nvoyproxy#1648) **Description** Add InvokeModel API support for Claude models in AWS Bedrock. The motivation is to provide consistent services across providers; see envoyproxy#1644 for more details. Other changes: common code related to Anthropic is moved into `anthropic_helper.go` so that both AWS and GCP can share it.
---------
Signed-off-by: yxia216 <yxia216@bloomberg.net>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Co-authored-by: Aaron Choo <achoo30@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Description
Add InvokeModel API support for Claude models in AWS Bedrock. The motivation is to provide consistent services across providers; see #1644 for more details.
Other changes:
Common code related to Anthropic is moved into `anthropic_helper.go`, so that both AWS and GCP can share it.
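For context on what an InvokeModel call for a Claude model looks like, the sketch below builds the request path and body with only the standard library. It reflects the public Bedrock API shape (the model ID lives in the URI path `/model/{modelId}/invoke`, and the body requires `anthropic_version`), but the `anthropicBody` and `invokeModelRequest` names and the model ID are illustrative, not taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicBody is a minimal sketch of the Claude Messages payload that
// Bedrock's InvokeModel accepts. Unlike the native Anthropic API, the
// model is not in the body, and anthropic_version is required.
type anthropicBody struct {
	AnthropicVersion string    `json:"anthropic_version"`
	MaxTokens        int       `json:"max_tokens"`
	Messages         []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// invokeModelRequest returns the URI path and JSON body for a
// non-streaming InvokeModel call; the modelID value is illustrative.
func invokeModelRequest(modelID string, msgs []message) (path string, body []byte, err error) {
	path = "/model/" + modelID + "/invoke"
	body, err = json.Marshal(anthropicBody{
		AnthropicVersion: "bedrock-2023-05-31",
		MaxTokens:        1024,
		Messages:         msgs,
	})
	return path, body, err
}

func main() {
	p, b, _ := invokeModelRequest("anthropic.claude-3-5-sonnet-20240620-v1:0",
		[]message{{Role: "user", Content: "hello"}})
	fmt.Println(p)
	fmt.Println(string(b))
}
```

Sharing the body-construction and response-parsing halves of this shape between the AWS and GCP translators is what motivates the `anthropic_helper.go` extraction: both providers speak the same Messages dialect and differ mainly in the envelope.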