extproc: add cache writes#1719

Merged
yuzisun merged 20 commits into envoyproxy:main from aabchoo:aaron/cache-writes
Jan 3, 2026
Conversation

@aabchoo
Contributor

@aabchoo aabchoo commented Jan 2, 2026

Description

Anthropic cache writes are priced differently from cache reads, so the cost calculation should account for writes and reads separately. This adds a new cost type, and AWS is updated similarly.

https://platform.claude.com/docs/en/build-with-claude/prompt-caching
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_TokenUsage.html

Vertex AI and OpenAI do not report cache writes in their responses, so cache writes are set to 0 for those providers.

Changes
- Dynamic metadata now includes cache writes.
- Cache reads and writes are tracked separately.
- The usage returned to the user now includes cache-write tokens.

Updated tests to match -- hopefully I caught them all. Updated wherever I saw CachedInputTokens.
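As a rough sketch of why a separate cost type is needed, the following standalone Go snippet applies a distinct per-token rate to cache writes and cache reads. This is not the gateway's actual code: the struct layout, the `Rates` type, and the rate numbers are all illustrative, and it assumes cached tokens are counted inside `InputTokens` (OpenAI-style usage).

```go
package main

import "fmt"

// Usage mirrors the kind of token-usage fields this PR adds; the JSON names
// follow the review discussion (cache_creation_input_tokens). Illustrative only.
type Usage struct {
	InputTokens              int `json:"input_tokens"`
	OutputTokens             int `json:"output_tokens"`
	CachedTokens             int `json:"cached_tokens,omitzero"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens,omitzero"`
}

// Rates holds per-million-token prices. The numbers below are made up,
// not real provider pricing.
type Rates struct {
	Input, Output, CacheRead, CacheWrite float64
}

// cost applies a distinct rate to cache reads and cache writes, which is
// the point of the new cost type: writes are billed differently from reads.
func cost(u Usage, r Rates) float64 {
	plain := u.InputTokens - u.CachedTokens // uncached prompt tokens
	return (float64(plain)*r.Input +
		float64(u.CachedTokens)*r.CacheRead +
		float64(u.CacheCreationInputTokens)*r.CacheWrite +
		float64(u.OutputTokens)*r.Output) / 1e6
}

func main() {
	u := Usage{InputTokens: 1000, OutputTokens: 100, CachedTokens: 600, CacheCreationInputTokens: 200}
	r := Rates{Input: 3.0, Output: 15.0, CacheRead: 0.3, CacheWrite: 3.75}
	fmt.Printf("%.6f\n", cost(u, r))
}
```

With a single cached-token rate, the 200 cache-creation tokens above would be billed at the read rate and the total would come out too low; splitting the field lets each bucket carry its own price.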

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@aabchoo aabchoo marked this pull request as ready for review January 2, 2026 20:18
@aabchoo aabchoo requested a review from a team as a code owner January 2, 2026 20:18
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 2, 2026
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
// Cached tokens present in the prompt.
CachedTokens int `json:"cached_tokens,omitzero"`
// Tokens written to the cache.
CachedWriteTokens int `json:"cached_write_tokens,omitzero"`
Contributor
litellm named it cache_creation_input_tokens
https://docs.litellm.ai/docs/completion/prompt_caching

Contributor Author

AWS calls it cacheWriteInputTokens and Anthropic calls it cache_creation_input_tokens. Can opt to name it creation instead of write

Contributor
I think either cache_creation_input_tokens or cache_write_input_tokens is fine.

Contributor Author
Updated to use "cache creation" everywhere.
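To illustrate the normalization this thread settles on, here is a hypothetical standalone Go sketch of a translator layer mapping each provider's field into one internal name. Only the provider JSON keys (`cacheWriteInputTokens` for AWS Bedrock, `cache_creation_input_tokens` for Anthropic) come from the discussion; the struct and function names are invented for illustration and do not match the gateway's actual translator types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicUsage holds the Anthropic-style usage keys.
type anthropicUsage struct {
	InputTokens              int `json:"input_tokens"`
	CacheReadInputTokens     int `json:"cache_read_input_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens"`
}

// bedrockUsage holds the AWS Bedrock-style usage keys.
type bedrockUsage struct {
	InputTokens           int `json:"inputTokens"`
	CacheReadInputTokens  int `json:"cacheReadInputTokens"`
	CacheWriteInputTokens int `json:"cacheWriteInputTokens"`
}

// unified is a normalized view: one field for reads, one for writes,
// regardless of what the provider called them.
type unified struct {
	Cached        int
	CacheCreation int
}

func fromAnthropic(raw []byte) (unified, error) {
	var u anthropicUsage
	if err := json.Unmarshal(raw, &u); err != nil {
		return unified{}, err
	}
	return unified{Cached: u.CacheReadInputTokens, CacheCreation: u.CacheCreationInputTokens}, nil
}

func fromBedrock(raw []byte) (unified, error) {
	var u bedrockUsage
	if err := json.Unmarshal(raw, &u); err != nil {
		return unified{}, err
	}
	return unified{Cached: u.CacheReadInputTokens, CacheCreation: u.CacheWriteInputTokens}, nil
}

func main() {
	a, _ := fromAnthropic([]byte(`{"input_tokens":9,"cache_read_input_tokens":5,"cache_creation_input_tokens":7}`))
	b, _ := fromBedrock([]byte(`{"inputTokens":9,"cacheReadInputTokens":5,"cacheWriteInputTokens":7}`))
	fmt.Println(a.CacheCreation, b.CacheCreation)
}
```

Both providers end up in the same normalized field, so downstream cost metadata only has to know about one "cache creation" name.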

aabchoo added 14 commits January 2, 2026 15:27
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@codecov-commenter

codecov-commenter commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.04%. Comparing base (51478b0) to head (e48c555).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/translator/openai_awsbedrock.go | 69.23% | 2 Missing and 2 partials ⚠️ |
| internal/translator/openai_completions.go | 0.00% | 2 Missing ⚠️ |
| internal/translator/openai_openai.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1719      +/-   ##
==========================================
+ Coverage   81.01%   81.04%   +0.03%     
==========================================
  Files         147      147              
  Lines       13288    13341      +53     
==========================================
+ Hits        10765    10812      +47     
- Misses       1872     1876       +4     
- Partials      651      653       +2     


translator := NewAnthropicToAnthropicTranslator("", "")
require.NotNil(t, translator)
const responseBody = `{"model":"claude-sonnet-4-5-20250929","id":"msg_01J5gW6Sffiem6avXSAooZZw","type":"message","role":"assistant","content":[{"type":"text","text":"Hi! 👋 How can I help you today?"}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":9,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":0},"output_tokens":16,"service_tier":"standard"}}`
const responseBody = `{"model":"claude-sonnet-4-5-20250929","id":"msg_01J5gW6Sffiem6avXSAooZZw","type":"message","role":"assistant","content":[{"type":"text","text":"Hi! 👋 How can I help you today?"}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":9,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"cached_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":0},"output_tokens":16,"service_tier":"standard"}}`
Contributor

// Cached tokens present in the prompt.
CachedTokens int `json:"cached_tokens,omitzero"`
// Tokens written to the cache.
CachedCreationTokens int `json:"cached_creation_input_tokens,omitzero"`
Contributor

I think you are trying to align with the name cached_tokens but cache_creation_input_tokens reads better.

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@yuzisun yuzisun merged commit f92f8f4 into envoyproxy:main Jan 3, 2026
51 of 53 checks passed
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Jan 29, 2026
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Feb 2, 2026
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Feb 5, 2026