extproc: add cache writes#1719

Merged
yuzisun merged 20 commits into envoyproxy:main from aabchoo:aaron/cache-writes
Jan 3, 2026
Conversation

@aabchoo
Contributor

@aabchoo aabchoo commented Jan 2, 2026

Description

Anthropic cache writes are priced differently from cache reads, so the cost calculation should account for writes and reads separately. This adds a new cost type, and AWS is updated similarly.

https://platform.claude.com/docs/en/build-with-claude/prompt-caching
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_TokenUsage.html

Vertex AI and OpenAI do not report cache writes in their responses, so cache writes are set to 0 for those providers.

Changes
- Dynamic metadata now includes cache writes.
- Cache reads and writes are tracked separately.
- The usage returned to the user now includes cache-write tokens.

Updated tests to match -- hopefully I caught them all. Updated wherever I saw CachedInputTokens.
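As a rough sketch of why a separate cost type is needed, the following standalone Go snippet applies a distinct per-token rate to cache writes and cache reads. This is not the gateway's actual code: the struct layout, the `Rates` type, and the rate numbers are all illustrative, and it assumes cached tokens are counted inside `InputTokens` (OpenAI-style usage).

```go
package main

import "fmt"

// Usage mirrors the kind of token-usage fields this PR adds; the JSON names
// follow the review discussion (cache_creation_input_tokens). Illustrative only.
type Usage struct {
	InputTokens              int `json:"input_tokens"`
	OutputTokens             int `json:"output_tokens"`
	CachedTokens             int `json:"cached_tokens,omitzero"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens,omitzero"`
}

// Rates holds per-million-token prices. The numbers below are made up,
// not real provider pricing.
type Rates struct {
	Input, Output, CacheRead, CacheWrite float64
}

// cost applies a distinct rate to cache reads and cache writes, which is
// the point of the new cost type: writes are billed differently from reads.
func cost(u Usage, r Rates) float64 {
	plain := u.InputTokens - u.CachedTokens // uncached prompt tokens
	return (float64(plain)*r.Input +
		float64(u.CachedTokens)*r.CacheRead +
		float64(u.CacheCreationInputTokens)*r.CacheWrite +
		float64(u.OutputTokens)*r.Output) / 1e6
}

func main() {
	u := Usage{InputTokens: 1000, OutputTokens: 100, CachedTokens: 600, CacheCreationInputTokens: 200}
	r := Rates{Input: 3.0, Output: 15.0, CacheRead: 0.3, CacheWrite: 3.75}
	fmt.Printf("%.6f\n", cost(u, r))
}
```

With a single cached-token rate, the 200 cache-creation tokens above would be billed at the read rate and the total would come out too low; splitting the field lets each bucket carry its own price.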

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@aabchoo aabchoo marked this pull request as ready for review January 2, 2026 20:18
@aabchoo aabchoo requested a review from a team as a code owner January 2, 2026 20:18
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 2, 2026
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
// Cached tokens present in the prompt.
CachedTokens int `json:"cached_tokens,omitzero"`
// Tokens written to the cache.
CachedWriteTokens int `json:"cached_write_tokens,omitzero"`
Contributor
litellm named it cache_creation_input_tokens
https://docs.litellm.ai/docs/completion/prompt_caching

Contributor Author

AWS calls it cacheWriteInputTokens and Anthropic calls it cache_creation_input_tokens. Can opt to name it creation instead of write

Contributor
I think either cache_creation_input_tokens or cache_write_input_tokens is fine.

Contributor Author
Updated to use "cache creation" everywhere.
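To illustrate the normalization this thread settles on, here is a hypothetical standalone Go sketch of a translator layer mapping each provider's field into one internal name. Only the provider JSON keys (`cacheWriteInputTokens` for AWS Bedrock, `cache_creation_input_tokens` for Anthropic) come from the discussion; the struct and function names are invented for illustration and do not match the gateway's actual translator types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicUsage holds the Anthropic-style usage keys.
type anthropicUsage struct {
	InputTokens              int `json:"input_tokens"`
	CacheReadInputTokens     int `json:"cache_read_input_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens"`
}

// bedrockUsage holds the AWS Bedrock-style usage keys.
type bedrockUsage struct {
	InputTokens           int `json:"inputTokens"`
	CacheReadInputTokens  int `json:"cacheReadInputTokens"`
	CacheWriteInputTokens int `json:"cacheWriteInputTokens"`
}

// unified is a normalized view: one field for reads, one for writes,
// regardless of what the provider called them.
type unified struct {
	Cached        int
	CacheCreation int
}

func fromAnthropic(raw []byte) (unified, error) {
	var u anthropicUsage
	if err := json.Unmarshal(raw, &u); err != nil {
		return unified{}, err
	}
	return unified{Cached: u.CacheReadInputTokens, CacheCreation: u.CacheCreationInputTokens}, nil
}

func fromBedrock(raw []byte) (unified, error) {
	var u bedrockUsage
	if err := json.Unmarshal(raw, &u); err != nil {
		return unified{}, err
	}
	return unified{Cached: u.CacheReadInputTokens, CacheCreation: u.CacheWriteInputTokens}, nil
}

func main() {
	a, _ := fromAnthropic([]byte(`{"input_tokens":9,"cache_read_input_tokens":5,"cache_creation_input_tokens":7}`))
	b, _ := fromBedrock([]byte(`{"inputTokens":9,"cacheReadInputTokens":5,"cacheWriteInputTokens":7}`))
	fmt.Println(a.CacheCreation, b.CacheCreation)
}
```

Both providers end up in the same normalized field, so downstream cost metadata only has to know about one "cache creation" name.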

aabchoo added 14 commits January 2, 2026 15:27
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@codecov-commenter

codecov-commenter commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.04%. Comparing base (51478b0) to head (e48c555).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/translator/openai_awsbedrock.go | 69.23% | 2 Missing and 2 partials ⚠️ |
| internal/translator/openai_completions.go | 0.00% | 2 Missing ⚠️ |
| internal/translator/openai_openai.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1719      +/-   ##
==========================================
+ Coverage   81.01%   81.04%   +0.03%     
==========================================
  Files         147      147              
  Lines       13288    13341      +53     
==========================================
+ Hits        10765    10812      +47     
- Misses       1872     1876       +4     
- Partials      651      653       +2     


translator := NewAnthropicToAnthropicTranslator("", "")
require.NotNil(t, translator)
const responseBody = `{"model":"claude-sonnet-4-5-20250929","id":"msg_01J5gW6Sffiem6avXSAooZZw","type":"message","role":"assistant","content":[{"type":"text","text":"Hi! 👋 How can I help you today?"}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":9,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":0},"output_tokens":16,"service_tier":"standard"}}`
const responseBody = `{"model":"claude-sonnet-4-5-20250929","id":"msg_01J5gW6Sffiem6avXSAooZZw","type":"message","role":"assistant","content":[{"type":"text","text":"Hi! 👋 How can I help you today?"}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":9,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"cached_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":0},"output_tokens":16,"service_tier":"standard"}}`
Contributor

// Cached tokens present in the prompt.
CachedTokens int `json:"cached_tokens,omitzero"`
// Tokens written to the cache.
CachedCreationTokens int `json:"cached_creation_input_tokens,omitzero"`
Contributor

I think you are trying to align with the name cached_tokens but cache_creation_input_tokens reads better.

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@yuzisun yuzisun merged commit f92f8f4 into envoyproxy:main Jan 3, 2026
51 of 53 checks passed
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Jan 29, 2026
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Feb 2, 2026
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Feb 5, 2026