fix: strip billing header from Anthropic system prompt for prefix cache #277

Merged
waybarrios merged 1 commit into waybarrios:main from janhilgard:fix/anthropic-prefix-cache
Apr 11, 2026


@janhilgard
Collaborator

Summary

  • Strip x-anthropic-billing-header from system prompt before tokenization to enable prefix cache reuse across turn boundaries

Problem

Claude Code injects a billing/tracking header into the Anthropic Messages API system prompt:

x-anthropic-billing-header: cc_version=2.1.100.7b4; cc_entrypoint=cli; cch=eb6c6;
You are Claude Code...

The cch= hash changes with every request, so the token sequences of otherwise identical prompts diverge at around position 40. This completely defeats the prefix cache: every request requires a full prefill of 60K+ tokens (~50 seconds on Gemma 4 26B-A4B).
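To illustrate the divergence, here is a minimal sketch that measures the shared prefix of two system prompts differing only in the cch= value. It compares characters rather than tokens, so it only approximates what a token-level prefix cache sees. The second hash value and the `common_prefix_len` helper are made up for this example.

```python
import os

def common_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters shared by a and b."""
    return len(os.path.commonprefix([a, b]))

# Header format copied from the example above; only cch= varies.
HEADER = (
    "x-anthropic-billing-header: cc_version=2.1.100.7b4; "
    "cc_entrypoint=cli; cch={cch};\n"
)
# Stand-in for a long, otherwise identical system prompt.
BODY = "You are Claude Code..." + " context" * 1000

prompt_a = HEADER.format(cch="eb6c6")
prompt_b = HEADER.format(cch="19f3a")  # hypothetical hash for a second request
prompt_a += BODY
prompt_b += BODY

shared = common_prefix_len(prompt_a, prompt_b)
# The shared prefix ends right where the hash begins, so nearly the
# whole prompt falls after the divergence point.
print(f"shared prefix: {shared} of {len(prompt_a)} characters")
```

Everything after the first differing character counts as a cache miss, no matter how much identical text follows it.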

Fix

One-line regex strip in anthropic_adapter.py before the system text is passed to the chat template:

system_text = re.sub(r"x-anthropic-billing-header:[^\n]*\n?", "", system_text)
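As a self-contained sketch of the same substitution, the snippet below wraps the regex in a helper and checks that two requests with different hashes reduce to the same system text. The name `strip_billing_header` and the second hash are assumptions for illustration; the actual change is the single `re.sub` line above, inside anthropic_adapter.py.

```python
import re

# Same pattern as the fix: the header name, the rest of that line,
# and the trailing newline if there is one.
_BILLING_HEADER_RE = re.compile(r"x-anthropic-billing-header:[^\n]*\n?")

def strip_billing_header(system_text: str) -> str:
    """Remove the billing header line from a system prompt (hypothetical helper)."""
    return _BILLING_HEADER_RE.sub("", system_text)

first = (
    "x-anthropic-billing-header: cc_version=2.1.100.7b4; "
    "cc_entrypoint=cli; cch=eb6c6;\n"
    "You are Claude Code..."
)
second = (
    "x-anthropic-billing-header: cc_version=2.1.100.7b4; "
    "cc_entrypoint=cli; cch=19f3a;\n"  # hypothetical hash for the next request
    "You are Claude Code..."
)

# After stripping, both requests start with identical text, so their
# token sequences can share a cached prefix.
assert strip_billing_header(first) == strip_billing_header(second)
print(strip_billing_header(first))
```

Because `[^\n]*` stops at the first newline, only the header line itself is removed and the rest of the system prompt is left untouched.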

Result

| Metric | Before | After |
|---|---|---|
| Prefix match | 40/60K tokens (0.07%) | 60K/60K tokens (99.9%) |
| Time per request | 50s | 3.65s |
| Throughput | 1.9 tok/s | 42.1 tok/s |
| Speedup | | 13.7x |
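The speedup and the "before" prefix-match percentage follow directly from the reported figures. A quick arithmetic check, treating 60K as exactly 60,000 tokens (an assumption, since the prompt is described as 60K+):

```python
# Figures taken from the results above.
time_before_s = 50.0
time_after_s = 3.65
matched_before = 40
prompt_tokens = 60_000  # assumed exact value for "60K"

speedup = time_before_s / time_after_s
match_before_pct = 100 * matched_before / prompt_tokens

print(f"speedup: {speedup:.1f}x")                      # speedup: 13.7x
print(f"prefix match before: {match_before_pct:.2f}%")  # prefix match before: 0.07%
```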

Test plan

  • Consecutive Claude Code Anthropic requests match 60K/60K tokens (prefix hit)
  • First request (cold cache): full prefill ~49s
  • Second request (warm cache): 3.65s (only 34 new tokens to process)
  • Non-Anthropic (OpenAI) requests unaffected
  • Anthropic requests without billing header unaffected
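The last bullet and a few edge cases can be checked in isolation against the regex alone. This is a sketch, not the project's test suite: `check_strip_edge_cases` is a hypothetical name, and the OpenAI path is not covered here because it never reaches this substitution.

```python
import re

PATTERN = r"x-anthropic-billing-header:[^\n]*\n?"  # same pattern as the fix

def check_strip_edge_cases() -> bool:
    """Run a few regex-level checks; raises AssertionError on failure."""
    strip = lambda text: re.sub(PATTERN, "", text)

    # No billing header: the system text must pass through unchanged.
    plain = "You are Claude Code...\nFollow the user's instructions."
    assert strip(plain) == plain

    # Empty system prompt stays empty.
    assert strip("") == ""

    # Header without a trailing newline is still removed (the \n is optional).
    assert strip("x-anthropic-billing-header: cch=eb6c6;") == ""

    # Only the header line is removed; following lines are preserved.
    with_header = "x-anthropic-billing-header: cch=eb6c6;\nLine one\nLine two"
    assert strip(with_header) == "Line one\nLine two"
    return True

print(check_strip_edge_cases())  # True
```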

🤖 Generated with Claude Code

Claude Code injects `x-anthropic-billing-header: cc_version=...; cch=HASH;`
into the system prompt. The `cch=` hash changes with every request, causing
token sequences to diverge at position ~40 and completely defeating prefix
cache reuse across turn boundaries.

Strip this header before tokenization so consecutive requests from the same
conversation share 99%+ of their token prefix.

Result: 50s → 3.65s per request (13.7x speedup) on Gemma 4 26B-A4B with
60K-token prompts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waybarrios waybarrios merged commit b9f2a5f into waybarrios:main Apr 11, 2026
7 checks passed