fix: strip billing header from Anthropic system prompt for prefix cache by janhilgard · Pull Request #277 · waybarrios/vllm-mlx

janhilgard · 2026-04-10T22:05:52Z

Summary

Strip the x-anthropic-billing-header line from the system prompt before tokenization so the prefix cache can be reused across turn boundaries

Problem

Claude Code injects a billing/tracking header into the Anthropic Messages API system prompt:

x-anthropic-billing-header: cc_version=2.1.100.7b4; cc_entrypoint=cli; cch=eb6c6;
You are Claude Code...

The cch= hash changes with every request, so the token sequences of consecutive requests diverge at around position 40. This defeats the prefix cache entirely: every request requires a full prefill of 60K+ tokens (~50 seconds on Gemma 4 26B-A4B).
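Why one changed field costs the whole prompt: a prefix cache can only reuse cached KV entries up to the first token where two prompts differ, because in a causal decoder every later token's keys and values depend on everything before it. The sketch below counts the shared leading tokens of two prompts that differ only in the cch= value. It assumes whitespace splitting as a stand-in for the model's real tokenizer (so the divergence position will not match ~40 exactly), and the second hash is made up for illustration.

```python
# Minimal sketch: how many leading tokens do two prompts share?
# Assumption: str.split() stands in for the model's tokenizer.


def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Return the number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


header_1 = "x-anthropic-billing-header: cc_version=2.1.100.7b4; cc_entrypoint=cli; cch=eb6c6;\n"
header_2 = "x-anthropic-billing-header: cc_version=2.1.100.7b4; cc_entrypoint=cli; cch=0a1f3;\n"  # hypothetical hash
body = "You are Claude Code... " * 1000  # stand-in for a long system prompt

tokens_1 = (header_1 + body).split()
tokens_2 = (header_2 + body).split()

shared = common_prefix_len(tokens_1, tokens_2)
print(f"shared prefix: {shared}/{len(tokens_1)} tokens")  # shared prefix: 3/4004 tokens
```

Only one token differs, yet everything after it falls outside the reusable prefix. With the header removed from both prompts, all 4,000 body tokens would be shared.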

Fix

A one-line regex in anthropic_adapter.py strips the header before the system text is passed to the chat template:

system_text = re.sub(r"x-anthropic-billing-header:[^\n]*\n?", "", system_text)
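For reference, here is the same regex as a self-contained function with a before/after example. The helper name strip_billing_header is hypothetical; the PR applies the re.sub call inline in anthropic_adapter.py.

```python
import re

# The PR's pattern: the header key, the rest of that line, and the
# trailing newline if one is present.
_BILLING_HEADER_RE = re.compile(r"x-anthropic-billing-header:[^\n]*\n?")


def strip_billing_header(system_text: str) -> str:
    """Remove Claude Code's per-request billing header line (hypothetical helper)."""
    return _BILLING_HEADER_RE.sub("", system_text)


before = (
    "x-anthropic-billing-header: cc_version=2.1.100.7b4; cc_entrypoint=cli; cch=eb6c6;\n"
    "You are Claude Code..."
)
print(repr(strip_billing_header(before)))  # 'You are Claude Code...'
```

The pattern is not anchored, so it also matches the header if other text precedes it on the same line. Anchoring it with ^ under re.MULTILINE would restrict matches to line starts, at the cost of missing a header that is ever preceded by other text.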

Result

Metric	Before	After
Prefix match	40/60K tokens (0.07%)	60K/60K tokens (99.9%)
Time per request	50s	3.65s
Throughput	1.9 tok/s	42.1 tok/s
Speedup	—	13.7x
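The speedup and the "before" prefix-match percentage follow directly from the raw numbers in the table. A quick check, assuming "60K" means exactly 60,000 tokens:

```python
# Recompute two derived figures from the table's raw numbers.
before_s, after_s = 50.0, 3.65
prompt_tokens = 60_000  # assumption: "60K" taken as exactly 60,000
matched_before = 40

speedup = before_s / after_s
prefix_pct_before = 100 * matched_before / prompt_tokens

print(f"speedup: {speedup:.1f}x")  # speedup: 13.7x
print(f"prefix match before: {prefix_pct_before:.2f}%")  # prefix match before: 0.07%
```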

Test plan

Consecutive Claude Code Anthropic requests match 60K/60K tokens (prefix hit)
First request (cold cache): full prefill ~49s
Second request (warm cache): 3.65s (only 34 new tokens to process)
Non-Anthropic (OpenAI) requests unaffected
Anthropic requests without billing header unaffected
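The prefix-hit item and the "without billing header unaffected" item can also be expressed as plain assertions on the stripped text. This is a sketch that reuses the PR's regex behind a hypothetical strip_billing_header helper and an illustrative second hash value; the latency figures and the OpenAI path need a running server and are not covered here.

```python
import re

# Same pattern as the PR's fix; strip_billing_header is a hypothetical wrapper.
_BILLING_HEADER_RE = re.compile(r"x-anthropic-billing-header:[^\n]*\n?")


def strip_billing_header(system_text: str) -> str:
    return _BILLING_HEADER_RE.sub("", system_text)


def with_header(cch: str, body: str) -> str:
    """Build a system prompt carrying a billing header with the given cch= value."""
    return (
        "x-anthropic-billing-header: cc_version=2.1.100.7b4; "
        f"cc_entrypoint=cli; cch={cch};\n" + body
    )


BODY = "You are Claude Code..."

# Prefix hit: two requests that differ only in cch= yield identical system text.
first = strip_billing_header(with_header("eb6c6", BODY))
second = strip_billing_header(with_header("1234a", BODY))  # illustrative hash
assert first == second == BODY

# Unaffected: a system prompt without the header passes through unchanged.
assert strip_billing_header(BODY) == BODY

print("all checks passed")
```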

🤖 Generated with Claude Code

Claude Code injects `x-anthropic-billing-header: cc_version=...; cch=HASH;` into the system prompt. The `cch=` hash changes with every request, causing token sequences to diverge at position ~40 and completely defeating prefix cache reuse across turn boundaries.

Strip this header before tokenization so consecutive requests from the same conversation share 99%+ of their token prefix.

Result: 50s → 3.65s per request (13.7x speedup) on Gemma 4 26B-A4B with 60K-token prompts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

waybarrios merged commit b9f2a5f into waybarrios:main Apr 11, 2026
7 checks passed

janhilgard mentioned this pull request on Apr 11, 2026 in "feat: add prompt prefix caching to SimpleEngine #90" (closed)

waybarrios merged 1 commit into waybarrios:main from janhilgard:fix/anthropic-prefix-cache
