fix(bedrock,azure_ai): strip scope from cache_control for Anthropic messages by Sameerlite · Pull Request #22867 · BerriAI/litellm

Sameerlite · 2026-03-05T05:19:53Z

Summary

Bedrock and Azure AI Foundry's Anthropic endpoints do not support the scope field in cache_control (e.g., "global" for cross-request caching). Requests with cache_control: { type: "ephemeral", scope: "global" } were failing with:

system.0.cache_control.ephemeral.scope: Extra inputs are not permitted

Changes

Bedrock

Extend _remove_ttl_from_cache_control to also remove scope from cache_control
Process both system and messages content blocks (previously only messages)
Add test for scope removal

Azure AI

Add _remove_scope_from_cache_control and override transform_anthropic_messages_request
Strip scope from cache_control in both system and messages before sending
Add test for scope removal

Testing

pytest tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py -v
pytest tests/test_litellm/llms/azure_ai/claude/test_azure_anthropic_messages_transformation.py -v

Made with Cursor

Bedrock does not support the scope field in cache_control (e.g. 'global' for cross-request caching). Only type and ttl are supported per AWS docs. - Remove scope from cache_control in both system and messages - Extend _remove_ttl_from_cache_control to process system blocks - Add test for scope removal Made-with: Cursor

Azure AI Foundry's Anthropic endpoint does not support the scope field in cache_control. Strip it from both system and messages before sending. Made-with: Cursor

vercel · 2026-03-05T05:19:58Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Mar 5, 2026 5:21am

greptile-apps · 2026-03-05T05:22:34Z

Greptile Summary

Strips the unsupported scope field from cache_control before sending requests to Bedrock and Azure AI Foundry Anthropic endpoints, fixing 400 errors when users specify cache_control: { type: "ephemeral", scope: "global" }. The Bedrock change also extends system-block processing (previously only messages were processed). Both providers add dedicated tests.

Bedrock: _remove_ttl_from_cache_control refactored to also remove scope and process system content blocks (not just messages). Supports TTL retention for Claude 4.5 models.
Azure AI: New _remove_scope_from_cache_control method and transform_anthropic_messages_request override to strip scope from both system and messages.
The system/messages traversal logic is duplicated between the two providers — could be consolidated into the base class.
The Bedrock method name _remove_ttl_from_cache_control is now stale since it also removes scope and processes system blocks.

Confidence Score: 4/5

This PR is safe to merge — it only strips unsupported fields before sending to provider APIs, with no behavioral changes for supported configurations.
The changes are straightforward field stripping with correct logic, tests covering both system and message blocks, and no backwards-incompatible behavior. Minor naming and duplication concerns don't affect correctness.
No files require special attention — both implementation files have minor style concerns but no functional issues.

Important Files Changed

Filename	Overview
litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py	Extends `_remove_ttl_from_cache_control` to also strip `scope` and process `system` blocks; logic is correct but method name is now misleading.
litellm/llms/azure_ai/anthropic/messages_transformation.py	Adds `_remove_scope_from_cache_control` and overrides `transform_anthropic_messages_request` to strip scope; logic is correct but duplicates Bedrock traversal pattern.
tests/test_litellm/llms/bedrock/messages/invoke_transformations/test_anthropic_claude3_transformation.py	Adds mock-only test for scope removal from both system and messages blocks; no network calls.
tests/test_litellm/llms/azure_ai/claude/test_azure_anthropic_messages_transformation.py	Adds mock-only test for scope removal via `transform_anthropic_messages_request` end-to-end; no network calls.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["User sends request with<br/>cache_control: {type: ephemeral, scope: global}"] --> B{Provider?}
    B -->|Bedrock| C["_remove_ttl_from_cache_control()"]
    B -->|Azure AI| D["_remove_scope_from_cache_control()"]
    C --> E["Strip 'scope' from system blocks"]
    C --> F["Strip 'scope' from message blocks"]
    C --> G["Strip 'ttl' (non-4.5 models)"]
    D --> H["Strip 'scope' from system blocks"]
    D --> I["Strip 'scope' from message blocks"]
    E & F & G --> J["Send sanitized request to Bedrock API"]
    H & I --> K["Send sanitized request to Azure AI API"]

_{Last reviewed commit: 482bc93}

Comments Outside Diff (1)

undefined, line undefined (link)

Method name is now misleading

This method now removes both ttl and scope from cache_control, and processes both system and messages blocks — but its name _remove_ttl_from_cache_control only references TTL. Consider renaming to something like _sanitize_cache_control to accurately reflect the broader scope of what it does. The stale name will confuse future contributors who need to understand what fields are being stripped.

Note: this would also require updating the call site at line 409 and the test references.

greptile-apps · 2026-03-05T05:22:40Z

litellm/llms/azure_ai/anthropic/messages_transformation.py

+    def _remove_scope_from_cache_control(
+        self, anthropic_messages_request: Dict
+    ) -> None:
+        """
+        Remove `scope` field from cache_control for Azure AI Foundry.
+
+        Azure AI Foundry's Anthropic endpoint does not support the `scope` field
+        (e.g., "global" for cross-request caching). Only `type` and `ttl` are supported.
+
+        Processes both `system` and `messages` content blocks.
+        """
+        def _sanitize(cache_control: Any) -> None:
+            if isinstance(cache_control, dict):
+                cache_control.pop("scope", None)
+
+        def _process_content_list(content: list) -> None:
+            for item in content:
+                if isinstance(item, dict) and "cache_control" in item:
+                    _sanitize(item["cache_control"])
+
+        if "system" in anthropic_messages_request:
+            system = anthropic_messages_request["system"]
+            if isinstance(system, list):
+                _process_content_list(system)
+
+        if "messages" in anthropic_messages_request:
+            for message in anthropic_messages_request["messages"]:
+                if isinstance(message, dict) and "content" in message:
+                    content = message["content"]
+                    if isinstance(content, list):
+                        _process_content_list(content)


Duplicated sanitization logic across providers

The system/messages iteration pattern in _remove_scope_from_cache_control is nearly identical to the one in Bedrock's _remove_ttl_from_cache_control. Consider extracting the shared traversal logic (iterating over system + message content blocks and calling a sanitizer on each cache_control dict) into a shared utility in the base class BaseAnthropicMessagesConfig, with each provider only supplying the sanitization function. This would reduce duplication and make it easier to add future field stripping in one place.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

…nd Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig."

…opicMessagesConfig but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers inherit it, and removed the now-redundant copy from the Azure AI subclass.

…g blocks (issue BerriAI#23178) When Claude extended-thinking is enabled on Bedrock the converse API emits two content-block types in the same response: contentBlockIndex=0 → reasoning / thinking block contentBlockIndex=1 → text block The existing converse_chunk_parser already hardcodes StreamingChoices(index=0) for every event (tool-calls fix from BerriAI#22867), so the normalisation is already in place for the converse path. The AmazonAnthropicClaudeStreamDecoder (invoke/anthropic path) likewise always sets index=0 via AnthropicModelResponseIterator.chunk_parser. This commit adds explicit regression tests for both paths covering the full thinking-block event sequence (start, delta, signature, stop) and the subsequent text-block events that arrive on contentBlockIndex=1, ensuring choices[0].index is always 0 and OpenAI-compatible clients do not crash. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

#23183) * PR #22867 added _remove_scope_from_cache_control for Bedrock and Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig." * PR #22867 added _remove_scope_from_cache_control for Bedrock and Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig." * PR #22867 added _remove_scope_from_cache_control to AzureAnthropicMessagesConfig but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers inherit it, and removed the now-redundant copy from the Azure AI subclass. * PR #22867 added _remove_scope_from_cache_control to AzureAnthropicMessagesConfig but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers inherit it, and removed the now-redundant copy from the Azure AI subclass. --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

…der (#23663) * fix: forward extra_headers to HuggingFace embedding calls (#23525) Fixes #23502 The huggingface_embed.embedding() call was not receiving the headers parameter, causing extra_headers (e.g., X-HF-Bill-To) to be silently dropped. Other providers (openrouter, vercel_ai_gateway, bedrock) already pass headers correctly. This fix adds headers=headers to match the behavior of other providers. Co-authored-by: Jah-yee <sparklab@outlook.com> * fix: add getPopupContainer to Select components in fallback modal to fix z-index issue (#23516) The model dropdown menus in the Add Fallbacks modal were rendering behind the modal overlay because Ant Design portals Select dropdowns to document.body by default. By setting getPopupContainer to attach the dropdown to its parent element, the dropdown inherits the modal's stacking context and renders above the modal. Fixes #17895 * PR #22867 added _remove_scope_from_cache_control for Bedrock and Azur… (#23183) * PR #22867 added _remove_scope_from_cache_control for Bedrock and Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig." * PR #22867 added _remove_scope_from_cache_control for Bedrock and Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig." * PR #22867 added _remove_scope_from_cache_control to AzureAnthropicMessagesConfig but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers inherit it, and removed the now-redundant copy from the Azure AI subclass. * PR #22867 added _remove_scope_from_cache_control to AzureAnthropicMessagesConfig but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers inherit it, and removed the now-redundant copy from the Azure AI subclass. --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix: auto-fill reasoning_content for moonshot kimi reasoning models in multi-turn tool calling (#23580) * Handle response.failed, response.incomplete, and response.cancelled (#23492) * Handle response.failed, response.incomplete, and response.cancelled terminal events in background streaming Previously the background streaming task only handled response.completed and hardcoded the final status to "completed". This missed three other terminal event types from the OpenAI streaming spec, causing failed/incomplete/cancelled responses to be incorrectly marked as completed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Committed-By-Agent: claude * Remove unused terminal_response_data variable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Committed-By-Agent: claude * Address code review: derive fallback status from event type, rewrite tests as integration tests 1. Replace hardcoded "completed" fallback in response_data.get("status") with _event_to_status lookup so that response.incomplete and response.cancelled events get the correct fallback if the response body ever omits the status field. 2. Replace duplicated-logic unit tests with integration tests that exercise background_streaming_task directly using mocked streaming responses and assert on the final update_state call arguments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Committed-By-Agent: claude * Remove dead mock_processor and unused mock_response parameter from test helper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Committed-By-Agent: claude * Remove FastAPI and UserAPIKeyAuth imports from test file These types were only used as Mock(spec=...) arguments. Drop the spec constraints and remove the top-level imports to avoid pulling FastAPI into test files outside litellm/proxy/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Committed-By-Agent: claude * Log warning when streaming response has no body_iterator If base_process_llm_request returns a non-streaming response (no body_iterator), log a warning since this likely indicates a misconfiguration or provider error rather than a successful completion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Committed-By-Agent: claude --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(security): bump tar to 7.5.11 and tornado to 6.5.5 (#23602) * fix(security): bump tar to 7.5.11 and tornado to 6.5.5 - tar >=7.5.11: fixes CVE-2026-31802 (HIGH) in node-pkg - tornado >=6.5.5: fixes CVE-2026-31958 (HIGH) and GHSA-78cv-mqj4-43f7 (MEDIUM) in python-pkg Addresses vulnerabilities found in ghcr.io/berriai/litellm:main-v1.82.0-stable Trivy scan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: document tar override is enforced via Dockerfile, not npm * fix: revert invalid JSON comment in package.json tar override --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * [Feat] - Ishaan main merge branch (#23596) * fix(bedrock): respect s3_region_name for batch file uploads (#23569) * fix(bedrock): respect s3_region_name for batch file uploads (GovCloud fix) * fix: s3_region_name always wins over aws_region_name for S3 signing (Greptile feedback) * fix: _filter_headers_for_aws_signature - Bedrock KB (#23571) * fix: _filter_headers_for_aws_signature * fix: filter None header values in all post-signing re-merge paths Addresses Greptile feedback: None-valued headers were being filtered during SigV4 signing but re-merged back into the final headers dict afterward, which would cause downstream HTTP client failures. Made-with: Cursor * feat(router): tag_regex routing — route by User-Agent regex without per-developer tag config (#23594) * feat(router): add tag_regex support for header-based routing Adds a new `tag_regex` field to litellm_params that lets operators route requests based on regex patterns matched against request headers — primarily User-Agent — without requiring per-developer tag configuration. Use case: route all Claude Code traffic (User-Agent: claude-code/x.y.z) to a dedicated deployment by setting: tag_regex: - "^User-Agent: claude-code\\/" in the deployment's litellm_params. Works alongside existing `tags` routing; exact tag match takes precedence over regex match. Unmatched requests fall through to deployments tagged `default`. The matched deployment, pattern, and user_agent are recorded in `metadata["tag_routing"]` so they flow through to SpendLogs automatically. * fix(tag_regex): address backwards-compat, metadata overwrite, and warning noise Three issues from code review: 1. Backwards-compat: `has_tag_filter` was widened to activate on any non-empty User-Agent, which would raise ValueError for existing deployments using plain tags without a `default` fallback. Fix: only activate header-based regex filtering when at least one candidate deployment has `tag_regex` configured. 2. Metadata overwrite: `metadata["tag_routing"]` was overwritten for every matching deployment in the loop, leaving inaccurate provenance when multiple deployments match. Fix: write only for the first match. 3. Warning noise: an invalid regex pattern logged one warning per header string rather than once per pattern. Fix: compile first (catching re.error once), then iterate over header strings. Also adds two new tests covering these cases, and adds docs page for tag_regex routing with a Claude Code walk-through. * refactor(tag_regex): remove unnecessary _healthy_list copy * docs: merge tag_regex section into tag_routing.md, remove standalone page - Add ## Regex-based tag routing (tag_regex) section to existing tag_routing.md instead of a separate page - Remove tag_regex_routing.md standalone doc (odd UX to have a separate page for a sub-feature) - Remove proxy/tag_regex_routing from sidebars.js - Add match_any=False debug warning in tag_based_routing.py when regex routing fires under strict mode (regex always uses OR semantics) * fix(tag_regex): address greptile review - security docs, strict-mode enforcement, validation order - Strengthen security note in tag_routing.md: explicitly state User-Agent is client-supplied and can be set to any value; frame tag_regex as a traffic classification hint, not an access-control mechanism - Move tag_regex startup validation before _add_deployment() so an invalid pattern never leaves partial router state - Enforce match_any=False strict-tag policy: when a deployment has both tags and tag_regex and the strict tag check fails, skip the regex fallback rather than silently bypassing the operator's intent - Extract per-deployment match logic into _match_deployment() helper to keep get_deployments_for_tag() readable - Add two new tests: strict-mode blocks regex fallback, regex-only deployment still matches under match_any=False * fix(ci): apply Black formatting to 14 files and stabilize flaky caplog tests - Run Black formatter on 14 files that were failing the lint check - Replace caplog-based assertions in TestAliasConflicts with unittest.mock.patch on verbose_logger.warning for xdist compatibility - The caplog fixture can produce empty text in pytest-xdist workers in certain CI environments, causing flaky test failures Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: tiktoken cache nonroot offline (#23498) * fix: restore offline tiktoken cache for non-root envs Made-with: Cursor * chore: mkdir for custom tiktoken cache dir Made-with: Cursor * test: patch tiktoken.get_encoding in custom-dir test to avoid network Made-with: Cursor * test: clear CUSTOM_TIKTOKEN_CACHE_DIR in helper for test isolation Made-with: Cursor * test: restore default_encoding module state after custom-dir test Made-with: Cursor * fix: normalize content_filtered finish_reason (#23564) Map provider finish_reason "content_filtered" to the OpenAI-compatible "content_filter" and extend core_helpers tests to cover this case. Made-with: Cursor * fix: Fixes #23185 (#23647) * fix: merge annotations from all streaming chunks in stream_chunk_builder Previously, stream_chunk_builder only took annotations from the first chunk that contained them, losing any annotations from later chunks. This is a problem because providers like Gemini/Vertex AI send grounding metadata (converted to annotations) in the final streaming chunk, while other providers may spread annotations across multiple chunks. Changes: - Collect and merge annotations from ALL annotation-bearing chunks instead of only using the first one --------- Co-authored-by: RoomWithOutRoof <166608075+Jah-yee@users.noreply.github.com> Co-authored-by: Jah-yee <sparklab@outlook.com> Co-authored-by: Ethan T. <ethanchang32@gmail.com> Co-authored-by: Awais Qureshi <awais.qureshi@arbisoft.com> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Pradyumna Yadav <pradyumna.aky@gmail.com> Co-authored-by: xianzongxie-stripe <87151258+xianzongxie-stripe@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Joe Reyna <joseph.reyna@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai>

Sameerlite added 2 commits March 5, 2026 10:49

fix(azure_ai): strip scope from cache_control for Anthropic messages

482bc93

Azure AI Foundry's Anthropic endpoint does not support the scope field in cache_control. Strip it from both system and messages before sending. Made-with: Cursor

vercel bot deployed to Preview March 5, 2026 05:21 View deployment

greptile-apps bot reviewed Mar 5, 2026

View reviewed changes

Sameerlite merged commit 0620f99 into main Mar 5, 2026
48 of 100 checks passed

fatedier mentioned this pull request Mar 9, 2026

fix(vertex_ai): strip scope from cache_control for Anthropic messages passthrough #23149

Closed

awais786 added a commit to awais786/litellm that referenced this pull request Mar 9, 2026

PR BerriAI#22867 added _remove_scope_from_cache_control for Bedrock a…

c5cdc76

…nd Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig."

awais786 added a commit to awais786/litellm that referenced this pull request Mar 9, 2026

PR BerriAI#22867 added _remove_scope_from_cache_control for Bedrock a…

42f60a4

…nd Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig."

greptile-apps bot mentioned this pull request Mar 9, 2026

PR #22867 added _remove_scope_from_cache_control for Bedrock and Azur… #23183

Merged

7 tasks

awais786 mentioned this pull request Mar 10, 2026

fix(bedrock): normalise streaming choice index=0 for extended-thinkin… #23248

Closed

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(bedrock,azure_ai): strip scope from cache_control for Anthropic messages#22867

fix(bedrock,azure_ai): strip scope from cache_control for Anthropic messages#22867
Sameerlite merged 2 commits intomainfrom
litellm_bedrock-azure-cache-control-scope

Sameerlite commented Mar 5, 2026

Uh oh!

vercel bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 5, 2026 •

edited

Loading

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Mar 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Sameerlite commented Mar 5, 2026

Summary

Changes

Bedrock

Azure AI

Testing

Uh oh!

vercel bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel bot commented Mar 5, 2026 •

edited

Loading

greptile-apps bot commented Mar 5, 2026 •

edited

Loading