fix(bedrock/thinking/websearch): beta headers, thinking fixes, API key loading, guardrails null safety#23523
Quentin-M wants to merge 7 commits into BerriAI:main
AWS Bedrock does not recognize anthropic.claude-opus-4-6-v1:0 as a valid model identifier. Unlike other Claude models, Opus 4.6 requires the model ID without the :0 version suffix: anthropic.claude-opus-4-6-v1. Cherry-picked from search_tools_fix (efec746a17), adapted since upstream PR BerriAI#20564 already fixed the JSON pricing keys. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Upstream only checks for type="enabled" but Opus 4.6 uses type="adaptive". Without this fix, max_tokens auto-adjustment doesn't trigger for adaptive thinking, causing API errors.
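A minimal sketch of the check this commit describes, assuming the `thinking` parameter arrives as a plain dict inside the optional params. The name `is_thinking_enabled_sketch` is hypothetical and this is not LiteLLM's actual implementation.

```python
from typing import Any, Dict, Optional

# Thinking types that should trigger max_tokens auto-adjustment.
# "adaptive" is the type the commit message says Opus 4.6 uses.
ACTIVE_THINKING_TYPES = ("enabled", "adaptive")


def is_thinking_enabled_sketch(optional_params: Dict[str, Any]) -> bool:
    """Return True when the request asks for enabled or adaptive thinking."""
    thinking: Optional[Dict[str, Any]] = optional_params.get("thinking")
    if not isinstance(thinking, dict):
        return False
    return thinking.get("type") in ACTIVE_THINKING_TYPES
```

Using a tuple of accepted types keeps the check in one place if further thinking modes are introduced later.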
Greptile Summary

This PR bundles several production fixes for AWS Bedrock + Claude Opus 4.5/4.6 deployments: centralised beta-header filtering via a new
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/utils.py | Adds _message_has_thinking_blocks helper and last_assistant_message_has_no_thinking_blocks, but the new function has an internal guard that contradicts the outer condition, making the intended fix for "Expected thinking or redacted_thinking, but found text" a no-op. |
| litellm/llms/bedrock/beta_headers_config.py | New centralized whitelist-based beta header filter for all three Bedrock API paths; well-structured with version and family restrictions, thorough doc comments, and fixes noted in previous review rounds have been addressed. |
| litellm/llms/anthropic/chat/transformation.py | Adds last_assistant_message_has_no_thinking_blocks to the thinking-drop condition, but the new OR branch can never trigger due to the contradictory internal/outer guards (see utils.py comment). |
| litellm/llms/bedrock/chat/converse_transformation.py | Replaces ad-hoc blacklist filtering with the new centralized BedrockBetaHeaderFilter, strips context_management from the request body, and applies the same (dead-code) thinking-drop logic as the Anthropic path. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Removes hardcoded model-specific helpers, now correctly adds both tool-search-tool-2025-10-19 and tool-examples-2025-10-29 together before handing off to the centralized filter, and strips context_management from the body. |
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Straightforward null-safety fix adding or [] fallback after .get(...) calls to prevent TypeError: 'NoneType' is not iterable when AWS returns explicit null for policy fields. |
| litellm/integrations/websearch_interception/handler.py | Correctly extracts api_key and api_base from the search tool's litellm_params and passes them to litellm.asearch, fixing the TAVILY_API_KEY is not set error when keys are in YAML config rather than environment variables. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Incoming Request with thinking param] --> B{messages is not None?}
B -- No --> Z[Keep thinking param]
B -- Yes --> C{last_assistant_with_tool_calls\nhas_no_thinking_blocks?}
C -- True --> D{not any_assistant_message\n_has_thinking_blocks?}
C -- False --> E{last_assistant_message\n_has_no_thinking_blocks?}
E -- True: only when some\nmessages have thinking --> D
E -- False --> Z
D -- True: no prior\nthinking exists --> F[Drop thinking param\n✅ works for tool_calls path]
D -- False: prior thinking\nexists → BLOCKS DROP --> G["❌ last_assistant_message\npath is dead code\n(inner guard requires prior thinking;\nouter guard forbids prior thinking)"]
G --> Z
style G fill:#ffcccc,stroke:#cc0000
style F fill:#ccffcc,stroke:#009900
Comments Outside Diff (1)
litellm/utils.py, lines 7677-7720

New code path is dead: contradictory guards make the fix a no-op

`last_assistant_message_has_no_thinking_blocks` has an internal early return that makes it impossible to ever trigger the thinking-drop at the call sites in `anthropic/chat/transformation.py` and `bedrock/chat/converse_transformation.py`.

Trace through the two scenarios the PR targets:

Scenario A, thinking never used before (the stated fix):

    messages = [user, assistant("Hello!"), user, assistant("Sure!")]
    thinking = {"type": "enabled"}  # first-time enable

- `any_assistant_message_has_thinking_blocks(messages)` → `False`
- The internal guard at line 7696 fires: `if not any_assistant_message_has_thinking_blocks(messages): return False`
- `last_assistant_message_has_no_thinking_blocks` returns False
- Outer condition: `(False or False) and not False` = False
- Thinking NOT dropped → error "Expected thinking or redacted_thinking, but found text" still raised ❌

Scenario B, thinking was used before, last message has no blocks:

    messages = [user, assistant[thinking+text], user, assistant("text")]

- `any_assistant_message_has_thinking_blocks(messages)` → `True` (earlier message has thinking)
- Internal guard passes; `last_assistant_message_has_no_thinking_blocks` returns True
- Outer condition: `(... or True) and not True` = False (outer guard kills it)
- Thinking NOT dropped ❌ (but dropping here would also break things: it would trigger "When thinking is disabled, an assistant message cannot contain thinking" for the earlier messages with thinking blocks)

The internal guard (`if not any_assistant_message_has_thinking_blocks(messages): return False`) and the outer guard (`and not any_assistant_message_has_thinking_blocks(messages)`) are mutually exclusive (when one is satisfied the other is not), making the new `or last_assistant_message_has_no_thinking_blocks(messages)` branch permanently unreachable.

The tests exercise the function in isolation and correctly document the expected return value, but no test exercises the full compound condition through `transform_request` / `_transform_request_helper`. A minimal end-to-end test would catch this:

    # Scenario A: enable thinking for the first time with a text-only history
    optional_params = {"thinking": {"type": "enabled"}}
    messages = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    # After transformation, thinking should be dropped (with modify_params=True)
    # Currently it is NOT dropped, so the API call will fail.

The likely fix is to remove the internal early return inside `last_assistant_message_has_no_thinking_blocks` and let the outer `and not any_assistant_message_has_thinking_blocks(messages)` guard handle both branches uniformly, since that guard already prevents the dangerous Scenario B case.
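The contradiction can be checked mechanically. The sketch below models the two guards as plain booleans (the function names are simplified stand-ins, not the real LiteLLM helpers) and enumerates every combination to see whether the new branch can ever cause thinking to be dropped on its own.

```python
from itertools import product


def new_branch(any_thinking: bool, last_has_no_thinking: bool) -> bool:
    """Stand-in for last_assistant_message_has_no_thinking_blocks.

    The internal guard returns False unless some assistant message
    already carries thinking blocks.
    """
    if not any_thinking:
        return False
    return last_has_no_thinking


def should_drop_thinking(
    tool_call_branch: bool, any_thinking: bool, last_has_no_thinking: bool
) -> bool:
    """Stand-in for the compound condition at the call sites."""
    return (
        tool_call_branch or new_branch(any_thinking, last_has_no_thinking)
    ) and not any_thinking


# With the tool_calls branch off, check every combination of the other inputs.
outcomes = [
    should_drop_thinking(False, any_thinking, last_no_thinking)
    for any_thinking, last_no_thinking in product([False, True], repeat=2)
]
print(any(outcomes))  # prints False: the new branch never drops thinking
```

Under these assumptions, only the pre-existing `tool_calls` branch can make the condition true, which matches the reviewer's trace of Scenarios A and B.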
Last reviewed commit: bdf1260

    BETA_HEADER_MINIMUM_VERSION: Dict[str, float] = {
        # Extended thinking features require Claude 4.0+
        "interleaved-thinking-2025-05-14": 4.0,
        "dev-full-thinking-2025-05-14": 4.0,
        # 1M context requires Claude 4.0+
        "context-1m-2025-08-07": 4.0,
        # Context management requires Claude 4.5+
        "context-management-2025-06-27": 4.5,
        # Effort parameter requires Claude 4.5+ (but only Opus 4.5, see family restrictions)
        "effort-2025-11-24": 4.5,
        # Tool search requires Claude 4.5+
        "tool-search-tool-2025-10-19": 4.5,
        "tool-examples-2025-10-29": 4.5,
    }

    # Model family restrictions for specific beta headers
    # Only enforced if the version requirement is met
    # Example: "effort-2025-11-24" requires Claude 4.5+ AND Opus family
    BETA_HEADER_FAMILY_RESTRICTIONS: Dict[str, List[str]] = {
        "effort-2025-11-24": ["opus"],  # Only Opus 4.5+ supports effort
        # Tool search works on Opus 4.5+ and Sonnet 4.5+, but not Haiku
        "tool-search-tool-2025-10-19": ["opus", "sonnet"],
        "tool-examples-2025-10-29": ["opus", "sonnet"],
    }

Hardcoded model-specific flags violate codebase convention
BETA_HEADER_MINIMUM_VERSION and BETA_HEADER_FAMILY_RESTRICTIONS encode model capability knowledge directly in code. Per the repo's own convention (see the custom instructions), model-specific capability flags should live in model_prices_and_context_window.json and be accessed via get_model_info / supports_* helpers — not hardcoded as Python dicts. This is exactly the same pattern the convention calls out as "BAD":

    # BAD example from the convention
    "interleaved-thinking-2025-05-14": 4.0,  # ← hardcoded version knowledge
    "effort-2025-11-24": ["opus"],  # ← hardcoded family knowledge

The practical consequence is that each time Anthropic/AWS expands a beta flag to a new model generation, someone needs to update this file, which is the same maintenance burden the convention explicitly wants to avoid.
The intended good approach would be to consult get_model_info or a supports_* function (e.g., supports_reasoning, supports_computer_use) derived from model_prices_and_context_window.json, making the filter self-updating as model entries are added.
Rule Used: What: Do not hardcode model-specific flags in the ... (source)
Respectfully pushing back on this one. model_prices_and_context_window.json has no per-beta-header capability fields — it would require updating 100+ model entries every time AWS adds support for a new header. The version-based approach in BETA_HEADER_MINIMUM_VERSION is intentionally more maintainable: new Claude models require zero code changes, and only new capability restrictions (not expansions) need an entry. This is exactly the future-proof design called out in the module docstring.
litellm/llms/bedrock/chat/invoke_transformations/anthropic_claude3_transformation.py (outdated, resolved)
619d2e5 to acfcba4 (compare)
acfcba4 to dee50f9 (compare)
e8f64bc to 6047358 (compare)
litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py (resolved)
…without thinking blocks

Follow-up to a494503f4b, which fixed thinking + tool_use. That fix only detected missing thinking blocks on assistant messages with tool_calls. When the last assistant message has plain text content (no tool_calls), the check returned False and thinking was not dropped, causing:

"Expected thinking or redacted_thinking, but found text"

Add last_assistant_message_has_no_thinking_blocks() to detect any assistant message with content but no thinking blocks. Extract shared _message_has_thinking_blocks() helper that checks both the thinking_blocks field and content array for thinking/redacted_thinking blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
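As an illustration of the shared helper this commit describes, here is a minimal sketch that checks both the `thinking_blocks` field and the content array. It assumes messages are plain dicts in the OpenAI-style shape; the name `message_has_thinking_blocks_sketch` is hypothetical and the real helper may differ.

```python
from typing import Any, Dict

THINKING_BLOCK_TYPES = ("thinking", "redacted_thinking")


def message_has_thinking_blocks_sketch(message: Dict[str, Any]) -> bool:
    """Check the thinking_blocks field first, then the content array."""
    if message.get("thinking_blocks"):
        return True
    content = message.get("content")
    if isinstance(content, list):
        return any(
            isinstance(block, dict) and block.get("type") in THINKING_BLOCK_TYPES
            for block in content
        )
    # Plain string content (or None) cannot carry thinking blocks.
    return False
```

For example, `{"role": "assistant", "content": "hello"}` has no thinking blocks, while a message whose content list includes a `{"type": "thinking", ...}` entry does.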
…ock APIs

Bedrock doesn't support context_management as a request body parameter. The feature is enabled via the anthropic-beta header (context-management-2025-06-27), which was already handled correctly. Leaving context_management in the body causes:

"context_management: Extra inputs are not permitted"

Strip the parameter from all 3 Bedrock API paths:
- Invoke Messages API
- Invoke Chat API
- Converse API (additionalModelRequestFields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
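The stripping step can be sketched as follows. This is an assumption-laden illustration: `strip_context_management` is a hypothetical helper, and it returns a copy rather than mutating the body in place as the actual transformations may do.

```python
from typing import Any, Dict


def strip_context_management(request_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the body without the unsupported context_management key."""
    cleaned = dict(request_body)
    # pop with a default so bodies that never had the key pass through untouched
    cleaned.pop("context_management", None)
    return cleaned
```

Passing `None` as the default to `pop` avoids a `KeyError` for requests that never set the parameter.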
…pport

Standardize anthropic-beta header handling across all Bedrock APIs (Invoke Chat, Converse, Messages) using a centralized whitelist-based filter with version-based model support.

- Inconsistent filtering: Invoke Chat used whitelist (safe), Converse/Messages used blacklist (allows unsupported headers through)
- Production risk: unsupported headers could cause AWS API errors
- Maintenance burden: adding new Claude models required updating multiple hardcoded lists

- Centralized BedrockBetaHeaderFilter with whitelist approach
- Version-based filtering (e.g., "requires 4.5+") instead of model lists
- Family restrictions (opus/sonnet/haiku) when needed
- Automatic header translation for backward compatibility

- Add `litellm/llms/bedrock/beta_headers_config.py`
  - BedrockBetaHeaderFilter class
  - Whitelist of 11 supported beta headers
  - Version/family restriction logic
  - Debug logging support
- Invoke Chat: Replace local whitelist with centralized filter
- Converse: Remove blacklist (30 lines), use whitelist filter
- Messages: Remove complex filter (55 lines), preserve translation

- Add `tests/test_litellm/llms/bedrock/test_beta_headers_config.py`
  - 40+ unit tests for filter logic
- Extend `tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py`
  - 13 integration tests for API transformations
  - Verify filtering, version restrictions, translations

- Add `litellm/llms/bedrock/README.md`
  - Maintenance guide for adding new headers/models
- Enhanced module docstrings with examples

- Production safety: only whitelisted headers reach AWS
- Zero maintenance for new Claude models (Opus 5, Sonnet 5, etc.)
- Consistent filtering across all 3 APIs
- Preserved backward compatibility (advanced-tool-use translation)

```bash
poetry run pytest tests/test_litellm/llms/bedrock/ -v
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
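To make the whitelist-plus-version idea concrete, here is a small sketch. It is not the `BedrockBetaHeaderFilter` class itself: the function `filter_beta_headers` is hypothetical, it takes the model version and family as explicit arguments instead of parsing a model ID, and the whitelist is an illustrative subset of the 11 headers mentioned above.

```python
from typing import Dict, List

# Illustrative subset of the whitelist; the real module lists 11 headers.
SUPPORTED_BETA_HEADERS = {
    "interleaved-thinking-2025-05-14",
    "context-1m-2025-08-07",
    "effort-2025-11-24",
    "tool-search-tool-2025-10-19",
}

MINIMUM_VERSION: Dict[str, float] = {
    "interleaved-thinking-2025-05-14": 4.0,
    "context-1m-2025-08-07": 4.0,
    "effort-2025-11-24": 4.5,
    "tool-search-tool-2025-10-19": 4.5,
}

FAMILY_RESTRICTIONS: Dict[str, List[str]] = {
    "effort-2025-11-24": ["opus"],
    "tool-search-tool-2025-10-19": ["opus", "sonnet"],
}


def filter_beta_headers(headers: List[str], version: float, family: str) -> List[str]:
    """Keep only whitelisted headers the given model version and family support."""
    kept = []
    for header in headers:
        if header not in SUPPORTED_BETA_HEADERS:
            continue  # whitelist: unknown headers never reach AWS
        if version < MINIMUM_VERSION.get(header, 0.0):
            continue  # model generation is too old for this header
        allowed = FAMILY_RESTRICTIONS.get(header)
        if allowed is not None and family not in allowed:
            continue  # header is limited to other model families
        kept.append(header)
    return kept
```

Because the version check is a comparison rather than a model list, a later model generation passes the filter without any new entries, which is the maintenance property the commit message claims.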
Fixes issue where websearch interception failed with "TAVILY_API_KEY is not set" error when using search providers that require API keys configured in the proxy config rather than environment variables.

Extract api_key and api_base from the router search_tools litellm_params configuration and pass them to litellm.asearch(). Falls back to environment variables when credentials are not in the config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
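The lookup order described here (config first, environment second) might look like the sketch below. The helper `resolve_search_credentials` and its `env_var` parameter are hypothetical; the sketch only resolves the values and does not call `litellm.asearch()`.

```python
import os
from typing import Any, Dict, Optional


def resolve_search_credentials(
    litellm_params: Dict[str, Any], env_var: str = "TAVILY_API_KEY"
) -> Dict[str, Optional[str]]:
    """Prefer credentials from the search tool config, then the environment."""
    api_key = litellm_params.get("api_key") or os.environ.get(env_var)
    return {
        "search_provider": litellm_params.get("search_provider"),
        "api_key": api_key,
        "api_base": litellm_params.get("api_base"),
    }
```

The resolved values would then be forwarded to the search call, so a key set only in the proxy YAML is no longer ignored.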
6047358 to bdf1260 (compare)
Summary
Collection of production fixes developed and battle-tested against AWS Bedrock + Claude Opus 4.5/4.6 with extended thinking and websearch interception.
Supersedes #20470 and #20489 (closed without merging). Builds on the foundation of #19818 (merged). Related: #20488 (open — independently addresses the same websearch thinking blocks issue as 4630793, which is already merged).
Bedrock: Centralize Beta Header Filtering
Problem: Inconsistent `anthropic-beta` header handling across the 3 Bedrock API paths: `UNSUPPORTED_BEDROCK_CONVERSE_BETA_PATTERNS` (allows unsupported headers through, causing AWS API errors in production)

Fix: New `litellm/llms/bedrock/beta_headers_config.py` with centralized `BedrockBetaHeaderFilter`: `advanced-tool-use` → `tool-search-tool` + `tool-examples` translation for backward compatibility

Bedrock: Strip `context_management` from Request Body

Problem: `context_management` is enabled via the `context-management-2025-06-27` anthropic-beta header, NOT as a request body parameter. Leaving it in the body causes "context_management: Extra inputs are not permitted" across all 3 Bedrock API paths.

Fix: Pop `context_management` from the request body in Converse, Invoke Chat, and Invoke Messages transformations.

Bedrock: Remove Invalid `:0` Suffix from Claude Opus 4.6 Model ID

Problem: `anthropic.claude-opus-4-6-v1:0` was present in `BEDROCK_CONVERSE_MODELS` but AWS Bedrock does not recognize this identifier. Only `anthropic.claude-opus-4-6-v1` (without `:0`) is valid.

Fix: Remove the incorrect `anthropic.claude-opus-4-6-v1:0` entry.

Cherry-picked from `search_tools_fix` branch (adapted from `efec746a17`).

Thinking: Drop Thinking Param for Text-Only Assistant Messages
Problem: The existing fix (a494503f4b) only dropped the `thinking` param when the last assistant message had `tool_calls` without thinking blocks. When the last assistant message has plain text content (no `tool_calls`), the check returned `False` and thinking was not dropped, causing: "Expected thinking or redacted_thinking, but found text"

Fix:
- `last_assistant_message_has_no_thinking_blocks()` to detect assistant messages with content but no thinking blocks
- `_message_has_thinking_blocks()` helper checking both `thinking_blocks` field and content array

Thinking: Recognize `adaptive` Type in `is_thinking_enabled`

Problem: `is_thinking_enabled()` only checked for `type="enabled"`, but Claude Opus 4.6 uses `type="adaptive"`. Without recognizing `adaptive`, `max_tokens` auto-adjustment didn't trigger, causing API errors.

Fix: Check `thinking_type in ("enabled", "adaptive")` in `is_thinking_enabled()`.

Websearch Interception: Load API Keys from Router Configuration
Problem: `_execute_search()` only extracted `search_provider` from the router's `search_tools` config, not `api_key` or `api_base`. This caused "TAVILY_API_KEY is not set" errors when keys are configured in the proxy YAML config rather than environment variables.

Fix: Extract `api_key` and `api_base` from the search tool's `litellm_params` and pass them to `litellm.asearch()`, with fallback to environment variables.

Note: The websearch thinking blocks preservation (4630793) and the max_tokens budget validation (12691dc) are already merged into main. This PR adds the missing API key loading fix on top.
Related: #14194, #20488
Bedrock Guardrails: PII Redaction Null Safety
Problem: Multiple `.get("key", [])` calls in `bedrock_guardrails.py` can still return `None` when the AWS API returns explicit `null` for a field. This causes `TypeError: 'NoneType' is not iterable` in `_redact_pii_matches()` and `_check_guardrail_blocked()`.

Fix: Add `or []` fallback after each `.get(...)` call (9 locations).

Authored by Ryan Goldblatt, cherry-picked from `search_tools_fix` branch.

Test Plan
- `pytest tests/test_litellm/llms/bedrock/test_beta_headers_config.py -v`
- `pytest tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py -v`
- `pytest tests/test_litellm/test_utils.py -v -k thinking`
- `pytest tests/test_litellm/proxy/guardrails/ -v`
- `make lint-ruff`

All changes validated end-to-end: Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5/4.6 with extended thinking and websearch.
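For reviewers unfamiliar with why `.get("key", [])` is not enough in the guardrails null-safety fix, the sketch below shows the difference. The helper `get_list_field` and the `piiEntities` field name are illustrative assumptions, not code from `bedrock_guardrails.py`.

```python
from typing import Any, Dict, List


def get_list_field(payload: Dict[str, Any], key: str) -> List[Any]:
    """Return payload[key] as a list, treating a missing key or explicit null as empty."""
    # .get(key, []) only covers a missing key; `or []` also covers a None value.
    return payload.get(key) or []


response = {"piiEntities": None}  # hypothetical field; explicit null from the API
for entity in get_list_field(response, "piiEntities"):
    pass  # never runs, but no TypeError is raised either
```

With the plain default form, `response.get("piiEntities", [])` would return `None` here, and iterating over it would raise the `TypeError` quoted in the problem statement.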
🤖 Generated with Claude Code