
fix(bedrock/thinking/websearch): beta headers, thinking fixes, API key loading, guardrails null safety#23523

Open
Quentin-M wants to merge 7 commits into BerriAI:main from Quentin-M:fix/websearch-bedrock-thinking

Conversation


Quentin-M (Contributor) commented Mar 13, 2026

Summary

Collection of production fixes developed and battle-tested against AWS Bedrock + Claude Opus 4.5/4.6 with extended thinking and websearch interception.

Supersedes #20470 and #20489 (closed without merging). Builds on the foundation of #19818 (merged). Related: #20488 (open — independently addresses the same websearch thinking blocks issue as 4630793, which is already merged).


Bedrock: Centralize Beta Header Filtering

Problem: Inconsistent anthropic-beta header handling across the 3 Bedrock API paths:

  • Invoke Chat used a whitelist (safe)
  • Converse and Messages used blacklists via UNSUPPORTED_BEDROCK_CONVERSE_BETA_PATTERNS (allows unsupported headers through, causing AWS API errors in production)
  • Adding new Claude models required updating multiple hardcoded lists

Fix: New litellm/llms/bedrock/beta_headers_config.py with centralized BedrockBetaHeaderFilter:

  • Whitelist of 11 supported beta headers with version-based model restrictions
  • Applied uniformly to all 3 Bedrock API paths (Invoke Chat, Converse, Messages)
  • Zero maintenance for new Claude models — version-based rather than model-list-based
  • Preserves the advanced-tool-use → tool-search-tool + tool-examples translation for backward compatibility
  • 40+ unit tests + 13 integration tests
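The core of the whitelist logic can be sketched as follows. This is a simplified, hypothetical reconstruction using a subset of the real table from beta_headers_config.py; the function names and the model-ID parsing are illustrative, not the actual implementation:

```python
import re
from typing import Dict, List, Optional, Tuple

# Subset of the real whitelist in litellm/llms/bedrock/beta_headers_config.py.
BETA_HEADER_MINIMUM_VERSION: Dict[str, float] = {
    "interleaved-thinking-2025-05-14": 4.0,  # extended thinking: Claude 4.0+
    "context-1m-2025-08-07": 4.0,            # 1M context: Claude 4.0+
    "effort-2025-11-24": 4.5,                # effort: Claude 4.5+ (Opus only)
    "tool-search-tool-2025-10-19": 4.5,      # tool search: Claude 4.5+
}
BETA_HEADER_FAMILY_RESTRICTIONS: Dict[str, List[str]] = {
    "effort-2025-11-24": ["opus"],
    "tool-search-tool-2025-10-19": ["opus", "sonnet"],
}

def _parse_model(model_id: str) -> Optional[Tuple[str, float]]:
    """Extract (family, version) from IDs like anthropic.claude-opus-4-6-v1."""
    m = re.search(r"claude-(opus|sonnet|haiku)-(\d+)-(\d+)", model_id)
    if m is None:
        return None
    return m.group(1), float(f"{m.group(2)}.{m.group(3)}")

def filter_beta_headers(requested: List[str], model_id: str) -> List[str]:
    """Keep only whitelisted headers the target model version/family supports."""
    parsed = _parse_model(model_id)
    if parsed is None:
        return []
    family, version = parsed
    kept = []
    for header in requested:
        if header not in BETA_HEADER_MINIMUM_VERSION:
            continue  # whitelist semantics: unknown headers never reach AWS
        if version < BETA_HEADER_MINIMUM_VERSION[header]:
            continue
        allowed_families = BETA_HEADER_FAMILY_RESTRICTIONS.get(header)
        if allowed_families is not None and family not in allowed_families:
            continue
        kept.append(header)
    return kept
```

Because the tables key on version numbers rather than model IDs, a hypothetical Opus 5 passes every `>=` check without any code change — which is the "zero maintenance for new Claude models" property claimed above.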

Bedrock: Strip context_management from Request Body

Problem: context_management is enabled via the context-management-2025-06-27 anthropic-beta header, NOT as a request body parameter. Leaving it in the body causes "context_management: Extra inputs are not permitted" across all 3 Bedrock API paths.

Fix: Pop context_management from the request body in Converse, Invoke Chat, and Invoke Messages transformations.
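The fix amounts to a one-line pop applied in each of the three transformations; a minimal sketch (helper name is illustrative):

```python
def strip_context_management(request_body: dict) -> dict:
    # context_management is enabled via the anthropic-beta header
    # (context-management-2025-06-27), never as a body parameter;
    # Bedrock rejects it with "Extra inputs are not permitted".
    request_body.pop("context_management", None)
    return request_body
```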

Bedrock: Remove Invalid :0 Suffix from Claude Opus 4.6 Model ID

Problem: anthropic.claude-opus-4-6-v1:0 was present in BEDROCK_CONVERSE_MODELS but AWS Bedrock does not recognize this identifier. Only anthropic.claude-opus-4-6-v1 (without :0) is valid.

Fix: Remove the incorrect anthropic.claude-opus-4-6-v1:0 entry.

Cherry-picked from search_tools_fix branch (adapted from efec746a17).


Thinking: Drop Thinking Param for Text-Only Assistant Messages

Problem: The existing fix (a494503f4b) only dropped the thinking param when the last assistant message had tool_calls without thinking blocks. When the last assistant message has plain text content (no tool_calls), the check returned False and thinking was not dropped, causing: "Expected thinking or redacted_thinking, but found text"

Fix:

  • Add last_assistant_message_has_no_thinking_blocks() to detect assistant messages with content but no thinking blocks
  • Extract shared _message_has_thinking_blocks() helper checking both thinking_blocks field and content array
  • Apply fix to both Anthropic and Bedrock Converse transformations
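A minimal sketch of the detection logic described above — the names match the PR's helpers, but the bodies are illustrative reconstructions rather than the actual implementation:

```python
from typing import Any, Dict, List

def _message_has_thinking_blocks(message: Dict[str, Any]) -> bool:
    # Check the explicit thinking_blocks field first...
    if message.get("thinking_blocks"):
        return True
    # ...then scan a structured content array for thinking blocks.
    content = message.get("content")
    if isinstance(content, list):
        return any(
            isinstance(block, dict)
            and block.get("type") in ("thinking", "redacted_thinking")
            for block in content
        )
    return False

def last_assistant_message_has_no_thinking_blocks(
    messages: List[Dict[str, Any]],
) -> bool:
    # Walk backwards to the most recent assistant message.
    for message in reversed(messages):
        if message.get("role") != "assistant":
            continue
        return bool(message.get("content")) and not _message_has_thinking_blocks(
            message
        )
    return False  # no assistant message at all
```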

Thinking: Recognize adaptive Type in is_thinking_enabled

Problem: is_thinking_enabled() only checked for type="enabled", but Claude Opus 4.6 uses type="adaptive". Without recognizing adaptive, max_tokens auto-adjustment didn't trigger, causing API errors.

Fix: Check thinking_type in ("enabled", "adaptive") in is_thinking_enabled().
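The check reduces to a membership test; a sketch, assuming optional_params carries the thinking dict:

```python
def is_thinking_enabled(optional_params: dict) -> bool:
    # "enabled" is the classic extended-thinking type; Claude Opus 4.6
    # uses "adaptive", which must also trigger max_tokens auto-adjustment.
    thinking = optional_params.get("thinking") or {}
    return thinking.get("type") in ("enabled", "adaptive")
```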


Websearch Interception: Load API Keys from Router Configuration

Problem: _execute_search() only extracted search_provider from the router's search_tools config, not api_key or api_base. This caused "TAVILY_API_KEY is not set" errors when keys are configured in the proxy YAML config rather than environment variables.

Fix: Extract api_key and api_base from the search tool's litellm_params and pass them to litellm.asearch(), with fallback to environment variables.
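The credential-resolution order can be sketched as follows (`resolve_search_credentials` is a hypothetical helper name — the real logic lives inside `_execute_search()`):

```python
import os
from typing import Optional, Tuple

def resolve_search_credentials(
    search_tool: dict, env_var: str = "TAVILY_API_KEY"
) -> Tuple[Optional[str], Optional[str]]:
    # Prefer credentials from the search tool's litellm_params
    # (i.e. the proxy YAML config), then fall back to the environment.
    params = search_tool.get("litellm_params") or {}
    api_key = params.get("api_key") or os.environ.get(env_var)
    api_base = params.get("api_base")
    return api_key, api_base
```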

Note: The websearch thinking blocks preservation (4630793) and the max_tokens budget validation (12691dc) are already merged into main. This PR adds the missing API key loading fix on top.

Related: #14194, #20488


Bedrock Guardrails: PII Redaction Null Safety

Problem: Multiple .get("key", []) calls in bedrock_guardrails.py can still return None when the AWS API returns explicit null for a field. This causes TypeError: 'NoneType' is not iterable in _redact_pii_matches() and _check_guardrail_blocked().

Fix: Add or [] fallback after each .get(...) call (9 locations).
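The pattern, sketched on one of the affected lookups (`piiEntities` is one of the guardrail assessment fields; the helper name is illustrative):

```python
def iter_pii_entities(assessment: dict):
    # .get("piiEntities", []) still returns None when AWS sends an
    # explicit null for the field, so `or []` is needed to coerce it
    # back to an empty list and keep iteration safe.
    for match in assessment.get("piiEntities") or []:
        yield match
```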

Authored by Ryan Goldblatt, cherry-picked from search_tools_fix branch.


Test Plan

  • pytest tests/test_litellm/llms/bedrock/test_beta_headers_config.py -v
  • pytest tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py -v
  • pytest tests/test_litellm/test_utils.py -v -k thinking
  • pytest tests/test_litellm/proxy/guardrails/ -v
  • Lint: make lint-ruff

All changes validated end-to-end: Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5/4.6 with extended thinking and websearch.

🤖 Generated with Claude Code

ryangoldblatt-bm and others added 3 commits March 13, 2026 01:59
AWS Bedrock does not recognize anthropic.claude-opus-4-6-v1:0 as a valid
model identifier. Unlike other Claude models, Opus 4.6 requires the model
ID without the :0 version suffix: anthropic.claude-opus-4-6-v1.

Cherry-picked from search_tools_fix (efec746a17), adapted since upstream
PR BerriAI#20564 already fixed the JSON pricing keys.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Upstream only checks for type="enabled" but Opus 4.6 uses type="adaptive".
Without this fix, max_tokens auto-adjustment doesn't trigger for adaptive
thinking, causing API errors.

vercel bot commented Mar 13, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Mar 17, 2026 4:12am |


@Quentin-M Quentin-M changed the title fix/feat: bedrock beta headers, thinking fixes, websearch API keys, guardrails null safety fix(bedrock/thinking/websearch): beta headers, thinking fixes, API key loading, guardrails null safety Mar 13, 2026

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR bundles several production fixes for AWS Bedrock + Claude Opus 4.5/4.6 deployments: centralised beta-header filtering via a new BedrockBetaHeaderFilter, context_management body-param stripping across all three Bedrock API paths, a model-ID correction for Opus 4.6, adaptive thinking-type recognition, guardrail null-safety, and websearch API-key loading from YAML config. Most of the fixes are well-executed and battle-tested. The one issue found:

  • Dead code path for the "found text" thinking fix — The PR adds or last_assistant_message_has_no_thinking_blocks(messages) to the thinking-drop condition in both litellm/llms/anthropic/chat/transformation.py and litellm/llms/bedrock/chat/converse_transformation.py. However, the function has an internal guard (if not any_assistant_message_has_thinking_blocks(messages): return False) that is mutually exclusive with the outer condition's and not any_assistant_message_has_thinking_blocks(messages). When no prior thinking blocks exist (the scenario the fix targets), the function returns False; when prior thinking blocks do exist the outer guard prevents the drop. The result is that the new branch is permanently unreachable, and the "Expected thinking or redacted_thinking, but found text" error will continue to surface. The _message_has_thinking_blocks refactoring and the isolated unit tests are correct; the issue is only in how the function interacts with the call-site guard.
  • All other fixes (centralized beta-header filtering, context_management stripping, is_thinking_enabled adaptive support, guardrail null-safety, websearch key loading) look correct and are well-tested.

Confidence Score: 3/5

  • Safe to merge for all fixes except the thinking-drop path, which remains broken and will still surface "Expected thinking or redacted_thinking, but found text" at runtime.
  • The PR ships many well-tested, clearly scoped fixes. One logic bug — the last_assistant_message_has_no_thinking_blocks dead code path — leaves a stated bug unfixed without surfacing a new error. All other changes are sound.
  • litellm/utils.py (dead code path), litellm/llms/anthropic/chat/transformation.py, and litellm/llms/bedrock/chat/converse_transformation.py (both consume the broken function).

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/utils.py | Adds `_message_has_thinking_blocks` helper and `last_assistant_message_has_no_thinking_blocks`, but the new function has an internal guard that contradicts the outer condition, making the intended fix for "Expected thinking or redacted_thinking, but found text" a no-op. |
| litellm/llms/bedrock/beta_headers_config.py | New centralized whitelist-based beta header filter for all three Bedrock API paths; well-structured with version and family restrictions, thorough doc comments, and fixes noted in previous review rounds have been addressed. |
| litellm/llms/anthropic/chat/transformation.py | Adds `last_assistant_message_has_no_thinking_blocks` to the thinking-drop condition, but the new OR branch can never trigger due to the contradictory internal/outer guards (see utils.py comment). |
| litellm/llms/bedrock/chat/converse_transformation.py | Replaces ad-hoc blacklist filtering with the new centralized `BedrockBetaHeaderFilter`, strips `context_management` from the request body, and applies the same (dead-code) thinking-drop logic as the Anthropic path. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Removes hardcoded model-specific helpers, now correctly adds both `tool-search-tool-2025-10-19` and `tool-examples-2025-10-29` together before handing off to the centralized filter, and strips `context_management` from the body. |
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Straightforward null-safety fix adding `or []` fallback after `.get(...)` calls to prevent `TypeError: 'NoneType' is not iterable` when AWS returns explicit null for policy fields. |
| litellm/integrations/websearch_interception/handler.py | Correctly extracts `api_key` and `api_base` from the search tool's `litellm_params` and passes them to `litellm.asearch`, fixing the "TAVILY_API_KEY is not set" error when keys are in YAML config rather than environment variables. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request with thinking param] --> B{messages is not None?}
    B -- No --> Z[Keep thinking param]
    B -- Yes --> C{last_assistant_with_tool_calls\nhas_no_thinking_blocks?}

    C -- True --> D{not any_assistant_message\n_has_thinking_blocks?}
    C -- False --> E{last_assistant_message\n_has_no_thinking_blocks?}

    E -- True: only when some\nmessages have thinking --> D
    E -- False --> Z

    D -- True: no prior\nthinking exists --> F[Drop thinking param\n✅ works for tool_calls path]
    D -- False: prior thinking\nexists → BLOCKS DROP --> G["❌ last_assistant_message\npath is dead code\n(inner guard requires prior thinking;\nouter guard forbids prior thinking)"]

    G --> Z

    style G fill:#ffcccc,stroke:#cc0000
    style F fill:#ccffcc,stroke:#009900
```

Comments Outside Diff (1)

  1. litellm/utils.py, line 7677-7720 (link)

    P1 New code path is dead — contradictory guards make fix a no-op

    last_assistant_message_has_no_thinking_blocks has an internal early-return that makes it impossible to ever trigger the thinking-drop at the call sites in anthropic/chat/transformation.py and bedrock/chat/converse_transformation.py.

    Trace through the two scenarios the PR targets:

    Scenario A — thinking never used before (the stated fix):

    ```python
    messages = [user, assistant("Hello!"), user, assistant("Sure!")]
    thinking = {"type": "enabled"}  # first-time enable
    ```

    • any_assistant_message_has_thinking_blocks(messages) → False
    • The internal guard at line 7696 fires: if not any_assistant_message_has_thinking_blocks(messages): return False
    • last_assistant_message_has_no_thinking_blocks returns False
    • Outer condition: (False or False) and not False = False
    • Thinking NOT dropped → error "Expected thinking or redacted_thinking, but found text" still raised ❌

    Scenario B — thinking was used before, last message has no blocks:

    ```python
    messages = [user, assistant[thinking+text], user, assistant("text")]
    ```

    • any_assistant_message_has_thinking_blocks(messages) → True (earlier message has thinking)
    • Internal guard passes; last_assistant_message_has_no_thinking_blocks returns True
    • Outer condition: (... or True) and not True = False (outer guard kills it)
    • Thinking NOT dropped ❌ (but dropping here would also break things — it would trigger "When thinking is disabled, an assistant message cannot contain thinking" for the earlier messages with thinking blocks)

    The internal guard (if not any_assistant_message_has_thinking_blocks(messages): return False) and the outer guard (and not any_assistant_message_has_thinking_blocks(messages)) are mutually exclusive — when one is satisfied the other is not — making the new or last_assistant_message_has_no_thinking_blocks(messages) branch permanently unreachable.

    The tests exercise the function in isolation and correctly document the expected return value, but no test exercises the full compound condition through transform_request / _transform_request_helper. A minimal end-to-end test would catch this:

    ```python
    # Scenario A: enable thinking for the first time with a text-only history
    optional_params = {"thinking": {"type": "enabled"}}
    messages = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    # After transformation, thinking should be dropped (with modify_params=True)
    # Currently it is NOT dropped, so the API call will fail.
    ```

    The likely fix is to remove the internal early-return inside last_assistant_message_has_no_thinking_blocks and let the outer and not any_assistant_message_has_thinking_blocks(messages) guard handle both branches uniformly, since that guard already prevents the dangerous Scenario B case.

Last reviewed commit: bdf1260

Comment on lines +115 to +138
```python
BETA_HEADER_MINIMUM_VERSION: Dict[str, float] = {
    # Extended thinking features require Claude 4.0+
    "interleaved-thinking-2025-05-14": 4.0,
    "dev-full-thinking-2025-05-14": 4.0,
    # 1M context requires Claude 4.0+
    "context-1m-2025-08-07": 4.0,
    # Context management requires Claude 4.5+
    "context-management-2025-06-27": 4.5,
    # Effort parameter requires Claude 4.5+ (but only Opus 4.5, see family restrictions)
    "effort-2025-11-24": 4.5,
    # Tool search requires Claude 4.5+
    "tool-search-tool-2025-10-19": 4.5,
    "tool-examples-2025-10-29": 4.5,
}

# Model family restrictions for specific beta headers
# Only enforced if the version requirement is met
# Example: "effort-2025-11-24" requires Claude 4.5+ AND Opus family
BETA_HEADER_FAMILY_RESTRICTIONS: Dict[str, List[str]] = {
    "effort-2025-11-24": ["opus"],  # Only Opus 4.5+ supports effort
    # Tool search works on Opus 4.5+ and Sonnet 4.5+, but not Haiku
    "tool-search-tool-2025-10-19": ["opus", "sonnet"],
    "tool-examples-2025-10-29": ["opus", "sonnet"],
}
```
greptile-apps bot commented:

Hardcoded model-specific flags violate codebase convention

BETA_HEADER_MINIMUM_VERSION and BETA_HEADER_FAMILY_RESTRICTIONS encode model capability knowledge directly in code. Per the repo's own convention (see the custom instructions), model-specific capability flags should live in model_prices_and_context_window.json and be accessed via get_model_info / supports_* helpers — not hardcoded as Python dicts. This is exactly the same pattern the convention calls out as "BAD":

```python
# BAD example from the convention
"interleaved-thinking-2025-05-14": 4.0,   # ← hardcoded version knowledge
"effort-2025-11-24": ["opus"],            # ← hardcoded family knowledge
```

The practical consequence is that each time Anthropic/AWS expands a beta flag to a new model generation, someone needs to update this file — the same maintenance burden the convention explicitly wants to avoid.

The intended good approach would be to consult get_model_info or a supports_* function (e.g., supports_reasoning, supports_computer_use) derived from model_prices_and_context_window.json, making the filter self-updating as model entries are added.

Rule Used: What: Do not hardcode model-specific flags in the ... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Quentin-M (author) replied:

Respectfully pushing back on this one. model_prices_and_context_window.json has no per-beta-header capability fields — it would require updating 100+ model entries every time AWS adds support for a new header. The version-based approach in BETA_HEADER_MINIMUM_VERSION is intentionally more maintainable: new Claude models require zero code changes, and only new capability restrictions (not expansions) need an entry. This is exactly the future-proof design called out in the module docstring.


codspeed-hq bot commented Mar 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Quentin-M:fix/websearch-bedrock-thinking (bdf1260) with main (278c9ba)


Quentin-M and others added 4 commits March 17, 2026 00:10
…without thinking blocks

Follow-up to a494503f4b which fixed thinking + tool_use. That fix only
detected missing thinking blocks on assistant messages with tool_calls.
When the last assistant message has plain text content (no tool_calls),
the check returned False and thinking was not dropped, causing:
"Expected thinking or redacted_thinking, but found text"

Add last_assistant_message_has_no_thinking_blocks() to detect any
assistant message with content but no thinking blocks. Extract shared
_message_has_thinking_blocks() helper that checks both the
thinking_blocks field and content array for thinking/redacted_thinking
blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ock APIs

Bedrock doesn't support context_management as a request body parameter.
The feature is enabled via the anthropic-beta header (context-management-2025-06-27)
which was already handled correctly. Leaving context_management in the body causes:
"context_management: Extra inputs are not permitted"

Strip the parameter from all 3 Bedrock API paths:
- Invoke Messages API
- Invoke Chat API
- Converse API (additionalModelRequestFields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pport

Standardize anthropic-beta header handling across all Bedrock APIs
(Invoke Chat, Converse, Messages) using a centralized whitelist-based
filter with version-based model support.

- Inconsistent filtering: Invoke Chat used whitelist (safe),
  Converse/Messages used blacklist (allows unsupported headers through)
- Production risk: unsupported headers could cause AWS API errors
- Maintenance burden: adding new Claude models required updating
  multiple hardcoded lists

- Centralized BedrockBetaHeaderFilter with whitelist approach
- Version-based filtering (e.g., "requires 4.5+") instead of model lists
- Family restrictions (opus/sonnet/haiku) when needed
- Automatic header translation for backward compatibility

- Add `litellm/llms/bedrock/beta_headers_config.py`
  - BedrockBetaHeaderFilter class
  - Whitelist of 11 supported beta headers
  - Version/family restriction logic
  - Debug logging support

- Invoke Chat: Replace local whitelist with centralized filter
- Converse: Remove blacklist (30 lines), use whitelist filter
- Messages: Remove complex filter (55 lines), preserve translation

- Add `tests/test_litellm/llms/bedrock/test_beta_headers_config.py`
  - 40+ unit tests for filter logic
- Extend `tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py`
  - 13 integration tests for API transformations
  - Verify filtering, version restrictions, translations

- Add `litellm/llms/bedrock/README.md`
  - Maintenance guide for adding new headers/models
- Enhanced module docstrings with examples

- Production safety: only whitelisted headers reach AWS
- Zero maintenance for new Claude models (Opus 5, Sonnet 5, etc.)
- Consistent filtering across all 3 APIs
- Preserved backward compatibility (advanced-tool-use translation)

```bash
poetry run pytest tests/test_litellm/llms/bedrock/ -v
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes issue where websearch interception failed with "TAVILY_API_KEY is not set"
error when using search providers that require API keys configured in the proxy
config rather than environment variables.

Extract api_key and api_base from the router search_tools litellm_params
configuration and pass them to litellm.asearch(). Falls back to environment
variables when credentials are not in the config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>