
fix(bedrock/thinking/websearch): beta headers, thinking fixes, API key loading, guardrails null safety#23523

Open
Quentin-M wants to merge 7 commits into BerriAI:main from Quentin-M:fix/websearch-bedrock-thinking

Conversation


Quentin-M (Contributor) commented Mar 13, 2026

Summary

Collection of production fixes developed and battle-tested against AWS Bedrock + Claude Opus 4.5/4.6 with extended thinking and websearch interception.

Supersedes #20470 and #20489 (closed without merging). Builds on the foundation of #19818 (merged). Related: #20488 (open — independently addresses the same websearch thinking blocks issue as 4630793, which is already merged).


Bedrock: Centralize Beta Header Filtering

Problem: Inconsistent anthropic-beta header handling across the 3 Bedrock API paths:

  • Invoke Chat used a whitelist (safe)
  • Converse and Messages used blacklists via UNSUPPORTED_BEDROCK_CONVERSE_BETA_PATTERNS (allows unsupported headers through, causing AWS API errors in production)
  • Adding new Claude models required updating multiple hardcoded lists

Fix: New litellm/llms/bedrock/beta_headers_config.py with centralized BedrockBetaHeaderFilter:

  • Whitelist of 11 supported beta headers with version-based model restrictions
  • Applied uniformly to all 3 Bedrock API paths (Invoke Chat, Converse, Messages)
  • Zero maintenance for new Claude models — version-based rather than model-list-based
  • Preserves the advanced-tool-use → tool-search-tool + tool-examples translation for backward compatibility
  • 40+ unit tests + 13 integration tests
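The core of the whitelist logic can be sketched as follows. This is a simplified, hypothetical reconstruction using a subset of the real table from beta_headers_config.py; the function names and the model-ID parsing are illustrative, not the actual implementation:

```python
import re
from typing import Dict, List, Optional, Tuple

# Subset of the real whitelist in litellm/llms/bedrock/beta_headers_config.py.
BETA_HEADER_MINIMUM_VERSION: Dict[str, float] = {
    "interleaved-thinking-2025-05-14": 4.0,  # extended thinking: Claude 4.0+
    "context-1m-2025-08-07": 4.0,            # 1M context: Claude 4.0+
    "effort-2025-11-24": 4.5,                # effort: Claude 4.5+ (Opus only)
    "tool-search-tool-2025-10-19": 4.5,      # tool search: Claude 4.5+
}
BETA_HEADER_FAMILY_RESTRICTIONS: Dict[str, List[str]] = {
    "effort-2025-11-24": ["opus"],
    "tool-search-tool-2025-10-19": ["opus", "sonnet"],
}

def _parse_model(model_id: str) -> Optional[Tuple[str, float]]:
    """Extract (family, version) from IDs like anthropic.claude-opus-4-6-v1."""
    m = re.search(r"claude-(opus|sonnet|haiku)-(\d+)-(\d+)", model_id)
    if m is None:
        return None
    return m.group(1), float(f"{m.group(2)}.{m.group(3)}")

def filter_beta_headers(requested: List[str], model_id: str) -> List[str]:
    """Keep only whitelisted headers the target model version/family supports."""
    parsed = _parse_model(model_id)
    if parsed is None:
        return []
    family, version = parsed
    kept = []
    for header in requested:
        if header not in BETA_HEADER_MINIMUM_VERSION:
            continue  # whitelist semantics: unknown headers never reach AWS
        if version < BETA_HEADER_MINIMUM_VERSION[header]:
            continue
        allowed_families = BETA_HEADER_FAMILY_RESTRICTIONS.get(header)
        if allowed_families is not None and family not in allowed_families:
            continue
        kept.append(header)
    return kept
```

Because the tables key on version numbers rather than model IDs, a hypothetical Opus 5 passes every `>=` check without any code change — which is the "zero maintenance for new Claude models" property claimed above.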

Bedrock: Strip context_management from Request Body

Problem: context_management is enabled via the context-management-2025-06-27 anthropic-beta header, NOT as a request body parameter. Leaving it in the body causes "context_management: Extra inputs are not permitted" across all 3 Bedrock API paths.

Fix: Pop context_management from the request body in Converse, Invoke Chat, and Invoke Messages transformations.
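The fix amounts to a one-line pop applied in each of the three transformations; a minimal sketch (helper name is illustrative):

```python
def strip_context_management(request_body: dict) -> dict:
    # context_management is enabled via the anthropic-beta header
    # (context-management-2025-06-27), never as a body parameter;
    # Bedrock rejects it with "Extra inputs are not permitted".
    request_body.pop("context_management", None)
    return request_body
```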

Bedrock: Remove Invalid :0 Suffix from Claude Opus 4.6 Model ID

Problem: anthropic.claude-opus-4-6-v1:0 was present in BEDROCK_CONVERSE_MODELS but AWS Bedrock does not recognize this identifier. Only anthropic.claude-opus-4-6-v1 (without :0) is valid.

Fix: Remove the incorrect anthropic.claude-opus-4-6-v1:0 entry.

Cherry-picked from search_tools_fix branch (adapted from efec746a17).


Thinking: Drop Thinking Param for Text-Only Assistant Messages

Problem: The existing fix (a494503f4b) only dropped the thinking param when the last assistant message had tool_calls without thinking blocks. When the last assistant message has plain text content (no tool_calls), the check returned False and thinking was not dropped, causing: "Expected thinking or redacted_thinking, but found text"

Fix:

  • Add last_assistant_message_has_no_thinking_blocks() to detect assistant messages with content but no thinking blocks
  • Extract shared _message_has_thinking_blocks() helper checking both thinking_blocks field and content array
  • Apply fix to both Anthropic and Bedrock Converse transformations
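A minimal sketch of the detection logic described above — the names match the PR's helpers, but the bodies are illustrative reconstructions rather than the actual implementation:

```python
from typing import Any, Dict, List

def _message_has_thinking_blocks(message: Dict[str, Any]) -> bool:
    # Check the explicit thinking_blocks field first...
    if message.get("thinking_blocks"):
        return True
    # ...then scan a structured content array for thinking blocks.
    content = message.get("content")
    if isinstance(content, list):
        return any(
            isinstance(block, dict)
            and block.get("type") in ("thinking", "redacted_thinking")
            for block in content
        )
    return False

def last_assistant_message_has_no_thinking_blocks(
    messages: List[Dict[str, Any]],
) -> bool:
    # Walk backwards to the most recent assistant message.
    for message in reversed(messages):
        if message.get("role") != "assistant":
            continue
        return bool(message.get("content")) and not _message_has_thinking_blocks(
            message
        )
    return False  # no assistant message at all
```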

Thinking: Recognize adaptive Type in is_thinking_enabled

Problem: is_thinking_enabled() only checked for type="enabled", but Claude Opus 4.6 uses type="adaptive". Without recognizing adaptive, max_tokens auto-adjustment didn't trigger, causing API errors.

Fix: Check thinking_type in ("enabled", "adaptive") in is_thinking_enabled().
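The check reduces to a membership test; a sketch, assuming optional_params carries the thinking dict:

```python
def is_thinking_enabled(optional_params: dict) -> bool:
    # "enabled" is the classic extended-thinking type; Claude Opus 4.6
    # uses "adaptive", which must also trigger max_tokens auto-adjustment.
    thinking = optional_params.get("thinking") or {}
    return thinking.get("type") in ("enabled", "adaptive")
```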


Websearch Interception: Load API Keys from Router Configuration

Problem: _execute_search() only extracted search_provider from the router's search_tools config, not api_key or api_base. This caused "TAVILY_API_KEY is not set" errors when keys are configured in the proxy YAML config rather than environment variables.

Fix: Extract api_key and api_base from the search tool's litellm_params and pass them to litellm.asearch(), with fallback to environment variables.
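The credential-resolution order can be sketched as follows (`resolve_search_credentials` is a hypothetical helper name — the real logic lives inside `_execute_search()`):

```python
import os
from typing import Optional, Tuple

def resolve_search_credentials(
    search_tool: dict, env_var: str = "TAVILY_API_KEY"
) -> Tuple[Optional[str], Optional[str]]:
    # Prefer credentials from the search tool's litellm_params
    # (i.e. the proxy YAML config), then fall back to the environment.
    params = search_tool.get("litellm_params") or {}
    api_key = params.get("api_key") or os.environ.get(env_var)
    api_base = params.get("api_base")
    return api_key, api_base
```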

Note: The websearch thinking blocks preservation (4630793) and the max_tokens budget validation (12691dc) are already merged into main. This PR adds the missing API key loading fix on top.

Related: #14194, #20488


Bedrock Guardrails: PII Redaction Null Safety

Problem: Multiple .get("key", []) calls in bedrock_guardrails.py can still return None when the AWS API returns explicit null for a field. This causes TypeError: 'NoneType' is not iterable in _redact_pii_matches() and _check_guardrail_blocked().

Fix: Add or [] fallback after each .get(...) call (9 locations).
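The pattern, sketched on one of the affected lookups (`piiEntities` is one of the guardrail assessment fields; the helper name is illustrative):

```python
def iter_pii_entities(assessment: dict):
    # .get("piiEntities", []) still returns None when AWS sends an
    # explicit null for the field, so `or []` is needed to coerce it
    # back to an empty list and keep iteration safe.
    for match in assessment.get("piiEntities") or []:
        yield match
```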

Authored by Ryan Goldblatt, cherry-picked from search_tools_fix branch.


Test Plan

  • pytest tests/test_litellm/llms/bedrock/test_beta_headers_config.py -v
  • pytest tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py -v
  • pytest tests/test_litellm/test_utils.py -v -k thinking
  • pytest tests/test_litellm/proxy/guardrails/ -v
  • Lint: make lint-ruff

All changes validated end-to-end: Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5/4.6 with extended thinking and websearch.

🤖 Generated with Claude Code

ryangoldblatt-bm and others added 3 commits March 13, 2026 01:59
AWS Bedrock does not recognize anthropic.claude-opus-4-6-v1:0 as a valid
model identifier. Unlike other Claude models, Opus 4.6 requires the model
ID without the :0 version suffix: anthropic.claude-opus-4-6-v1.

Cherry-picked from search_tools_fix (efec746a17), adapted since upstream
PR BerriAI#20564 already fixed the JSON pricing keys.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Upstream only checks for type="enabled" but Opus 4.6 uses type="adaptive".
Without this fix, max_tokens auto-adjustment doesn't trigger for adaptive
thinking, causing API errors.

vercel bot commented Mar 13, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Mar 17, 2026 4:12am |


@Quentin-M Quentin-M changed the title fix/feat: bedrock beta headers, thinking fixes, websearch API keys, guardrails null safety fix(bedrock/thinking/websearch): beta headers, thinking fixes, API key loading, guardrails null safety Mar 13, 2026

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR bundles several production fixes for AWS Bedrock + Claude Opus 4.5/4.6 deployments: centralised beta-header filtering via a new BedrockBetaHeaderFilter, context_management body-param stripping across all three Bedrock API paths, a model-ID correction for Opus 4.6, adaptive thinking-type recognition, guardrail null-safety, and websearch API-key loading from YAML config. Most of the fixes are well-executed and battle-tested. The one issue found:

  • Dead code path for the "found text" thinking fix — The PR adds or last_assistant_message_has_no_thinking_blocks(messages) to the thinking-drop condition in both litellm/llms/anthropic/chat/transformation.py and litellm/llms/bedrock/chat/converse_transformation.py. However, the function has an internal guard (if not any_assistant_message_has_thinking_blocks(messages): return False) that is mutually exclusive with the outer condition's and not any_assistant_message_has_thinking_blocks(messages). When no prior thinking blocks exist (the scenario the fix targets), the function returns False; when prior thinking blocks do exist the outer guard prevents the drop. The result is that the new branch is permanently unreachable, and the "Expected thinking or redacted_thinking, but found text" error will continue to surface. The _message_has_thinking_blocks refactoring and the isolated unit tests are correct; the issue is only in how the function interacts with the call-site guard.
  • All other fixes (centralized beta-header filtering, context_management stripping, is_thinking_enabled adaptive support, guardrail null-safety, websearch key loading) look correct and are well-tested.

Confidence Score: 3/5

  • Safe to merge for all fixes except the thinking-drop path, which remains broken and will still surface "Expected thinking or redacted_thinking, but found text" at runtime.
  • The PR ships many well-tested, clearly scoped fixes. One logic bug — the last_assistant_message_has_no_thinking_blocks dead code path — leaves a stated bug unfixed without surfacing a new error. All other changes are sound.
  • litellm/utils.py (dead code path), litellm/llms/anthropic/chat/transformation.py, and litellm/llms/bedrock/chat/converse_transformation.py (both consume the broken function).

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/utils.py | Adds `_message_has_thinking_blocks` helper and `last_assistant_message_has_no_thinking_blocks`, but the new function has an internal guard that contradicts the outer condition, making the intended fix for "Expected thinking or redacted_thinking, but found text" a no-op. |
| litellm/llms/bedrock/beta_headers_config.py | New centralized whitelist-based beta header filter for all three Bedrock API paths; well-structured with version and family restrictions, thorough doc comments, and fixes noted in previous review rounds have been addressed. |
| litellm/llms/anthropic/chat/transformation.py | Adds `last_assistant_message_has_no_thinking_blocks` to the thinking-drop condition, but the new OR branch can never trigger due to the contradictory internal/outer guards (see utils.py comment). |
| litellm/llms/bedrock/chat/converse_transformation.py | Replaces ad-hoc blacklist filtering with the new centralized `BedrockBetaHeaderFilter`, strips `context_management` from the request body, and applies the same (dead-code) thinking-drop logic as the Anthropic path. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Removes hardcoded model-specific helpers, now correctly adds both `tool-search-tool-2025-10-19` and `tool-examples-2025-10-29` together before handing off to the centralized filter, and strips `context_management` from the body. |
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Straightforward null-safety fix adding `or []` fallback after `.get(...)` calls to prevent `TypeError: 'NoneType' is not iterable` when AWS returns explicit null for policy fields. |
| litellm/integrations/websearch_interception/handler.py | Correctly extracts `api_key` and `api_base` from the search tool's `litellm_params` and passes them to `litellm.asearch`, fixing the "TAVILY_API_KEY is not set" error when keys are in YAML config rather than environment variables. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request with thinking param] --> B{messages is not None?}
    B -- No --> Z[Keep thinking param]
    B -- Yes --> C{last_assistant_with_tool_calls\nhas_no_thinking_blocks?}

    C -- True --> D{not any_assistant_message\n_has_thinking_blocks?}
    C -- False --> E{last_assistant_message\n_has_no_thinking_blocks?}

    E -- True: only when some\nmessages have thinking --> D
    E -- False --> Z

    D -- True: no prior\nthinking exists --> F[Drop thinking param\n✅ works for tool_calls path]
    D -- False: prior thinking\nexists → BLOCKS DROP --> G["❌ last_assistant_message\npath is dead code\n(inner guard requires prior thinking;\nouter guard forbids prior thinking)"]

    G --> Z

    style G fill:#ffcccc,stroke:#cc0000
    style F fill:#ccffcc,stroke:#009900
```

Comments Outside Diff (1)

  1. litellm/utils.py, line 7677-7720 (link)

    P1 New code path is dead — contradictory guards make fix a no-op

    last_assistant_message_has_no_thinking_blocks has an internal early-return that makes it impossible to ever trigger the thinking-drop at the call sites in anthropic/chat/transformation.py and bedrock/chat/converse_transformation.py.

    Trace through the two scenarios the PR targets:

    Scenario A — thinking never used before (the stated fix):

    ```python
    messages = [user, assistant("Hello!"), user, assistant("Sure!")]
    thinking = {"type": "enabled"}  # first-time enable
    ```

    • any_assistant_message_has_thinking_blocks(messages) → False
    • The internal guard at line 7696 fires: if not any_assistant_message_has_thinking_blocks(messages): return False
    • last_assistant_message_has_no_thinking_blocks returns False
    • Outer condition: (False or False) and not False = False
    • Thinking NOT dropped → error "Expected thinking or redacted_thinking, but found text" still raised ❌

    Scenario B — thinking was used before, last message has no blocks:

    ```python
    messages = [user, assistant[thinking+text], user, assistant("text")]
    ```

    • any_assistant_message_has_thinking_blocks(messages) → True (earlier message has thinking)
    • Internal guard passes; last_assistant_message_has_no_thinking_blocks returns True
    • Outer condition: (... or True) and not True = False (outer guard kills it)
    • Thinking NOT dropped ❌ (but dropping here would also break things — it would trigger "When thinking is disabled, an assistant message cannot contain thinking" for the earlier messages with thinking blocks)

    The internal guard (if not any_assistant_message_has_thinking_blocks(messages): return False) and the outer guard (and not any_assistant_message_has_thinking_blocks(messages)) are mutually exclusive — when one is satisfied the other is not — making the new or last_assistant_message_has_no_thinking_blocks(messages) branch permanently unreachable.

    The tests exercise the function in isolation and correctly document the expected return value, but no test exercises the full compound condition through transform_request / _transform_request_helper. A minimal end-to-end test would catch this:

    ```python
    # Scenario A: enable thinking for the first time with a text-only history
    optional_params = {"thinking": {"type": "enabled"}}
    messages = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    # After transformation, thinking should be dropped (with modify_params=True)
    # Currently it is NOT dropped, so the API call will fail.
    ```

    The likely fix is to remove the internal early-return inside last_assistant_message_has_no_thinking_blocks and let the outer and not any_assistant_message_has_thinking_blocks(messages) guard handle both branches uniformly, since that guard already prevents the dangerous Scenario B case.

Last reviewed commit: bdf1260

Comment on lines +115 to +138
```python
BETA_HEADER_MINIMUM_VERSION: Dict[str, float] = {
    # Extended thinking features require Claude 4.0+
    "interleaved-thinking-2025-05-14": 4.0,
    "dev-full-thinking-2025-05-14": 4.0,
    # 1M context requires Claude 4.0+
    "context-1m-2025-08-07": 4.0,
    # Context management requires Claude 4.5+
    "context-management-2025-06-27": 4.5,
    # Effort parameter requires Claude 4.5+ (but only Opus 4.5, see family restrictions)
    "effort-2025-11-24": 4.5,
    # Tool search requires Claude 4.5+
    "tool-search-tool-2025-10-19": 4.5,
    "tool-examples-2025-10-29": 4.5,
}

# Model family restrictions for specific beta headers
# Only enforced if the version requirement is met
# Example: "effort-2025-11-24" requires Claude 4.5+ AND Opus family
BETA_HEADER_FAMILY_RESTRICTIONS: Dict[str, List[str]] = {
    "effort-2025-11-24": ["opus"],  # Only Opus 4.5+ supports effort
    # Tool search works on Opus 4.5+ and Sonnet 4.5+, but not Haiku
    "tool-search-tool-2025-10-19": ["opus", "sonnet"],
    "tool-examples-2025-10-29": ["opus", "sonnet"],
}
```
greptile-apps bot commented:

Hardcoded model-specific flags violate codebase convention

BETA_HEADER_MINIMUM_VERSION and BETA_HEADER_FAMILY_RESTRICTIONS encode model capability knowledge directly in code. Per the repo's own convention (see the custom instructions), model-specific capability flags should live in model_prices_and_context_window.json and be accessed via get_model_info / supports_* helpers — not hardcoded as Python dicts. This is exactly the same pattern the convention calls out as "BAD":

```python
# BAD example from the convention
"interleaved-thinking-2025-05-14": 4.0,   # ← hardcoded version knowledge
"effort-2025-11-24": ["opus"],            # ← hardcoded family knowledge
```

The practical consequence is that each time Anthropic/AWS expands a beta flag to a new model generation, someone needs to update this file — the same maintenance burden the convention explicitly wants to avoid.

The intended good approach would be to consult get_model_info or a supports_* function (e.g., supports_reasoning, supports_computer_use) derived from model_prices_and_context_window.json, making the filter self-updating as model entries are added.

Rule Used: What: Do not hardcode model-specific flags in the ... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Quentin-M (author) replied:

Respectfully pushing back on this one. model_prices_and_context_window.json has no per-beta-header capability fields — it would require updating 100+ model entries every time AWS adds support for a new header. The version-based approach in BETA_HEADER_MINIMUM_VERSION is intentionally more maintainable: new Claude models require zero code changes, and only new capability restrictions (not expansions) need an entry. This is exactly the future-proof design called out in the module docstring.


codspeed-hq bot commented Mar 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Quentin-M:fix/websearch-bedrock-thinking (bdf1260) with main (278c9ba)


Quentin-M and others added 4 commits March 17, 2026 00:10
…without thinking blocks

Follow-up to a494503f4b which fixed thinking + tool_use. That fix only
detected missing thinking blocks on assistant messages with tool_calls.
When the last assistant message has plain text content (no tool_calls),
the check returned False and thinking was not dropped, causing:
"Expected thinking or redacted_thinking, but found text"

Add last_assistant_message_has_no_thinking_blocks() to detect any
assistant message with content but no thinking blocks. Extract shared
_message_has_thinking_blocks() helper that checks both the
thinking_blocks field and content array for thinking/redacted_thinking
blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ock APIs

Bedrock doesn't support context_management as a request body parameter.
The feature is enabled via the anthropic-beta header (context-management-2025-06-27)
which was already handled correctly. Leaving context_management in the body causes:
"context_management: Extra inputs are not permitted"

Strip the parameter from all 3 Bedrock API paths:
- Invoke Messages API
- Invoke Chat API
- Converse API (additionalModelRequestFields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pport

Standardize anthropic-beta header handling across all Bedrock APIs
(Invoke Chat, Converse, Messages) using a centralized whitelist-based
filter with version-based model support.

- Inconsistent filtering: Invoke Chat used whitelist (safe),
  Converse/Messages used blacklist (allows unsupported headers through)
- Production risk: unsupported headers could cause AWS API errors
- Maintenance burden: adding new Claude models required updating
  multiple hardcoded lists

- Centralized BedrockBetaHeaderFilter with whitelist approach
- Version-based filtering (e.g., "requires 4.5+") instead of model lists
- Family restrictions (opus/sonnet/haiku) when needed
- Automatic header translation for backward compatibility

- Add `litellm/llms/bedrock/beta_headers_config.py`
  - BedrockBetaHeaderFilter class
  - Whitelist of 11 supported beta headers
  - Version/family restriction logic
  - Debug logging support

- Invoke Chat: Replace local whitelist with centralized filter
- Converse: Remove blacklist (30 lines), use whitelist filter
- Messages: Remove complex filter (55 lines), preserve translation

- Add `tests/test_litellm/llms/bedrock/test_beta_headers_config.py`
  - 40+ unit tests for filter logic
- Extend `tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py`
  - 13 integration tests for API transformations
  - Verify filtering, version restrictions, translations

- Add `litellm/llms/bedrock/README.md`
  - Maintenance guide for adding new headers/models
- Enhanced module docstrings with examples

- Production safety: only whitelisted headers reach AWS
- Zero maintenance for new Claude models (Opus 5, Sonnet 5, etc.)
- Consistent filtering across all 3 APIs
- Preserved backward compatibility (advanced-tool-use translation)

```bash
poetry run pytest tests/test_litellm/llms/bedrock/ -v
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes issue where websearch interception failed with "TAVILY_API_KEY is not set"
error when using search providers that require API keys configured in the proxy
config rather than environment variables.

Extract api_key and api_base from the router search_tools litellm_params
configuration and pass them to litellm.asearch(). Falls back to environment
variables when credentials are not in the config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>