Skip to content

LiteLLM update-0218: rebase on upstream + fixes#1

Closed
Quentin-M wants to merge 1141 commits intomainfrom
update-0213
Closed

LiteLLM update-0218: rebase on upstream + fixes#1
Quentin-M wants to merge 1141 commits intomainfrom
update-0213

Conversation

@Quentin-M
Copy link
Owner

Summary

Rebase on BerriAI/litellm main (Feb 18, 2026) with the following fixes:

Websearch Interception

Bedrock

  • Centralized beta header filtering with version-based support (replaces inconsistent per-API filtering)
  • Fix version extraction regex in beta headers config
  • Remove invalid :0 suffix from Claude Opus 4.6 model ID
  • Strip context_management from request body for all Bedrock APIs (Invoke Messages, Invoke Chat, Converse)

Thinking

  • Drop thinking param when assistant messages have text without thinking blocks
  • Recognize adaptive thinking type in is_thinking_enabled (Opus 4.6)

Test Plan

  • All websearch interception tests pass (51 passed)
  • All bedrock beta header tests pass (69 passed)
  • All thinking tests pass (8 passed)
  • Docker image build + deploy

🤖 Generated with Claude Code

ishaan-jaff and others added 30 commits February 16, 2026 15:33
…tection (BerriAI#21342)

* Add 6 new EU PII patterns for GDPR compliance

- fr_nir: French Social Security Number (NIR/INSEE) with validation
- eu_iban_enhanced: Enhanced IBAN detection with specific format
- fr_phone: French phone numbers (+33, 0033, 0 formats)
- eu_vat: EU VAT identification numbers (all 27 member states)
- eu_passport_generic: Generic EU passport format
- fr_postal_code: French postal codes with contextual keywords

* Add GDPR Art. 32 EU PII Protection policy template

- Comprehensive GDPR Article 32 compliance policy
- 4 guardrail groups: National IDs, Financial, Contact Info, Business IDs
- Masks French NIR/INSEE, EU IBANs, French phones, EU VAT numbers
- Includes EU passport numbers and email addresses
- Medium complexity template with indigo icon

* Add comprehensive tests for EU PII patterns

- Test French NIR validation (sex digit, month range)
- Test enhanced IBAN detection (French, German)
- Test French phone number formats
- Test EU VAT numbers
- Test generic EU passport format
- Test French postal code pattern

* Add EU pattern loading and category validation tests

- Verify all 6 EU PII patterns are loaded correctly
- Verify patterns are categorized as 'EU PII Patterns'
- Ensure pattern loading consistency

* Add end-to-end tests for GDPR policy template

- 4 tests for PII that should be masked (NIR, IBAN, phone, VAT)
- 4 tests for text that should pass through (invalid patterns, no PII)
- 1 bonus test for multiple PII types in same message
- All tests verify correct masking behavior

* Add region field to policy templates

- Added region field to all 6 templates (EU, AU, Global)
- Updated both main and backup JSON files
- Enables region-based filtering in UI

* Add region filter to policy templates UI

- Added Radio.Group filter for regions (All, AU, EU, Global)
- Efficient filtering with useMemo hooks
- Clean button-based UI matching existing design
- Defaults missing regions to Global

* feat: add EU AI Act Article 5 policy template

Add policy template for detecting EU AI Act Article 5 prohibited practices using conditional keyword matching.

Coverage:
- Article 5.1.c: Social scoring systems
- Article 5.1.f: Emotion recognition in workplace/education
- Article 5.1.h: Biometric categorization of protected characteristics
- Article 5.1.a: Harmful manipulation techniques
- Article 5.1.b: Vulnerability exploitation

Implementation:
- Uses proven conditional matching pattern (identifier + block words)
- 10 always-block keywords for explicit violations
- 8 exceptions for research/compliance/entertainment
- Zero cost (<5ms), no external APIs, 100% private

* feat: add EU AI Act guardrail config example

Example configuration showing how to enable EU AI Act Article 5 guardrail.

* test: add 40 test cases for EU AI Act Article 5

Comprehensive test coverage:
- 10 always-block keywords (explicit violations)
- 15 conditional matches (identifier + block word)
- 8 exceptions (research, compliance, entertainment)
- 7 no-match cases (legitimate uses)

Tests validate correct blocking/allowing behavior for Article 5 prohibited practices.

* Fix: support standalone conditional matching without inherit_from

- Updated loading logic to activate conditional matching when either:
  1. identifier_words + inherit_from (existing pattern)
  2. identifier_words + additional_block_words (new standalone pattern)
- Modified _load_conditional_category to handle standalone templates
- EU AI Act template now works properly without inherit_from
- All 45 tests passing

Fixes Greptile feedback: conditional matching now activates for templates
that define additional_block_words without requiring inherit_from

* fix: address Greptile code review feedback (2/5 score)

- patterns.json: add keyword_pattern to eu_vat and eu_passport_generic
- patterns.json: fix fr_phone pattern with leading word boundary
- patterns.json: fix eu_iban_enhanced regex efficiency
- policy_templates.json: remove country-specific passport patterns from GDPR template
- policy_templates_backup.json: sync with main templates file
- test_gdpr_policy_e2e.py: update test setup and fix VAT test text

All tests now pass. Keyword guards prevent false positives.

* Fix: address Greptile pattern feedback

- Fix fr_phone: use negative lookbehind (?<!\d) to prevent false matches in digit strings
- Add keyword_pattern to eu_passport_generic to reduce false positives
- Add keyword_pattern to eu_vat for contextual matching

All pattern tests passing

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Fix HTTPException swallowed by broad except block in get_user_daily_activity
  and get_user_daily_activity_aggregated: re-raise HTTPException before the
  generic handler so 403 status codes propagate correctly
- Add status_code assertions in non-admin access tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep both structured output tests (ours) and min thinking budget tests
(staging). Accept staging poetry.lock.
- Default user_id to caller's own ID for non-admins instead of 403 when
  omitted, preserving backward compatibility for API consumers
- Apply same fix to aggregated endpoint
- Update test to verify defaulting behavior instead of expecting 403
- Add useEffect to sync selectedUserId when auth state settles in
  UsagePageView to handle async auth initialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_by_user

[Feature] UI - Usage: Allow Filtering by User
* fix: SSO PKCE support fails in multi-pod Kubernetes deployments

* fix: virutal key grace period from env/UI

* fix: refactor, race condition handle, fstring sql injection

* fix: add async call to avoid server pauses

* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: add await in tests

* add modify test to perform async run

* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix grace period with better error handling on frontend and as per best practices

* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: as per request changes

* Update litellm/proxy/utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix errors when callbacks are invoked for file delete operations:

* Fix errors when callbacks are invoked for file operations

* Fix: pass deployment credentials to afile_retrieve in managed_files post-call hook

* Fix: bypass managed files access check in batch polling by calling afile_content directly

* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: afile_retrieve returns unified ID for batch output files

* fix: batch retrieve returns unified input_file_id

* fix(chatgpt): drop unsupported responses params for Codex

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(chatgpt): ensure Codex request filters unsupported params

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix deleted managed files returning 403 instead of 404

* Add comments

* Update litellm/proxy/utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: thread deployment model_info through batch cost calculation

batch_cost_calculator only checked the global cost map, ignoring
deployment-level custom pricing (input_cost_per_token_batches etc.).
Add optional model_info param through the batch cost chain and pass
it from CheckBatchCost.

* fix(deps): add pytest-postgresql for db schema migration tests

The test_db_schema_migration.py test requires pytest-postgresql but it was
missing from dependencies, causing import errors:

  ModuleNotFoundError: No module named 'pytest_postgresql'

Added pytest-postgresql ^6.0.0 to dev dependencies to fix test collection
errors in proxy_unit_tests.

This is a pre-existing issue, not related to PR BerriAI#21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(test): replace caplog with custom handler for parallel execution

The cost calculation log level tests were failing when run with pytest-xdist
parallel execution because caplog doesn't work reliably across worker processes.
This causes "ValueError: I/O operation on closed file" errors.

Solution: Replace caplog fixture with a custom LogRecordHandler that directly
attaches to the logger. This approach works correctly in parallel execution
because each worker process has its own handler instance.

Fixes test failures in PR BerriAI#21277 when running with --dist=loadscope.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(test): correct async mock for video generation logging test

The test was failing with AuthenticationError because the mock wasn't
intercepting the actual HTTP handler calls. This caused real API calls
with no API key, resulting in 401 errors.

Root cause: The test was patching the wrong target using string path
'litellm.videos.main.base_llm_http_handler' instead of using patch.object
on the actual handler instance. Additionally, it was mocking the sync
method instead of async_video_generation_handler.

Solution: Use patch.object with side_effect pattern on the correct
async handler method, following the same pattern used in
test_video_generation_async().

Fixes test failure in PR BerriAI#21277 when running with --dist=loadscope.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(test): add cleanup fixture and no_parallel mark for MCP tests

Two MCP server tests were failing when run with pytest-xdist parallel
execution (--dist=loadscope):
- test_mcp_routing_with_conflicting_alias_and_group_name
- test_oauth2_headers_passed_to_mcp_client

Both tests showed assertion failures where mocks weren't being called
(0 times instead of expected 1 time).

Root cause: These tests rely on global_mcp_server_manager singleton
state and complex async mocking that doesn't work reliably with
parallel execution. Each worker process can have different state
and patches may not apply correctly.

Solution:
1. Added autouse fixture to clean up global_mcp_server_manager registry
   before and after each test for better isolation
2. Added @pytest.mark.no_parallel to these specific tests to ensure
   they run sequentially, avoiding parallel execution issues

This approach maintains test reliability while allowing other tests
in the file to still benefit from parallelization.

Fixes test failures exposed by PR BerriAI#21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Regenerate poetry.lock with Poetry 2.3.2

Updated lock file to use Poetry 2.3.2 (matching main branch standard).
This addresses Greptile feedback about Poetry version mismatch.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Remove unused pytest import and add trailing newline

- Removed unused pytest import (caplog fixture was removed)
- Added missing trailing newline at end of file

Addresses Greptile feedback (minor style issues).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Remove redundant import inside test method

The module litellm.videos.main is already imported at the top of
the file (line 21), so the import inside the test method is redundant.

Addresses Greptile feedback (minor style issue).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix converse anthropic usage object according to v1/messages specs

* Add routing based on if reasoning is supported or not

* add fireworks_ai/accounts/fireworks/models/kimi-k2p5 in model map

* Removed stray .md file

* fix(bedrock): clamp thinking.budget_tokens to minimum 1024

Bedrock rejects thinking.budget_tokens values below 1024 with a 400
error. This adds automatic clamping in the LiteLLM transformation
layer so callers (e.g. router with reasoning_effort="low") don't
need to know about the provider-specific minimum.

Fixes BerriAI#21297

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve Langfuse test isolation to prevent flaky failures (BerriAI#21093)

The test was creating fresh mocks but not fully isolating from setUp state,
causing intermittent CI failures with 'Expected generation to be called once.
Called 0 times.'

Instead of creating fresh mocks, properly reset the existing setUp mocks to
ensure clean state while maintaining proper mock chain configuration.

* feat(s3): add support for virtual-hosted-style URLs (BerriAI#21094)

Add s3_use_virtual_hosted_style parameter to support AWS S3 virtual-hosted-style URL format (bucket.endpoint/key) alongside the existing path-style format (endpoint/bucket/key).

This enables compatibility with S3-compatible services like MinIO and aligns with AWS S3 official terminology.

* Addressed greptile comments to extract common helpers and return 404

* Allow effort="max" for Claude Opus 4.6 (BerriAI#21112)

* fix(aiohttp): prevent closing shared ClientSession in AiohttpTransport (BerriAI#21117)

When a shared ClientSession is passed to LiteLLMAiohttpTransport,
calling aclose() on the transport would close the shared session,
breaking other clients still using it.

Add owns_session parameter (default True for backwards compatibility)
to AiohttpTransport and LiteLLMAiohttpTransport. When a shared session
is provided in http_handler.py, owns_session=False is set to prevent
the transport from closing a session it does not own.

This aligns AiohttpTransport with the ownership pattern already used
in AiohttpHandler (aiohttp_handler.py).

* perf(spend): avoid duplicate daily agent transaction computation (BerriAI#21187)

* fix: proxy/batches_endpoints/endpoints.py:309:11: PLR0915 Too many statements (54 > 50)

* fix mypy

* Add doc for OpenAI Agents SDK with LiteLLM

* Add doc for OpenAI Agents SDK with LiteLLM

* Update docs/my-website/sidebars.js

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix mypy

* Update tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add blog fffor Managing Anthropic Beta Headers

* Add blog fffor Managing Anthropic Beta Headers

* correct the time

* Fix: Exclude tool params for models without function calling support (BerriAI#21125) (BerriAI#21244)

* Fix tool params reported as supported for models without function calling (BerriAI#21125)

JSON-configured providers (e.g. PublicAI) inherited all OpenAI params
including tools, tool_choice, function_call, and functions — even for
models that don't support function calling. This caused an inconsistency
where get_supported_openai_params included "tools" but
supports_function_calling returned False.

The fix checks supports_function_calling in the dynamic config's
get_supported_openai_params and removes tool-related params when the
model doesn't support it. Follows the same pattern used by OVHCloud
and Fireworks AI providers.

* Style: move verbose_logger to module-level import, remove redundant try/except

Address review feedback from Greptile bot:
- Move verbose_logger import to top-level (matches project convention)
- Remove redundant try/except around supports_function_calling() since it
  already handles exceptions internally via _supports_factory()

* fix(index.md): cleanup str

* fix(proxy): handle missing DATABASE_URL in append_query_params (BerriAI#21239)

* fix: handle missing database url in append_query_params

* Update litellm/proxy/proxy_cli.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(migrations): Make vector stores migration idempotent with IF NOT EXISTS

- Add IF NOT EXISTS to ALTER TABLE ADD COLUMN statements
- Add IF NOT EXISTS to CREATE INDEX statements
- Prevents migration failures when columns/indexes already exist from manual fixes
- Follows PostgreSQL best practices for idempotent migrations

---------

Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: mjkam <mjkam@naver.com>
Co-authored-by: Fly <48186978+tuzkiyoung@users.noreply.github.com>
Co-authored-by: Kristoffer Arlind <13228507+KristofferArlind@users.noreply.github.com>
Co-authored-by: Constantine <Runixer@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Atharva Jaiswal <92455570+AtharvaJaiswal005@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…erriAI#21349)

* feat: add GuardrailTracingDetail TypedDict and tracing fields to StandardLoggingGuardrailInformation

* feat: add policy_template field to Guardrail config TypedDict

* feat: accept GuardrailTracingDetail in base guardrail logging method

* feat: populate tracing fields in content filter guardrail

* test: add tracing fields tests for custom guardrail base class

* test: add tracing fields e2e tests for content filter guardrail

* feat: add guardrail tracing UI - policy badges, match details, timeline

* feat: redesign GuardrailViewer to Guardrails & Policy Compliance layout

Two-column layout with request lifecycle timeline on the left
and compact evaluation detail cards on the right. Header shows
guardrail count, pass/fail status, total overhead, policy info,
and an export button.

* feat: add clickable guardrail link in metrics + show policy names

* feat: add risk_score field to StandardLoggingGuardrailInformation

* feat: compute risk_score in content filter guardrail

* feat: display backend risk_score badge on evaluation cards

* fix: fallback to frontend risk score when backend doesn't provide one
…-structured-outputs

feat(bedrock): support native structured outputs API (outputConfig.textFormat)
…ude-opus-4.6-fast (BerriAI#21316)

Add missing GitHub Copilot model entries for gpt-5.3-codex (GA) and
claude-opus-4.6-fast (Public Preview) to both the root and backup
model pricing JSON files.
…proxy layer (BerriAI#19912)

PR BerriAI#21039 fixed OAuth token handling at the LLM layer (Authorization: Bearer
instead of x-api-key), but the proxy layer still strips the Authorization
header in clean_headers() before it reaches the Anthropic code. This breaks
OAuth for proxy users (e.g., Claude Code Max through LiteLLM proxy).

Changes:
- Add is_anthropic_oauth_key() helper to detect OAuth tokens (sk-ant-oat*)
- Preserve OAuth Authorization headers in clean_headers() instead of stripping
- Forward OAuth Authorization via ProviderSpecificHeader in
  add_provider_specific_headers_to_request() so tokens only reach
  Anthropic-compatible providers (anthropic, bedrock, vertex_ai)

Fixes BerriAI#19618

Co-authored-by: Adam Reed <iamadamreed@users.noreply.github.com>
* feat: Add IBM watsonx.ai rerank support

* feat: added unit tests

* fix docstring

* added documentataion

* Update litellm/llms/watsonx/rerank/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update litellm/rerank_api/main.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update litellm/llms/watsonx/rerank/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* update validate_environment signature

* fix ruff check and mypy

* fix CR

* CR fix

* CR fix

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Sameerlite and others added 27 commits February 18, 2026 18:39
…_managed_access

Add File deletion criteria with batch references
…_vllm

Incident Report: vLLM Embeddings Broken by encoding_format Parameter
[Feat]Add day 0 claude sonnet 4.6 feat support
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…pired-key-cleanup

fix(tests): restore proxy_server module attrs after test_proxy_admin_expired_key_from_cache
…rix-ci

fix(ci): add prisma generate step to matrix CI workflow
The model_prices_and_context_window_backup.json file has 'inference_geo'
fields (e.g. on 'us/claude-sonnet-4-6') for geo-prefixed Anthropic models
used in cost calculation, but the JSON schema validator in test_utils.py
did not include 'inference_geo' as an allowed property.

This caused test_aaamodel_prices_and_context_window_json_is_valid to fail
with: Additional properties are not allowed ('inference_geo' was unexpected)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation.py

The file had two unresolved git merge conflict markers from a merge of
litellm_oss_staging_02_17_2026 into main, causing a SyntaxError when
pytest tried to collect the test module.

Kept the instance-level mocking approach (from litellm_oss_staging) for
test_get_complete_url and test_validate_environment, which is consistent
with the rest of the file and avoids class-reference issues caused by
importlib.reload(litellm) in conftest.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_tags

feat(datadog): add 'team' tag to logs, metrics, and cost management
…ate_many

Prisma's create_many() requires JSON fields to be wrapped in prisma.Json()
not passed as raw JSON strings. Lines 3099-3100 were using safe_dumps()
(which returns str) instead of prisma.Json(), causing Prisma validation
errors during master key rotation.

This is consistent with the existing pattern in the same file (line 3134
already uses prisma.Json for litellm_config env vars).

The regression test test_rotate_master_key_model_data_valid_for_prisma
was already correctly asserting isinstance(..., prisma.Json) — the test
exposed the mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-conflict

fix(tests): resolve merge conflict in test_vertex_ai_rerank_transformation.py
…many

fix(proxy): use prisma.Json for JSON fields in _rotate_master_key create_many()
…o-property

fix(tests): add inference_geo to model prices JSON schema validator
When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing validation errors. This change ensures thinking blocks from the original response are preserved and included at the start of follow-up assistant messages.

- Created `TransformRequestResult` NamedTuple to capture both tool_calls and thinking_blocks from `transform_request()`, making the contract explicit and extensible
- Modified `transform_request()` to extract and return thinking/redacted_thinking blocks alongside tool calls
- Updated `transform_response()` to accept thinking_blocks and prepend them to follow-up assistant messages
- Passed thinking_blocks through the agentic loop chain: detection → execution → message transformation
- Fixed `transform_request()` to return full kwargs (not just tools) to preserve other request parameters
- Used `filter_internal_params()` utility instead of manual filtering for consistency

This change fixes websearch interception when extended thinking mode is enabled.

**Problem**: When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing the error: `messages.1.content.0.type: Expected 'thinking' or 'redacted_thinking', but found 'tool_use'`

**Solution**: Modified `transform_request()` to capture thinking/redacted_thinking blocks from the original response, and `transform_response()` to include them at the start of the assistant message in follow-up requests.

**Testing**: Successfully tested end-to-end with Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5.

```yaml
model_list:
  - model_name: claude-opus-4-5-20251101
    litellm_params:
      model: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0
      aws_region_name: us-west-2
    model_info:
      supports_web_search: true
litellm_settings:
  callbacks: ["websearch_interception"]
  websearch_interception_params:
    enabled_providers: ["bedrock"]
    search_tool_name: "searxng-search"
search_tools:
  - search_tool_name: searxng-search
    litellm_params:
      search_provider: searxng
      api_base: "https://searxng.example.com"
```

**Note**: Uses `bedrock/` (not `bedrock/converse/`) to route through `anthropic_messages_handler()` which supports agentic hooks.
Fixes issue where websearch interception failed with "TAVILY_API_KEY is not set"
error when using search providers that require API keys.

Changes:
- Extract api_key and api_base from router search_tools configuration
- Pass credentials to litellm.asearch() when available
- Falls back to environment variables when credentials not in config
- Maintains backward compatibility with existing configurations

Root cause:
Handler was only extracting search_provider from router config, but not the
associated api_key and api_base fields. This caused litellm.asearch() to fall
back to environment variables, which failed when keys weren't set in env.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes websearch interception failures when thinking.budget_tokens is set
and requests violate Anthropic's requirement: max_tokens > budget_tokens.

Changes:
- Validate max_tokens against thinking.budget_tokens when extended thinking is enabled
- Automatically adjust max_tokens to budget_tokens + DEFAULT_MAX_TOKENS (4096) when insufficient
- Follows the same pattern as base transformation classes in LiteLLM

This prevents the error: "max_tokens must be greater than thinking.budget_tokens"
when using extended thinking with websearch interception.

Related issue: BerriAI#14194

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…pport

Standardize anthropic-beta header handling across all Bedrock APIs
(Invoke Chat, Converse, Messages) using a centralized whitelist-based
filter with version-based model support.

- Inconsistent filtering: Invoke Chat used whitelist (safe),
  Converse/Messages used blacklist (allows unsupported headers through)
- Production risk: unsupported headers could cause AWS API errors
- Maintenance burden: adding new Claude models required updating
  multiple hardcoded lists

- Centralized BedrockBetaHeaderFilter with whitelist approach
- Version-based filtering (e.g., "requires 4.5+") instead of model lists
- Family restrictions (opus/sonnet/haiku) when needed
- Automatic header translation for backward compatibility

- Add `litellm/llms/bedrock/beta_headers_config.py`
  - BedrockBetaHeaderFilter class
  - Whitelist of 11 supported beta headers
  - Version/family restriction logic
  - Debug logging support

- Invoke Chat: Replace local whitelist with centralized filter
- Converse: Remove blacklist (30 lines), use whitelist filter
- Messages: Remove complex filter (55 lines), preserve translation

- Add `tests/test_litellm/llms/bedrock/test_beta_headers_config.py`
  - 40+ unit tests for filter logic
- Extend `tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py`
  - 13 integration tests for API transformations
  - Verify filtering, version restrictions, translations

- Add `litellm/llms/bedrock/README.md`
  - Maintenance guide for adding new headers/models
- Enhanced module docstrings with examples

- Production safety: only whitelisted headers reach AWS
- Zero maintenance for new Claude models (Opus 5, Sonnet 5, etc.)
- Consistent filtering across all 3 APIs
- Preserved backward compatibility (advanced-tool-use translation)

```bash
poetry run pytest tests/test_litellm/llms/bedrock/ -v
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
AWS Bedrock does not recognize anthropic.claude-opus-4-6-v1:0 as a valid
model identifier. Unlike other Claude models, Opus 4.6 requires the model
ID without the :0 version suffix: anthropic.claude-opus-4-6-v1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ock APIs

Bedrock doesn't support context_management as a request body parameter.
The feature is enabled via the anthropic-beta header (context-management-2025-06-27)
which was already handled correctly. Leaving context_management in the body causes:
"context_management: Extra inputs are not permitted"

Strip the parameter from all 3 Bedrock API paths:
- Invoke Messages API
- Invoke Chat API
- Converse API (additionalModelRequestFields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…without thinking blocks

Follow-up to a494503f4b which fixed thinking + tool_use. That fix only
detected missing thinking blocks on assistant messages with tool_calls.
When the last assistant message has plain text content (no tool_calls),
the check returned False and thinking was not dropped, causing:
"Expected thinking or redacted_thinking, but found text"

Add last_assistant_message_has_no_thinking_blocks() to detect any
assistant message with content but no thinking blocks. Extract shared
_message_has_thinking_blocks() helper that checks both the
thinking_blocks field and content array for thinking/redacted_thinking
blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upstream only checks for type="enabled" but Opus 4.6 uses type="adaptive".
Without this fix, max_tokens auto-adjustment doesn't trigger for adaptive
thinking, causing API errors.
The MODEL_VERSION_PATTERN regex had double-escaped backslashes (\\d
instead of \d) in a raw string, causing it to never match any model
ID. Also constrained minor version capture to a single digit followed
by a hyphen to avoid capturing the 8-digit date suffix as the minor
version.

Additionally added computer-use-2025-11-24 to the whitelist (used by
upstream for Opus 4.5+) and updated integration tests to use model IDs
compatible with the version-gated beta headers they test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Quentin-M Quentin-M closed this Feb 18, 2026
@Quentin-M Quentin-M deleted the update-0213 branch February 18, 2026 15:34
Quentin-M pushed a commit that referenced this pull request Mar 13, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.