diff --git a/docs/my-website/docs/tutorials/claude_code_websearch.md b/docs/my-website/docs/tutorials/claude_code_websearch.md new file mode 100644 index 00000000000..cc2f79666da --- /dev/null +++ b/docs/my-website/docs/tutorials/claude_code_websearch.md @@ -0,0 +1,192 @@ +# Claude Code - WebSearch Across All Providers + +Enable Claude Code's web search tool to work with any provider (Bedrock, Azure, Vertex, etc.). LiteLLM automatically intercepts web search requests and executes them server-side. + +## Proxy Configuration + +Add WebSearch interception to your `litellm_config.yaml`: + +```yaml +model_list: + - model_name: bedrock-sonnet + litellm_params: + model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0 + aws_region_name: us-east-1 + +# Enable WebSearch interception for providers +litellm_settings: + callbacks: + - websearch_interception: + enabled_providers: + - bedrock + - azure + - vertex_ai + search_tool_name: perplexity-search # Optional: specific search tool + +# Configure search provider +search_tools: + - search_tool_name: perplexity-search + litellm_params: + search_provider: perplexity + api_key: os.environ/PERPLEXITY_API_KEY +``` + +## Quick Start + +### 1. Configure LiteLLM Proxy + +Create `config.yaml`: + +```yaml +model_list: + - model_name: bedrock-sonnet + litellm_params: + model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0 + aws_region_name: us-east-1 + +litellm_settings: + callbacks: + - websearch_interception: + enabled_providers: [bedrock] + +search_tools: + - search_tool_name: perplexity-search + litellm_params: + search_provider: perplexity + api_key: os.environ/PERPLEXITY_API_KEY +``` + +### 2. Start Proxy + +```bash +export PERPLEXITY_API_KEY=your-key +litellm --config config.yaml +``` + +### 3. Use with Claude Code + +```bash +export ANTHROPIC_BASE_URL=http://localhost:4000 +export ANTHROPIC_API_KEY=sk-1234 +claude +``` + +Now use web search in Claude Code - it works with any provider! 
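To sanity-check the setup without Claude Code, you can send the same native web search tool to the proxy directly from Python. A minimal stdlib-only sketch — the `/v1/messages` path and `x-api-key` header follow the Anthropic Messages API convention the proxy mirrors, and the model alias, base URL, and key come from the quick start above (adjust if your config differs):

```python
import json
import urllib.request

# Native Anthropic web search tool, in the same format Claude Code sends.
# LiteLLM intercepts it server-side and runs the search via the
# configured search provider (Perplexity/Tavily).
payload = {
    "model": "bedrock-sonnet",  # model_name from config.yaml
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is LiteLLM?"}],
    "tools": [{"type": "web_search_20250305", "name": "web_search", "max_uses": 5}],
}


def ask(base_url: str = "http://localhost:4000", api_key: str = "sk-1234") -> dict:
    """POST the request to the proxy's Anthropic-style messages endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the proxy from step 2 to be running.
    print(ask()["content"][0]["text"])
```

The response is a standard Anthropic Messages object: the search round-trips happen server-side, so the client only ever sees the final answer.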
+ +## How It Works + +When Claude Code sends a web search request, LiteLLM: +1. Intercepts the native `web_search` tool +2. Converts it to LiteLLM's standard format +3. Executes the search via Perplexity/Tavily +4. Returns the final answer to Claude Code + +```mermaid +sequenceDiagram + participant CC as Claude Code + participant LP as LiteLLM Proxy + participant B as Bedrock/Azure/etc + participant P as Perplexity/Tavily + + CC->>LP: Request with web_search tool + Note over LP: Convert native tool
to LiteLLM format + LP->>B: Request with converted tool + B-->>LP: Response: tool_use + Note over LP: Detect web search
tool_use + LP->>P: Execute search + P-->>LP: Search results + LP->>B: Follow-up with results + B-->>LP: Final answer + LP-->>CC: Final answer with search results +``` + +**Result**: One API call from Claude Code → Complete answer with search results + +## Supported Providers + +| Provider | Native Web Search | With LiteLLM | +|----------|-------------------|--------------| +| **Anthropic** | ✅ Yes | ✅ Yes | +| **Bedrock** | ❌ No | ✅ Yes | +| **Azure** | ❌ No | ✅ Yes | +| **Vertex AI** | ❌ No | ✅ Yes | +| **Other Providers** | ❌ No | ✅ Yes | + +## Search Providers + +Configure which search provider to use. LiteLLM supports multiple search providers: + +| Provider | Configuration | +|----------|---------------| +| **Perplexity** | `search_provider: perplexity` | +| **Tavily** | `search_provider: tavily` | + +See [all supported search providers](../search/index.md) for the complete list. + +## Configuration Options + +### WebSearch Interception Parameters + +| Parameter | Type | Required | Description | Example | +|-----------|------|----------|-------------|---------| +| `enabled_providers` | List[String] | Yes | List of providers to enable web search interception for | `[bedrock, azure, vertex_ai]` | +| `search_tool_name` | String | No | Specific search tool from `search_tools` config. If not set, uses first available search tool. 
| `perplexity-search` | + +### Supported Provider Values + +Use these values in `enabled_providers`: + +| Provider | Value | Description | +|----------|-------|-------------| +| AWS Bedrock | `bedrock` | Amazon Bedrock Claude models | +| Azure OpenAI | `azure` | Azure-hosted models | +| Google Vertex AI | `vertex_ai` | Google Cloud Vertex AI | +| Any Other | Provider name | Any LiteLLM-supported provider | + +### Complete Configuration Example + +```yaml +model_list: + - model_name: bedrock-sonnet + litellm_params: + model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0 + aws_region_name: us-east-1 + + - model_name: azure-gpt4 + litellm_params: + model: azure/gpt-4 + api_base: https://my-azure.openai.azure.com + api_key: os.environ/AZURE_API_KEY + +litellm_settings: + callbacks: + - websearch_interception: + enabled_providers: + - bedrock # Enable for AWS Bedrock + - azure # Enable for Azure OpenAI + - vertex_ai # Enable for Google Vertex + search_tool_name: perplexity-search # Optional: use specific search tool + +# Configure search tools +search_tools: + - search_tool_name: perplexity-search + litellm_params: + search_provider: perplexity + api_key: os.environ/PERPLEXITY_API_KEY + + - search_tool_name: tavily-search + litellm_params: + search_provider: tavily + api_key: os.environ/TAVILY_API_KEY +``` + +**How search tool selection works:** +- If `search_tool_name` is specified → Uses that specific search tool +- If `search_tool_name` is not specified → Uses first search tool in `search_tools` list +- In example above: Without `search_tool_name`, would use `perplexity-search` (first in list) + +## Related + +- [Claude Code Quickstart](./claude_responses_api.md) +- [Claude Code Cost Tracking](./claude_code_customer_tracking.md) +- [Using Non-Anthropic Models](./claude_non_anthropic_models.md) diff --git a/docs/my-website/sidebars.js b/docs/my-website/sidebars.js index fd651c3adc5..102e3dfe1c5 100644 --- a/docs/my-website/sidebars.js +++ 
b/docs/my-website/sidebars.js @@ -122,6 +122,7 @@ const sidebars = { items: [ "tutorials/claude_responses_api", "tutorials/claude_code_customer_tracking", + "tutorials/claude_code_websearch", "tutorials/claude_mcp", "tutorials/claude_non_anthropic_models", ] diff --git a/litellm/constants.py b/litellm/constants.py index dba79b2f186..3bdd943481e 100644 --- a/litellm/constants.py +++ b/litellm/constants.py @@ -329,6 +329,11 @@ "medium": 5, "high": 10, } + +# LiteLLM standard web search tool name +# Used for web search interception across providers +LITELLM_WEB_SEARCH_TOOL_NAME = "litellm_web_search" + DEFAULT_IMAGE_ENDPOINT_MODEL = "dall-e-2" DEFAULT_VIDEO_ENDPOINT_MODEL = "sora-2" diff --git a/litellm/integrations/custom_logger.py b/litellm/integrations/custom_logger.py index 317613420a5..12243a19184 100644 --- a/litellm/integrations/custom_logger.py +++ b/litellm/integrations/custom_logger.py @@ -143,6 +143,34 @@ async def async_log_stream_event(self, kwargs, response_obj, start_time, end_tim async def async_log_pre_api_call(self, model, messages, kwargs): pass + async def async_pre_request_hook( + self, model: str, messages: List, kwargs: Dict + ) -> Optional[Dict]: + """ + Hook called before making the API request to allow modifying request parameters. + + This is specifically designed for modifying the request before it's sent to the provider. + Unlike async_log_pre_api_call (which is for logging), this hook is meant for transformations. + + Args: + model: The model name + messages: The messages list + kwargs: The request parameters (tools, stream, temperature, etc.) 
+ + Returns: + Optional[Dict]: Modified kwargs to use for the request, or None if no modifications + + Example: + ```python + async def async_pre_request_hook(self, model, messages, kwargs): + # Convert native tools to standard format + if kwargs.get("tools"): + kwargs["tools"] = convert_tools(kwargs["tools"]) + return kwargs + ``` + """ + pass + async def async_log_success_event(self, kwargs, response_obj, start_time, end_time): pass diff --git a/litellm/integrations/prometheus.py b/litellm/integrations/prometheus.py index 1e1da803e48..b490c21174f 100644 --- a/litellm/integrations/prometheus.py +++ b/litellm/integrations/prometheus.py @@ -21,7 +21,12 @@ import litellm from litellm._logging import print_verbose, verbose_logger from litellm.integrations.custom_logger import CustomLogger -from litellm.proxy._types import LiteLLM_TeamTable, LiteLLM_UserTable, UserAPIKeyAuth +from litellm.proxy._types import ( + LiteLLM_DeletedVerificationToken, + LiteLLM_TeamTable, + LiteLLM_UserTable, + UserAPIKeyAuth, +) from litellm.types.integrations.prometheus import * from litellm.types.integrations.prometheus import _sanitize_prometheus_label_name from litellm.types.utils import StandardLoggingPayload @@ -2153,7 +2158,7 @@ async def _initialize_budget_metrics( self, data_fetch_function: Callable[..., Awaitable[Tuple[List[Any], Optional[int]]]], set_metrics_function: Callable[[List[Any]], Awaitable[None]], - data_type: Literal["teams", "keys"], + data_type: Literal["teams", "keys", "users"], ): """ Generic method to initialize budget metrics for teams or API keys. 
@@ -2245,7 +2250,7 @@ async def _initialize_api_key_budget_metrics(self): async def fetch_keys( page_size: int, page: int - ) -> Tuple[List[Union[str, UserAPIKeyAuth]], Optional[int]]: + ) -> Tuple[List[Union[str, UserAPIKeyAuth, LiteLLM_DeletedVerificationToken]], Optional[int]]: key_list_response = await _list_key_helper( prisma_client=prisma_client, page=page, diff --git a/litellm/integrations/websearch_interception/ARCHITECTURE.md b/litellm/integrations/websearch_interception/ARCHITECTURE.md index 345741c3c03..3aa0a1558d7 100644 --- a/litellm/integrations/websearch_interception/ARCHITECTURE.md +++ b/litellm/integrations/websearch_interception/ARCHITECTURE.md @@ -7,6 +7,98 @@ Server-side WebSearch tool execution for models that don't natively support it ( User makes **ONE** `litellm.messages.acreate()` call → Gets final answer with search results. The agentic loop happens transparently on the server. +## LiteLLM Standard Web Search Tool + +LiteLLM defines a standard web search tool format (`litellm_web_search`) that all native provider tools are converted to. This enables consistent interception across providers. 
+ +**Standard Tool Definition** (defined in `tools.py`): +```python +{ + "name": "litellm_web_search", + "description": "Search the web for information...", + "input_schema": { + "type": "object", + "properties": { + "query": {"type": "string", "description": "The search query"} + }, + "required": ["query"] + } +} +``` + +**Tool Name Constant**: `LITELLM_WEB_SEARCH_TOOL_NAME = "litellm_web_search"` (defined in `litellm/constants.py`) + +### Supported Tool Formats + +The interception system automatically detects and handles: + +| Tool Format | Example | Provider | Detection Method | Future-Proof | +|-------------|---------|----------|------------------|-------------| +| **LiteLLM Standard** | `name="litellm_web_search"` | Any | Direct name match | N/A | +| **Anthropic Native** | `type="web_search_20250305"` | Bedrock, Claude API | Type prefix: `startswith("web_search_")` | ✅ Yes (web_search_2026, etc.) | +| **Claude Code CLI** | `name="web_search"`, `type="web_search_20250305"` | Claude Code | Name + type check | ✅ Yes (version-agnostic) | +| **Legacy** | `name="WebSearch"` | Custom | Name match | N/A (backwards compat) | + +**Future Compatibility**: The `startswith("web_search_")` check in `tools.py` automatically supports future Anthropic web search versions. + +### Claude Code CLI Integration + +Claude Code (Anthropic's official CLI) sends web search requests using Anthropic's native tool format: + +```python +{ + "type": "web_search_20250305", + "name": "web_search", + "max_uses": 8 +} +``` + +**What Happens:** +1. Claude Code sends native `web_search_20250305` tool to LiteLLM proxy +2. LiteLLM intercepts and converts to `litellm_web_search` standard format +3. Bedrock receives converted tool (NOT native format) +4. Model returns `tool_use` block for `litellm_web_search` (not `server_tool_use`) +5. LiteLLM's agentic loop intercepts the `tool_use` +6. Executes `litellm.asearch()` using configured provider (Perplexity, Tavily, etc.) +7. 
Returns final answer to Claude Code user + +**Without Interception**: Bedrock would receive native tool → try to execute natively → return `web_search_tool_result_error` with `invalid_tool_input` + +**With Interception**: LiteLLM converts → Bedrock returns tool_use → LiteLLM executes search → Returns final answer ✅ + +### Native Tool Conversion + +Native tools are converted to LiteLLM standard format **before** sending to the provider: + +1. **Conversion Point** (`litellm/llms/anthropic/experimental_pass_through/messages/handler.py`): + - In `anthropic_messages()` function (lines 60-127) + - Runs BEFORE the API request is made + - Detects native web search tools using `is_web_search_tool()` + - Converts to `litellm_web_search` format using `get_litellm_web_search_tool()` + - Prevents provider from executing search natively (avoids `web_search_tool_result_error`) + +2. **Response Detection** (`transformation.py`): + - Detects `tool_use` blocks with any web search tool name + - Handles: `litellm_web_search`, `WebSearch`, `web_search` + - Extracts search queries for execution + +**Example Conversion**: +```python +# Input (Claude Code's native tool) +{ + "type": "web_search_20250305", + "name": "web_search", + "max_uses": 8 +} + +# Output (LiteLLM standard) +{ + "name": "litellm_web_search", + "description": "Search the web for information...", + "input_schema": {...} +} +``` + --- ## Request Flow @@ -63,6 +155,9 @@ sequenceDiagram | Component | File | Purpose | |-----------|------|---------| | **WebSearchInterceptionLogger** | `handler.py` | CustomLogger that implements agentic loop hooks | +| **Tool Standardization** | `tools.py` | Standard tool definition, detection, and utilities | +| **Tool Name Constant** | `constants.py` | `LITELLM_WEB_SEARCH_TOOL_NAME = "litellm_web_search"` | +| **Tool Conversion** | `anthropic/.../ handler.py` | Converts native tools to LiteLLM standard before API call | | **Transformation Logic** | `transformation.py` | Detect tool_use, 
build tool_result messages, format search responses | | **Agentic Loop Hooks** | `integrations/custom_logger.py` | Base hooks: `async_should_run_agentic_loop()`, `async_run_agentic_loop()` | | **Hook Orchestration** | `llms/custom_httpx/llm_http_handler.py` | `_call_agentic_completion_hooks()` - calls hooks after response | @@ -74,7 +169,10 @@ sequenceDiagram ## Configuration ```python -from litellm.integrations.websearch_interception import WebSearchInterceptionLogger +from litellm.integrations.websearch_interception import ( + WebSearchInterceptionLogger, + get_litellm_web_search_tool, +) from litellm.types.utils import LlmProviders # Enable for Bedrock with specific search tool @@ -85,13 +183,25 @@ litellm.callbacks = [ ) ] -# Make request (streaming or non-streaming both work) +# Make request with LiteLLM standard tool (recommended) +response = await litellm.messages.acreate( + model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0", + messages=[{"role": "user", "content": "What is LiteLLM?"}], + tools=[get_litellm_web_search_tool()], # LiteLLM standard + max_tokens=1024, + stream=True # Auto-converted to non-streaming +) + +# OR send native tools - they're auto-converted to LiteLLM standard response = await litellm.messages.acreate( - model="bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0", + model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0", messages=[{"role": "user", "content": "What is LiteLLM?"}], - tools=[{"name": "WebSearch", ...}], + tools=[{ + "type": "web_search_20250305", # Native Anthropic format + "name": "web_search", + "max_uses": 8 + }], max_tokens=1024, - stream=True # Streaming is automatically converted to non-streaming for WebSearch ) ``` diff --git a/litellm/integrations/websearch_interception/__init__.py b/litellm/integrations/websearch_interception/__init__.py index c0feb5235e2..f5b1963c1cf 100644 --- a/litellm/integrations/websearch_interception/__init__.py +++ 
b/litellm/integrations/websearch_interception/__init__.py @@ -8,5 +8,13 @@ from litellm.integrations.websearch_interception.handler import ( WebSearchInterceptionLogger, ) +from litellm.integrations.websearch_interception.tools import ( + get_litellm_web_search_tool, + is_web_search_tool, +) -__all__ = ["WebSearchInterceptionLogger"] +__all__ = [ + "WebSearchInterceptionLogger", + "get_litellm_web_search_tool", + "is_web_search_tool", +] diff --git a/litellm/integrations/websearch_interception/handler.py b/litellm/integrations/websearch_interception/handler.py index 0b08bc2312a..943a2bb4f36 100644 --- a/litellm/integrations/websearch_interception/handler.py +++ b/litellm/integrations/websearch_interception/handler.py @@ -12,7 +12,12 @@ import litellm from litellm._logging import verbose_logger from litellm.anthropic_interface import messages as anthropic_messages +from litellm.constants import LITELLM_WEB_SEARCH_TOOL_NAME from litellm.integrations.custom_logger import CustomLogger +from litellm.integrations.websearch_interception.tools import ( + get_litellm_web_search_tool, + is_web_search_tool, +) from litellm.integrations.websearch_interception.transformation import ( WebSearchTransformation, ) @@ -57,6 +62,55 @@ def __init__( for p in enabled_providers ] self.search_tool_name = search_tool_name + self._request_has_websearch = False # Track if current request has web search + + async def async_pre_call_deployment_hook( + self, kwargs: Dict[str, Any], call_type: Optional[Any] + ) -> Optional[dict]: + """ + Pre-call hook to convert native Anthropic web_search tools to regular tools. + + This prevents Bedrock from trying to execute web search server-side (which fails). + Instead, we convert it to a regular tool so the model returns tool_use blocks + that we can intercept and execute ourselves. 
+ """ + # Check if this is for an enabled provider + custom_llm_provider = kwargs.get("litellm_params", {}).get("custom_llm_provider", "") + if custom_llm_provider not in self.enabled_providers: + return None + + # Check if request has tools with native web_search + tools = kwargs.get("tools") + if not tools: + return None + + # Check if any tool is a web search tool (native or already LiteLLM standard) + has_websearch = any(is_web_search_tool(t) for t in tools) + + if not has_websearch: + return None + + verbose_logger.debug( + "WebSearchInterception: Converting native web_search tools to LiteLLM standard" + ) + + # Convert native/custom web_search tools to LiteLLM standard + converted_tools = [] + for tool in tools: + if is_web_search_tool(tool): + # Convert to LiteLLM standard web search tool + converted_tool = get_litellm_web_search_tool() + converted_tools.append(converted_tool) + verbose_logger.debug( + f"WebSearchInterception: Converted {tool.get('name', 'unknown')} " + f"(type={tool.get('type', 'none')}) to {LITELLM_WEB_SEARCH_TOOL_NAME}" + ) + else: + # Keep other tools as-is + converted_tools.append(tool) + + # Return modified kwargs with converted tools + return {"tools": converted_tools} @classmethod def from_config_yaml( @@ -104,6 +158,83 @@ def from_config_yaml( search_tool_name=search_tool_name, ) + async def async_pre_request_hook( + self, model: str, messages: List[Dict], kwargs: Dict + ) -> Optional[Dict]: + """ + Pre-request hook to convert native web search tools to LiteLLM standard. + + This hook is called before the API request is made, allowing us to: + 1. Detect native web search tools (web_search_20250305, etc.) + 2. Convert them to LiteLLM standard format (litellm_web_search) + 3. Convert stream=True to stream=False for interception + + This prevents providers like Bedrock from trying to execute web search + natively (which fails), and ensures our agentic loop can intercept tool_use. 
+ + Returns: + Modified kwargs dict with converted tools, or None if no modifications needed + """ + # Check if this request is for an enabled provider + custom_llm_provider = kwargs.get("litellm_params", {}).get( + "custom_llm_provider", "" + ) + + verbose_logger.debug( + f"WebSearchInterception: Pre-request hook called" + f" - custom_llm_provider={custom_llm_provider}" + f" - enabled_providers={self.enabled_providers}" + ) + + if custom_llm_provider not in self.enabled_providers: + verbose_logger.debug( + f"WebSearchInterception: Skipping - provider {custom_llm_provider} not in {self.enabled_providers}" + ) + return None + + # Check if request has tools + tools = kwargs.get("tools") + if not tools: + return None + + # Check if any tool is a web search tool + has_websearch = any(is_web_search_tool(t) for t in tools) + if not has_websearch: + return None + + verbose_logger.debug( + f"WebSearchInterception: Pre-request hook triggered for provider={custom_llm_provider}" + ) + + # Convert native web search tools to LiteLLM standard + converted_tools = [] + for tool in tools: + if is_web_search_tool(tool): + standard_tool = get_litellm_web_search_tool() + converted_tools.append(standard_tool) + verbose_logger.debug( + f"WebSearchInterception: Converted {tool.get('name', 'unknown')} " + f"(type={tool.get('type', 'none')}) to {LITELLM_WEB_SEARCH_TOOL_NAME}" + ) + else: + converted_tools.append(tool) + + # Update kwargs with converted tools + kwargs["tools"] = converted_tools + verbose_logger.debug( + f"WebSearchInterception: Tools after conversion: {[t.get('name') for t in converted_tools]}" + ) + + # Convert stream=True to stream=False for WebSearch interception + if kwargs.get("stream"): + verbose_logger.debug( + "WebSearchInterception: Converting stream=True to stream=False" + ) + kwargs["stream"] = False + kwargs["_websearch_interception_converted_stream"] = True + + return kwargs + async def async_should_run_agentic_loop( self, response: Any, @@ -128,11 +259,11 @@ 
async def async_should_run_agentic_loop( ) return False, {} - # Check if tools include WebSearch - has_websearch_tool = any(t.get("name") == "WebSearch" for t in (tools or [])) + # Check if tools include any web search tool (LiteLLM standard or native) + has_websearch_tool = any(is_web_search_tool(t) for t in (tools or [])) if not has_websearch_tool: verbose_logger.debug( - "WebSearchInterception: No WebSearch tool in request" + "WebSearchInterception: No web search tool in request" ) return False, {} diff --git a/litellm/integrations/websearch_interception/tools.py b/litellm/integrations/websearch_interception/tools.py new file mode 100644 index 00000000000..4f8b7372fe3 --- /dev/null +++ b/litellm/integrations/websearch_interception/tools.py @@ -0,0 +1,95 @@ +""" +LiteLLM Web Search Tool Definition + +This module defines the standard web search tool used across LiteLLM. +Native provider tools (like Anthropic's web_search_20250305) are converted +to this format for consistent interception and execution. +""" + +from typing import Any, Dict + +from litellm.constants import LITELLM_WEB_SEARCH_TOOL_NAME + + +def get_litellm_web_search_tool() -> Dict[str, Any]: + """ + Get the standard LiteLLM web search tool definition. + + This is the canonical tool definition that all native web search tools + (like Anthropic's web_search_20250305, Claude Code's web_search, etc.) + are converted to for interception. + + Returns: + Dict containing the Anthropic-style tool definition with: + - name: Tool name + - description: What the tool does + - input_schema: JSON schema for tool parameters + + Example: + >>> tool = get_litellm_web_search_tool() + >>> tool['name'] + 'litellm_web_search' + """ + return { + "name": LITELLM_WEB_SEARCH_TOOL_NAME, + "description": ( + "Search the web for information. Use this when you need current " + "information or answers to questions that require up-to-date data." 
+ ), + "input_schema": { + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "The search query to execute" + } + }, + "required": ["query"] + } + } + + +def is_web_search_tool(tool: Dict[str, Any]) -> bool: + """ + Check if a tool is a web search tool (native or LiteLLM standard). + + Detects: + - LiteLLM standard: name == "litellm_web_search" + - Anthropic native: type starts with "web_search_" (e.g., "web_search_20250305") + - Claude Code: name == "web_search" with a type field + - Custom: name == "WebSearch" (legacy format) + + Args: + tool: Tool dictionary to check + + Returns: + True if tool is a web search tool + + Example: + >>> is_web_search_tool({"name": "litellm_web_search"}) + True + >>> is_web_search_tool({"type": "web_search_20250305", "name": "web_search"}) + True + >>> is_web_search_tool({"name": "calculator"}) + False + """ + tool_name = tool.get("name", "") + tool_type = tool.get("type", "") + + # Check for LiteLLM standard tool + if tool_name == LITELLM_WEB_SEARCH_TOOL_NAME: + return True + + # Check for native Anthropic web_search_* types + if tool_type.startswith("web_search_"): + return True + + # Check for Claude Code's web_search with a type field + if tool_name == "web_search" and tool_type: + return True + + # Check for legacy WebSearch format + if tool_name == "WebSearch": + return True + + return False diff --git a/litellm/integrations/websearch_interception/transformation.py b/litellm/integrations/websearch_interception/transformation.py index e8211311281..313358822a5 100644 --- a/litellm/integrations/websearch_interception/transformation.py +++ b/litellm/integrations/websearch_interception/transformation.py @@ -7,6 +7,7 @@ from typing import Any, Dict, List, Tuple from litellm._logging import verbose_logger +from litellm.constants import LITELLM_WEB_SEARCH_TOOL_NAME from litellm.llms.base_llm.search.transformation import SearchResponse @@ -94,17 +95,21 @@ def _detect_from_non_streaming_response( 
block_id = getattr(block, "id", None) block_input = getattr(block, "input", {}) - if block_type == "tool_use" and block_name == "WebSearch": + # Check for LiteLLM standard or legacy web search tools + # Handles: litellm_web_search, WebSearch, web_search + if block_type == "tool_use" and block_name in ( + LITELLM_WEB_SEARCH_TOOL_NAME, "WebSearch", "web_search" + ): # Convert to dict for easier handling tool_call = { "id": block_id, "type": "tool_use", - "name": "WebSearch", + "name": block_name, # Preserve original name "input": block_input, } tool_calls.append(tool_call) verbose_logger.debug( - f"WebSearchInterception: Found WebSearch tool_use with id={tool_call['id']}" + f"WebSearchInterception: Found {block_name} tool_use with id={tool_call['id']}" ) return len(tool_calls) > 0, tool_calls diff --git a/litellm/llms/anthropic/experimental_pass_through/messages/fake_stream_iterator.py b/litellm/llms/anthropic/experimental_pass_through/messages/fake_stream_iterator.py new file mode 100644 index 00000000000..542ae20b602 --- /dev/null +++ b/litellm/llms/anthropic/experimental_pass_through/messages/fake_stream_iterator.py @@ -0,0 +1,246 @@ +""" +Fake Streaming Iterator for Anthropic Messages + +This module provides a fake streaming iterator that converts non-streaming +Anthropic Messages responses into proper streaming format. + +Used when WebSearch interception converts stream=True to stream=False but +the LLM doesn't make a tool call, and we need to return a stream to the user. +""" + +import json +from typing import Any, Dict, List, cast + +from litellm.types.llms.anthropic_messages.anthropic_response import ( + AnthropicMessagesResponse, +) + + +class FakeAnthropicMessagesStreamIterator: + """ + Fake streaming iterator for Anthropic Messages responses. + + Used when we need to convert a non-streaming response to a streaming format, + such as when WebSearch interception converts stream=True to stream=False but + the LLM doesn't make a tool call. 
+ + This creates a proper Anthropic-style streaming response with multiple events: + - message_start + - content_block_start (for each content block) + - content_block_delta (for text content, chunked) + - content_block_stop + - message_delta (for usage) + - message_stop + """ + + def __init__(self, response: AnthropicMessagesResponse): + self.response = response + self.chunks = self._create_streaming_chunks() + self.current_index = 0 + + def _create_streaming_chunks(self) -> List[bytes]: + """Convert the non-streaming response to streaming chunks""" + chunks = [] + + # Cast response to dict for easier access + response_dict = cast(Dict[str, Any], self.response) + + # 1. message_start event + usage = response_dict.get("usage", {}) + message_start = { + "type": "message_start", + "message": { + "id": response_dict.get("id"), + "type": "message", + "role": response_dict.get("role", "assistant"), + "model": response_dict.get("model"), + "content": [], + "stop_reason": None, + "stop_sequence": None, + "usage": { + "input_tokens": usage.get("input_tokens", 0) if usage else 0, + "output_tokens": 0 + } + } + } + chunks.append(f"event: message_start\ndata: {json.dumps(message_start)}\n\n".encode()) + + # 2-4. 
For each content block, send start/delta/stop events + content_blocks = response_dict.get("content", []) + if content_blocks: + for index, block in enumerate(content_blocks): + # Cast block to dict for easier access + block_dict = cast(Dict[str, Any], block) + block_type = block_dict.get("type") + + if block_type == "text": + # content_block_start + content_block_start = { + "type": "content_block_start", + "index": index, + "content_block": { + "type": "text", + "text": "" + } + } + chunks.append(f"event: content_block_start\ndata: {json.dumps(content_block_start)}\n\n".encode()) + + # content_block_delta (send full text as one delta for simplicity) + text = block_dict.get("text", "") + content_block_delta = { + "type": "content_block_delta", + "index": index, + "delta": { + "type": "text_delta", + "text": text + } + } + chunks.append(f"event: content_block_delta\ndata: {json.dumps(content_block_delta)}\n\n".encode()) + + # content_block_stop + content_block_stop = { + "type": "content_block_stop", + "index": index + } + chunks.append(f"event: content_block_stop\ndata: {json.dumps(content_block_stop)}\n\n".encode()) + + elif block_type == "thinking": + # content_block_start for thinking + content_block_start = { + "type": "content_block_start", + "index": index, + "content_block": { + "type": "thinking", + "thinking": "", + "signature": "" + } + } + chunks.append(f"event: content_block_start\ndata: {json.dumps(content_block_start)}\n\n".encode()) + + # content_block_delta for thinking text + thinking_text = block_dict.get("thinking", "") + if thinking_text: + content_block_delta = { + "type": "content_block_delta", + "index": index, + "delta": { + "type": "thinking_delta", + "thinking": thinking_text + } + } + chunks.append(f"event: content_block_delta\ndata: {json.dumps(content_block_delta)}\n\n".encode()) + + # content_block_delta for signature (if present) + signature = block_dict.get("signature", "") + if signature: + signature_delta = { + "type": 
"content_block_delta", + "index": index, + "delta": { + "type": "signature_delta", + "signature": signature + } + } + chunks.append(f"event: content_block_delta\ndata: {json.dumps(signature_delta)}\n\n".encode()) + + # content_block_stop + content_block_stop = { + "type": "content_block_stop", + "index": index + } + chunks.append(f"event: content_block_stop\ndata: {json.dumps(content_block_stop)}\n\n".encode()) + + elif block_type == "redacted_thinking": + # content_block_start for redacted_thinking + content_block_start = { + "type": "content_block_start", + "index": index, + "content_block": { + "type": "redacted_thinking" + } + } + chunks.append(f"event: content_block_start\ndata: {json.dumps(content_block_start)}\n\n".encode()) + + # content_block_stop (no delta for redacted thinking) + content_block_stop = { + "type": "content_block_stop", + "index": index + } + chunks.append(f"event: content_block_stop\ndata: {json.dumps(content_block_stop)}\n\n".encode()) + + elif block_type == "tool_use": + # content_block_start + content_block_start = { + "type": "content_block_start", + "index": index, + "content_block": { + "type": "tool_use", + "id": block_dict.get("id"), + "name": block_dict.get("name"), + "input": {} + } + } + chunks.append(f"event: content_block_start\ndata: {json.dumps(content_block_start)}\n\n".encode()) + + # content_block_delta (send input as JSON delta) + input_data = block_dict.get("input", {}) + content_block_delta = { + "type": "content_block_delta", + "index": index, + "delta": { + "type": "input_json_delta", + "partial_json": json.dumps(input_data) + } + } + chunks.append(f"event: content_block_delta\ndata: {json.dumps(content_block_delta)}\n\n".encode()) + + # content_block_stop + content_block_stop = { + "type": "content_block_stop", + "index": index + } + chunks.append(f"event: content_block_stop\ndata: {json.dumps(content_block_stop)}\n\n".encode()) + + # 5. 
message_delta event (with final usage and stop_reason) + message_delta = { + "type": "message_delta", + "delta": { + "stop_reason": response_dict.get("stop_reason"), + "stop_sequence": response_dict.get("stop_sequence") + }, + "usage": { + "output_tokens": usage.get("output_tokens", 0) if usage else 0 + } + } + chunks.append(f"event: message_delta\ndata: {json.dumps(message_delta)}\n\n".encode()) + + # 6. message_stop event + message_stop = { + "type": "message_stop", + "usage": usage if usage else {} + } + chunks.append(f"event: message_stop\ndata: {json.dumps(message_stop)}\n\n".encode()) + + return chunks + + def __aiter__(self): + return self + + async def __anext__(self): + if self.current_index >= len(self.chunks): + raise StopAsyncIteration + + chunk = self.chunks[self.current_index] + self.current_index += 1 + return chunk + + def __iter__(self): + return self + + def __next__(self): + if self.current_index >= len(self.chunks): + raise StopIteration + + chunk = self.chunks[self.current_index] + self.current_index += 1 + return chunk diff --git a/litellm/llms/anthropic/experimental_pass_through/messages/handler.py b/litellm/llms/anthropic/experimental_pass_through/messages/handler.py index 11245b1bdba..7e5a4f22a7f 100644 --- a/litellm/llms/anthropic/experimental_pass_through/messages/handler.py +++ b/litellm/llms/anthropic/experimental_pass_through/messages/handler.py @@ -33,6 +33,70 @@ ################################################# +async def _execute_pre_request_hooks( + model: str, + messages: List[Dict], + tools: Optional[List[Dict]], + stream: Optional[bool], + custom_llm_provider: Optional[str], + **kwargs, +) -> Dict: + """ + Execute pre-request hooks from CustomLogger callbacks. + + Allows CustomLoggers to modify request parameters before the API call. + Used for WebSearch tool conversion, stream modification, etc. 
+ + Args: + model: Model name + messages: List of messages + tools: Optional tools list + stream: Optional stream flag + custom_llm_provider: Provider name (if not set, will be extracted from model) + **kwargs: Additional request parameters + + Returns: + Dict containing all (potentially modified) request parameters including tools, stream + """ + # If custom_llm_provider not provided, extract from model + if not custom_llm_provider: + try: + _, custom_llm_provider, _, _ = litellm.get_llm_provider(model=model) + except Exception: + # If extraction fails, continue without provider + pass + + # Build complete request kwargs dict + request_kwargs = { + "tools": tools, + "stream": stream, + "litellm_params": { + "custom_llm_provider": custom_llm_provider, + }, + **kwargs, + } + + if not litellm.callbacks: + return request_kwargs + + from litellm.integrations.custom_logger import CustomLogger as _CustomLogger + + for callback in litellm.callbacks: + if not isinstance(callback, _CustomLogger): + continue + + # Call the pre-request hook + modified_kwargs = await callback.async_pre_request_hook( + model, messages, request_kwargs + ) + + # If hook returned modified kwargs, use them + if modified_kwargs is not None: + request_kwargs = modified_kwargs + + return request_kwargs + + @client async def anthropic_messages( max_tokens: int, @@ -57,39 +121,24 @@ async def anthropic_messages( """ Async: Make llm api request in Anthropic /messages API spec """ - # WebSearch Interception: Convert stream=True to stream=False if WebSearch interception is enabled - # This allows transparent server-side agentic loop execution for streaming requests - if stream and tools and any(t.get("name") == "WebSearch" for t in tools): - # Extract provider using litellm's helper function - try: - _, provider, _, _ = litellm.get_llm_provider( - model=model, - custom_llm_provider=custom_llm_provider, - api_base=api_base, - api_key=api_key, - ) - except Exception: - # Fallback to simple split if helper 
fails - provider = model.split("/")[0] if "/" in model else "" + # Execute pre-request hooks to allow CustomLoggers to modify request + request_kwargs = await _execute_pre_request_hooks( + model=model, + messages=messages, + tools=tools, + stream=stream, + custom_llm_provider=custom_llm_provider, + **kwargs, + ) - # Check if WebSearch interception is enabled in callbacks - from litellm._logging import verbose_logger - from litellm.integrations.websearch_interception import ( - WebSearchInterceptionLogger, - ) - if litellm.callbacks: - for callback in litellm.callbacks: - if isinstance(callback, WebSearchInterceptionLogger): - # Check if provider is enabled for interception - if provider in callback.enabled_providers: - verbose_logger.debug( - f"WebSearchInterception: Converting stream=True to stream=False for WebSearch interception " - f"(provider={provider})" - ) - stream = False - break + # Extract modified parameters + tools = request_kwargs.pop("tools", tools) + stream = request_kwargs.pop("stream", stream) + # Remove litellm_params from kwargs (only needed for hooks) + request_kwargs.pop("litellm_params", None) + # Merge back any other modifications + kwargs.update(request_kwargs) - local_vars = locals() loop = asyncio.get_event_loop() kwargs["is_async"] = True @@ -206,6 +255,11 @@ def anthropic_messages_handler( "model": original_model, "custom_llm_provider": custom_llm_provider, } + + # Check if stream was converted for WebSearch interception + # This is set in the async wrapper above when stream=True is converted to stream=False + if kwargs.get("_websearch_interception_converted_stream", False): + litellm_logging_obj.model_call_details["websearch_interception_converted_stream"] = True if litellm_params.mock_response and isinstance(litellm_params.mock_response, str): diff --git a/litellm/llms/custom_httpx/llm_http_handler.py b/litellm/llms/custom_httpx/llm_http_handler.py index 490786155c6..ab1e735fca7 100644 --- 
a/litellm/llms/custom_httpx/llm_http_handler.py +++ b/litellm/llms/custom_httpx/llm_http_handler.py @@ -4418,6 +4418,41 @@ async def _call_agentic_completion_hooks( f"LiteLLM.AgenticHookError: Exception in agentic completion hooks: {str(e)}" ) + # Check if we need to convert response to fake stream + # This happens when: + # 1. Stream was originally True but converted to False for WebSearch interception + # 2. No agentic loop ran (LLM didn't use the tool) + # 3. We have a non-streaming response that needs to be converted to streaming + websearch_converted_stream = ( + logging_obj.model_call_details.get("websearch_interception_converted_stream", False) + if logging_obj is not None + else False + ) + + if websearch_converted_stream: + from typing import cast + + from litellm._logging import verbose_logger + from litellm.llms.anthropic.experimental_pass_through.messages.fake_stream_iterator import ( + FakeAnthropicMessagesStreamIterator, + ) + from litellm.types.llms.anthropic_messages.anthropic_response import ( + AnthropicMessagesResponse, + ) + + verbose_logger.debug( + "WebSearchInterception: No tool call made, converting non-streaming response to fake stream" + ) + + # Convert the non-streaming response to a fake stream + # The response should be an AnthropicMessagesResponse (dict) + if isinstance(response, dict): + # Create a fake streaming iterator + fake_stream = FakeAnthropicMessagesStreamIterator( + response=cast(AnthropicMessagesResponse, response) + ) + return fake_stream + return None def _handle_error( diff --git a/litellm/proxy/proxy_config.yaml b/litellm/proxy/proxy_config.yaml index 87e02a142ee..cf852805f83 100644 --- a/litellm/proxy/proxy_config.yaml +++ b/litellm/proxy/proxy_config.yaml @@ -46,7 +46,21 @@ model_list: api_base: https://krish-mh44t553-eastus2.services.ai.azure.com api_key: os.environ/AZURE_ANTHROPIC_API_KEY +# Search Tools Configuration - Define search providers for WebSearch interception +# search_tools: +# - search_tool_name: 
"my-perplexity-search" +# litellm_params: +# search_provider: "perplexity" # Can be: perplexity, brave, etc. + +litellm_settings: + callbacks: ["websearch_interception"] + # WebSearch Interception - Automatically intercepts and executes WebSearch tool calls + # for models that don't natively support web search (e.g., Bedrock/Claude) + websearch_interception_params: + enabled_providers: ["bedrock"] # List of providers to enable interception for + search_tool_name: "my-perplexity-search" # Optional: Name of search tool from search_tools config general_settings: store_prompts_in_spend_logs: true - forward_client_headers_to_llm_api: true \ No newline at end of file + forward_client_headers_to_llm_api: true + diff --git a/tests/pass_through_unit_tests/test_websearch_interception_e2e.py b/tests/pass_through_unit_tests/test_websearch_interception_e2e.py index 2dec9da8b70..bf50c1c9cd2 100644 --- a/tests/pass_through_unit_tests/test_websearch_interception_e2e.py +++ b/tests/pass_through_unit_tests/test_websearch_interception_e2e.py @@ -323,3 +323,632 @@ async def test_websearch_interception_streaming(): import traceback traceback.print_exc() return False + + +async def test_websearch_interception_no_tool_call_streaming(): + """ + Test WebSearch interception when LLM doesn't make a tool call with streaming. + + This tests the scenario where: + 1. User requests stream=True + 2. WebSearch tool is provided + 3. LLM decides NOT to use the tool (just responds with text) + 4. 
System should return a fake stream + """ + print("\n" + "="*80) + print("E2E TEST 3: WebSearch Interception (No Tool Call, Streaming)") + print("="*80) + + # Router already initialized from test 1 + print("\n✅ Using existing router configuration") + print("✅ WebSearch interception already enabled for Bedrock") + + try: + # Make request with WebSearch tool AND stream=True + # Use a query that the LLM will answer directly without using the tool + print("\n📞 Making litellm.messages.acreate() call with stream=True...") + print(f" Model: bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0") + print(f" Query: 'What is 2+2?'") + print(f" Tools: WebSearch") + print(f" Stream: True") + + response = await messages.acreate( + model="bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0", + messages=[{"role": "user", "content": "What is 2+2? Just give me the answer, no need to search."}], + tools=[ + { + "name": "WebSearch", + "description": "Search the web for information", + "input_schema": { + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "The search query", + } + }, + "required": ["query"], + }, + } + ], + max_tokens=1024, + stream=True, # REQUEST STREAMING + ) + + print("\n✅ Received response!") + + # Check if response is actually a stream (async generator or async iterator) + import inspect + is_async_gen = inspect.isasyncgen(response) + is_async_iter = hasattr(response, '__aiter__') and hasattr(response, '__anext__') + is_stream = is_async_gen or is_async_iter + + if not is_stream: + print("\n❌ TEST 3 FAILED: Response is NOT a stream") + print(f"❌ Expected a fake stream when LLM doesn't use the tool") + print(f"❌ Response type: {type(response)}") + return False + + print(f"✅ Response is a stream (async_gen={is_async_gen}, async_iter={is_async_iter})") + print("\n📦 Consuming stream chunks:") + + chunks = [] + chunk_count = 0 + async for chunk in response: + chunk_count += 1 + print(f"\n--- Chunk {chunk_count} ---") + print(f" Type: 
{type(chunk)}") + print(f" Content: {chunk[:200] if isinstance(chunk, bytes) else str(chunk)[:200]}...") + chunks.append(chunk) + + print(f"\n✅ Received {len(chunks)} stream chunk(s)") + + if len(chunks) > 0: + print("\n" + "="*80) + print("✅ TEST 3 PASSED!") + print("="*80) + print("✅ User made ONE litellm.messages.acreate() call with stream=True") + print("✅ LLM didn't use the WebSearch tool") + print("✅ Got back a fake stream (not a non-streaming response)") + print("✅ WebSearch interception handles no-tool-call case correctly!") + print("="*80) + return True + else: + print("\n❌ TEST 3 FAILED: No chunks received") + return False + + except Exception as e: + print(f"\n❌ Test 3 failed with error: {str(e)}") + import traceback + traceback.print_exc() + return False + + +async def test_claude_code_native_websearch(): + """ + Test WebSearch interception with Claude Code's native web_search_20250305 tool. + + This tests the exact request format that Claude Code sends: + - tools: [{'type': 'web_search_20250305', 'name': 'web_search', 'max_uses': 8}] + - Model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0 + """ + print("\n" + "="*80) + print("E2E TEST: Claude Code Native WebSearch (web_search_20250305)") + print("="*80) + + # Router already initialized from test 1 + print("\n✅ Using existing router configuration") + print("✅ WebSearch interception already enabled for Bedrock") + + try: + # Make request with Claude Code's exact native web_search tool format + print("\n📞 Making litellm.messages.acreate() call...") + print(f" Model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0") + print(f" Query: 'Perform a web search for the query: litellm what is it'") + print(f" Tools: Native web_search_20250305") + print(f" Stream: False") + + response = await messages.acreate( + model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0", + messages=[{"role": "user", "content": "Perform a web search for the query: litellm what is it"}], + tools=[ + { + "type": 
"web_search_20250305", + "name": "web_search", + "max_uses": 8 + } + ], + max_tokens=1024, + stream=False, + ) + + print("\n✅ Received response!") + + # Handle both dict and object responses + if isinstance(response, dict): + response_id = response.get("id") + response_model = response.get("model") + response_stop_reason = response.get("stop_reason") + response_content = response.get("content", []) + else: + response_id = response.id + response_model = response.model + response_stop_reason = response.stop_reason + response_content = response.content + + print(f"\n📄 Response ID: {response_id}") + print(f"📄 Model: {response_model}") + print(f"📄 Stop Reason: {response_stop_reason}") + print(f"📄 Content blocks: {len(response_content)}") + + # Debug: Print all content block types + for i, block in enumerate(response_content): + block_type = block.get("type") if isinstance(block, dict) else block.type + print(f" Block {i}: type={block_type}") + if block_type == "tool_use": + block_name = block.get("name") if isinstance(block, dict) else block.name + print(f" name={block_name}") + + # Validate response + assert response is not None, "Response should not be None" + assert response_content is not None, "Response should have content" + assert len(response_content) > 0, "Response should have at least one content block" + + # Check if response contains tool_use (means interception didn't work) + has_tool_use = any( + (block.get("type") if isinstance(block, dict) else block.type) == "tool_use" + for block in response_content + ) + + # Check if we got a text response + has_text = any( + (block.get("type") if isinstance(block, dict) else block.type) == "text" + for block in response_content + ) + + if has_tool_use: + print("\n❌ TEST FAILED: Interception did not work") + print(f"❌ Stop reason: {response_stop_reason}") + print("❌ Response contains tool_use blocks") + return False + + elif has_text and response_stop_reason != "tool_use": + text_block = next( + block for block in 
response_content + if (block.get("type") if isinstance(block, dict) else block.type) == "text" + ) + text_content = text_block.get("text") if isinstance(text_block, dict) else text_block.text + + print(f"\n📝 Response Text:") + print(f" {text_content[:200]}...") + + if "litellm" in text_content.lower(): + print("\n" + "="*80) + print("✅ TEST PASSED!") + print("="*80) + print("✅ Claude Code's native web_search_20250305 tool was intercepted") + print("✅ Tool was converted to LiteLLM standard format") + print("✅ User made ONE litellm.messages.acreate() call") + print("✅ Got back final answer with search results") + print("✅ Agentic loop executed transparently") + print("✅ WebSearch interception working with Claude Code!") + print("="*80) + return True + else: + print("\n⚠️ Got text response but doesn't mention LiteLLM") + return False + else: + print("\n❌ Unexpected response format") + return False + + except Exception as e: + print(f"\n❌ Test failed with error: {str(e)}") + import traceback + traceback.print_exc() + return False + + +if __name__ == "__main__": + import asyncio + + async def run_all_tests(): + """Run all E2E tests""" + test_results = [] + + # Test 1: Non-streaming + result1 = await test_websearch_interception_non_streaming() + test_results.append(("Non-Streaming", result1)) + + # Test 2: Streaming + result2 = await test_websearch_interception_streaming() + test_results.append(("Streaming", result2)) + + # Test 3: No tool call with streaming + result3 = await test_websearch_interception_no_tool_call_streaming() + test_results.append(("No Tool Call Streaming", result3)) + + # Test 4: Claude Code native web_search + result4 = await test_claude_code_native_websearch() + test_results.append(("Claude Code Native WebSearch", result4)) + + # Print summary + print("\n" + "="*80) + print("TEST SUMMARY") + print("="*80) + for test_name, result in test_results: + status = "✅ PASSED" if result else "❌ FAILED" + print(f"{test_name}: {status}") + print("="*80) + + # 
Return overall result + return all(result for _, result in test_results) + + result = asyncio.run(run_all_tests()) + import sys + sys.exit(0 if result else 1) + + +async def test_litellm_standard_websearch_tool(): + """ + PRIORITY TEST #1: Test with the canonical litellm_web_search tool format. + + This validates that using get_litellm_web_search_tool() directly + works end-to-end without any conversion needed. + """ + print("\n" + "="*80) + print("E2E TEST: LiteLLM Standard WebSearch Tool") + print("="*80) + + from litellm.integrations.websearch_interception import get_litellm_web_search_tool + + print("\n✅ Using existing router configuration") + print("✅ WebSearch interception already enabled for Bedrock") + + try: + print("\n📞 Making litellm.messages.acreate() call...") + print(f" Model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0") + print(f" Query: 'What is the latest news about AI?'") + print(f" Tool: litellm_web_search (standard format, no conversion needed)") + print(f" Stream: False") + + response = await messages.acreate( + model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0", + messages=[{"role": "user", "content": "What is the latest news about AI? 
Give me a brief overview."}], + tools=[get_litellm_web_search_tool()], + max_tokens=1024, + stream=False, + ) + + print("\n✅ Received response!") + + if isinstance(response, dict): + response_id = response.get("id") + response_stop_reason = response.get("stop_reason") + response_content = response.get("content", []) + else: + response_id = response.id + response_stop_reason = response.stop_reason + response_content = response.content + + print(f"\n📄 Response ID: {response_id}") + print(f"📄 Stop Reason: {response_stop_reason}") + print(f"📄 Content blocks: {len(response_content)}") + + for i, block in enumerate(response_content): + block_type = block.get("type") if isinstance(block, dict) else block.type + print(f" Block {i}: type={block_type}") + + has_tool_use = any( + (block.get("type") if isinstance(block, dict) else block.type) == "tool_use" + for block in response_content + ) + + has_text = any( + (block.get("type") if isinstance(block, dict) else block.type) == "text" + for block in response_content + ) + + if has_tool_use: + print("\n❌ TEST FAILED: Interception did not work") + return False + + elif has_text and response_stop_reason != "tool_use": + text_block = next( + block for block in response_content + if (block.get("type") if isinstance(block, dict) else block.type) == "text" + ) + text_content = text_block.get("text") if isinstance(text_block, dict) else text_block.text + + print(f"\n📝 Response Text: {text_content[:200]}...") + + print("\n" + "="*80) + print("✅ TEST PASSED!") + print("="*80) + print("✅ LiteLLM standard tool format works without conversion") + print("✅ Agentic loop executed transparently") + print("="*80) + return True + else: + print("\n❌ Unexpected response format") + return False + + except Exception as e: + print(f"\n❌ Test failed with error: {str(e)}") + import traceback + traceback.print_exc() + return False + + +async def test_claude_code_native_websearch_streaming(): + """ + PRIORITY TEST #2: Test Claude Code's native tool WITH 
stream=True. + + Validates: + - Native tool conversion (web_search_20250305 → litellm_web_search) + - Stream=True → Stream=False conversion + - Agentic loop executes with both conversions + """ + print("\n" + "="*80) + print("E2E TEST: Claude Code Native WebSearch + Streaming") + print("="*80) + + print("\n✅ Using existing router configuration") + print("✅ WebSearch interception already enabled for Bedrock") + + try: + print("\n📞 Making litellm.messages.acreate() call with stream=True...") + print(f" Model: bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0") + print(f" Tool: Native web_search_20250305") + print(f" Stream: True (will be converted to False)") + + response = await messages.acreate( + model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0", + messages=[{"role": "user", "content": "Search for the latest AI developments."}], + tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 8}], + max_tokens=1024, + stream=True, + ) + + print("\n✅ Received response!") + + import inspect + is_stream = inspect.isasyncgen(response) + + if is_stream: + print("\n⚠️ Response is a stream (stream conversion didn't work)") + return False + + print("✅ Response is NOT a stream (conversion worked!)") + + if isinstance(response, dict): + response_stop_reason = response.get("stop_reason") + response_content = response.get("content", []) + else: + response_stop_reason = response.stop_reason + response_content = response.content + + has_tool_use = any( + (block.get("type") if isinstance(block, dict) else block.type) == "tool_use" + for block in response_content + ) + + has_text = any( + (block.get("type") if isinstance(block, dict) else block.type) == "text" + for block in response_content + ) + + if has_tool_use: + print("\n❌ TEST FAILED: Interception did not work") + return False + + elif has_text and response_stop_reason != "tool_use": + print("\n" + "="*80) + print("✅ TEST PASSED!") + print("="*80) + print("✅ Native tool converted to 
litellm_web_search") + print("✅ Stream=True converted to Stream=False") + print("✅ Both conversions working together!") + print("="*80) + return True + else: + print("\n❌ Unexpected response format") + return False + + except Exception as e: + print(f"\n❌ Test failed with error: {str(e)}") + import traceback + traceback.print_exc() + return False + + +def test_is_web_search_tool_detection(): + """ + PRIORITY TEST #3: Unit test for is_web_search_tool() utility. + + Validates detection of all supported formats including future versions. + """ + print("\n" + "="*80) + print("UNIT TEST: Web Search Tool Detection") + print("="*80) + + from litellm.integrations.websearch_interception import is_web_search_tool + + test_cases = [ + ({"name": "litellm_web_search"}, True, "LiteLLM standard tool"), + ({"type": "web_search_20250305", "name": "web_search", "max_uses": 8}, True, "Current Anthropic native (2025)"), + ({"type": "web_search_2026", "name": "web_search"}, True, "Future Anthropic native (2026)"), + ({"type": "web_search_20270615", "name": "web_search"}, True, "Future Anthropic native (2027)"), + ({"name": "web_search", "type": "web_search_20250305"}, True, "Claude Code format"), + ({"name": "WebSearch"}, True, "Legacy WebSearch"), + ({"name": "calculator"}, False, "Non-web-search tool"), + ({"name": "some_tool", "type": "function"}, False, "Other tool with type"), + ({"type": "custom_tool"}, False, "Custom tool type"), + ] + + passed = 0 + failed = 0 + + for tool, expected, description in test_cases: + result = is_web_search_tool(tool) + if result == expected: + print(f" ✅ PASS: {description}") + passed += 1 + else: + print(f" ❌ FAIL: {description}") + print(f" Tool: {tool}") + print(f" Expected: {expected}, Got: {result}") + failed += 1 + + print(f"\n📊 Results: {passed} passed, {failed} failed") + + if failed == 0: + print("\n" + "="*80) + print("✅ ALL DETECTION TESTS PASSED!") + print("="*80) + print("✅ Detects all current formats") + print("✅ Future-proof for new 
web_search_* versions") + print("="*80) + return True + else: + print("\n❌ Some detection tests failed") + return False + + +async def test_pre_request_hook_modifies_request_body(): + """ + Unit test to verify async_pre_request_hook correctly modifies request body. + + Tests that: + 1. WebSearchInterceptionLogger is active + 2. Native web_search_20250305 tool is converted to litellm_web_search + 3. Stream is converted from True to False + 4. Modified parameters reach the API call + """ + import asyncio + from unittest.mock import AsyncMock, patch, MagicMock + from litellm.constants import LITELLM_WEB_SEARCH_TOOL_NAME + + litellm._turn_on_debug() + + print("\n" + "="*80) + print("UNIT TEST: Pre-Request Hook Modifies Request Body") + print("="*80) + + # Initialize WebSearchInterceptionLogger + litellm.callbacks = [ + WebSearchInterceptionLogger( + enabled_providers=[LlmProviders.BEDROCK], + search_tool_name="test-search-tool" + ) + ] + + print("✅ WebSearchInterceptionLogger initialized") + + # Track what actually gets sent to the API + captured_request = {} + + def mock_anthropic_messages_handler( + max_tokens, + messages, + model, + metadata=None, + stop_sequences=None, + stream=None, + system=None, + temperature=None, + thinking=None, + tool_choice=None, + tools=None, + top_k=None, + top_p=None, + container=None, + api_key=None, + api_base=None, + client=None, + custom_llm_provider=None, + **kwargs + ): + """Mock handler that captures the actual request parameters""" + # Capture what gets sent to the handler (after hook modifications) + captured_request['tools'] = tools + captured_request['stream'] = stream + captured_request['max_tokens'] = max_tokens + captured_request['model'] = model + + # Return a mock response (non-streaming) + from litellm.types.llms.anthropic_messages.anthropic_response import AnthropicMessagesResponse + return AnthropicMessagesResponse( + id="msg_test", + type="message", + role="assistant", + content=[{ + "type": "text", + "text": "Test 
response" + }], + model="claude-sonnet-4-5", + stop_reason="end_turn", + usage={ + "input_tokens": 10, + "output_tokens": 20 + } + ) + + # Patch the anthropic_messages_handler function (called after hooks) + with patch('litellm.llms.anthropic.experimental_pass_through.messages.handler.anthropic_messages_handler', + side_effect=mock_anthropic_messages_handler): + + print("\n📝 Making request with native web_search_20250305 tool (stream=True)...") + + # Make the request with native tool format + response = await messages.acreate( + model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0", + messages=[{"role": "user", "content": "Test query"}], + tools=[{ + "type": "web_search_20250305", + "name": "web_search", + "max_uses": 8 + }], + max_tokens=100, + stream=True # Should be converted to False + ) + + print("\n🔍 Verifying request modifications...") + + # Verify tool was converted + tools = captured_request.get('tools') + print(f"\n Captured tools: {tools}") + + if tools and len(tools) > 0: + tool = tools[0] + tool_name = tool.get('name') + + if tool_name == LITELLM_WEB_SEARCH_TOOL_NAME: + print(f" ✅ Tool converted: web_search_20250305 → {LITELLM_WEB_SEARCH_TOOL_NAME}") + else: + print(f" ❌ Tool NOT converted: expected {LITELLM_WEB_SEARCH_TOOL_NAME}, got {tool_name}") + return False + else: + print(" ❌ No tools captured in request") + return False + + # Verify stream was converted + stream = captured_request.get('stream') + print(f" Captured stream: {stream}") + + if stream is False: + print(" ✅ Stream converted: True → False") + else: + print(f" ❌ Stream NOT converted: expected False, got {stream}") + return False + + print("\n" + "="*80) + print("✅ PRE-REQUEST HOOK TEST PASSED!") + print("="*80) + print("✅ CustomLogger is active") + print("✅ async_pre_request_hook modifies request body") + print("✅ Tool conversion works correctly") + print("✅ Stream conversion works correctly") + print("="*80) + + return True +