
Litellm docs mcp filtering semantic #20316

Merged
ishaan-jaff merged 22 commits into main from litellm_docs_mcp_filtering_semantic on Feb 3, 2026

Conversation

@ishaan-jaff

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes


@ishaan-jaff ishaan-jaff merged commit 0ef506a into main Feb 3, 2026
7 of 12 checks passed
@greptile-apps

greptile-apps bot commented Feb 3, 2026

Greptile Overview

Greptile Summary

This PR introduces MCP Semantic Tool Filtering, a new feature that reduces context window usage by semantically filtering MCP tools before sending them to LLMs. The implementation adds a pre-call hook that uses semantic-router to match user queries against tool descriptions and returns only the top-K most relevant tools.
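To make the top-K idea concrete, here is a minimal sketch of ranking tools by similarity to the user query. It is not the PR's code and does not use semantic-router: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, and `top_k_tools` are hypothetical. Tools are assumed to be OpenAI-format function dicts.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tools(
    query: str, tools: List[Dict], top_k: int = 2, threshold: float = 0.0
) -> List[Dict]:
    """Return at most top_k tools whose description scores above threshold."""
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(tool["function"]["description"])), tool)
        for tool in tools
    ]
    # Drop tools at or below the similarity threshold, then rank the rest.
    scored = [(score, tool) for score, tool in scored if score > threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]
```

With a real embedding model the scores would reflect meaning rather than word overlap, but the rank, threshold, and truncate steps stay the same.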

Key Changes:

  • Startup initialization in proxy_server.py:793-809 builds the semantic router once during proxy startup, keeping router construction off the request path
  • Hook implementation (hook.py) intercepts requests, expands MCP tool references, extracts user queries, and applies semantic filtering
  • Core filtering logic (semantic_tool_filter.py) uses semantic-router with embeddings to rank and select relevant tools
  • Configuration via litellm_settings.mcp_semantic_tool_filter with defaults for embedding model, top_k, and similarity threshold
  • Comprehensive testing with both unit tests and end-to-end tests covering various scenarios
  • Documentation includes usage examples, architecture diagrams, and configuration options
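As a rough illustration of the configuration bullet above, the sketch below reads a `litellm_settings.mcp_semantic_tool_filter` section from an already-parsed config dict. Only the section path and the existence of embedding model, top_k, and similarity threshold settings come from the summary; the exact key names (`enabled`, `embedding_model`, `top_k`, `similarity_threshold`) and every default value shown are assumptions, so check the merged documentation for the real ones.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SemanticFilterSettings:
    # All field names and defaults below are hypothetical placeholders.
    enabled: bool = False
    embedding_model: str = "text-embedding-3-small"
    top_k: int = 10
    similarity_threshold: float = 0.3


def load_settings(config: Dict[str, Any]) -> Optional[SemanticFilterSettings]:
    """Return settings if the (opt-in) filter is enabled, else None."""
    litellm_settings = config.get("litellm_settings") or {}
    section = litellm_settings.get("mcp_semantic_tool_filter")
    if not section or not section.get("enabled", False):
        return None
    # Ignore unknown keys so that a typo does not crash startup.
    allowed = {f.name for f in fields(SemanticFilterSettings)}
    return SemanticFilterSettings(
        **{key: value for key, value in section.items() if key in allowed}
    )
```

Returning `None` when the section is absent or disabled mirrors the opt-in behavior the review describes.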

The implementation correctly follows the custom rule about avoiding Router object creation in the request path - the SemanticRouter is built once at startup and reused for all requests. Error handling is graceful, falling back to unfiltered tools if any step fails.
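The two properties described above (build once at startup, fall back to unfiltered tools on failure) can be sketched as follows. This is an illustrative pattern, not the PR's hook: the class `FallbackToolFilterHook`, its `pre_call` method, and the `filter_fn` callable are hypothetical names, and the query extraction is simplified to the last message's content.

```python
import asyncio
from typing import Any, Callable, Dict, List

Tool = Dict[str, Any]


class FallbackToolFilterHook:
    def __init__(self, filter_fn: Callable[[str, List[Tool]], List[Tool]]):
        # filter_fn wraps an index that was built once at startup,
        # so nothing expensive is constructed per request.
        self._filter_fn = filter_fn

    async def pre_call(self, data: Dict[str, Any]) -> Dict[str, Any]:
        tools = data.get("tools")
        if not tools:
            return data  # nothing to filter
        try:
            query = data["messages"][-1]["content"]
            data["tools"] = self._filter_fn(query, tools)
        except Exception:
            # Graceful fallback: keep the original, unfiltered tools.
            data["tools"] = tools
        return data
```

Catching broadly here is deliberate: a filtering failure should degrade to the pre-feature behavior instead of failing the user's request.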

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - well-designed feature with proper testing and error handling
  • Score of 4 reflects solid implementation with comprehensive tests and proper architectural separation. Router initialization happens at startup (not in request path), graceful error handling falls back to unfiltered tools, and the feature is opt-in. Minor consideration: feature adds new dependency on semantic-router library and generates embeddings during requests, but this is by design and documented.
  • No files require special attention - all changes are well-structured and tested

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/proxy_server.py | Adds semantic tool filter initialization during proxy startup - properly placed outside request path |
| litellm/proxy/hooks/mcp_semantic_filter/hook.py | Implements pre-call hook for semantic tool filtering - efficient design with startup initialization |
| litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py | Core semantic filtering logic using semantic-router - router built at startup, not per-request |
| tests/mcp_tests/test_semantic_tool_filter_e2e.py | End-to-end tests validating semantic filter behavior with real proxy server |
| tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py | Comprehensive unit tests for semantic tool filtering logic with various scenarios |

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyServer as Proxy Server
    participant Hook as SemanticToolFilterHook
    participant Filter as SemanticMCPToolFilter
    participant MCPHandler as LiteLLM_Proxy_MCP_Handler
    participant SemanticRouter as semantic-router
    participant LLM as LLM Provider

    Note over ProxyServer,Filter: Startup Phase
    ProxyServer->>Hook: _initialize_semantic_tool_filter()
    Hook->>Filter: initialize_from_config(config, llm_router)
    Filter->>Filter: Create SemanticMCPToolFilter instance
    Filter->>Filter: build_router_from_mcp_registry()
    Filter->>SemanticRouter: Build SemanticRouter with tool embeddings
    SemanticRouter->>LLM: Generate embeddings for tool descriptions
    LLM-->>SemanticRouter: Return embeddings
    SemanticRouter-->>Filter: Router ready with indexed tools
    Filter-->>Hook: SemanticToolFilterHook instance
    Hook-->>ProxyServer: Register hook with litellm.logging_callback_manager

    Note over Client,LLM: Request Phase - Semantic Filtering
    Client->>ProxyServer: POST /v1/chat/completions (with MCP tools)
    ProxyServer->>Hook: async_pre_call_hook(data, user_api_key_dict)
    
    alt MCP references need expansion
        Hook->>Hook: _should_expand_mcp_tools(tools)
        Hook->>MCPHandler: _expand_mcp_tools(tools, user_api_key_dict)
        MCPHandler->>MCPHandler: _parse_mcp_tools() → separate MCP from others
        MCPHandler->>MCPHandler: _process_mcp_tools_to_openai_format()
        MCPHandler-->>Hook: Expanded tools (OpenAI format dicts)
    end
    
    Hook->>Filter: extract_user_query(messages)
    Filter-->>Hook: User query string
    
    Hook->>Filter: filter_tools(query, available_tools)
    Filter->>SemanticRouter: router(text=query, limit=top_k)
    SemanticRouter->>LLM: Generate query embedding
    LLM-->>SemanticRouter: Query embedding
    SemanticRouter->>SemanticRouter: Calculate similarity scores
    SemanticRouter-->>Filter: Top-K matched tool names
    Filter->>Filter: _get_tools_by_names() → preserve format
    Filter-->>Hook: Filtered tools (top-K most relevant)
    
    Hook->>Hook: Update data["tools"] with filtered tools
    Hook->>Hook: Store metadata for response headers
    Hook-->>ProxyServer: Modified request data
    
    ProxyServer->>LLM: Forward request with filtered tools
    LLM-->>ProxyServer: LLM response
    
    ProxyServer->>Hook: async_post_call_response_headers_hook()
    Hook-->>ProxyServer: Add x-litellm-semantic-filter headers
    
    ProxyServer-->>Client: Response with filter stats in headers
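The `extract_user_query(messages)` step in the diagram above could plausibly look like the sketch below. The function name comes from the diagram, but the body is an assumption: it takes the most recent user message in OpenAI chat format and handles both plain-string content and a list of typed content parts.

```python
from typing import Any, Dict, List


def extract_user_query(messages: List[Dict[str, Any]]) -> str:
    """Return the text of the most recent user message, or '' if none."""
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            # Multi-part content: keep only the text parts.
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            return " ".join(text for text in texts if text)
    return ""
```

Using only the latest user turn keeps the query embedding focused on the current request; whether the actual filter also considers earlier turns is not stated in the summary.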


@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments
