
[Feat] - MCP Semantic Filtering Support #20296

Merged: ishaan-jaff merged 20 commits into main from litellm_mcp_semantic_filtering on Feb 3, 2026
Conversation

@ishaan-jaff (Member) commented Feb 2, 2026

[Feat] - MCP Semantic Filtering Support

Fixes #12079

This PR adds semantic filtering for MCP tools to reduce context window size and improve tool selection accuracy. The filter uses embeddings to identify and return only the most relevant tools for a given user query, preventing context window bloat when many MCP tools are available.

Implementation
The implementation leverages the existing semantic-router library (already an optional dependency) and LiteLLMRouterEncoder to provide semantic matching with zero new dependencies:

  • SemanticMCPToolFilter: converts MCP tools to semantic-router Routes and filters them by similarity to the user query
  • SemanticToolFilterHook: a pre-call hook that filters tools before LLM inference
  • Uses Router.aembedding() to generate embeddings via any LiteLLM-supported embedding model
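The filtering step these components implement can be sketched, in simplified form, as a cosine-similarity ranker over tool-description embeddings. This is a toy stand-in, not the PR's code: the names filter_tools, top_k, and similarity_threshold follow the PR, while toy_embed is a hypothetical placeholder for the real Router.aembedding() call.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class MCPTool:
    name: str
    description: str


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for Router.aembedding(): a letter-frequency vector
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_tools(query: str, tools: list[MCPTool], top_k: int = 1,
                 similarity_threshold: float = 0.0) -> list[MCPTool]:
    # Rank tools by similarity between the user query and each tool description
    qv = toy_embed(query)
    scored = sorted(
        ((cosine(qv, toy_embed(t.description)), t) for t in tools),
        key=lambda st: st[0], reverse=True,
    )
    kept = [t for score, t in scored[:top_k] if score >= similarity_threshold]
    return kept or tools  # degrade gracefully: no matches -> return all tools
```

The graceful-degradation fallback mirrors the behavior described later in the review: if nothing clears the threshold, every tool is returned rather than none.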

With this request:

curl -i --location 'http://0.0.0.0:4000/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer sk-1234" \
--data '{
    "model": "gpt-4o",
    "input": [
    {
      "role": "user",
      "content": "give me TLDR of what BerriAI/litellm repo is about. Search it on wikipedia",
      "type": "message"
    }
  ],
    "tools": [
        {
            "type": "mcp",
            "server_label": "litellm",
            "server_url": "litellm_proxy",
            "require_approval": "never"
        }
    ],
    "tool_choice": "required"
}'

Filters the 4 available MCP tools down to 1, as reported in the response header:

headers:
x-litellm-semantic-filter: 4->1

{"id":"resp_1ftsD32DWWG459FCerymTdFaCFJYai0VPU_e897qgbzmm5dbMUTiSlwGU3DsMJXhzgdaFfhybvep2uy0Ht4Cy0g17pJ-HyE-_ATscZGXfFEz9IrkJ65BnZ2hCKRweKuhrqhHX3Pugy_9eax2ivlTsYuCbWl4v1etoPCSZPSalF4UDQRlVM-s_6V0iO6HxRWsc0N70jCU9uhWxAfz6hY6itXyRGxrCi2eDko7wg5l4IPm1IJeS2i7bugKiZy9WtRCNQzKtxYXKtCK24UEWfK5HYdiwleBZhYKTtabvKiPnW3-leEVWez0eR_pcVZE6EZkkUhd2RXk8Lq_GnfqasWsi8LX9xtIEUM1jZzV6-vz8NdVAteQitCrUzL6lymx_K5hFL0lkt03ih6WdohUhDJNHyrwzWL74u6LQrvZyZP0YWkIs9uHm2I2Ne07YEb5C2yLbDBU8Kz9a-quRt-7QA4TU8OJ","created_at":1770084202,"error":null,"incomplete_details":null,"instructions":null,"metadata":{},"model":"gpt-4o","object":"response","output":[{"arguments":"{\"repoName\":\"BerriAI/litellm\",\"question\":\"What is the main focus of this repository?\"}","call_id":"call_9hSrc4e0xeRorNz0BFo6PQwl","name":"deepwiki-ask_question","type":"function_call","id":"fc_0a9e77dbc2331101006981576a955881a0a834d8678d94ab57","status":"completed"}],"parallel_tool_calls":true,"temperature":1.0,"tool_choice":"required","tools":[{"name":"deepwiki-ask_question","parameters":{"properties":{"repoName":{"anyOf":[{"type":"string"},{"items":{"type":"string"},"type":"array"}]},"question":{"type":"string"}},"required":["repoName","question"],"type":"object","additionalProperties":false},"strict":false,"type":"function","description":"Ask any question about a GitHub repository and get an AI-powered, context-grounded response.\n\nArgs:\n    repoName: GitHub repository or list of repositories (max 10) in owner/repo format\n    question: The question to ask about the 
repository"}],"top_p":1.0,"max_output_tokens":null,"previous_response_id":null,"reasoning":{"effort":null,"summary":null},"status":"completed","text":{"format":{"type":"text"},"verbosity":"medium"},"truncation":"disabled","usage":{"input_tokens":127,"input_tokens_details":{"audio_tokens":null,"cached_tokens":0,"text_tokens":null},"output_tokens":25,"output_tokens_details":{"reasoning_tokens":0,"text_tokens":null},"total_tokens":152,"cost":null},"user":null,"store":true,"background":false,"billing":{"payer":"developer"},"completed_at":1770084202,"frequency_penalty":0.0,"max_tool_calls":null,"presence_penalty":0.0,"prompt_cache_key":null,"prompt_cache_retention":null,"safety_identifier":null,"service_tier":"default","top_logprobs":0}

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it solves only 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
✅ Test

Changes

@vercel

vercel bot commented Feb 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Status: Ready · Actions: Preview, Comment · Updated (UTC): Feb 3, 2026 2:18am


@greptile-apps
Contributor

greptile-apps bot commented Feb 2, 2026

Greptile Overview

Greptile Summary

This PR adds semantic filtering support for MCP tools to reduce context window size and improve tool selection accuracy.

Key Changes:

  • SemanticMCPToolFilter class converts MCP tools to semantic-router Routes and filters them based on user query similarity
  • SemanticToolFilterHook pre-call hook integrates filtering into the request path before LLM inference
  • Comprehensive unit tests and E2E tests verify filtering behavior, edge cases, and hook integration
  • Filter uses existing LiteLLMRouterEncoder with configurable top_k and similarity_threshold parameters

How It Works:
The filter lazily initializes a SemanticRouter on first use, converting MCP tool descriptions to embeddings. When filtering tools, it queries the router with the user's message, returning only the top-k semantically relevant tools. The hook applies this filtering transparently before each LLM request containing tools.

Notable Patterns:

  • Follows existing hook patterns (similar to SkillsInjectionHook) by extending CustomLogger
  • Gracefully degrades by returning all tools if filtering fails or finds no matches
  • Filter instance accepts a pre-configured Router for embeddings rather than creating a new one

Confidence Score: 4/5

  • This PR is safe to merge with minor performance considerations to address
  • Code follows established patterns, has comprehensive tests, and handles errors gracefully. One logic issue around lazy router initialization in request path affects first-request latency. Minor performance optimization suggested for tool reordering.
  • Pay attention to semantic_tool_filter.py for the lazy initialization pattern that occurs on first request

Important Files Changed

  • litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py: new semantic filtering implementation with lazy router initialization; the router is rebuilt on the first filter_tools call if needed
  • litellm/proxy/hooks/semantic_tool_filter_hook.py: pre-call hook that integrates the semantic filter into the request path; follows standard hook patterns with proper error handling
  • tests/mcp_tests/test_semantic_tool_filter_e2e.py: E2E test verifying the hook filters tools correctly; uses a real Router instance with a mocked embedding model
  • tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py: comprehensive unit tests covering filtering logic, edge cases, query extraction, and hook behavior; all tests use mocked dependencies

Sequence Diagram

sequenceDiagram
    participant Client
    participant Hook as SemanticToolFilterHook
    participant Filter as SemanticMCPToolFilter
    participant Router as SemanticRouter
    participant Encoder as LiteLLMRouterEncoder
    participant LLM

    Client->>Hook: async_pre_call_hook(data with tools)
    Hook->>Hook: Check call_type, tools, messages
    Hook->>Filter: extract_user_query(messages)
    Filter-->>Hook: "Send an email"
    Hook->>Filter: filter_tools(query, available_tools)
    
    alt Router not initialized (first call)
        Filter->>Filter: rebuild_router(tools)
        Filter->>Filter: _mcp_tools_to_routes(tools)
        Filter->>Router: Create SemanticRouter(routes, encoder)
        Router->>Encoder: Build embeddings for all routes
        Encoder->>Encoder: Use litellm_router_instance
        Router-->>Filter: Router ready with index
    end
    
    Filter->>Router: __call__(text=query)
    Router->>Encoder: Generate query embedding
    Encoder-->>Router: Query embedding
    Router->>Router: Find top-k matches by similarity
    Router-->>Filter: List[RouteChoice] (matched tool names)
    
    Filter->>Filter: Filter & reorder tools by matches
    Filter-->>Hook: Filtered tools (top-k)
    
    Hook->>Hook: Update data["tools"]
    Hook-->>Client: Modified data with filtered tools
    
    Client->>LLM: Request with filtered tools
    LLM-->>Client: Response

@greptile-apps bot left a comment

4 files reviewed, 3 comments

Comment on lines +163 to +166:

# Rebuild router if needed (first time or tools changed)
if self.tool_router is None:
    verbose_logger.debug("Router not initialized, rebuilding...")
    self.rebuild_router(available_tools)

Lazy router initialization happens in the critical request path on first filter_tools() call. According to custom rule 0c2a17ad, creating Router objects in the critical request path causes performance degradation. Consider pre-building the router during filter initialization or on a separate warmup call, rather than on the first actual request.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)


Comment on lines +117 to +125:

self.tool_router = SemanticRouter(
    routes=routes,
    encoder=LiteLLMRouterEncoder(
        litellm_router_instance=self.router_instance,
        model_name=self.embedding_model,
        score_threshold=self.similarity_threshold,
    ),
    auto_sync="local",  # Build index immediately
)

auto_sync="local" builds the embedding index immediately during SemanticRouter construction. This happens on the first request when router is None (line 164), adding latency to that first request. Consider pre-warming the router after initialization or providing an explicit warmup method.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@ishaan-jaff ishaan-jaff merged commit 079f49f into main Feb 3, 2026
43 of 65 checks passed
shin-bot-litellm added a commit that referenced this pull request Feb 3, 2026
Two bugs from PR #20296 (MCP Semantic Filtering):

1. filter_tools() returned all tools when tool_router was None instead
   of lazily building the router from available_tools. This caused
   top_k to never limit results.

2. async_pre_call_hook() crashed with KeyError when data dict had no
   metadata key, causing the exception handler to return None (no
   filtering applied).
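The two fixes can be sketched as follows. This is illustrative only (the class name PatchedFilter and the list-based router stand-in are hypothetical, not the actual patch):

```python
class PatchedFilter:
    def __init__(self):
        self.tool_router = None

    def rebuild_router(self, tools: list[dict]) -> None:
        # Stand-in for building the semantic router index
        self.tool_router = [t["name"] for t in tools]

    def filter_tools(self, query: str, available_tools: list[dict],
                     top_k: int = 1) -> list[dict]:
        # Fix 1: build the router lazily instead of returning all tools,
        # which previously made top_k a no-op when tool_router was None
        if self.tool_router is None:
            self.rebuild_router(available_tools)
        return available_tools[:top_k]  # ranking elided in this sketch


def get_request_metadata(data: dict) -> dict:
    # Fix 2: data["metadata"] raised KeyError when the key was absent,
    # tripping the hook's exception handler; .get() with a default avoids it
    return data.get("metadata") or {}
```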


Development

Successfully merging this pull request may close these issues.

[Feature]: Semantic MCP tool auto-filtering

1 participant