
[Feat] - MCP Semantic Filtering Support #20296

Merged: ishaan-jaff merged 20 commits into main from litellm_mcp_semantic_filtering on Feb 3, 2026
Conversation

@ishaan-jaff (Member) commented Feb 2, 2026

[Feat] - MCP Semantic Filtering Support

Fixes #12079

This PR adds semantic filtering for MCP tools to reduce context window size and improve tool selection accuracy. The filter uses embeddings to identify and return only the most relevant tools for a given user query, preventing context window bloat when many MCP tools are available.

Implementation
The implementation leverages the existing semantic-router library (already an optional dependency) and LiteLLMRouterEncoder to provide semantic matching with zero new dependencies:

  • SemanticMCPToolFilter: converts MCP tools to semantic-router Routes and filters them by similarity to the user query
  • SemanticToolFilterHook: a pre-call hook that filters tools before LLM inference
  • Uses Router.aembedding() to generate embeddings via any LiteLLM-supported embedding model
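The filtering step these components implement can be sketched, in simplified form, as a cosine-similarity ranker over tool-description embeddings. This is a toy stand-in, not the PR's code: the names filter_tools, top_k, and similarity_threshold follow the PR, while toy_embed is a hypothetical placeholder for the real Router.aembedding() call.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class MCPTool:
    name: str
    description: str


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for Router.aembedding(): a letter-frequency vector
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_tools(query: str, tools: list[MCPTool], top_k: int = 1,
                 similarity_threshold: float = 0.0) -> list[MCPTool]:
    # Rank tools by similarity between the user query and each tool description
    qv = toy_embed(query)
    scored = sorted(
        ((cosine(qv, toy_embed(t.description)), t) for t in tools),
        key=lambda st: st[0], reverse=True,
    )
    kept = [t for score, t in scored[:top_k] if score >= similarity_threshold]
    return kept or tools  # degrade gracefully: no matches -> return all tools
```

The graceful-degradation fallback mirrors the behavior described later in the review: if nothing clears the threshold, every tool is returned rather than none.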

With this request:

curl -i --location 'http://0.0.0.0:4000/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer sk-1234" \
--data '{
    "model": "gpt-4o",
    "input": [
    {
      "role": "user",
      "content": "give me TLDR of what BerriAI/litellm repo is about. Search it on wikipedia",
      "type": "message"
    }
  ],
    "tools": [
        {
            "type": "mcp",
            "server_label": "litellm",
            "server_url": "litellm_proxy",
            "require_approval": "never"
        }
    ],
    "tool_choice": "required"
}'

Filters the 4 available MCP tools down to 1, as reported in the response header:

headers:
x-litellm-semantic-filter: 4->1

{"id":"resp_1ftsD32DWWG459FCerymTdFaCFJYai0VPU_e897qgbzmm5dbMUTiSlwGU3DsMJXhzgdaFfhybvep2uy0Ht4Cy0g17pJ-HyE-_ATscZGXfFEz9IrkJ65BnZ2hCKRweKuhrqhHX3Pugy_9eax2ivlTsYuCbWl4v1etoPCSZPSalF4UDQRlVM-s_6V0iO6HxRWsc0N70jCU9uhWxAfz6hY6itXyRGxrCi2eDko7wg5l4IPm1IJeS2i7bugKiZy9WtRCNQzKtxYXKtCK24UEWfK5HYdiwleBZhYKTtabvKiPnW3-leEVWez0eR_pcVZE6EZkkUhd2RXk8Lq_GnfqasWsi8LX9xtIEUM1jZzV6-vz8NdVAteQitCrUzL6lymx_K5hFL0lkt03ih6WdohUhDJNHyrwzWL74u6LQrvZyZP0YWkIs9uHm2I2Ne07YEb5C2yLbDBU8Kz9a-quRt-7QA4TU8OJ","created_at":1770084202,"error":null,"incomplete_details":null,"instructions":null,"metadata":{},"model":"gpt-4o","object":"response","output":[{"arguments":"{\"repoName\":\"BerriAI/litellm\",\"question\":\"What is the main focus of this repository?\"}","call_id":"call_9hSrc4e0xeRorNz0BFo6PQwl","name":"deepwiki-ask_question","type":"function_call","id":"fc_0a9e77dbc2331101006981576a955881a0a834d8678d94ab57","status":"completed"}],"parallel_tool_calls":true,"temperature":1.0,"tool_choice":"required","tools":[{"name":"deepwiki-ask_question","parameters":{"properties":{"repoName":{"anyOf":[{"type":"string"},{"items":{"type":"string"},"type":"array"}]},"question":{"type":"string"}},"required":["repoName","question"],"type":"object","additionalProperties":false},"strict":false,"type":"function","description":"Ask any question about a GitHub repository and get an AI-powered, context-grounded response.\n\nArgs:\n    repoName: GitHub repository or list of repositories (max 10) in owner/repo format\n    question: The question to ask about the 
repository"}],"top_p":1.0,"max_output_tokens":null,"previous_response_id":null,"reasoning":{"effort":null,"summary":null},"status":"completed","text":{"format":{"type":"text"},"verbosity":"medium"},"truncation":"disabled","usage":{"input_tokens":127,"input_tokens_details":{"audio_tokens":null,"cached_tokens":0,"text_tokens":null},"output_tokens":25,"output_tokens_details":{"reasoning_tokens":0,"text_tokens":null},"total_tokens":152,"cost":null},"user":null,"store":true,"background":false,"billing":{"payer":"developer"},"completed_at":1770084202,"frequency_penalty":0.0,"max_tool_calls":null,"presence_penalty":0.0,"prompt_cache_key":null,"prompt_cache_retention":null,"safety_identifier":null,"service_tier":"default","top_logprobs":0}

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it solves only 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
✅ Test

Changes

@vercel

vercel bot commented Feb 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Status: Ready · Actions: Preview, Comment · Updated (UTC): Feb 3, 2026 2:18am


@greptile-apps
Contributor

greptile-apps bot commented Feb 2, 2026

Greptile Overview

Greptile Summary

This PR adds semantic filtering support for MCP tools to reduce context window size and improve tool selection accuracy.

Key Changes:

  • SemanticMCPToolFilter class converts MCP tools to semantic-router Routes and filters them based on user query similarity
  • SemanticToolFilterHook pre-call hook integrates filtering into the request path before LLM inference
  • Comprehensive unit tests and E2E tests verify filtering behavior, edge cases, and hook integration
  • Filter uses existing LiteLLMRouterEncoder with configurable top_k and similarity_threshold parameters

How It Works:
The filter lazily initializes a SemanticRouter on first use, converting MCP tool descriptions to embeddings. When filtering tools, it queries the router with the user's message, returning only the top-k semantically relevant tools. The hook applies this filtering transparently before each LLM request containing tools.

Notable Patterns:

  • Follows existing hook patterns (similar to SkillsInjectionHook) by extending CustomLogger
  • Gracefully degrades by returning all tools if filtering fails or finds no matches
  • Filter instance accepts a pre-configured Router for embeddings rather than creating a new one

Confidence Score: 4/5

  • This PR is safe to merge with minor performance considerations to address
  • Code follows established patterns, has comprehensive tests, and handles errors gracefully. One logic issue around lazy router initialization in request path affects first-request latency. Minor performance optimization suggested for tool reordering.
  • Pay attention to semantic_tool_filter.py for the lazy initialization pattern that occurs on first request

Important Files Changed

  • litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py: new semantic filtering implementation with lazy router initialization; the router is rebuilt on the first filter_tools call if needed
  • litellm/proxy/hooks/semantic_tool_filter_hook.py: pre-call hook that integrates the semantic filter into the request path; follows standard hook patterns with proper error handling
  • tests/mcp_tests/test_semantic_tool_filter_e2e.py: E2E test verifying the hook filters tools correctly; uses a real Router instance with a mocked embedding model
  • tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py: comprehensive unit tests covering filtering logic, edge cases, query extraction, and hook behavior; all tests use mocked dependencies

Sequence Diagram

sequenceDiagram
    participant Client
    participant Hook as SemanticToolFilterHook
    participant Filter as SemanticMCPToolFilter
    participant Router as SemanticRouter
    participant Encoder as LiteLLMRouterEncoder
    participant LLM

    Client->>Hook: async_pre_call_hook(data with tools)
    Hook->>Hook: Check call_type, tools, messages
    Hook->>Filter: extract_user_query(messages)
    Filter-->>Hook: "Send an email"
    Hook->>Filter: filter_tools(query, available_tools)
    
    alt Router not initialized (first call)
        Filter->>Filter: rebuild_router(tools)
        Filter->>Filter: _mcp_tools_to_routes(tools)
        Filter->>Router: Create SemanticRouter(routes, encoder)
        Router->>Encoder: Build embeddings for all routes
        Encoder->>Encoder: Use litellm_router_instance
        Router-->>Filter: Router ready with index
    end
    
    Filter->>Router: __call__(text=query)
    Router->>Encoder: Generate query embedding
    Encoder-->>Router: Query embedding
    Router->>Router: Find top-k matches by similarity
    Router-->>Filter: List[RouteChoice] (matched tool names)
    
    Filter->>Filter: Filter & reorder tools by matches
    Filter-->>Hook: Filtered tools (top-k)
    
    Hook->>Hook: Update data["tools"]
    Hook-->>Client: Modified data with filtered tools
    
    Client->>LLM: Request with filtered tools
    LLM-->>Client: Response

@greptile-apps bot left a comment

4 files reviewed, 3 comments

Comment on lines +163 to +166:

# Rebuild router if needed (first time or tools changed)
if self.tool_router is None:
    verbose_logger.debug("Router not initialized, rebuilding...")
    self.rebuild_router(available_tools)

Lazy router initialization happens in the critical request path on first filter_tools() call. According to custom rule 0c2a17ad, creating Router objects in the critical request path causes performance degradation. Consider pre-building the router during filter initialization or on a separate warmup call, rather than on the first actual request.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)


Comment on lines +117 to +125:

self.tool_router = SemanticRouter(
    routes=routes,
    encoder=LiteLLMRouterEncoder(
        litellm_router_instance=self.router_instance,
        model_name=self.embedding_model,
        score_threshold=self.similarity_threshold,
    ),
    auto_sync="local",  # Build index immediately
)

auto_sync="local" builds the embedding index immediately during SemanticRouter construction. This happens on the first request when router is None (line 164), adding latency to that first request. Consider pre-warming the router after initialization or providing an explicit warmup method.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@ishaan-jaff ishaan-jaff merged commit 079f49f into main Feb 3, 2026
43 of 65 checks passed
shin-bot-litellm added a commit that referenced this pull request Feb 3, 2026
Two bugs from PR #20296 (MCP Semantic Filtering):

1. filter_tools() returned all tools when tool_router was None instead
   of lazily building the router from available_tools. This caused
   top_k to never limit results.

2. async_pre_call_hook() crashed with KeyError when data dict had no
   metadata key, causing the exception handler to return None (no
   filtering applied).
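The two fixes can be sketched as follows. This is illustrative only (the class name PatchedFilter and the list-based router stand-in are hypothetical, not the actual patch):

```python
class PatchedFilter:
    def __init__(self):
        self.tool_router = None

    def rebuild_router(self, tools: list[dict]) -> None:
        # Stand-in for building the semantic router index
        self.tool_router = [t["name"] for t in tools]

    def filter_tools(self, query: str, available_tools: list[dict],
                     top_k: int = 1) -> list[dict]:
        # Fix 1: build the router lazily instead of returning all tools,
        # which previously made top_k a no-op when tool_router was None
        if self.tool_router is None:
            self.rebuild_router(available_tools)
        return available_tools[:top_k]  # ranking elided in this sketch


def get_request_metadata(data: dict) -> dict:
    # Fix 2: data["metadata"] raised KeyError when the key was absent,
    # tripping the hook's exception handler; .get() with a default avoids it
    return data.get("metadata") or {}
```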


Development

Successfully merging this pull request may close these issues.

[Feature]: Semantic MCP tool auto-filtering

1 participant