[Feat] - MCP Semantic Filtering Support #20296
Greptile Overview
Greptile Summary
This PR adds semantic filtering support for MCP tools to reduce context window size and improve tool selection accuracy.
Key Changes:
How It Works:
Notable Patterns:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py | New semantic filtering implementation with lazy router initialization. Router rebuilt on first filter_tools call if needed. |
| litellm/proxy/hooks/semantic_tool_filter_hook.py | Pre-call hook that integrates semantic filter into request path. Follows standard hook patterns with proper error handling. |
| tests/mcp_tests/test_semantic_tool_filter_e2e.py | E2E test verifying hook filters tools correctly. Uses real Router instance with mocked embedding model. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_semantic_tool_filter.py | Comprehensive unit tests covering filtering logic, edge cases, query extraction, and hook behavior. All tests use mocked dependencies. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Hook as SemanticToolFilterHook
    participant Filter as SemanticMCPToolFilter
    participant Router as SemanticRouter
    participant Encoder as LiteLLMRouterEncoder
    participant LLM
    Client->>Hook: async_pre_call_hook(data with tools)
    Hook->>Hook: Check call_type, tools, messages
    Hook->>Filter: extract_user_query(messages)
    Filter-->>Hook: "Send an email"
    Hook->>Filter: filter_tools(query, available_tools)
    alt Router not initialized (first call)
        Filter->>Filter: rebuild_router(tools)
        Filter->>Filter: _mcp_tools_to_routes(tools)
        Filter->>Router: Create SemanticRouter(routes, encoder)
        Router->>Encoder: Build embeddings for all routes
        Encoder->>Encoder: Use litellm_router_instance
        Router-->>Filter: Router ready with index
    end
    Filter->>Router: __call__(text=query)
    Router->>Encoder: Generate query embedding
    Encoder-->>Router: Query embedding
    Router->>Router: Find top-k matches by similarity
    Router-->>Filter: List[RouteChoice] (matched tool names)
    Filter->>Filter: Filter & reorder tools by matches
    Filter-->>Hook: Filtered tools (top-k)
    Hook->>Hook: Update data["tools"]
    Hook-->>Client: Modified data with filtered tools
    Client->>LLM: Request with filtered tools
    LLM-->>Client: Response
```
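In the diagram, the hook first asks the filter for the user's query. Below is a minimal sketch of such an extractor. It assumes OpenAI-style chat messages whose `content` is either a string or a list of content parts. It is illustrative only and is not the PR's `extract_user_query`, whose exact rules may differ.

```python
from typing import Any, Dict, List, Optional


def extract_user_query(messages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the text of the most recent user message, or None (hypothetical sketch)."""
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str):
            return content.strip() or None
        if isinstance(content, list):
            # Multimodal content: join the text parts and ignore images and other part types.
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            joined = " ".join(t for t in texts if t).strip()
            return joined or None
    return None
```

Only the latest user turn is used here, because it usually carries the intent that should drive tool selection.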
litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py, lines 163-166:
```python
# Rebuild router if needed (first time or tools changed)
if self.tool_router is None:
    verbose_logger.debug("Router not initialized, rebuilding...")
    self.rebuild_router(available_tools)
```
Lazy router initialization happens in the critical request path, on the first `filter_tools()` call. According to custom rule 0c2a17ad, creating Router objects in the critical request path degrades performance. Consider pre-building the router when the filter is initialized, or in a separate warmup call, rather than on the first real request.
**Context Used:** Rule from `dashboard` - What: Avoid creating new database requests or Router objects in the critical request path.
Why: Cre... ([source](https://app.greptile.com/review/custom-context?memory=0c2a17ad-5f29-423f-a48b-371852ac4169))
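One way to act on this suggestion is to build the index once at startup and guard the lazy path with a lock, so that concurrent first requests cannot trigger duplicate builds. The sketch below is hypothetical. `WarmableToolFilter`, `warmup()`, and the `build_index` callable (a stand-in for `SemanticRouter` construction) are illustrative names, not LiteLLM APIs.

```python
import asyncio
from typing import Callable, List, Optional

# Hypothetical type: an "index" maps a query to the matching tools.
Index = Callable[[str], List[dict]]


class WarmableToolFilter:
    """Hypothetical sketch: build the expensive index off the request path."""

    def __init__(self, build_index: Callable[[List[dict]], Index]) -> None:
        # build_index stands in for SemanticRouter construction (embeds every route).
        self._build_index = build_index
        self._index: Optional[Index] = None
        self._lock: Optional[asyncio.Lock] = None
        self.build_count = 0  # illustration only: how many builds actually ran

    async def warmup(self, tools: List[dict]) -> None:
        """Call once at startup so the first real request finds a ready index."""
        await self._ensure_index(tools)

    async def _ensure_index(self, tools: List[dict]) -> Index:
        if self._index is not None:
            return self._index  # fast path: no locking once built
        if self._lock is None:
            # Created lazily inside the running loop; no await between check and set.
            self._lock = asyncio.Lock()
        async with self._lock:
            # Re-check: another coroutine may have built the index while we waited.
            if self._index is None:
                # Run the blocking build in a worker thread to keep the event loop free.
                self._index = await asyncio.to_thread(self._build_index, tools)
                self.build_count += 1
        return self._index

    async def filter_tools(self, query: str, tools: List[dict]) -> List[dict]:
        index = await self._ensure_index(tools)  # returns immediately after warmup
        return index(query)
```

Calling `warmup()` during proxy startup moves the embedding cost off the request path. The locked lazy path stays as a fallback in case warmup was skipped.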
litellm/proxy/_experimental/mcp_server/semantic_tool_filter.py, lines 117-125:
```python
self.tool_router = SemanticRouter(
    routes=routes,
    encoder=LiteLLMRouterEncoder(
        litellm_router_instance=self.router_instance,
        model_name=self.embedding_model,
        score_threshold=self.similarity_threshold,
    ),
    auto_sync="local",  # Build index immediately
)
```
`auto_sync="local"` builds the embedding index immediately during `SemanticRouter` construction. Because construction happens on the first request, when the router is still `None` (line 164), it adds latency to that request. Consider pre-warming the router after initialization or providing an explicit warmup method.
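The inline comment in the first quoted hunk says the router is rebuilt the "first time or tools changed", but the check there only tests for `None`. If rebuilding on tool changes is intended, one hypothetical approach (not code from this PR) is to key the cached router on an order-independent fingerprint of the tool definitions:

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


def tools_fingerprint(tools: List[Dict[str, Any]]) -> str:
    """Order-independent SHA-256 over canonical JSON of each tool (hypothetical helper)."""
    # sort_keys makes each dict canonical; sorting the strings ignores tool order.
    canonical = sorted(json.dumps(tool, sort_keys=True) for tool in tools)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


class FingerprintedRouterCache:
    """Rebuild the router only when the tool set actually changes (hypothetical)."""

    def __init__(self, build: Callable[[List[Dict[str, Any]]], Any]) -> None:
        self._build = build  # stands in for rebuild_router / SemanticRouter construction
        self._fingerprint: Optional[str] = None
        self._router: Any = None
        self.builds = 0

    def get(self, tools: List[Dict[str, Any]]) -> Any:
        fingerprint = tools_fingerprint(tools)
        if self._router is None or fingerprint != self._fingerprint:
            self._router = self._build(tools)
            self._fingerprint = fingerprint
            self.builds += 1
        return self._router
```

Hashing the tool JSON on each request is linear in the number of tools. That is far cheaper than re-embedding every route, but it is still work on the request path.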
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Two bugs from PR #20296 (MCP Semantic Filtering):
1. `filter_tools()` returned all tools when `tool_router` was `None` instead of lazily building the router from `available_tools`, so `top_k` never limited the results.
2. `async_pre_call_hook()` raised a `KeyError` when the `data` dict had no `metadata` key. The exception handler then returned `None`, so no filtering was applied.
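Below is a minimal sketch of both fixes. It assumes each tool is a dict with a `name` key and uses a plain callable in place of `SemanticRouter`. `ToolFilterSketch`, `pre_call_hook`, and the `semantic_filter` metadata key are hypothetical names, not the PR's code.

```python
from typing import Any, Callable, Dict, List, Optional

# Stand-in for SemanticRouter: maps a query to tool names ranked best-first.
RankFn = Callable[[str], List[str]]


class ToolFilterSketch:
    def __init__(self, build_router: Callable[[List[dict]], RankFn], top_k: int) -> None:
        self._build_router = build_router
        self.tool_router: Optional[RankFn] = None
        self.top_k = top_k

    def filter_tools(self, query: str, available_tools: List[dict]) -> List[dict]:
        # Fix 1: build the router lazily instead of returning every tool when it is None.
        if self.tool_router is None:
            self.tool_router = self._build_router(available_tools)
        by_name = {tool["name"]: tool for tool in available_tools}
        # Keep the router's ranking order, drop names not offered here, then apply top_k.
        ranked = [by_name[name] for name in self.tool_router(query) if name in by_name]
        return ranked[: self.top_k]


def pre_call_hook(data: Dict[str, Any], tool_filter: ToolFilterSketch, query: str) -> Dict[str, Any]:
    tools = data.get("tools") or []
    filtered = tool_filter.filter_tools(query, tools)
    data["tools"] = filtered
    # Fix 2: create "metadata" when absent rather than indexing data["metadata"] directly.
    # The "semantic_filter" key is hypothetical.
    data.setdefault("metadata", {})["semantic_filter"] = f"{len(tools)}->{len(filtered)}"
    return data
```

Using `dict.setdefault` keeps any metadata the caller already set while guaranteeing the key exists.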
Fixes #12079
This PR adds semantic filtering for MCP tools to reduce context window size and improve tool selection accuracy. The filter uses embeddings to identify and return only the most relevant tools for a given user query, preventing context window bloat when many MCP tools are available.
Implementation
The implementation leverages the existing semantic-router library (already an optional dependency) and LiteLLMRouterEncoder to provide semantic matching with zero new dependencies:
- `SemanticMCPToolFilter`: Converts MCP tools to semantic-router Routes and filters them by query similarity
- `SemanticToolFilterHook`: Pre-call hook that filters tools before LLM inference
- Uses `Router.aembedding()` to generate embeddings via any LiteLLM-supported model
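The PR delegates matching to semantic-router `Route`s and `LiteLLMRouterEncoder`. The dependency-free sketch below only illustrates the underlying idea:

- embed the query and each tool's name and description,
- rank the tools by cosine similarity,
- keep the `top_k` tools that score above a threshold.

`toy_embed` is a bag-of-words stand-in for a real embedding model, such as one called through `Router.aembedding()`. The default `top_k` and `threshold` values are arbitrary assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_filter(query: str, tools: List[dict], top_k: int = 3, threshold: float = 0.1) -> List[dict]:
    """Rank tools by similarity of the query to 'name + description'; keep top_k above threshold."""
    q = toy_embed(query)
    scored: List[Tuple[float, dict]] = []
    for tool in tools:
        text = f"{tool['name'].replace('_', ' ')} {tool.get('description', '')}"
        score = cosine(q, toy_embed(text))
        if score >= threshold:
            scored.append((score, tool))
    # Sort on the score only; ties keep their original tool order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]
```

With a real embedding model, semantically related wording (for example "mail" vs. "email") would also match, which a bag-of-words vector cannot capture.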
With this request, the filter narrows the tools down to just 1 MCP tool, as shown in the headers:
`x-litellm-semantic-filter: 4->1`
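A client can read this header to see how many tools the filter removed. The hypothetical helper below assumes the `<before>-><after>` format of the single example above; the format is not otherwise documented here.

```python
from typing import Tuple


def parse_semantic_filter_header(value: str) -> Tuple[int, int]:
    """Parse a value like "4->1" into (tools_before, tools_after); format is an assumption."""
    before, sep, after = value.strip().partition("->")
    if not sep:
        raise ValueError(f"unexpected header value: {value!r}")
    # int() tolerates surrounding whitespace such as "12 " or " 3".
    return int(before), int(after)
```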
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- `tests/litellm/` directory: adding at least 1 test is a hard requirement (see details)
- `make test-unit`

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
✅ Test
Changes