FEAT Add Tavily Search #1085

Merged: gwarmstrong merged 7 commits into main from georgea/add-tavily-search on Dec 11, 2025

Conversation

gwarmstrong (Collaborator) commented Dec 9, 2025

Adds Tavily Search tool implementation from @activatedgeek with a change to the tool name to match the Tavily remote MCP.

The exclude_domains_config parameter is a required configuration for the Tavily search tool. It must point to a JSON file specifying domains to exclude from all search queries.

This is intentionally mandatory to prevent accidentally training models on content from restricted domains. Users must explicitly configure domain exclusions before using the search tool.

Usage with nemo-skills

Specify exclude_domains_config via tool_overrides:

ns generate \
    --cluster local \
    --model Qwen/Qwen3-8B \
    --server_type vllm \
    --server_gpus 1 \
    --server_args '--enable-auto-tool-choice --tool-call-parser hermes' \
    --input_file data.jsonl \
    --output_dir outputs \
    ++tool_modules=[nemo_skills.mcp.servers.tavily_search_tool.TavilySearchTool] \
    ++tool_overrides.TavilySearchTool.exclude_domains_config=/path/to/exclude_domains.json

Or via Python API:

from nemo_skills.pipeline.cli import generate, wrap_arguments

generate(
    ctx=wrap_arguments(
        "++tool_modules=[nemo_skills.mcp.servers.tavily_search_tool.TavilySearchTool] "
        "++tool_overrides.TavilySearchTool.exclude_domains_config=/path/to/exclude_domains.json"
    ),
    cluster='local',
    model='Qwen/Qwen3-8B',
    server_type='vllm',
    server_gpus=1,
    server_args='--enable-auto-tool-choice --tool-call-parser hermes',
    input_file='data.jsonl',
    output_dir='outputs',
)

Exclude Domains JSON Format

The JSON file should follow this structure:

{
  "notices": [
    {
      "properties": [
        { "type": "domain", "value": "example.com" },
        { "type": "domain", "value": "restricted-site.org" }
      ]
    }
  ]
}
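
As a rough illustration of how a file in this format could be flattened into a plain list of domains, here is a minimal sketch. The helper name load_exclude_domains is hypothetical and is not taken from the PR; the actual parsing in tavily_search_tool.py may differ, for example in how it treats entries whose type is not "domain".

```python
import json
from pathlib import Path


def load_exclude_domains(config_path: str) -> list[str]:
    """Collect every "domain" value from the notices/properties structure.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    domains: list[str] = []
    for notice in data.get("notices", []):
        for prop in notice.get("properties", []):
            # Assumption: only entries typed as "domain" are relevant.
            if prop.get("type") == "domain" and prop.get("value"):
                domains.append(prop["value"])
    return domains
```

Calling load_exclude_domains on the example file above would yield the two domains in file order.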

Behavior

  • Required: If not set, the tool raises a ValueError and refuses to run
  • Hidden from model: The exclude_domains argument is hidden via hide_args, so the model cannot see or override the exclusion list
  • Auto-injected: Excluded domains are automatically added to every search request
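
The three behaviors above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: build_search_payload is a hypothetical name, and the real tool wires the exclusion list through hide_args and tool_overrides rather than a standalone function. The model is assumed to supply only the query.

```python
from typing import Any, Optional


def build_search_payload(
    query: str,
    exclude_domains: Optional[list[str]],
) -> dict[str, Any]:
    """Build a request payload, refusing to proceed without an exclusion list."""
    if exclude_domains is None:
        # Mirrors the "Required" behavior: a missing configuration is an error.
        raise ValueError("exclude_domains_config must be set before searching")
    return {
        "query": query,
        # Injected on every request; the caller cannot omit or override it.
        "exclude_domains": list(exclude_domains),
    }
```

Raising early keeps a misconfigured run from issuing any search at all, which matches the stated goal of making exclusions mandatory.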

Summary by CodeRabbit

Release Notes

  • New Features
    • Integrated Tavily web search tool enabling real-time search functionality
    • Users can now perform web searches with query support and receive results
    • Requires Tavily API key configuration for authentication

Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
coderabbitai bot (Contributor) commented Dec 9, 2025

📝 Walkthrough

Adds a new MCP server tool for Tavily web search integration. The implementation provides an async search handler that calls the Tavily API with authentication, returns standardized responses via an ExecutionResult dataclass, and exposes a CLI entry point with API key management.

Changes

Cohort / File(s): Tavily MCP Search Tool (nemo_skills/mcp/servers/tavily_search_tool.py)
Summary: New file introducing MCP server for Tavily web search. Includes: ExecutionResult dataclass for standardized responses, answer() async handler for search queries via httpx to Tavily API, TavilySearchTool class for MCP client configuration, and main() CLI entry point with API key argument parsing and server startup.

Sequence Diagram

sequenceDiagram
    actor Client
    participant MCP Server
    participant Handler
    participant Tavily API
    
    Client->>MCP Server: Call tavily-search tool with query
    MCP Server->>Handler: answer(query)
    Handler->>Handler: Validate API key
    Handler->>Tavily API: POST /search (with query & API key)
    alt API Success
        Tavily API-->>Handler: 200 + search results
        Handler->>Handler: Parse results
        Handler-->>MCP Server: ExecutionResult(result=data)
    else API Error
        Tavily API-->>Handler: Non-200 status
        Handler-->>MCP Server: ExecutionResult(error=message)
    end
    MCP Server-->>Client: Return ExecutionResult

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Single, focused new file with clear responsibility (Tavily API integration)
  • Standard patterns: dataclass definition, async handler, CLI argument parsing
  • External API interaction requires verification of error handling and authentication flow
  • httpx usage and Tavily API payload structure should be validated

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title 'FEAT Add Tavily Search' clearly describes the main change: adding a new Tavily Search tool implementation to the codebase.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 774cef6 and a79fafe.

📒 Files selected for processing (1)
  • nemo_skills/mcp/servers/tavily_search_tool.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/mcp/servers/tavily_search_tool.py (1)
nemo_skills/mcp/tool_providers.py (2)
  • MCPClientTool (26-124)
  • apply_config_updates (54-58)
🪛 Ruff (0.14.8)
nemo_skills/mcp/servers/tavily_search_tool.py

94-94: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (4)
nemo_skills/mcp/servers/tavily_search_tool.py (4)

30-33: LGTM!

Clean dataclass for standardizing tool responses with optional error/result fields.
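
For readers without the diff at hand, a dataclass of the kind described might look like the sketch below. The field names follow the review text (error and result); the types and defaults are assumptions, not copied from the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecutionResult:
    # Assumed shape: both fields optional, so a success sets only `result`
    # and a failure sets only `error`.
    error: Optional[str] = None
    result: Optional[Any] = None
```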


36-39: LGTM!

Global API key pattern is appropriate here since it's set once at startup before the server begins processing requests.


74-85: LGTM!

The class correctly extends MCPClientTool and configures the stdio transport. Note that the subprocess will rely on the TAVILY_API_KEY environment variable since no --api-key argument is passed—this is appropriate as it avoids exposing the key in process listings.


88-99: LGTM!

The CLI setup provides flexible API key configuration with proper validation. The static analysis hint (TRY003) about exception message length is a minor style preference—the current inline message is clear and appropriate for this context.
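
A minimal sketch of this kind of key resolution, assuming a --api-key flag that falls back to the TAVILY_API_KEY environment variable (both names appear in the review comments above). The function name resolve_api_key and the error message are hypothetical; the PR's main() may be structured differently.

```python
import argparse
import os
from typing import Optional, Sequence


def resolve_api_key(argv: Optional[Sequence[str]] = None) -> str:
    """Return the API key from --api-key or the environment, or raise."""
    parser = argparse.ArgumentParser(description="Tavily search MCP server (sketch)")
    parser.add_argument(
        "--api-key",
        # The environment is read when the function runs, not at import time.
        default=os.environ.get("TAVILY_API_KEY"),
        help="Tavily API key; falls back to the TAVILY_API_KEY environment variable.",
    )
    args = parser.parse_args(argv)
    if not args.api_key:
        raise ValueError("Missing Tavily API key.")
    return args.api_key
```

Preferring the environment variable when launching the subprocess keeps the key out of process listings, as the previous comment notes.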

Comment on lines +45 to +71
async def answer(
    query: Annotated[str, Field(description="Search query.")],
) -> ExecutionResult:
    """Get a summary of search results from the web using Tavily."""

    api_url = "https://api.tavily.com/search"

    headers = {
        "Authorization": f"Bearer {TAVILY_API_KEY}",
        "Content-Type": "application/json",
    }

    payload = {
        "query": query,
        # "auto_parameters": False,
        "search_depth": "basic",
        "include_answer": "basic",  ## or advanced.
    }

    async with httpx.AsyncClient() as client:
        response = await client.post(api_url, headers=headers, json=payload)
        if response.status_code != 200:
            return {"error": response.json()["error"]}

    result = response.json()["answer"]

    return {"result": result}

⚠️ Potential issue | 🟠 Major

Return type mismatch: function returns dicts but declares ExecutionResult.

The function signature declares ExecutionResult as the return type, but lines 67 and 71 return plain dictionaries. This creates a type inconsistency.

Apply this diff to return proper ExecutionResult instances:

-    async with httpx.AsyncClient() as client:
-        response = await client.post(api_url, headers=headers, json=payload)
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        response = await client.post(api_url, headers=headers, json=payload)
         if response.status_code != 200:
-            return {"error": response.json()["error"]}
+            error_detail = response.json().get("error", response.text)
+            return ExecutionResult(error=str(error_detail))
 
-    result = response.json()["answer"]
+        result = response.json()["answer"]
 
-    return {"result": result}
+    return ExecutionResult(result=result)

This also:

  • Adds a timeout to prevent indefinite hangs.
  • Uses .get() with fallback for safer error extraction.
  • Moves response.json() inside the async with block for consistency.
📝 Committable suggestion

async def answer(
    query: Annotated[str, Field(description="Search query.")],
) -> ExecutionResult:
    """Get a summary of search results from the web using Tavily."""
    api_url = "https://api.tavily.com/search"
    headers = {
        "Authorization": f"Bearer {TAVILY_API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "query": query,
        # "auto_parameters": False,
        "search_depth": "basic",
        "include_answer": "basic",  ## or advanced.
    }
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(api_url, headers=headers, json=payload)
        if response.status_code != 200:
            error_detail = response.json().get("error", response.text)
            return ExecutionResult(error=str(error_detail))
        result = response.json()["answer"]
    return ExecutionResult(result=result)

Comment on lines +64 to +69
    async with httpx.AsyncClient() as client:
        response = await client.post(api_url, headers=headers, json=payload)
        if response.status_code != 200:
            return {"error": response.json()["error"]}

    result = response.json()["answer"]

⚠️ Potential issue | 🟡 Minor

Add error handling for JSON parsing and missing keys.

If the Tavily API returns a non-JSON error response or an unexpected JSON structure, the current code will raise unhandled exceptions (JSONDecodeError or KeyError).

Consider wrapping the response parsing in try-except:

     async with httpx.AsyncClient(timeout=30.0) as client:
         response = await client.post(api_url, headers=headers, json=payload)
+        try:
+            data = response.json()
+        except Exception as e:
+            return ExecutionResult(error=f"Failed to parse response: {e}")
+        
         if response.status_code != 200:
-            error_detail = response.json().get("error", response.text)
+            error_detail = data.get("error", response.text)
             return ExecutionResult(error=str(error_detail))
 
-        result = response.json()["answer"]
+        result = data.get("answer")
+        if result is None:
+            return ExecutionResult(error="No answer in response")
 
     return ExecutionResult(result=result)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In nemo_skills/mcp/servers/tavily_search_tool.py around lines 64 to 69, the code
assumes response.json() succeeds and the "error" or "answer" keys exist; wrap
the JSON parsing and key access in a try/except block that catches
JSONDecodeError and KeyError (or use response.is_json/content-type check),
attempt to parse JSON safely for both non-200 and 200 cases, and return a clear
error dict when parsing fails or keys are missing (e.g., include response.text
and status_code), otherwise extract and return result = parsed.get("answer")
after validating it's present.
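
One way to make the parsing described above easy to check in isolation is to separate it from the HTTP call. The sketch below is an assumption-laden illustration, not the merged code: parse_tavily_response is a hypothetical name, ExecutionResult is redefined locally with assumed field types, and the function takes the status code and raw body so it needs no network access.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecutionResult:
    # Assumed shape; see the dataclass reviewed earlier in this PR.
    error: Optional[str] = None
    result: Optional[Any] = None


def parse_tavily_response(status_code: int, body: str) -> ExecutionResult:
    """Turn a raw HTTP status and body into an ExecutionResult without raising."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        # Non-JSON bodies (for example an HTML error page) end up here.
        return ExecutionResult(
            error=f"Failed to parse response (status {status_code}): {exc}"
        )
    if not isinstance(data, dict):
        return ExecutionResult(
            error=f"Unexpected response shape (status {status_code})"
        )
    if status_code != 200:
        # Fall back to the raw body when no "error" key is present.
        return ExecutionResult(error=str(data.get("error", body)))
    answer = data.get("answer")
    if answer is None:
        return ExecutionResult(error="No answer in response")
    return ExecutionResult(result=answer)
```

In the handler, this could be called as parse_tavily_response(response.status_code, response.text) inside the async with block.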

Signed-off-by: George Armstrong <georgea@nvidia.com>

WIP return string

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>

try direct tool call

Signed-off-by: George Armstrong <georgea@nvidia.com>

remove comment line

Signed-off-by: George Armstrong <georgea@nvidia.com>
@gwarmstrong gwarmstrong enabled auto-merge (squash) December 11, 2025 20:49
@gwarmstrong gwarmstrong disabled auto-merge December 11, 2025 20:50
@gwarmstrong gwarmstrong merged commit 5aca446 into main Dec 11, 2025
5 checks passed
@gwarmstrong gwarmstrong deleted the georgea/add-tavily-search branch December 11, 2025 20:50
gwarmstrong added a commit that referenced this pull request Dec 11, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 12, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>