Skip to content

feat(mcp): introduce get_wren_guide tool for default prompt#1360

Merged
douenergy merged 4 commits intoCanner:mainfrom
goldmedal:feat/enhance-mcp-for-snowflake
Oct 31, 2025
Merged

feat(mcp): introduce get_wren_guide tool for default prompt#1360
douenergy merged 4 commits intoCanner:mainfrom
goldmedal:feat/enhance-mcp-for-snowflake

Conversation

@goldmedal
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal commented Oct 30, 2025

  • introduce get_wren_guide tool to provide the guide and tips for AI agent.
  • Fix dryRun issue
  • Fix get_available_tables issue

Summary by CodeRabbit

  • New Features

    • Retrieve available functions from the engine
    • Retrieve current data source type
    • Get a customized usage guide/tips per data source (including source-specific tips)
    • Dry-run now returns actual query validation output
    • Available tables now returned as a structured list
  • Documentation

    • Expanded environment variable setup guidance with multiple configuration options and example env block
    • Added VsCode MCP Extension to compatible agents
    • Added usage tips, recommended prompts, examples and follow-ups
  • Chores

    • Updated MCP dependency to a newer version

@github-actions github-actions bot added documentation Improvements or additions to documentation dependencies Pull requests that update a dependency file labels Oct 30, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 30, 2025

Walkthrough

This PR updates docs and server behavior: README expands env/config guidance and usage tips; wren.py adds GET /functions helper and endpoints (get_available_functions, get_current_data_source_type, get_wren_guide), renames dry_run→dryRun, adds header x-wren-fallback_disable, and changes get_available_tables return type to list[str]; pyproject bumps mcp[cli].

Changes

Cohort / File(s) Summary
Documentation
mcp-server/README.md
Expanded environment variable setup (root .env and MCP config), injected example env block (WREN_URL, CONNECTION_INFO_FILE, MDL_PATH), added VsCode MCP Extension to agent list, plus usage tips, a recommended prompt, and optional follow-ups.
Server API & Logic
mcp-server/app/wren.py
Added make_get_available_functions_request() helper; added endpoints get_available_functions(), get_current_data_source_type(), get_wren_guide(); renamed query param dry_rundryRun; added header x-wren-fallback_disable: "true" to requests; changed dry-run behavior to return Wren's textual response; changed get_available_tables() return type to list[str].
Dependencies
mcp-server/pyproject.toml
Upgraded dependency mcp[cli] from >=1.4.1 to >=1.19.0.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant MCP_Server as "MCP Server"
    participant Wren as "Wren Engine"

    rect rgb(210,230,255)
    note over Client,MCP_Server: Get Available Functions (new)
    Client->>MCP_Server: get_available_functions()
    MCP_Server->>Wren: GET /functions
    Wren-->>MCP_Server: functions list
    MCP_Server-->>Client: return functions
    end

    rect rgb(220,255,230)
    note over Client,Wren: Dry-run query (modified)
    Client->>MCP_Server: query(sql, dryRun=true)
    MCP_Server->>Wren: POST /query (header: x-wren-fallback_disable: "true", param dryRun=true)
    Wren-->>MCP_Server: textual validation response
    MCP_Server-->>Client: return validation text
    end

    rect rgb(255,245,200)
    note over Client,MCP_Server: Get Guide (new, assembled server-side)
    Client->>MCP_Server: get_wren_guide()
    MCP_Server->>MCP_Server: assemble guide (data-source specific)
    MCP_Server-->>Client: return guide text
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Attention areas:
    • Verify callers of get_available_tables() for updated list[str] return type.
    • Confirm any tool/registration exposure for new endpoints (get_available_functions, get_wren_guide, get_current_data_source_type).
    • Validate renaming of query param dry_rundryRun and addition of x-wren-fallback_disable header.
    • Check compatibility with bumped mcp[cli] version.

Possibly related PRs

Suggested reviewers

  • douenergy

Poem

🐰 a hop, a nibble, then a cheer,

New guides and functions now appear,
dryRun whispers where old flags stood,
Tables tidy, lists feel good,
I bounce — the Wren grows wise and clear. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "feat(mcp): introduce get_wren_guide tool for default prompt" directly corresponds to the primary objective listed in the PR description, which is to introduce the new get_wren_guide tool. The title is specific and clear about what feature is being added and its purpose, without using vague or generic terms. While the PR includes additional changes such as dependency updates, bug fixes to dryRun and get_available_tables, and other new endpoints, the title appropriately captures the main feature that drives this changeset. The title is concise (61 characters, 7 words) and would help teammates quickly understand the primary purpose of the PR when scanning history.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@goldmedal goldmedal changed the title feat(mcp): introduce usage guide tool for default prompt feat(mcp): introduce get_wren_guide tool for default prompt Oct 30, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
mcp-server/README.md (1)

127-133: Add a language hint to the fenced block

markdownlint complains (MD040) because this block lacks a language identifier. Please mark it (for example text) so the linter passes.

-```
+```text
 Use the get_wren_guide() tool to learn how to use Wren Engine and discover available tools and examples.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c14e99 and eeadb90.

⛔ Files ignored due to path filters (1)
  • mcp-server/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • mcp-server/README.md (4 hunks)
  • mcp-server/app/wren.py (5 hunks)
  • mcp-server/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mcp-server/app/wren.py (3)
ibis-server/wren/session/__init__.py (1)
  • sql (37-52)
wren-core/core/src/mdl/mod.rs (1)
  • mdl (197-232)
wren-core-py/src/extractor.rs (1)
  • mdl_base64 (174-234)
🪛 markdownlint-cli2 (0.18.1)
mcp-server/README.md

130-130: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
mcp-server/app/wren.py (1)

329-329: Already fixed: String comparison is correct.

The past review comment flagged this line for using is instead of ==, but the current code correctly uses == for string equality comparison.

🧹 Nitpick comments (2)
mcp-server/app/wren.py (2)

197-197: LGTM! Accurate type signature.

The return type change from str to list[str] correctly reflects what the function returns, improving type safety and IDE support.

Minor: Remove trailing whitespace.

Line 203 contains trailing spaces.

Apply this diff:

-    return [table["name"] for table in mdl["models"]]   
+    return [table["name"] for table in mdl["models"]]

Also applies to: 203-203


22-29: Consider failing fast if required configuration is missing.

If MDL_PATH is not provided, data_source (line 25) and mdl_base64 (line 26) remain undefined, causing NameError when any tool that depends on them is called. Consider raising an error at startup or providing default values if the MCP server cannot function without this configuration.

Apply this diff to fail fast with a clear error message:

 if mdl_path:
     with open(mdl_path) as f:
         mdl_schema = json.load(f)
         data_source = mdl_schema["dataSource"].lower()
         mdl_base64 = dict_to_base64_string(mdl_schema)
         print(f"Loaded MDL {f.name}")  # noqa: T201
 else:
-    print("No MDL_PATH environment variable found")
+    raise RuntimeError("MDL_PATH environment variable is required but not found")

Alternatively, if the server should support running without MDL_PATH for certain operations, initialize with safe defaults:

 if mdl_path:
     with open(mdl_path) as f:
         mdl_schema = json.load(f)
         data_source = mdl_schema["dataSource"].lower()
         mdl_base64 = dict_to_base64_string(mdl_schema)
         print(f"Loaded MDL {f.name}")  # noqa: T201
 else:
     print("No MDL_PATH environment variable found")
+    data_source = None
+    mdl_base64 = None

Then add guards in functions that require these values.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeadb90 and 37d91e4.

📒 Files selected for processing (1)
  • mcp-server/app/wren.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mcp-server/app/wren.py (2)
ibis-server/wren/session/__init__.py (1)
  • sql (37-52)
wren-core-py/src/extractor.rs (1)
  • mdl_base64 (174-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (5)
mcp-server/app/wren.py (5)

40-40: LGTM! Backend compatibility improvements.

The addition of the x-wren-fallback_disable header and the parameter rename from dry_run to dryRun align with backend API requirements.

Also applies to: 44-44


85-95: LGTM! Consistent implementation.

The new helper function follows the established pattern for making HTTP requests and appropriately uses GET for fetching available functions.


162-163: LGTM! Improved validation feedback.

Returning the actual response text instead of a static message provides more informative feedback to users about query validation results.


302-308: LGTM! Useful new tool.

The function provides a straightforward way for users to discover available functions for their data source type.


317-362: LGTM! Comprehensive usage guide.

The guide provides helpful tips for using Wren Engine, with data source-specific guidance for Snowflake and generic fallback for other data sources.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
mcp-server/app/wren.py (2)

22-26: Critical: Initialize data_source at module level to prevent NameError.

This issue was flagged in a previous review but remains unresolved. The variable data_source is only assigned when mdl_path is truthy (line 25). If the MDL_PATH environment variable is not set or empty, data_source remains undefined, causing a NameError at runtime when accessed on lines 44, 63, 77, 90, 315, 326-328, 331, and 340-341.

Apply this diff to initialize data_source at module level:

 MDL_SCHEMA_PATH = "mdl.schema.json"
 connection_info_path = os.getenv("CONNECTION_INFO_FILE")
 # TODO: maybe we should log the number of tables and columns
 mdl_path = os.getenv("MDL_PATH")
+data_source = None
 
 if mdl_path:

310-317: The None check will fail if data_source is undefined.

Line 315 checks if data_source is None:, but if MDL_PATH was not set, data_source is never assigned, causing a NameError before the comparison executes. Once data_source is initialized at module level (as noted in the previous comment), this check will function correctly.

🧹 Nitpick comments (1)
mcp-server/app/wren.py (1)

332-337: Fix grammar in user-facing documentation.

Several grammatical issues reduce clarity:

  • Lines 335-336: "For process" should be "To process"
  • Lines 347, 355: "Avoid to use" should be "Avoid using"

Apply this diff:

-        3. For process semi-structure data type (e.g. `VARIANT`), you can use `get_path` function to extract the value from the semi-structure data.
-        4. For process array data type (e.g. `ARRAY`), you can use `UNNEST` function to flatten the array data. `UNNEST` only accepts array column as input. If you extract an array value by `get_path` function, you need to cast it to array type (by `to_array` function) before using `UNNEST`.
+        3. To process semi-structured data types (e.g. `VARIANT`), you can use the `get_path` function to extract values from the semi-structured data.
+        4. To process array data types (e.g. `ARRAY`), you can use the `UNNEST` function to flatten array data. `UNNEST` only accepts array columns as input. If you extract an array value with the `get_path` function, you need to cast it to an array type (using the `to_array` function) before using `UNNEST`.
-    Avoid to use database specific SQL syntax in your Wren SQL.
+    Avoid using database-specific SQL syntax in your Wren SQL.
-    5. Avoid to use `LATERAL` statement in your queries, as Wren Engine may not support it well. Use normal `JOIN` or `CROSS JOIN UNNEST` instead.
+    5. Avoid using `LATERAL` statements in your queries, as Wren Engine may not support them well. Use normal `JOIN` or `CROSS JOIN UNNEST` instead.

Also applies to: 347-347, 355-355

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37d91e4 and b28d8c7.

📒 Files selected for processing (1)
  • mcp-server/app/wren.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mcp-server/app/wren.py (2)
wren-core/core/src/mdl/mod.rs (1)
  • mdl (197-232)
wren-core-py/src/extractor.rs (1)
  • mdl_base64 (174-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (5)
mcp-server/app/wren.py (5)

85-95: LGTM!

The new make_get_available_functions_request function follows the same pattern as other request functions in the file, using consistent error handling and HTTP client usage.


162-163: Good improvement: returning actual response text.

Returning response.text instead of a static validation message provides more actionable feedback to users about what the dry run validated.


197-197: Return type correctly updated to match implementation.

Changing the return type from str to list[str] aligns with the actual return value on line 203. Note that this is a breaking change to the function signature that may affect existing callers.

Also applies to: 203-203


302-308: LGTM!

The new get_available_functions tool is well-documented and follows the established patterns in the file.


40-40: Header name is correct; no changes needed.

The header x-wren-fallback_disable with an underscore is the intentional design of the Wren Engine API. It is defined as a constant in ibis-server/app/dependencies.py and used consistently throughout the test suite. Similar headers like x-wren-db-statement_timeout follow the same underscore naming pattern, confirming this is the codebase's deliberate convention, not a deviation.

Comment on lines +325 to +342
tips = f"""
## Tips for using Wren Engine with {data_source.capitalize()}
You are connected to a {data_source.capitalize()} database via Wren Engine.
Here are some tips to use {data_source.capitalize()} effectively:
"""

if data_source == "snowflake":
tips += """
1. Snowflake supports semi-structured data types like VARIANT, OBJECT, and ARRAY. You can use these data types to store and query JSON data.
2. Snowflake has a rich set of built-in functions to process semi-structured data. You can use functions like GET_PATH, TO_VARIANT, TO_ARRAY, etc.
3. For process semi-structure data type (e.g. `VARIANT`), you can use `get_path` function to extract the value from the semi-structure data.
4. For process array data type (e.g. `ARRAY`), you can use `UNNEST` function to flatten the array data. `UNNEST` only accepts array column as input. If you extract an array value by `get_path` function, you need to cast it to array type (by `to_array` function) before using `UNNEST`.
"""
else:
tips += f"""
1. Use {data_source.capitalize()}'s specific functions and features to optimize your queries.
2. Refer to {data_source.capitalize()}'s documentation for more details on how to use its features effectively.
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add guard for None data source to prevent AttributeError.

Lines 326-328 and 340-341 call data_source.capitalize() without checking if data_source is None. After fixing the module-level initialization, you must add a guard in this function to handle the case where MDL hasn't been deployed.

Apply this diff to add the guard:

 @mcp.tool()
 async def get_wren_guide() -> str:
     """
     Understand how to use Wren Engine effectively to query your database
     """
+    
+    if data_source is None:
+        return "No data source connected. Please deploy the MDL first and assign `dataSource` field."
 
     tips = f"""
     ## Tips for using Wren Engine with {data_source.capitalize()}
🤖 Prompt for AI Agents
In mcp-server/app/wren.py around lines 325 to 342, the code calls
data_source.capitalize() and compares data_source to "snowflake" without
guarding against data_source being None; add a guard at the top of this block
that normalizes data_source into a safe variable (e.g. if not data_source:
safe_name = "Unknown" and safe_source = "" or safe_source = data_source.lower())
or return/emit a default tips string when MDL isn't deployed, then use safe_name
for display (safe_name.capitalize()) and safe_source for the snowflake
comparison (safe_source == "snowflake") so no AttributeError occurs when
data_source is None.

@goldmedal goldmedal requested a review from douenergy October 31, 2025 02:26
@douenergy douenergy merged commit ae298f4 into Canner:main Oct 31, 2025
5 checks passed
@douenergy
Copy link
Copy Markdown
Contributor

Thanks @goldmedal

@goldmedal goldmedal deleted the feat/enhance-mcp-for-snowflake branch November 5, 2025 05:37
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026
…#1360)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants