feat(mcp): introduce get_wren_guide tool for default prompt#1360
feat(mcp): introduce get_wren_guide tool for default prompt#1360douenergy merged 4 commits intoCanner:mainfrom
get_wren_guide tool for default prompt#1360Conversation
WalkthroughThis PR updates docs and server behavior: README expands env/config guidance and usage tips; wren.py adds GET /functions helper and endpoints (get_available_functions, get_current_data_source_type, get_wren_guide), renames dry_run→dryRun, adds header Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant MCP_Server as "MCP Server"
participant Wren as "Wren Engine"
rect rgb(210,230,255)
note over Client,MCP_Server: Get Available Functions (new)
Client->>MCP_Server: get_available_functions()
MCP_Server->>Wren: GET /functions
Wren-->>MCP_Server: functions list
MCP_Server-->>Client: return functions
end
rect rgb(220,255,230)
note over Client,Wren: Dry-run query (modified)
Client->>MCP_Server: query(sql, dryRun=true)
MCP_Server->>Wren: POST /query (header: x-wren-fallback_disable: "true", param dryRun=true)
Wren-->>MCP_Server: textual validation response
MCP_Server-->>Client: return validation text
end
rect rgb(255,245,200)
note over Client,MCP_Server: Get Guide (new, assembled server-side)
Client->>MCP_Server: get_wren_guide()
MCP_Server->>MCP_Server: assemble guide (data-source specific)
MCP_Server-->>Client: return guide text
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
get_wren_guide tool for default prompt
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
mcp-server/README.md (1)
127-133: Add a language hint to the fenced blockmarkdownlint complains (MD040) because this block lacks a language identifier. Please mark it (for example
text) so the linter passes.-``` +```text Use the get_wren_guide() tool to learn how to use Wren Engine and discover available tools and examples.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
mcp-server/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
mcp-server/README.md(4 hunks)mcp-server/app/wren.py(5 hunks)mcp-server/pyproject.toml(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mcp-server/app/wren.py (3)
ibis-server/wren/session/__init__.py (1)
sql(37-52)wren-core/core/src/mdl/mod.rs (1)
mdl(197-232)wren-core-py/src/extractor.rs (1)
mdl_base64(174-234)
🪛 markdownlint-cli2 (0.18.1)
mcp-server/README.md
130-130: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
mcp-server/app/wren.py (1)
329-329: Already fixed: String comparison is correct.The past review comment flagged this line for using
isinstead of==, but the current code correctly uses==for string equality comparison.
🧹 Nitpick comments (2)
mcp-server/app/wren.py (2)
197-197: LGTM! Accurate type signature.The return type change from
strtolist[str]correctly reflects what the function returns, improving type safety and IDE support.Minor: Remove trailing whitespace.
Line 203 contains trailing spaces.
Apply this diff:
- return [table["name"] for table in mdl["models"]] + return [table["name"] for table in mdl["models"]]Also applies to: 203-203
22-29: Consider failing fast if required configuration is missing.If
MDL_PATHis not provided,data_source(line 25) andmdl_base64(line 26) remain undefined, causingNameErrorwhen any tool that depends on them is called. Consider raising an error at startup or providing default values if the MCP server cannot function without this configuration.Apply this diff to fail fast with a clear error message:
if mdl_path: with open(mdl_path) as f: mdl_schema = json.load(f) data_source = mdl_schema["dataSource"].lower() mdl_base64 = dict_to_base64_string(mdl_schema) print(f"Loaded MDL {f.name}") # noqa: T201 else: - print("No MDL_PATH environment variable found") + raise RuntimeError("MDL_PATH environment variable is required but not found")Alternatively, if the server should support running without MDL_PATH for certain operations, initialize with safe defaults:
if mdl_path: with open(mdl_path) as f: mdl_schema = json.load(f) data_source = mdl_schema["dataSource"].lower() mdl_base64 = dict_to_base64_string(mdl_schema) print(f"Loaded MDL {f.name}") # noqa: T201 else: print("No MDL_PATH environment variable found") + data_source = None + mdl_base64 = NoneThen add guards in functions that require these values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
mcp-server/app/wren.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mcp-server/app/wren.py (2)
ibis-server/wren/session/__init__.py (1)
sql(37-52)wren-core-py/src/extractor.rs (1)
mdl_base64(174-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (5)
mcp-server/app/wren.py (5)
40-40: LGTM! Backend compatibility improvements.The addition of the
x-wren-fallback_disableheader and the parameter rename fromdry_runtodryRunalign with backend API requirements.Also applies to: 44-44
85-95: LGTM! Consistent implementation.The new helper function follows the established pattern for making HTTP requests and appropriately uses GET for fetching available functions.
162-163: LGTM! Improved validation feedback.Returning the actual response text instead of a static message provides more informative feedback to users about query validation results.
302-308: LGTM! Useful new tool.The function provides a straightforward way for users to discover available functions for their data source type.
317-362: LGTM! Comprehensive usage guide.The guide provides helpful tips for using Wren Engine, with data source-specific guidance for Snowflake and generic fallback for other data sources.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
mcp-server/app/wren.py (2)
22-26: Critical: Initializedata_sourceat module level to prevent NameError.This issue was flagged in a previous review but remains unresolved. The variable
data_sourceis only assigned whenmdl_pathis truthy (line 25). If theMDL_PATHenvironment variable is not set or empty,data_sourceremains undefined, causing aNameErrorat runtime when accessed on lines 44, 63, 77, 90, 315, 326-328, 331, and 340-341.Apply this diff to initialize
data_sourceat module level:MDL_SCHEMA_PATH = "mdl.schema.json" connection_info_path = os.getenv("CONNECTION_INFO_FILE") # TODO: maybe we should log the number of tables and columns mdl_path = os.getenv("MDL_PATH") +data_source = None if mdl_path:
310-317: The None check will fail ifdata_sourceis undefined.Line 315 checks
if data_source is None:, but ifMDL_PATHwas not set,data_sourceis never assigned, causing aNameErrorbefore the comparison executes. Oncedata_sourceis initialized at module level (as noted in the previous comment), this check will function correctly.
🧹 Nitpick comments (1)
mcp-server/app/wren.py (1)
332-337: Fix grammar in user-facing documentation.Several grammatical issues reduce clarity:
- Lines 335-336: "For process" should be "To process"
- Lines 347, 355: "Avoid to use" should be "Avoid using"
Apply this diff:
- 3. For process semi-structure data type (e.g. `VARIANT`), you can use `get_path` function to extract the value from the semi-structure data. - 4. For process array data type (e.g. `ARRAY`), you can use `UNNEST` function to flatten the array data. `UNNEST` only accepts array column as input. If you extract an array value by `get_path` function, you need to cast it to array type (by `to_array` function) before using `UNNEST`. + 3. To process semi-structured data types (e.g. `VARIANT`), you can use the `get_path` function to extract values from the semi-structured data. + 4. To process array data types (e.g. `ARRAY`), you can use the `UNNEST` function to flatten array data. `UNNEST` only accepts array columns as input. If you extract an array value with the `get_path` function, you need to cast it to an array type (using the `to_array` function) before using `UNNEST`.- Avoid to use database specific SQL syntax in your Wren SQL. + Avoid using database-specific SQL syntax in your Wren SQL.- 5. Avoid to use `LATERAL` statement in your queries, as Wren Engine may not support it well. Use normal `JOIN` or `CROSS JOIN UNNEST` instead. + 5. Avoid using `LATERAL` statements in your queries, as Wren Engine may not support them well. Use normal `JOIN` or `CROSS JOIN UNNEST` instead.Also applies to: 347-347, 355-355
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
mcp-server/app/wren.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mcp-server/app/wren.py (2)
wren-core/core/src/mdl/mod.rs (1)
mdl(197-232)wren-core-py/src/extractor.rs (1)
mdl_base64(174-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (5)
mcp-server/app/wren.py (5)
85-95: LGTM!The new
make_get_available_functions_requestfunction follows the same pattern as other request functions in the file, using consistent error handling and HTTP client usage.
162-163: Good improvement: returning actual response text.Returning
response.textinstead of a static validation message provides more actionable feedback to users about what the dry run validated.
197-197: Return type correctly updated to match implementation.Changing the return type from
strtolist[str]aligns with the actual return value on line 203. Note that this is a breaking change to the function signature that may affect existing callers.Also applies to: 203-203
302-308: LGTM!The new
get_available_functionstool is well-documented and follows the established patterns in the file.
40-40: Header name is correct; no changes needed.The header
x-wren-fallback_disablewith an underscore is the intentional design of the Wren Engine API. It is defined as a constant inibis-server/app/dependencies.pyand used consistently throughout the test suite. Similar headers likex-wren-db-statement_timeoutfollow the same underscore naming pattern, confirming this is the codebase's deliberate convention, not a deviation.
| tips = f""" | ||
| ## Tips for using Wren Engine with {data_source.capitalize()} | ||
| You are connected to a {data_source.capitalize()} database via Wren Engine. | ||
| Here are some tips to use {data_source.capitalize()} effectively: | ||
| """ | ||
|
|
||
| if data_source == "snowflake": | ||
| tips += """ | ||
| 1. Snowflake supports semi-structured data types like VARIANT, OBJECT, and ARRAY. You can use these data types to store and query JSON data. | ||
| 2. Snowflake has a rich set of built-in functions to process semi-structured data. You can use functions like GET_PATH, TO_VARIANT, TO_ARRAY, etc. | ||
| 3. For process semi-structure data type (e.g. `VARIANT`), you can use `get_path` function to extract the value from the semi-structure data. | ||
| 4. For process array data type (e.g. `ARRAY`), you can use `UNNEST` function to flatten the array data. `UNNEST` only accepts array column as input. If you extract an array value by `get_path` function, you need to cast it to array type (by `to_array` function) before using `UNNEST`. | ||
| """ | ||
| else: | ||
| tips += f""" | ||
| 1. Use {data_source.capitalize()}'s specific functions and features to optimize your queries. | ||
| 2. Refer to {data_source.capitalize()}'s documentation for more details on how to use its features effectively. | ||
| """ |
There was a problem hiding this comment.
Add guard for None data source to prevent AttributeError.
Lines 326-328 and 340-341 call data_source.capitalize() without checking if data_source is None. After fixing the module-level initialization, you must add a guard in this function to handle the case where MDL hasn't been deployed.
Apply this diff to add the guard:
@mcp.tool()
async def get_wren_guide() -> str:
"""
Understand how to use Wren Engine effectively to query your database
"""
+
+ if data_source is None:
+ return "No data source connected. Please deploy the MDL first and assign `dataSource` field."
tips = f"""
## Tips for using Wren Engine with {data_source.capitalize()}🤖 Prompt for AI Agents
In mcp-server/app/wren.py around lines 325 to 342, the code calls
data_source.capitalize() and compares data_source to "snowflake" without
guarding against data_source being None; add a guard at the top of this block
that normalizes data_source into a safe variable (e.g. if not data_source:
safe_name = "Unknown" and safe_source = "" or safe_source = data_source.lower())
or return/emit a default tips string when MDL isn't deployed, then use safe_name
for display (safe_name.capitalize()) and safe_source for the snowflake
comparison (safe_source == "snowflake") so no AttributeError occurs when
data_source is None.
|
Thanks @goldmedal |
…#1360) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
get_wren_guidetool to provide the guide and tips for AI agent.get_available_tablesissueSummary by CodeRabbit
New Features
Documentation
Chores