feat(ibis): introduce Wren engine python API and demo program #1256
douenergy merged 7 commits into Canner:main
Walkthrough
This update introduces Jupyter support to the Ibis Server, including Docker and poetry configuration, a demonstration notebook, and enhanced documentation. It adds new APIs for session context management and query execution, expands connector flexibility, and provides a detailed demo model and notebook for local and notebook-based workflows. Several internal APIs are extended for improved usability.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Docker/CLI
participant WrenAPI
participant Context
participant Task
participant Connector
participant DataSource
User->>Docker/CLI: Start container (with or without "jupyter")
Docker/CLI->>WrenAPI: (If "jupyter") Start Jupyter Lab
User->>WrenAPI: Import create_session_context
User->>WrenAPI: Call create_session_context(mdl_path, data_source, ...)
WrenAPI->>Context: Initialize session context
User->>Context: .sql("SELECT ...", properties)
Context->>Task: Create Task for SQL
User->>Task: .plan(sql)
Task->>Task: Rewrite SQL, transpile dialect
User->>Task: .execute(limit)
Task->>Connector: Execute query
Connector-->>Task: Return results
User->>Task: .formatted_result()
Task-->>User: Return formatted JSON/Arrow table
Actionable comments posted: 6
🔭 Outside diff range comments (1)
ibis-server/app/mdl/java_engine.py (1)
41-54: Add None check in _warmup method to prevent AttributeError.
The `_warmup` method accesses `self.client.get()` without checking if `self.client` is `None`, which could cause an `AttributeError`.
      async def _warmup(self, timeout=30):
    +     if self.client is None:
    +         return
          for _ in range(timeout):
              try:
                  response = await self.client.get("/v2/health")
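The guard suggested above can be tried in isolation. The following is a minimal sketch, not the project's code: `HealthWaiter` and `FakeClient` are hypothetical stand-ins, the boolean return values are an assumption added for illustration, and `asyncio.sleep(0)` replaces whatever delay the real method uses between polls.

```python
import asyncio


class HealthWaiter:
    """Hypothetical stand-in for the engine connector (not the project's class)."""

    def __init__(self, client=None):
        # `client` stays None when no endpoint is configured.
        self.client = client

    async def warmup(self, timeout=30):
        # Guard first: without a client there is nothing to poll.
        if self.client is None:
            return False
        for _ in range(timeout):
            try:
                response = await self.client.get("/v2/health")
                if response == "ok":
                    return True
            except ConnectionError:
                # Engine not ready yet; try again on the next iteration.
                pass
            # Assumption: the real code waits between polls; 0 keeps the demo fast.
            await asyncio.sleep(0)
        return False


class FakeClient:
    """Illustrative client whose health check succeeds on the third call."""

    def __init__(self):
        self.calls = 0

    async def get(self, path):
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("engine not ready")
        return "ok"


no_client_result = asyncio.run(HealthWaiter().warmup())
ready_result = asyncio.run(HealthWaiter(FakeClient()).warmup(timeout=5))
```

Without the `None` check, the first call would fail with `AttributeError: 'NoneType' object has no attribute 'get'` instead of returning quietly.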
🧹 Nitpick comments (10)
ibis-server/app/mdl/rewriter.py (1)
136-148: Consider using a distinct tracing span name.
The synchronous method implementation is correct and follows the same logic as the async version. However, using the same span name `"embedded_rewrite"` for both sync and async methods could make it difficult to distinguish between them in tracing/observability tools. Consider using a distinct span name for better observability:
    - @tracer.start_as_current_span("embedded_rewrite", kind=trace.SpanKind.INTERNAL)
    + @tracer.start_as_current_span("embedded_rewrite_sync", kind=trace.SpanKind.INTERNAL)
      def rewrite_sync(
Otherwise, the implementation correctly provides synchronous SQL rewriting capabilities that complement the async version.
ibis-server/README.md (1)
83-83: Fix markdown linting issues by adding language specifications.
The static analysis tool identified missing language specifications for fenced code blocks. Adding them improves syntax highlighting and readability.
    -```
    +```bash
     python -m wren local_file <mdl_path> <connection_info_path>
    -```
    +```

    -```
    +```python
     Session created: Context(id=1352f5de-a8a7-4342-b2cf-015dbb2bba4f, data_source=local_file)
     You can now interact with the Wren session using the 'wren' variable:
     > task = wren.sql('SELECT * FROM your_table').execute()
     > print(task.results)
     > print(task.formatted_result())
     Python 3.11.11 (main, Dec 3 2024, 17:20:40) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
     Type "help", "copyright", "credits" or "license" for more information.
     (InteractiveConsole)
     >>>
    -```
    +```

    -```
    +```bash
     docker run --rm -p 8888:8888 ghcr.io/canner/wren-engine-ibis:latest jupyter
    -```
    +```

    -```
    +```text
     http://localhost:8888/lab/tree/notebooks/demo.ipynb
    -```
    +```
Also applies to: 87-87, 101-101, 105-105
ibis-server/app/model/data_source.py (1)
78-115: Well-implemented method with minor redundancy in redshift handling.
The `get_connection_info` method provides excellent abstraction for connection info instantiation using pattern matching. However, the redshift handling is redundant.
      case DataSource.redshift:
    -     if "redshift_type" in data and data["redshift_type"] == "redshift_iam":
    -         return RedshiftConnectionInfo.model_validate(data)
          return RedshiftConnectionInfo.model_validate(data)
Both branches return the same `RedshiftConnectionInfo.model_validate(data)`, making the conditional check unnecessary unless there are plans for different handling of IAM vs standard redshift connections.
ibis-server/wren/__main__.py (2)
30-30: Fix typo in comment.
    - # The connection_info file proeduced by Wren AI dbt integration
    + # The connection_info file produced by Wren AI dbt integration
18-48: Improve argument parsing robustness.
The current argument parsing is fragile and could benefit from using `argparse` for better error handling and help messages.
    import argparse

    def main():
        """Main entry point for the Wren module."""
        parser = argparse.ArgumentParser(description="Launch Wren interactive session")
        parser.add_argument("data_source", help="Data source identifier")
        parser.add_argument("mdl_path", help="Path to the model file")
        parser.add_argument("connection_info_path", nargs="?", help="Path to connection info JSON file")
        args = parser.parse_args()

        data_source = DataSource(args.data_source)
        connection_info = None
        if args.connection_info_path:
            # ... rest of connection info handling logic
This would provide better error messages, help text, and more robust argument validation.
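To see how the optional third positional behaves, here is a small runnable sketch of just the parser from the suggestion above. `build_parser` is a hypothetical helper name, and the file names passed in are placeholders; the `DataSource` conversion and connection-info loading are deliberately left out because they depend on project code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser from the review suggestion (helper name is illustrative)."""
    parser = argparse.ArgumentParser(description="Launch Wren interactive session")
    parser.add_argument("data_source", help="Data source identifier")
    parser.add_argument("mdl_path", help="Path to the model file")
    # nargs="?" makes the positional optional; it defaults to None when omitted.
    parser.add_argument(
        "connection_info_path",
        nargs="?",
        help="Path to connection info JSON file",
    )
    return parser


# Passing an explicit list avoids reading sys.argv, which keeps the demo testable.
with_info = build_parser().parse_args(["local_file", "mdl.json", "conn.json"])
without_info = build_parser().parse_args(["local_file", "mdl.json"])
```

Calling `parse_args` with fewer than two arguments makes `argparse` print a usage message and exit, which is the error handling the comment refers to.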
ibis-server/wren/__init__.py (1)
56-58: Remove redundant file readability check.
The `readable()` check is unnecessary: if the file couldn't be opened, the `open()` call would have already failed. This check adds no value and can be removed.
    - with open(mdl_path) as f:
    -     if not f.readable():
    -         raise ValueError(f"Cannot read MDL file at {mdl_path}")
    -     manifest = json.load(f)
    + with open(mdl_path) as f:
    +     manifest = json.load(f)
ibis-server/notebooks/demo.ipynb (1)
162-170: Fix inconsistency in access control documentation.
The JSON example shows `"required": true` but the actual model file in `jaffle_shop_mdl.json` has `"required": false`. This inconsistency could confuse users.
    - "requiredProperties": [
    -   {
    -     "name": "session_status",
    -     "required": true
    -   }
    - ],
    + "requiredProperties": [
    +   {
    +     "name": "session_status",
    +     "required": false
    +   }
    + ],
ibis-server/wren/session/__init__.py (3)
91-91: Add type annotation for input_sql parameter.
The parameter should have a type annotation for consistency with the rest of the codebase.
    - def plan(self, input_sql):
    + def plan(self, input_sql: str):
138-138: Fix incorrect error message.
The error message refers to `transpile()` but the actual method is `plan()`.
    - raise ValueError("Dialect SQL is not set. Call transpile() first.")
    + raise ValueError("Dialect SQL is not set. Call plan() first.")
82-86: Improve robustness of properties handling.
The method assumes all keys are strings but should handle edge cases more gracefully.
    - def _lowrcase_properties(self, properties: dict | None) -> dict:
    + def _lowercase_properties(self, properties: dict | None) -> dict:
          """Convert all keys in the properties dictionary to lowercase."""
          if properties is None:
              return {}
    -     return {k.lower(): v for k, v in properties.items()}
    +     return {str(k).lower(): v for k, v in properties.items()}
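The effect of the `str(k)` coercion can be checked with a standalone version of the helper. This is a sketch written as a free function (the original is a method), and the sample keys are made up for illustration.

```python
from typing import Optional


def lowercase_properties(properties: Optional[dict]) -> dict:
    """Return a copy of `properties` with every key converted to a lowercase string."""
    if properties is None:
        return {}
    # str(k) lets non-string keys (e.g. ints) pass through instead of raising
    # AttributeError on k.lower().
    return {str(k).lower(): v for k, v in properties.items()}


mixed = lowercase_properties({"Session_Status": "active", 42: "answer"})
empty = lowercase_properties(None)
# Keys that differ only by case collapse into one entry; the later value wins.
collapsed = lowercase_properties({"Role": "admin", "role": "guest"})
```

One consequence worth noting: keys that differ only in case are merged silently, so callers that care about such collisions would need their own check.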
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`ibis-server/poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- ibis-server/Dockerfile (1 hunks)
- ibis-server/README.md (3 hunks)
- ibis-server/app/config.py (2 hunks)
- ibis-server/app/mdl/java_engine.py (3 hunks)
- ibis-server/app/mdl/rewriter.py (1 hunks)
- ibis-server/app/model/connector.py (4 hunks)
- ibis-server/app/model/data_source.py (3 hunks)
- ibis-server/entrypoint.sh (1 hunks)
- ibis-server/justfile (1 hunks)
- ibis-server/notebooks/demo.ipynb (1 hunks)
- ibis-server/pyproject.toml (3 hunks)
- ibis-server/resources/demo/jaffle_shop_mdl.json (1 hunks)
- ibis-server/wren/__init__.py (1 hunks)
- ibis-server/wren/__main__.py (1 hunks)
- ibis-server/wren/session/__init__.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:50-57
Timestamp: 2025-06-18T02:12:43.570Z
Learning: In the `to_json` function in `ibis-server/app/util.py`, the code intentionally uses `fetch_df()` to get a pandas DataFrame and then calls `to_dict(orient='split')` because this specific format is required for `orjson` serialization. The pandas conversion step is necessary to generate the correct dictionary structure for orjson.
ibis-server/app/model/connector.py (2)
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.
🧬 Code Graph Analysis (4)
ibis-server/app/mdl/rewriter.py (6)
- ibis-server/tests/routers/v3/connector/local_file/test_query.py (1): manifest_str (72-73)
- ibis-server/wren/session/__init__.py (1): sql (36-51)
- wren-core/core/src/mdl/context.rs (1): properties (82-88)
- ibis-server/app/mdl/core.py (1): get_session_context (7-10)
- wren-core-py/src/context.rs (1): transform_sql (190-206)
- wren-core/core/src/mdl/mod.rs (1): transform_sql (351-365)
ibis-server/app/model/data_source.py (1)
- ibis-server/app/model/__init__.py (5): GcsFileConnectionInfo (438-460), LocalFileConnectionInfo (376-384), MinioFileConnectionInfo (410-435), RedshiftConnectionInfo (269-283), S3FileConnectionInfo (387-407)
ibis-server/wren/__main__.py (2)
- ibis-server/app/model/data_source.py (2): DataSource (49-114), get_connection_info (78-114)
- ibis-server/wren/__init__.py (1): create_session_context (22-71)
ibis-server/wren/__init__.py (3)
- ibis-server/app/model/data_source.py (1): DataSource (49-114)
- ibis-server/wren/session/__init__.py (1): Context (15-63)
- wren-core-base/manifest-macro/src/lib.rs (1): manifest (26-56)
🪛 markdownlint-cli2 (0.17.2)
ibis-server/README.md
83-83: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
87-87: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
101-101: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
105-105: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.10.0)
ibis-server/entrypoint.sh
[error] 23-23: Argument mixes string and array. Use * or separate argument.
(SC2145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (18)
ibis-server/app/model/connector.py (3)
65-66: LGTM! Good flexibility enhancement.
The change from mandatory to optional `limit` parameter improves the flexibility of the query interface. This aligns well with the session/task management framework mentioned in the AI summary.
81-86: LGTM! Correct conditional limit application.
The implementation correctly applies the limit only when it's not `None`, which is the expected behavior for optional limits. The logic is sound and maintains backward compatibility.
125-132: LGTM! Consistent implementation with other connectors.
The conditional limit application follows the same pattern as `SimpleConnector`, ensuring consistency across connector implementations.
ibis-server/justfile (1)
69-71: LGTM! Well-structured Jupyter recipe addition.
The new `docker-run-jupyter` recipe follows the existing patterns and correctly:
- Maps the appropriate port (8888) for Jupyter Lab
- Uses the same image and environment configuration
- Passes the `jupyter` command to the container entrypoint
This complements the Jupyter support added throughout the PR.
ibis-server/app/config.py (2)
36-36: LGTM! Good logging consistency improvement.
Adding the default `correlation_id` ensures that the extra field referenced in `logger_format` is always available, preventing potential KeyError exceptions. This is a good defensive programming practice.
48-48: LGTM! Consistent application of default correlation_id.
The same default `correlation_id` is correctly applied to the diagnose logger configuration, maintaining consistency between both logging modes.
ibis-server/entrypoint.sh (2)
6-20: LGTM! Well-configured Jupyter Lab setup.
The Jupyter Lab configuration is comprehensive and appropriate for a containerized demo environment:
- Proper network binding (`--ip=0.0.0.0`)
- Security settings appropriate for demo/development use
- Good UX with default URL pointing to demo notebook
- Reasonable timeout and rate limit settings
The security settings (disabled token/password, `allow_origin="*"`) are acceptable for demo environments but should be reviewed for production use.
28-29: LGTM! Clear startup messaging.
Adding explicit startup messages improves the user experience by clearly indicating which mode the container is running in.
ibis-server/README.md (1)
77-108: Excellent documentation additions for new interactive features.
The new sections clearly explain how to use the Python interactive mode and Jupyter notebook features, which aligns with the infrastructure changes being made in this PR.
ibis-server/Dockerfile (1)
64-93: Excellent Docker modifications for Jupyter support.
The changes systematically add Jupyter support while maintaining the existing Ibis server functionality. Key improvements include:
- Poetry installation in runtime for dependency management
- Proper directory structure for notebooks and data
- Jupyter configuration generation
- Appropriate port exposure for both services
- Clean integration with the entrypoint script
ibis-server/app/mdl/java_engine.py (1)
13-27: Good defensive programming for optional endpoint configuration.
The constructor changes properly handle the case where no endpoint is configured, with appropriate logging and None client handling.
ibis-server/pyproject.toml (3)
7-7: LGTM: Package inclusion for wren module.
The addition of the `wren` package to the packages list correctly enables installation of the new Wren engine Python API.
51-61: LGTM: Jupyter dependency group configuration.
The new optional Jupyter dependency group includes appropriate packages for notebook support. Making it optional is good practice to avoid forcing Jupyter installation for users who don't need it.
109-115: LGTM: Ruff exclusions for Jupyter files.
The exclusion patterns appropriately handle Jupyter notebook files, checkpoint directories, and notebook folders to prevent linting issues with notebook content.
ibis-server/resources/demo/jaffle_shop_mdl.json (1)
1-167: LGTM: Well-structured demo model with access controls.
The model definition effectively demonstrates key Wren features including:
- Model structure with proper column definitions and descriptions
- Row-level access control using session properties
- Column-level access control with threshold-based restrictions
- PII annotations for sensitive data
This provides a comprehensive example for users learning the Wren API.
ibis-server/wren/__init__.py (1)
22-71: LGTM: Well-structured session context creation function.
The function properly validates required parameters, handles file operations, and creates the Context with appropriate error handling for missing parameters.
ibis-server/notebooks/demo.ipynb (1)
1-301: LGTM: Comprehensive demo notebook showcasing Wren capabilities.
The notebook effectively demonstrates:
- Session creation and configuration
- SQL query execution and planning
- Dry run capabilities
- Row-level and column-level access controls
- Data visualization integration
This provides an excellent learning resource for new users.
ibis-server/wren/session/__init__.py (1)
15-63: LGTM: Well-designed Context class with appropriate lazy initialization.
The Context class effectively manages session state and provides clean interfaces for SQL execution and connector management.
Actionable comments posted: 5
🧹 Nitpick comments (1)
ibis-server/Dockerfile (1)
68-72: Consider excluding large, non-runtime assets
`notebooks/` and `wren/` are copied into the slim runtime image.
Unless these are required in production, add them to `.dockerignore` or conditionally copy only in a dedicated Jupyter image to keep the server image leaner and safer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/ibis-ci.yml (1 hunks)
- ibis-server/Dockerfile (1 hunks)
- ibis-server/app/routers/v3/connector.py (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
ibis-server/app/routers/v3/connector.py (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: ibis-server/app/routers/v3/connector.py:78-83
Timestamp: 2025-05-05T02:27:29.829Z
Learning: The row-level access control implementation in Wren Engine filters headers with the prefix `X_WREN_VARIABLE_PREFIX` in `EmbeddedEngineRewriter.get_session_properties` and validates session property expressions in `access_control.rs` to ensure they only contain literal values, preventing SQL injection.
🧬 Code Graph Analysis (1)
ibis-server/app/routers/v3/connector.py (1)
ibis-server/app/dependencies.py (1)
exist_wren_variables_header(48-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (6)
ibis-server/app/routers/v3/connector.py (4)
171-175: LGTM! Proper fallback logic enhancement.
The addition of `java_engine_connector.client is None` correctly prevents fallback to v2 API when no Java engine client is configured, maintaining consistency with the existing fallback prevention logic.
217-221: LGTM! Consistent implementation.
The fallback logic correctly follows the same pattern as other endpoints, ensuring consistent behavior across the API.
265-269: LGTM! Maintains consistency.
The implementation follows the established pattern correctly and maintains consistent fallback prevention logic.
320-324: LGTM! Correct implementation.
The fallback logic is properly implemented and maintains consistency with other endpoints in the file.
ibis-server/Dockerfile (2)
64-66: Nice! Early `WORKDIR` improves layer cache-use
Moving `WORKDIR /app` before the bulk `COPY` prevents accidental file dumps into `/` and lets Docker cache subsequent layers more effectively.
92-95: ENTRYPOINT change acknowledged
Switching from `CMD` to `ENTRYPOINT` makes `entrypoint.sh` non-overrideable by `docker run ... <cmd>`.
Verify this is intentional; otherwise keep it as `CMD` so users can still run interactive shells easily.
Thanks @goldmedal. This is very convenient for interacting with MDL