
feat(wren): add standalone wren Python SDK package#1471

Merged
goldmedal merged 11 commits into Canner:main from goldmedal:feat/cli-module
Mar 23, 2026

Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Mar 23, 2026

Summary

  • Add a new wren/ module — standalone Python SDK and CLI wrapping wren-core-py + Ibis connectors into a single installable package
  • Includes WrenEngine facade, Typer CLI, per-datasource connectors (postgres, bigquery, duckdb, etc.), and MDL model types
  • Adds CI workflow (wren-ci.yml) for lint, unit tests, and postgres connector tests
  • Fixes psycopg v3 API usage in test setup (executemany must be called on a cursor)
  • Adds missing pyarrow-hotfix dependency
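The psycopg v3 fix above reflects that `executemany` belongs on the cursor, not the connection. As a runnable stand-in, stdlib `sqlite3` follows the same DB-API cursor pattern (the real fix targets psycopg, which needs a live Postgres to demonstrate):

```python
import sqlite3

# With psycopg v3 the shape is analogous, e.g.:
#   with psycopg.connect(dsn) as conn, conn.cursor() as cur:
#       cur.executemany("INSERT INTO t (a) VALUES (%s)", rows)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, total REAL)")
# executemany is a cursor method: one statement, many parameter tuples.
cur.executemany(
    "INSERT INTO orders (id, total) VALUES (?, ?)",
    [(1, 10.5), (2, 20.0), (3, 7.25)],
)
conn.commit()
row_count = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```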

Test plan

  • just test-unit passes unit tests
  • just test-postgres passes postgres connector tests with testcontainers
  • just lint passes ruff format and lint checks
  • CI workflow triggers on PR and passes all jobs

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Release of the Wren Python SDK and CLI for planning, transpiling, validating, and executing semantic-layer SQL
    • Added connectors for many data sources (Postgres, DuckDB, BigQuery, Redshift, Databricks, Spark, MySQL/Doris, MSSQL, ClickHouse, Trino, Snowflake, Oracle, Canner)
  • Documentation

    • Added README and developer guide with install, testing, and usage instructions
  • Tests

    • Connector-focused integration suites and unit tests for engine behavior
  • Chores

    • New CI workflow, developer automation tasks, and packaging/build configuration added

goldmedal and others added 5 commits March 23, 2026 09:41
Introduces a new top-level `wren/` package — a standalone SDK and CLI
for the Wren semantic SQL layer that can be installed independently of
ibis-server.

Key components:
- `WrenEngine`: SQL transform (MDL → dialect SQL) + execute against 20+ data sources
- `wren.connector/`: per-connector files with lazy DB driver imports
- `wren.model/`: ConnectionInfo models, DataSource enum, WrenError
- `wren.mdl/`: wren-core-py SessionContext wrapper
- `wren.cli`: Typer CLI with `query`, `dry-run`, `transpile`, `validate` commands
- `pyproject.toml`: hatchling build, optional extras per connector (wren[postgres], etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use cursor for executemany() calls (psycopg v3 API requires it)
- Add pyarrow-hotfix to core dependencies
- Support --dev flag in `just install-extra`
- Add fmt alias in justfile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs lint, unit tests, and postgres connector tests on PRs touching
wren/, wren-core/, wren-core-py/, or wren-core-base/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added the documentation, bigquery, ibis, dependencies, python, and ci labels Mar 23, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new Wren Python package and CLI: a WrenEngine for manifest-based planning/transpilation/execution, a connector framework with many backend implementations, typed connection models and errors, test suites (unit and connector), packaging/automation, and a targeted GitHub Actions CI workflow.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CI & Automation**<br>`.github/workflows/wren-ci.yml`, `wren/justfile` | New GitHub Actions workflow scoped to wren-related paths; justfile introduces build/install/test/lint tasks and helpers to build/use local wren-core-py wheels. |
| **Packaging & Docs**<br>`wren/pyproject.toml`, `wren/README.md`, `wren/.claude/CLAUDE.md`, `ibis-server/docs/development.md`, `.claude/settings.json`, `ibis-server/poetry.toml` | Project metadata and extras, CLI entry point, README/CLAUDE docs, updated agent settings, dev install/test instructions, and poetry in-project venv config. |
| **Public API & CLI (wren package)**<br>`wren/src/wren/__init__.py`, `wren/src/wren/cli.py` | Package exports and a Typer CLI exposing query, dry-run, transpile, validate with manifest/connection loaders and multi-format result printing. |
| **Engine & MDL**<br>`wren/src/wren/engine.py`, `wren/src/wren/mdl/__init__.py` | WrenEngine implementing manifest-aware planning/transpile flows, session-context caching, connector lifecycle, and error wrapping. |
| **Connector Core & Factory**<br>`wren/src/wren/connector/__init__.py`, `wren/src/wren/connector/base.py`, `wren/src/wren/connector/factory.py` | ConnectorABC and IbisConnector abstractions (type normalization), dynamic connector resolution with import hints and data-source dispatch. |
| **Connectors (per-backend)**<br>`wren/src/wren/connector/...` (`bigquery.py`, `canner.py`, `databricks.py`, `duckdb.py`, `ibis.py`, `mysql.py`, `mssql.py`, `postgres.py`, `redshift.py`, `spark.py`) | New per-backend connectors returning pyarrow tables, with backend-specific behaviors (DuckDB file discovery/attach, BigQuery job configs, Databricks cursor handling, MSSQL pagination rewrite, Postgres cancel/close, Redshift pandas→Arrow conversion, Spark session flow) and create_connector factories. |
| **Models & DataSource**<br>`wren/src/wren/model/__init__.py`, `wren/src/wren/model/data_source.py`, `wren/src/wren/model/error.py` | Pydantic connection models/unions for many sources, DataSource enum and backend factories (including URL parsing and SSL handling), plus structured error types (WrenError, ErrorCode, ErrorPhase, DatabaseTimeoutError). |
| **Tests & Test Utilities**<br>`wren/tests/...` (`conftest.py`, `suite/*`, `connectors/*`, `unit/*`) | Pytest markers, TPCH manifest generator, shared WrenQueryTestSuite, DuckDB and Postgres integration tests, and unit tests for engine planning/transpile. |
| **Agent config**<br>`.claude/settings.json` | Extended allowlist entry to permit `Bash(uv sync:*)` commands. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as "Client"
    participant Engine as "WrenEngine"
    participant MDL as "SessionContext (wren_core)"
    participant Connector as "Connector (e.g., Postgres/DuckDB)"
    participant DB as "Database"

    Client->>Engine: query(sql, limit)
    Engine->>MDL: transform_sql(sql)
    MDL-->>Engine: planned_sql
    Engine->>Engine: transpile(planned_sql) -> dialect_sql
    Engine->>Connector: query(dialect_sql, limit)
    Connector->>DB: execute(dialect_sql)
    DB-->>Connector: pa.Table
    Connector->>Connector: normalize types (Decimal/UUID/JSON)
    Connector-->>Engine: pa.Table
    Engine-->>Client: pa.Table
    Note over Engine,Connector: Exceptions at any step are wrapped as WrenError with ErrorPhase/metadata
```
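The flow above can be sketched in plain Python. Everything below is a simplified hypothetical mirroring the diagram's plan → transpile → execute → wrap-errors shape, not the actual wren implementation:

```python
class WrenError(Exception):
    """Simplified stand-in for wren's structured error type."""

    def __init__(self, phase: str, cause: Exception) -> None:
        super().__init__(f"[{phase}] {cause}")
        self.phase = phase


class FakeConnector:
    """Hypothetical connector that 'executes' SQL and returns rows."""

    def query(self, sql: str, limit: int):
        if "boom" in sql:
            raise RuntimeError("db exploded")
        return [("row",)] * limit


class MiniEngine:
    """Mirrors the diagram: plan -> transpile -> execute, wrapping errors."""

    def __init__(self, connector: FakeConnector) -> None:
        self.connector = connector

    def query(self, sql: str, limit: int = 10):
        try:
            planned = f"/* planned */ {sql}"  # stands in for transform_sql
            dialect_sql = planned.lower()     # stands in for transpile
        except Exception as e:
            raise WrenError("PLANNING", e) from e
        try:
            return self.connector.query(dialect_sql, limit)
        except Exception as e:
            raise WrenError("EXECUTION", e) from e


engine = MiniEngine(FakeConnector())
rows = engine.query("SELECT 1", limit=2)
```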

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • douenergy

Poem

🐰
I hopped through manifests and SQL streams,
Planned and transpiled a nest of dreams,
From DuckDB burrows to Postgres plains,
Connectors hum like springtime trains,
A tiny wren sings: "Run the tests!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.90%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title clearly and concisely summarizes the main addition: a standalone wren Python SDK package. It directly corresponds to the primary change in the changeset. |


goldmedal and others added 2 commits March 23, 2026 10:25
The lint job only needs ruff, not the full dependency tree.
Using uvx avoids the missing platform wheel error for wren-core-py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (22)
wren/.claude/CLAUDE.md (1)

7-18: Add language specifier to fenced code block.

The directory structure code block lacks a language specifier. Per Markdown best practices, specify a language (e.g., text or plaintext) for syntax highlighting consistency.

Proposed fix
-```
+```text
 wren/
   src/wren/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/.claude/CLAUDE.md` around lines 7 - 18, Update the fenced code block in
CLAUDE.md to include a language specifier (e.g., replace the opening ``` with
```text or ```plaintext) so the directory tree (showing engine.py, cli.py, mdl/,
connector/, model/data_source.py, model/error.py, tests/) gets proper syntax
highlighting; locate the block that starts with ``` and the directory listing
and change only the opening fence to include the specifier.
wren/pyproject.toml (1)

49-49: The all extra excludes spark.

The spark extra is defined but not included in the all aggregate. If this is intentional (e.g., due to PySpark's size), consider adding a brief comment. Otherwise, include it for completeness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/pyproject.toml` at line 49, The 'all' extras entry currently omits the
'spark' extra; update the pyproject.toml so the all aggregate either includes
"spark" alongside the other extras or add a brief inline comment next to the
'all' definition explaining why 'spark' is intentionally excluded (e.g., PySpark
size/footprint), referencing the 'all' extras line and the existing 'spark'
extra name so reviewers can see the intended behavior.
wren/tests/unit/test_engine.py (2)

121-127: Test doesn't verify close behavior for an initialized connector.

The test name suggests it verifies connector cleanup, but since e._connector is never initialized (no query() or dry_run() call), the test only confirms that _connector remains None throughout. Consider adding an assertion that actually triggers connector initialization and verifies cleanup:

Proposed enhancement
 def test_context_manager_closes_connector() -> None:
-    conn_info = {"url": "/tmp", "format": "duckdb"}
+    import tempfile
+    with tempfile.TemporaryDirectory() as tmp_dir:
+        conn_info = {"url": tmp_dir, "format": "duckdb"}
-    with WrenEngine(_MANIFEST_STR, DataSource.duckdb, conn_info) as e:
-        assert e._connector is None  # connector is lazily initialized
+        with WrenEngine(_MANIFEST_STR, DataSource.duckdb, conn_info) as e:
+            assert e._connector is None  # connector is lazily initialized
+            # Optionally trigger initialization to test actual close:
+            # _ = e._get_connector()
+            # assert e._connector is not None

-    # After __exit__, internal state is cleaned up
-    assert e._connector is None
+        # After __exit__, internal state is cleaned up
+        assert e._connector is None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/tests/unit/test_engine.py` around lines 121 - 127, The test currently
never initializes WrenEngine._connector so it only asserts None; update
test_context_manager_closes_connector to trigger connector creation (call a
method that lazily initializes it such as WrenEngine.query(...) or
WrenEngine.dry_run()) inside the with block, assert e._connector is not None
while inside the context, then after exiting the with (which invokes
WrenEngine.__exit__), assert e._connector is None to verify cleanup; reference
WrenEngine, test_context_manager_closes_connector, query/dry_run, __exit__, and
_connector when making the change.

10-10: Unused import.

math is imported but never used in the test file.

Proposed fix
 import base64
-import math

 import orjson
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/tests/unit/test_engine.py` at line 10, Remove the unused import symbol
"math" from the top of the test module (in test_engine.py) so the file no longer
imports a module that isn't used; if you intended to use math in tests instead,
add the actual usage (e.g., math.isclose) where appropriate, otherwise delete
the import line.
wren/src/wren/connector/spark.py (1)

40-48: Consider logging suppressed exceptions in close().

Silently swallowing all exceptions during close() can hide important errors. Consider logging at debug/warning level to aid debugging while still ensuring cleanup completes.

Proposed fix
+from loguru import logger
+
     def close(self) -> None:
         if self._closed:
             return
         try:
             self.connection.stop()
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug(f"Exception during Spark session close: {e}")
         finally:
             self._closed = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/spark.py` around lines 40 - 48, The close method
currently swallows all exceptions from self.connection.stop(), which should be
logged; update close(self) in the class to catch Exception as e (around the
self.connection.stop() call), log the exception with a debug or warning level
(including the exception object and traceback) using the module/class logger (or
create one if missing), and still set self._closed = True in the finally block
so cleanup always completes; keep the existing early-return when self._closed is
already True.
wren/tests/connectors/test_duckdb.py (1)

35-48: Return type annotation mismatch for generator fixture.

The fixture uses yield but declares -> WrenEngine. For a yielding fixture, the type hint should be Generator[WrenEngine, None, None] or Iterator[WrenEngine].

Also, the comment on line 40 states "1500 orders, 150 customers" but per the WrenQueryTestSuite defaults (order_count=15000, customer_count=1500 for sf=0.01), this should be "15000 orders, 1500 customers".

Proposed fix
+from collections.abc import Generator
+
 class TestDuckDB(WrenQueryTestSuite):
     manifest = make_tpch_manifest(table_catalog=_CATALOG, table_schema=_SCHEMA)
     # DuckDB TPCH dbgen produces INTEGER as int64 in Arrow
     order_id_dtype = "int64"

     @pytest.fixture(scope="class")
-    def engine(self, tmp_path_factory) -> WrenEngine:  # type: ignore[override]
+    def engine(self, tmp_path_factory) -> Generator[WrenEngine, None, None]:  # type: ignore[override]
         db_dir = tmp_path_factory.mktemp("duckdb")
         db_path = db_dir / "tpch.duckdb"

-        # Generate TPCH sf=0.01 (1500 orders, 150 customers) into the file.
+        # Generate TPCH sf=0.01 (15000 orders, 1500 customers) into the file.
         con = duckdb.connect(str(db_path))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/tests/connectors/test_duckdb.py` around lines 35 - 48, The engine pytest
fixture currently declares a non-generator return type WrenEngine but uses
yield; change its annotation to a generator type such as Generator[WrenEngine,
None, None] or Iterator[WrenEngine] on the def engine(...) signature, and update
the in-code comment that describes TPCH generation in the engine fixture from
"1500 orders, 150 customers" to the correct "15000 orders, 1500 customers" per
the WrenQueryTestSuite defaults; locate the changes in the engine fixture and
adjust imports if needed (typing.Generator or typing.Iterator).
.github/workflows/wren-ci.yml (1)

64-73: Consider extracting wren-core-py build steps into a composite action.

The wren-core-py wheel build steps (lines 64-69, 100-105) are duplicated across test-unit and test-postgres jobs. As the number of connector test jobs grows, consider extracting this into a reusable composite action to reduce maintenance overhead.

Also applies to: 100-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/wren-ci.yml around lines 64 - 73, Duplicate step sequence
that builds the wren-core-py wheel and installs deps (the steps named "Build
wren-core-py wheel", "Install dependencies" and any identical blocks in the
`test-unit` and `test-postgres` jobs) should be extracted into a reusable
composite action; create a composite action (e.g.,
.github/actions/build-wren-core-py/action.yml) that runs the pipx/poetry/maturin
build and the uv sync install, accept any inputs needed (like working-directory
or extra flags), then replace the duplicated step blocks in both jobs with a
single uses: ./github/actions/build-wren-core-py (or appropriate path)
invocation and wire any inputs/outputs through the action so downstream steps
(like "Run unit tests") remain unchanged.
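A hedged sketch of what such a composite action could look like. The file path, step names, and commands here are illustrative placeholders; the real `run` lines must be copied from the duplicated steps in the existing workflow:

```yaml
# .github/actions/build-wren-core-py/action.yml (hypothetical path)
name: Build wren-core-py wheel
description: Build the local wren-core-py wheel and install dependencies
inputs:
  working-directory:
    description: Directory containing the wren package
    default: wren
runs:
  using: composite
  steps:
    - name: Build wren-core-py wheel
      shell: bash
      working-directory: ${{ inputs.working-directory }}
      run: just build-core-wheel   # replace with the workflow's real build command
    - name: Install dependencies
      shell: bash
      working-directory: ${{ inputs.working-directory }}
      run: uv sync --all-extras    # replace with the workflow's real install command
```

Each job would then replace the duplicated step blocks with a single `uses: ./.github/actions/build-wren-core-py` step.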
wren/src/wren/connector/canner.py (1)

37-49: Consider reusing base class helpers for type handling.

This duplicates logic from IbisConnector._round_decimal_columns and _cast_uuid_columns. If CannerConnector extended IbisConnector (or a shared mixin), you could call the existing helpers instead of reimplementing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/canner.py` around lines 37 - 49, The
_handle_pyarrow_unsupported_type method in CannerConnector duplicates decimal
and UUID handling already implemented as IbisConnector._round_decimal_columns
and IbisConnector._cast_uuid_columns; change CannerConnector to reuse those
helpers instead of reimplementing: have CannerConnector inherit from
IbisConnector or import the shared mixin, then replace the loop in
_handle_pyarrow_unsupported_type with calls to
IbisConnector._round_decimal_columns(ibis_table, ...) and
IbisConnector._cast_uuid_columns(...) (or the equivalent mixin methods) so
decimals and UUIDs are processed by the existing helpers, returning the final
transformed Table.
wren/src/wren/connector/bigquery.py (2)

16-20: Minor typo: `credits_json` → `creds_json`.

The variable name appears to be a typo—"credits" should likely be "creds" (short for credentials).

✏️ Suggested fix
-        credits_json = loads(
+        creds_json = loads(
             base64.b64decode(connection_info.credentials.get_secret_value()).decode(
                 "utf-8"
             )
         )
         credentials = service_account.Credentials.from_service_account_info(
-            credits_json
+            creds_json
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/bigquery.py` around lines 16 - 20, Rename the
mistakenly named variable credits_json to creds_json in the BigQuery connector:
update the assignment statement that decodes and loads the base64 credentials
(the variable currently declared as credits_json) and replace all subsequent
references to that variable within the same scope (e.g., anywhere creds_json is
used after the loads(...) call) to use creds_json so the name matches
"credentials" shorthand; ensure no other variables collide and run tests/imports
to verify no unresolved references.

24-29: Consider deferring OAuth scope expansion to retry path.

The current implementation adds drive and cloud-platform scopes eagerly during initialization. Based on learnings from a previous PR, credentials with additional scopes should only be created in the retry path when handling empty results with special types, not during initialization, to maintain lazy initialization.

If this connector needs to handle Google Sheets or other drive-backed tables, consider moving scope expansion to a retry mechanism when the initial query returns empty results for special types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/bigquery.py` around lines 24 - 29, The code currently
calls credentials.with_scopes([...]) eagerly during connector initialization
(the credentials variable in bigquery.py); remove that eager scope expansion and
keep the original credentials untouched at init. Instead, perform
credentials.with_scopes(...) only in the retry path that handles empty results
for special Drive-backed types (the method that detects empty query results and
retries—e.g., the connector's retry/query handling function), using the same
scopes
["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform"]
just before re-creating the BigQuery client for the retry attempt.
wren/src/wren/mdl/__init__.py (1)

8-17: Consider bounded cache for long-running processes.

@cache provides unbounded memoization. For CLI usage this is fine, but if this module is used in a long-running server with many different manifests, consider @lru_cache(maxsize=N) to prevent unbounded memory growth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/mdl/__init__.py` around lines 8 - 17, The get_session_context
function is decorated with functools.cache which is unbounded; replace it with
functools.lru_cache(maxsize=...) (or configurable maxsize) to prevent unbounded
memory growth in long‑running processes: update the decorator on
get_session_context, add/import lru_cache from functools, and choose a
reasonable maxsize (or read from config/env) so SessionContext instances are
memoized but capped.
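The suggested swap is a one-line decorator change; a small runnable illustration of the behavioral difference (function and variable names here are illustrative, not wren's):

```python
from functools import cache, lru_cache

calls = {"unbounded": 0, "bounded": 0}


@cache  # unbounded: every distinct manifest string stays cached forever
def session_unbounded(manifest: str) -> str:
    calls["unbounded"] += 1
    return f"ctx({manifest})"


@lru_cache(maxsize=2)  # bounded: least-recently-used entries are evicted
def session_bounded(manifest: str) -> str:
    calls["bounded"] += 1
    return f"ctx({manifest})"


for m in ("a", "b", "c", "a"):  # "a" is evicted by the time it repeats
    session_bounded(m)
bounded_calls = calls["bounded"]  # 4 computations: "a" recomputed after eviction

session_unbounded("a")
session_unbounded("a")
unbounded_calls = calls["unbounded"]  # 1 computation: cached after first call
```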
wren/src/wren/connector/ibis.py (2)

29-29: Style nit: prefer != over not ... ==.

if not e.error_name == "EXCEEDED_TIME_LIMIT" is clearer as if e.error_name != "EXCEEDED_TIME_LIMIT". Same applies to line 46.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/ibis.py` at line 29, Replace the awkward "not ... ==
..." style checks with "!=" comparisons: change the condition in the exception
handling where the code uses if not e.error_name == "EXCEEDED_TIME_LIMIT" to use
if e.error_name != "EXCEEDED_TIME_LIMIT" (also update the same pattern found
later in the file around the other exception branch); look for the exception
handler referencing e.error_name in ibis.py to make both replacements.

99-103: Simplify redundant conditional.

Lines 101-102 can be simplified since the condition explicitly returns the same type.

✏️ Suggested simplification
 def create_connector(data_source: DataSource, connection_info) -> IbisConnector:
     cls = _DATA_SOURCE_TO_CLASS.get(data_source, IbisConnector)
-    if cls is IbisConnector:
-        return IbisConnector(data_source, connection_info)
-    return cls(connection_info)
+    return cls(data_source, connection_info) if cls is IbisConnector else cls(connection_info)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/ibis.py` around lines 99 - 103, The create_connector
function can be simplified by removing the explicit IbisConnector constructor
and instantiating cls with the correct arguments: if cls is IbisConnector call
cls(data_source, connection_info) else call cls(connection_info); implement this
by building args = (data_source, connection_info) when cls is IbisConnector and
args = (connection_info,) otherwise, then return cls(*args). Update the function
that uses _DATA_SOURCE_TO_CLASS, IbisConnector, DataSource and connection_info
accordingly.
wren/src/wren/connector/mysql.py (1)

14-51: Extract duplicate _handle_pyarrow_unsupported_type logic.

MySqlConnector and DorisConnector have identical implementations of _handle_pyarrow_unsupported_type. Consider:

  1. Adding JSON handling to the base IbisConnector._handle_pyarrow_unsupported_type, or
  2. Creating a shared mixin/helper that both connectors use.
♻️ Option: Extend base class and only add JSON handling
 class MySqlConnector(IbisConnector):
     def __init__(self, connection_info):
         super().__init__(DataSource.mysql, connection_info)

     def _handle_pyarrow_unsupported_type(self, ibis_table: Table, **kwargs) -> Table:
-        result_table = ibis_table
-        for name, dtype in ibis_table.schema().items():
-            if isinstance(dtype, Decimal):
-                result_table = self._round_decimal_columns(
-                    result_table=result_table, col_name=name, **kwargs
-                )
-            elif isinstance(dtype, UUID):
-                result_table = self._cast_uuid_columns(
-                    result_table=result_table, col_name=name
-                )
-            elif isinstance(dtype, dt.JSON):
-                result_table = result_table.mutate(
-                    **{name: result_table[name].cast("string")}
-                )
-        return result_table
+        # Let base class handle Decimal and UUID
+        result_table = super()._handle_pyarrow_unsupported_type(ibis_table, **kwargs)
+        # Additionally handle JSON for MySQL/Doris
+        for name, dtype in result_table.schema().items():
+            if isinstance(dtype, dt.JSON):
+                result_table = result_table.mutate(
+                    **{name: result_table[name].cast("string")}
+                )
+        return result_table
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/mysql.py` around lines 14 - 51, MySqlConnector and
DorisConnector duplicate the same _handle_pyarrow_unsupported_type logic; move
the shared logic into IbisConnector._handle_pyarrow_unsupported_type (handling
Decimal via _round_decimal_columns, UUID via _cast_uuid_columns, and dt.JSON ->
cast to string) and have MySqlConnector and DorisConnector call or inherit that
base implementation (override only if connector-specific behavior is needed).
Alternatively, extract the shared code into a small mixin/helper function (e.g.,
handle_pyarrow_unsupported_types) that uses ibis_table.schema(), Decimal, UUID,
dt.JSON and the existing _round_decimal_columns/_cast_uuid_columns helpers, then
have both MySqlConnector and DorisConnector delegate to that single
implementation to remove duplication.
wren/src/wren/connector/duckdb.py (1)

57-62: Use elif instead of separate if statements.

Using separate if statements means multiple branches could execute if the connection_info happens to be an instance of multiple types (e.g., via inheritance). Use elif to ensure mutual exclusivity.

♻️ Proposed fix
         if isinstance(connection_info, S3FileConnectionInfo):
             _init_duckdb_s3(self.connection, connection_info)
-        if isinstance(connection_info, MinioFileConnectionInfo):
+        elif isinstance(connection_info, MinioFileConnectionInfo):
             _init_duckdb_minio(self.connection, connection_info)
-        if isinstance(connection_info, GcsFileConnectionInfo):
+        elif isinstance(connection_info, GcsFileConnectionInfo):
             _init_duckdb_gcs(self.connection, connection_info)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/duckdb.py` around lines 57 - 62, The three separate
conditionals checking connection_info should be made mutually exclusive to avoid
multiple initializers running; change the sequential ifs to an if / elif / elif
chain so only one branch executes for S3FileConnectionInfo,
MinioFileConnectionInfo, or GcsFileConnectionInfo and call the corresponding
initializer (_init_duckdb_s3, _init_duckdb_minio, _init_duckdb_gcs) with
self.connection and connection_info.
wren/src/wren/connector/base.py (1)

55-61: Potential precision loss with fixed decimal parameters.

Hardcoding precision=38 and scale=9 may truncate high-precision decimal values from databases that support larger scales (e.g., BigQuery supports scale up to 38). Consider making this configurable or preserving original precision where possible.

Consider preserving original precision when feasible:

def _round_decimal_columns(
    self, result_table: Table, col_name: str, scale: int = 9
) -> Table:
    col = result_table[col_name]
    original_dtype = result_table.schema()[col_name]
    # Preserve original precision if within PyArrow limits
    precision = min(original_dtype.precision or 38, 38)
    decimal_type = Decimal(precision=precision, scale=scale)
    rounded_col = col.cast(decimal_type).round(scale)
    return result_table.mutate(**{col_name: rounded_col})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/base.py` around lines 55 - 61, The
_round_decimal_columns method currently hardcodes Decimal(precision=38, scale=9)
which can truncate higher-precision values; change it to derive precision from
the column's original dtype (fetch original_dtype =
result_table.schema()[col_name] or equivalent), use that precision when present
(cap it to the supported maximum, e.g., 38) or fall back to a configurable
default, then construct Decimal(precision=derived_precision, scale=scale) and
cast/round using that type; ensure the change is applied in the
_round_decimal_columns function and consider adding a configurable default
precision on the connector class if needed.
wren/src/wren/cli.py (1)

150-175: transpile command doesn't close the engine.

Unlike query, dry_run, and validate, the transpile command creates an engine but never calls engine.close(). While transpile doesn't use a connector, future changes could leak resources.

♻️ Proposed fix using context manager
     # For transpile we don't need real connection_info — pass a dummy dict
-    engine = WrenEngine(manifest_str=manifest_str, data_source=ds, connection_info={})
-    try:
-        result = engine.transpile(sql)
-        typer.echo(result)
-    except Exception as e:
-        typer.echo(f"Error: {e}", err=True)
-        raise typer.Exit(1)
+    with WrenEngine(manifest_str=manifest_str, data_source=ds, connection_info={}) as engine:
+        try:
+            result = engine.transpile(sql)
+            typer.echo(result)
+        except Exception as e:
+            typer.echo(f"Error: {e}", err=True)
+            raise typer.Exit(1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/cli.py` around lines 150 - 175, The transpile command creates a
WrenEngine instance (WrenEngine in function transpile) but never calls
engine.close(), risking resource leaks; update transpile to ensure
engine.close() is always invoked (use a with/context manager on WrenEngine if it
implements __enter__/__exit__, or wrap engine usage in try/finally and call
engine.close() in finally) after creating engine =
WrenEngine(manifest_str=manifest_str, data_source=ds, connection_info={}); keep
existing manifest loading (_load_manifest) and DataSource handling, and ensure
exceptions still result in typer.Exit(1) while guaranteeing engine.close() runs.
wren/tests/suite/query.py (1)

135-137: manifest_str fixture defined but not used by subclass engine fixtures.

The manifest_str fixture is defined here, but both TestDuckDB and TestPostgres (per relevant snippets) rebuild manifest_str inline within their engine fixtures rather than requesting it as a parameter. This fixture is effectively dead code.

Either remove the unused fixture or update subclass fixtures to depend on it for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/tests/suite/query.py` around lines 135 - 137, The manifest_str pytest
fixture is defined but never consumed by subclass fixtures; update the
TestDuckDB and TestPostgres engine fixtures to accept manifest_str as a
parameter and use that value instead of rebuilding it inline (or remove the
unused manifest_str fixture). Specifically, modify the engine fixtures in the
TestDuckDB and TestPostgres classes to include manifest_str in their signature
and replace any inline base64.b64encode(orjson.dumps(self.manifest)).decode()
calls with the provided manifest_str, ensuring the manifest_str fixture is
actually used; if you prefer removing duplication, delete the manifest_str
fixture instead of changing the subclasses.
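The consolidation the comment suggests can be sketched as below. This is a minimal illustration, not the suite's real code: `json` stands in for `orjson`, and the methods stand in for `@pytest.fixture`-decorated functions — in pytest, the subclass engine fixture would simply list `manifest_str` in its signature to receive the shared value.

```python
# Sketch (illustrative names): base suite exposes one manifest_str helper;
# subclasses consume it instead of re-encoding the manifest inline.
import base64
import json  # stand-in for orjson to keep the sketch dependency-free


class WrenQueryTestSuite:
    manifest = {"catalog": "wren", "schema": "public", "models": []}

    def manifest_str(self) -> str:
        # In the real suite this is a @pytest.fixture.
        return base64.b64encode(json.dumps(self.manifest).encode()).decode()


class TestDuckDB(WrenQueryTestSuite):
    def engine(self, manifest_str: str) -> dict:
        # In pytest, requesting `manifest_str` by name injects the fixture,
        # replacing the inline base64.b64encode(orjson.dumps(...)) call.
        return {"manifest_str": manifest_str, "data_source": "duckdb"}
```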
wren/src/wren/model/data_source.py (2)

11-13: Top-level imports of boto3 and ibis may slow startup for non-AWS/non-Ibis use cases.

These imports happen at module load time. For CLI usage that only needs transpilation (no DB connection), this adds unnecessary overhead.

Consider lazy imports within the methods that need them, following the pattern used elsewhere in this codebase (e.g., from duckdb import ... # noqa: PLC0415 in duckdb.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/model/data_source.py` around lines 11 - 13, Top-level imports
of boto3, ibis, and BaseBackend slow startup; change them to lazy imports inside
the functions/classes that actually use AWS or Ibis functionality (replace
"import boto3", "import ibis", and "from ibis import BaseBackend" with
in-function imports where needed). Locate call sites that depend on boto3 or
ibis (search for usages of boto3, ibis, and BaseBackend in this module) and add
local imports at the start of those functions/methods (follow the existing
project pattern used in duckdb.py, e.g., local import with a noqa comment if
necessary) so CLI/transpile paths that don't touch DB/AWS avoid importing those
heavy libs at module load.
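The lazy-import pattern being recommended looks like the following sketch. `zlib` stands in for boto3/ibis and the function name is illustrative; the point is that module load stays cheap and the heavy import only happens on the first call.

```python
# Sketch of a lazy import: the dependency is imported inside the method
# that needs it, so transpile-only code paths never pay the cost.


def open_s3_connection(payload: bytes) -> bytes:
    import zlib  # noqa: PLC0415 -- deliberate lazy import, as in duckdb.py

    return zlib.compress(payload)
```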

287-289: Minor typo: credits_json should likely be creds_json.

The variable name credits_json (implying monetary credits) seems like a typo for creds_json (credentials JSON).

✏️ Proposed rename
-        credits_json = loads(
+        creds_json = loads(
             base64.b64decode(info.credentials.get_secret_value()).decode("utf-8")
         )
         credentials = service_account.Credentials.from_service_account_info(
-            credits_json
+            creds_json
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/model/data_source.py` around lines 287 - 289, Rename the
mistakenly named variable credits_json to creds_json in the data source parsing
code: update the assignment that decodes base64 and loads JSON (the expression
using base64.b64decode(info.credentials.get_secret_value()).decode("utf-8")
passed to loads) and update all subsequent references in the same
function/method to use creds_json instead of credits_json, ensuring there are no
name collisions and tests/imports still pass.
wren/src/wren/engine.py (1)

166-173: Silent fallback on manifest extraction failure hides potential issues.

When extractor.resolve_used_table_names or extractor.extract_by fails, the exception is silently caught and the full manifest is used. This could mask configuration or MDL issues.

Consider logging the fallback for debugging:

         try:
             # Extract minimal manifest for the query
             extractor = get_manifest_extractor(self.manifest_str)
             tables = extractor.resolve_used_table_names(sql)
             manifest = extractor.extract_by(tables)
             effective_manifest = to_json_base64(manifest)
-        except Exception:
+        except Exception as e:
+            from loguru import logger
+            logger.debug(f"Manifest extraction failed, using full manifest: {e}")
             effective_manifest = self.manifest_str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/engine.py` around lines 166 - 173, The try/except around
manifest extraction (get_manifest_extractor, extractor.resolve_used_table_names,
extractor.extract_by, to_json_base64) silently falls back to self.manifest_str;
update the except block to log the exception and context before using the full
manifest so failures aren’t hidden—capture the exception object, call the
module/class logger (or processLogger) to emit a clear message including the
exception details and which SQL/manifest caused the fallback, then set
effective_manifest = self.manifest_str as before.
wren/src/wren/connector/databricks.py (1)

53-55: Dry-run SQL wrapping may fail with CTEs or complex queries.

Wrapping SQL as SELECT * FROM ({sql}) AS sub LIMIT 0 will fail if sql contains CTEs (WITH clauses), multiple statements, or certain DDL. This differs from how other connectors handle dry-run.

Consider using Databricks' EXPLAIN to validate SQL without execution, similar to how other enterprise databases handle query validation:

def dry_run(self, sql: str) -> None:
    with closing(self.connection.cursor()) as cursor:
        cursor.execute(f"EXPLAIN {sql}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/databricks.py` around lines 53 - 55, The dry_run
implementation in dry_run(self, sql: str) wraps the query in a SELECT subquery
which breaks for CTEs, multiple statements or DDL; change it to validate the SQL
using Databricks' EXPLAIN instead of wrapping (i.e., execute "EXPLAIN {sql}" via
the same cursor in dry_run) so CTEs and complex queries are accepted without
executing data-changing statements; keep using closing(self.connection.cursor())
and ensure you only call cursor.execute with the EXPLAIN-prefixed SQL in the
dry_run function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wren/src/wren/cli.py`:
- Around line 202-206: The current JSON branch contains a dead loop (for batch
in table.to_batches(): for row in batch.to_pydict(): pass) — note that
batch.to_pydict() returns a columnar mapping, so iterating it yields column
names, not rows; remove the no-op loops and replace them with actual JSON-lines
emission: iterate table.to_batches(), for each batch call batch.to_pylist()
(which yields one dict per row) and write each row as a JSON string (e.g., via
json.dumps) followed by a newline to the output stream. Update the code around
the output == "json" branch to import/use json and ensure rows are serialized
and emitted instead of being passed over.
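The row-emission logic the prompt asks for can be sketched as below. A plain dict stands in for the Arrow batch here; the key detail is that a columnar mapping like the one `RecordBatch.to_pydict()` returns (`{column: [values]}`) must be zipped back into row dicts before serialization.

```python
# Sketch of JSON-lines emission from a columnar batch mapping.
import json


def emit_json_lines(columnar: dict[str, list], write) -> None:
    """Write one JSON object per row from a columnar {column: [values]} dict."""
    columns = list(columnar)
    for values in zip(*(columnar[c] for c in columns)):
        write(json.dumps(dict(zip(columns, values))) + "\n")
```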

In `@wren/src/wren/connector/duckdb.py`:
- Around line 67-70: The query method currently fetches the full Arrow table
then slices it; change it to push the limit into DuckDB SQL instead by modifying
query(sql: str, limit: int | None = None) to, when limit is provided, wrap or
append a LIMIT clause to the SQL (e.g., add "LIMIT {limit}" or "SELECT * FROM
({sql}) LIMIT {limit}") before calling
self.connection.execute(...).fetch_arrow_table(), so the database applies the
limit and only the needed rows are returned; ensure you use the existing
connection.execute and fetch_arrow_table calls and handle None limit by leaving
the SQL unchanged.
- Around line 16-23: The SQL string in _init_duckdb_s3 (and similarly
_init_duckdb_minio and _init_duckdb_gcs) interpolates secret values via
f-strings, risking SQL injection or syntax errors if a secret contains a single
quote; fix by not directly injecting raw secrets: retrieve each secret via
get_secret_value() into local variables, safely escape single quotes by doubling
them (or use the DB driver's parameterized execution if supported), then
construct the CREATE SECRET statement using the escaped values (or parameter
placeholders) and call connection.execute with the safe values; apply the same
change to _init_duckdb_minio and _init_duckdb_gcs.
- Around line 80-88: The ATTACH call in the db_files loop
(self.connection.execute) interpolates raw filenames, risking SQL injection;
instead validate and sanitize each path and alias before building the SQL:
ensure the file exists and is an allowed/absolute path (use
pathlib.Path.resolve() and check it is inside your data directory), create a
safe alias from os.path.basename(file) by removing/normalizing any characters
other than letters/digits/underscore (e.g., regexp replace to [A-Za-z0-9_]), and
escape single quotes in the resolved path by doubling them before interpolation;
keep the existing exception handling (WrenError / ErrorCode.ATTACH_DUCKDB_ERROR)
but perform these validations in the same loop before calling
self.connection.execute.
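The two injection fixes above (escaping secret literals, validating ATTACH paths and aliases) can be sketched together as follows. Helper names are illustrative, not the package's API, and parameterized execution would still be preferable wherever the driver supports it.

```python
# Sketch: escape values before interpolating into CREATE SECRET, and
# validate/sanitize ATTACH inputs before building the SQL.
import re
from pathlib import Path


def escape_sql_literal(value: str) -> str:
    """Double single quotes so the value is safe inside a '...' literal."""
    return value.replace("'", "''")


def build_create_secret(key_id: str, secret: str) -> str:
    return (
        "CREATE SECRET (TYPE S3, "
        f"KEY_ID '{escape_sql_literal(key_id)}', "
        f"SECRET '{escape_sql_literal(secret)}')"
    )


def safe_attach_parts(file: str, data_dir: str) -> tuple[str, str]:
    """Return (escaped path literal, sanitized alias), or raise ValueError."""
    resolved = Path(file).resolve()
    base = Path(data_dir).resolve()
    if base not in resolved.parents and resolved != base:
        raise ValueError(f"{file} escapes {data_dir}")
    alias = re.sub(r"[^A-Za-z0-9_]", "_", resolved.stem)
    return escape_sql_literal(str(resolved)), alias
```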

In `@wren/src/wren/connector/factory.py`:
- Around line 49-54: The install hint currently uses data_source.value directly
in the WrenError raised in factory.py (within the except ImportError block),
which can recommend non-existent extras (e.g., 'doris'); add a mapping (e.g.,
EXTRA_MAP keyed by the data source enum or name) that maps aliases to the actual
optional extra (for example map "doris" -> "mysql"), then use that mapped_extra
in the error message (instead of data_source.value) when composing the pip
install suggestion; update the WrenError call (and ErrorCode.NOT_IMPLEMENTED
usage) to reference the mapped name and ensure the mapping covers known aliases.

In `@wren/src/wren/connector/ibis.py`:
- Around line 23-39: The query method in ibis.py currently only handles
trino.exceptions.TrinoQueryError, WrenError, and TimeoutError, leaving other
exceptions unwrapped; add a catch-all except Exception as e at the end of
Connector.query to mirror PostgresConnector behavior and wrap any unexpected
exception into a WrenError (use ErrorCode.INVALID_SQL, include str(e),
phase=ErrorPhase.SQL_EXECUTION, and metadata={DIALECT_SQL: sql}) so all errors
from query(...) (the query method) are consistently wrapped; keep the existing
TrinoQueryError special-case and the pass-through for WrenError and
TimeoutError.

In `@wren/src/wren/connector/mssql.py`:
- Around line 98-105: Wrap the body of _describe_sql_for_error_message in a
try/except so any exception from sge.convert, building/executing describe_sql,
or self.connection.raw_sql is caught and does not propagate; on exception return
a safe fallback string like "Unknown reason" (and optionally log the exception
if a logger is available on self) instead of letting the helper raise and mask
the original dry-run error—ensure you still attempt the original logic
(sge.convert(...).sql("mssql"), building describe_sql, using
closing(self.connection.raw_sql(...))) but guard the whole sequence with
exception handling.
- Around line 77-78: The function _flatten_pagination_limit currently swallows
exceptions by returning a string like f"Error: {e!s}", which then becomes
invalid SQL; change the error handling to propagate the original exception
instead of returning an error string—either re-raise the caught exception
(raise) or raise a new descriptive exception (e.g., RuntimeError) including the
original exception information (use from e) so callers like the SQL execution
path receive an exception rather than malformed SQL; update the except block in
_flatten_pagination_limit accordingly.
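The re-raise pattern suggested for `_flatten_pagination_limit` can be sketched as below. `parse_sql` is a placeholder for the real sqlglot-based logic; the point is that a parse failure now propagates with the original exception chained via `from e` instead of leaking an error string into the SQL path.

```python
# Sketch: propagate parse failures with exception chaining rather than
# returning an error string that later becomes invalid SQL.


def parse_sql(sql: str) -> str:
    # Placeholder for the real parsing/transformation logic.
    if not sql.strip():
        raise ValueError("empty SQL")
    return sql


def flatten_pagination_limit(sql: str) -> str:
    try:
        return parse_sql(sql)
    except Exception as e:
        raise RuntimeError(f"failed to flatten pagination for: {sql}") from e
```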

In `@wren/src/wren/connector/redshift.py`:
- Around line 46-54: The query method currently fetches all rows then truncates,
which is inefficient; update the query(self, sql: str, limit: int | None = None)
implementation to apply the limit in the SQL sent to the database when limit is
not None (e.g., append a safe "LIMIT {limit}" clause or wrap the original SQL)
before calling cursor.execute so the DB returns only the requested rows; keep
the rest of the logic that builds cols, rows and converts to a pandas DataFrame
and pa.Table (references: query, cursor.execute, cursor.description,
pd.DataFrame, pa.Table.from_pandas).
- Around line 31-37: The port passed to redshift_connector.connect() is a string
(connection_info.port.get_secret_value()) but must be an int; update the call in
the redshift connector where self.connection is assigned (the
redshift_connector.connect(...) invocation) to convert the port to an integer
using int(...) around connection_info.port.get_secret_value(), matching the
pattern used in other connectors.

In `@wren/src/wren/model/__init__.py`:
- Around line 267-289: The ConnectionInfo union (symbol: ConnectionInfo) is
missing DatabricksServicePrincipalConnectionInfo which causes validation to fail
for service-principal Databricks connections; update the ConnectionInfo union to
include DatabricksServicePrincipalConnectionInfo alongside
DatabricksTokenConnectionInfo so the union accepts both connection types (locate
where ConnectionInfo is defined in __init__.py and add
DatabricksServicePrincipalConnectionInfo to the listed alternatives).
- Around line 127-134: DorisConnectionInfo defines port as SecretStr while other
connection classes use SecretPort (which normalizes ints); change the port field
in DorisConnectionInfo to use SecretPort instead of SecretStr so integer ports
are normalized to strings. Update the Field declaration for the port attribute
in the DorisConnectionInfo class to use SecretPort (keeping examples and any
defaults intact) so behavior matches other connection models.
- Around line 253-259: The GcsFileConnectionInfo.credentials field is typed as
SecretStr but given a default of None, causing a type mismatch; update the field
to allow None by changing its type to Optional[SecretStr] (or SecretStr | None)
and import typing.Optional (or use PEP 604 union) so the default=None is valid,
or alternatively remove the default=None if credentials must be required; locate
the credentials attribute in class GcsFileConnectionInfo to apply the change.

In `@wren/src/wren/model/data_source.py`:
- Around line 79-82: The isinstance(data, ConnectionInfo) check is invalid
because ConnectionInfo is a typing.Union, not a class; update the branch in the
method that currently does "if isinstance(data, ConnectionInfo):" (the same
block that calls self._build_connection_info(data) when false) to detect
concrete types instead: either check for dict with "if isinstance(data, dict):
info = self._build_connection_info(data)" or import and use a tuple of the
actual Pydantic model classes used in the ConnectionInfo union (e.g., "if
isinstance(data, (ConcreteModelA, ConcreteModelB)): info = data"); ensure you
reference the concrete model class names you see in the Union and replace the
isinstance(...) call accordingly so the branch no longer uses the Union type.
- Around line 438-444: get_databricks_connection currently only supports
DatabricksTokenConnectionInfo and will crash when given a
DatabricksServicePrincipalConnectionInfo (built by _build_connection_info)
because it accesses access_token; update get_databricks_connection to accept
either DatabricksTokenConnectionInfo or DatabricksServicePrincipalConnectionInfo
and branch on databricks auth type (or detect presence of
client_id/client_secret/tenant_id) to call ibis.databricks.connect with either
access_token or service principal params, mirroring the logic in
DatabricksConnector (see wren/src/wren/connector/databricks.py) or alternatively
refactor DataSourceExtension.get_connection to delegate Databricks connections
to DatabricksConnector so both token and service principal flows are handled
correctly.
- Around line 358-373: Add an explanatory comment above the per-connection
override in get_doris_connection explaining why we set
connection.con.get_autocommit = lambda: True: describe that Doris does not
reflect SERVER_STATUS_AUTOCOMMIT in the MySQL handshake so MySQLdb's
get_autocommit() remains False even after autocommit(True), causing
ibis.raw_sql() to wrap queries in BEGIN/ROLLBACK (which Doris rejects); note
this is an instance-level workaround on the connection returned by
get_doris_connection (not a global patch to MySQLdb) and reference the relevant
symbols (get_doris_connection, connection.con.get_autocommit, ibis.raw_sql,
SERVER_STATUS_AUTOCOMMIT) so future maintainers understand the rationale.
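The Union-isinstance finding above (lines 79-82) reduces to the branch sketched below. Whether `isinstance()` accepts a `typing.Union` varies with Python version and the union's members (subscripted or `Annotated` members make it fail), so checking for `dict` is the portable fix; the function and parameter names here are illustrative.

```python
# Sketch: treat a plain dict as raw connection data to be built, and
# anything else as an already-constructed model instance.


def normalize_connection_info(data, build):
    if isinstance(data, dict):
        return build(data)
    return data
```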

In `@wren/src/wren/model/error.py`:
- Around line 52-65: The constructor accepts a cause but never preserves it;
modify the Error class __init__ to assign the cause to an instance attribute
(e.g., self.cause = cause) and, if cause is not None, set self.__cause__ = cause
to preserve exception chaining; keep the existing super().__init__(message) call
(or call it before setting __cause__) so the original message is preserved while
the underlying exception context is retained.

In `@wren/tests/suite/query.py`:
- Around line 127-129: The hardcoded constants order_count, customer_count (and
their inline comments) are inconsistent with TPCH sf=0.01 and appear off by a
factor of 10; update the values in the class so they match actual TPCH sf=0.01
outputs (15,000 orders and 1,500 customers, i.e., set order_count to 15000 and
customer_count to 1500) and fix the inline comment math to reflect the
correct base counts and multiplication (reference the symbols order_count,
customer_count, and order_id_dtype) so the comment shows the correct calculation
for TPCH sf=0.01.

---

Nitpick comments:
In @.github/workflows/wren-ci.yml:
- Around line 64-73: Duplicate step sequence that builds the wren-core-py wheel
and installs deps (the steps named "Build wren-core-py wheel", "Install
dependencies" and any identical blocks in the `test-unit` and `test-postgres`
jobs) should be extracted into a reusable composite action; create a composite
action (e.g., .github/actions/build-wren-core-py/action.yml) that runs the
pipx/poetry/maturin build and the uv sync install, accept any inputs needed
(like working-directory or extra flags), then replace the duplicated step blocks
in both jobs with a single uses: ./.github/actions/build-wren-core-py (or
appropriate path) invocation and wire any inputs/outputs through the action so
downstream steps (like "Run unit tests") remain unchanged.

In `@wren/.claude/CLAUDE.md`:
- Around line 7-18: Update the fenced code block in CLAUDE.md to include a
language specifier (e.g., replace the opening ``` with ```text or ```plaintext)
so the directory tree (showing engine.py, cli.py, mdl/, connector/,
model/data_source.py, model/error.py, tests/) gets proper syntax highlighting;
locate the block that starts with ``` and the directory listing and change only
the opening fence to include the specifier.

In `@wren/pyproject.toml`:
- Line 49: The 'all' extras entry currently omits the 'spark' extra; update the
pyproject.toml so the all aggregate either includes "spark" alongside the other
extras or add a brief inline comment next to the 'all' definition explaining why
'spark' is intentionally excluded (e.g., PySpark size/footprint), referencing
the 'all' extras line and the existing 'spark' extra name so reviewers can see
the intended behavior.

In `@wren/src/wren/cli.py`:
- Around line 150-175: The transpile command creates a WrenEngine instance
(WrenEngine in function transpile) but never calls engine.close(), risking
resource leaks; update transpile to ensure engine.close() is always invoked (use
a with/context manager on WrenEngine if it implements __enter__/__exit__, or
wrap engine usage in try/finally and call engine.close() in finally) after
creating engine = WrenEngine(manifest_str=manifest_str, data_source=ds,
connection_info={}); keep existing manifest loading (_load_manifest) and
DataSource handling, and ensure exceptions still result in typer.Exit(1) while
guaranteeing engine.close() runs.

In `@wren/src/wren/connector/base.py`:
- Around line 55-61: The _round_decimal_columns method currently hardcodes
Decimal(precision=38, scale=9) which can truncate higher-precision values;
change it to derive precision from the column's original dtype (fetch
original_dtype = result_table.schema()[col_name] or equivalent), use that
precision when present (cap it to the supported maximum, e.g., 38) or fall back
to a configurable default, then construct Decimal(precision=derived_precision,
scale=scale) and cast/round using that type; ensure the change is applied in the
_round_decimal_columns function and consider adding a configurable default
precision on the connector class if needed.

In `@wren/src/wren/connector/bigquery.py`:
- Around line 16-20: Rename the mistakenly named variable credits_json to
creds_json in the BigQuery connector: update the assignment statement that
decodes and loads the base64 credentials (the variable currently declared as
credits_json) and replace all subsequent references to that variable within the
same scope (e.g., anywhere creds_json is used after the loads(...) call) to use
creds_json so the name matches "credentials" shorthand; ensure no other
variables collide and run tests/imports to verify no unresolved references.
- Around line 24-29: The code currently calls credentials.with_scopes([...])
eagerly during connector initialization (the credentials variable in
bigquery.py); remove that eager scope expansion and keep the original
credentials untouched at init. Instead, perform credentials.with_scopes(...)
only in the retry path that handles empty results for special Drive-backed types
(the method that detects empty query results and retries—e.g., the connector's
retry/query handling function), using the same scopes
["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform"]
just before re-creating the BigQuery client for the retry attempt.

In `@wren/src/wren/connector/canner.py`:
- Around line 37-49: The _handle_pyarrow_unsupported_type method in
CannerConnector duplicates decimal and UUID handling already implemented as
IbisConnector._round_decimal_columns and IbisConnector._cast_uuid_columns;
change CannerConnector to reuse those helpers instead of reimplementing: have
CannerConnector inherit from IbisConnector or import the shared mixin, then
replace the loop in _handle_pyarrow_unsupported_type with calls to
IbisConnector._round_decimal_columns(ibis_table, ...) and
IbisConnector._cast_uuid_columns(...) (or the equivalent mixin methods) so
decimals and UUIDs are processed by the existing helpers, returning the final
transformed Table.

In `@wren/src/wren/connector/databricks.py`:
- Around line 53-55: The dry_run implementation in dry_run(self, sql: str) wraps
the query in a SELECT subquery which breaks for CTEs, multiple statements or
DDL; change it to validate the SQL using Databricks' EXPLAIN instead of wrapping
(i.e., execute "EXPLAIN {sql}" via the same cursor in dry_run) so CTEs and
complex queries are accepted without executing data-changing statements; keep
using closing(self.connection.cursor()) and ensure you only call cursor.execute
with the EXPLAIN-prefixed SQL in the dry_run function.

In `@wren/src/wren/connector/duckdb.py`:
- Around line 57-62: The three separate conditionals checking connection_info
should be made mutually exclusive to avoid multiple initializers running; change
the sequential ifs to an if / elif / elif chain so only one branch executes for
S3FileConnectionInfo, MinioFileConnectionInfo, or GcsFileConnectionInfo and call
the corresponding initializer (_init_duckdb_s3, _init_duckdb_minio,
_init_duckdb_gcs) with self.connection and connection_info.

In `@wren/src/wren/connector/ibis.py`:
- Line 29: Replace the awkward "not ... == ..." style checks with "!="
comparisons: change the condition in the exception handling where the code uses
if not e.error_name == "EXCEEDED_TIME_LIMIT" to use if e.error_name !=
"EXCEEDED_TIME_LIMIT" (also update the same pattern found later in the file
around the other exception branch); look for the exception handler referencing
e.error_name in ibis.py to make both replacements.
- Around line 99-103: The create_connector function can be simplified by
removing the explicit IbisConnector constructor and instantiating cls with the
correct arguments: if cls is IbisConnector call cls(data_source,
connection_info) else call cls(connection_info); implement this by building args
= (data_source, connection_info) when cls is IbisConnector and args =
(connection_info,) otherwise, then return cls(*args). Update the function that
uses _DATA_SOURCE_TO_CLASS, IbisConnector, DataSource and connection_info
accordingly.
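That factory simplification can be sketched as follows, with stand-in classes in place of the real connector registry:

```python
# Sketch: pick the argument tuple based on whether the resolved class is the
# generic IbisConnector, then instantiate once with cls(*args).


class IbisConnector:
    def __init__(self, data_source, connection_info):
        self.data_source = data_source
        self.connection_info = connection_info


class PostgresConnector:
    def __init__(self, connection_info):
        self.connection_info = connection_info


_DATA_SOURCE_TO_CLASS = {"trino": IbisConnector, "postgres": PostgresConnector}


def create_connector(data_source: str, connection_info: dict):
    cls = _DATA_SOURCE_TO_CLASS[data_source]
    args = (data_source, connection_info) if cls is IbisConnector else (connection_info,)
    return cls(*args)
```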

In `@wren/src/wren/connector/mysql.py`:
- Around line 14-51: MySqlConnector and DorisConnector duplicate the same
_handle_pyarrow_unsupported_type logic; move the shared logic into
IbisConnector._handle_pyarrow_unsupported_type (handling Decimal via
_round_decimal_columns, UUID via _cast_uuid_columns, and dt.JSON -> cast to
string) and have MySqlConnector and DorisConnector call or inherit that base
implementation (override only if connector-specific behavior is needed).
Alternatively, extract the shared code into a small mixin/helper function (e.g.,
handle_pyarrow_unsupported_types) that uses ibis_table.schema(), Decimal, UUID,
dt.JSON and the existing _round_decimal_columns/_cast_uuid_columns helpers, then
have both MySqlConnector and DorisConnector delegate to that single
implementation to remove duplication.

In `@wren/src/wren/connector/spark.py`:
- Around line 40-48: The close method currently swallows all exceptions from
self.connection.stop(), which should be logged; update close(self) in the class
to catch Exception as e (around the self.connection.stop() call), log the
exception with a debug or warning level (including the exception object and
traceback) using the module/class logger (or create one if missing), and still
set self._closed = True in the finally block so cleanup always completes; keep
the existing early-return when self._closed is already True.

In `@wren/src/wren/engine.py`:
- Around line 166-173: The try/except around manifest extraction
(get_manifest_extractor, extractor.resolve_used_table_names,
extractor.extract_by, to_json_base64) silently falls back to self.manifest_str;
update the except block to log the exception and context before using the full
manifest so failures aren’t hidden—capture the exception object, call the
module/class logger (or processLogger) to emit a clear message including the
exception details and which SQL/manifest caused the fallback, then set
effective_manifest = self.manifest_str as before.

In `@wren/src/wren/mdl/__init__.py`:
- Around line 8-17: The get_session_context function is decorated with
functools.cache which is unbounded; replace it with
functools.lru_cache(maxsize=...) (or configurable maxsize) to prevent unbounded
memory growth in long‑running processes: update the decorator on
get_session_context, add/import lru_cache from functools, and choose a
reasonable maxsize (or read from config/env) so SessionContext instances are
memoized but capped.

In `@wren/src/wren/model/data_source.py`:
- Around line 11-13: Top-level imports of boto3, ibis, and BaseBackend slow
startup; change them to lazy imports inside the functions/classes that actually
use AWS or Ibis functionality (replace "import boto3", "import ibis", and "from
ibis import BaseBackend" with in-function imports where needed). Locate call
sites that depend on boto3 or ibis (search for usages of boto3, ibis, and
BaseBackend in this module) and add local imports at the start of those
functions/methods (follow the existing project pattern used in duckdb.py, e.g.,
local import with a noqa comment if necessary) so CLI/transpile paths that don't
touch DB/AWS avoid importing those heavy libs at module load.
- Around line 287-289: Rename the mistakenly named variable credits_json to
creds_json in the data source parsing code: update the assignment that decodes
base64 and loads JSON (the expression using
base64.b64decode(info.credentials.get_secret_value()).decode("utf-8") passed to
loads) and update all subsequent references in the same function/method to use
creds_json instead of credits_json, ensuring there are no name collisions and
tests/imports still pass.

In `@wren/tests/connectors/test_duckdb.py`:
- Around line 35-48: The engine pytest fixture currently declares a
non-generator return type WrenEngine but uses yield; change its annotation to a
generator type such as Generator[WrenEngine, None, None] or Iterator[WrenEngine]
on the def engine(...) signature, and update the in-code comment that describes
TPCH generation in the engine fixture from "1500 orders, 150 customers" to the
correct "15000 orders, 1500 customers" per the WrenQueryTestSuite defaults;
locate the changes in the engine fixture and adjust imports if needed
(typing.Generator or typing.Iterator).

In `@wren/tests/suite/query.py`:
- Around line 135-137: The manifest_str pytest fixture is defined but never
consumed by subclass fixtures; update the TestDuckDB and TestPostgres engine
fixtures to accept manifest_str as a parameter and use that value instead of
rebuilding it inline (or remove the unused manifest_str fixture). Specifically,
modify the engine fixtures in the TestDuckDB and TestPostgres classes to include
manifest_str in their signature and replace any inline
base64.b64encode(orjson.dumps(self.manifest)).decode() calls with the provided
manifest_str, ensuring the manifest_str fixture is actually used; if you prefer
removing duplication, delete the manifest_str fixture instead of changing the
subclasses.

In `@wren/tests/unit/test_engine.py`:
- Around line 121-127: The test currently never initializes
WrenEngine._connector so it only asserts None; update
test_context_manager_closes_connector to trigger connector creation (call a
method that lazily initializes it such as WrenEngine.query(...) or
WrenEngine.dry_run()) inside the with block, assert e._connector is not None
while inside the context, then after exiting the with (which invokes
WrenEngine.__exit__), assert e._connector is None to verify cleanup; reference
WrenEngine, test_context_manager_closes_connector, query/dry_run, __exit__, and
_connector when making the change.
- Line 10: Remove the unused import symbol "math" from the top of the test
module (in test_engine.py) so the file no longer imports a module that isn't
used; if you intended to use math in tests instead, add the actual usage (e.g.,
math.isclose) where appropriate, otherwise delete the import line.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36c5e8e0-aecb-4fb1-80b0-c10741e81c6c

📥 Commits

Reviewing files that changed from the base of the PR and between 4ceba9f and bba9740.

⛔ Files ignored due to path filters (1)
  • wren/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
  • .github/workflows/wren-ci.yml
  • ibis-server/docs/development.md
  • wren/.claude/CLAUDE.md
  • wren/README.md
  • wren/justfile
  • wren/pyproject.toml
  • wren/src/wren/__init__.py
  • wren/src/wren/cli.py
  • wren/src/wren/connector/__init__.py
  • wren/src/wren/connector/base.py
  • wren/src/wren/connector/bigquery.py
  • wren/src/wren/connector/canner.py
  • wren/src/wren/connector/databricks.py
  • wren/src/wren/connector/duckdb.py
  • wren/src/wren/connector/factory.py
  • wren/src/wren/connector/ibis.py
  • wren/src/wren/connector/mssql.py
  • wren/src/wren/connector/mysql.py
  • wren/src/wren/connector/postgres.py
  • wren/src/wren/connector/redshift.py
  • wren/src/wren/connector/spark.py
  • wren/src/wren/engine.py
  • wren/src/wren/mdl/__init__.py
  • wren/src/wren/model/__init__.py
  • wren/src/wren/model/data_source.py
  • wren/src/wren/model/error.py
  • wren/tests/__init__.py
  • wren/tests/conftest.py
  • wren/tests/connectors/__init__.py
  • wren/tests/connectors/test_duckdb.py
  • wren/tests/connectors/test_postgres.py
  • wren/tests/suite/__init__.py
  • wren/tests/suite/manifests.py
  • wren/tests/suite/query.py
  • wren/tests/unit/__init__.py
  • wren/tests/unit/test_engine.py

CI: add uv lock before uv sync so lockfile regenerates for Linux after
building wren-core-py wheel (was locked to macOS ARM64 only).

Code fixes from review:
- Remove dead loop in cli.py _print_result
- Escape SQL values in DuckDB CREATE SECRET and ATTACH statements
- MSSQL: return original SQL on parse failure, robust error matching,
  wrap _describe_sql_for_error_message in try/except
- DorisConnectionInfo.port: SecretStr → SecretPort
- GcsFileConnectionInfo.credentials: add Optional type
- Add DatabricksServicePrincipalConnectionInfo to ConnectionInfo union
- WrenError: preserve exception cause via __cause__
- Redshift: cast port to int for redshift_connector
- Factory: correct pip install hint for shared connector modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@goldmedal

Contributor Author

Addressed all CodeRabbit review comments in d4986bf:

CI fix:

  • Added uv lock before uv sync in both test jobs to regenerate the lockfile for Linux x86_64 (was locked to macOS ARM64 wheel only)

Code fixes:

  • ✅ Removed dead loop in cli.py _print_result
  • ✅ Added _escape_sql() helper in duckdb.py — all secret values in CREATE SECRET and ATTACH statements now escape single quotes and double quotes
  • mssql.py: _flatten_pagination_limit returns original SQL on parse failure (was returning error string); more robust AttributeError matching; _describe_sql_for_error_message wrapped in try/except
  • DorisConnectionInfo.port: SecretStr → SecretPort (consistent with other connectors)
  • GcsFileConnectionInfo.credentials: type fixed to SecretStr | None
  • ✅ Added DatabricksServicePrincipalConnectionInfo to ConnectionInfo union
  • WrenError: now sets __cause__ when cause parameter is provided
  • redshift.py: port cast to int() for redshift_connector.connect()
  • factory.py: added _INSTALL_EXTRA mapping so install hints use the correct pip extra (e.g. doris → mysql)

Not changed (intentional):

  • isinstance(data, ConnectionInfo) — works correctly with types.UnionType in Python 3.10+ (project requires ≥3.11)
  • TPCH test comments — the counts are correct for sf=0.01
  • Doris get_autocommit monkey-patch — matches ibis-server behavior, not changing
  • Databricks get_databricks_connection only accepting token type — service principal auth requires different ibis API, tracked separately

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (2)
wren/src/wren/connector/mssql.py (1)

98-108: ⚠️ Potential issue | 🟡 Minor

Missing null check for error_message value.

Line 106 returns rows[0][0] directly, but if SQL Server returns a row where error_message is NULL, this will return None instead of the expected "Unknown reason" string.

🛡️ Proposed fix
             with closing(self.connection.raw_sql(describe_sql)) as cur:
                 rows = cur.fetchall()
-                if not rows:
+                if not rows or rows[0][0] is None:
                     return "Unknown reason"
                 return rows[0][0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/mssql.py` around lines 98 - 108, In
_describe_sql_for_error_message, after executing describe_sql (built with
sge.convert(sql).sql("mssql")) and reading rows, guard against a NULL
error_message by checking rows[0][0] for None and returning "Unknown reason"
when it is None (in addition to the existing empty-rows and exception handling);
update the return logic that currently returns rows[0][0] so it returns
rows[0][0] only when it is a non-null string, otherwise return "Unknown reason".
wren/src/wren/connector/duckdb.py (1)

71-74: ⚠️ Potential issue | 🟠 Major

limit is still enforced client-side.

This still fetches the entire Arrow table before trimming it, so large queries don't get the memory/latency protection that limit implies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/duckdb.py` around lines 71 - 74, The query method
currently fetches the full Arrow table then slices it, so enforce the limit in
the SQL execution instead: in query (wren/src/wren/connector/duckdb.py) when
limit is not None build a limited SQL such as "SELECT * FROM ({sql}) AS
_wren_subquery LIMIT {int(limit)}" (cast limit to int) and pass that to
self.connection.execute(...).fetch_arrow_table(), otherwise execute the original
sql; this ensures DuckDB applies the limit server-side and avoids client-side
slicing.
🧹 Nitpick comments (2)
wren/src/wren/cli.py (2)

59-65: Unused parameter sql in _make_engine.

The sql parameter is accepted but never used within the function body. This appears to be leftover from refactoring.

♻️ Proposed fix to remove unused parameter
 def _make_engine(
-    sql: str,
     datasource: str,
     mdl: str,
     connection_info: str | None,
     connection_file: str | None,
 ):

Then update all call sites:

-    engine = _make_engine(sql, datasource, mdl, connection_info, connection_file)
+    engine = _make_engine(datasource, mdl, connection_info, connection_file)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/cli.py` around lines 59 - 65, Remove the unused parameter from
the _make_engine function signature: drop the sql parameter and update the
definition of _make_engine(sql: str, datasource: str, mdl: str, connection_info:
str | None, connection_file: str | None) to only accept the actually used
parameters (datasource, mdl, connection_info, connection_file); then update
every call site that invokes _make_engine to stop passing the now-removed sql
argument (search for _make_engine( or references to _make_engine in the
codebase) so the calls match the new signature.

43-56: Consider case-insensitive extension check.

Line 51 uses mdl.endswith(".json") which is case-sensitive. Files named manifest.JSON or manifest.Json would be treated as pre-encoded base64 instead of being encoded.

♻️ Proposed fix
-        if mdl.endswith(".json"):
+        if mdl.lower().endswith(".json"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/cli.py` around lines 43 - 56, The _load_manifest function
treats files with extension check using mdl.endswith(".json") which is
case-sensitive; update the check to be case-insensitive (e.g., compare
Path(mdl).suffix.lower() or use mdl.lower().endswith(".json")) so filenames like
"manifest.JSON" are correctly detected and base64-encoded; modify the
conditional that currently uses mdl.endswith(".json") in _load_manifest to use a
lower-cased suffix check while keeping the rest of the logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wren/src/wren/cli.py`:
- Around line 167-175: The transpile command creates a WrenEngine
(WrenEngine(manifest_str=..., data_source=..., connection_info={})) and calls
engine.transpile(...) but never closes the engine; update the transpile command
to wrap the transpile call in try/except/finally (matching
query/dry-run/validate) so that engine.close() is always called in the finally
block, ensuring engine.close() runs whether transpile succeeds or raises; keep
the existing error handling (typer.echo and typer.Exit) inside the except and
invoke engine.close() in finally.
- Around line 200-207: The fallback in _print_result currently prints
table.to_pydict() directly which emits a Python dict repr (single quotes)
instead of valid JSON; change the except block to serialize the dict to proper
JSON (e.g., import json and call json.dumps(table.to_pydict(),
ensure_ascii=False) or equivalent) and pass that string to typer.echo so the
"json" output always emits valid JSON; reference _print_result,
table.to_pydict(), and typer.echo when making the change.
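
The dict-repr vs. JSON distinction is easy to demonstrate: repr() uses single quotes, which JSON parsers reject, while json.dumps emits valid JSON.

```python
import json

data = {"name": "wren", "count": 3}

# Python's dict repr is not valid JSON (single quotes):
print(str(data))        # {'name': 'wren', 'count': 3}
# json.dumps produces parseable output:
print(json.dumps(data))  # {"name": "wren", "count": 3}
```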

In `@wren/src/wren/connector/duckdb.py`:
- Around line 59-69: The constructor opens a transient DuckDB connection with
duckdb.connect() but does not close it if any of _init_duckdb_s3,
_init_duckdb_minio, _init_duckdb_gcs or _attach_database raise, leaking the
connection; wrap the initialization sequence in a try/except (or try/finally)
inside __init__ so that on any exception you call self.connection.close() before
re-raising the error, ensuring the connection is always closed on failure while
preserving successful path behavior.
- Around line 96-105: The _list_duckdb_files function currently always
constructs an OpenDAL Operator with backend "fs", which will fail for remote
connection types; either enforce validators on the connection models
(S3FileConnectionInfo, MinioFileConnectionInfo, GcsFileConnectionInfo) to forbid
format="duckdb" (allow it only on LocalFileConnectionInfo), or update
_list_duckdb_files to inspect connection_info's concrete type or a backend
indicator and construct the correct Operator (e.g., "s3", "minio", "gcs", or
"fs") using the appropriate credentials/URL before listing; update references to
connection_info.url.get_secret_value() accordingly so remote backends use the
right root and do not attempt local fs listing.
- Around line 76-77: The dry_run implementation currently calls
connection.execute(sql) which may run mutations; change dry_run in the DuckDB
connector to avoid executing the raw statement by enforcing read-only validation
(for example, wrap the incoming sql in a read-only wrapper like SELECT * FROM
({sql}) LIMIT 0 and execute that instead, or alternatively use EXPLAIN or
PREPARE to validate without side effects). Update the dry_run method (symbol:
dry_run) to build and execute the safe validation statement rather than calling
connection.execute(sql) directly, so connection.execute is only ever invoked on
the wrapped/EXPLAIN/PREPARE statement.
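The LIMIT 0 probe can be demonstrated with stdlib sqlite3 standing in for DuckDB: a valid query is planned and type-checked but returns no rows, and an invalid one raises without anything being executed as a bare statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1), (2)")


def dry_run(sql: str) -> None:
    # Read-only probe: valid queries yield zero rows, invalid ones raise.
    conn.execute(f"SELECT * FROM ({sql}) LIMIT 0").fetchall()


dry_run("SELECT x FROM t")  # fine, touches no data
try:
    dry_run("SELECT nope FROM t")  # invalid column -> OperationalError
except sqlite3.OperationalError:
    pass
```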

In `@wren/src/wren/connector/mssql.py`:
- Around line 19-24: The override of query (which flattens pagination via
_flatten_pagination_limit and rounds decimals via _round_decimal_columns)
bypasses the base-class PyArrow compatibility step and thus fails for UUID
columns; update query to ensure _handle_pyarrow_unsupported_type is applied
before returning (e.g., call self._handle_pyarrow_unsupported_type(...) on the
ibis_table or the result of _round_decimal_columns), or add UUID-to-string
casting inside _round_decimal_columns so that any UUID columns are converted to
strings prior to to_pyarrow()/PyArrow conversion.

---

Duplicate comments:
In `@wren/src/wren/connector/duckdb.py`:
- Around line 71-74: The query method currently fetches the full Arrow table
then slices it, so enforce the limit in the SQL execution instead: in query
(wren/src/wren/connector/duckdb.py) when limit is not None build a limited SQL
such as "SELECT * FROM ({sql}) AS _wren_subquery LIMIT {int(limit)}" (cast limit
to int) and pass that to self.connection.execute(...).fetch_arrow_table(),
otherwise execute the original sql; this ensures DuckDB applies the limit
server-side and avoids client-side slicing.

In `@wren/src/wren/connector/mssql.py`:
- Around line 98-108: In _describe_sql_for_error_message, after executing
describe_sql (built with sge.convert(sql).sql("mssql")) and reading rows, guard
against a NULL error_message by checking rows[0][0] for None and returning
"Unknown reason" when it is None (in addition to the existing empty-rows and
exception handling); update the return logic that currently returns rows[0][0]
so it returns rows[0][0] only when it is a non-null string, otherwise return
"Unknown reason".

---

Nitpick comments:
In `@wren/src/wren/cli.py`:
- Around line 59-65: Remove the unused parameter from the _make_engine function
signature: drop the sql parameter and update the definition of _make_engine(sql:
str, datasource: str, mdl: str, connection_info: str | None, connection_file:
str | None) to only accept the actually used parameters (datasource, mdl,
connection_info, connection_file); then update every call site that invokes
_make_engine to stop passing the now-removed sql argument (search for
_make_engine( or references to _make_engine in the codebase) so the calls match
the new signature.
- Around line 43-56: The _load_manifest function treats files with extension
check using mdl.endswith(".json") which is case-sensitive; update the check to
be case-insensitive (e.g., compare Path(mdl).suffix.lower() or use
mdl.lower().endswith(".json")) so filenames like "manifest.JSON" are correctly
detected and base64-encoded; modify the conditional that currently uses
mdl.endswith(".json") in _load_manifest to use a lower-cased suffix check while
keeping the rest of the logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ed940ad-de97-49ae-971d-0fe28ffb622d

📥 Commits

Reviewing files that changed from the base of the PR and between 52a1363 and d4986bf.

📒 Files selected for processing (8)
  • .github/workflows/wren-ci.yml
  • wren/src/wren/cli.py
  • wren/src/wren/connector/duckdb.py
  • wren/src/wren/connector/factory.py
  • wren/src/wren/connector/mssql.py
  • wren/src/wren/connector/redshift.py
  • wren/src/wren/model/__init__.py
  • wren/src/wren/model/error.py
✅ Files skipped from review due to trivial changes (3)
  • .github/workflows/wren-ci.yml
  • wren/src/wren/model/error.py
  • wren/src/wren/model/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • wren/src/wren/connector/factory.py
  • wren/src/wren/connector/redshift.py

goldmedal and others added 2 commits March 23, 2026 11:10
The committed uv.lock contains macOS ARM64 wheel paths from local dev.
Adding --upgrade-package wren-core-py to uv lock forces re-resolution
from the freshly built Linux wheel in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix isinstance check against union type (use BaseConnectionInfo)
- Close DuckDB connection on init failure to prevent resource leak
- Push limit into SQL for DuckDB and Redshift instead of post-fetch slicing
- Add missing engine.close() in transpile command
- Fix JSON fallback to output valid JSON instead of Python dict repr
- Fix TPCH count comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
wren/src/wren/connector/duckdb.py (2)

69-70: ⚠️ Potential issue | 🟠 Major

Remote format="duckdb" still falls back to the local filesystem backend.

__init__() calls _attach_database() for any connection type with format == "duckdb", but _list_duckdb_files() always constructs opendal.Operator("fs", ...). S3FileConnectionInfo, MinioFileConnectionInfo, and GcsFileConnectionInfo therefore try to discover databases on the local filesystem instead of the configured object store.

Also applies to: 100-101

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/duckdb.py` around lines 69 - 70, The bug is that
__init__ calls _attach_database for any connection with format == "duckdb" but
_list_duckdb_files always creates an opendal.Operator("fs", ...) so S3/Minio/GCS
connections are scanned on the local filesystem; change _list_duckdb_files to
build the opendal operator from the passed connection_info
(S3FileConnectionInfo, MinioFileConnectionInfo, GcsFileConnectionInfo, or local
fs) instead of hardcoding "fs", or alternatively accept an Operator parameter
(update calls from __init__ and the callers around the referenced block/lines)
so _attach_database/_list_duckdb_files use the correct remote backend and
bucket/path when listing DuckDB files.
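A hypothetical sketch of the type-based dispatch the prompt describes; the real fix would pass the chosen scheme plus credentials to opendal.Operator, and these stub dataclasses only mirror the connection-info names from the review comment:

```python
from dataclasses import dataclass


@dataclass
class LocalFileConnectionInfo:
    url: str


@dataclass
class S3FileConnectionInfo:
    url: str


@dataclass
class GcsFileConnectionInfo:
    url: str


def operator_scheme(info) -> str:
    """Pick the opendal backend scheme from the concrete connection type."""
    if isinstance(info, S3FileConnectionInfo):
        return "s3"
    if isinstance(info, GcsFileConnectionInfo):
        return "gcs"
    return "fs"  # LocalFileConnectionInfo and anything else: local filesystem
```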

80-81: ⚠️ Potential issue | 🟠 Major

dry_run() still executes the real SQL.

This makes dry-run/validate capable of applying mutations if a non-SELECT statement makes it through planning. Use a read-only probe here (SELECT * FROM (...) LIMIT 0, EXPLAIN, or PREPARE) instead of executing the statement itself.

🛡️ One safe option
     def dry_run(self, sql: str) -> None:
-        self.connection.execute(sql)
+        self.connection.execute(f"SELECT * FROM ({sql}) AS _q LIMIT 0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/connector/duckdb.py` around lines 80 - 81, The dry_run method
currently calls self.connection.execute(sql) which actually executes mutations;
change it to a read-only probe instead (e.g., use EXPLAIN, PREPARE, or wrap the
query in a SELECT * FROM (<sql>) LIMIT 0) so the statement is validated/planned
but not applied. Update the dry_run implementation to call the connection method
that runs the chosen probe (not execute) and ensure it handles errors the same
way as planning failures; reference the dry_run method and replace the direct
use of self.connection.execute with a safe read-only validation call.
wren/src/wren/model/data_source.py (1)

167-173: ⚠️ Potential issue | 🟠 Major

Service-principal Databricks configs still parse into a type this path can't connect.

_build_connection_info() now returns DatabricksServicePrincipalConnectionInfo, but get_databricks_connection() still unconditionally reads info.access_token. A databricks_type="service_principal" config will therefore validate here and then fail with AttributeError during connection creation instead of failing fast with a clear supported/unsupported-auth result.

Also applies to: 439-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/model/data_source.py` around lines 167 - 173,
_build_connection_info() can return DatabricksServicePrincipalConnectionInfo for
databricks_type="service_principal", but get_databricks_connection() still
assumes info.access_token and will raise AttributeError; update
get_databricks_connection() to check the concrete type (use isinstance(info,
DatabricksServicePrincipalConnectionInfo) or test for the absence of
access_token) and either perform the correct service-principal auth flow or
raise a clear UnsupportedAuth/ValueError describing that service-principal auth
is unsupported, referencing the symbols _build_connection_info,
get_databricks_connection, DatabricksServicePrincipalConnectionInfo, and
access_token so the fix targets the correct branch; apply the same change to the
other occurrence noted (lines ~439-445).
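The fail-fast check can look like this; the classes here are simplified stand-ins for the real pydantic models, and the returned dict stands in for the actual ibis connection call:

```python
class DatabricksTokenConnectionInfo:
    def __init__(self, access_token: str):
        self.access_token = access_token


class DatabricksServicePrincipalConnectionInfo:
    pass


def get_databricks_connection(info):
    if isinstance(info, DatabricksServicePrincipalConnectionInfo):
        # Fail fast with a clear message instead of an AttributeError later.
        raise ValueError(
            "service-principal auth is not supported yet; "
            "use a personal access token"
        )
    return {"token": info.access_token}  # stand-in for the real ibis call
```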
🧹 Nitpick comments (1)
wren/tests/suite/query.py (1)

135-137: Minor: manifest_str fixture may be unused by current subclasses.

The fixture duplicates the inline logic shown in the docstring example (lines 36-37). If subclass engine fixtures consistently create manifest_str inline rather than using this fixture, consider removing it to avoid dead code. Otherwise, this is fine as a convenience fixture.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/tests/suite/query.py` around lines 135 - 137, The manifest_str pytest
fixture in tests/suite/query.py (function manifest_str) appears unused by
current subclasses and duplicates inline logic in the docstring example; either
remove this fixture to avoid dead code or update subclass engine fixtures/tests
to use the shared manifest_str fixture instead of duplicating
base64.b64encode(orjson.dumps(self.manifest)).decode() — locate the fixture
named manifest_str and either delete it and clean up imports, or replace inline
copies in subclass fixtures/tests with a reference to manifest_str so the shared
fixture is exercised consistently.
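What the manifest_str fixture boils down to, using stdlib json in place of orjson for a self-contained sketch (the manifest contents here are illustrative):

```python
import base64
import json

manifest = {"catalog": "wren", "schema": "public", "models": []}
manifest_str = base64.b64encode(json.dumps(manifest).encode()).decode()

# Round-trip: the engine decodes manifest_str back to the same manifest.
decoded = json.loads(base64.b64decode(manifest_str))
```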
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wren/src/wren/cli.py`:
- Around line 167-168: The CLI is passing an empty dict into WrenEngine which
triggers eager normalization via data_source.get_connection_info() and
datasource model validation; to fix, change the transpile path to avoid forcing
normalization by passing a sentinel (e.g., None or a dedicated sentinel object)
instead of {} from the CLI (where engine is instantiated) and update WrenEngine
(or its constructor code that calls data_source.get_connection_info()) to skip
calling data_source.get_connection_info() when the provided connection_info is
that sentinel (or falsy) so pure transpile does not invoke datasource-specific
validation in wren/src/wren/model/data_source.py.

In `@wren/src/wren/model/data_source.py`:
- Around line 86-88: The code currently overwrites caller-provided Postgres
connect_timeout because it checks hasattr(info, "connect_timeout") instead of
looking inside info.kwargs; change the guard to check the kwargs dict itself
(e.g., if "connect_timeout" not in kwargs) and only set
kwargs["connect_timeout"] = 120 when that key is absent, using the existing
kwargs = info.kwargs if info.kwargs else {} logic so caller-supplied timeouts
are preserved.

---

Duplicate comments:
In `@wren/src/wren/connector/duckdb.py`:
- Around line 69-70: The bug is that __init__ calls _attach_database for any
connection with format == "duckdb" but _list_duckdb_files always creates an
opendal.Operator("fs", ...) so S3/Minio/GCS connections are scanned on the local
filesystem; change _list_duckdb_files to build the opendal operator from the
passed connection_info (S3FileConnectionInfo, MinioFileConnectionInfo,
GcsFileConnectionInfo, or local fs) instead of hardcoding "fs", or alternatively
accept an Operator parameter (update calls from __init__ and the callers around
the referenced block/lines) so _attach_database/_list_duckdb_files use the
correct remote backend and bucket/path when listing DuckDB files.
- Around line 80-81: The dry_run method currently calls
self.connection.execute(sql) which actually executes mutations; change it to a
read-only probe instead (e.g., use EXPLAIN, PREPARE, or wrap the query in a
SELECT * FROM (<sql>) LIMIT 0) so the statement is validated/planned but not
applied. Update the dry_run implementation to call the connection method that
runs the chosen probe (not execute) and ensure it handles errors the same way as
planning failures; reference the dry_run method and replace the direct use of
self.connection.execute with a safe read-only validation call.

In `@wren/src/wren/model/data_source.py`:
- Around line 167-173: _build_connection_info() can return
DatabricksServicePrincipalConnectionInfo for
databricks_type="service_principal", but get_databricks_connection() still
assumes info.access_token and will raise AttributeError; update
get_databricks_connection() to check the concrete type (use isinstance(info,
DatabricksServicePrincipalConnectionInfo) or test for the absence of
access_token) and either perform the correct service-principal auth flow or
raise a clear UnsupportedAuth/ValueError describing that service-principal auth
is unsupported, referencing the symbols _build_connection_info,
get_databricks_connection, DatabricksServicePrincipalConnectionInfo, and
access_token so the fix targets the correct branch; apply the same change to the
other occurrence noted (lines ~439-445).

---

Nitpick comments:
In `@wren/tests/suite/query.py`:
- Around line 135-137: The manifest_str pytest fixture in tests/suite/query.py
(function manifest_str) appears unused by current subclasses and duplicates
inline logic in the docstring example; either remove this fixture to avoid dead
code or update subclass engine fixtures/tests to use the shared manifest_str
fixture instead of duplicating
base64.b64encode(orjson.dumps(self.manifest)).decode() — locate the fixture
named manifest_str and either delete it and clean up imports, or replace inline
copies in subclass fixtures/tests with a reference to manifest_str so the shared
fixture is exercised consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50ed8c2f-10b9-432d-8562-6f6cc79f3b33

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb9b44 and 43298c8.

📒 Files selected for processing (5)
  • wren/src/wren/cli.py
  • wren/src/wren/connector/duckdb.py
  • wren/src/wren/connector/redshift.py
  • wren/src/wren/model/data_source.py
  • wren/tests/suite/query.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren/src/wren/connector/redshift.py

- Allow empty dict connection_info for transpile-only usage
- Fix Postgres connect_timeout guard to check kwargs dict, not model attrs
- Use EXPLAIN for DuckDB dry_run instead of executing SQL directly
- Handle UUID columns in MSSQL connector via _handle_pyarrow_unsupported_type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@goldmedal goldmedal requested a review from wwwy3y3 March 23, 2026 05:41
@goldmedal goldmedal merged commit 41f3f21 into Canner:main Mar 23, 2026
8 of 9 checks passed
@goldmedal goldmedal deleted the feat/cli-module branch March 23, 2026 09:40

Labels

  • bigquery
  • ci
  • dependencies (Pull requests that update a dependency file)
  • documentation (Improvements or additions to documentation)
  • ibis
  • python (Pull requests that update Python code)

2 participants