24 changes: 21 additions & 3 deletions .claude/CLAUDE.md
@@ -8,15 +8,16 @@ Wren Engine (OSS) is an open source semantic engine for MCP clients and AI agent

## Repository Structure

Four main modules:
Five main modules:

- **wren-core/** — Rust semantic engine (Cargo workspace: `core/`, `sqllogictest/`, `benchmarks/`, `wren-example/`). Handles MDL analysis, query planning, logical plan optimization, and SQL generation via DataFusion.
- **wren-core-base/** — Shared Rust crate with manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`). Has optional `python-binding` feature for PyO3 compatibility.
- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin.
- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin. Published to PyPI.
- **ibis-server/** — FastAPI web server (Python 3.11). Provides REST API for query execution, validation, and metadata. Uses Ibis framework for data source connectivity.
- **wren/** — Standalone Python SDK and CLI (`wren-engine` on PyPI). Wraps `wren-core-py` + Ibis connectors into a single installable package with `wren` CLI tool. Includes optional LanceDB-backed memory layer for semantic schema/query retrieval.
- **mcp-server/** — MCP server exposing Wren Engine to AI agents (Claude, Cline, Cursor).

Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `mock-web-server/`, `benchmark/`, `example/`.
Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `cli-skills/` (agent skill definitions), `mock-web-server/`, `benchmark/`, `example/`.

## Build & Development Commands

@@ -56,6 +57,23 @@ just format # ruff auto-fix + taplo

Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `doris`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.

### wren (SDK & CLI)
```bash
cd wren
just install # Build wren-core-py wheel + uv sync
just install-all # With all optional extras (including memory)
just install-extra <extra> # e.g. just install-extra postgres
just install-memory # Install memory extra (lancedb + sentence-transformers)
just dev # Run `wren` CLI
just test # pytest tests/
just test-memory # Memory-specific tests
just lint # ruff format --check + ruff check
just format # ruff auto-fix
just build # uv build (produces wheel)
```

Uses `uv` (not Poetry). `pyproject.toml` uses `hatchling` as build backend. Optional extras: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `memory`, `all`, `dev`.

### mcp-server
Uses `uv` for dependency management. See `mcp-server/README.md`.

18 changes: 10 additions & 8 deletions cli-skills/wren-usage/SKILL.md
@@ -13,9 +13,12 @@ The `wren` CLI queries databases through an MDL (Model Definition Language) sema

Two files drive everything (auto-discovered from `~/.wren/`):
- `mdl.json` — the semantic model
- `connection_info.json` — database credentials
- `connection_info.json` — database credentials + `datasource` field (e.g. `"datasource": "postgres"`)

The data source is always read from `connection_info.json`. There is no `--datasource` flag on execution commands (`query`, `dry-run`, `validate`). Only `dry-plan` accepts `--datasource` / `-d` as an override (for transpile-only use without a connection file).

For memory-specific decisions, see [references/memory.md](references/memory.md).
For SQL syntax, CTE-based modeling, and error diagnosis, see [references/wren-sql.md](references/wren-sql.md).

---

@@ -47,8 +50,6 @@ GROUP BY 1 ORDER BY 2 DESC LIMIT 5'

**SQL rules:**
- Target MDL model names, not database tables
- Use `CAST(x AS type)`, not `::type`
- Avoid correlated subqueries — use JOINs or CTEs
- Write dialect-neutral SQL — the engine translates

### Step 4 — Handle the result
@@ -74,15 +75,17 @@ GROUP BY 1 ORDER BY 2 DESC LIMIT 5'
### Connection error

1. Check: `cat ~/.wren/connection_info.json`
2. Test: `wren --sql "SELECT 1"`
3. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb`
2. Verify the `datasource` field is present and valid
3. Test: `wren --sql "SELECT 1"`
4. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb`
5. Both flat format (`{"datasource": ..., "host": ...}`) and MCP envelope format (`{"datasource": ..., "properties": {...}}`) are accepted

### SQL syntax / planning error

1. Isolate the layer:
- `wren dry-plan --sql "..."` — if this fails, it is an MDL-level issue
- If dry-plan succeeds but execution fails, the DB rejects the translated SQL
2. Common fixes: replace `::` with `CAST()`, replace correlated subqueries with JOINs
2. Compare dry-plan output with the DB error message — see [references/wren-sql.md](references/wren-sql.md) for the CTE rewrite pipeline and common error patterns

---

@@ -117,7 +120,7 @@ wren --sql "SELECT * FROM <changed_model> LIMIT 1"

```
Get data back → wren --sql "..."
See translated SQL only → wren dry-plan --sql "..."
See translated SQL only → wren dry-plan --sql "..." (accepts -d <datasource> if no connection file)
Validate against DB → wren dry-run --sql "..."
Schema context → wren memory fetch -q "..."
Filter by type/model → wren memory fetch -q "..." --type T --model M --threshold 0
@@ -134,5 +137,4 @@ Re-index after MDL change → wren memory index
- Do not guess model or column names — check context first
- Do not store queries the user has not confirmed — success != correctness
- Do not re-index before every query — once per MDL change
- Do not use database-specific syntax — write ANSI SQL
- Do not pass passwords via `--connection-info` if shell history is shared — use `--connection-file`
108 changes: 108 additions & 0 deletions cli-skills/wren-usage/references/wren-sql.md
@@ -0,0 +1,108 @@
# Wren SQL — How CTE-Based Modeling Works

Wren Engine rewrites your SQL by injecting CTEs (Common Table Expressions) that expand each MDL model into its underlying database query. Understanding this mechanism helps you diagnose errors and write correct SQL.

---

## The rewrite pipeline

```
Your SQL (target dialect, e.g. Postgres)
→ parse & qualify all column references (sqlglot)
→ identify which models and columns are referenced
→ per model: wren-core expands the model definition → CTE
→ inject model CTEs into your query
→ output final SQL in target dialect
```

**Example:** Given an MDL with model `orders` backed by table `public.orders` with columns `o_orderkey`, `o_custkey`, `o_totalprice`:

```sql
-- You write:
SELECT o_custkey, SUM(o_totalprice) FROM orders GROUP BY 1

-- Engine produces (via dry-plan):
WITH "orders" AS (
SELECT "public"."orders"."o_orderkey",
"public"."orders"."o_custkey",
"public"."orders"."o_totalprice"
FROM "public"."orders"
)
SELECT o_custkey, SUM(o_totalprice) FROM orders GROUP BY 1
```

The CTE named `"orders"` shadows the model name, so the rest of your SQL runs against the CTE as if it were a table.

---

## What the rewriter handles

| Feature | Supported |
|---------|-----------|
| `SELECT *` from a model | Yes — expands to all non-hidden, non-relationship columns |
| JOINs between models | Yes — each model gets its own CTE |
| Subqueries referencing models | Yes — outer column references are resolved |
| Table aliases (`FROM orders o`) | Yes — alias tracking maps back to models |
| User-defined CTEs (`WITH x AS (...)`) | Yes — model CTEs are prepended before user CTEs |
| `RECURSIVE` WITH clauses | Yes — preserved |
| Calculated fields / metrics | Yes — wren-core expands them inside the model CTE |
| `COUNT(*)` without columns | Yes — model CTE selects `1` (only needs rows) |

---

## SQL rules for writing queries

1. **Use model names, not database table names** — write `FROM orders`, not `FROM public.orders`
2. **Write dialect-neutral SQL** — the engine translates to the target database dialect
3. **Column names must match the MDL** — use the names defined in `mdl.json`, not the underlying database column names
4. **Hidden columns are excluded** — columns with `"isHidden": true` are not available in `SELECT *`
5. **Relationship columns are excluded** — relationship fields don't appear as selectable columns; use JOINs instead
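Rules 4 and 5 can be pictured as a filter over the model's column list. A hypothetical sketch — the column dicts below mirror MDL field names (`isHidden`, a `relationship` key), but the helper and sample data are invented:

```python
# Hypothetical sketch of how `SELECT *` expansion filters columns:
# hidden columns and relationship fields never reach the model CTE.

def selectable_columns(model: dict) -> list[str]:
    """Columns exposed by the model CTE: skip hidden and relationship fields."""
    return [
        col["name"]
        for col in model["columns"]
        if not col.get("isHidden") and "relationship" not in col
    ]

orders = {
    "name": "orders",
    "columns": [
        {"name": "o_orderkey"},
        {"name": "o_custkey"},
        {"name": "internal_flag", "isHidden": True},               # excluded
        {"name": "customer", "relationship": "orders_customer"},   # excluded
    ],
}
print(selectable_columns(orders))  # → ['o_orderkey', 'o_custkey']
```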

---

## Diagnosing errors with dry-plan

`dry-plan` shows the expanded SQL without executing it. This separates MDL-level issues from database-level issues.

### Step 1 — Run dry-plan

```bash
wren dry-plan --sql "SELECT o_custkey, SUM(o_totalprice) FROM orders GROUP BY 1"
```

### Step 2 — Interpret the result

| dry-plan result | Meaning | Fix |
|-----------------|---------|-----|
| **Succeeds** with valid SQL | MDL layer is fine; if execution fails, the database rejects the translated SQL | Read the DB error against the dry-plan output — the issue is in the generated SQL or DB state |
| **Fails** with "No model references found" | Your FROM clause doesn't match any MDL model name | Check model names: `wren memory fetch -q "<name>" --type model --threshold 0` |
| **Fails** with column error | A column you referenced doesn't exist in the model | Check columns: `wren memory fetch -q "<col>" --model <name> --threshold 0` |
| **Fails** with qualify error | sqlglot can't resolve an ambiguous or unknown column | Qualify the column explicitly: `model_name.column_name` |

### Step 3 — Compare dry-plan output with DB error

When execution fails but dry-plan succeeds, compare them side by side:

```bash
# Get the expanded SQL
wren dry-plan --sql "SELECT ..." 2>&1

# Run against DB and capture the error
wren --sql "SELECT ..." 2>&1
```

Common patterns:
- **Type mismatch**: The CTE exposes the raw column type; a function may not accept it in the target dialect
- **Missing table**: The underlying table referenced in the model definition doesn't exist in the database
- **Permission denied**: The DB user lacks access to the underlying tables
- **Syntax difference**: Rare — usually means a sqlglot dialect translation gap

---

## Fallback behavior

If the rewriter detects no model references in your SQL (e.g. `SELECT 1` or queries against raw database tables), it falls back to passing the entire query through wren-core's `transform_sql()` directly. This means:

- Queries that don't reference any MDL model still work
- The fallback path does NOT use CTE injection — it transforms the whole query at once
- If you expect model expansion but get none, check that your FROM clause uses model names from the MDL
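The fallback decision boils down to "does any FROM/JOIN target match a known model name?". A simplified sketch, assuming a regex scan stands in for what the real engine does with sqlglot's parsed table references:

```python
# Simplified sketch of the fallback decision. The real check inspects the
# sqlglot parse tree, not a regex; this only illustrates the branching.
import re

def references_models(sql: str, model_names: set[str]) -> bool:
    """True if any FROM/JOIN target matches a known MDL model name."""
    tables = re.findall(r'\b(?:FROM|JOIN)\s+"?([A-Za-z_][\w.]*)"?', sql, re.I)
    return any(t.lower() in model_names for t in tables)

models = {"orders", "customers"}
assert references_models("SELECT * FROM orders", models)        # CTE path
assert not references_models("SELECT 1", models)                # fallback path
assert not references_models("SELECT * FROM raw_logs", models)  # fallback path
```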
41 changes: 37 additions & 4 deletions wren/.claude/CLAUDE.md
@@ -8,25 +8,43 @@ Standalone Python SDK and CLI for Wren Engine. Wraps `wren-core-py` (PyO3 bindin
wren/
src/wren/
engine.py — WrenEngine facade (transpile / query / dry_run / dry_plan)
cli.py — Typer CLI: wren query|dry-run|transpile|validate
cli.py — Typer CLI: wren query|dry-run|dry-plan|validate
mdl/ — wren-core-py session context + manifest extraction helpers
cte_rewriter.py — CTE-based query rewriting
wren_dialect.py — Custom sqlglot dialect
connector/ — Per-datasource Ibis connectors (factory.py + one file per source)
factory.py — DataSource → connector dispatch registry
base.py — Base connector interface
ibis.py — Shared Ibis-backed connector (trino, clickhouse, snowflake, athena)
postgres.py, mysql.py, mssql.py, bigquery.py, duckdb.py, oracle.py,
redshift.py, spark.py, databricks.py, canner.py
model/
data_source.py — DataSource enum + per-source ConnectionInfo factories
error.py — WrenError, ErrorCode, ErrorPhase
memory/ — Optional LanceDB-backed semantic memory (requires `wren[memory]`)
store.py — LanceDB vector store operations
schema_indexer.py — MDL schema embedding/indexing
embeddings.py — Sentence transformer integration
cli.py — `wren memory` CLI subcommands
tests/
unit/ — test_engine.py, test_cte_rewriter.py, test_memory.py
connectors/ — test_duckdb.py, test_postgres.py, test_mysql.py
suite/ — Shared test helpers (manifests.py, query.py)
```

## Build & Development

```bash
cd wren
just install # build wren-core-py wheel + uv sync
just install-all # with all optional extras
just install-all # with all optional extras (including memory)
just install-extra <extra> # e.g. just install-extra postgres
just install-memory # install memory extra (lancedb + sentence-transformers)
just dev # run `wren` CLI
just test # pytest tests/
just test-memory # memory-specific tests
just lint # ruff format --check + ruff check
just format # ruff auto-fix
just format # ruff auto-fix (also aliased as `just fmt`)
just build # uv build (produces wheel)
```

@@ -45,9 +63,24 @@ Uses `uv` (not Poetry). `pyproject.toml` uses `hatchling` as build backend.

`connector/factory.py` dispatches on `DataSource` to return the right connector. Each connector wraps an Ibis backend and exposes `.query(sql, limit)` and `.dry_run(sql)`. Base class in `connector/base.py`; Ibis-backed connectors share `connector/ibis.py`.

- **Dedicated modules**: `postgres.py`, `mysql.py`, `mssql.py`, `bigquery.py`, `duckdb.py`, `oracle.py` (native oracledb, not Ibis), `redshift.py`, `spark.py`, `databricks.py`, `canner.py`
- **Shared Ibis module** (`ibis.py`): trino, clickhouse, snowflake, athena
- **File connectors**: `local_file`, `s3_file`, `minio_file`, `gcs_file` all map to duckdb
- **doris** maps to mysql connector (MySQL-compatible protocol)
- **canner** maps to postgres connector
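The alias rules above (doris → mysql, canner → postgres, file sources → duckdb) follow a common dispatch-registry shape. A minimal sketch — the enum values come from the list above, but the class names and registry structure are invented, not the actual `factory.py`:

```python
# Minimal sketch of the dispatch pattern described for connector/factory.py.
# Illustrative only: real connectors wrap Ibis backends, not strings.
from enum import Enum

class DataSource(Enum):
    POSTGRES = "postgres"
    MYSQL = "mysql"
    DUCKDB = "duckdb"
    DORIS = "doris"
    CANNER = "canner"
    LOCAL_FILE = "local_file"

# Aliases collapse protocol-compatible sources onto one connector.
_ALIASES = {
    DataSource.DORIS: DataSource.MYSQL,        # MySQL-compatible protocol
    DataSource.CANNER: DataSource.POSTGRES,
    DataSource.LOCAL_FILE: DataSource.DUCKDB,  # file sources read via DuckDB
}

_REGISTRY = {
    DataSource.POSTGRES: "PostgresConnector",
    DataSource.MYSQL: "MySQLConnector",
    DataSource.DUCKDB: "DuckDBConnector",
}

def resolve(source: DataSource) -> str:
    """Follow aliases first, then look up the concrete connector."""
    return _REGISTRY[_ALIASES.get(source, source)]

print(resolve(DataSource.DORIS))  # → MySQLConnector
```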

## Memory Module (Optional)

`wren/src/wren/memory/` — LanceDB-backed semantic memory for schema and query retrieval. Install via `wren[memory]`.

- **`WrenMemory`** — Main API: `index_manifest()`, `get_context()`, `store_query()`, `recall_queries()`, `describe_schema()`, `schema_is_current()`, `status()`, `reset()`
- Uses sentence-transformers for embedding MDL schema items and NL↔SQL query pairs
- CLI: `wren memory index|fetch|store|recall` subcommands (auto-registered when extras installed)
- Backing store: LanceDB (local or remote via opendal)
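The retrieval idea behind `get_context()` can be shown in miniature. This is a conceptual sketch only — the real module embeds with sentence-transformers and searches LanceDB, whereas this toy uses bag-of-words vectors and plain cosine similarity:

```python
# Conceptual sketch of semantic schema retrieval: embed the question and
# each schema item, return the closest item. Toy vectors, not the real stack.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

schema_items = [
    "orders model with order totals and customer keys",
    "customers model with names and regions",
]
query = "total order amount per customer"
best = max(schema_items, key=lambda s: cosine(embed(query), embed(s)))
print(best)
```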

## Optional Extras

Install per data-source extras: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `all`, `dev`.
Install per data-source extras: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `memory`, `all`, `dev`.

On macOS, `mysql` extra needs:
```bash
18 changes: 15 additions & 3 deletions wren/docs/cli.md
@@ -26,6 +26,7 @@ Translate MDL SQL to the native dialect SQL for your data source. No database co

```bash
wren dry-plan --sql 'SELECT order_id FROM "orders"'
wren dry-plan --sql 'SELECT order_id FROM "orders"' -d postgres # explicit datasource, no connection file needed
```

## `wren dry-run`
@@ -47,13 +48,14 @@ wren validate --sql 'SELECT * FROM "NonExistent"'

## Overriding defaults

All flags are optional when `~/.wren/mdl.json` and `~/.wren/connection_info.json` exist:
All flags are optional when `~/.wren/mdl.json` and `~/.wren/connection_info.json` exist.

The data source is always read from the `datasource` field in `connection_info.json` (or the inline `--connection-info` value). Only `dry-plan` accepts `--datasource` / `-d` as an override for transpile-only use without a connection file.

```bash
wren --sql '...' \
--mdl /path/to/other-mdl.json \
--connection-file /path/to/prod-connection_info.json \
--datasource postgres
--connection-file /path/to/prod-connection_info.json
```

Or pass connection info inline:
Expand All @@ -63,6 +65,16 @@ wren --sql 'SELECT COUNT(*) FROM "orders"' \
--connection-info '{"datasource":"mysql","host":"localhost","port":3306,"database":"mydb","user":"root","password":"secret"}'
```

Both flat and MCP/web envelope formats are accepted:

```bash
# Flat format
{"datasource": "postgres", "host": "localhost", "port": 5432, ...}

# Envelope format (auto-unwrapped)
{"datasource": "duckdb", "properties": {"url": "/data", "format": "duckdb"}}
```

---

## `wren memory` — Schema & Query Memory
36 changes: 36 additions & 0 deletions wren/docs/connections.md
@@ -2,6 +2,42 @@

The `connection_info.json` file (or `--connection-info` / `--connection-file` flags) requires a `datasource` field plus the connector-specific fields below.

## Accepted formats

**Flat format** — all fields at the top level:

```json
{
"datasource": "postgres",
"host": "localhost",
"port": 5432,
"database": "mydb",
"user": "postgres",
"password": "secret"
}
```

**Envelope format** — connector fields nested under `properties` (used by MCP server and Wren web):

```json
{
"datasource": "postgres",
"properties": {
"host": "localhost",
"port": 5432,
"database": "mydb",
"user": "postgres",
"password": "secret"
}
}
```

Both formats are accepted. The CLI auto-flattens the envelope format.
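The auto-flattening step is a small merge. A sketch under the assumption that it mirrors (but is not) the CLI's actual code — `flatten_connection_info` is a made-up name:

```python
# Sketch of envelope auto-flattening: merge `properties` into the top
# level, keeping `datasource`. Hypothetical helper, not the CLI's code.

def flatten_connection_info(info: dict) -> dict:
    """Return the flat form of a connection-info dict."""
    if "properties" in info:
        flat = {k: v for k, v in info.items() if k != "properties"}
        flat.update(info["properties"])
        return flat
    return info

envelope = {"datasource": "postgres", "properties": {"host": "localhost", "port": 5432}}
print(flatten_connection_info(envelope))
# → {'datasource': 'postgres', 'host': 'localhost', 'port': 5432}
```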

---

## Per-connector fields

## MySQL

```json