24 changes: 21 additions & 3 deletions .claude/CLAUDE.md
@@ -8,15 +8,16 @@ Wren Engine (OSS) is an open source semantic engine for MCP clients and AI agent

## Repository Structure

Four main modules:
Five main modules:

- **wren-core/** — Rust semantic engine (Cargo workspace: `core/`, `sqllogictest/`, `benchmarks/`, `wren-example/`). Handles MDL analysis, query planning, logical plan optimization, and SQL generation via DataFusion.
- **wren-core-base/** — Shared Rust crate with manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`). Has optional `python-binding` feature for PyO3 compatibility.
- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin.
- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin. Published to PyPI.
- **ibis-server/** — FastAPI web server (Python 3.11). Provides REST API for query execution, validation, and metadata. Uses Ibis framework for data source connectivity.
- **wren/** — Standalone Python SDK and CLI (`wren-engine` on PyPI). Wraps `wren-core-py` + Ibis connectors into a single installable package with `wren` CLI tool. Includes optional LanceDB-backed memory layer for semantic schema/query retrieval.
- **mcp-server/** — MCP server exposing Wren Engine to AI agents (Claude, Cline, Cursor).

Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `mock-web-server/`, `benchmark/`, `example/`.
Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `cli-skills/` (agent skill definitions), `mock-web-server/`, `benchmark/`, `example/`.

## Build & Development Commands

@@ -56,6 +57,23 @@ just format # ruff auto-fix + taplo

Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `doris`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.

### wren (SDK & CLI)
```bash
cd wren
just install # Build wren-core-py wheel + uv sync
just install-all # With all optional extras (including memory)
just install-extra <extra> # e.g. just install-extra postgres
just install-memory # Install memory extra (lancedb + sentence-transformers)
just dev # Run `wren` CLI
just test # pytest tests/
just test-memory # Memory-specific tests
just lint # ruff format --check + ruff check
just format # ruff auto-fix
just build # uv build (produces wheel)
```

Uses `uv` (not Poetry). `pyproject.toml` uses `hatchling` as build backend. Optional extras: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `memory`, `all`, `dev`.

### mcp-server
Uses `uv` for dependency management. See `mcp-server/README.md`.

18 changes: 10 additions & 8 deletions cli-skills/wren-usage/SKILL.md
@@ -13,9 +13,12 @@ The `wren` CLI queries databases through an MDL (Model Definition Language) sema

Two files drive everything (auto-discovered from `~/.wren/`):
- `mdl.json` — the semantic model
- `connection_info.json` — database credentials
- `connection_info.json` — database credentials + `datasource` field (e.g. `"datasource": "postgres"`)

The data source is always read from `connection_info.json`. There is no `--datasource` flag on execution commands (`query`, `dry-run`, `validate`). Only `dry-plan` accepts `--datasource` / `-d` as an override (for transpile-only use without a connection file).

For memory-specific decisions, see [references/memory.md](references/memory.md).
For SQL syntax, CTE-based modeling, and error diagnosis, see [references/wren-sql.md](references/wren-sql.md).

---

@@ -47,8 +50,6 @@ GROUP BY 1 ORDER BY 2 DESC LIMIT 5'

**SQL rules:**
- Target MDL model names, not database tables
- Use `CAST(x AS type)`, not `::type`
- Avoid correlated subqueries — use JOINs or CTEs
- Write dialect-neutral SQL — the engine translates

### Step 4 — Handle the result
@@ -74,15 +75,17 @@ GROUP BY 1 ORDER BY 2 DESC LIMIT 5'
### Connection error

1. Check: `cat ~/.wren/connection_info.json`
2. Test: `wren --sql "SELECT 1"`
3. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb`
2. Verify the `datasource` field is present and valid
3. Test: `wren --sql "SELECT 1"`
4. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb`
5. Both flat format (`{"datasource": ..., "host": ...}`) and MCP envelope format (`{"datasource": ..., "properties": {...}}`) are accepted

### SQL syntax / planning error

1. Isolate the layer:
- `wren dry-plan --sql "..."` — if this fails, it is an MDL-level issue
- If dry-plan succeeds but execution fails, the DB rejects the translated SQL
2. Common fixes: replace `::` with `CAST()`, replace correlated subqueries with JOINs
2. Compare dry-plan output with the DB error message — see [references/wren-sql.md](references/wren-sql.md) for the CTE rewrite pipeline and common error patterns

---

@@ -117,7 +120,7 @@ wren --sql "SELECT * FROM <changed_model> LIMIT 1"

```
Get data back → wren --sql "..."
See translated SQL only → wren dry-plan --sql "..."
See translated SQL only → wren dry-plan --sql "..." (accepts -d <datasource> if no connection file)
Validate against DB → wren dry-run --sql "..."
Schema context → wren memory fetch -q "..."
Filter by type/model → wren memory fetch -q "..." --type T --model M --threshold 0
@@ -134,5 +137,4 @@ Re-index after MDL change → wren memory index
- Do not guess model or column names — check context first
- Do not store queries the user has not confirmed — success != correctness
- Do not re-index before every query — once per MDL change
- Do not use database-specific syntax — write ANSI SQL
- Do not pass passwords via `--connection-info` if shell history is shared — use `--connection-file`
108 changes: 108 additions & 0 deletions cli-skills/wren-usage/references/wren-sql.md
@@ -0,0 +1,108 @@
# Wren SQL — How CTE-Based Modeling Works

Wren Engine rewrites your SQL by injecting CTEs (Common Table Expressions) that expand each MDL model into its underlying database query. Understanding this mechanism helps you diagnose errors and write correct SQL.

---

## The rewrite pipeline

```
Your SQL (target dialect, e.g. Postgres)
→ parse & qualify all column references (sqlglot)
→ identify which models and columns are referenced
→ per model: wren-core expands the model definition → CTE
→ inject model CTEs into your query
→ output final SQL in target dialect
```

**Example:** Given an MDL with model `orders` backed by table `public.orders` with columns `o_orderkey`, `o_custkey`, `o_totalprice`:

```sql
-- You write:
SELECT o_custkey, SUM(o_totalprice) FROM orders GROUP BY 1

-- Engine produces (via dry-plan):
WITH "orders" AS (
SELECT "public"."orders"."o_orderkey",
"public"."orders"."o_custkey",
"public"."orders"."o_totalprice"
FROM "public"."orders"
)
SELECT o_custkey, SUM(o_totalprice) FROM orders GROUP BY 1
```

The CTE named `"orders"` shadows the model name, so the rest of your SQL runs against the CTE as if it were a table.

---

## What the rewriter handles

| Feature | Supported |
|---------|-----------|
| `SELECT *` from a model | Yes — expands to all non-hidden, non-relationship columns |
| JOINs between models | Yes — each model gets its own CTE |
| Subqueries referencing models | Yes — outer column references are resolved |
| Table aliases (`FROM orders o`) | Yes — alias tracking maps back to models |
| User-defined CTEs (`WITH x AS (...)`) | Yes — model CTEs are prepended before user CTEs |
| `RECURSIVE` WITH clauses | Yes — preserved |
| Calculated fields / metrics | Yes — wren-core expands them inside the model CTE |
| `COUNT(*)` without columns | Yes — model CTE selects `1` (only needs rows) |

---

## SQL rules for writing queries

1. **Use model names, not database table names** — write `FROM orders`, not `FROM public.orders`
2. **Write dialect-neutral SQL** — the engine translates to the target database dialect
3. **Column names must match the MDL** — use the names defined in `mdl.json`, not the underlying database column names
4. **Hidden columns are excluded** — columns with `"isHidden": true` are not available in `SELECT *`
5. **Relationship columns are excluded** — relationship fields don't appear as selectable columns; use JOINs instead
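Rules 4 and 5 can be pictured as a filter over the model's column list. A hypothetical sketch — the column dicts below mirror MDL field names (`isHidden`, a `relationship` key), but the helper and sample data are invented:

```python
# Hypothetical sketch of how `SELECT *` expansion filters columns:
# hidden columns and relationship fields never reach the model CTE.

def selectable_columns(model: dict) -> list[str]:
    """Columns exposed by the model CTE: skip hidden and relationship fields."""
    return [
        col["name"]
        for col in model["columns"]
        if not col.get("isHidden") and "relationship" not in col
    ]

orders = {
    "name": "orders",
    "columns": [
        {"name": "o_orderkey"},
        {"name": "o_custkey"},
        {"name": "internal_flag", "isHidden": True},               # excluded
        {"name": "customer", "relationship": "orders_customer"},   # excluded
    ],
}
print(selectable_columns(orders))  # → ['o_orderkey', 'o_custkey']
```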

---

## Diagnosing errors with dry-plan

`dry-plan` shows the expanded SQL without executing it. This separates MDL-level issues from database-level issues.

### Step 1 — Run dry-plan

```bash
wren dry-plan --sql "SELECT o_custkey, SUM(o_totalprice) FROM orders GROUP BY 1"
```

### Step 2 — Interpret the result

| dry-plan result | Meaning | Fix |
|-----------------|---------|-----|
| **Succeeds** with valid SQL | MDL layer is fine; if execution fails, the database rejects the translated SQL | Read the DB error against the dry-plan output — the issue is in the generated SQL or DB state |
| **Fails** with "No model references found" | Your FROM clause doesn't match any MDL model name | Check model names: `wren memory fetch -q "<name>" --type model --threshold 0` |
| **Fails** with column error | A column you referenced doesn't exist in the model | Check columns: `wren memory fetch -q "<col>" --model <name> --threshold 0` |
| **Fails** with qualify error | sqlglot can't resolve an ambiguous or unknown column | Qualify the column explicitly: `model_name.column_name` |

### Step 3 — Compare dry-plan output with DB error

When execution fails but dry-plan succeeds, compare them side by side:

```bash
# Get the expanded SQL
wren dry-plan --sql "SELECT ..." 2>&1

# Run against DB and capture the error
wren --sql "SELECT ..." 2>&1
```

Common patterns:
- **Type mismatch**: The CTE exposes the raw column type; a function may not accept it in the target dialect
- **Missing table**: The underlying table referenced in the model definition doesn't exist in the database
- **Permission denied**: The DB user lacks access to the underlying tables
- **Syntax difference**: Rare — usually means a sqlglot dialect translation gap

---

## Fallback behavior

If the rewriter detects no model references in your SQL (e.g. `SELECT 1` or queries against raw database tables), it falls back to passing the entire query through wren-core's `transform_sql()` directly. This means:

- Queries that don't reference any MDL model still work
- The fallback path does NOT use CTE injection — it transforms the whole query at once
- If you expect model expansion but get none, check that your FROM clause uses model names from the MDL
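The fallback decision boils down to "does any FROM/JOIN target match a known model name?". A simplified sketch, assuming a regex scan stands in for what the real engine does with sqlglot's parsed table references:

```python
# Simplified sketch of the fallback decision. The real check inspects the
# sqlglot parse tree, not a regex; this only illustrates the branching.
import re

def references_models(sql: str, model_names: set[str]) -> bool:
    """True if any FROM/JOIN target matches a known MDL model name."""
    tables = re.findall(r'\b(?:FROM|JOIN)\s+"?([A-Za-z_][\w.]*)"?', sql, re.I)
    return any(t.lower() in model_names for t in tables)

models = {"orders", "customers"}
assert references_models("SELECT * FROM orders", models)        # CTE path
assert not references_models("SELECT 1", models)                # fallback path
assert not references_models("SELECT * FROM raw_logs", models)  # fallback path
```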
41 changes: 37 additions & 4 deletions wren/.claude/CLAUDE.md
@@ -8,25 +8,43 @@ Standalone Python SDK and CLI for Wren Engine. Wraps `wren-core-py` (PyO3 bindin
wren/
src/wren/
engine.py — WrenEngine facade (transpile / query / dry_run / dry_plan)
cli.py — Typer CLI: wren query|dry-run|transpile|validate
cli.py — Typer CLI: wren query|dry-run|dry-plan|validate
mdl/ — wren-core-py session context + manifest extraction helpers
cte_rewriter.py — CTE-based query rewriting
wren_dialect.py — Custom sqlglot dialect
connector/ — Per-datasource Ibis connectors (factory.py + one file per source)
factory.py — DataSource → connector dispatch registry
base.py — Base connector interface
ibis.py — Shared Ibis-backed connector (trino, clickhouse, snowflake, athena)
postgres.py, mysql.py, mssql.py, bigquery.py, duckdb.py, oracle.py,
redshift.py, spark.py, databricks.py, canner.py
model/
data_source.py — DataSource enum + per-source ConnectionInfo factories
error.py — WrenError, ErrorCode, ErrorPhase
memory/ — Optional LanceDB-backed semantic memory (requires `wren[memory]`)
store.py — LanceDB vector store operations
schema_indexer.py — MDL schema embedding/indexing
embeddings.py — Sentence transformer integration
cli.py — `wren memory` CLI subcommands
tests/
unit/ — test_engine.py, test_cte_rewriter.py, test_memory.py
connectors/ — test_duckdb.py, test_postgres.py, test_mysql.py
suite/ — Shared test helpers (manifests.py, query.py)
```

## Build & Development

```bash
cd wren
just install # build wren-core-py wheel + uv sync
just install-all # with all optional extras
just install-all # with all optional extras (including memory)
just install-extra <extra> # e.g. just install-extra postgres
just install-memory # install memory extra (lancedb + sentence-transformers)
just dev # run `wren` CLI
just test # pytest tests/
just test-memory # memory-specific tests
just lint # ruff format --check + ruff check
just format # ruff auto-fix
just format # ruff auto-fix (also aliased as `just fmt`)
just build # uv build (produces wheel)
```

@@ -45,9 +63,24 @@ Uses `uv` (not Poetry). `pyproject.toml` uses `hatchling` as build backend.

`connector/factory.py` dispatches on `DataSource` to return the right connector. Each connector wraps an Ibis backend and exposes `.query(sql, limit)` and `.dry_run(sql)`. Base class in `connector/base.py`; Ibis-backed connectors share `connector/ibis.py`.

- **Dedicated modules**: `postgres.py`, `mysql.py`, `mssql.py`, `bigquery.py`, `duckdb.py`, `oracle.py` (native oracledb, not Ibis), `redshift.py`, `spark.py`, `databricks.py`, `canner.py`
- **Shared Ibis module** (`ibis.py`): trino, clickhouse, snowflake, athena
- **File connectors**: `local_file`, `s3_file`, `minio_file`, `gcs_file` all map to duckdb
- **doris** maps to mysql connector (MySQL-compatible protocol)
- **canner** maps to postgres connector
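The alias rules above (doris → mysql, canner → postgres, file sources → duckdb) follow a common dispatch-registry shape. A minimal sketch — the enum values come from the list above, but the class names and registry structure are invented, not the actual `factory.py`:

```python
# Minimal sketch of the dispatch pattern described for connector/factory.py.
# Illustrative only: real connectors wrap Ibis backends, not strings.
from enum import Enum

class DataSource(Enum):
    POSTGRES = "postgres"
    MYSQL = "mysql"
    DUCKDB = "duckdb"
    DORIS = "doris"
    CANNER = "canner"
    LOCAL_FILE = "local_file"

# Aliases collapse protocol-compatible sources onto one connector.
_ALIASES = {
    DataSource.DORIS: DataSource.MYSQL,        # MySQL-compatible protocol
    DataSource.CANNER: DataSource.POSTGRES,
    DataSource.LOCAL_FILE: DataSource.DUCKDB,  # file sources read via DuckDB
}

_REGISTRY = {
    DataSource.POSTGRES: "PostgresConnector",
    DataSource.MYSQL: "MySQLConnector",
    DataSource.DUCKDB: "DuckDBConnector",
}

def resolve(source: DataSource) -> str:
    """Follow aliases first, then look up the concrete connector."""
    return _REGISTRY[_ALIASES.get(source, source)]

print(resolve(DataSource.DORIS))  # → MySQLConnector
```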

## Memory Module (Optional)

`wren/src/wren/memory/` — LanceDB-backed semantic memory for schema and query retrieval. Install via `wren[memory]`.

- **`WrenMemory`** — Main API: `index_manifest()`, `get_context()`, `store_query()`, `recall_queries()`, `describe_schema()`, `schema_is_current()`, `status()`, `reset()`
- Uses sentence-transformers for embedding MDL schema items and NL↔SQL query pairs
- CLI: `wren memory index|fetch|store|recall` subcommands (auto-registered when extras installed)
- Backing store: LanceDB (local or remote via opendal)
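The retrieval idea behind `get_context()` can be shown in miniature. This is a conceptual sketch only — the real module embeds with sentence-transformers and searches LanceDB, whereas this toy uses bag-of-words vectors and plain cosine similarity:

```python
# Conceptual sketch of semantic schema retrieval: embed the question and
# each schema item, return the closest item. Toy vectors, not the real stack.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

schema_items = [
    "orders model with order totals and customer keys",
    "customers model with names and regions",
]
query = "total order amount per customer"
best = max(schema_items, key=lambda s: cosine(embed(query), embed(s)))
print(best)
```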

## Optional Extras

Install per data-source extras: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `all`, `dev`.
Install per data-source extras: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `memory`, `all`, `dev`.

On macOS, `mysql` extra needs:
```bash
18 changes: 15 additions & 3 deletions wren/docs/cli.md
@@ -26,6 +26,7 @@ Translate MDL SQL to the native dialect SQL for your data source. No database co

```bash
wren dry-plan --sql 'SELECT order_id FROM "orders"'
wren dry-plan --sql 'SELECT order_id FROM "orders"' -d postgres # explicit datasource, no connection file needed
```

## `wren dry-run`
@@ -47,13 +48,14 @@ wren validate --sql 'SELECT * FROM "NonExistent"'

## Overriding defaults

All flags are optional when `~/.wren/mdl.json` and `~/.wren/connection_info.json` exist:
All flags are optional when `~/.wren/mdl.json` and `~/.wren/connection_info.json` exist.

The data source is always read from the `datasource` field in `connection_info.json` (or the inline `--connection-info` value). Only `dry-plan` accepts `--datasource` / `-d` as an override for transpile-only use without a connection file.

```bash
wren --sql '...' \
--mdl /path/to/other-mdl.json \
--connection-file /path/to/prod-connection_info.json \
--datasource postgres
--connection-file /path/to/prod-connection_info.json
```

Or pass connection info inline:
Expand All @@ -63,6 +65,16 @@ wren --sql 'SELECT COUNT(*) FROM "orders"' \
--connection-info '{"datasource":"mysql","host":"localhost","port":3306,"database":"mydb","user":"root","password":"secret"}'
```

Both flat and MCP/web envelope formats are accepted:

```bash
# Flat format
{"datasource": "postgres", "host": "localhost", "port": 5432, ...}

# Envelope format (auto-unwrapped)
{"datasource": "duckdb", "properties": {"url": "/data", "format": "duckdb"}}
```

---

## `wren memory` — Schema & Query Memory
36 changes: 36 additions & 0 deletions wren/docs/connections.md
@@ -2,6 +2,42 @@

The `connection_info.json` file (or `--connection-info` / `--connection-file` flags) requires a `datasource` field plus the connector-specific fields below.

## Accepted formats

**Flat format** — all fields at the top level:

```json
{
"datasource": "postgres",
"host": "localhost",
"port": 5432,
"database": "mydb",
"user": "postgres",
"password": "secret"
}
```

**Envelope format** — connector fields nested under `properties` (used by MCP server and Wren web):

```json
{
"datasource": "postgres",
"properties": {
"host": "localhost",
"port": 5432,
"database": "mydb",
"user": "postgres",
"password": "secret"
}
}
```

Both formats are accepted. The CLI auto-flattens the envelope format.
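The auto-flattening step is a small merge. A sketch under the assumption that it mirrors (but is not) the CLI's actual code — `flatten_connection_info` is a made-up name:

```python
# Sketch of envelope auto-flattening: merge `properties` into the top
# level, keeping `datasource`. Hypothetical helper, not the CLI's code.

def flatten_connection_info(info: dict) -> dict:
    """Return the flat form of a connection-info dict."""
    if "properties" in info:
        flat = {k: v for k, v in info.items() if k != "properties"}
        flat.update(info["properties"])
        return flat
    return info

envelope = {"datasource": "postgres", "properties": {"host": "localhost", "port": 5432}}
print(flatten_connection_info(envelope))
# → {'datasource': 'postgres', 'host': 'localhost', 'port': 5432}
```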

---

## Per-connector fields

## MySQL

```json