138 changes: 138 additions & 0 deletions cli-skills/wren-usage/SKILL.md
---
name: wren-usage
description: "Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results. Use when: agent needs to query data, connect a data source, handle errors, or manage MDL changes via the wren CLI."
license: Apache-2.0
metadata:
author: wren-engine
version: "1.0"
---

# Wren Engine CLI — Agent Workflow Guide

The `wren` CLI queries databases through an MDL (Modeling Definition Language) semantic layer. You write SQL against model names, not raw tables; the engine translates it to the target dialect.

Two files drive everything (auto-discovered from `~/.wren/`):
- `mdl.json` — the semantic model
- `connection_info.json` — database credentials
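For orientation, a minimal `connection_info.json` sketch (a Postgres example following the inline shape shown in `wren/docs/cli.md`; exact fields vary per connector, see the connections doc):

```json
{
  "datasource": "postgres",
  "host": "localhost",
  "port": 5432,
  "database": "mydb",
  "user": "app",
  "password": "secret"
}
```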

For memory-specific decisions, see [references/memory.md](references/memory.md).

---

## Workflow 1: Answering a data question

### Step 1 — Gather context

| Situation | Command |
|-----------|---------|
| Default | `wren memory fetch -q "<question>"` |
| Need specific model's columns | `wren memory fetch -q "..." --model <name> --threshold 0` |
| Memory not installed | Read `~/.wren/mdl.json` directly |

### Step 2 — Recall past queries

```bash
wren memory recall -q "<question>" --limit 3
```

Use results as few-shot examples. Skip if empty.

### Step 3 — Write and execute SQL

```bash
wren --sql 'SELECT c_name, SUM(o_totalprice) FROM orders
JOIN customer ON orders.o_custkey = customer.c_custkey
GROUP BY 1 ORDER BY 2 DESC LIMIT 5'
```

**SQL rules:**
- Target MDL model names, not database tables
- Use `CAST(x AS type)`, not `::type`
- Avoid correlated subqueries — use JOINs or CTEs
- Write dialect-neutral SQL — the engine translates

### Step 4 — Handle the result

| Outcome | Action |
|---------|--------|
| User confirms correct | `wren memory store --nl "..." --sql "..."` |
| User continues with follow-up | Store, then handle follow-up |
| User says nothing | Do NOT store |
| User says wrong | Do NOT store — fix the SQL |
| Query error | See Error recovery below |

---

## Workflow 2: Error recovery

### "table not found"

1. Verify model name: `wren memory fetch -q "<name>" --type model --threshold 0`
2. Check MDL exists: `ls ~/.wren/mdl.json`
3. Verify column: `wren memory fetch -q "<column>" --model <name> --threshold 0`

### Connection error

1. Check: `cat ~/.wren/connection_info.json`
2. Test: `wren --sql "SELECT 1"`
3. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb`
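A typo in the `datasource` field can be caught before testing the connection. A sketch (assumes a POSIX shell with `sed`; the helper name is illustrative, not part of the CLI):

```shell
# valid_datasource FILE: extract the datasource field and report whether it
# is one of the supported values. Membership is checked by pattern-matching
# " <ds> " against the space-delimited list of valid datasources.
valid_datasource() {
  ds=$(sed -n 's/.*"datasource"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n1)
  case " postgres mysql bigquery snowflake clickhouse trino mssql databricks redshift spark athena oracle duckdb " in
    *" $ds "*) echo "ok: $ds" ;;
    *)         echo "unsupported datasource: $ds" ;;
  esac
}
# usage: valid_datasource ~/.wren/connection_info.json
```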

### SQL syntax / planning error

1. Isolate the layer:
- `wren dry-plan --sql "..."` — if this fails, it is an MDL-level issue
- If dry-plan succeeds but execution fails, the DB rejects the translated SQL
2. Common fixes: replace `::` with `CAST()`, replace correlated subqueries with JOINs
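Both fixes side by side, as illustrative SQL (column names borrowed from the earlier examples; verify the rewrite with `wren dry-plan` before executing):

```sql
-- Fix 1: the `::` cast operator is Postgres-only; CAST() survives
-- translation to any target dialect.
SELECT o_totalprice::INT FROM orders;            -- avoid
SELECT CAST(o_totalprice AS INT) FROM orders;    -- prefer

-- Fix 2: a correlated subquery rewritten as an equivalent JOIN.
SELECT c_name,
       (SELECT SUM(o.o_totalprice)
        FROM orders o
        WHERE o.o_custkey = customer.c_custkey) AS revenue
FROM customer;                                   -- avoid

SELECT customer.c_name, SUM(orders.o_totalprice) AS revenue
FROM customer
JOIN orders ON orders.o_custkey = customer.c_custkey
GROUP BY customer.c_name;                        -- prefer
```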

---

## Workflow 3: Connecting a new data source

1. Create `~/.wren/connection_info.json` — see [wren/docs/connections.md](../../wren/docs/connections.md) for per-connector formats
2. Test: `wren --sql "SELECT 1"`
3. Place or create `~/.wren/mdl.json`
4. Index: `wren memory index`
5. Verify: `wren --sql "SELECT * FROM <model> LIMIT 5"`

---

## Workflow 4: After MDL changes

When the MDL is updated, downstream state goes stale:

```bash
# 1. Deploy updated MDL
cp updated-mdl.json ~/.wren/mdl.json

# 2. Re-index schema memory
wren memory index

# 3. Verify
wren --sql "SELECT * FROM <changed_model> LIMIT 1"
```

---

## Command decision tree

```
Get data back → wren --sql "..."
See translated SQL only → wren dry-plan --sql "..."
Validate against DB → wren dry-run --sql "..."
Schema context → wren memory fetch -q "..."
Filter by type/model → wren memory fetch -q "..." --type T --model M --threshold 0
Store confirmed query → wren memory store --nl "..." --sql "..."
Few-shot examples → wren memory recall -q "..."
Index stats → wren memory status
Re-index after MDL change → wren memory index
```

---

## Things to avoid

- Do not guess model or column names — check context first
- Do not store queries the user has not confirmed — success != correctness
- Do not re-index before every query — once per MDL change
- Do not use database-specific syntax — write ANSI SQL
- Do not pass passwords via `--connection-info` if shell history is shared — use `--connection-file`
112 changes: 112 additions & 0 deletions cli-skills/wren-usage/references/memory.md
# Wren Memory — When to index, context, store, and recall

This reference covers the decision logic for each memory command. The main workflow is in the parent SKILL.md.

---

## Schema context: `fetch` and `describe`

| Command | When to use |
|---------|-------------|
| `wren memory fetch -q "..."` | Default. Auto-selects full text (small schema) or embedding search (large schema) based on a 30K-char threshold. |
| `wren memory fetch -q "..." --type T --model M` | When you need filtering (forces search strategy on large schemas). |
| `wren memory describe` | When you want the full schema text and know it is small. |

The hybrid strategy works like this:
- Below 30K characters (~8K tokens): returns the entire schema as structured plain text — the LLM sees complete model-to-column relationships, join paths, and primary keys
- Above 30K characters: returns embedding search results — only the most relevant fragments

CJK-heavy schemas switch to search sooner (~1.5 chars per token vs 4 for English), which is the safe direction.

Override with `--threshold`:
```bash
wren memory fetch -q "revenue" --threshold 50000 # raise for larger context windows
```
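The selection logic itself is simple enough to sketch in shell (a mirror of the documented behaviour, not the engine's actual code; the behaviour at exactly the threshold is an assumption):

```shell
# choose_strategy TEXT [THRESHOLD]: print "full" when TEXT is shorter than
# the threshold, "search" otherwise. Default mirrors the documented 30,000.
choose_strategy() {
  text="$1"; threshold="${2:-30000}"
  if [ "${#text}" -lt "$threshold" ]; then
    echo "full"
  else
    echo "search"
  fi
}
# e.g. choose_strategy "$(wren memory describe)"
```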

---

## Indexing: `wren memory index`

**When to index:**
- After deploying a new or updated MDL
- When `wren memory status` shows `schema_items: 0 rows`
- When `wren memory fetch` returns stale results (references deleted models)

**When NOT to index:**
- Before every query — indexing is expensive, do it once per MDL change
- When only using `describe` or `fetch` with full strategy — those read the MDL directly

```bash
wren memory index --mdl ~/.wren/mdl.json
```

---

## Storing queries: `wren memory store`

**Store when ALL of these are true:**
1. The SQL query executed successfully
2. The user confirmed the result is correct, OR continued working with it (follow-up question, exported data, etc.)
3. There is a clear natural language question that the SQL answers

**Do NOT store when:**
- The query failed or returned an error
- The user said the result is wrong or asked to fix it
- The query is exploratory / throwaway (`SELECT * FROM orders LIMIT 5`)
- There is no natural language question — just raw SQL
- The user explicitly asked not to store it

```bash
wren memory store \
--nl "top 5 customers by revenue last quarter" \
--sql "SELECT c_name, SUM(o_totalprice) AS revenue ..." \
--datasource postgres
```

The `--nl` value should be the user's original question, not a paraphrase.

---

## Recalling queries: `wren memory recall`

**When to recall:**
- Before writing SQL for a new question, especially complex ones
- When the user asks something similar to a past question

```bash
wren memory recall -q "monthly revenue by category" --limit 3
```

Use results as few-shot examples: adapt the SQL pattern to the current question.

---

## Full lifecycle example

```
Session start:
1. wren memory status → if schema_items is 0: wren memory index

User asks a question:
2. wren memory recall -q "<question>" --limit 3
3. wren memory fetch -q "<question>"
4. Write SQL using recalled examples + schema context
5. wren --sql "..."

After execution:
6. Show results to user
7. User confirms → wren memory store --nl "..." --sql "..."
User says wrong → fix SQL, do NOT store
User silent → do NOT store
```

---

## Housekeeping

```bash
wren memory status # path, table names, row counts
wren memory reset --force # drop everything, start fresh
```

All memory commands accept `--path DIR` to override `~/.wren/memory/`.
12 changes: 11 additions & 1 deletion wren/README.md
Translate natural SQL queries through an MDL (Modeling Definition Language) semantic layer.

```bash
pip install wren-engine[mysql] # MySQL
pip install wren-engine[postgres] # PostgreSQL
pip install wren-engine[duckdb] # DuckDB (local files)
pip install 'wren-engine[memory]' # Schema & query memory (LanceDB)
pip install 'wren-engine[all]' # All connectors + memory
```

## Quick start
```bash
wren --sql 'SELECT order_id FROM "orders" LIMIT 10'
```

For the full CLI reference and per-datasource `connection_info.json` formats, see [`docs/cli.md`](docs/cli.md) and [`docs/connections.md`](docs/connections.md).

**4. Index schema for semantic search** (optional, requires `wren-engine[memory]`):

```bash
wren memory index # index MDL schema
wren memory fetch -q "customer order price" # fetch relevant schema context
wren memory store --nl "top customers" --sql "SELECT ..." # store NL→SQL pair
wren memory recall -q "best customers" # retrieve similar past queries
```

---

## Python SDK
Expand Down
117 changes: 117 additions & 0 deletions wren/docs/cli.md
Or pass connection info inline:

```bash
wren --sql 'SELECT COUNT(*) FROM "orders"' \
--connection-info '{"datasource":"mysql","host":"localhost","port":3306,"database":"mydb","user":"root","password":"secret"}'
```

---

## `wren memory` — Schema & Query Memory

LanceDB-backed semantic memory for MDL schema search and NL→SQL retrieval. Requires the `memory` extra:

```bash
pip install 'wren-engine[memory]'
```

All `memory` subcommands accept `--path DIR` to override the default storage location (`~/.wren/memory/`).

### Hybrid strategy: full text vs. embedding search

When providing schema context to an LLM, there is a trade-off:

- **Small schemas** — the full plain-text description fits easily in the LLM context window and gives better results because the LLM sees the complete structure (model→column relationships, join paths, primary keys) rather than isolated fragments from a vector search.
- **Large schemas** — the full text exceeds what is practical to send in a single prompt, so embedding search is needed to retrieve only the relevant fragments.

`wren memory fetch` automatically picks the right strategy based on the **character length** of the generated plain-text description:

| Plain-text size (at the default threshold) | Strategy |
|---|---|
| Below 30,000 chars (~8K tokens) | Full plain text |
| Above 30,000 chars | Embedding search results |

The threshold is measured in characters (not tokens) because character length is free to compute, while accurate token counting requires a tokeniser. The 4:1 chars-to-tokens ratio holds for English; CJK text compresses less (~1.5:1), so a CJK-heavy schema switches to embedding search sooner — which is the conservative direction.

The default threshold (30,000 chars) can be overridden with `--threshold`.
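The character-to-token arithmetic behind those ratios, as a quick back-of-envelope check (the ratios are the approximations stated above):

```shell
# Tokens implied by the 30,000-char default threshold at each ratio.
echo $(( 30000 / 4 ))       # English, ~4 chars/token  -> 7500 tokens (~8K)
echo $(( 30000 * 2 / 3 ))   # CJK, ~1.5 chars/token    -> 20000 tokens
```

At the same character threshold, a CJK-heavy schema therefore represents far more tokens, which is why switching to embedding search earlier is the safe side of the trade-off.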

### `wren memory index`

Parse the MDL manifest and index all schema items (models, columns, relationships, views) into LanceDB with local embeddings.

```bash
wren memory index # uses ~/.wren/mdl.json
wren memory index --mdl /path/to/mdl.json # explicit MDL file
```

### `wren memory describe`

Print the full schema as structured plain text. No embedding or LanceDB required — this is a pure transformation of the MDL manifest into a human/LLM-readable format.

```bash
wren memory describe # uses ~/.wren/mdl.json
wren memory describe --mdl /path/to/mdl.json
```

### `wren memory fetch`

Get schema context for an LLM. Automatically chooses the best strategy based on schema size: full plain text for small schemas, embedding search for large schemas.

When using the search strategy, optional `--type` and `--model` filters narrow the results.

```bash
wren memory fetch -q "customer order price"
wren memory fetch -q "revenue" --type column --model orders
wren memory fetch -q "日期" --threshold 50000 --output json
```

| Flag | Description |
|------|-------------|
| `-q, --query` | Search query (required) |
| `--mdl` | Path to MDL JSON file |
| `-l, --limit` | Max results for search strategy (default: 5) |
| `-t, --type` | Filter: `model`, `column`, `relationship`, `view` (search strategy only) |
| `--model` | Filter by model name (search strategy only) |
| `--threshold` | Character threshold for full vs search (default: 30,000) |
| `-o, --output` | Output format: `table` (default), `json` |

### `wren memory store`

Store a natural-language-to-SQL pair for future few-shot retrieval.

```bash
wren memory store \
--nl "show top customers by revenue" \
  --sql "SELECT c_name, sum(o_totalprice) FROM orders JOIN customer ON orders.o_custkey = customer.c_custkey GROUP BY 1 ORDER BY 2 DESC" \
--datasource postgres
```

### `wren memory recall`

Search stored NL→SQL pairs by semantic similarity to a query.

```bash
wren memory recall -q "best customers"
wren memory recall -q "月度營收" --datasource mysql --limit 5 --output json
```

| Flag | Description |
|------|-------------|
| `-q, --query` | Search query (required) |
| `-l, --limit` | Max results (default: 3) |
| `-d, --datasource` | Filter by data source |
| `-o, --output` | Output format: `table` (default), `json` |

### `wren memory status`

Show index statistics: storage path, table names, and row counts.

```bash
wren memory status
# Path: /Users/you/.wren/memory
# schema_items: 47 rows
# query_history: 12 rows
```

### `wren memory reset`

Drop all memory tables and start fresh.

```bash
wren memory reset # prompts for confirmation
wren memory reset --force # skip confirmation
```