diff --git a/cli-skills/wren-generate-mdl/SKILL.md b/cli-skills/wren-generate-mdl/SKILL.md deleted file mode 100644 index 32fff3173..000000000 --- a/cli-skills/wren-generate-mdl/SKILL.md +++ /dev/null @@ -1,306 +0,0 @@ ---- -name: wren-generate-mdl -description: "Generate a Wren MDL project by exploring a database with available tools (SQLAlchemy, database drivers, MCP connectors, or raw SQL). Guides agents through schema discovery, type normalization, and MDL YAML generation using the wren CLI. Use when: user wants to create or set up a new MDL, onboard a new data source, or scaffold a project from an existing database." -license: Apache-2.0 -metadata: - author: wren-engine - version: "1.0" ---- - -# Generate Wren MDL — CLI Agent Workflow - -Builds an MDL project by discovering database schema and converting it -into Wren's YAML project format. The agent uses whatever database tools -are available in its environment for introspection; the wren CLI handles -type normalization, validation, and build. - -For memory and query workflows after setup, see the **wren-usage** skill. - ---- - -## Prerequisites - -- `wren` CLI installed (`pip install wren-engine[]`) -- A working database connection (credentials available to the agent) -- A wren profile configured (`wren profile add`) or connection info ready - ---- - -## Phase 1 — Establish connection and scope - -**Goal:** Confirm the agent can reach the database and agree on scope with the user. - -1. Verify connectivity using whichever tool is available: - - If SQLAlchemy: `engine.connect()` test - - If database driver: simple query like `SELECT 1` - - If wren profile exists: `wren profile debug` to check config - - If raw SQL via wren: `wren --sql "SELECT 1"` (requires profile or connection file) - -2. 
Ask the user: - - Which **schema(s)** or **dataset(s)** to include (skip if only one exists) - - Whether to include **all tables** or a subset - - The **datasource type** for wren (e.g., `postgres`, `bigquery`, `snowflake`) — needed for type normalization dialect - ---- - -## Phase 2 — Discover schema - -**Goal:** Collect table names, column names, column types, and constraints. - -Use whatever introspection method is available. Here are common approaches -ranked by convenience: - -### Option A: SQLAlchemy (recommended if available) - -```python -from sqlalchemy import create_engine, inspect - -engine = create_engine(connection_url) -inspector = inspect(engine) - -tables = inspector.get_table_names(schema="public") - -for table in tables: - columns = inspector.get_columns(table, schema="public") - # columns → [{"name": "id", "type": INTEGER(), "nullable": False, ...}] - - pk = inspector.get_pk_constraint(table, schema="public") - # pk → {"constrained_columns": ["id"], "name": "orders_pkey"} - - fks = inspector.get_foreign_keys(table, schema="public") - # fks → [{"constrained_columns": ["customer_id"], - # "referred_table": "customers", - # "referred_columns": ["id"]}] -``` - -### Option B: Database-specific driver - -- **psycopg / asyncpg (Postgres):** Query `information_schema.columns` and `information_schema.table_constraints` -- **google-cloud-bigquery:** `client.list_tables()`, `client.get_table()` → `table.schema` -- **snowflake-connector-python:** `SHOW COLUMNS IN TABLE`, `SHOW PRIMARY KEYS IN TABLE` -- **clickhouse-driver:** `DESCRIBE TABLE`, `system.tables` - -### Option C: Raw SQL via wren - -If no driver is available but a wren profile is configured, query -`information_schema` through wren itself: - -```bash -wren --sql "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'" -o json -wren --sql "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'orders'" -o json -``` - -Note: this goes through the 
MDL layer, so it only works if you already -have a minimal MDL or if the database supports `information_schema` as -regular tables. For bootstrapping from zero, Option A or B is preferred. - ---- - -## Phase 3 — Normalize types - -**Goal:** Convert raw database types to wren-core-compatible types. - -### Python import (recommended for batch processing) - -```python -from wren.type_mapping import parse_type, parse_types - -# Single type -normalized = parse_type("character varying(255)", "postgres") # → "VARCHAR(255)" - -# Batch — entire table at once -columns = [ - {"column": "id", "raw_type": "int8"}, - {"column": "name", "raw_type": "character varying"}, - {"column": "total", "raw_type": "numeric(10,2)"}, -] -normalized_cols = parse_types(columns, dialect="postgres") -# Each dict now has a "type" key with the normalized value -``` - -### CLI (if Python import not available) - -Single type: -```bash -wren utils parse-type --type "character varying(255)" --dialect postgres -# → VARCHAR(255) -``` - -Batch (stdin JSON): -```bash -echo '[{"column":"id","raw_type":"int8"},{"column":"name","raw_type":"character varying"}]' \ - | wren utils parse-types --dialect postgres -``` - ---- - -## Phase 4 — Scaffold and write MDL project - -**Goal:** Create the YAML project structure. - -### Step 1 — Initialize project - -```bash -wren context init --path /path/to/project -``` - -This creates: -```text -project/ -├── wren_project.yml -├── models/ -├── views/ -├── relationships.yml -└── instructions.md -``` - -> **IMPORTANT: `catalog` and `schema` in `wren_project.yml`** -> -> These are Wren Engine's internal namespace — they are NOT the database's -> native catalog or schema. Keep the defaults (`catalog: wren`, `schema: public`) -> unless you are intentionally configuring a multi-project namespace. -> -> Your database's actual catalog/schema is specified per-model in `table_reference` -> (see Step 2). Do not copy database catalog/schema values into `wren_project.yml`. 
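For concreteness, a scaffolded `wren_project.yml` with the defaults kept might look like this — a minimal sketch; check the file `wren context init` actually writes, which may carry additional fields:

```yaml
# wren_project.yml — Wren Engine's internal namespace, NOT the database's
catalog: wren     # keep the default unless building a multi-project namespace
schema: public    # engine namespace only; the DB schema lives in table_reference
```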
- -### Step 2 — Write model files - -For each table, create a YAML file under `models/`. Use snake_case -naming (the build step converts to camelCase automatically). - -```yaml -# models/orders/metadata.yml -name: orders -table_reference: - catalog: "" # database catalog (empty string if not applicable) - schema: public # database schema (this IS the DB schema) - table: orders # database table name -primary_key: order_id -columns: - - name: order_id - type: INTEGER - not_null: true - - name: customer_id - type: INTEGER - - name: total - type: "DECIMAL(10, 2)" - - name: status - type: VARCHAR - properties: - description: "Order status: pending, shipped, delivered, cancelled" -``` - -### Step 3 — Write relationships - -From foreign key constraints discovered in Phase 2: - -```yaml -# relationships.yml -- name: orders_customers - models: - - orders - - customers - join_type: many_to_one - condition: "orders.customer_id = customers.customer_id" -``` - -Join type mapping: -- FK table → PK table: `many_to_one` -- PK table → FK table: `one_to_many` -- Unique FK: `one_to_one` -- Junction table: `many_to_many` - -If no foreign keys were found, infer from naming conventions: -- Column `_id` or `_id` → likely FK to `
` -- Ask the user to confirm inferred relationships - -### Step 4 — Add descriptions (optional but valuable) - -Ask the user to describe: -- Each model (1-2 sentences about what the table represents) -- Key columns (especially calculated fields or non-obvious names) - -These descriptions are indexed by `wren memory index` and significantly -improve LLM query accuracy. - ---- - -## Phase 5 — Validate and build - -```bash -# Validate YAML structure and integrity -wren context validate --path /path/to/project - -# If strict mode is desired: -wren context validate --path /path/to/project --strict - -# Build JSON manifest -wren context build --path /path/to/project - -# Verify against database -wren --sql "SELECT * FROM LIMIT 1" -``` - -If validation fails, fix the reported issues and re-run. Common errors: -- Duplicate model/column names -- Missing primary key -- Relationship referencing non-existent model -- Invalid column type (try re-running through `parse_type`) - ---- - -## Phase 6 — Initialize memory - -```bash -# Index schema (generates seed NL-SQL examples automatically) -wren memory index - -# Verify -wren memory status -``` - -After this step, `wren memory fetch` and `wren memory recall` are -operational. See the **wren-usage** skill for query workflows. - ---- - -## Phase 7 — Iterate with the user - -The initial MDL is a starting point. Improve it by: -- Adding calculated columns based on business logic -- Adding views for common query patterns -- Refining descriptions based on actual query usage -- Adding access control (RLAC/CLAC) if needed - -Each change follows: edit YAML → `wren context validate` → -`wren context build` → `wren memory index`. 
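The edit cycle above can be chained so a failed validation stops the pipeline before a broken manifest is built or a stale index is written (paths are illustrative):

```bash
wren context validate --path /path/to/project \
  && wren context build --path /path/to/project \
  && wren memory index
```

Because of `&&`, the build and re-index run only when validation succeeds.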
- ---- - -## Quick reference - -| Task | Command / Method | -|------|-----------------| -| Discover tables | Agent's own tools (SQLAlchemy, driver, raw SQL) | -| Discover columns + types | Agent's own tools | -| Discover constraints | Agent's own tools | -| Normalize types (Python) | `from wren.type_mapping import parse_type` | -| Normalize types (CLI) | `wren utils parse-type --type T --dialect D` | -| Normalize types (batch) | `wren utils parse-types --dialect D < columns.json` | -| Scaffold project | `wren context init` | -| Write models | Create `models//metadata.yml` | -| Write relationships | Edit `relationships.yml` | -| Validate | `wren context validate` | -| Build manifest | `wren context build` | -| Test query | `wren --sql "SELECT * FROM LIMIT 1"` | -| Index memory | `wren memory index` | - ---- - -## Things to avoid - -- Do not hardcode database-specific type strings in MDL — always normalize via `parse_type` -- Do not skip validation before build — invalid YAML produces broken manifests silently -- Do not guess column types — introspect from the actual database -- Do not write relationships without confirming join conditions — wrong conditions cause silent query errors -- Do not skip `wren memory index` after build — stale indexes degrade recall quality diff --git a/cli-skills/wren-usage/SKILL.md b/cli-skills/wren-usage/SKILL.md deleted file mode 100644 index bb86997cb..000000000 --- a/cli-skills/wren-usage/SKILL.md +++ /dev/null @@ -1,246 +0,0 @@ ---- -name: wren-usage -description: "Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results. Use when: agent needs to query data, connect a data source, handle errors, or manage MDL changes via the wren CLI." 
-license: Apache-2.0 -metadata: - author: wren-engine - version: "1.0" ---- - -# Wren Engine CLI — Agent Workflow Guide - -The `wren` CLI queries databases through an MDL (Model Definition Language) semantic layer. You write SQL against model names, not raw tables. The engine translates to the target dialect. - -Two files drive everything (auto-discovered from `~/.wren/`): -- `mdl.json` — the semantic model -- `connection_info.json` — database credentials + `datasource` field (e.g. `"datasource": "postgres"`) - -The data source is always read from `connection_info.json`. There is no `--datasource` flag on execution commands (`query`, `dry-run`, `validate`). Only `dry-plan` accepts `--datasource` / `-d` as an override (for transpile-only use without a connection file). - -For memory-specific decisions, see [references/memory.md](references/memory.md). -For SQL syntax, CTE-based modeling, and error diagnosis, see [references/wren-sql.md](references/wren-sql.md). - ---- - -## Workflow 1: Answering a data question - -### Step 1 — Gather context - -| Situation | Command | -|-----------|---------| -| Default | `wren memory fetch -q ""` | -| Need specific model's columns | `wren memory fetch -q "..." --model --threshold 0` | -| Memory not installed | Read `target/mdl.json` in the project directory directly | - -If this is the first query in the conversation, also run: - - wren context instructions - -If it returns content, treat it as **rules that override defaults** — apply them to all subsequent queries in this session. - -### Step 2 — Recall past queries - -```bash -wren memory recall -q "" --limit 3 -``` - -Use results as few-shot examples. Skip if empty. 
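One way to turn recalled pairs into few-shot context is to render each as a commented SQL block. This is only a sketch — it assumes recall returns NL-SQL pairs, and the `nl`/`sql` field names here are hypothetical:

```python
def format_few_shot(examples):
    """Render recalled NL-SQL pairs as comment-annotated few-shot blocks."""
    blocks = [f"-- Q: {ex['nl']}\n{ex['sql']}" for ex in examples]
    return "\n\n".join(blocks)


# Hypothetical recall output — real field names may differ
examples = [
    {
        "nl": "top 5 customers by total order value",
        "sql": "SELECT c_name, SUM(o_totalprice) FROM orders "
               "JOIN customer ON orders.o_custkey = customer.c_custkey "
               "GROUP BY 1 ORDER BY 2 DESC LIMIT 5",
    }
]
print(format_few_shot(examples))
```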
- -### Step 2.5 — Assess complexity (before writing SQL) - -If the question involves **any** of the following, consider decomposing: -- Multiple metrics or aggregations (e.g., "churn rate AND expansion revenue") -- Multi-step calculations (e.g., "month-over-month growth rate") -- Comparisons across segments (e.g., "by plan tier, by region") -- Time-series analysis requiring baseline + change (e.g., "retention curve") - -**Decomposition strategy:** -1. Identify the sub-questions (e.g., "total subscribers at start" + "subscribers who cancelled" → churn rate) -2. For each sub-question: - - `wren memory recall -q ""` — check if a similar pattern exists - - Write and execute a simple SQL - - Note the result -3. Combine sub-results to answer the original question - -**When NOT to decompose:** -- Single-table aggregation with GROUP BY — just write the SQL -- Simple JOINs that the MDL relationships already define -- Questions where `memory recall` returns a near-exact match - -This is a judgment call, not a rigid rule. If you're confident in a single -query, go ahead. Decompose when the SQL would be hard to debug if it fails. - -### Step 3 — Write, verify, and execute SQL - -**For simple queries** (single table or simple MDL-defined JOINs, straightforward aggregation): -Execute directly: -```bash -wren --sql 'SELECT c_name, SUM(o_totalprice) FROM orders -JOIN customer ON orders.o_custkey = customer.c_custkey -GROUP BY 1 ORDER BY 2 DESC LIMIT 5' -``` - -**For complex queries** (non-trivial JOINs not covered by MDL relationships, subqueries, multi-step logic): -Verify first with dry-plan: -```bash -wren dry-plan --sql 'SELECT ...' -``` - -Check the expanded SQL output: -- Are the correct models and columns referenced? -- Do the JOINs match expected relationships? -- Are CTEs expanded correctly? - -If the expanded SQL looks wrong, fix before executing. -If it looks correct, proceed: -```bash -wren --sql 'SELECT ...' 
-``` - -**SQL rules:** -- Target MDL model names, not database tables -- Write dialect-neutral SQL — the engine translates - -### Step 4 — Store and continue - -After successful execution, **store the query by default**: - -```bash -wren memory store --nl "" --sql "" -``` - -**Skip storing only when:** -- The query failed or returned an error -- The user said the result is wrong -- The query is exploratory (`SELECT * ... LIMIT N` without analytical clauses) -- There is no natural language question — just raw SQL -- The user explicitly asked not to store - -The CLI auto-detects exploratory queries — if you see no store hint -after execution, the query was classified as exploratory. - -| Outcome | Action | -|---------|--------| -| User confirms correct | Store | -| User continues with follow-up | Store, then handle follow-up | -| User says nothing (but question had clear NL description) | Store | -| User says wrong | Do NOT store — fix the SQL | -| Query error | See Error recovery below | - ---- - -## Workflow 2: Error recovery - -### "table not found" - -1. Verify model name: `wren memory fetch -q "" --type model --threshold 0` -2. Check MDL exists: `ls ~/.wren/mdl.json` -3. Verify column: `wren memory fetch -q "" --model --threshold 0` - -### Connection error - -1. Check: `cat ~/.wren/connection_info.json` -2. Verify the `datasource` field is present and valid -3. Test: `wren --sql "SELECT 1"` -4. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb` -5. 
Both flat format (`{"datasource": ..., "host": ...}`) and MCP envelope format (`{"datasource": ..., "properties": {...}}`) are accepted - -### SQL syntax / planning error (enhanced) - -#### Layer 1: Identify the failure point - -```bash -wren dry-plan --sql "" -``` - -| dry-plan result | Failure layer | Next step | -|-----------------|---------------|-----------| -| dry-plan fails | MDL / semantic | → Layer 2A | -| dry-plan succeeds, execution fails | DB / dialect | → Layer 2B | - -#### Layer 2A: MDL-level diagnosis (dry-plan failed) - -The dry-plan error message tells you exactly what's wrong: - -| Error pattern | Diagnosis | Fix | -|---------------|-----------|-----| -| `column 'X' not found in model 'Y'` | Wrong column name | `wren memory fetch -q "X" --model Y --threshold 0` to find correct name | -| `model 'X' not found` | Wrong model name | `wren memory fetch -q "X" --type model --threshold 0` | -| `ambiguous column 'X'` | Column exists in multiple models | Qualify with model name: `ModelName.column` | -| Planning error with JOIN | Relationship not defined in MDL | Check available relationships in context | - -**Key principle**: Fix ONE issue at a time. Re-run dry-plan after each fix -to see if new errors surface. - -#### Layer 2B: DB-level diagnosis (dry-plan OK, execution failed) - -The DB error + dry-plan output together pinpoint the issue: - -1. Read the dry-plan expanded SQL — this is what actually runs on the DB -2. 
Compare with the DB error message: - -| Error pattern | Diagnosis | Fix | -|---------------|-----------|-----| -| Type mismatch | Column type differs from assumed | Check column type in context, add explicit CAST | -| Function not supported | Dialect-specific function | Use dialect-neutral alternative | -| Permission denied | Table/schema access | Check connection credentials | -| Timeout | Query too expensive | Simplify: reduce JOINs, add filters, LIMIT | - -**For small models**: If the error message is unclear, try simplifying -the query to the smallest failing fragment. Execute subqueries independently -to isolate which part fails. - -For the CTE rewrite pipeline and additional error patterns, see [references/wren-sql.md](references/wren-sql.md). - ---- - -## Workflow 3: Connecting a new data source - -1. Create `~/.wren/connection_info.json` — see [wren/docs/connections.md](../../wren/docs/connections.md) for per-connector formats -2. Test: `wren --sql "SELECT 1"` -3. Place or create `~/.wren/mdl.json` -4. Index: `wren memory index` -5. Verify: `wren --sql "SELECT * FROM LIMIT 5"` - ---- - -## Workflow 4: After MDL changes - -When the MDL is updated, downstream state goes stale: - -```bash -# 1. Deploy updated MDL -cp updated-mdl.json ~/.wren/mdl.json - -# 2. Re-index schema memory -wren memory index - -# 3. Verify -wren --sql "SELECT * FROM LIMIT 1" -``` - ---- - -## Command decision tree - -``` -Get data back → wren --sql "..." -See translated SQL only → wren dry-plan --sql "..." (accepts -d if no connection file) -Validate against DB → wren dry-run --sql "..." -Schema context → wren memory fetch -q "..." -Filter by type/model → wren memory fetch -q "..." --type T --model M --threshold 0 -Store confirmed query → wren memory store --nl "..." --sql "..." -Few-shot examples → wren memory recall -q "..." 
-Index stats → wren memory status -Re-index after MDL change → wren memory index -``` - ---- - -## Things to avoid - -- Do not guess model or column names — check context first -- Do not store failed queries or queries the user said are wrong -- Do not skip storing successful queries with a clear NL question — default is to store -- Do not re-index before every query — once per MDL change -- Do not pass passwords via `--connection-info` if shell history is shared — use `--connection-file` diff --git a/docs/README.md b/docs/README.md index e69de29bb..23b8fe716 100644 --- a/docs/README.md +++ b/docs/README.md @@ -0,0 +1,17 @@ +# Wren Engine Documentation + +Wren Engine is an open-source semantic engine for AI agents and MCP clients. It translates SQL queries through MDL (Model Definition Language) and executes them against 22+ data sources. + +## Getting Started + +- [Quick Start](quickstart.md) -- Set up a local semantic layer with the jaffle_shop dataset using the Wren CLI and Claude Code. (~15 minutes) + +## Core Concepts + +- [Wren Project](wren_project.md) -- Project structure, YAML authoring, and how the CLI compiles models into a deployable manifest. + +### MDL Reference + +- [Model](mdl/model.md) -- Define semantic entities over physical tables or SQL expressions. +- [Relationship](mdl/relationship.md) -- Declare join paths between models for automatic resolution. +- [View](mdl/view.md) -- Named SQL queries that behave as virtual tables. diff --git a/docs/getting_started_with_claude_code.md b/docs/getting_started_with_claude_code.md deleted file mode 100644 index ff6532cff..000000000 --- a/docs/getting_started_with_claude_code.md +++ /dev/null @@ -1,326 +0,0 @@ -# Getting Started with Claude Code - -This guide walks you through setting up Wren Engine end-to-end using **Claude Code** — from connecting your database to running your first natural-language query. 
Claude Code's AI agent skills automate the tedious parts: schema introspection, MDL generation, project scaffolding, and Docker setup. - -**What you'll end up with:** - -- A running Wren Engine container (wren-ibis-server + MCP server) -- An MDL manifest generated from your real database schema -- The Wren MCP server registered in Claude Code so you can query your data in natural language - ---- - -## Prerequisites - -- [Claude Code](https://claude.ai/code) installed and authenticated -- [Docker Desktop](https://www.docker.com/products/docker-desktop/) (or Docker Engine) running -- A supported database (PostgreSQL, MySQL, BigQuery, Snowflake, ClickHouse, DuckDB, and more) - ---- - -## Step 1 — Install Wren skills - -Wren Engine provides Claude Code **skills** — reusable AI agent workflows for connecting databases, generating MDL, and managing the MCP server. - -Install all Wren skills with one command: - -```bash -npx skills add Canner/wren-engine --skill '*' --agent claude-code -``` - -Or use the install script directly: - -```bash -curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash -``` - -This installs the following skills into `~/.claude/skills/`: - -| Skill | Purpose | -|-------|---------| -| `wren-quickstart` | End-to-end guided setup | -| `wren-connection-info` | Connection field reference per data source | -| `wren-generate-mdl` | Generate MDL from a live database | -| `wren-project` | Save and build MDL as YAML files | -| `wren-mcp-setup` | Start the Docker container and register MCP | -| `wren-usage` | Day-to-day usage guide | -| `wren-sql` | Write and debug SQL queries | - -After installation, **start a new Claude Code session** so the skills are loaded. - ---- - -## Step 2 — Run the quickstart - -In Claude Code, run: - -``` -/wren-quickstart -``` - -This skill guides you through the full setup in four phases. You can also follow the phases manually below. 
- ---- - -## Manual setup - -If you prefer to run each step yourself, follow these phases in order. - -### Phase 1 — Create a workspace - -Create a directory on your host machine. This directory will be mounted into the Docker container so it can read and write MDL files. - -```bash -mkdir -p ${PWD}/wren-workspace -``` - -The completed workspace will look like: - -``` -${PWD}/wren-workspace/ -├── wren_project.yml -├── models/ -│ └── *.yml -├── relationships.yml -├── views.yml -└── target/ - └── mdl.json # Compiled MDL — loaded by the container -``` - -> **Connection info** is configured via the MCP server Web UI (`http://localhost:9001`) — it is not stored in the workspace. - -### Phase 2 — Start the Docker container - -#### Check for a newer image - -Before starting the container, pull the latest image to make sure you have the most recent version. - -**First time (image not yet pulled):** - -```bash -docker pull ghcr.io/canner/wren-engine-ibis:latest -``` - -**Already have the image locally?** Compare digests to detect updates: - -```bash -# Save current local digest -LOCAL_DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' ghcr.io/canner/wren-engine-ibis:latest 2>/dev/null || echo "") - -# Pull from registry (downloads only if remote digest differs) -docker pull ghcr.io/canner/wren-engine-ibis:latest - -# Compare digests -NEW_DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' ghcr.io/canner/wren-engine-ibis:latest 2>/dev/null || echo "") - -if [ "$LOCAL_DIGEST" != "$NEW_DIGEST" ]; then - echo "New image pulled — container will use the updated version." -else - echo "Already up to date." 
-fi -``` - -> If a `wren-mcp` container is already running and a new image was pulled, stop and remove it first: -> ```bash -> docker rm -f wren-mcp -> ``` - -#### Run the container - -Start the Wren Engine container, mounting your workspace: - -```bash -docker run -d \ - --name wren-mcp \ - -p 8000:8000 \ - -p 9000:9000 \ - -p 9001:9001 \ - -e ENABLE_MCP_SERVER=true \ - -e MCP_TRANSPORT=streamable-http \ - -e MCP_HOST=0.0.0.0 \ - -e MCP_PORT=9000 \ - -e WREN_URL=localhost:8000 \ - -e MDL_PATH=/workspace/target/mdl.json \ - -v ~/wren-workspace:/workspace \ - ghcr.io/canner/wren-engine-ibis:latest -``` - -Three services start inside the container: - -| Service | Port | Purpose | -|---------|------|---------| -| wren-ibis-server | 8000 | REST API for query execution and metadata | -| MCP server | 9000 | MCP endpoint for AI clients | -| Web UI | 9001 | Configuration UI (connection info, MDL editor, read-only mode) | - -Verify it is running: - -```bash -docker ps --filter name=wren-mcp -docker logs wren-mcp -``` - -> **Database on localhost?** If your database runs on your host machine, replace `localhost` / `127.0.0.1` with `host.docker.internal` in your connection settings — the container cannot reach the host's `localhost` directly. - -### Phase 3 — Configure connection and register MCP server - -Configure connection info in the Web UI at `http://localhost:9001` — select the data source type and enter credentials. Use `/wren-connection-info` in Claude Code for field reference per data source. - -> **Connection info can only be configured through the Web UI.** Do not attempt to set it programmatically. - -Then register the MCP server with Claude Code: - -```bash -claude mcp add --transport http wren http://localhost:9000/mcp -``` - -Verify it was added: - -```bash -claude mcp list -``` - -**Start a new Claude Code session** — MCP servers are only loaded at session start. 
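Before relying on the new session, you can sanity-check that the MCP endpoint is reachable at all. The status code a bare GET returns may vary by transport — the point is only that something answers on port 9000:

```bash
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/mcp
```

A connection-refused error here means the container is down or the port mapping is wrong.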
- -### Phase 4 — Generate the MDL - -In the new session, run: - -``` -/wren-generate-mdl -``` - -The skill will: - -1. Run `health_check()` to verify the connection is working -2. Ask for your data source type (PostgreSQL, BigQuery, Snowflake, etc.) and optional schema filter -3. Call `list_remote_tables()` and `list_remote_constraints()` via MCP tools to introspect your database schema -4. Build the MDL JSON (models, columns, relationships) -5. Validate the manifest with `deploy_manifest()` + `dry_run()` - -> **Prerequisite:** The MCP server must be registered and a new session started (Phase 3). The `/wren-generate-mdl` skill uses MCP tools — do not call ibis-server API directly. - -Then save the MDL as a versioned YAML project: - -```text -/wren-project -``` - -This writes human-readable YAML files to your workspace and compiles `target/mdl.json`. - ---- - -## Verify and start querying - -In the new session, run a health check: - -``` -Use health_check() to verify Wren Engine is reachable. -``` - -Expected: `SELECT 1` returns successfully. - -Then start querying your data in natural language: - -``` -How many orders were placed last month? -``` - -``` -Show me the top 10 customers by revenue. 
-``` - ---- - -## Day-to-day usage - -Once set up, use `/wren-usage` in Claude Code for ongoing tasks: - -| Task | Skill | -|------|-------| -| Write or debug SQL | `/wren-sql` | -| Look up connection field reference | `/wren-connection-info` | -| Reconfigure connection via Web UI | `http://localhost:9001` | -| Add a model or column to the MDL | `/wren-project` | -| Regenerate MDL after schema changes | `/wren-generate-mdl` | -| Restart or reconfigure the MCP server | `/wren-mcp-setup` | - -### MCP server quick reference - -```bash -docker ps --filter name=wren-mcp # check status -docker logs wren-mcp # view logs -docker restart wren-mcp # restart -``` - -### MCP tool reference - -| Tool | Purpose | -|------|---------| -| `health_check()` | Verify Wren Engine is reachable | -| `query(sql=...)` | Execute SQL against the deployed MDL | -| `deploy(mdl_file_path=...)` | Load a compiled `mdl.json` | -| `list_remote_tables(...)` | Introspect database schema | - -> **Note:** Connection info is configured exclusively through the Web UI at `http://localhost:9001` — there is no MCP tool for setting credentials. - ---- - -## Troubleshooting - -**MCP tools not available after registration:** -Start a new Claude Code session. MCP servers are only loaded at session start. - -**Container not finding the MDL at startup:** -Confirm `~/wren-workspace/target/mdl.json` exists before starting the container. Check logs with `docker logs wren-mcp`. - -**Database connection refused inside Docker:** -Change `localhost` / `127.0.0.1` to `host.docker.internal` in your connection credentials. - -**MCP tools fail with "Session not found" after container restart:** -Start a new Claude Code session. Container restarts invalidate MCP sessions — the client must reconnect. - -**`wren-generate-mdl` fails because wren-ibis-server is not running:** -Start the container first (Phase 2), then run `/wren-generate-mdl`. wren-ibis-server is available at `http://localhost:8000` once the container is up. 
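A quick pre-flight before retrying the skill: confirm the container is actually running and skim recent logs for startup errors:

```bash
docker ps --filter name=wren-mcp --format '{{.Status}}'
docker logs --tail 20 wren-mcp
```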
- -**Skill not found after installation:** -Start a new Claude Code session after installing skills — they are loaded at session start. - -For more detailed troubleshooting, invoke `/wren-mcp-setup` in Claude Code. - ---- - -## Updating skills - -Each skill checks for updates automatically and notifies you when a newer version is available. To force-update all skills: - -```bash -npx skills add Canner/wren-engine --skill '*' --agent claude-code -# or: -curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash -s -- --force -``` - -To update a single skill: - -```bash -npx skills add Canner/wren-engine --skill wren-generate-mdl --agent claude-code -# or: -curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash -s -- --force wren-generate-mdl -``` - ---- - -## Locking down with read-only mode - -Once you have confirmed that queries are returning correct results and the MDL is working as expected, enable **read-only mode** in the Web UI: - -1. Open `http://localhost:9001` -2. Toggle **Read-Only Mode** to on - -When read-only mode is enabled: - -- The AI agent can **query data** and **read metadata** through the deployed MDL as usual -- The AI agent **cannot** modify connection info, change the data source, or call `list_remote_tables()` / `list_remote_constraints()` to introspect the database directly -- This limits the agent to operating within the boundaries of the MDL you have defined, preventing it from accessing tables or schemas you have not explicitly modeled - -We recommend enabling read-only mode for day-to-day use. Turn it off temporarily when you need to regenerate the MDL or change connection settings. diff --git a/docs/mdl/model.md b/docs/mdl/model.md index ad185554f..22a18ab01 100644 --- a/docs/mdl/model.md +++ b/docs/mdl/model.md @@ -7,12 +7,15 @@ A **Model** is the core building block of Wren MDL. It maps a physical table (or Every model requires three things: 1. 
A **name** — the identifier used in queries (`SELECT * FROM customers`) -2. A **data source** — where the data lives (`table_reference`, `ref_sql`, or `base_object`) +2. A **data source** — where the data lives (`table_reference` or `ref_sql`) 3. **Columns** — the fields that are exposed ### YAML format (wren project) +Each model lives in its own directory under `models/` as `models//metadata.yml`. + ```yaml +# models/customers/metadata.yml name: customers table_reference: catalog: jaffle_shop @@ -22,15 +25,32 @@ primary_key: customer_id columns: - name: customer_id type: INTEGER + is_calculated: false + not_null: true is_primary_key: true + properties: {} - name: first_name type: VARCHAR + is_calculated: false + not_null: false + properties: {} - name: last_name type: VARCHAR + is_calculated: false + not_null: false + properties: {} - name: number_of_orders type: BIGINT + is_calculated: false + not_null: false + properties: {} - name: customer_lifetime_value type: DOUBLE + is_calculated: false + not_null: false + properties: {} +cached: false +properties: {} ``` ### JSON format (MDL manifest) @@ -59,14 +79,16 @@ columns: | Field | Required | Description | |-------|----------|-------------| | `name` | Yes | Unique identifier used in SQL queries | -| `table_reference` | One of three | Points to an existing physical table (`catalog.schema.table`) | -| `ref_sql` | One of three | A SQL SELECT statement used as the model's data source | -| `base_object` | One of three | References another model or view as the base | +| `table_reference` | One of two | Points to an existing physical table (`catalog.schema.table`) | +| `ref_sql` | One of two | A SQL SELECT statement used as the model's data source | | `columns` | Yes | List of columns to expose (see [Column Fields](#column-fields)) | | `primary_key` | No | Column name that uniquely identifies a row; required for relationships | +| `cached` | No | Whether query results for this model should be cached; `false` by default | 
| `properties` | No | Arbitrary key-value metadata (description, tags, etc.) | -## Data Source: Three Ways to Point at Data +## Data Source: Two Ways to Point at Data + +A model must define its source in exactly one of two ways. Using both `table_reference` and `ref_sql` in the same model is a validation error. ### 1. `table_reference` — map to a physical table @@ -84,33 +106,58 @@ table_reference: When a query like `SELECT * FROM orders` is executed, Wren rewrites it to the fully-qualified physical table. -### 2. `ref_sql` — define the model inline with SQL (not yet supported) +### 2. `ref_sql` — define the model with SQL Used when the model is derived — for example, a staging transform or a complex join that doesn't exist as a physical table. +The SQL can be inline in `metadata.yml` or in a separate `ref_sql.sql` file. The `.sql` file takes precedence if both exist. + +**Inline in metadata.yml:** + ```yaml -name: stg_orders +name: revenue_summary ref_sql: > - SELECT - id AS order_id, - user_id AS customer_id, - order_date, - status - FROM jaffle_shop.main.raw_orders + SELECT DATE_TRUNC('month', order_date) AS month, + SUM(total) AS total_revenue + FROM orders + GROUP BY 1 +columns: + - name: month + type: DATE + is_calculated: false + not_null: true + properties: {} + - name: total_revenue + type: DECIMAL + is_calculated: false + not_null: false + properties: {} ``` -### 3. `base_object` — inherit from another model (not yet supported) - -References an existing model or view by name as the base. Useful for layered modeling (raw → staging → mart). 
+**Separate SQL file:**
 
 ```yaml
-name: active_orders
-base_object: orders
+# models/revenue_summary/metadata.yml
+name: revenue_summary
 columns:
-  - name: order_id
-    type: INTEGER
-  - name: order_date
+  - name: month
     type: DATE
+    is_calculated: false
+    not_null: true
+    properties: {}
+  - name: total_revenue
+    type: DECIMAL
+    is_calculated: false
+    not_null: false
+    properties: {}
+```
+
+```sql
+-- models/revenue_summary/ref_sql.sql
+SELECT DATE_TRUNC('month', order_date) AS month,
+       SUM(total) AS total_revenue
+FROM orders
+GROUP BY 1
 ```
 
 ## Column Fields
diff --git a/docs/mdl/view.md b/docs/mdl/view.md
index e1722e1ea..74db4832e 100644
--- a/docs/mdl/view.md
+++ b/docs/mdl/view.md
@@ -4,22 +4,46 @@ A **View** is a named SQL query stored in the MDL. It behaves like a virtual tab
 
 ## Structure
 
+Each view lives in its own directory under `views/` as `views/<view_name>/metadata.yml`.
+
+The `statement` SQL can be inline in `metadata.yml` or in a separate `sql.yml` file. The `sql.yml` file takes precedence if both exist.
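This precedence rule is simple enough to sketch in Python. The snippet below is illustrative only: the `resolve_statement` helper is hypothetical and not part of the wren CLI; the file names come from this page.

```python
from typing import Optional

def resolve_statement(metadata: dict, sql_file: Optional[str]) -> str:
    """Pick a view's SQL: a separate sql.yml wins over an inline statement."""
    if sql_file is not None:           # contents of views/<name>/sql.yml, if present
        return sql_file
    if "statement" in metadata:        # inline in metadata.yml
        return metadata["statement"]
    raise ValueError("view defines no SQL statement")
```

The same rule applies to models, with `ref_sql.sql` in place of `sql.yml`.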
+ +**Inline statement:** + +```yaml +# views/top_customers/metadata.yml +name: top_customers +statement: > + SELECT customer_id, SUM(total) AS lifetime_value + FROM wren.public.orders GROUP BY 1 ORDER BY 2 DESC LIMIT 100 +description: "Top customers by lifetime value" +properties: {} +``` + +**Separate SQL file:** + +```yaml +# views/monthly_revenue/metadata.yml +name: monthly_revenue +description: "Monthly revenue aggregation" +properties: {} +``` + ```yaml -# views.yml -views: - - name: high_value_orders - statement: > - SELECT order_id, customer_id, amount - FROM orders - WHERE amount > 100 +# views/monthly_revenue/sql.yml +statement: > + SELECT DATE_TRUNC('month', order_date) AS month, + SUM(total) AS total_revenue + FROM wren.public.orders + GROUP BY 1 ``` ### JSON format (MDL manifest) ```json { - "name": "high_value_orders", - "statement": "SELECT order_id, customer_id, amount FROM orders WHERE amount > 100" + "name": "top_customers", + "statement": "SELECT customer_id, SUM(total) AS lifetime_value FROM wren.public.orders GROUP BY 1 ORDER BY 2 DESC LIMIT 100" } ``` @@ -29,12 +53,14 @@ views: |-------|----------|-------------| | `name` | Yes | Unique identifier used in SQL queries | | `statement` | Yes | A complete SQL SELECT statement; may reference other models or views | +| `description` | No | Human-readable description of the view's purpose | +| `properties` | No | Arbitrary key-value metadata | ## Model vs View | | Model | View | |-|-------|------| -| Data source | Physical table, `ref_sql`, or `base_object` | SQL `statement` | +| Data source | `table_reference` or `ref_sql` | SQL `statement` | | Column declarations | Explicit (with types) | Inferred from `statement` | | Relationship columns | Supported | Not supported | | Calculated columns | Supported | Not supported | @@ -50,11 +76,14 @@ The jaffle_shop workspace ships with an empty `views.yml` (`views: []`), but vie ### Simple filter view ```yaml -- name: completed_orders - statement: > - SELECT 
order_id, customer_id, order_date, amount - FROM orders - WHERE status = 'completed' +# views/completed_orders/metadata.yml +name: completed_orders +statement: > + SELECT order_id, customer_id, order_date, amount + FROM orders + WHERE status = 'completed' +description: "Orders with completed status" +properties: {} ``` ```sql @@ -64,17 +93,20 @@ SELECT * FROM completed_orders WHERE amount > 50; ### Cross-model aggregation view ```yaml -- name: customer_order_summary - statement: > - SELECT - c.customer_id, - c.first_name, - c.last_name, - COUNT(o.order_id) AS total_orders, - SUM(o.amount) AS lifetime_value - FROM customers c - JOIN orders o ON c.customer_id = o.customer_id - GROUP BY c.customer_id, c.first_name, c.last_name +# views/customer_order_summary/metadata.yml +name: customer_order_summary +statement: > + SELECT + c.customer_id, + c.first_name, + c.last_name, + COUNT(o.order_id) AS total_orders, + SUM(o.amount) AS lifetime_value + FROM customers c + JOIN orders o ON c.customer_id = o.customer_id + GROUP BY c.customer_id, c.first_name, c.last_name +description: "Per-customer order counts and lifetime value" +properties: {} ``` The `statement` references `customers` and `orders` by their model names. The engine resolves them through the normal model pipeline after expanding the view. @@ -82,11 +114,14 @@ The `statement` references `customers` and `orders` by their model names. The en ### View referencing another view ```yaml -- name: vip_customers - statement: > - SELECT customer_id, first_name, last_name, lifetime_value - FROM customer_order_summary - WHERE lifetime_value > 500 +# views/vip_customers/metadata.yml +name: vip_customers +statement: > + SELECT customer_id, first_name, last_name, lifetime_value + FROM customer_order_summary + WHERE lifetime_value > 500 +description: "Customers with lifetime value over 500" +properties: {} ``` Views can reference other views. The engine expands all view references recursively before resolving model references. 
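As an illustration of that recursive expansion, here is a toy Python sketch. It is a simplification, not the engine's actual code: the real engine rewrites a parsed AST, whereas this version inlines each view reference as a parenthesized subquery using plain text substitution, with a depth counter standing in for proper cycle detection.

```python
import re

def expand_views(sql: str, views: dict, depth: int = 0) -> str:
    """Inline every view reference as a parenthesized subquery, recursively."""
    if depth > len(views):                        # crude guard against cycles
        raise ValueError("cyclic view reference")
    for name, statement in views.items():
        pattern = rf"\b{re.escape(name)}\b"
        if re.search(pattern, sql):
            # Expand the referenced view's own references first, then inline it.
            body = expand_views(statement, views, depth + 1)
            sql = re.sub(pattern, lambda m: f"({body})", sql)
    return sql
```

Running it on the `vip_customers` example above produces a single query over `orders`, with both view layers folded in.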
diff --git a/docs/quickstart.md b/docs/quickstart.md
index f7752836a..0b4f6ee9f 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -1,29 +1,24 @@
-# Quickstart: Chat with jaffle_shop using Wren Engine + Claude Code
+# Quick Start: Wren CLI with jaffle_shop
 
-This guide gets you from zero to natural-language queries against the classic [jaffle_shop](https://github.com/dbt-labs/jaffle_shop_duckdb) dataset in about 15 minutes — no cloud database required.
+Ask natural-language questions about the **jaffle\_shop** dataset with the **Wren Engine CLI** and **Claude Code** — no cloud database, no Docker, no MCP server.
 
-**What you'll end up with:**
-
-- A local DuckDB database seeded with jaffle_shop data (customers, orders, payments, products)
-- A running Wren Engine container (ibis-server + MCP server)
-- An MDL manifest generated from the jaffle_shop schema
-- The Wren MCP server registered in Claude Code so you can query your data in natural language
+> **Time:** ~15 minutes
+>
+> **What you'll get:** A local semantic layer + memory system that lets an AI agent write accurate SQL by understanding your data's meaning, not just its schema.
--- ## Prerequisites -| Tool | Notes | -|------|-------| -| [Claude Code](https://claude.ai/code) | Installed and authenticated | -| [Docker Desktop](https://www.docker.com/products/docker-desktop/) | Running | -| Python 3.9+ | For the dbt virtual environment in Step 1 | +- **Claude Code** — installed and authenticated ([install guide](https://docs.anthropic.com/en/docs/claude-code/overview)) +- **Python 3.9+** +- **Git** --- ## Step 1 — Seed the jaffle_shop dataset -Clone the jaffle_shop DuckDB project, set up a Python virtual environment, install dbt, and run the build to generate a local `.duckdb` file: +Clone the dbt jaffle\_shop project and build the DuckDB database: ```bash git clone https://github.com/dbt-labs/jaffle_shop_duckdb.git @@ -34,269 +29,292 @@ pip install dbt-core dbt-duckdb dbt build ``` -After `dbt build` completes, a `jaffle_shop.duckdb` file is created in the project directory. Note the absolute path — you'll need it shortly: +Verify the database file was created: ```bash -pwd # e.g. /Users/you/jaffle_shop_duckdb ls jaffle_shop.duckdb ``` -The database contains: +Note the **absolute path** to this directory — you'll need it when setting up the profile: -| Table | Description | -|-------|-------------| -| `customers` | Customer records with name and lifetime stats | -| `orders` | Orders with status, dates, and amounts | -| `order_items` | Line items per order | -| `products` | Product catalog with price and type | -| `supplies` | Supply costs per product | +```bash +pwd +# e.g. /Users/you/jaffle_shop_duckdb +``` --- -## Step 2 — Install Wren skills +## Step 2 — Install wren-engine Python package -Wren Engine provides Claude Code **skills** — AI agent workflows for connecting databases, generating MDL, and managing the MCP server. 
+Install `wren-engine` with the DuckDB connector, UI support, and memory system: ```bash -npx skills add Canner/wren-engine --skill '*' --agent claude-code -# or: -curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash +pip install "wren-engine[duckdb,ui,memory]" ``` -**Start a new Claude Code session** after installation — skills are loaded at session start. - ---- +> **Extras explained:** +> - `duckdb` — DuckDB connector (use `postgres`, `bigquery`, etc. for other data sources) +> - `ui` — browser-based profile configuration UI +> - `memory` — LanceDB-backed memory system for context retrieval and NL-SQL recall -## Step 3 — Create a workspace - -Create a directory to hold your MDL files. This directory is mounted into the Docker container: +Verify the installation: ```bash -mkdir -p ~/wren-workspace +wren version ``` --- -## Step 4 — Set up Wren Engine +## Step 3 — Install CLI skills -In your Claude Code session, run: +Skills are workflow guides that tell Claude Code how to use the Wren CLI effectively. Install both skills: +```bash +npx skills add Canner/wren-engine --skill '*' --agent claude-code +# or: +curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash ``` -/wren-quickstart -``` - -When prompted for connection details, provide: -| Field | Value | -|-------|-------| -| Data source type | `duckdb` | -| Database folder path | `/data` (the folder containing `jaffle_shop.duckdb`) | -| Workspace path | `~/wren-workspace` | -| jaffle_shop directory | your absolute path to `jaffle_shop_duckdb/` | +This installs two skills: -The skill handles everything: pulling the Docker image, starting the container (with the DuckDB file mounted at `/data`), configuring the connection, introspecting the schema, generating the MDL, saving the YAML project, and registering the MCP server. 
+| Skill | Purpose | +|-------|---------| +| **wren-usage** | Day-to-day workflow — gather context, recall past queries, write SQL, store results | +| **wren-generate-mdl** | One-time setup — explore database schema and generate the MDL project | -When it finishes, **start a new Claude Code session** and jump to [Step 5 — Start querying](#step-5--start-querying). - -> **Why use the skill?** DuckDB connection setup has several non-obvious requirements (the connection URL must point to a *directory*, not a `.duckdb` file; the catalog name is derived from the filename). The `/wren-quickstart` skill handles these details automatically. Manual setup is documented below for reference, but we strongly recommend using the skill for the first run. +> **Important:** Start a **new Claude Code session** after installing skills so they are loaded. --- -
-Manual setup (click to expand — for advanced users only) - -### Option B — Manual setup +## Step 4 — Set up a profile -Follow these steps if you prefer full control over each phase. +A profile stores your database connection info (like dbt profiles). Create one for the jaffle\_shop DuckDB database: -#### Phase 1 — Start the Wren Engine container - -Pull the latest image and start the container, mounting both your workspace and the jaffle_shop directory: +### Option A: Browser UI (recommended) ```bash -docker pull ghcr.io/canner/wren-engine-ibis:latest - -JAFFLE_SHOP_DIR=/Users/you/jaffle_shop_duckdb # ← replace with your actual path - -docker run -d \ - --name wren-mcp \ - -p 8000:8000 \ - -p 9000:9000 \ - -p 9001:9001 \ - -e ENABLE_MCP_SERVER=true \ - -e MCP_TRANSPORT=streamable-http \ - -e MCP_HOST=0.0.0.0 \ - -e MCP_PORT=9000 \ - -e WREN_URL=localhost:8000 \ - -e MDL_PATH=/workspace/target/mdl.json \ - -v ~/wren-workspace:/workspace \ - -v "$JAFFLE_SHOP_DIR":/data \ - ghcr.io/canner/wren-engine-ibis:latest +wren profile add --ui ``` -The DuckDB file is available inside the container at `/data/jaffle_shop.duckdb`. +This opens a browser form. Fill in: + +- **Profile name:** `jaffle-shop` +- **Data source:** `duckdb` +- **Database path:** `/Users/you/jaffle_shop_duckdb` (your absolute path from Step 1) -Verify it's running: +### Option B: Interactive CLI ```bash -docker ps --filter name=wren-mcp -curl http://localhost:8000/health +wren profile add --interactive ``` -#### Phase 2 — Configure connection and register MCP server +Follow the prompts to enter profile name, data source, and connection fields. -Configure the DuckDB connection via the Web UI at `http://localhost:9001`: +### Option C: From file -1. Open `http://localhost:9001` in your browser -2. Select data source type: **DUCKDB** -3. 
Set the connection info: +Create a YAML file `jaffle-profile.yml`: -| Field | Value | Description | -|-------|-------|-------------| -| `format` | `duckdb` | Must be `"duckdb"` | -| `url` | `/data` | Path to the **folder** containing `.duckdb` files (not the file itself) | +```yaml +datasource: duckdb +url: /Users/you/jaffle_shop_duckdb +``` + +Then import it: -The JSON looks like: -```json -{ "url": "/data", "format": "duckdb" } +```bash +wren profile add --from-file jaffle-profile.yml --name jaffle-shop ``` -> **Common mistake:** Do not point `url` to the `.duckdb` file directly (e.g. `/data/jaffle_shop.duckdb`). The ibis-server expects a **directory** — it scans for all `.duckdb` files in that directory and attaches them automatically. Pointing to the binary file causes a UTF-8 decode error. +--- -Then register the MCP server with Claude Code: +Verify the profile is active: ```bash -claude mcp add --transport http wren http://localhost:9000/mcp +wren profile list ``` -Verify it was added: +You should see `jaffle-shop` marked as active. Test the connection: ```bash -claude mcp list +wren profile debug ``` -**Start a new Claude Code session** — MCP servers are only loaded at session start. +--- -#### Phase 3 — Generate the MDL +## Step 5 — Initialize a Wren project -In the new session, run the skills in sequence: +Create a new directory for your project and scaffold the project structure: -```text -/wren-generate-mdl +```bash +mkdir -p ~/jaffle-wren && cd ~/jaffle-wren +wren context init ``` -The skill uses MCP tools (`health_check()`, `list_remote_tables()`, etc.) to introspect the database — these tools are only available after the MCP server is registered and a new session is started. 
- -Then save the MDL as a versioned YAML project: +This creates: -```text -/wren-project +``` +~/jaffle-wren/ +├── wren_project.yml # project metadata +├── models/ # one folder per table +├── views/ # reusable SQL views +├── relationships.yml # table join definitions +└── instructions.md # business rules for the AI ``` -This writes human-readable YAML files to `~/wren-workspace/` and compiles `target/mdl.json`. +Edit `wren_project.yml` to set the data source: + +```yaml +schema_version: 2 +name: jaffle_shop +version: "1.0" +# catalog and schema are Wren Engine's internal namespace for this MDL project. +# They are NOT your database's native catalog/schema. Keep the defaults. +catalog: wren +schema: public +data_source: duckdb +``` -
+> **Note:** `catalog` and `schema` here define the **Wren Engine namespace** — how the engine addresses this MDL project internally. They have nothing to do with your database's catalog or schema. The actual database location of each table is specified per-model in the `table_reference` section. Keep the defaults (`wren` / `public`) unless you plan to query across multiple MDL projects. --- -## Step 5 — Start querying +## Step 6 — Generate MDL with Claude Code -In the new session, verify the connection: +Now let Claude Code explore the database and generate the MDL project files. Open Claude Code **in the project directory**: -``` -Use health_check() to verify Wren Engine is reachable. +```bash +cd ~/jaffle-wren +claude ``` -Then ask questions in natural language: +Then ask: ``` -How many customers placed more than one order? +Use the wren-generate-mdl skill to explore the jaffle_shop database +and generate the MDL for all tables. The data source is DuckDB. ``` -``` -What are the top 5 products by total revenue? +Claude Code will: + +1. **Discover tables** — `customers`, `orders`, `products`, `supplies`, etc. +2. **Introspect columns and types** — using SQLAlchemy or `information_schema` +3. **Normalize types** — via `wren utils parse-type` +4. **Write model YAML files** — one folder per table under `models/` +5. **Infer relationships** — from foreign keys and naming conventions +6. **Add descriptions** — Claude may ask you to describe key tables/columns +7. **Validate and build** — `wren context validate` → `wren context build` +8. **Index memory** — `wren memory index` (generates seed NL-SQL examples) + +After completion, verify the project: + +```bash +wren context show +wren memory status ``` +--- + +## Step 7 — Start asking questions + +You're ready to go. In Claude Code, just ask questions in natural language: + ``` -Show me the order completion rate by month for the last year. +How many customers placed more than one order? 
``` ``` -Which customers have the highest average order value? +What are the top 5 products by total revenue? ``` ``` -What percentage of orders were returned? +Show me the monthly order count trend. ``` -Wren Engine translates these questions into SQL against the jaffle_shop MDL and returns results directly in your chat. +Behind the scenes, Claude Code uses the **wren-usage** skill to: + +1. **Fetch context** (`wren memory fetch`) — find relevant tables and columns for your question +2. **Recall examples** (`wren memory recall`) — find similar past queries +3. **Write SQL** — using the semantic layer (model names, not raw table names) +4. **Execute** (`wren --sql "..."`) — run through the Wren engine +5. **Store** (`wren memory store`) — save successful NL-SQL pairs for future recall + +The more you ask, the smarter the system gets — each stored query improves future recall accuracy. --- -## What happens under the hood +## What's in the project + +After setup, your project directory looks like this: ``` -Your question → Claude Code - → MCP tool call → Wren MCP server (port 9000) - → wren-ibis-server (port 8000) - → MDL semantic layer (models + relationships) - → DuckDB query execution - → Results back to Claude Code +~/jaffle-wren/ +├── wren_project.yml +├── models/ +│ ├── customers/ +│ │ └── metadata.yml # table schema + descriptions +│ ├── orders/ +│ │ └── metadata.yml +│ ├── products/ +│ │ └── metadata.yml +│ └── supplies/ +│ └── metadata.yml +├── views/ +├── relationships.yml # e.g. orders → customers (many_to_one) +├── instructions.md # your business rules +├── .wren/ +│ └── memory/ # LanceDB index (auto-managed) +└── target/ + └── mdl.json # compiled manifest ``` -The MDL manifest acts as a semantic layer — it tells Wren how your tables relate to each other (e.g. `orders` belongs to `customers` via `customer_id`), so queries like "top customers by revenue" automatically join the right tables. 
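The store/recall loop described in Step 7 can be illustrated with a deliberately tiny in-memory stand-in. The real memory system uses LanceDB vector search; the `NLSQLMemory` class and its word-overlap scoring below are made up for this sketch.

```python
from typing import List, Optional, Tuple

def _tokens(text: str) -> set:
    return set(text.lower().split())

class NLSQLMemory:
    """Toy stand-in for `wren memory store` / `wren memory recall`."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[str, str]] = []

    def store(self, question: str, sql: str) -> None:
        self._pairs.append((question, sql))

    def recall(self, question: str) -> Optional[str]:
        # Return the SQL of the most similar stored question, if any.
        q = _tokens(question)
        best, best_score = None, 0
        for nl, sql in self._pairs:
            score = len(q & _tokens(nl))
            if score > best_score:
                best, best_score = sql, score
        return best
```

Each `store` call makes later `recall` calls more likely to surface a relevant example, which is the same feedback loop the CLI implements with embeddings.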
- ---- +Key files to customize: -## Troubleshooting +- **`instructions.md`** — Add business rules, naming conventions, or query guidelines. Use `##` headings to organize by topic. Example: -**`dbt build` fails — adapter not found:** -Install the duckdb adapter: `uv tool install dbt-duckdb` + ```markdown + ## Naming Conventions + - "revenue" always means order total, not supply cost + - "active customers" means customers with at least one order in the last 90 days -**Container can't find the DuckDB file:** -Check that the `-v` flag points to the directory containing `jaffle_shop.duckdb`, and that the path inside the container (`/data/jaffle_shop.duckdb`) matches what you gave for the connection. + ## Query Rules + - Always use order_date for time-based filtering, not created_at + ``` -**`'utf-8' codec can't decode byte …` error when querying DuckDB:** -The connection info `url` is pointing to the `.duckdb` file instead of its parent directory. The ibis-server tries to read the path as JSON, hits the binary file, and fails. Fix: set `url` to the **folder** (e.g. `/data`), not the file (e.g. `/data/jaffle_shop.duckdb`). See the [connection info table in Phase 2](#phase-2--configure-the-connection-and-generate-the-mdl). +- **`models/*/metadata.yml`** — Add or refine `description` fields on models and columns. Better descriptions = better memory search. -**`Catalog "xxx" does not exist` error:** -When ibis-server attaches a DuckDB file, the catalog name is derived from the filename (e.g. `jaffle_shop.duckdb` → catalog `jaffle_shop`). Make sure the `catalog` in your MDL matches the DuckDB filename without the extension. +- **`relationships.yml`** — Add or fix join conditions. Wrong relationships cause silent query errors. -**`/wren-generate-mdl` fails immediately:** -The container must be running first. Run `docker ps --filter name=wren-mcp` to confirm, then retry. 
+After editing any file, rebuild and re-index:
 
-**MCP tools not available:**
-Start a new Claude Code session after running `claude mcp add`. MCP servers are loaded at session start only.
-
-**`health_check()` returns an error:**
-Check container logs: `docker logs wren-mcp`. Confirm ports are listening: `curl http://localhost:8000/health`. Check connection info in the Web UI: `http://localhost:9001`.
+
+```bash
+wren context validate
+wren context build
+wren memory index
+```
 
 ---
 
## Useful commands reference
 
 | Task | Command |
 |------|---------|
-| Add or edit MDL models | `/wren-project` |
-| Write custom SQL | `/wren-sql` |
-| Connect a different database | Web UI at `http://localhost:9001` (use `/wren-connection-info` for field reference) |
-| Day-to-day usage guide | `/wren-usage` |
-
-For a deeper dive into how skills work or how to connect a cloud database, see [Getting Started with Claude Code](./getting_started_with_claude_code.md).
+| Run SQL | `wren --sql "SELECT ..." -o table` |
+| Preview planned SQL | `wren dry-plan --sql "SELECT ..."` |
+| Validate SQL | `wren validate --sql "SELECT ..."` |
+| Show project context | `wren context show` |
+| Show instructions | `wren context instructions` |
+| Build manifest | `wren context build` |
+| Fetch context for a question | `wren memory fetch --question "..."` |
+| Recall similar queries | `wren memory recall --question "..."` |
+| Store a NL-SQL pair | `wren memory store --nl "..." --sql "..."` |
+| Check memory status | `wren memory status` |
+| Re-index memory | `wren memory index` |
+| Switch profile | `wren profile switch <profile-name>` |
+| List profiles | `wren profile list` |
 
 ---
 
-## Locking down with read-only mode
-
-Once you have confirmed that queries are returning correct results and the MDL is working as expected, enable **read-only mode** in the Web UI:
-
-1. Open `http://localhost:9001`
-2. 
Toggle **Read-Only Mode** to on - -When read-only mode is enabled: - -- The AI agent can **query data** and **read metadata** through the deployed MDL as usual -- The AI agent **cannot** modify connection info, change the data source, or call `list_remote_tables()` / `list_remote_constraints()` to introspect the database directly -- This limits the agent to operating within the boundaries of the MDL you have defined, preventing it from accessing tables or schemas you have not explicitly modeled +## Next steps -We recommend enabling read-only mode for day-to-day use. Turn it off temporarily when you need to regenerate the MDL or change connection settings. +- **Add views** for frequently asked questions — views with good descriptions become high-quality recall examples +- **Refine instructions** as you discover query patterns the AI gets wrong diff --git a/docs/wren_project.md b/docs/wren_project.md index 16e17150c..5ac243544 100644 --- a/docs/wren_project.md +++ b/docs/wren_project.md @@ -1,33 +1,82 @@ # Wren Project -This guide explains the Wren MDL project structure, file formats, and typical workflow for managing your data models as a collection of YAML files. The Wren Project format makes it easy to version control your models, track changes, and separate connection information from model definitions. +A Wren project is a directory of YAML files that define a semantic layer (models, relationships, views, and instructions) over a database. It is the unit of authoring, version control, and deployment for MDL (Model Definition Language) definitions. -A Wren MDL project is a directory of YAML files that makes MDL manifests human-readable and version-control friendly — similar to how dbt organizes models. Instead of managing a single large JSON file, each model lives in its own YAML file, and the project is compiled to a deployable `mdl.json` when needed. 
+Instead of managing a single `mdl.json` by hand, you author each model in its own directory as human-readable YAML. The CLI compiles them into a deployable JSON manifest when needed. -YAML files use **snake_case** field names for readability. The compiled `target/mdl.json` uses **camelCase**, which is the wire format expected by ibis-server. +YAML files use **snake_case** field names for readability. The compiled `target/mdl.json` uses **camelCase**, which is the wire format expected by the engine. ## Project Structure ```text my_project/ -├── wren_project.yml # Project metadata (catalog, schema, data_source) +├── wren_project.yml # project metadata ├── models/ -│ ├── orders.yml # One file per model -│ ├── customers.yml -│ └── ... -├── relationships.yml # All relationships -└── views.yml # All views +│ ├── orders/ +│ │ └── metadata.yml # table_reference mode (physical table) +│ ├── customers/ +│ │ └── metadata.yml +│ └── revenue_summary/ +│ ├── metadata.yml # ref_sql mode (SQL-defined model) +│ └── ref_sql.sql # SQL in separate file (optional) +├── views/ +│ ├── monthly_revenue/ +│ │ ├── metadata.yml +│ │ └── sql.yml # statement in separate file (optional) +│ └── top_customers/ +│ └── metadata.yml # statement inline +├── relationships.yml # all relationships +├── instructions.md # user instructions for LLM (optional) +├── .wren/ # runtime state (gitignored) +│ └── memory/ # LanceDB index files +└── target/ + └── mdl.json # build output (gitignored) ``` -After building, the compiled file is written to: +Each model and view lives in its own subdirectory under `models/` and `views/` respectively. -```text -my_project/ -└── target/ - └── mdl.json # Deployable MDL JSON (camelCase) -``` +--- + +## What Lives Where -> **Connection info** is managed via the MCP server Web UI (`http://localhost:9001`) — it is not stored in the project directory. +A Wren project keeps schema artifacts together in the project directory. Global configuration lives separately in `~/.wren/`. 
+ +| Artifact | Location | Scope | +|----------|----------|-------| +| Models, views, relationships | `/models/`, `/views/`, `/relationships.yml` | Project — version controlled | +| Instructions | `/instructions.md` | Project — references this project's model/column names | +| Compiled MDL | `/target/mdl.json` | Project — derived from YAML, gitignored | +| Memory (LanceDB) | `/.wren/memory/` | Project — indexes this project's schema, gitignored | +| Profiles (connections) | `~/.wren/profiles.yml` | Global — environment-specific (dev/prod credentials) | +| Global config | `~/.wren/config.yml` | Global — CLI preferences | + +**Why this separation?** Schema definitions are project-specific — they describe a particular data model. Connection credentials are environment-specific — the same project connects to different databases in dev vs. prod. Keeping them separate means projects are portable and safe to commit without leaking secrets. + +--- + +## Project Discovery + +When you run a wren command that needs the project (query, memory fetch, etc.), the CLI resolves the project root in this order: + +1. `--project` flag (explicit) +2. `WREN_PROJECT_HOME` environment variable +3. Walk up from the current directory looking for `wren_project.yml` +4. `default_project` in `~/.wren/config.yml` + +If no project is found, the CLI exits with an error and suggests running `wren context init` or setting `WREN_PROJECT_HOME`. + +Once the project root is resolved, all paths (MDL, instructions, memory) are determined relative to it. + +For running wren commands outside the project directory: + +```bash +# option A: environment variable +export WREN_PROJECT_HOME=~/projects/sales +wren --sql "SELECT ..." 
+ +# option B: global config (~/.wren/config.yml) +default_project: ~/projects/sales +``` --- @@ -35,26 +84,44 @@ my_project/ ### `wren_project.yml` -The root metadata file describing the project: - ```yaml +schema_version: 2 name: my_project version: "1.0" catalog: wren schema: public -data_source: POSTGRES +data_source: postgres ``` | Field | Description | |-------|-------------| +| `schema_version` | Directory layout version. `2` = folder-per-entity (current). Owned by the CLI — do not bump manually. | | `name` | Project name | -| `catalog` | MDL catalog (matches the `catalog` in your MDL manifest) | -| `schema` | MDL schema | -| `data_source` | Data source type (e.g. `POSTGRES`, `BIGQUERY`, `SNOWFLAKE`) | +| `version` | User's own project version (free-form, no effect on parsing) | +| `catalog` | **Wren Engine namespace** — NOT your database catalog. Identifies this MDL project within the engine. Default: `wren`. | +| `schema` | **Wren Engine namespace** — NOT your database schema. Default: `public`. | +| `data_source` | Data source type (e.g. `postgres`, `bigquery`, `snowflake`) | + +> **`catalog` / `schema` are NOT database settings.** +> +> These two fields define the Wren Engine's internal namespace for addressing models in SQL. They exist to support future multi-project querying. For single-project use, keep the defaults (`catalog: wren`, `schema: public`). +> +> Your database's actual catalog and schema are specified per-model in the `table_reference` section of each model's `metadata.yml`. 
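A minimal sketch of how the two namespaces combine at query time. This is illustrative only: the `resolve` function and the `MODELS` dict are hypothetical, and the `jaffle_shop` / `main` location is taken from this doc's example model, not from any required convention.

```python
# table_reference values as they would appear in models/orders/metadata.yml
MODELS = {
    "orders": {"catalog": "jaffle_shop", "schema": "main", "table": "orders"},
}

def resolve(identifier: str, ns_catalog: str = "wren", ns_schema: str = "public") -> str:
    """Rewrite a Wren-namespace identifier to its physical table name."""
    catalog, schema, name = identifier.split(".")
    if (catalog, schema) != (ns_catalog, ns_schema):
        raise ValueError(f"not in the Wren namespace: {identifier}")
    ref = MODELS[name]
    return f'{ref["catalog"]}.{ref["schema"]}.{ref["table"]}'
```

So `SELECT * FROM wren.public.orders` is addressed through the engine namespace but executes against the physical table named by `table_reference`.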
+
+#### Two levels of catalog/schema
+
+The same field names appear in two places with completely different meanings:
 
-### `models/<model_name>.yml`
+| Location | Refers to | Example | When to change |
+|----------|-----------|---------|----------------|
+| `wren_project.yml` → `catalog`, `schema` | Wren Engine namespace | `wren`, `public` | Only for multi-project setups |
+| `models/*/metadata.yml` → `table_reference.catalog`, `table_reference.schema` | Database location | `""`, `main` | Must match your actual database |
 
-One file per model. Example for an `orders` model:
+### Model (`models/<model_name>/metadata.yml`)
+
+A model must define its source in exactly one of two ways:
+
+**table_reference** — maps to a physical table:
 
 ```yaml
 name: orders
@@ -69,11 +136,6 @@ columns:
   not_null: true
   is_primary_key: true
   properties: {}
-  - name: customer_id
-    type: INTEGER
-    is_calculated: false
-    not_null: false
-    properties: {}
   - name: total
     type: DECIMAL
     is_calculated: false
@@ -84,13 +146,51 @@ cached: false
 properties: {}
 
-### `relationships.yml`
+**ref_sql** — defines the model via a SQL query. SQL can be inline in `metadata.yml` or in a separate `ref_sql.sql` file (the `.sql` file takes precedence if both exist):
 
-All relationships between models in a single file:
+```yaml
+name: revenue_summary
+columns:
+  - name: month
+    type: DATE
+    is_calculated: false
+    not_null: true
+    properties: {}
+  - name: total_revenue
+    type: DECIMAL
+    is_calculated: false
+    not_null: false
+    properties: {}
+```
+
+```sql
+-- models/revenue_summary/ref_sql.sql
+SELECT DATE_TRUNC('month', order_date) AS month,
+       SUM(total) AS total_revenue
+FROM orders
+GROUP BY 1
+```
+
+Using both `table_reference` and `ref_sql` in the same model is a validation error.
+
+### View (`views/<view_name>/metadata.yml`)
+
+Views have a `statement` field. 
Like ref_sql models, the SQL can be inline in `metadata.yml` or in a separate `sql.yml` file (the `sql.yml` takes precedence if both exist): + +```yaml +name: top_customers +statement: > + SELECT customer_id, SUM(total) AS lifetime_value + FROM wren.public.orders GROUP BY 1 ORDER BY 2 DESC LIMIT 100 +description: "Top customers by lifetime value" +properties: {} +``` + +### `relationships.yml` ```yaml relationships: - - name: orders_customer + - name: orders_customers models: - orders - customers @@ -98,98 +198,117 @@ relationships: condition: orders.customer_id = customers.customer_id ``` -### `views.yml` +### `instructions.md` -All views in a single file: +Free-form Markdown with rules and guidelines for LLM-based query generation. Organize by topic with `##` headings: -```yaml -views: - - name: recent_orders - statement: SELECT * FROM wren.public.orders WHERE order_date > '2024-01-01' - properties: {} -``` +```markdown +## Business rules +- Revenue queries must use net_revenue, not gross_revenue +- All queries must filter status = 'completed' ---- +## Formatting +- Currency is TWD, display with thousand separators +- Timestamps are UTC+8 +``` -## Field Mapping +Instructions are consumed by agents, not by the engine. They are intentionally excluded from `target/mdl.json` — the wren-core rewrite pipeline has no use for them. Agents access instructions through two paths: -When converting between YAML (snake_case) and JSON (camelCase): - -| YAML field | JSON field | -|------------|------------| -| `data_source` | `dataSource` | -| `table_reference` | `tableReference` | -| `is_calculated` | `isCalculated` | -| `not_null` | `notNull` | -| `is_primary_key` | `isPrimaryKey` | -| `primary_key` | `primaryKey` | -| `join_type` | `joinType` | - -All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `properties`) are identical in both formats. 
+- `wren context instructions` — returns full text, run once at session start to capture global constraints +- `wren memory fetch -q "..."` — returns relevant instruction chunks alongside schema context per query --- -## Building the Project - -Building compiles the YAML project into `target/mdl.json`: +## Lifecycle -```bash -# target/mdl.json — assembled MDL manifest (camelCase) +```text +wren context init → scaffold project in current directory + (edit models/, relationships.yml, instructions.md) +wren context validate → check YAML structure (no DB needed) +wren context build → compile to target/mdl.json +wren profile add my-pg ... → save connection to ~/.wren/profiles.yml +wren memory index → index schema + instructions into .wren/memory/ +wren --sql "SELECT 1" → verify connection +wren --sql "SELECT ..." → start querying ``` -After building, deploy the MDL via the MCP server: +After editing models, rebuild and re-index: ```text -deploy(mdl_file_path="/workspace/target/mdl.json") +wren context build +wren memory index ``` -Or place it in the workspace before starting the container so it is auto-loaded via `MDL_PATH`. - -Connection info is configured separately via the Web UI (`http://localhost:9001`) — it is not part of the build output. - --- -## Typical Workflow +## Migrating from MDL JSON -**1. Configure connection info** +If you already have an `mdl.json` (from the MCP server, an earlier Wren setup, or an AI agent that generated one), use `--from-mdl` to convert it into a v2 YAML project in one step: -Open the Web UI at `http://localhost:9001`, select the data source type, and fill in connection credentials. Use `/wren-connection-info` in Claude Code for per-connector field reference. +```bash +wren context init --from-mdl /path/to/mdl.json --path my_project +``` -**2. Generate MDL** +This reads the camelCase JSON, converts all fields to snake_case YAML, and writes out the full project structure: -Run `/wren-generate-mdl` in Claude Code. 
The skill uses MCP tools (`list_remote_tables`, `list_remote_constraints`) to introspect the database and build the MDL JSON. +```text +my_project/ +├── wren_project.yml # catalog, schema, data_source from the manifest +├── models/ +│ ├── orders/ +│ │ └── metadata.yml # one directory per model +│ └── customers/ +│ └── metadata.yml +├── views/ +│ └── top_customers/ +│ └── metadata.yml # one directory per view +├── relationships.yml +└── instructions.md +``` -**3. Save project** +After import, validate and build: -Write `wren_project.yml`, `models/*.yml`, `relationships.yml`, and `views.yml` by converting the MDL JSON (camelCase) to snake_case YAML. +```bash +wren context validate --path my_project +wren context build --path my_project +``` -**4. Add `target/` to `.gitignore`** +If the target directory already contains project files, add `--force` to overwrite: -```text -target/ +```bash +wren context init --from-mdl mdl.json --path my_project --force ``` -**5. Commit to version control** +> **When to use this:** You have an existing `mdl.json` that was authored by hand or generated by an older workflow (e.g. the MCP server's `mdl_save_project` tool), and you want to adopt the YAML project format for version control and CLI-driven workflows. -Commit the model YAML files — `target/` is excluded. +--- -**6. Build** +## Field Mapping -Read the YAML files, rename snake_case → camelCase, and write `target/mdl.json`. +The `build` step converts all YAML keys from snake_case to camelCase: -**7. Deploy** +| YAML | JSON | +|------|------| +| `table_reference` | `tableReference` | +| `ref_sql` | `refSql` | +| `is_calculated` | `isCalculated` | +| `not_null` | `notNull` | +| `is_primary_key` | `isPrimaryKey` | +| `primary_key` | `primaryKey` | +| `join_type` | `joinType` | +| `data_source` | `dataSource` | -```text -deploy(mdl_file_path="/workspace/target/mdl.json") -``` +Generic rule: split on `_`, capitalize each word after the first, join. 
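The generic rule above can be sketched in a few lines of Python — an illustration of the conversion rule, not the CLI's actual implementation:

```python
def snake_to_camel(key: str) -> str:
    """Split on '_', capitalize each word after the first, join."""
    first, *rest = key.split("_")
    return first + "".join(word.capitalize() for word in rest)

# The mappings in the table above all fall out of this one rule:
for yaml_key in ("table_reference", "ref_sql", "is_calculated", "data_source"):
    print(yaml_key, "->", snake_to_camel(yaml_key))
```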
All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `properties`) are identical in both formats. --- -## Version Control Benefits +## .gitignore + +```text +target/ +.wren/ +``` -Storing MDL as a YAML project (rather than a single JSON blob) gives you: +Source YAML and `instructions.md` are committed. Build output (`target/`) is always gitignored — it is derived from source YAML and can be regenerated with `wren context build`. -- **Readable diffs** — model changes show up as clear line-level diffs in pull requests -- **One file per model** — merge conflicts are isolated to the affected model file -- **Separation of secrets** — connection info lives in the Web UI, not in the project; `target/` is gitignored -- **Reproducible builds** — `target/mdl.json` is always regenerated from source, never committed +`.wren/memory/` contains both schema indexes (derived, rebuildable) and query history (NL-SQL pairs confirmed by users, not rebuildable). If your team wants to share confirmed query history as few-shot examples across members, you can commit `.wren/memory/` — but be aware that LanceDB files are binary and may produce merge conflicts when multiple people index or store concurrently. 
diff --git a/skills-archive/.claude-plugin/marketplace.json b/skills-archive/.claude-plugin/marketplace.json new file mode 100644 index 000000000..62aac87a1 --- /dev/null +++ b/skills-archive/.claude-plugin/marketplace.json @@ -0,0 +1,43 @@ +{ + "name": "wren", + "owner": { + "name": "Canner", + "email": "dev@cannerdata.com" + }, + "metadata": { + "description": "Wren Engine skills — semantic SQL, MDL management, MCP server setup for 20+ data sources" + }, + "plugins": [ + { + "name": "wren", + "source": ".", + "description": "AI agent skills for Wren Engine — semantic SQL layer and MCP server for 20+ data sources.", + "version": "1.0.0", + "keywords": [ + "wren", + "sql", + "mdl", + "mcp", + "semantic-layer", + "docker", + "amazon-s3", + "apache-spark", + "apache-doris", + "athena", + "bigquery", + "clickhouse", + "databricks", + "duckdb", + "google-cloud-storage", + "minio", + "mysql", + "oracle", + "postgres", + "redshift", + "sql-server", + "snowflake", + "trino" + ] + } + ] +} diff --git a/skills-archive/.claude-plugin/plugin.json b/skills-archive/.claude-plugin/plugin.json new file mode 100644 index 000000000..178ce0252 --- /dev/null +++ b/skills-archive/.claude-plugin/plugin.json @@ -0,0 +1,38 @@ +{ + "name": "wren", + "description": "AI agent skills for Wren Engine — semantic SQL layer and MCP server for 20+ data sources.", + "version": "1.0.0", + "author": { + "name": "Canner", + "url": "https://www.getwren.ai/" + }, + "homepage": "https://www.getwren.ai/", + "repository": "https://github.com/Canner/wren-engine", + "license": "Apache-2.0", + "keywords": [ + "wren", + "sql", + "mdl", + "mcp", + "semantic-layer", + "data-source", + "docker", + "amazon-s3", + "apache-spark", + "apache-doris", + "athena", + "bigquery", + "clickhouse", + "databricks", + "duckdb", + "google-cloud-storage", + "minio", + "mysql", + "oracle", + "postgres", + "redshift", + "sql-server", + "snowflake", + "trino" + ] +} diff --git a/skills-archive/AUTHORING.md 
b/skills-archive/AUTHORING.md new file mode 100644 index 000000000..254f88d64 --- /dev/null +++ b/skills-archive/AUTHORING.md @@ -0,0 +1,116 @@ +# Skill Authoring Guide + +Skills in this project follow the [Agent Skills](https://agentskills.io/) open format. +Full specification: https://agentskills.io/specification + +--- + +## Directory Structure + +Each skill is a subdirectory containing a required `SKILL.md` and optional supporting directories: + +```text +skill-name/ +├── SKILL.md # Required — frontmatter + workflow instructions +├── references/ # Optional — detail files loaded on demand +│ ├── some-topic.md +│ └── another-topic.md +└── scripts/ # Optional — executable scripts the agent can run +``` + +--- + +## Frontmatter + +Every `SKILL.md` must open with YAML frontmatter: + +```yaml +--- +name: skill-name +description: "What this skill does and when to trigger it. Include specific + trigger keywords. This field is loaded at startup for every conversation." +compatibility: "Optional. Only include if the skill has specific environment + requirements (e.g. requires Docker, must run from ibis-server/)." +metadata: + author: wren-engine + version: "1.0" +--- +``` + +**Rules:** +- `name` must exactly match the parent directory name (lowercase, hyphens only) +- `description` is always loaded — keep it concise and keyword-rich so the agent can match it to user intent + +--- + +## Progressive Disclosure + +Skills load in three tiers. Design content for the tier where it is actually needed: + +| Tier | Content | When loaded | +|------|---------|-------------| +| 1 — Metadata | `name` + `description` (~100 tokens) | Always, at every startup | +| 2 — Instructions | Full `SKILL.md` body | When the skill is activated | +| 3 — Resources | Files in `references/` or `scripts/` | Only when the agent explicitly reads them | + +**Keep `SKILL.md` under 500 lines.** If the body is growing, move reference-only content to `references/`. 
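To make the tier model concrete, here is a toy sketch of the tier-1 load: pulling only `name` and `description` out of the frontmatter without ever reading the body. It is illustrative only — real agent hosts have their own loaders, and this version handles single-line values only:

```python
import re

def tier1_metadata(skill_md: str) -> dict:
    """Parse only the frontmatter block — the tier-1 load never touches the body."""
    frontmatter = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL).group(1)
    meta = {}
    for line in frontmatter.splitlines():
        m = re.match(r'^(name|description):\s*"?(.*?)"?\s*$', line)
        if m:
            meta[m.group(1)] = m.group(2)
    return meta

doc = '---\nname: wren-sql\ndescription: "Write SQL for Wren Engine."\n---\n\n# Body is not read at tier 1.'
print(tier1_metadata(doc))
```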
+ +--- + +## What Goes Where + +### Keep in `SKILL.md` +- Step-by-step workflow the agent follows +- Decision criteria and branching logic +- Short commands or invocations the agent needs immediately +- Quick reference tables (file paths, phase mappings, etc.) + +### Move to `references/` +Content that is only needed in certain code paths: +- Output templates (report formats, plan file formats) +- Per-case investigation details (e.g. per-stage debug steps) +- Large lookup tables (connection info examples, error pattern catalogs) +- Anything that would make `SKILL.md` exceed 300 lines + +Link to reference files from `SKILL.md` using paths relative to the skill root: +```markdown +Follow [references/diagnose.md](references/diagnose.md) for per-stage investigation steps. +``` + +When linking from `SKILLS.md` (one level up), prefix with the skill directory name: +```markdown +| [references/diagnose.md](my-skill/references/diagnose.md) | Diagnosis steps | +``` + +--- + +## Naming Conventions + +| Item | Convention | Example | +|------|-----------|---------| +| Skill directory | `kebab-case` | `wren-debugging/` | +| `name` field | same as directory | `wren-debugging` | +| Reference files | descriptive `kebab-case` | `plan-template.md`, `connection-info.md` | + +--- + +## Registration + +After creating a new skill: + +1. Add a section to [SKILLS.md](SKILLS.md) describing the skill, its trigger conditions, and reference files. +2. Add a row to the skills table in [README.md](README.md). +3. Add the skill name and version to [versions.json](versions.json). +4. Add an entry to [index.json](index.json) with `name`, `version`, `description`, `tags`, `dependencies` (if any), and `repository`. +5. Add the skill to the `ALL_SKILLS` array in [install.sh](install.sh). + +Both `versions.json` and `index.json` must stay in sync with the `version` field in the skill's `SKILL.md` frontmatter. 
Run `bash skills-archive/check-versions.sh` to verify parity before merging — the script validates both files.
+
+---
+
+## Releasing a skill update
+
+1. Bump `version` in the skill's `SKILL.md` frontmatter.
+2. Update the matching version in `versions.json`.
+3. Update the matching version in `index.json`.
+4. Run `bash skills-archive/check-versions.sh` — must pass before merging.
diff --git a/skills-archive/README.md b/skills-archive/README.md
new file mode 100644
index 000000000..0c7d775ea
--- /dev/null
+++ b/skills-archive/README.md
@@ -0,0 +1,122 @@
+# Wren Engine Skills
+
+This directory contains reusable AI agent skills for working with Wren Engine. Skills are instruction files that teach AI agents (Claude, Cline, Cursor, etc.) how to perform specific workflows with Wren's tools and APIs.
+
+## Installation
+
+### Option 1 — Claude Code Plugin
+
+Add the marketplace and install:
+```
+/plugin marketplace add Canner/wren-engine --path skills-archive
+/plugin install wren@wren
+```
+
+Or test locally during development:
+```bash
+claude --plugin-dir ./skills-archive
+```
+
+Skills are namespaced as `/wren:<skill-name>` (e.g., `/wren:wren-generate-mdl`, `/wren:wren-sql`).
+
+### Option 2 — npx skills
+
+Install all skills for Claude Code:
+```bash
+npx skills add Canner/wren-engine --skill '*' --agent claude-code
+```
+
+`npx skills` also supports Cursor, Windsurf, and 30+ other agent tools — replace `--agent claude-code` with your agent of choice.
+
+### Option 3 — ClawHub
+
+Two skills are published on [ClawHub](https://clawhub.ai) for agents that support the ClawHub registry:
+
+```bash
+clawhub install wren-usage      # main entry point — installs all dependent skills
+clawhub install wren-http-api   # standalone — HTTP JSON-RPC for non-MCP clients
+```
+
+### Option 4 — install script (from a local clone)
+
+```bash
+bash skills-archive/install.sh                              # all skills
+bash skills-archive/install.sh wren-generate-mdl wren-sql   # specific skills
+bash skills-archive/install.sh --force wren-generate-mdl    # overwrite existing
+```
+
+### Option 5 — manual copy
+
+```bash
+cp -r skills-archive/wren-usage ~/.claude/skills/
+# or all at once:
+cp -r skills-archive/wren-usage skills-archive/wren-generate-mdl skills-archive/wren-project skills-archive/wren-sql skills-archive/wren-mcp-setup skills-archive/wren-connection-info skills-archive/wren-quickstart skills-archive/wren-http-api ~/.claude/skills/
+```
+
+Once installed, invoke a skill by name in your conversation:
+
+```text
+/wren-usage
+/wren-quickstart
+/wren-generate-mdl
+/wren-project
+/wren-sql
+/wren-mcp-setup
+```
+
+> **Tip:** Use `--skill '*'` to install all skills at once, or specify individual skills (e.g., `--skill wren-generate-mdl --skill wren-sql`).
+ +## Available Skills + +| Skill | Description | +|-------|-------------| +| [wren-usage](wren-usage/SKILL.md) | **Primary skill** — daily usage guide: query data, manage MDL, connect databases, operate MCP server `ClawHub` | +| [wren-quickstart](wren-quickstart/SKILL.md) | End-to-end first-time setup — install skills, generate MDL, save project, start MCP server, verify setup | +| [wren-generate-mdl](wren-generate-mdl/SKILL.md) | Generate a Wren MDL manifest from a live database using ibis-server introspection | +| [wren-project](wren-project/SKILL.md) | Save, load, and build MDL manifests as version-controlled YAML project directories | +| [wren-sql](wren-sql/SKILL.md) | Write and correct SQL queries for Wren Engine — types, date/time, BigQuery dialect, error diagnosis | +| [wren-mcp-setup](wren-mcp-setup/SKILL.md) | Set up Wren Engine MCP via Docker, register with Claude Code or other MCP clients, and start querying | +| [wren-connection-info](wren-connection-info/SKILL.md) | Set up data source credentials — produces `connectionFilePath` or inline dict | +| [wren-http-api](wren-http-api/SKILL.md) | Interact with Wren MCP via plain HTTP JSON-RPC — no MCP SDK required (for OpenClaw, custom clients, scripts) `ClawHub` | + +See [SKILLS.md](SKILLS.md) for full details on each skill. + +## Updating Skills + +Each skill automatically checks for updates when invoked. If a newer version is available, the AI agent will notify you with the update command before continuing. + +To update manually at any time: + +```bash +# Re-add to reinstall the latest version +npx skills add Canner/wren-engine --skill '*' --agent claude-code + +# Or reinstall a specific skill +npx skills add Canner/wren-engine --skill wren-generate-mdl --agent claude-code +``` + +## Releasing a New Skill Version + +When updating a skill, two files must be kept in sync: + +1. 
Update `version` in the skill's `SKILL.md` frontmatter:
+   ```yaml
+   metadata:
+     author: wren-engine
+     version: "1.2"   # bump this
+   ```
+
+2. Update the matching entry in [`versions.json`](versions.json):
+   ```json
+   {
+     "wren-generate-mdl": "1.2"
+   }
+   ```
+
+Both files must have the same version number, and the matching entry in [`index.json`](index.json) must stay in sync as well — `bash skills-archive/check-versions.sh` validates all three. The `SKILL.md` version is what users have installed locally; `versions.json` is what the update check compares against.
+
+## Requirements
+
+- A running [ibis-server](../ibis-server/) instance
+- The [Wren MCP server](../mcp-server/) connected to your AI client
+- An AI client that supports skills (Claude Code, Cline, Cursor, etc.)
diff --git a/skills-archive/SKILLS.md b/skills-archive/SKILLS.md
new file mode 100644
index 000000000..bed09309b
--- /dev/null
+++ b/skills-archive/SKILLS.md
@@ -0,0 +1,244 @@
+# Wren Engine Skill Reference
+
+Skills are instruction files that extend AI agents with Wren-specific workflows. Install them into your local skills folder and invoke them by name during a conversation.
+
+---
+
+## wren-usage
+
+**File:** [wren-usage/SKILL.md](wren-usage/SKILL.md)
+
+**Primary entry point** for day-to-day Wren Engine usage. Identifies the user's task and delegates to the appropriate focused skill. Covers SQL queries, MDL management, database connections, and MCP server operations.
+ +### When to use + +- Writing or debugging SQL queries against a deployed MDL +- Adding or modifying models, columns, or relationships in the MDL +- Changing database credentials or data source +- Rebuilding `target/mdl.json` after project changes +- Restarting or reconfiguring the MCP server +- Any ongoing Wren task after initial setup is complete + +### Dependent skills + +| Skill | Purpose | +|-------|---------| +| `@wren-sql` | Write and debug SQL queries | +| `@wren-connection-info` | Set up or change database credentials | +| `@wren-generate-mdl` | Regenerate MDL from a changed database schema | +| `@wren-project` | Save, load, and build MDL YAML projects | +| `@wren-mcp-setup` | Reconfigure the MCP server | + +> Installing `wren-usage` via `install.sh` automatically installs all dependent skills: +> ```bash +> bash skills-archive/install.sh wren-usage +> ``` + +--- + +## wren-quickstart + +**File:** [wren-quickstart/SKILL.md](wren-quickstart/SKILL.md) + +End-to-end onboarding guide for Wren Engine. Orchestrates the full setup flow — from installing skills and creating a workspace, to generating an MDL, saving it as a versioned project, starting the MCP Docker container, and verifying everything works. + +### When to use + +- Setting up Wren Engine for the first time +- Onboarding a new data source from scratch +- Getting a new team member started with Wren MCP + +### Workflow summary + +1. Install required skills via `install.sh` +2. Create a workspace directory on the host machine +3. Generate MDL from the database (`@wren-generate-mdl`) +4. Save as a YAML project and compile to `target/` (`@wren-project`) +5. Start the Docker container and register the MCP server (`@wren-mcp-setup`) +6. 
Run `health_check()` to verify — then start a new session and query + +### Dependent skills + +| Skill | Purpose | +|-------|---------| +| `@wren-generate-mdl` | Introspect database and build MDL JSON | +| `@wren-project` | Save MDL as YAML project + compile to `target/` | +| `@wren-mcp-setup` | Start Docker container and register MCP server | + +--- + +## wren-generate-mdl + +**File:** [wren-generate-mdl/SKILL.md](wren-generate-mdl/SKILL.md) + +Generates a complete Wren MDL manifest by introspecting a live database through ibis-server — no local database drivers required. + +### When to use + +- Onboarding a new data source into Wren +- Scaffolding an MDL from an existing database schema +- Automating initial MDL setup as part of a data pipeline + +### Required MCP tools + +`setup_connection`, `list_remote_tables`, `list_remote_constraints`, `mdl_validate_manifest`, `mdl_save_project`, `deploy_manifest` + +### Supported data sources + +`POSTGRES`, `MYSQL`, `MSSQL`, `DUCKDB`, `BIGQUERY`, `SNOWFLAKE`, `CLICKHOUSE`, `TRINO`, `ATHENA`, `ORACLE`, `DATABRICKS` + +### Workflow summary + +1. Gather connection credentials from the user +2. Register the connection via `setup_connection` +3. Fetch table schema via `list_remote_tables` +4. Fetch foreign key constraints via `list_remote_constraints` +5. Optionally sample data for ambiguous columns +6. Build the MDL JSON (models, columns, relationships) +7. Validate via `mdl_validate_manifest` +8. Optionally save as a YAML project (see `wren-project`) +9. Deploy via `deploy_manifest` + +--- + +## wren-project + +**File:** [wren-project/SKILL.md](wren-project/SKILL.md) + +Manages Wren MDL manifests as human-readable YAML project directories — similar to dbt projects. Makes MDL version-control friendly by splitting the monolithic JSON into one YAML file per model. 
+ +### When to use + +- Persisting an MDL to disk for version control (Git) +- Loading a saved YAML project back into a deployable MDL JSON +- Compiling a YAML project to `target/mdl.json` for deployment + +### Project layout + +``` +my_project/ +├── wren_project.yml # Catalog, schema, data source +├── models/ +│ ├── orders.yml # One file per model (snake_case fields) +│ └── customers.yml +├── relationships.yml +├── views.yml +└── target/ + └── mdl.json # Compiled output (camelCase, deployable) +``` + +### Key operations + +| Operation | Description | +|-----------|-------------| +| **Save** | Convert MDL JSON → YAML project directory (camelCase → snake_case) | +| **Load** | Read YAML project → assemble MDL JSON dict (snake_case → camelCase) | +| **Build** | Load + write result to `target/mdl.json` | +| **Deploy** | Pass `target/mdl.json` to `deploy(mdl_file_path=...)` | + +### Field mapping (YAML ↔ JSON) + +| YAML (snake_case) | JSON (camelCase) | +|-------------------|------------------| +| `data_source` | `dataSource` | +| `table_reference` | `tableReference` | +| `is_calculated` | `isCalculated` | +| `not_null` | `notNull` | +| `is_primary_key` | `isPrimaryKey` | +| `primary_key` | `primaryKey` | +| `join_type` | `joinType` | + +--- + +## wren-sql + +**File:** [wren-sql/SKILL.md](wren-sql/SKILL.md) + +Comprehensive SQL authoring and debugging guide for Wren Engine. Covers core query rules, filter strategies, supported types, aggregation, and links to topic-specific references. 
+ +### When to use + +- Writing SQL queries against Wren Engine MDL models +- Debugging SQL errors across parsing, planning, transpiling, or execution stages +- Working with complex types (ARRAY, STRUCT, JSON/VARIANT) +- Writing date/time calculations or interval arithmetic +- Targeting BigQuery as a backend database + +### Reference files + +| File | Topic | +|------|-------| +| [references/correction.md](wren-sql/references/correction.md) | Error diagnosis and correction workflow | +| [references/datetime.md](wren-sql/references/datetime.md) | Date/time functions, intervals, epoch conversion | +| [references/types.md](wren-sql/references/types.md) | ARRAY, STRUCT, JSON/VARIANT/OBJECT types | +| [references/bigquery.md](wren-sql/references/bigquery.md) | BigQuery dialect quirks | + +--- + +## wren-mcp-setup + +**File:** [wren-mcp-setup/SKILL.md](wren-mcp-setup/SKILL.md) + +Sets up Wren Engine MCP server via Docker, registers it with an AI agent (Claude Code or other MCP clients), and starts a new session to begin interacting with Wren. + +### When to use + +- Running Wren MCP in Docker (no local Python/uv install required) +- Configuring Claude Code MCP to connect to a containerized Wren Engine +- Setting up Wren MCP for Cline, Cursor, or VS Code MCP Extension +- Fixing `localhost` → `host.docker.internal` in connection info for Docker + +### Workflow summary + +1. Ask user for workspace mount path +2. `docker run` with workspace mounted at `/workspace`, MCP server enabled on port 9000 +3. Rewrite `localhost` → `host.docker.internal` in connection credentials +4. Add `wren` MCP server to Claude Code using streamable-http on port 9000 (`claude mcp add`) +5. Start a new session so the MCP tools are loaded +6. Run `health_check()` to verify + +--- + +## wren-http-api + +**File:** [wren-http-api/SKILL.md](wren-http-api/SKILL.md) + +Interact with Wren Engine MCP server via plain HTTP JSON-RPC requests — no MCP client SDK required. 
Covers session initialization, tool discovery, and calling all 20+ Wren tools using standard HTTP POST with JSON-RPC 2.0 payloads. + +### When to use + +- The client cannot or prefers not to use the MCP protocol directly (e.g. OpenClaw) +- Building a custom HTTP integration with Wren Engine +- Calling Wren tools from shell scripts, CI pipelines, or non-MCP environments +- Debugging MCP tool calls with curl + +### Workflow summary + +1. Initialize a JSON-RPC session via `POST /mcp` with `initialize` method +2. Save the `Mcp-Session-Id` header from the response +3. Complete the handshake with `notifications/initialized` +4. Call any Wren tool via `tools/call` method with the session header +5. Parse SSE `data:` lines from responses + +--- + +## Installing a skill + +```bash +# Install wren-usage (auto-installs all dependencies) +bash skills-archive/install.sh wren-usage + +# Or install everything +bash skills-archive/install.sh +``` + +Then invoke in your AI client: + +``` +/wren-usage +/wren-generate-mdl +/wren-project +/wren-sql +/wren-mcp-setup +/wren-quickstart +``` diff --git a/skills-archive/check-versions.sh b/skills-archive/check-versions.sh new file mode 100755 index 000000000..66a0c4cbd --- /dev/null +++ b/skills-archive/check-versions.sh @@ -0,0 +1,68 @@ +#!/usr/bin/env bash +# Verify that versions.json and index.json (in this script's directory) both match +# the version in each skill's SKILL.md frontmatter. +# Exits non-zero if any mismatch is found. +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +VERSIONS_JSON="$SCRIPT_DIR/versions.json" +INDEX_JSON="$SCRIPT_DIR/index.json" +ERRORS=0 + +while IFS= read -r skill; do + skill_name="${skill//\"/}" + skill_name="${skill_name%%:*}" + skill_name="${skill_name// /}" + + versions_version=$(python3 -c "import json,sys; d=json.load(open('$VERSIONS_JSON')); print(d.get('$skill_name','MISSING'))") + + skill_file="$SCRIPT_DIR/$skill_name/SKILL.md" + if [ ! 
-f "$skill_file" ]; then + echo "ERROR: $skill_name listed in versions.json but $skill_file not found" >&2 + ERRORS=$((ERRORS + 1)) + continue + fi + + md_version=$(grep -m1 'version:' "$skill_file" | sed 's/.*version: *"\{0,1\}\([^"]*\)"\{0,1\}/\1/' | tr -d ' "') + + if [ "$versions_version" != "$md_version" ]; then + echo "MISMATCH: $skill_name — versions.json=$versions_version, SKILL.md=$md_version" >&2 + ERRORS=$((ERRORS + 1)) + else + echo "OK (versions.json): $skill_name @ $versions_version" + fi + + index_version=$(python3 -c " +import json, sys +skills = json.load(open('$INDEX_JSON')).get('skills', []) +match = next((s['version'] for s in skills if s['name'] == '$skill_name'), 'MISSING') +print(match) +") + + if [ "$index_version" != "$md_version" ]; then + echo "MISMATCH: $skill_name — index.json=$index_version, SKILL.md=$md_version" >&2 + ERRORS=$((ERRORS + 1)) + else + echo "OK (index.json): $skill_name @ $index_version" + fi +done < <(python3 -c " +import json +from pathlib import Path + +root = Path('$SCRIPT_DIR') +versions = set(json.load(open('$VERSIONS_JSON')).keys()) +index = {s['name'] for s in json.load(open('$INDEX_JSON')).get('skills', [])} +skill_dirs = {p.parent.name for p in root.glob('*/SKILL.md')} + +for name in sorted(versions | index | skill_dirs): + print(name) +") + +if [ "$ERRORS" -gt 0 ]; then + echo "" >&2 + echo "Found $ERRORS version mismatch(es). Update versions.json, index.json, or SKILL.md to match." >&2 + exit 1 +fi + +echo "" +echo "All skill versions match." 
diff --git a/skills-archive/index.json b/skills-archive/index.json new file mode 100644 index 000000000..5b60731f6 --- /dev/null +++ b/skills-archive/index.json @@ -0,0 +1,141 @@ +{ + "name": "wren-engine", + "description": "AI agent skills for Wren Engine — semantic SQL layer and MCP server for 20+ data sources.", + "homepage": "https://wren.ai", + "repository": "https://github.com/Canner/wren-engine", + "license": "Apache-2.0", + "skills": [ + { + "name": "wren-connection-info", + "version": "1.5", + "description": "Reference guide for Wren Engine connection info — required fields for all 18 data sources, sensitive field handling, Docker host hints, and credential encoding.", + "tags": [ + "wren", + "credentials", + "connection", + "database", + "security" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-connection-info" + }, + { + "name": "wren-generate-mdl", + "version": "1.4", + "description": "Generate a Wren MDL manifest from a live database using MCP server introspection tools.", + "tags": [ + "wren", + "mdl", + "database", + "introspection", + "postgres", + "bigquery", + "snowflake", + "mysql", + "clickhouse", + "trino" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-generate-mdl" + }, + { + "name": "wren-project", + "version": "1.5", + "description": "Save, load, and build Wren MDL manifests as YAML project directories for version control.", + "tags": [ + "wren", + "mdl", + "yaml", + "version-control", + "project", + "git" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-project" + }, + { + "name": "wren-sql", + "version": "1.0", + "description": "Write and correct SQL queries targeting Wren Engine — types, date/time, BigQuery dialect, error diagnosis.", + "tags": [ + "wren", + "sql", + "bigquery", + "array", + "struct", + "datetime", + "mdl", + "text-to-sql" + ], + "repository": 
"https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-sql" + }, + { + "name": "wren-mcp-setup", + "version": "1.4", + "description": "Set up Wren Engine MCP server via Docker and register it with an AI agent.", + "tags": [ + "wren", + "mcp", + "docker", + "claude-code", + "cursor", + "cline" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-mcp-setup" + }, + { + "name": "wren-quickstart", + "version": "1.3", + "description": "End-to-end quickstart for Wren Engine — from zero to querying.", + "tags": [ + "wren", + "quickstart", + "onboarding", + "mcp", + "docker" + ], + "dependencies": [ + "wren-generate-mdl", + "wren-project", + "wren-mcp-setup" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-quickstart" + }, + { + "name": "wren-http-api", + "version": "1.0", + "description": "Interact with Wren Engine MCP server via plain HTTP JSON-RPC requests — no MCP client SDK required.", + "tags": [ + "wren", + "http", + "json-rpc", + "api", + "openclaw", + "rest" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-http-api" + }, + { + "name": "wren-usage", + "version": "1.2", + "description": "Wren Engine — semantic SQL engine for AI agents. Query 22+ data sources through a modeling layer. 
Main entry point for setup, SQL, MDL generation, and MCP server operations.", + "tags": [ + "wren", + "usage", + "sql", + "mdl", + "mcp", + "database", + "semantic-layer", + "postgres", + "bigquery", + "snowflake" + ], + "dependencies": [ + "wren-generate-mdl", + "wren-project", + "wren-sql", + "wren-mcp-setup", + "wren-http-api" + ], + "repository": "https://github.com/Canner/wren-engine/tree/main/skills-archive/wren-usage" + } + ] +} diff --git a/skills-archive/install.sh b/skills-archive/install.sh new file mode 100755 index 000000000..046869e76 --- /dev/null +++ b/skills-archive/install.sh @@ -0,0 +1,182 @@ +#!/usr/bin/env bash +# Install Wren Engine skills into your local AI agent skills directory. +# +# Usage: +# ./install.sh # install all skills +# ./install.sh wren-generate-mdl # install specific skills +# ./install.sh --force wren-sql # overwrite without prompt +# curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills-archive/install.sh | bash +# curl -fsSL .../install.sh | bash -s -- wren-generate-mdl + +set -euo pipefail + +REPO="Canner/wren-engine" +BRANCH="${WREN_SKILLS_BRANCH:-main}" +DEST="${CLAUDE_SKILLS_DIR:-$HOME/.claude/skills}" +ALL_SKILLS=(wren-generate-mdl wren-project wren-sql wren-mcp-setup wren-quickstart wren-connection-info wren-usage wren-http-api) + +# Parse --force flag and skill list from arguments +FORCE=false +SELECTED_SKILLS=() +for arg in "$@"; do + if [ "$arg" = "--force" ]; then + FORCE=true + else + SELECTED_SKILLS+=("$arg") + fi +done + +if [ "${#SELECTED_SKILLS[@]}" -eq 0 ]; then + SELECTED_SKILLS=("${ALL_SKILLS[@]}") +fi + +# Validate requested skills +for skill in "${SELECTED_SKILLS[@]}"; do + valid=false + for known in "${ALL_SKILLS[@]}"; do + if [ "$skill" = "$known" ]; then valid=true; break; fi + done + if [ "$valid" = false ]; then + echo "Unknown skill: $skill" >&2 + echo "Available: ${ALL_SKILLS[*]}" >&2 + exit 1 + fi +done + +# Detect whether we are running from a local clone or piped via curl. 
+# When piped, BASH_SOURCE[0] is empty or "/dev/stdin". +SCRIPT_DIR="" +if [ -n "${BASH_SOURCE[0]:-}" ] && [ "${BASH_SOURCE[0]}" != "/dev/stdin" ]; then + SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +fi + +# Locate index.json for dependency resolution (local or remote) +INDEX_JSON="" +if [ -n "$SCRIPT_DIR" ] && [ -f "$SCRIPT_DIR/index.json" ]; then + INDEX_JSON="$SCRIPT_DIR/index.json" +fi + +# Expand SELECTED_SKILLS to include dependencies declared in index.json. +# Only runs when python3 is available and index.json is accessible. +expand_with_deps() { + local json_file="$1" + shift + local -a input=("$@") + local -a result=() + + skill_in_result() { + local s="$1" + for r in "${result[@]:-}"; do [ "$r" = "$s" ] && return 0; done + return 1 + } + + is_known_skill() { + local s="$1" + for known in "${ALL_SKILLS[@]}"; do [ "$s" = "$known" ] && return 0; done + return 1 + } + + for skill in "${input[@]}"; do + skill_in_result "$skill" || result+=("$skill") + + if [ -n "$json_file" ] && command -v python3 &>/dev/null; then + while IFS= read -r dep; do + [ -z "$dep" ] && continue + is_known_skill "$dep" || continue + if ! 
skill_in_result "$dep"; then + echo " + $dep (dependency of $skill)" >&2 + result+=("$dep") + fi + done < <(python3 -c " +import json, sys +try: + d = json.load(open(sys.argv[1])) + s = next((x for x in d.get('skills', []) if x['name'] == sys.argv[2]), None) + if s: + for dep in s.get('dependencies', []): + print(dep) +except Exception: + pass +" "$json_file" "$skill" 2>/dev/null) + fi + done + + printf '%s\n' "${result[@]}" +} + +# Only expand deps when installing specific skills (not the full set) +if [ "${#SELECTED_SKILLS[@]}" -lt "${#ALL_SKILLS[@]}" ] && [ -n "$INDEX_JSON" ]; then + EXPANDED=() + while IFS= read -r line; do + [ -n "$line" ] && EXPANDED+=("$line") + done < <(expand_with_deps "$INDEX_JSON" "${SELECTED_SKILLS[@]}") + SELECTED_SKILLS=("${EXPANDED[@]}") +fi + +install_from_local() { + local src="$1" skill="$2" dest_dir="$3" + if [ "$FORCE" = false ] && [ -d "$dest_dir" ]; then + echo " Skipping $skill (already exists). Use --force to overwrite." + return + fi + rm -rf "$dest_dir" + cp -r "$src/$skill" "$dest_dir" + echo " Installed $skill" +} + +install_from_archive() { + local tmpdir="$1" skill="$2" dest_dir="$3" + if [ "$FORCE" = false ] && [ -d "$dest_dir" ]; then + echo " Skipping $skill (already exists). Use --force to overwrite." + return + fi + if [ ! -d "$tmpdir/$skill" ]; then + echo " Failed: $skill not found in archive" >&2 + return 1 + fi + rm -rf "$dest_dir" + cp -r "$tmpdir/$skill" "$dest_dir" + echo " Installed $skill" +} + +mkdir -p "$DEST" + +if [ -n "$SCRIPT_DIR" ] && [ -d "$SCRIPT_DIR/wren-generate-mdl" ]; then + # ---- Local mode: copy directly from repo ---- + echo "Installing from local repo: $SCRIPT_DIR" + echo "Destination: $DEST" + echo "" + for skill in "${SELECTED_SKILLS[@]}"; do + install_from_local "$SCRIPT_DIR" "$skill" "$DEST/$skill" + done +else + # ---- Remote mode: download GitHub archive ---- + echo "Downloading skills from GitHub ($REPO @ $BRANCH)..." 
+ echo "Destination: $DEST" + echo "" + tmpdir=$(mktemp -d) + trap 'rm -rf "$tmpdir"' EXIT + + # Build the list of paths to extract from the tarball + extract_paths=() + for skill in "${SELECTED_SKILLS[@]}"; do + extract_paths+=("wren-engine-${BRANCH}/skills-archive/${skill}") + done + + curl -fsSL "https://github.com/$REPO/archive/refs/heads/$BRANCH.tar.gz" \ + | tar -xz -C "$tmpdir" --strip-components=2 "${extract_paths[@]}" + + for skill in "${SELECTED_SKILLS[@]}"; do + install_from_archive "$tmpdir" "$skill" "$DEST/$skill" + done +fi + +echo "" +echo "Done. Invoke skills in your AI client:" +for skill in "${SELECTED_SKILLS[@]}"; do + echo " /$skill" +done +echo "" +echo "To update skills later, re-run with --force:" +echo " curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills-archive/install.sh | bash -s -- --force" +echo "Or check for updates: each skill notifies you automatically when a newer version is available." diff --git a/skills-archive/versions.json b/skills-archive/versions.json new file mode 100644 index 000000000..67b174e04 --- /dev/null +++ b/skills-archive/versions.json @@ -0,0 +1,10 @@ +{ + "wren-generate-mdl": "1.4", + "wren-connection-info": "1.5", + "wren-project": "1.5", + "wren-sql": "1.0", + "wren-mcp-setup": "1.4", + "wren-quickstart": "1.3", + "wren-http-api": "1.0", + "wren-usage": "1.2" +} diff --git a/skills/wren-connection-info/SKILL.md b/skills-archive/wren-connection-info/SKILL.md similarity index 100% rename from skills/wren-connection-info/SKILL.md rename to skills-archive/wren-connection-info/SKILL.md diff --git a/skills/wren-connection-info/references/databases.md b/skills-archive/wren-connection-info/references/databases.md similarity index 100% rename from skills/wren-connection-info/references/databases.md rename to skills-archive/wren-connection-info/references/databases.md diff --git a/skills/wren-connection-info/references/file-sources.md 
b/skills-archive/wren-connection-info/references/file-sources.md similarity index 100% rename from skills/wren-connection-info/references/file-sources.md rename to skills-archive/wren-connection-info/references/file-sources.md diff --git a/skills-archive/wren-generate-mdl/SKILL.md b/skills-archive/wren-generate-mdl/SKILL.md new file mode 100644 index 000000000..dda1c32b3 --- /dev/null +++ b/skills-archive/wren-generate-mdl/SKILL.md @@ -0,0 +1,214 @@ +--- +name: wren-generate-mdl +description: Generate a Wren MDL manifest from a database using ibis-server metadata endpoints. Use when a user wants to create or set up a new Wren MDL, scaffold a manifest from an existing database, or onboard a new data source without installing any database drivers locally. +compatibility: Requires a running ibis-server (default port 8000). No local database drivers needed. +license: Apache-2.0 +metadata: + author: wren-engine + version: "1.4" +--- + +# Generate Wren MDL + +## Version check + +Silently fetch `https://raw.githubusercontent.com/Canner/wren-engine/main/skills-archive/versions.json`. +Compare the `wren-generate-mdl` key with this skill's version (from the frontmatter above). +If the remote version is newer, notify the user before proceeding: + +> A newer version of the **wren-generate-mdl** skill is available. +> Update with: +> ``` +> npx skills add Canner/wren-engine --skill wren-generate-mdl --agent claude-code +> ``` + +Then continue with the workflow below regardless of update status. + +--- + +Generates a Wren MDL manifest by using ibis-server to introspect the database schema — no local database drivers required. All schema discovery goes through ibis-server, which already has drivers for all supported data sources. + +## Workflow + +Follow these steps in order. Do not skip steps or ask unnecessary questions between them. 
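The version comparison described in the Version check section can be sketched in Python — a minimal sketch, assuming versions are dotted numeric strings like `"1.4"` (the URL and skill name come from the frontmatter above; tuple comparison avoids the `"1.10" < "1.9"` trap of string comparison):

```python
import json
import urllib.request

VERSIONS_URL = ("https://raw.githubusercontent.com/Canner/wren-engine/"
                "main/skills-archive/versions.json")
LOCAL_VERSION = "1.4"  # from this skill's frontmatter

def parse_version(v):
    # "1.4" -> (1, 4); assumes dotted numeric version strings
    return tuple(int(part) for part in v.split("."))

def newer_version_available(skill="wren-generate-mdl"):
    # Fetch the remote registry and compare against the local frontmatter version
    with urllib.request.urlopen(VERSIONS_URL, timeout=5) as resp:
        remote = json.load(resp).get(skill, LOCAL_VERSION)
    return parse_version(remote) > parse_version(LOCAL_VERSION)
```

Either way, the workflow continues regardless of the comparison result.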
+ +### Step 1 — Verify connection and choose data source + +> **Connection info can ONLY be configured through the Web UI at `http://localhost:9001`.** Do not attempt to set connection info programmatically via ibis-server API calls, curl, or any other method. The ibis-server does not expose a public API for writing connection info — only the Web UI can do this. + +Confirm the MCP server has a working connection before proceeding: + +```text +health_check() +``` + +If the health check fails, or if the user has not yet configured a connection, direct them to the Web UI at `http://localhost:9001` to enter their data source credentials. Wait for the user to confirm the connection is saved before continuing. + +Ask the user for: +1. **Data source type** (e.g. `POSTGRES`, `BIGQUERY`, `SNOWFLAKE`, …) — needed to set `dataSource` in the MDL +2. **Schema filter** (optional) — if the database has many schemas, ask which schema(s) to include + +After this step you will have: +- `data_source`: e.g. `"POSTGRES"` +- Optional `schema_filter`: used to narrow down results in subsequent steps + +### Step 2 — Fetch table schema + +```text +list_remote_tables() +``` + +Returns a list of tables with their column names and types. Each table entry has a `properties.schema` field — use it to filter to the user's target schema if specified. + +If this fails: +1. Check that read-only mode is **disabled** in the Web UI (`http://localhost:9001`) — `list_remote_tables()` will fail when read-only mode is on, even if the connection is healthy. +2. Ask the user to verify connection info in the Web UI if read-only mode is already off. + +### Step 3 — Fetch relationships + +```text +list_remote_constraints() +``` + +Returns foreign key constraints. Use these to build `Relationship` entries in the MDL. If the response is empty (`[]`), infer relationships from column naming conventions (e.g. `order_id` → `orders.id`). 
+ +If this fails, verify that read-only mode is disabled in the Web UI (`http://localhost:9001`). + +### Step 4 — Build MDL JSON + +Construct the manifest following the [MDL structure](#mdl-structure) below. + +Rules: +- `catalog`: use `"wren"` unless the user specifies otherwise +- `schema`: use the target schema name (e.g. `"public"` for PostgreSQL default, `"jaffle_shop"` if user specified) +- `dataSource`: set to the enum value from Step 1 (e.g. `"POSTGRES"`) +- `tableReference.catalog`: set to the database name (not `"wren"`) +- Each table → one `Model`. Set `tableReference.table` to the exact table name +- Each column → one `Column`. Use the exact DB column name +- Mark primary key columns with `"isPrimaryKey": true` and set `primaryKey` on the model +- For FK columns, add a `Relationship` entry linking the two models +- Omit calculated columns for now — they can be added later + +### Step 5 — Validate + +Deploy the draft MDL and validate it with a dry run: + +```text +deploy_manifest(mdl=<draft_mdl_json>) +dry_run(sql="SELECT * FROM <model_name> LIMIT 1") +``` + +If `dry_run` succeeds, the MDL is valid. If it fails, fix the reported errors, call `deploy_manifest` again with the corrected MDL, and retry. + +### Step 6 — Save project (optional) + +Ask the user if they want to save the MDL as a YAML project directory (useful for version control). + +If yes, follow the **wren-project** skill (`skills-archive/wren-project/SKILL.md`) to write the YAML files and build `target/mdl.json`. + +### Step 7 — Deploy final MDL + +```text +deploy_manifest(mdl=<final_mdl_json>) +``` + +Confirm success to the user. The MDL is now active and queries can run.
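The assembly work in Step 4 can be sketched in Python. This is a minimal sketch: the `TYPE_MAP` covers only common cases, and the shape of the introspected `tables` input is an assumption — adapt it to the actual `list_remote_tables()` output:

```python
TYPE_MAP = {
    "int": "INTEGER", "integer": "INTEGER", "int4": "INTEGER",
    "bigint": "BIGINT", "int8": "BIGINT",
    "decimal": "DECIMAL", "numeric": "DECIMAL",
    "varchar": "VARCHAR", "text": "VARCHAR", "string": "VARCHAR",
    "bool": "BOOLEAN", "boolean": "BOOLEAN",
    "date": "DATE", "timestamp": "TIMESTAMP",
    "json": "JSON", "jsonb": "JSON",
}

def normalize_type(db_type):
    # VARCHAR is the documented safe fallback for unrecognized types
    return TYPE_MAP.get(db_type.lower(), "VARCHAR")

def build_manifest(tables, database, schema="public", data_source="POSTGRES"):
    """Assemble a draft MDL dict from introspected tables.

    `tables` is assumed to be a list like:
      {"name": "orders", "primary_key": "order_id",
       "columns": [{"name": "order_id", "type": "int4", "not_null": True}, ...]}
    """
    models = []
    for t in tables:
        models.append({
            "name": t["name"],
            # tableReference.catalog is the real database name, not "wren"
            "tableReference": {"catalog": database, "schema": schema,
                               "table": t["name"]},
            "columns": [
                {
                    "name": c["name"],
                    "type": normalize_type(c["type"]),
                    "isCalculated": False,
                    "notNull": c.get("not_null", False),
                    **({"isPrimaryKey": True}
                       if c["name"] == t.get("primary_key") else {}),
                    "properties": {},
                }
                for c in t["columns"]
            ],
            "primaryKey": t.get("primary_key", ""),
            "cached": False,
            "properties": {},
        })
    return {"catalog": "wren", "schema": schema, "dataSource": data_source,
            "models": models, "relationships": [], "views": []}
```

Relationships from Step 3 can then be appended to the `relationships` list before validating with `dry_run`.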
+ +--- + +## MDL Structure + +```json +{ + "catalog": "wren", + "schema": "public", + "dataSource": "POSTGRES", + "models": [ + { + "name": "orders", + "tableReference": { + "catalog": "<database_name>", + "schema": "public", + "table": "orders" + }, + "columns": [ + { + "name": "order_id", + "type": "INTEGER", + "isCalculated": false, + "notNull": true, + "isPrimaryKey": true, + "properties": {} + }, + { + "name": "customer_id", + "type": "INTEGER", + "isCalculated": false, + "notNull": false, + "properties": {} + }, + { + "name": "total", + "type": "DECIMAL", + "isCalculated": false, + "notNull": false, + "properties": {} + } + ], + "primaryKey": "order_id", + "cached": false, + "properties": {} + } + ], + "relationships": [ + { + "name": "orders_customer", + "models": ["orders", "customers"], + "joinType": "MANY_TO_ONE", + "condition": "orders.customer_id = customers.customer_id" + } + ], + "views": [] +} +``` + +### Column types + +Map SQL/ibis types to MDL type strings: + +| SQL / ibis type | MDL type | +|-----------------|----------| +| INT, INTEGER, INT4 | `INTEGER` | +| BIGINT, INT8 | `BIGINT` | +| SMALLINT, INT2 | `SMALLINT` | +| FLOAT, FLOAT4, REAL | `FLOAT` | +| DOUBLE, FLOAT8 | `DOUBLE` | +| DECIMAL, NUMERIC | `DECIMAL` | +| VARCHAR, TEXT, STRING | `VARCHAR` | +| CHAR | `CHAR` | +| BOOLEAN, BOOL | `BOOLEAN` | +| DATE | `DATE` | +| TIMESTAMP, DATETIME | `TIMESTAMP` | +| TIMESTAMPTZ | `TIMESTAMPTZ` | +| JSON, JSONB | `JSON` | +| ARRAY | `ARRAY` | +| BYTES, BYTEA | `BYTES` | + +When in doubt, use `VARCHAR` as a safe fallback. + +### Relationship join types + +| Cardinality | `joinType` value | +|-------------|-----------------| +| Many-to-one (FK table → PK table) | `MANY_TO_ONE` | +| One-to-many | `ONE_TO_MANY` | +| One-to-one | `ONE_TO_ONE` | +| Many-to-many | `MANY_TO_MANY` | + +--- + +## Connection setup + +Connection info is configured **exclusively** via the MCP server Web UI at `http://localhost:9001`.
There is no API endpoint for setting connection info — do not attempt to configure it programmatically. See the **wren-mcp-setup** skill for Docker setup instructions. + +> **Note:** If the Web UI is disabled (`WEB_UI_ENABLED=false`), connection info must be pre-configured in `~/.wren/connection_info.json` before starting the container. Use `/wren-connection-info` in Claude Code for the required fields per data source. diff --git a/skills/wren-http-api/SKILL.md b/skills-archive/wren-http-api/SKILL.md similarity index 100% rename from skills/wren-http-api/SKILL.md rename to skills-archive/wren-http-api/SKILL.md diff --git a/skills/wren-http-api/references/response-format.md b/skills-archive/wren-http-api/references/response-format.md similarity index 100% rename from skills/wren-http-api/references/response-format.md rename to skills-archive/wren-http-api/references/response-format.md diff --git a/skills/wren-http-api/references/tools.md b/skills-archive/wren-http-api/references/tools.md similarity index 100% rename from skills/wren-http-api/references/tools.md rename to skills-archive/wren-http-api/references/tools.md diff --git a/skills/wren-http-api/scripts/session.sh b/skills-archive/wren-http-api/scripts/session.sh similarity index 100% rename from skills/wren-http-api/scripts/session.sh rename to skills-archive/wren-http-api/scripts/session.sh diff --git a/skills/wren-mcp-setup/SKILL.md b/skills-archive/wren-mcp-setup/SKILL.md similarity index 100% rename from skills/wren-mcp-setup/SKILL.md rename to skills-archive/wren-mcp-setup/SKILL.md diff --git a/skills/wren-project/SKILL.md b/skills-archive/wren-project/SKILL.md similarity index 100% rename from skills/wren-project/SKILL.md rename to skills-archive/wren-project/SKILL.md diff --git a/skills/wren-quickstart/SKILL.md b/skills-archive/wren-quickstart/SKILL.md similarity index 100% rename from skills/wren-quickstart/SKILL.md rename to skills-archive/wren-quickstart/SKILL.md diff --git 
a/skills/wren-sql/SKILL.md b/skills-archive/wren-sql/SKILL.md similarity index 100% rename from skills/wren-sql/SKILL.md rename to skills-archive/wren-sql/SKILL.md diff --git a/skills/wren-sql/references/bigquery.md b/skills-archive/wren-sql/references/bigquery.md similarity index 100% rename from skills/wren-sql/references/bigquery.md rename to skills-archive/wren-sql/references/bigquery.md diff --git a/skills/wren-sql/references/correction.md b/skills-archive/wren-sql/references/correction.md similarity index 100% rename from skills/wren-sql/references/correction.md rename to skills-archive/wren-sql/references/correction.md diff --git a/skills/wren-sql/references/datetime.md b/skills-archive/wren-sql/references/datetime.md similarity index 100% rename from skills/wren-sql/references/datetime.md rename to skills-archive/wren-sql/references/datetime.md diff --git a/skills/wren-sql/references/types.md b/skills-archive/wren-sql/references/types.md similarity index 100% rename from skills/wren-sql/references/types.md rename to skills-archive/wren-sql/references/types.md diff --git a/skills-archive/wren-usage/SKILL.md b/skills-archive/wren-usage/SKILL.md new file mode 100644 index 000000000..3c255317f --- /dev/null +++ b/skills-archive/wren-usage/SKILL.md @@ -0,0 +1,168 @@ +--- +name: wren-usage +description: "Wren Engine — semantic SQL engine for AI agents. Query 22+ data sources (PostgreSQL, BigQuery, Snowflake, MySQL, ClickHouse, etc.) through a modeling layer (MDL). This skill is the main entry point: it guides setup, delegates to focused sub-skills for SQL authoring, MDL generation, project management, and MCP server operations. Use when: write SQL, query data, generate or update MDL, change database connection, manage YAML projects, set up or operate MCP server, or get started with Wren Engine for the first time." 
+license: Apache-2.0 +metadata: + author: wren-engine + version: "1.2" +--- + +# Wren Engine — Usage Guide + +## Version check + +Silently fetch `https://raw.githubusercontent.com/Canner/wren-engine/main/skills-archive/versions.json`. +Compare the `wren-usage` key with this skill's version (from the frontmatter above). +If the remote version is newer, notify the user before proceeding: + +> A newer version of the **wren-usage** skill is available. +> Update with: +> ``` +> npx skills add Canner/wren-engine --skill wren-usage --agent claude-code +> ``` + +Then continue with the workflow below regardless of update status. + +--- + +This skill is your day-to-day reference for working with Wren Engine. It delegates to focused sub-skills for each task. + +--- + +## Step 0 — Install dependent skills (first time only) + +Check whether the required skills are already installed in `~/.claude/skills/`. If any are missing, tell the user to run: + +```bash +# Option A — npx skills (works with Claude Code, Cursor, and 30+ agents) +npx skills add Canner/wren-engine --skill '*' --agent claude-code + +# Option B — Clawhub (if installed via clawhub) +clawhub install wren-usage +``` + +This installs `wren-usage` and its dependent skills (`wren-connection-info`, `wren-generate-mdl`, `wren-project`, `wren-sql`, `wren-mcp-setup`, `wren-http-api`) into `~/.claude/skills/`. + +After installation, the user must **start a new session** for the new skills to be loaded. + +> If the user only wants the MCP server set up (no Docker yet), use `/wren-quickstart` for a guided end-to-end walkthrough instead. + +--- + +## What do you want to do? 
+ +Identify the user's intent and delegate to the appropriate skill: + +| Task | Skill | +|------|-------| +| Write or debug a SQL query | `@wren-sql` | +| Connect to a new database / change credentials | `@wren-connection-info` | +| Generate MDL from an existing database | `@wren-generate-mdl` | +| Save MDL to YAML files (version control) | `@wren-project` | +| Load a saved YAML project / rebuild `target/mdl.json` | `@wren-project` | +| Add a new model or column to the MDL | `@wren-project` | +| Start, reset, or reconfigure the MCP server | `@wren-mcp-setup` | +| Call Wren tools via HTTP JSON-RPC (no MCP SDK) | `@wren-http-api` | +| First-time setup from scratch | `@wren-quickstart` | + +--- + +## Common workflows + +### Query your data + +Invoke `@wren-sql` to write a SQL query against the deployed MDL. + +Key rules: +- Query MDL model names directly (e.g. `SELECT * FROM orders`) +- Use `CAST` for type conversions, not `::` syntax +- Avoid correlated subqueries — use JOINs or CTEs instead + +```sql +-- Example: revenue by month +SELECT DATE_TRUNC('month', order_date) AS month, + SUM(total) AS revenue +FROM orders +GROUP BY 1 +ORDER BY 1 +``` + +For type-specific patterns (ARRAY, STRUCT, JSON), date/time arithmetic, or BigQuery dialect quirks, invoke `@wren-sql` for full guidance. + +--- + +### Update connection credentials + +To change credentials, direct the user to the MCP server Web UI at `http://localhost:9001`. Connection info can only be configured through the Web UI — do not attempt to set it programmatically. + +Invoke `@wren-connection-info` for a reference of required fields per data source (so you can guide the user on what to enter in the Web UI). + +--- + +### Extend the MDL + +To add a model, column, relationship, or view to an existing project: + +1. Invoke `@wren-project` — **Load** the existing YAML project into an MDL dict +2. Edit the relevant YAML file (e.g. `models/orders.yml`) +3. 
Invoke `@wren-project` — **Build** to compile updated `target/mdl.json` +4. Call `deploy(mdl_file_path="./target/mdl.json")` to apply the change + +--- + +### Regenerate MDL from database + +When the database schema has changed and the MDL needs to be refreshed: + +1. Invoke `@wren-connection-info` — confirm or update credentials +2. Invoke `@wren-generate-mdl` — re-introspect the database and rebuild the MDL JSON +3. Invoke `@wren-project` — **Save** the new MDL as an updated YAML project +4. Invoke `@wren-project` — **Build** to compile `target/mdl.json` +5. Deploy + +--- + +### MCP server operations + +| Operation | Command | +|-----------|---------| +| Check status | `docker ps --filter name=wren-mcp` | +| View logs | `docker logs wren-mcp` | +| Restart | `docker restart wren-mcp` | +| Full reconfigure | Invoke `@wren-mcp-setup` | +| Verify health | `health_check()` via MCP tools | + +--- + +## Quick reference — MCP tools + +| Tool | Purpose | +|------|---------| +| `health_check()` | Verify Wren Engine is reachable | +| `query(sql=...)` | Execute a SQL query against the deployed MDL | +| `deploy(mdl_file_path=...)` | Load a compiled `mdl.json` | +| `setup_connection(...)` | Configure data source credentials | +| `list_remote_tables(...)` | Introspect database schema | +| `mdl_validate_manifest(...)` | Validate an MDL JSON dict | +| `mdl_save_project(...)` | Save MDL as a YAML project | + +--- + +## Troubleshooting quick guide + +**Query fails with "table not found":** +- The MDL may not be deployed. Run `deploy(mdl_file_path="./target/mdl.json")`. +- Check model names match exactly (case-sensitive). + +**Connection error on queries:** +- Verify credentials with `@wren-connection-info`. +- Inside Docker: use `host.docker.internal` instead of `localhost`. + +**MDL changes not reflected:** +- Re-run `@wren-project` **Build** step and re-deploy. + +**MCP tools unavailable:** +- Start a new Claude Code session after registering the MCP server. 
+- Check: `docker ps --filter name=wren-mcp` and `docker logs wren-mcp`. + +For detailed MCP setup troubleshooting, invoke `@wren-mcp-setup`. diff --git a/skills/.claude-plugin/marketplace.json b/skills/.claude-plugin/marketplace.json index 62aac87a1..c1ddd21e9 100644 --- a/skills/.claude-plugin/marketplace.json +++ b/skills/.claude-plugin/marketplace.json @@ -5,38 +5,33 @@ "email": "dev@cannerdata.com" }, "metadata": { - "description": "Wren Engine skills — semantic SQL, MDL management, MCP server setup for 20+ data sources" + "description": "Wren Engine CLI skills — semantic SQL, MDL generation, query workflows for 22+ data sources" }, "plugins": [ { "name": "wren", "source": ".", - "description": "AI agent skills for Wren Engine — semantic SQL layer and MCP server for 20+ data sources.", + "description": "AI agent skills for Wren Engine CLI — semantic SQL layer for 22+ data sources.", "version": "1.0.0", "keywords": [ "wren", "sql", "mdl", - "mcp", + "cli", "semantic-layer", - "docker", - "amazon-s3", - "apache-spark", - "apache-doris", - "athena", + "postgres", "bigquery", + "snowflake", + "mysql", "clickhouse", + "trino", + "mssql", "databricks", - "duckdb", - "google-cloud-storage", - "minio", - "mysql", - "oracle", - "postgres", "redshift", - "sql-server", - "snowflake", - "trino" + "spark", + "athena", + "oracle", + "duckdb" ] } ] diff --git a/skills/.claude-plugin/plugin.json b/skills/.claude-plugin/plugin.json index 178ce0252..662048e4a 100644 --- a/skills/.claude-plugin/plugin.json +++ b/skills/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "wren", - "description": "AI agent skills for Wren Engine — semantic SQL layer and MCP server for 20+ data sources.", + "description": "AI agent skills for Wren Engine CLI — semantic SQL layer for 22+ data sources.", "version": "1.0.0", "author": { "name": "Canner", @@ -13,26 +13,21 @@ "wren", "sql", "mdl", - "mcp", + "cli", "semantic-layer", "data-source", - "docker", - "amazon-s3", - "apache-spark", - 
"apache-doris", - "athena", + "postgres", "bigquery", + "snowflake", + "mysql", "clickhouse", + "trino", + "mssql", "databricks", - "duckdb", - "google-cloud-storage", - "minio", - "mysql", - "oracle", - "postgres", "redshift", - "sql-server", - "snowflake", - "trino" + "spark", + "athena", + "oracle", + "duckdb" ] } diff --git a/skills/AUTHORING.md b/skills/AUTHORING.md index f8f0081cb..13be36cce 100644 --- a/skills/AUTHORING.md +++ b/skills/AUTHORING.md @@ -29,8 +29,7 @@ Every `SKILL.md` must open with YAML frontmatter: name: skill-name description: "What this skill does and when to trigger it. Include specific trigger keywords. This field is loaded at startup for every conversation." -compatibility: "Optional. Only include if the skill has specific environment - requirements (e.g. requires Docker, must run from ibis-server/)." +license: Apache-2.0 metadata: author: wren-engine version: "1.0" @@ -77,20 +76,15 @@ Link to reference files from `SKILL.md` using paths relative to the skill root: Follow [references/diagnose.md](references/diagnose.md) for per-stage investigation steps. ``` -When linking from `SKILLS.md` (one level up), prefix with the skill directory name: -```markdown -| [references/diagnose.md](my-skill/references/diagnose.md) | Diagnosis steps | -``` - --- ## Naming Conventions | Item | Convention | Example | |------|-----------|---------| -| Skill directory | `kebab-case` | `wren-debugging/` | -| `name` field | same as directory | `wren-debugging` | -| Reference files | descriptive `kebab-case` | `plan-template.md`, `connection-info.md` | +| Skill directory | `kebab-case` | `wren-generate-mdl/` | +| `name` field | same as directory | `wren-generate-mdl` | +| Reference files | descriptive `kebab-case` | `memory.md`, `wren-sql.md` | --- @@ -104,7 +98,7 @@ After creating a new skill: 4. Add an entry to [index.json](index.json) with `name`, `version`, `description`, `tags`, `dependencies` (if any), and `repository`. 5. 
Add the skill to the `ALL_SKILLS` array in [install.sh](install.sh). -Both `versions.json` and `index.json` must stay in sync with the `version` field in the skill's `SKILL.md` frontmatter. Run `bash skills/check-versions.sh` to verify parity before merging — the script validates both files. +Both `versions.json` and `index.json` must stay in sync with the `version` field in the skill's `SKILL.md` frontmatter. Run `bash skills/check-versions.sh` to verify parity before merging. --- diff --git a/skills/README.md b/skills/README.md index 44ad0d65b..c8726de0f 100644 --- a/skills/README.md +++ b/skills/README.md @@ -1,6 +1,6 @@ -# Wren Engine Skills +# Wren Engine CLI Skills -This directory contains reusable AI agent skills for working with Wren Engine. Skills are instruction files that teach AI agents (Claude, Cline, Cursor, etc.) how to perform specific workflows with Wren's tools and APIs. +This directory contains AI agent skills for working with the Wren Engine CLI (`wren`). Skills are instruction files that teach AI agents how to query data, generate MDL projects, and manage semantic layers using the `wren` CLI — no Docker or MCP server required. ## Installation @@ -17,7 +17,7 @@ Or test locally during development: claude --plugin-dir ./skills ``` -Skills are namespaced as `/wren:` (e.g., `/wren:generate-mdl`, `/wren:wren-sql`). +Skills are namespaced as `/wren:` (e.g., `/wren:wren-generate-mdl`, `/wren:wren-usage`). ### Option 2 — npx skills @@ -28,95 +28,71 @@ npx skills add Canner/wren-engine --skill '*' --agent claude-code `npx skills` also supports Cursor, Windsurf, and 30+ other agent tools — replace `--agent claude-code` with your agent of choice. 
-### Option 3 — ClawHub - -Two skills are published on [ClawHub](https://clawhub.ai) for agents that support the ClawHub registry: - -```bash -clawhub install wren-usage # main entry point — installs all dependent skills -clawhub install wren-http-api # standalone — HTTP JSON-RPC for non-MCP clients -``` - -### Option 4 — install script (from a local clone) +### Option 3 — install script (from a local clone) ```bash -bash skills/install.sh # all skills -bash skills/install.sh wren-generate-mdl wren-sql # specific skills -bash skills/install.sh --force wren-generate-mdl # overwrite existing +bash skills/install.sh # all skills +bash skills/install.sh wren-usage # specific skill (auto-installs dependencies) +bash skills/install.sh --force wren-usage # overwrite existing ``` -### Option 5 — manual copy +### Option 4 — manual copy ```bash -cp -r skills/wren-usage ~/.claude/skills/ -# or all at once: -cp -r skills/wren-usage skills/wren-generate-mdl skills/wren-project skills/wren-sql skills/wren-mcp-setup skills/wren-connection-info ~/.claude/skills/ +cp -r skills/wren-usage skills/wren-generate-mdl ~/.claude/skills/ ``` Once installed, invoke a skill by name in your conversation: ```text /wren-usage -/wren-quickstart /wren-generate-mdl -/wren-project -/wren-sql -/wren-mcp-setup ``` -> **Tip:** Use `--skill '*'` to install all skills at once, or specify individual skills (e.g., `--skill wren-generate-mdl --skill wren-sql`). +> **Tip:** Use `--skill '*'` to install all skills at once, or specify individual skills. 
## Available Skills | Skill | Description | |-------|-------------| -| [wren-usage](wren-usage/SKILL.md) | **Primary skill** — daily usage guide: query data, manage MDL, connect databases, operate MCP server `ClawHub` | -| [wren-quickstart](wren-quickstart/SKILL.md) | End-to-end first-time setup — install skills, generate MDL, save project, start MCP server, verify setup | -| [wren-generate-mdl](wren-generate-mdl/SKILL.md) | Generate a Wren MDL manifest from a live database using ibis-server introspection | -| [wren-project](wren-project/SKILL.md) | Save, load, and build MDL manifests as version-controlled YAML project directories | -| [wren-sql](wren-sql/SKILL.md) | Write and correct SQL queries for Wren Engine — types, date/time, BigQuery dialect, error diagnosis | -| [wren-mcp-setup](wren-mcp-setup/SKILL.md) | Set up Wren Engine MCP via Docker, register with Claude Code or other MCP clients, and start querying | -| [wren-connection-info](wren-connection-info/SKILL.md) | Set up data source credentials — produces `connectionFilePath` or inline dict | -| [wren-http-api](wren-http-api/SKILL.md) | Interact with Wren MCP via plain HTTP JSON-RPC — no MCP SDK required (for OpenClaw, custom clients, scripts) `ClawHub` | +| [wren-usage](wren-usage/SKILL.md) | **Primary skill** — CLI workflow guide: query data via `wren --sql`, gather schema context with `wren memory`, store/recall queries, handle errors | +| [wren-generate-mdl](wren-generate-mdl/SKILL.md) | Generate a Wren MDL project from a live database — schema discovery, type normalization, YAML generation | -See [SKILLS.md](SKILLS.md) for full details on each skill. 
+### wren-usage reference files -## Updating Skills +| File | Topic | +|------|-------| +| [references/memory.md](wren-usage/references/memory.md) | When to index, fetch, store, and recall | +| [references/wren-sql.md](wren-usage/references/wren-sql.md) | CTE rewrite pipeline, SQL rules, error diagnosis | -Each skill automatically checks for updates when invoked. If a newer version is available, the AI agent will notify you with the update command before continuing. +## Updating Skills -To update manually at any time: +Each skill automatically checks for updates when invoked. To update manually: ```bash # Re-add to reinstall the latest version npx skills add Canner/wren-engine --skill '*' --agent claude-code -# Or reinstall a specific skill -npx skills add Canner/wren-engine --skill wren-generate-mdl --agent claude-code +# Or reinstall from a local clone +bash skills/install.sh --force ``` ## Releasing a New Skill Version -When updating a skill, two files must be kept in sync: - -1. Update `version` in the skill's `SKILL.md` frontmatter: - ```yaml - metadata: - author: wren-engine - version: "1.2" # bump this - ``` +When updating a skill, three files must be kept in sync: -2. Update the matching entry in [`versions.json`](versions.json): - ```json - { - "wren-generate-mdl": "1.2" - } - ``` +1. Update `version` in the skill's `SKILL.md` frontmatter +2. Update the matching entry in [`versions.json`](versions.json) +3. Update the matching entry in [`index.json`](index.json) -Both files must have the same version number. The `SKILL.md` version is what users have installed locally; `versions.json` is what the update check compares against. +Run `bash skills/check-versions.sh` to verify parity before merging. 
## Requirements -- A running [ibis-server](../ibis-server/) instance -- The [Wren MCP server](../mcp-server/) connected to your AI client +- `wren` CLI installed (`pip install wren-engine` or `pip install wren-engine[]`) +- A database connection (configured via `wren profile add` or `~/.wren/connection_info.json`) - An AI client that supports skills (Claude Code, Cline, Cursor, etc.) + +## Archived Skills (MCP-based) + +The previous MCP server-based skills are preserved in [`skills-archive/`](../skills-archive/). Those skills require a running ibis-server and MCP server. The CLI skills in this directory replace that workflow with the standalone `wren` CLI. diff --git a/skills/SKILLS.md b/skills/SKILLS.md index 083c5f29d..df992e108 100644 --- a/skills/SKILLS.md +++ b/skills/SKILLS.md @@ -1,4 +1,4 @@ -# Wren Engine Skill Reference +# Wren Engine CLI Skill Reference Skills are instruction files that extend AI agents with Wren-specific workflows. Install them into your local skills folder and invoke them by name during a conversation. @@ -8,62 +8,28 @@ Skills are instruction files that extend AI agents with Wren-specific workflows. **File:** [wren-usage/SKILL.md](wren-usage/SKILL.md) -**Primary entry point** for day-to-day Wren Engine usage. Identifies the user's task and delegates to the appropriate focused skill. Covers SQL queries, MDL management, database connections, and MCP server operations. +**Primary entry point** for day-to-day Wren Engine CLI usage. Covers the full query workflow: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute via `wren --sql`, and store confirmed results. 
### When to use -- Writing or debugging SQL queries against a deployed MDL -- Adding or modifying models, columns, or relationships in the MDL -- Changing database credentials or data source -- Rebuilding `target/mdl.json` after project changes -- Restarting or reconfiguring the MCP server +- Answering data questions using the `wren` CLI +- Debugging SQL errors (MDL-level vs DB-level diagnosis) +- Connecting a new data source via `wren profile` +- Re-indexing memory after MDL changes - Any ongoing Wren task after initial setup is complete -### Dependent skills - -| Skill | Purpose | -|-------|---------| -| `@wren-sql` | Write and debug SQL queries | -| `@wren-connection-info` | Set up or change database credentials | -| `@wren-generate-mdl` | Regenerate MDL from a changed database schema | -| `@wren-project` | Save, load, and build MDL YAML projects | -| `@wren-mcp-setup` | Reconfigure the MCP server | - -> Installing `wren-usage` via `install.sh` automatically installs all dependent skills: -> ```bash -> bash skills/install.sh wren-usage -> ``` - ---- - -## wren-quickstart - -**File:** [wren-quickstart/SKILL.md](wren-quickstart/SKILL.md) - -End-to-end onboarding guide for Wren Engine. Orchestrates the full setup flow — from installing skills and creating a workspace, to generating an MDL, saving it as a versioned project, starting the MCP Docker container, and verifying everything works. - -### When to use - -- Setting up Wren Engine for the first time -- Onboarding a new data source from scratch -- Getting a new team member started with Wren MCP - -### Workflow summary +### Reference files -1. Install required skills via `install.sh` -2. Create a workspace directory on the host machine -3. Generate MDL from the database (`@wren-generate-mdl`) -4. Save as a YAML project and compile to `target/` (`@wren-project`) -5. Start the Docker container and register the MCP server (`@wren-mcp-setup`) -6. 
Run `health_check()` to verify — then start a new session and query +| File | Topic | +|------|-------| +| [references/memory.md](wren-usage/references/memory.md) | When to index, fetch, store, and recall | +| [references/wren-sql.md](wren-usage/references/wren-sql.md) | CTE rewrite pipeline, SQL rules, error diagnosis | ### Dependent skills | Skill | Purpose | |-------|---------| -| `@wren-generate-mdl` | Introspect database and build MDL JSON | -| `@wren-project` | Save MDL as YAML project + compile to `target/` | -| `@wren-mcp-setup` | Start Docker container and register MCP server | +| `wren-generate-mdl` | Generate or regenerate MDL from a database | --- @@ -71,161 +37,30 @@ End-to-end onboarding guide for Wren Engine. Orchestrates the full setup flow **File:** [wren-generate-mdl/SKILL.md](wren-generate-mdl/SKILL.md) -Generates a complete Wren MDL manifest by introspecting a live database through ibis-server — no local database drivers required. +Generates a Wren MDL project by exploring a live database using whatever tools are available to the agent (SQLAlchemy, database drivers, raw SQL). Handles schema discovery, type normalization via `wren utils parse-type`, and YAML project scaffolding via `wren context init`. ### When to use - Onboarding a new data source into Wren -- Scaffolding an MDL from an existing database schema -- Automating initial MDL setup as part of a data pipeline - -### Required MCP tools - -`setup_connection`, `list_remote_tables`, `list_remote_constraints`, `mdl_validate_manifest`, `mdl_save_project`, `deploy_manifest` - -### Supported data sources - -`POSTGRES`, `MYSQL`, `MSSQL`, `DUCKDB`, `BIGQUERY`, `SNOWFLAKE`, `CLICKHOUSE`, `TRINO`, `ATHENA`, `ORACLE`, `DATABRICKS` - -### Workflow summary - -1. Gather connection credentials from the user -2. Register the connection via `setup_connection` -3. Fetch table schema via `list_remote_tables` -4. Fetch foreign key constraints via `list_remote_constraints` -5. 
Optionally sample data for ambiguous columns -6. Build the MDL JSON (models, columns, relationships) -7. Validate via `mdl_validate_manifest` -8. Optionally save as a YAML project (see `wren-project`) -9. Deploy via `deploy_manifest` - ---- - -## wren-project - -**File:** [wren-project/SKILL.md](wren-project/SKILL.md) - -Manages Wren MDL manifests as human-readable YAML project directories — similar to dbt projects. Makes MDL version-control friendly by splitting the monolithic JSON into one YAML file per model. - -### When to use - -- Persisting an MDL to disk for version control (Git) -- Loading a saved YAML project back into a deployable MDL JSON -- Compiling a YAML project to `target/mdl.json` for deployment - -### Project layout - -``` -my_project/ -├── wren_project.yml # Catalog, schema, data source -├── models/ -│ ├── orders.yml # One file per model (snake_case fields) -│ └── customers.yml -├── relationships.yml -├── views.yml -└── target/ - └── mdl.json # Compiled output (camelCase, deployable) -``` - -### Key operations - -| Operation | Description | -|-----------|-------------| -| **Save** | Convert MDL JSON → YAML project directory (camelCase → snake_case) | -| **Load** | Read YAML project → assemble MDL JSON dict (snake_case → camelCase) | -| **Build** | Load + write result to `target/mdl.json` | -| **Deploy** | Pass `target/mdl.json` to `deploy(mdl_file_path=...)` | - -### Field mapping (YAML ↔ JSON) - -| YAML (snake_case) | JSON (camelCase) | -|-------------------|------------------| -| `data_source` | `dataSource` | -| `table_reference` | `tableReference` | -| `is_calculated` | `isCalculated` | -| `not_null` | `notNull` | -| `is_primary_key` | `isPrimaryKey` | -| `primary_key` | `primaryKey` | -| `join_type` | `joinType` | - ---- - -## wren-sql - -**File:** [wren-sql/SKILL.md](wren-sql/SKILL.md) - -Comprehensive SQL authoring and debugging guide for Wren Engine. 
Covers core query rules, filter strategies, supported types, aggregation, and links to topic-specific references. - -### When to use - -- Writing SQL queries against Wren Engine MDL models -- Debugging SQL errors across parsing, planning, transpiling, or execution stages -- Working with complex types (ARRAY, STRUCT, JSON/VARIANT) -- Writing date/time calculations or interval arithmetic -- Targeting BigQuery as a backend database - -### Reference files - -| File | Topic | -|------|-------| -| [references/correction.md](wren-sql/references/correction.md) | Error diagnosis and correction workflow | -| [references/datetime.md](wren-sql/references/datetime.md) | Date/time functions, intervals, epoch conversion | -| [references/types.md](wren-sql/references/types.md) | ARRAY, STRUCT, JSON/VARIANT/OBJECT types | -| [references/bigquery.md](wren-sql/references/bigquery.md) | BigQuery dialect quirks | - ---- - -## wren-mcp-setup - -**File:** [wren-mcp-setup/SKILL.md](wren-mcp-setup/SKILL.md) - -Sets up Wren Engine MCP server via Docker, registers it with an AI agent (Claude Code or other MCP clients), and starts a new session to begin interacting with Wren. - -### When to use - -- Running Wren MCP in Docker (no local Python/uv install required) -- Configuring Claude Code MCP to connect to a containerized Wren Engine -- Setting up Wren MCP for Cline, Cursor, or VS Code MCP Extension -- Fixing `localhost` → `host.docker.internal` in connection info for Docker - -### Workflow summary - -1. Ask user for workspace mount path -2. `docker run` with workspace mounted at `/workspace`, MCP server enabled on port 9000 -3. Rewrite `localhost` → `host.docker.internal` in connection credentials -4. Add `wren` MCP server to Claude Code using streamable-http on port 9000 (`claude mcp add`) -5. Start a new session so the MCP tools are loaded -6. 
Run `health_check()` to verify - ---- - -## wren-http-api - -**File:** [wren-http-api/SKILL.md](wren-http-api/SKILL.md) - -Interact with Wren Engine MCP server via plain HTTP JSON-RPC requests — no MCP client SDK required. Covers session initialization, tool discovery, and calling all 20+ Wren tools using standard HTTP POST with JSON-RPC 2.0 payloads. - -### When to use - -- The client cannot or prefers not to use the MCP protocol directly (e.g. OpenClaw) -- Building a custom HTTP integration with Wren Engine -- Calling Wren tools from shell scripts, CI pipelines, or non-MCP environments -- Debugging MCP tool calls with curl +- Scaffolding an MDL project from an existing database schema +- Re-generating models after database schema changes ### Workflow summary -1. Initialize a JSON-RPC session via `POST /mcp` with `initialize` method -2. Save the `Mcp-Session-Id` header from the response -3. Complete the handshake with `notifications/initialized` -4. Call any Wren tool via `tools/call` method with the session header -5. Parse SSE `data:` lines from responses +1. Establish connection and agree on scope with the user +2. Discover schema (tables, columns, types, constraints) +3. Normalize types via `wren.type_mapping.parse_type` or `wren utils parse-type` +4. Scaffold project with `wren context init` +5. Write model YAML files and `relationships.yml` +6. Validate (`wren context validate`) and build (`wren context build`) +7. 
Initialize memory (`wren memory index`) --- ## Installing a skill ```bash -# Install wren-usage (auto-installs all dependencies) +# Install wren-usage (auto-installs dependencies) bash skills/install.sh wren-usage # Or install everything @@ -237,8 +72,4 @@ Then invoke in your AI client: ``` /wren-usage /wren-generate-mdl -/wren-project -/wren-sql -/wren-mcp-setup -/wren-quickstart ``` diff --git a/skills/index.json b/skills/index.json index 4f1ca26c0..86257e20e 100644 --- a/skills/index.json +++ b/skills/index.json @@ -1,32 +1,20 @@ { "name": "wren-engine", - "description": "AI agent skills for Wren Engine — semantic SQL layer and MCP server for 20+ data sources.", + "description": "AI agent skills for Wren Engine CLI — semantic SQL layer for 22+ data sources.", "homepage": "https://wren.ai", "repository": "https://github.com/Canner/wren-engine", "license": "Apache-2.0", "skills": [ - { - "name": "wren-connection-info", - "version": "1.5", - "description": "Reference guide for Wren Engine connection info — required fields for all 18 data sources, sensitive field handling, Docker host hints, and credential encoding.", - "tags": [ - "wren", - "credentials", - "connection", - "database", - "security" - ], - "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-connection-info" - }, { "name": "wren-generate-mdl", - "version": "1.4", - "description": "Generate a Wren MDL manifest from a live database using MCP server introspection tools.", + "version": "1.0", + "description": "Generate a Wren MDL project by exploring a database with available tools (SQLAlchemy, database drivers, MCP connectors, or raw SQL). 
Guides agents through schema discovery, type normalization, and MDL YAML generation using the wren CLI.", "tags": [ "wren", "mdl", "database", "introspection", + "cli", "postgres", "bigquery", "snowflake", @@ -36,104 +24,24 @@ ], "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-generate-mdl" }, - { - "name": "wren-project", - "version": "1.5", - "description": "Save, load, and build Wren MDL manifests as YAML project directories for version control.", - "tags": [ - "wren", - "mdl", - "yaml", - "version-control", - "project", - "git" - ], - "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-project" - }, - { - "name": "wren-sql", - "version": "1.0", - "description": "Write and correct SQL queries targeting Wren Engine — types, date/time, BigQuery dialect, error diagnosis.", - "tags": [ - "wren", - "sql", - "bigquery", - "array", - "struct", - "datetime", - "mdl", - "text-to-sql" - ], - "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-sql" - }, - { - "name": "wren-mcp-setup", - "version": "1.4", - "description": "Set up Wren Engine MCP server via Docker and register it with an AI agent.", - "tags": [ - "wren", - "mcp", - "docker", - "claude-code", - "cursor", - "cline" - ], - "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-mcp-setup" - }, - { - "name": "wren-quickstart", - "version": "1.3", - "description": "End-to-end quickstart for Wren Engine — from zero to querying.", - "tags": [ - "wren", - "quickstart", - "onboarding", - "mcp", - "docker" - ], - "dependencies": [ - "wren-generate-mdl", - "wren-project", - "wren-mcp-setup" - ], - "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-quickstart" - }, - { - "name": "wren-http-api", - "version": "1.0", - "description": "Interact with Wren Engine MCP server via plain HTTP JSON-RPC requests — no MCP client SDK required.", - "tags": [ - "wren", - "http", - "json-rpc", - "api", - "openclaw", - 
"rest" - ], - "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-http-api" - }, { "name": "wren-usage", - "version": "1.2", - "description": "Wren Engine — semantic SQL engine for AI agents. Query 22+ data sources through a modeling layer. Main entry point for setup, SQL, MDL generation, and MCP server operations.", + "version": "1.0", + "description": "Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results.", "tags": [ "wren", "usage", "sql", "mdl", - "mcp", - "database", + "cli", + "memory", "semantic-layer", "postgres", "bigquery", "snowflake" ], "dependencies": [ - "wren-generate-mdl", - "wren-project", - "wren-sql", - "wren-mcp-setup", - "wren-http-api" + "wren-generate-mdl" ], "repository": "https://github.com/Canner/wren-engine/tree/main/skills/wren-usage" } diff --git a/skills/install.sh b/skills/install.sh index 71ca31594..92f53057a 100755 --- a/skills/install.sh +++ b/skills/install.sh @@ -1,10 +1,10 @@ #!/usr/bin/env bash -# Install Wren Engine skills into your local AI agent skills directory. +# Install Wren Engine CLI skills into your local AI agent skills directory. 
# # Usage: # ./install.sh # install all skills -# ./install.sh wren-generate-mdl # install specific skills -# ./install.sh --force wren-sql # overwrite without prompt +# ./install.sh wren-usage # install specific skills +# ./install.sh --force wren-usage # overwrite without prompt # curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash # curl -fsSL .../install.sh | bash -s -- wren-generate-mdl @@ -13,7 +13,7 @@ set -euo pipefail REPO="Canner/wren-engine" BRANCH="${WREN_SKILLS_BRANCH:-main}" DEST="${CLAUDE_SKILLS_DIR:-$HOME/.claude/skills}" -ALL_SKILLS=(wren-generate-mdl wren-project wren-sql wren-mcp-setup wren-quickstart wren-connection-info wren-usage wren-http-api) +ALL_SKILLS=(wren-generate-mdl wren-usage) # Parse --force flag and skill list from arguments FORCE=false @@ -44,7 +44,6 @@ for skill in "${SELECTED_SKILLS[@]}"; do done # Detect whether we are running from a local clone or piped via curl. -# When piped, BASH_SOURCE[0] is empty or "/dev/stdin". SCRIPT_DIR="" if [ -n "${BASH_SOURCE[0]:-}" ] && [ "${BASH_SOURCE[0]}" != "/dev/stdin" ]; then SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" @@ -57,7 +56,6 @@ if [ -n "$SCRIPT_DIR" ] && [ -f "$SCRIPT_DIR/index.json" ]; then fi # Expand SELECTED_SKILLS to include dependencies declared in index.json. -# Only runs when python3 is available and index.json is accessible. expand_with_deps() { local json_file="$1" shift @@ -157,7 +155,6 @@ else tmpdir=$(mktemp -d) trap 'rm -rf "$tmpdir"' EXIT - # Build the list of paths to extract from the tarball extract_paths=() for skill in "${SELECTED_SKILLS[@]}"; do extract_paths+=("wren-engine-${BRANCH}/skills/${skill}") @@ -179,4 +176,3 @@ done echo "" echo "To update skills later, re-run with --force:" echo " curl -fsSL https://raw.githubusercontent.com/Canner/wren-engine/main/skills/install.sh | bash -s -- --force" -echo "Or check for updates: each skill notifies you automatically when a newer version is available." 
diff --git a/skills/versions.json b/skills/versions.json index 67b174e04..da444f079 100644 --- a/skills/versions.json +++ b/skills/versions.json @@ -1,10 +1,4 @@ { - "wren-generate-mdl": "1.4", - "wren-connection-info": "1.5", - "wren-project": "1.5", - "wren-sql": "1.0", - "wren-mcp-setup": "1.4", - "wren-quickstart": "1.3", - "wren-http-api": "1.0", - "wren-usage": "1.2" + "wren-generate-mdl": "1.0", + "wren-usage": "1.0" } diff --git a/skills/wren-generate-mdl/SKILL.md b/skills/wren-generate-mdl/SKILL.md index 10fbc2bbe..32fff3173 100644 --- a/skills/wren-generate-mdl/SKILL.md +++ b/skills/wren-generate-mdl/SKILL.md @@ -1,214 +1,306 @@ --- name: wren-generate-mdl -description: Generate a Wren MDL manifest from a database using ibis-server metadata endpoints. Use when a user wants to create or set up a new Wren MDL, scaffold a manifest from an existing database, or onboard a new data source without installing any database drivers locally. -compatibility: Requires a running ibis-server (default port 8000). No local database drivers needed. +description: "Generate a Wren MDL project by exploring a database with available tools (SQLAlchemy, database drivers, MCP connectors, or raw SQL). Guides agents through schema discovery, type normalization, and MDL YAML generation using the wren CLI. Use when: user wants to create or set up a new MDL, onboard a new data source, or scaffold a project from an existing database." license: Apache-2.0 metadata: author: wren-engine - version: "1.4" + version: "1.0" --- -# Generate Wren MDL +# Generate Wren MDL — CLI Agent Workflow -## Version check +Builds an MDL project by discovering database schema and converting it +into Wren's YAML project format. The agent uses whatever database tools +are available in its environment for introspection; the wren CLI handles +type normalization, validation, and build. -Silently fetch `https://raw.githubusercontent.com/Canner/wren-engine/main/skills/versions.json`. 
-Compare the `wren-generate-mdl` key with this skill's version (from the frontmatter above). -If the remote version is newer, notify the user before proceeding: +For memory and query workflows after setup, see the **wren-usage** skill. -> A newer version of the **wren-generate-mdl** skill is available. -> Update with: -> ``` -> npx skills add Canner/wren-engine --skill wren-generate-mdl --agent claude-code -> ``` +--- + +## Prerequisites -Then continue with the workflow below regardless of update status. +- `wren` CLI installed (`pip install wren-engine[]`) +- A working database connection (credentials available to the agent) +- A wren profile configured (`wren profile add`) or connection info ready --- -Generates a Wren MDL manifest by using ibis-server to introspect the database schema — no local database drivers required. All schema discovery goes through ibis-server, which already has drivers for all supported data sources. +## Phase 1 — Establish connection and scope + +**Goal:** Confirm the agent can reach the database and agree on scope with the user. -## Workflow +1. Verify connectivity using whichever tool is available: + - If SQLAlchemy: `engine.connect()` test + - If database driver: simple query like `SELECT 1` + - If wren profile exists: `wren profile debug` to check config + - If raw SQL via wren: `wren --sql "SELECT 1"` (requires profile or connection file) -Follow these steps in order. Do not skip steps or ask unnecessary questions between them. +2. 
Ask the user: + - Which **schema(s)** or **dataset(s)** to include (skip if only one exists) + - Whether to include **all tables** or a subset + - The **datasource type** for wren (e.g., `postgres`, `bigquery`, `snowflake`) — needed for type normalization dialect -### Step 1 — Verify connection and choose data source +--- -> **Connection info can ONLY be configured through the Web UI at `http://localhost:9001`.** Do not attempt to set connection info programmatically via ibis-server API calls, curl, or any other method. The ibis-server does not expose a public API for writing connection info — only the Web UI can do this. +## Phase 2 — Discover schema -Confirm the MCP server has a working connection before proceeding: +**Goal:** Collect table names, column names, column types, and constraints. -```text -health_check() +Use whatever introspection method is available. Here are common approaches +ranked by convenience: + +### Option A: SQLAlchemy (recommended if available) + +```python +from sqlalchemy import create_engine, inspect + +engine = create_engine(connection_url) +inspector = inspect(engine) + +tables = inspector.get_table_names(schema="public") + +for table in tables: + columns = inspector.get_columns(table, schema="public") + # columns → [{"name": "id", "type": INTEGER(), "nullable": False, ...}] + + pk = inspector.get_pk_constraint(table, schema="public") + # pk → {"constrained_columns": ["id"], "name": "orders_pkey"} + + fks = inspector.get_foreign_keys(table, schema="public") + # fks → [{"constrained_columns": ["customer_id"], + # "referred_table": "customers", + # "referred_columns": ["id"]}] ``` -If the health check fails, or if the user has not yet configured a connection, direct them to the Web UI at `http://localhost:9001` to enter their data source credentials. Wait for the user to confirm the connection is saved before continuing. +### Option B: Database-specific driver -Ask the user for: -1. **Data source type** (e.g. 
`POSTGRES`, `BIGQUERY`, `SNOWFLAKE`, …) — needed to set `dataSource` in the MDL -2. **Schema filter** (optional) — if the database has many schemas, ask which schema(s) to include +- **psycopg / asyncpg (Postgres):** Query `information_schema.columns` and `information_schema.table_constraints` +- **google-cloud-bigquery:** `client.list_tables()`, `client.get_table()` → `table.schema` +- **snowflake-connector-python:** `SHOW COLUMNS IN TABLE`, `SHOW PRIMARY KEYS IN TABLE` +- **clickhouse-driver:** `DESCRIBE TABLE`, `system.tables` -After this step you will have: -- `data_source`: e.g. `"POSTGRES"` -- Optional `schema_filter`: used to narrow down results in subsequent steps +### Option C: Raw SQL via wren -### Step 2 — Fetch table schema +If no driver is available but a wren profile is configured, query +`information_schema` through wren itself: -```text -list_remote_tables() +```bash +wren --sql "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'" -o json +wren --sql "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'orders'" -o json ``` -Returns a list of tables with their column names and types. Each table entry has a `properties.schema` field — use it to filter to the user's target schema if specified. +Note: this goes through the MDL layer, so it only works if you already +have a minimal MDL or if the database supports `information_schema` as +regular tables. For bootstrapping from zero, Option A or B is preferred. -If this fails: -1. Check that read-only mode is **disabled** in the Web UI (`http://localhost:9001`) — `list_remote_tables()` will fail when read-only mode is on, even if the connection is healthy. -2. Ask the user to verify connection info in the Web UI if read-only mode is already off. +--- -### Step 3 — Fetch relationships +## Phase 3 — Normalize types -```text -list_remote_constraints() +**Goal:** Convert raw database types to wren-core-compatible types. 
+ +### Python import (recommended for batch processing) + +```python +from wren.type_mapping import parse_type, parse_types + +# Single type +normalized = parse_type("character varying(255)", "postgres") # → "VARCHAR(255)" + +# Batch — entire table at once +columns = [ + {"column": "id", "raw_type": "int8"}, + {"column": "name", "raw_type": "character varying"}, + {"column": "total", "raw_type": "numeric(10,2)"}, +] +normalized_cols = parse_types(columns, dialect="postgres") +# Each dict now has a "type" key with the normalized value ``` -Returns foreign key constraints. Use these to build `Relationship` entries in the MDL. If the response is empty (`[]`), infer relationships from column naming conventions (e.g. `order_id` → `orders.id`). +### CLI (if Python import not available) -If this fails, verify that read-only mode is disabled in the Web UI (`http://localhost:9001`). +Single type: +```bash +wren utils parse-type --type "character varying(255)" --dialect postgres +# → VARCHAR(255) +``` -### Step 4 — Build MDL JSON +Batch (stdin JSON): +```bash +echo '[{"column":"id","raw_type":"int8"},{"column":"name","raw_type":"character varying"}]' \ + | wren utils parse-types --dialect postgres +``` + +--- -Construct the manifest following the [MDL structure](#mdl-structure) below. +## Phase 4 — Scaffold and write MDL project -Rules: -- `catalog`: use `"wren"` unless the user specifies otherwise -- `schema`: use the target schema name (e.g. `"public"` for PostgreSQL default, `"jaffle_shop"` if user specified) -- `dataSource`: set to the enum value from Step 1 (e.g. `"POSTGRES"`) -- `tableReference.catalog`: set to the database name (not `"wren"`) -- Each table → one `Model`. Set `tableReference.table` to the exact table name -- Each column → one `Column`. 
Use the exact DB column name -- Mark primary key columns with `"isPrimaryKey": true` and set `primaryKey` on the model -- For FK columns, add a `Relationship` entry linking the two models -- Omit calculated columns for now — they can be added later +**Goal:** Create the YAML project structure. -### Step 5 — Validate +### Step 1 — Initialize project -Deploy the draft MDL and validate it with a dry run: +```bash +wren context init --path /path/to/project +``` +This creates: ```text -deploy_manifest(mdl=) -dry_run(sql="SELECT * FROM LIMIT 1") +project/ +├── wren_project.yml +├── models/ +├── views/ +├── relationships.yml +└── instructions.md +``` + +> **IMPORTANT: `catalog` and `schema` in `wren_project.yml`** +> +> These are Wren Engine's internal namespace — they are NOT the database's +> native catalog or schema. Keep the defaults (`catalog: wren`, `schema: public`) +> unless you are intentionally configuring a multi-project namespace. +> +> Your database's actual catalog/schema is specified per-model in `table_reference` +> (see Step 2). Do not copy database catalog/schema values into `wren_project.yml`. + +### Step 2 — Write model files + +For each table, create a YAML file under `models/`. Use snake_case +naming (the build step converts to camelCase automatically). + +```yaml +# models/orders/metadata.yml +name: orders +table_reference: + catalog: "" # database catalog (empty string if not applicable) + schema: public # database schema (this IS the DB schema) + table: orders # database table name +primary_key: order_id +columns: + - name: order_id + type: INTEGER + not_null: true + - name: customer_id + type: INTEGER + - name: total + type: "DECIMAL(10, 2)" + - name: status + type: VARCHAR + properties: + description: "Order status: pending, shipped, delivered, cancelled" ``` -If `dry_run` succeeds, the MDL is valid. If it fails, fix the reported errors, call `deploy_manifest` again with the corrected MDL, and retry. 
+### Step 3 — Write relationships -### Step 6 — Save project (optional) +From foreign key constraints discovered in Phase 2: -Ask the user if they want to save the MDL as a YAML project directory (useful for version control). +```yaml +# relationships.yml +- name: orders_customers + models: + - orders + - customers + join_type: many_to_one + condition: "orders.customer_id = customers.customer_id" +``` -If yes, follow the **wren-project** skill (`skills/wren-project/SKILL.md`) to write the YAML files and build `target/mdl.json`. +Join type mapping: +- FK table → PK table: `many_to_one` +- PK table → FK table: `one_to_many` +- Unique FK: `one_to_one` +- Junction table: `many_to_many` -### Step 7 — Deploy final MDL +If no foreign keys were found, infer from naming conventions: +- Column `
<table>_id` or `<table_singular>_id` → likely FK to `<table>
` +- Ask the user to confirm inferred relationships -``` -deploy_manifest(mdl=) +### Step 4 — Add descriptions (optional but valuable) + +Ask the user to describe: +- Each model (1-2 sentences about what the table represents) +- Key columns (especially calculated fields or non-obvious names) + +These descriptions are indexed by `wren memory index` and significantly +improve LLM query accuracy. + +--- + +## Phase 5 — Validate and build + +```bash +# Validate YAML structure and integrity +wren context validate --path /path/to/project + +# If strict mode is desired: +wren context validate --path /path/to/project --strict + +# Build JSON manifest +wren context build --path /path/to/project + +# Verify against database +wren --sql "SELECT * FROM LIMIT 1" ``` -Confirm success to the user. The MDL is now active and queries can run. +If validation fails, fix the reported issues and re-run. Common errors: +- Duplicate model/column names +- Missing primary key +- Relationship referencing non-existent model +- Invalid column type (try re-running through `parse_type`) --- -## MDL Structure - -```json -{ - "catalog": "wren", - "schema": "public", - "dataSource": "POSTGRES", - "models": [ - { - "name": "orders", - "tableReference": { - "catalog": "", - "schema": "public", - "table": "orders" - }, - "columns": [ - { - "name": "order_id", - "type": "INTEGER", - "isCalculated": false, - "notNull": true, - "isPrimaryKey": true, - "properties": {} - }, - { - "name": "customer_id", - "type": "INTEGER", - "isCalculated": false, - "notNull": false, - "properties": {} - }, - { - "name": "total", - "type": "DECIMAL", - "isCalculated": false, - "notNull": false, - "properties": {} - } - ], - "primaryKey": "order_id", - "cached": false, - "properties": {} - } - ], - "relationships": [ - { - "name": "orders_customer", - "models": ["orders", "customers"], - "joinType": "MANY_TO_ONE", - "condition": "orders.customer_id = customers.customer_id" - } - ], - "views": [] -} +## Phase 6 — Initialize 
memory + +```bash +# Index schema (generates seed NL-SQL examples automatically) +wren memory index + +# Verify +wren memory status ``` -### Column types - -Map SQL/ibis types to MDL type strings: - -| SQL / ibis type | MDL type | -|-----------------|----------| -| INT, INTEGER, INT4 | `INTEGER` | -| BIGINT, INT8 | `BIGINT` | -| SMALLINT, INT2 | `SMALLINT` | -| FLOAT, FLOAT4, REAL | `FLOAT` | -| DOUBLE, FLOAT8 | `DOUBLE` | -| DECIMAL, NUMERIC | `DECIMAL` | -| VARCHAR, TEXT, STRING | `VARCHAR` | -| CHAR | `CHAR` | -| BOOLEAN, BOOL | `BOOLEAN` | -| DATE | `DATE` | -| TIMESTAMP, DATETIME | `TIMESTAMP` | -| TIMESTAMPTZ | `TIMESTAMPTZ` | -| JSON, JSONB | `JSON` | -| ARRAY | `ARRAY` | -| BYTES, BYTEA | `BYTES` | - -When in doubt, use `VARCHAR` as a safe fallback. - -### Relationship join types - -| Cardinality | `joinType` value | -|-------------|-----------------| -| Many-to-one (FK table → PK table) | `MANY_TO_ONE` | -| One-to-many | `ONE_TO_MANY` | -| One-to-one | `ONE_TO_ONE` | -| Many-to-many | `MANY_TO_MANY` | +After this step, `wren memory fetch` and `wren memory recall` are +operational. See the **wren-usage** skill for query workflows. --- -## Connection setup +## Phase 7 — Iterate with the user + +The initial MDL is a starting point. Improve it by: +- Adding calculated columns based on business logic +- Adding views for common query patterns +- Refining descriptions based on actual query usage +- Adding access control (RLAC/CLAC) if needed + +Each change follows: edit YAML → `wren context validate` → +`wren context build` → `wren memory index`. 
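The edit → validate → build → index loop goes faster when the most common validation failures are caught before invoking the CLI. A rough pre-check for the errors listed under Phase 5 (duplicate names, missing primary key, dangling relationship references) can be sketched in plain Python — illustrative only; `wren context validate` remains the authoritative check:

```python
# Illustrative pre-check for common MDL validation failures.
# Not wren's validator -- `wren context validate` is authoritative.

def precheck_mdl(mdl: dict) -> list[str]:
    """Return human-readable problems found in an MDL manifest dict."""
    problems = []
    model_names = [m["name"] for m in mdl.get("models", [])]

    # Duplicate model names
    seen = set()
    for name in model_names:
        if name in seen:
            problems.append(f"duplicate model name: {name}")
        seen.add(name)

    # Relationships referencing non-existent models
    for rel in mdl.get("relationships", []):
        for ref in rel.get("models", []):
            if ref not in model_names:
                problems.append(
                    f"relationship {rel['name']!r} references unknown model {ref!r}"
                )

    # Models missing a primary key
    for model in mdl.get("models", []):
        if not model.get("primaryKey"):
            problems.append(f"model {model['name']!r} has no primary key")

    return problems
```

Running this before each `wren context validate` pass narrows the CLI's error output to the less mechanical problems (invalid types, join conditions).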
+ +--- + +## Quick reference + +| Task | Command / Method | +|------|-----------------| +| Discover tables | Agent's own tools (SQLAlchemy, driver, raw SQL) | +| Discover columns + types | Agent's own tools | +| Discover constraints | Agent's own tools | +| Normalize types (Python) | `from wren.type_mapping import parse_type` | +| Normalize types (CLI) | `wren utils parse-type --type T --dialect D` | +| Normalize types (batch) | `wren utils parse-types --dialect D < columns.json` | +| Scaffold project | `wren context init` | +| Write models | Create `models//metadata.yml` | +| Write relationships | Edit `relationships.yml` | +| Validate | `wren context validate` | +| Build manifest | `wren context build` | +| Test query | `wren --sql "SELECT * FROM LIMIT 1"` | +| Index memory | `wren memory index` | + +--- -Connection info is configured **exclusively** via the MCP server Web UI at `http://localhost:9001`. There is no API endpoint for setting connection info — do not attempt to configure it programmatically. See the **wren-mcp-setup** skill for Docker setup instructions. +## Things to avoid -> **Note:** If the Web UI is disabled (`WEB_UI_ENABLED=false`), connection info must be pre-configured in `~/.wren/connection_info.json` before starting the container. Use `/wren-connection-info` in Claude Code for the required fields per data source. 
+- Do not hardcode database-specific type strings in MDL — always normalize via `parse_type` +- Do not skip validation before build — invalid YAML produces broken manifests silently +- Do not guess column types — introspect from the actual database +- Do not write relationships without confirming join conditions — wrong conditions cause silent query errors +- Do not skip `wren memory index` after build — stale indexes degrade recall quality diff --git a/skills/wren-usage/SKILL.md b/skills/wren-usage/SKILL.md index 4b25573a8..bb86997cb 100644 --- a/skills/wren-usage/SKILL.md +++ b/skills/wren-usage/SKILL.md @@ -1,168 +1,246 @@ --- name: wren-usage -description: "Wren Engine — semantic SQL engine for AI agents. Query 22+ data sources (PostgreSQL, BigQuery, Snowflake, MySQL, ClickHouse, etc.) through a modeling layer (MDL). This skill is the main entry point: it guides setup, delegates to focused sub-skills for SQL authoring, MDL generation, project management, and MCP server operations. Use when: write SQL, query data, generate or update MDL, change database connection, manage YAML projects, set up or operate MCP server, or get started with Wren Engine for the first time." +description: "Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results. Use when: agent needs to query data, connect a data source, handle errors, or manage MDL changes via the wren CLI." license: Apache-2.0 metadata: author: wren-engine - version: "1.2" + version: "1.0" --- -# Wren Engine — Usage Guide +# Wren Engine CLI — Agent Workflow Guide -## Version check +The `wren` CLI queries databases through an MDL (Model Definition Language) semantic layer. You write SQL against model names, not raw tables. The engine translates to the target dialect. 
-Silently fetch `https://raw.githubusercontent.com/Canner/wren-engine/main/skills/versions.json`. -Compare the `wren-usage` key with this skill's version (from the frontmatter above). -If the remote version is newer, notify the user before proceeding: +Two files drive everything (auto-discovered from `~/.wren/`): +- `mdl.json` — the semantic model +- `connection_info.json` — database credentials + `datasource` field (e.g. `"datasource": "postgres"`) -> A newer version of the **wren-usage** skill is available. -> Update with: -> ``` -> npx skills add Canner/wren-engine --skill wren-usage --agent claude-code -> ``` +The data source is always read from `connection_info.json`. There is no `--datasource` flag on execution commands (`query`, `dry-run`, `validate`). Only `dry-plan` accepts `--datasource` / `-d` as an override (for transpile-only use without a connection file). -Then continue with the workflow below regardless of update status. +For memory-specific decisions, see [references/memory.md](references/memory.md). +For SQL syntax, CTE-based modeling, and error diagnosis, see [references/wren-sql.md](references/wren-sql.md). --- -This skill is your day-to-day reference for working with Wren Engine. It delegates to focused sub-skills for each task. +## Workflow 1: Answering a data question ---- +### Step 1 — Gather context -## Step 0 — Install dependent skills (first time only) +| Situation | Command | +|-----------|---------| +| Default | `wren memory fetch -q ""` | +| Need specific model's columns | `wren memory fetch -q "..." --model --threshold 0` | +| Memory not installed | Read `target/mdl.json` in the project directory directly | -Check whether the required skills are already installed in `~/.claude/skills/`. 
If any are missing, tell the user to run: +If this is the first query in the conversation, also run: -```bash -# Option A — npx skills (works with Claude Code, Cursor, and 30+ agents) -npx skills add Canner/wren-engine --skill '*' --agent claude-code + wren context instructions -# Option B — Clawhub (if installed via clawhub) -clawhub install wren-usage +If it returns content, treat it as **rules that override defaults** — apply them to all subsequent queries in this session. + +### Step 2 — Recall past queries + +```bash +wren memory recall -q "" --limit 3 ``` -This installs `wren-usage` and its dependent skills (`wren-connection-info`, `wren-generate-mdl`, `wren-project`, `wren-sql`, `wren-mcp-setup`, `wren-http-api`) into `~/.claude/skills/`. +Use results as few-shot examples. Skip if empty. -After installation, the user must **start a new session** for the new skills to be loaded. +### Step 2.5 — Assess complexity (before writing SQL) -> If the user only wants the MCP server set up (no Docker yet), use `/wren-quickstart` for a guided end-to-end walkthrough instead. +If the question involves **any** of the following, consider decomposing: +- Multiple metrics or aggregations (e.g., "churn rate AND expansion revenue") +- Multi-step calculations (e.g., "month-over-month growth rate") +- Comparisons across segments (e.g., "by plan tier, by region") +- Time-series analysis requiring baseline + change (e.g., "retention curve") ---- +**Decomposition strategy:** +1. Identify the sub-questions (e.g., "total subscribers at start" + "subscribers who cancelled" → churn rate) +2. For each sub-question: + - `wren memory recall -q ""` — check if a similar pattern exists + - Write and execute a simple SQL + - Note the result +3. Combine sub-results to answer the original question -## What do you want to do? 
+**When NOT to decompose:** +- Single-table aggregation with GROUP BY — just write the SQL +- Simple JOINs that the MDL relationships already define +- Questions where `memory recall` returns a near-exact match -Identify the user's intent and delegate to the appropriate skill: +This is a judgment call, not a rigid rule. If you're confident in a single +query, go ahead. Decompose when the SQL would be hard to debug if it fails. -| Task | Skill | -|------|-------| -| Write or debug a SQL query | `@wren-sql` | -| Connect to a new database / change credentials | `@wren-connection-info` | -| Generate MDL from an existing database | `@wren-generate-mdl` | -| Save MDL to YAML files (version control) | `@wren-project` | -| Load a saved YAML project / rebuild `target/mdl.json` | `@wren-project` | -| Add a new model or column to the MDL | `@wren-project` | -| Start, reset, or reconfigure the MCP server | `@wren-mcp-setup` | -| Call Wren tools via HTTP JSON-RPC (no MCP SDK) | `@wren-http-api` | -| First-time setup from scratch | `@wren-quickstart` | +### Step 3 — Write, verify, and execute SQL ---- +**For simple queries** (single table or simple MDL-defined JOINs, straightforward aggregation): +Execute directly: +```bash +wren --sql 'SELECT c_name, SUM(o_totalprice) FROM orders +JOIN customer ON orders.o_custkey = customer.c_custkey +GROUP BY 1 ORDER BY 2 DESC LIMIT 5' +``` -## Common workflows +**For complex queries** (non-trivial JOINs not covered by MDL relationships, subqueries, multi-step logic): +Verify first with dry-plan: +```bash +wren dry-plan --sql 'SELECT ...' +``` + +Check the expanded SQL output: +- Are the correct models and columns referenced? +- Do the JOINs match expected relationships? +- Are CTEs expanded correctly? + +If the expanded SQL looks wrong, fix before executing. +If it looks correct, proceed: +```bash +wren --sql 'SELECT ...' 
+``` -### Query your data +**SQL rules:** +- Target MDL model names, not database tables +- Write dialect-neutral SQL — the engine translates -Invoke `@wren-sql` to write a SQL query against the deployed MDL. +### Step 4 — Store and continue -Key rules: -- Query MDL model names directly (e.g. `SELECT * FROM orders`) -- Use `CAST` for type conversions, not `::` syntax -- Avoid correlated subqueries — use JOINs or CTEs instead +After successful execution, **store the query by default**: -```sql --- Example: revenue by month -SELECT DATE_TRUNC('month', order_date) AS month, - SUM(total) AS revenue -FROM orders -GROUP BY 1 -ORDER BY 1 +```bash +wren memory store --nl "" --sql "" ``` -For type-specific patterns (ARRAY, STRUCT, JSON), date/time arithmetic, or BigQuery dialect quirks, invoke `@wren-sql` for full guidance. +**Skip storing only when:** +- The query failed or returned an error +- The user said the result is wrong +- The query is exploratory (`SELECT * ... LIMIT N` without analytical clauses) +- There is no natural language question — just raw SQL +- The user explicitly asked not to store + +The CLI auto-detects exploratory queries — if you see no store hint +after execution, the query was classified as exploratory. + +| Outcome | Action | +|---------|--------| +| User confirms correct | Store | +| User continues with follow-up | Store, then handle follow-up | +| User says nothing (but question had clear NL description) | Store | +| User says wrong | Do NOT store — fix the SQL | +| Query error | See Error recovery below | --- -### Update connection credentials +## Workflow 2: Error recovery -To change credentials, direct the user to the MCP server Web UI at `http://localhost:9001`. Connection info can only be configured through the Web UI — do not attempt to set it programmatically. +### "table not found" -Invoke `@wren-connection-info` for a reference of required fields per data source (so you can guide the user on what to enter in the Web UI). +1. 
Verify model name: `wren memory fetch -q "" --type model --threshold 0` +2. Check MDL exists: `ls ~/.wren/mdl.json` +3. Verify column: `wren memory fetch -q "" --model --threshold 0` ---- +### Connection error -### Extend the MDL +1. Check: `cat ~/.wren/connection_info.json` +2. Verify the `datasource` field is present and valid +3. Test: `wren --sql "SELECT 1"` +4. Valid datasource values: `postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`, `duckdb` +5. Both flat format (`{"datasource": ..., "host": ...}`) and MCP envelope format (`{"datasource": ..., "properties": {...}}`) are accepted -To add a model, column, relationship, or view to an existing project: +### SQL syntax / planning error (enhanced) -1. Invoke `@wren-project` — **Load** the existing YAML project into an MDL dict -2. Edit the relevant YAML file (e.g. `models/orders.yml`) -3. Invoke `@wren-project` — **Build** to compile updated `target/mdl.json` -4. Call `deploy(mdl_file_path="./target/mdl.json")` to apply the change +#### Layer 1: Identify the failure point ---- +```bash +wren dry-plan --sql "" +``` -### Regenerate MDL from database +| dry-plan result | Failure layer | Next step | +|-----------------|---------------|-----------| +| dry-plan fails | MDL / semantic | → Layer 2A | +| dry-plan succeeds, execution fails | DB / dialect | → Layer 2B | -When the database schema has changed and the MDL needs to be refreshed: +#### Layer 2A: MDL-level diagnosis (dry-plan failed) -1. Invoke `@wren-connection-info` — confirm or update credentials -2. Invoke `@wren-generate-mdl` — re-introspect the database and rebuild the MDL JSON -3. Invoke `@wren-project` — **Save** the new MDL as an updated YAML project -4. Invoke `@wren-project` — **Build** to compile `target/mdl.json` -5. 
Deploy +The dry-plan error message tells you exactly what's wrong: ---- +| Error pattern | Diagnosis | Fix | +|---------------|-----------|-----| +| `column 'X' not found in model 'Y'` | Wrong column name | `wren memory fetch -q "X" --model Y --threshold 0` to find correct name | +| `model 'X' not found` | Wrong model name | `wren memory fetch -q "X" --type model --threshold 0` | +| `ambiguous column 'X'` | Column exists in multiple models | Qualify with model name: `ModelName.column` | +| Planning error with JOIN | Relationship not defined in MDL | Check available relationships in context | -### MCP server operations +**Key principle**: Fix ONE issue at a time. Re-run dry-plan after each fix +to see if new errors surface. -| Operation | Command | -|-----------|---------| -| Check status | `docker ps --filter name=wren-mcp` | -| View logs | `docker logs wren-mcp` | -| Restart | `docker restart wren-mcp` | -| Full reconfigure | Invoke `@wren-mcp-setup` | -| Verify health | `health_check()` via MCP tools | +#### Layer 2B: DB-level diagnosis (dry-plan OK, execution failed) + +The DB error + dry-plan output together pinpoint the issue: + +1. Read the dry-plan expanded SQL — this is what actually runs on the DB +2. Compare with the DB error message: + +| Error pattern | Diagnosis | Fix | +|---------------|-----------|-----| +| Type mismatch | Column type differs from assumed | Check column type in context, add explicit CAST | +| Function not supported | Dialect-specific function | Use dialect-neutral alternative | +| Permission denied | Table/schema access | Check connection credentials | +| Timeout | Query too expensive | Simplify: reduce JOINs, add filters, LIMIT | + +**For small models**: If the error message is unclear, try simplifying +the query to the smallest failing fragment. Execute subqueries independently +to isolate which part fails. + +For the CTE rewrite pipeline and additional error patterns, see [references/wren-sql.md](references/wren-sql.md). 
--- -## Quick reference — MCP tools +## Workflow 3: Connecting a new data source -| Tool | Purpose | -|------|---------| -| `health_check()` | Verify Wren Engine is reachable | -| `query(sql=...)` | Execute a SQL query against the deployed MDL | -| `deploy(mdl_file_path=...)` | Load a compiled `mdl.json` | -| `setup_connection(...)` | Configure data source credentials | -| `list_remote_tables(...)` | Introspect database schema | -| `mdl_validate_manifest(...)` | Validate an MDL JSON dict | -| `mdl_save_project(...)` | Save MDL as a YAML project | +1. Create `~/.wren/connection_info.json` — see [wren/docs/connections.md](../../wren/docs/connections.md) for per-connector formats +2. Test: `wren --sql "SELECT 1"` +3. Place or create `~/.wren/mdl.json` +4. Index: `wren memory index` +5. Verify: `wren --sql "SELECT * FROM LIMIT 5"` --- -## Troubleshooting quick guide +## Workflow 4: After MDL changes + +When the MDL is updated, downstream state goes stale: -**Query fails with "table not found":** -- The MDL may not be deployed. Run `deploy(mdl_file_path="./target/mdl.json")`. -- Check model names match exactly (case-sensitive). +```bash +# 1. Deploy updated MDL +cp updated-mdl.json ~/.wren/mdl.json + +# 2. Re-index schema memory +wren memory index + +# 3. Verify +wren --sql "SELECT * FROM LIMIT 1" +``` + +--- -**Connection error on queries:** -- Verify credentials with `@wren-connection-info`. -- Inside Docker: use `host.docker.internal` instead of `localhost`. +## Command decision tree -**MDL changes not reflected:** -- Re-run `@wren-project` **Build** step and re-deploy. +``` +Get data back → wren --sql "..." +See translated SQL only → wren dry-plan --sql "..." (accepts -d if no connection file) +Validate against DB → wren dry-run --sql "..." +Schema context → wren memory fetch -q "..." +Filter by type/model → wren memory fetch -q "..." --type T --model M --threshold 0 +Store confirmed query → wren memory store --nl "..." --sql "..." 
+Few-shot examples → wren memory recall -q "..." +Index stats → wren memory status +Re-index after MDL change → wren memory index +``` + +--- -**MCP tools unavailable:** -- Start a new Claude Code session after registering the MCP server. -- Check: `docker ps --filter name=wren-mcp` and `docker logs wren-mcp`. +## Things to avoid -For detailed MCP setup troubleshooting, invoke `@wren-mcp-setup`. +- Do not guess model or column names — check context first +- Do not store failed queries or queries the user said are wrong +- Do not skip storing successful queries with a clear NL question — default is to store +- Do not re-index before every query — once per MDL change +- Do not pass passwords via `--connection-info` if shell history is shared — use `--connection-file` diff --git a/cli-skills/wren-usage/references/memory.md b/skills/wren-usage/references/memory.md similarity index 100% rename from cli-skills/wren-usage/references/memory.md rename to skills/wren-usage/references/memory.md diff --git a/cli-skills/wren-usage/references/wren-sql.md b/skills/wren-usage/references/wren-sql.md similarity index 100% rename from cli-skills/wren-usage/references/wren-sql.md rename to skills/wren-usage/references/wren-sql.md diff --git a/wren/docs/wren_project.md b/wren/docs/wren_project.md deleted file mode 100644 index 06a5ecae9..000000000 --- a/wren/docs/wren_project.md +++ /dev/null @@ -1,272 +0,0 @@ -# Wren Project - -A Wren project is a directory of YAML files that define a semantic layer (models, relationships, views, and instructions) over a database. It is the unit of authoring, version control, and deployment for MDL (Model Definition Language) definitions. - -Instead of managing a single `mdl.json` by hand, you author each model in its own directory as human-readable YAML. The CLI compiles them into a deployable JSON manifest when needed. - -YAML files use **snake_case** field names for readability. 
The compiled `target/mdl.json` uses **camelCase**, which is the wire format expected by the engine. - -## Project Structure - -```text -my_project/ -├── wren_project.yml # project metadata -├── models/ -│ ├── orders/ -│ │ └── metadata.yml # table_reference mode (physical table) -│ ├── customers/ -│ │ └── metadata.yml -│ └── revenue_summary/ -│ ├── metadata.yml # ref_sql mode (SQL-defined model) -│ └── ref_sql.sql # SQL in separate file (optional) -├── views/ -│ ├── monthly_revenue/ -│ │ ├── metadata.yml -│ │ └── sql.yml # statement in separate file (optional) -│ └── top_customers/ -│ └── metadata.yml # statement inline -├── relationships.yml # all relationships -├── instructions.md # user instructions for LLM (optional) -├── .wren/ # runtime state (gitignored) -│ └── memory/ # LanceDB index files -└── target/ - └── mdl.json # build output (gitignored) -``` - -Each model and view lives in its own subdirectory under `models/` and `views/` respectively. - ---- - -## What Lives Where - -A Wren project keeps schema artifacts together in the project directory. Global configuration lives separately in `~/.wren/`. - -| Artifact | Location | Scope | -|----------|----------|-------| -| Models, views, relationships | `/models/`, `/views/`, `/relationships.yml` | Project — version controlled | -| Instructions | `/instructions.md` | Project — references this project's model/column names | -| Compiled MDL | `/target/mdl.json` | Project — derived from YAML, gitignored | -| Memory (LanceDB) | `/.wren/memory/` | Project — indexes this project's schema, gitignored | -| Profiles (connections) | `~/.wren/profiles.yml` | Global — environment-specific (dev/prod credentials) | -| Global config | `~/.wren/config.yml` | Global — CLI preferences | - -**Why this separation?** Schema definitions are project-specific — they describe a particular data model. Connection credentials are environment-specific — the same project connects to different databases in dev vs. prod. 
Keeping them separate means projects are portable and safe to commit without leaking secrets. - ---- - -## Project Discovery - -When you run a wren command that needs the project (query, memory fetch, etc.), the CLI resolves the project root in this order: - -1. `--project` flag (explicit) -2. `WREN_PROJECT_HOME` environment variable -3. Walk up from the current directory looking for `wren_project.yml` -4. `default_project` in `~/.wren/config.yml` - -If no project is found, the CLI exits with an error and suggests running `wren context init` or setting `WREN_PROJECT_HOME`. - -Once the project root is resolved, all paths (MDL, instructions, memory) are determined relative to it. - -For running wren commands outside the project directory: - -```bash -# option A: environment variable -export WREN_PROJECT_HOME=~/projects/sales -wren --sql "SELECT ..." - -# option B: global config (~/.wren/config.yml) -default_project: ~/projects/sales -``` - ---- - -## Project Files - -### `wren_project.yml` - -```yaml -schema_version: 2 -name: my_project -version: "1.0" -catalog: wren -schema: public -data_source: postgres -``` - -| Field | Description | -|-------|-------------| -| `schema_version` | Directory layout version. `2` = folder-per-entity (current). Owned by the CLI — do not bump manually. | -| `name` | Project name | -| `version` | User's own project version (free-form, no effect on parsing) | -| `catalog` | **Wren Engine namespace** — NOT your database catalog. Identifies this MDL project within the engine. Default: `wren`. | -| `schema` | **Wren Engine namespace** — NOT your database schema. Default: `public`. | -| `data_source` | Data source type (e.g. `postgres`, `bigquery`, `snowflake`) | - -> **`catalog` / `schema` are NOT database settings.** -> -> These two fields define the Wren Engine's internal namespace for addressing models in SQL. They exist to support future multi-project querying. 
For single-project use, keep the defaults (`catalog: wren`, `schema: public`). -> -> Your database's actual catalog and schema are specified per-model in the `table_reference` section of each model's `metadata.yml`. - -#### Two levels of catalog/schema - -The same field names appear in two places with completely different meanings: - -| Location | Refers to | Example | When to change | -|----------|-----------|---------|----------------| -| `wren_project.yml` → `catalog`, `schema` | Wren Engine namespace | `wren`, `public` | Only for multi-project setups | -| `models/*/metadata.yml` → `table_reference.catalog`, `table_reference.schema` | Database location | `""`, `main` | Must match your actual database | - -### Model (`models//metadata.yml`) - -A model must define its source in exactly one of two ways: - -**table_reference** — maps to a physical table: - -```yaml -name: orders -table_reference: - catalog: "" - schema: public - table: orders -columns: - - name: order_id - type: INTEGER - is_calculated: false - not_null: true - is_primary_key: true - properties: {} - - name: total - type: DECIMAL - is_calculated: false - not_null: false - properties: {} -primary_key: order_id -cached: false -properties: {} -``` - -**ref_sql** — defines the model via a SQL query. SQL can be inline in `metadata.yml` or in a separate `ref_sql.sql` file (the `.sql` file takes precedence if both exist): - -```yaml -name: revenue_summary -columns: - - name: month - type: DATE - is_calculated: false - not_null: true - properties: {} - - name: total_revenue - type: DECIMAL - is_calculated: false - not_null: false - properties: {} -``` - -```sql --- models/revenue_summary/ref_sql.sql -SELECT DATE_TRUNC('month', order_date) AS month, - SUM(total) AS total_revenue -FROM orders -GROUP BY 1 -``` - -Using both `table_reference` and `ref_sql` in the same model is a validation error. - -### View (`views//metadata.yml`) - -Views have a `statement` field. 
Like ref_sql models, the SQL can be inline in `metadata.yml` or in a separate `sql.yml` file (the `sql.yml` takes precedence if both exist): - -```yaml -name: top_customers -statement: > - SELECT customer_id, SUM(total) AS lifetime_value - FROM wren.public.orders GROUP BY 1 ORDER BY 2 DESC LIMIT 100 -description: "Top customers by lifetime value" -properties: {} -``` - -### `relationships.yml` - -```yaml -relationships: - - name: orders_customers - models: - - orders - - customers - join_type: MANY_TO_ONE - condition: orders.customer_id = customers.customer_id -``` - -### `instructions.md` - -Free-form Markdown with rules and guidelines for LLM-based query generation. Organize by topic with `##` headings: - -```markdown -## Business rules -- Revenue queries must use net_revenue, not gross_revenue -- All queries must filter status = 'completed' - -## Formatting -- Currency is TWD, display with thousand separators -- Timestamps are UTC+8 -``` - -Instructions are consumed by agents, not by the engine. They are intentionally excluded from `target/mdl.json` — the wren-core rewrite pipeline has no use for them. Agents access instructions through two paths: - -- `wren context instructions` — returns full text, run once at session start to capture global constraints -- `wren memory fetch -q "..."` — returns relevant instruction chunks alongside schema context per query - ---- - -## Lifecycle - -```text -wren context init → scaffold project in current directory - (edit models/, relationships.yml, instructions.md) -wren context validate → check YAML structure (no DB needed) -wren context build → compile to target/mdl.json -wren profile add my-pg ... → save connection to ~/.wren/profiles.yml -wren memory index → index schema + instructions into .wren/memory/ -wren --sql "SELECT 1" → verify connection -wren --sql "SELECT ..." 
→ start querying -``` - -After editing models, rebuild and re-index: - -```text -wren context build -wren memory index -``` - ---- - -## Field Mapping - -The `build` step converts all YAML keys from snake_case to camelCase: - -| YAML | JSON | -|------|------| -| `table_reference` | `tableReference` | -| `ref_sql` | `refSql` | -| `is_calculated` | `isCalculated` | -| `not_null` | `notNull` | -| `is_primary_key` | `isPrimaryKey` | -| `primary_key` | `primaryKey` | -| `join_type` | `joinType` | -| `data_source` | `dataSource` | - -Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `properties`) are identical in both formats. - ---- - -## .gitignore - -```text -target/ -.wren/ -``` - -Source YAML and `instructions.md` are committed. Build output (`target/`) is always gitignored — it is derived from source YAML and can be regenerated with `wren context build`. - -`.wren/memory/` contains both schema indexes (derived, rebuildable) and query history (NL-SQL pairs confirmed by users, not rebuildable). If your team wants to share confirmed query history as few-shot examples across members, you can commit `.wren/memory/` — but be aware that LanceDB files are binary and may produce merge conflicts when multiple people index or store concurrently.
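The generic snake_case → camelCase rule stated in the field-mapping section above is small enough to show directly — a sketch of the rule as written ("split on `_`, capitalize each word after the first, join"), not the CLI's actual build-step code:

```python
# "Split on `_`, capitalize each word after the first, join."
# Sketch of the stated rule -- not wren's actual converter.

def snake_to_camel(name: str) -> str:
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)

# e.g. snake_to_camel("table_reference") -> "tableReference"
```

Fields without underscores (`name`, `type`, `catalog`, ...) pass through unchanged, which matches the table's note that such fields are identical in both formats.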