Merged
61 changes: 61 additions & 0 deletions .github/workflows/sync-docs.yml
@@ -0,0 +1,61 @@
name: Sync Docs to Website

on:
  push:
    branches: [main]
    paths:
      - 'docs/get_started/**'
      - 'docs/concept/**'
      - 'docs/guide/**'
      - 'docs/reference/**'

permissions:
  contents: read

jobs:
  sync-docs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout wren-engine
        uses: actions/checkout@v4

      - name: Checkout doc website
        uses: actions/checkout@v4
        with:
          repository: ${{ vars.DOCS_REPO }}
          token: ${{ secrets.CROSS_REPO_TOKEN }}
          path: _docs-site
          ref: ${{ vars.DOCS_REPO_BRANCH }}

      - name: Sync doc directories
        run: |
          TARGET="_docs-site/docs/oss/engine"

          for dir in get_started concept guide reference; do
            rm -rf "${TARGET}/${dir}"
            cp -r "docs/${dir}" "${TARGET}/${dir}"
          done

      - name: Check for changes
        id: diff
        working-directory: _docs-site
        run: |
          if [ -z "$(git status --porcelain)" ]; then echo "changed=false"; else echo "changed=true"; fi >> "$GITHUB_OUTPUT"

      - name: Create PR
        if: steps.diff.outputs.changed == 'true'
        working-directory: _docs-site
        env:
          GH_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}
        run: |
          BRANCH="sync/engine-docs-${GITHUB_SHA::8}"
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git checkout -b "${BRANCH}"
          git add -A
          git commit -m "docs: sync from wren-engine@${GITHUB_SHA::8}"
          git push origin "${BRANCH}"
          gh pr create \
            --title "docs: sync Wren Engine docs from wren-engine" \
            --body "Auto-synced from [wren-engine@\`${GITHUB_SHA::8}\`](https://github.com/Canner/wren-engine/commit/${GITHUB_SHA})." \
            --base "${{ vars.DOCS_REPO_BRANCH }}"
15 changes: 15 additions & 0 deletions docs/.sync.yml
@@ -0,0 +1,15 @@
# Declarative sync config: wren-engine/docs → doc website
# Actual repo name and branch are stored in GitHub repository variables
# (vars.DOCS_REPO, vars.DOCS_REPO_BRANCH), not hardcoded here.
target_dir: docs/oss/engine

# Directories synced (recursive copy, destructive: deletions propagate)
sync_dirs:
- get_started
- concept
- guide
- reference

# Not synced (excluded by being outside sync_dirs):
# - README.md
# - .sync.yml
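
The destructive, per-directory copy that this config describes can be sketched in Python. This only illustrates the semantics (remove the target directory, then copy it fresh so upstream deletions propagate). The function name `sync_dirs` and the in-memory `CONFIG` dict are hypothetical, and the workflow in this PR performs the copy in shell with `rm -rf` and `cp -r`.

```python
import shutil
from pathlib import Path

# Hypothetical in-memory mirror of docs/.sync.yml (the real file is YAML).
CONFIG = {
    "target_dir": "docs/oss/engine",
    "sync_dirs": ["get_started", "concept", "guide", "reference"],
}


def sync_dirs(source_root: Path, site_root: Path, config: dict = CONFIG) -> list[Path]:
    """Replace each configured directory in the site checkout with a fresh copy."""
    target_root = site_root / config["target_dir"]
    synced = []
    for name in config["sync_dirs"]:
        src = source_root / name
        dst = target_root / name
        # Remove the old copy first so files deleted upstream disappear too.
        shutil.rmtree(dst, ignore_errors=True)
        shutil.copytree(src, dst)
        synced.append(dst)
    return synced
```

Anything outside `sync_dirs`, such as `README.md`, is never touched because only the listed directories are visited.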
39 changes: 30 additions & 9 deletions docs/README.md
@@ -1,17 +1,38 @@
# Wren Engine Documentation

Wren Engine is an open-source semantic engine for AI agents and MCP clients. It translates SQL queries through MDL (Model Definition Language) and executes them against 22+ data sources.
This directory is the **single source of truth** for Wren Engine docs published at [docs.getwren.ai](https://docs.getwren.ai/oss/engine).

## Getting Started
Changes merged to `main` are automatically synced to the doc website via GitHub Actions.

- [Quick Start](quickstart.md) -- Set up a local semantic layer with the jaffle_shop dataset using the Wren CLI and Claude Code. (~15 minutes)
## Get Started

## Core Concepts
- [Installation](get_started/installation.md)
- [Quick Start](get_started/quickstart.md)
- [Connect Your Database](get_started/connect.md)

- [Wren Project](wren_project.md) -- Project structure, YAML authoring, and how the CLI compiles models into a deployable manifest.
## Concepts

### MDL Reference
- [What is Context?](concept/what_is_context.md)
- [What is MDL?](concept/what_is_mdl.md)
- [Benefits for LLMs](concept/benefits_llm.md)
- [Architecture](concept/architecture.md)

- [Model](mdl/model.md) -- Define semantic entities over physical tables or SQL expressions.
- [Relationship](mdl/relationship.md) -- Declare join paths between models for automatic resolution.
- [View](mdl/view.md) -- Named SQL queries that behave as virtual tables.
## Guides

- [Data Modeling Overview](guide/modeling/overview.md)
- [Wren Project Structure](guide/modeling/wren_project.md)
- [Models](guide/modeling/model.md)
- [Relations](guide/modeling/relation.md)
- [Views](guide/modeling/view.md)
- [Memory](guide/memory.md)
- [Profiles](guide/profiles.md)

## Reference

- [CLI Reference](reference/cli.md)
- [Skills](reference/skills.md)

## Not synced

- `README.md`: this file
- `.sync.yml`: sync configuration
217 changes: 217 additions & 0 deletions docs/concept/architecture.md
@@ -0,0 +1,217 @@
# Architecture

Wren Engine CLI is a modular Python application that transforms semantic SQL through an MDL layer before executing it against your database. This page explains how the components fit together.

## Overview

```text
┌──────────────────────────────────────────────────────────┐
│                     Wren CLI (Typer)                     │
│                                                          │
│  --sql / query    dry-plan    dry-run    version         │
│  context          profile     memory     utils           │
└──┬──────────────┬──────────────┬──────────────┬──────────┘
   │              │              │              │
   ▼              ▼              │              ▼
┌────────────┐ ┌────────────┐    │    ┌────────────────────┐
│  Profile   │ │  Context   │    │    │  Memory Layer      │
│  Mgmt      │ │  Mgmt      │    │    │  (LanceDB)         │
│            │ │            │    │    │                    │
│  ~/.wren/  │ │  init      │    │    │  schema_items      │
│  profiles  │ │  validate  │    │    │  query_history     │
│  .yml      │ │  build     │    │    │                    │
└─────┬──────┘ └─────┬──────┘    │    │  fetch / recall    │
      │              │           │    │  store / index     │
      │ connection   │ mdl.json  │    └────────────────────┘
      │ info         │           │
      └──────┐ ┌─────┘           │
             ▼ ▼                 │
       ┌──────────────┐          │
       │  WrenEngine  │◄─────────┘  (dry-plan, query, dry-run)
       │              │
       │  plan()      │
       │  execute()   │
       └──┬───────┬───┘
          │       │
     plan │       │ execute
          │       │
          ▼       ▼
┌──────────────┐ ┌──────────────────┐
│ SQL Planning │ │   Connectors     │
│              │ │                  │
│  sqlglot     │ │                  │
│   parse      │ │ Postgres DuckDB  │
│   qualify    │ │ BigQuery MySQL   │
│   transpile  │ │ Snowflake Trino  │
│              │ │ ...18+ sources   │
│ CTE Rewriter │ │                  │
│  inject CTEs │ └──────────────────┘
│              │
│ Policy check │
└──────┬───────┘
       │
       ▼
┌──────────────────┐
│  wren-core-py    │
│  (Rust / PyO3)   │
│                  │
│  SessionContext  │
│ ManifestExtractor│
│  transform_sql() │
└──────────────────┘
```

## Components

### CLI layer

The top-level command router, built on [Typer](https://typer.tiangolo.com/). It parses flags, discovers the MDL project and active profile, then delegates to WrenEngine or the appropriate subsystem.

| Command | What it does |
|---------|-------------|
| `wren --sql` / `wren query` | Plan + execute SQL, return results |
| `wren dry-plan` | Plan only: show the expanded SQL without executing it |
| `wren dry-run` | Validate SQL against the live database without returning rows |
| `wren context` | Project management: init, validate, build, show |
| `wren profile` | Connection management: add, switch, list, debug, rm |
| `wren memory` | Schema indexing and NL-SQL recall |
| `wren utils` | Type normalization utilities |

### WrenEngine

The central orchestrator (`engine.py`). It owns the plan-then-execute pipeline:

1. Receive user SQL
2. Call the SQL planning subsystem to expand MDL references
3. Pass the planned SQL to a connector for execution
4. Return results as a PyArrow table
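
The four steps above can be sketched as a small orchestrator. This is a minimal illustration rather than the code in `engine.py`: `SketchEngine`, `toy_planner`, and `toy_connector` are hypothetical names, and a list of dicts stands in for the PyArrow table so the sketch needs no third-party packages.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list[dict]  # stand-in for the PyArrow table the real engine returns


@dataclass
class SketchEngine:
    """Hypothetical orchestrator: plan first, then hand the SQL to a connector."""

    planner: Callable[[str], str]      # expands MDL references into database SQL
    connector: Callable[[str], Rows]   # runs SQL against the data source

    def plan(self, sql: str) -> str:
        return self.planner(sql)

    def execute(self, sql: str) -> Rows:
        planned = self.plan(sql)        # steps 1-2: receive the SQL, expand it
        return self.connector(planned)  # steps 3-4: run it, return the rows


# Toy planner and connector so the sketch runs without a database.
def toy_planner(sql: str) -> str:
    return 'WITH "orders" AS (SELECT * FROM "public"."orders") ' + sql


def toy_connector(sql: str) -> Rows:
    return [{"planned_sql": sql}]


engine = SketchEngine(planner=toy_planner, connector=toy_connector)
```

Keeping `plan()` separate from `execute()` is what lets a command like `wren dry-plan` show the expanded SQL without touching the database.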

### SQL planning

Transforms user SQL from semantic model references to executable database SQL. Three components collaborate:

- **sqlglot**: parses SQL, qualifies table/column references, and transpiles between dialects
- **CTE Rewriter**: identifies which MDL models are referenced, builds a CTE for each, and injects them into the query
- **wren-core-py**: Rust engine (via PyO3 bindings) that expands model definitions, resolves calculated fields, and handles relationship joins

The planning pipeline:

```
User SQL (e.g. SELECT * FROM orders WHERE status = 'pending')
    │
    ├── sqlglot: parse → qualify tables → normalize identifiers
    ├── Extract referenced table names → ["orders"]
    ├── ManifestExtractor: filter MDL to only referenced models
    ├── Policy check (strict mode, denied functions)
    ├── CTE Rewriter:
    │   ├── For each model: wren-core transform_sql() → expanded CTE
    │   └── Inject CTEs into original query
    └── sqlglot: transpile to target dialect (postgres, bigquery, etc.)
    │
    ▼
WITH "orders" AS (
    SELECT o_orderkey, o_custkey, o_totalprice
    FROM "public"."orders"
)
SELECT * FROM "orders" WHERE status = 'pending'
```
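
The "extract referenced tables, then inject CTEs" part of this pipeline can be imitated with a toy rewriter. This is a deliberately naive sketch: it finds table names with a regular expression instead of sqlglot's parser, and the `MODEL_SQL` mapping is a hypothetical stand-in for the per-model SQL that wren-core's `transform_sql()` would produce.

```python
import re

# Hypothetical mapping from model name to its expanded SQL. In the real
# pipeline this expansion comes from wren-core's transform_sql().
MODEL_SQL = {
    "orders": 'SELECT o_orderkey, o_custkey, o_totalprice FROM "public"."orders"',
    "customers": 'SELECT c_custkey, c_name FROM "public"."customer"',
}

# Naive table extraction; the real planner uses sqlglot's parser instead.
TABLE_REF = re.compile(r'\b(?:FROM|JOIN)\s+"?(\w+)"?', re.IGNORECASE)


def referenced_models(sql: str) -> list[str]:
    """Return the known models named after FROM/JOIN, in first-seen order."""
    seen = []
    for name in TABLE_REF.findall(sql):
        if name in MODEL_SQL and name not in seen:
            seen.append(name)
    return seen


def inject_ctes(sql: str) -> str:
    """Prefix the query with one CTE per referenced model."""
    models = referenced_models(sql)
    if not models:
        return sql
    ctes = ", ".join(f'"{m}" AS ({MODEL_SQL[m]})' for m in models)
    return f"WITH {ctes} {sql}"
```

Because the user's query text is left untouched and only a `WITH` prefix is added, the original table names now resolve to the model definitions rather than to physical tables.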

### Connectors

Data source connectors execute the planned SQL against the actual database. Each connector implements a common interface for query execution, dry-run validation, and connection lifecycle.

Supported data sources: PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, ClickHouse, Trino, SQL Server, Databricks, Redshift, Oracle, Athena, Apache Spark, and more.

Each connector:
- Receives dialect-specific SQL from the planning stage
- Executes against the target database
- Handles type coercion (Decimal, UUID, etc.)
- Returns a PyArrow table
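
A common interface of this kind might look like the sketch below. The method names `query`, `dry_run`, and `close` are assumptions chosen to mirror the three responsibilities listed above, not the project's actual signatures. The toy `SQLiteConnector` uses the standard-library `sqlite3` module purely so the example runs anywhere, and it returns tuples rather than a PyArrow table.

```python
import sqlite3
from typing import Protocol


class Connector(Protocol):
    """Hypothetical shape of the common connector interface."""

    def query(self, sql: str) -> list[tuple]: ...
    def dry_run(self, sql: str) -> None: ...
    def close(self) -> None: ...


class SQLiteConnector:
    """Toy connector backed by the standard-library sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def query(self, sql: str) -> list[tuple]:
        # The real connectors return a PyArrow table; tuples keep this stdlib-only.
        return self._conn.execute(sql).fetchall()

    def dry_run(self, sql: str) -> None:
        # EXPLAIN makes SQLite compile the statement, so invalid SQL raises
        # sqlite3.Error while the query's own rows are never fetched.
        self._conn.execute(f"EXPLAIN {sql}").fetchall()

    def close(self) -> None:
        self._conn.close()
```

With a structural `Protocol`, the engine can depend on the interface alone, and each data source only has to provide the three methods.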

### Profile management

Stores named database connections in `~/.wren/profiles.yml`. One profile is active at a time. All `wren` commands use the active profile unless overridden with explicit flags.

See [Profiles](../guide/profiles.md) for details.

### Context management

Manages the MDL project lifecycle: YAML authoring, validation, and compilation to `target/mdl.json`.

Key operations:
- `wren context init`: scaffold a new project (or import from an existing `mdl.json`)
- `wren context validate`: check YAML structure without a database
- `wren context build`: compile snake_case YAML to camelCase JSON
- `wren context show`: display the current project summary

See [Wren Project](../guide/modeling/wren_project.md) for the project format.
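
The snake_case to camelCase step of `wren context build` can be pictured as a recursive key rename. The sketch below assumes the conversion touches only dictionary keys and leaves values alone. The helper names `to_camel` and `camelize` are hypothetical, and the real build may apply further rules.

```python
def to_camel(key: str) -> str:
    """Convert a snake_case key such as table_reference to tableReference."""
    head, *rest = key.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camelize(node):
    """Recursively rename dict keys; values and list order are left untouched."""
    if isinstance(node, dict):
        return {to_camel(k): camelize(v) for k, v in node.items()}
    if isinstance(node, list):
        return [camelize(item) for item in node]
    return node
```

For instance, a model authored with `table_reference` and `ref_sql` keys would come out with `tableReference` and `refSql` in the compiled JSON.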

### Memory layer

A LanceDB-backed semantic index with two collections:

| Collection | Contents | Purpose |
|------------|----------|---------|
| **schema_items** | Models, columns, relationships, views | Semantic schema search per question |
| **query_history** | Confirmed NL → SQL pairs | Few-shot recall for similar questions |

The memory layer enables the self-learning loop: each confirmed query improves future recall accuracy.

See [Memory](../guide/memory.md) for details.

### wren-core (Rust engine)

The core semantic engine, written in Rust and exposed to Python via PyO3 bindings (`wren-core-py`). It handles:

- **SessionContext**: maintains the MDL state and provides `transform_sql()` for expanding model definitions into SQL
- **ManifestExtractor**: filters the full MDL manifest to only the models referenced in a query, reducing planning overhead
- **Model expansion**: resolves `table_reference` and `ref_sql` models into physical SQL, handles calculated fields, and expands relationship joins

The Rust engine is where the MDL semantics are enforced; it is the source of truth for how models map to SQL.
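
The filtering idea behind ManifestExtractor can be illustrated in a few lines of Python. This is a simplified sketch, not the Rust implementation: the manifest shape (a `models` list with `name` keys and a `relationships` list with `models` keys) and the function name `extract_manifest` are assumptions made for the example.

```python
def extract_manifest(manifest: dict, referenced: set[str]) -> dict:
    """Keep only referenced models and the relationships among them."""
    models = [m for m in manifest.get("models", []) if m["name"] in referenced]
    kept = {m["name"] for m in models}
    relationships = [
        r for r in manifest.get("relationships", [])
        # A relationship survives only if every model it joins was kept.
        if set(r["models"]) <= kept
    ]
    return {**manifest, "models": models, "relationships": relationships}
```

A smaller manifest means fewer definitions to expand per query, which is the planning-overhead reduction described above.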

## Data flows

### Query execution

```
wren --sql "SELECT customer_id, SUM(total) FROM orders GROUP BY 1"
    │
    ├── 1. Discover MDL: project auto-discovery → target/mdl.json
    ├── 2. Resolve connection: active profile → ~/.wren/profiles.yml
    ├── 3. Plan: sqlglot parse → extract models → wren-core CTE expand → transpile
    ├── 4. Execute: connector → database → PyArrow table
    └── 5. Output: format as table / csv / json
```
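
Step 5 of this flow, rendering the result in one of three formats, can be sketched with the standard library alone. The function `format_rows` and its signature are hypothetical, and plain column names with row tuples stand in for the PyArrow table.

```python
import csv
import io
import json


def format_rows(columns: list[str], rows: list[tuple], fmt: str = "table") -> str:
    """Render query results as csv, json, or a plain aligned table."""
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        return json.dumps([dict(zip(columns, row)) for row in rows])
    if fmt == "table":
        cells = [columns] + [[str(v) for v in row] for row in rows]
        widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
        return "\n".join(
            "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
            for r in cells
        )
    raise ValueError(f"unknown format: {fmt}")
```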

### Project build

```
wren context build
    │
    ├── Read wren_project.yml + models/*/ + views/*/ + relationships.yml
    ├── Validate structure and references
    ├── Convert snake_case → camelCase
    └── Write target/mdl.json
```

### Memory lifecycle

```
wren memory index           → Parse MDL, embed schema items, store in LanceDB
wren memory fetch -q "..."  → Embed query, search schema_items, return context
wren memory recall -q "..." → Embed query, search query_history, return examples
wren memory store           → Embed NL-SQL pair, append to query_history
```
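
The store and recall half of this lifecycle can be imitated in memory. The sketch below is a toy under loud assumptions: a bag-of-words `Counter` replaces the sentence-transformers embedding, a Python list replaces the LanceDB collection, and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag of lowercase words (the real layer uses a model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class QueryHistory:
    """In-memory stand-in for the query_history collection."""

    def __init__(self) -> None:
        self._pairs: list[tuple[str, str, Counter]] = []

    def store(self, question: str, sql: str) -> None:
        # Embed the natural-language question once, at write time.
        self._pairs.append((question, sql, embed(question)))

    def recall(self, question: str, limit: int = 3) -> list[tuple[str, str]]:
        # Rank stored pairs by similarity to the new question.
        query = embed(question)
        ranked = sorted(self._pairs, key=lambda p: cosine(query, p[2]), reverse=True)
        return [(q, sql) for q, sql, _ in ranked[:limit]]
```

Each confirmed pair added through `store` widens the pool that `recall` can draw on, which is the self-learning loop in miniature.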

## Key dependencies

| Dependency | Role |
|------------|------|
| **wren-core-py** | Rust semantic engine (PyO3 bindings) |
| **sqlglot** | SQL parsing, qualification, dialect transpilation |
| **database connectors** | Data source execution layer |
| **pyarrow** | Query result representation |
| **lancedb** | Vector storage for memory layer |
| **sentence-transformers** | Local embeddings for memory search |
| **typer** | CLI framework |
| **pydantic** | Config and connection validation |