Canner · goldmedal · Apr 8, 2026 · Apr 8, 2026
diff --git a/.github/workflows/sync-docs.yml b/.github/workflows/sync-docs.yml
@@ -0,0 +1,61 @@
+name: Sync Docs to Website
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'docs/get_started/**'
+      - 'docs/concept/**'
+      - 'docs/guide/**'
+      - 'docs/reference/**'
+
+permissions:
+  contents: read
+
+jobs:
+  sync-docs:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout wren-engine
+        uses: actions/checkout@v4
+
+      - name: Checkout doc website
+        uses: actions/checkout@v4
+        with:
+          repository: ${{ vars.DOCS_REPO }}
+          token: ${{ secrets.CROSS_REPO_TOKEN }}
+          path: _docs-site
+          ref: ${{ vars.DOCS_REPO_BRANCH }}
+
+      - name: Sync doc directories
+        run: |
+          TARGET="_docs-site/docs/oss/engine"
+
+          for dir in get_started concept guide reference; do
+            rm -rf "${TARGET}/${dir}"
+            cp -r "docs/${dir}" "${TARGET}/${dir}"
+          done
+
+      - name: Check for changes
+        id: diff
+        working-directory: _docs-site
+        run: |
+          [ -z "$(git status --porcelain)" ] && echo "changed=false" >> "$GITHUB_OUTPUT" || echo "changed=true" >> "$GITHUB_OUTPUT"
+
+      - name: Create PR
+        if: steps.diff.outputs.changed == 'true'
+        working-directory: _docs-site
+        env:
+          GH_TOKEN: ${{ secrets.CROSS_REPO_TOKEN }}
+        run: |
+          BRANCH="sync/engine-docs-${GITHUB_SHA::8}"
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git checkout -b "${BRANCH}"
+          git add -A
+          git commit -m "docs: sync from wren-engine@${GITHUB_SHA::8}"
+          git push origin "${BRANCH}"
+          gh pr create \
+            --title "docs: sync Wren Engine docs from wren-engine" \
+            --body "Auto-synced from [wren-engine@\`${GITHUB_SHA::8}\`](https://github.com/Canner/wren-engine/commit/${GITHUB_SHA})." \
+            --base "${{ vars.DOCS_REPO_BRANCH }}"
diff --git a/docs/.sync.yml b/docs/.sync.yml
@@ -0,0 +1,15 @@
+# Declarative sync config: wren-engine/docs → doc website
+# Actual repo name and branch are stored in GitHub repository variables
+# (vars.DOCS_REPO, vars.DOCS_REPO_BRANCH) — not hardcoded here.
+target_dir: docs/oss/engine
+
+# Directories synced (recursive copy, destructive — deletions propagate)
+sync_dirs:
+  - get_started
+  - concept
+  - guide
+  - reference
+
+# Not synced (excluded by being outside sync_dirs):
+# - README.md
+# - .sync.yml
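
Review note: the workflow hardcodes the same directory list that `docs/.sync.yml` declares, so the two can drift apart. As a rough illustration of the destructive sync the config describes, here is a minimal Python sketch. The function name `sync_docs` and the inlined `SYNC_CONFIG` dict are hypothetical; the PR itself does this with `rm -rf` and `cp -r` in a shell step and never parses the YAML file. Unlike the shell loop, this sketch skips a source directory that does not exist rather than failing.

```python
import shutil
from pathlib import Path

# Mirrors docs/.sync.yml. Inlined as a dict (an assumption for this sketch)
# so that no YAML parser is required.
SYNC_CONFIG = {
    "target_dir": "docs/oss/engine",
    "sync_dirs": ["get_started", "concept", "guide", "reference"],
}


def sync_docs(docs_dir: Path, site_root: Path, config: dict = SYNC_CONFIG) -> list[str]:
    """Destructively mirror each configured directory into the site checkout.

    Returns the names of the directories that were actually copied.
    """
    target = site_root / config["target_dir"]
    target.mkdir(parents=True, exist_ok=True)

    synced = []
    for name in config["sync_dirs"]:
        src = docs_dir / name
        dst = target / name
        # Remove the old copy first so files deleted upstream disappear downstream.
        shutil.rmtree(dst, ignore_errors=True)
        if src.is_dir():
            shutil.copytree(src, dst)
            synced.append(name)
    return synced
```

Calling `sync_docs(Path("docs"), Path("_docs-site"))` would correspond to the "Sync doc directories" step, assuming both checkouts are laid out as in the workflow.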
diff --git a/docs/README.md b/docs/README.md
@@ -1,17 +1,38 @@
 # Wren Engine Documentation
 
-Wren Engine is an open-source semantic engine for AI agents and MCP clients. It translates SQL queries through MDL (Model Definition Language) and executes them against 22+ data sources.
+This directory is the **single source of truth** for Wren Engine docs published at [docs.getwren.ai](https://docs.getwren.ai/oss/engine).
 
-## Getting Started
+Changes merged to `main` are automatically synced to the doc website via GitHub Actions.
 
-- [Quick Start](quickstart.md) -- Set up a local semantic layer with the jaffle_shop dataset using the Wren CLI and Claude Code. (~15 minutes)
+## Get Started
 
-## Core Concepts
+- [Installation](get_started/installation.md)
+- [Quick Start](get_started/quickstart.md)
+- [Connect Your Database](get_started/connect.md)
 
-- [Wren Project](wren_project.md) -- Project structure, YAML authoring, and how the CLI compiles models into a deployable manifest.
+## Concepts
 
-### MDL Reference
+- [What is Context?](concept/what_is_context.md)
+- [What is MDL?](concept/what_is_mdl.md)
+- [Benefits for LLMs](concept/benefits_llm.md)
+- [Architecture](concept/architecture.md)
 
-- [Model](mdl/model.md) -- Define semantic entities over physical tables or SQL expressions.
-- [Relationship](mdl/relationship.md) -- Declare join paths between models for automatic resolution.
-- [View](mdl/view.md) -- Named SQL queries that behave as virtual tables.
+## Guides
+
+- [Data Modeling Overview](guide/modeling/overview.md)
+- [Wren Project Structure](guide/modeling/wren_project.md)
+- [Models](guide/modeling/model.md)
+- [Relations](guide/modeling/relation.md)
+- [Views](guide/modeling/view.md)
+- [Memory](guide/memory.md)
+- [Profiles](guide/profiles.md)
+
+## Reference
+
+- [CLI Reference](reference/cli.md)
+- [Skills](reference/skills.md)
+
+## Not synced
+
+- `README.md` — this file
+- `.sync.yml` — sync configuration
diff --git a/docs/concept/architecture.md b/docs/concept/architecture.md
@@ -0,0 +1,217 @@
+# Architecture
+
+The Wren Engine CLI is a modular Python application that rewrites semantic SQL through the MDL layer before executing it against your database. This page explains how the components fit together.
+
+## Overview
+
+```text
+┌──────────────────────────────────────────────────────────┐
+│                      Wren CLI (Typer)                    │
+│                                                          │
+│  --sql / query   dry-plan   dry-run   version            │
+│  context         profile    memory    utils              │
+└──┬──────────────┬──────────────┬──────────────┬──────────┘
+   │              │              │              │
+   ▼              ▼              │              ▼
+┌────────────┐ ┌────────────┐    │   ┌────────────────────┐
+│ Profile    │ │ Context    │    │   │ Memory Layer       │
+│ Mgmt       │ │ Mgmt       │    │   │ (LanceDB)          │
+│            │ │            │    │   │                    │
+│ ~/.wren/   │ │ init       │    │   │ schema_items       │
+│ profiles   │ │ validate   │    │   │ query_history      │
+│ .yml       │ │ build      │    │   │                    │
+└─────┬──────┘ └─────┬──────┘    │   │ fetch / recall     │
+      │              │           │   │ store / index      │
+      │   connection │ mdl.json  │   └────────────────────┘
+      │       info   │           │
+      └──────┐ ┌─────┘           │
+             ▼ ▼                 │
+      ┌──────────────┐           │
+      │  WrenEngine  │◄──────────┘  (dry-plan, query, dry-run)
+      │              │
+      │  plan()      │
+      │  execute()   │
+      └──┬───────┬───┘
+         │       │
+    plan │       │ execute
+         │       │
+         ▼       ▼
+┌──────────────┐ ┌──────────────────┐
+│ SQL Planning │ │ Connectors       │
+│              │ │                  │
+│ sqlglot      │ │                  │
+│  parse       │ │ Postgres  DuckDB │
+│  qualify     │ │ BigQuery  MySQL  │
+│  transpile   │ │ Snowflake Trino  │
+│              │ │ ...18+ sources   │
+│ CTE Rewriter │ │                  │
+│  inject CTEs │ └──────────────────┘
+│              │
+│ Policy check │
+└──────┬───────┘
+       │
+       ▼
+┌──────────────────┐
+│ wren-core-py     │
+│ (Rust / PyO3)    │
+│                  │
+│ SessionContext   │
+│ ManifestExtractor│
+│ transform_sql()  │
+└──────────────────┘
+```
+
+## Components
+
+### CLI layer
+
+The top-level command router, built on [Typer](https://typer.tiangolo.com/). It parses flags, discovers the MDL project and active profile, then delegates to WrenEngine or the appropriate subsystem.
+
+| Command | What it does |
+|---------|-------------|
+| `wren --sql` / `wren query` | Plan + execute SQL, return results |
+| `wren dry-plan` | Plan only — show the expanded SQL without executing |
+| `wren dry-run` | Validate SQL against the live database without returning rows |
+| `wren context` | Project management — init, validate, build, show |
+| `wren profile` | Connection management — add, switch, list, debug, rm |
+| `wren memory` | Schema indexing and NL-SQL recall |
+| `wren utils` | Type normalization utilities |
+
+### WrenEngine
+
+The central orchestrator (`engine.py`). It owns the plan-then-execute pipeline:
+
+1. Receive user SQL
+2. Call the SQL planning subsystem to expand MDL references
+3. Pass the planned SQL to a connector for execution
+4. Return results as a PyArrow table
+
+### SQL planning
+
+Transforms user SQL from semantic model references into executable database SQL. Three components collaborate:
+
+- **sqlglot** — parses SQL, qualifies table/column references, transpiles between dialects
+- **CTE Rewriter** — identifies which MDL models are referenced, builds a CTE for each, and injects them into the query
+- **wren-core-py** — Rust engine (via PyO3 bindings) that expands model definitions, resolves calculated fields, and handles relationship joins
+
+The planning pipeline:
+
+```text
+User SQL (e.g. SELECT * FROM orders WHERE status = 'pending')
+  │
+  ├── sqlglot: parse → qualify tables → normalize identifiers
+  ├── Extract referenced table names → ["orders"]
+  ├── ManifestExtractor: filter MDL to only referenced models
+  ├── Policy check (strict mode, denied functions)
+  ├── CTE Rewriter:
+  │     ├── For each model: wren-core transform_sql() → expanded CTE
+  │     └── Inject CTEs into original query
+  └── sqlglot: transpile to target dialect (postgres, bigquery, etc.)
+        │
+        ▼
+  WITH "orders" AS (
+    SELECT o_orderkey, o_custkey, o_totalprice, o_orderstatus AS status
+    FROM "public"."orders"
+  )
+  SELECT * FROM "orders" WHERE status = 'pending'
+```
+
+### Connectors
+
+Data source connectors execute the planned SQL against the actual database. Each connector implements a common interface for query execution, dry-run validation, and connection lifecycle.
+
+Supported data sources: PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, ClickHouse, Trino, SQL Server, Databricks, Redshift, Oracle, Athena, Apache Spark, and more.
+
+Each connector:
+- Receives dialect-specific SQL from the planning stage
+- Executes against the target database
+- Handles type coercion (Decimal, UUID, etc.)
+- Returns a PyArrow table
+
+### Profile management
+
+Stores named database connections in `~/.wren/profiles.yml`. One profile is active at a time. All `wren` commands use the active profile unless overridden with explicit flags.
+
+See [Profiles](../guide/profiles.md) for details.
+
+### Context management
+
+Manages the MDL project lifecycle — YAML authoring, validation, and compilation to `target/mdl.json`.
+
+Key operations:
+- `wren context init` — scaffold a new project (or import from existing `mdl.json`)
+- `wren context validate` — check YAML structure without a database
+- `wren context build` — compile snake_case YAML to camelCase JSON
+- `wren context show` — display the current project summary
+
+See [Wren Project](../guide/modeling/wren_project.md) for the project format.
+
+### Memory layer
+
+A LanceDB-backed semantic index with two collections:
+
+| Collection | Contents | Purpose |
+|------------|----------|---------|
+| **schema_items** | Models, columns, relationships, views | Semantic schema search per question |
+| **query_history** | Confirmed NL → SQL pairs | Few-shot recall for similar questions |
+
+The memory layer enables the self-learning loop: each confirmed query improves future recall accuracy.
+
+See [Memory](../guide/memory.md) for details.
+
+### wren-core (Rust engine)
+
+The core semantic engine is written in Rust and exposed to Python via PyO3 bindings (`wren-core-py`). It provides:
+
+- **SessionContext** — maintains the MDL state and provides `transform_sql()` for expanding model definitions into SQL
+- **ManifestExtractor** — filters the full MDL manifest to only the models referenced in a query, reducing planning overhead
+- **Model expansion** — resolves `table_reference` and `ref_sql` models into physical SQL, handles calculated fields, and expands relationship joins
+
+The Rust engine is where the MDL semantics are enforced — it is the source of truth for how models map to SQL.
+
+## Data flows
+
+### Query execution
+
+```text
+wren --sql "SELECT customer_id, SUM(total) FROM orders GROUP BY 1"
+  │
+  ├── 1. Discover MDL: project auto-discovery → target/mdl.json
+  ├── 2. Resolve connection: active profile → ~/.wren/profiles.yml
+  ├── 3. Plan: sqlglot parse → extract models → wren-core CTE expand → transpile
+  ├── 4. Execute: connector → database → PyArrow table
+  └── 5. Output: format as table / csv / json
+```
+
+### Project build
+
+```text
+wren context build
+  │
+  ├── Read wren_project.yml + models/*/ + views/*/ + relationships.yml
+  ├── Validate structure and references
+  ├── Convert snake_case → camelCase
+  └── Write target/mdl.json
+```
+
+### Memory lifecycle
+
+```text
+wren memory index          → Parse MDL, embed schema items, store in LanceDB
+wren memory fetch -q "..." → Embed query, search schema_items, return context
+wren memory recall -q "..."→ Embed query, search query_history, return examples
+wren memory store          → Embed NL-SQL pair, append to query_history
+```
+
+## Key dependencies
+
+| Dependency | Role |
+|------------|------|
+| **wren-core-py** | Rust semantic engine (PyO3 bindings) |
+| **sqlglot** | SQL parsing, qualification, dialect transpilation |
+| **database connectors** | Data source execution layer |
+| **pyarrow** | Query result representation |
+| **lancedb** | Vector storage for memory layer |
+| **sentence-transformers** | Local embeddings for memory search |
+| **typer** | CLI framework |
+| **pydantic** | Config and connection validation |
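
Review note: the "Project build" flow above says `wren context build` converts snake_case YAML to camelCase JSON, but the doc does not show what that looks like. The sketch below illustrates one plausible key conversion. The helper names `to_camel` and `camelize_keys` are hypothetical, and the real build step may treat some keys specially. The field names `table_reference` and `ref_sql` come from the architecture page; the sample values are made up.

```python
import json
from typing import Any


def to_camel(key: str) -> str:
    """Convert a lowercase snake_case key such as `table_reference` to `tableReference`."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_keys(node: Any) -> Any:
    """Recursively rename dict keys; values (including strings) are left untouched."""
    if isinstance(node, dict):
        return {to_camel(key): camelize_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [camelize_keys(item) for item in node]
    return node


# Illustrative models only; these are not taken from a real Wren project.
models = [
    {
        "name": "orders",
        "table_reference": {"schema": "public", "table": "orders"},
        "columns": [{"name": "o_orderkey"}],
    },
    {"name": "recent_orders", "ref_sql": "SELECT * FROM orders"},
]

print(json.dumps(camelize_keys(models), indent=2))
```

Only keys are renamed. Column names such as `o_orderkey` are values, so they pass through unchanged, which is the behavior you would want if physical column names must survive the build.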