From 5f877b78bfd8daac2e274f7e9f3050c11ff3d271 Mon Sep 17 00:00:00 2001
From: Jax Liu
Date: Wed, 11 Mar 2026 10:58:42 +0800
Subject: [PATCH 1/3] docs: expand CLAUDE.md with full build, architecture, and skills reference

Co-Authored-By: Claude Sonnet 4.6
---
 .claude/CLAUDE.md | 180 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 155 insertions(+), 25 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 84bbb94df..6b3672491 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -1,28 +1,94 @@
 # CLAUDE.md
 
-Wren Engine (OSS) — open-source semantic SQL engine for MCP clients and AI agents. Translates queries through MDL (Modeling Definition Language) against 22+ data sources, powered by Apache DataFusion (Canner fork).
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Wren Engine is a semantic engine for MCP clients and AI agents. It translates SQL queries through a semantic layer (MDL - Modeling Definition Language) and executes them against 22+ data sources (PostgreSQL, BigQuery, Snowflake, Spark, etc.). The engine is powered by Apache DataFusion (Canner fork).
 
 ## Repository Structure
 
-- **wren-core/** — Rust semantic engine: MDL analysis, query planning, DataFusion integration
-- **wren-core-base/** — Shared Rust crate: manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`)
-- **wren-core-py/** — PyO3 Python bindings for wren-core, built with Maturin
-- **ibis-server/** — FastAPI REST server: query execution, validation, 22 connector backends
-- **mcp-server/** — MCP server exposing Wren Engine to AI agents
-- **skills/** — User-facing MCP skills (wren-sql, wren-quickstart, etc.)
+Four main modules:
+
+- **wren-core/** — Rust semantic engine (Cargo workspace: `core/`, `sqllogictest/`, `benchmarks/`, `wren-example/`). Handles MDL analysis, query planning, logical plan optimization, and SQL generation via DataFusion.
+- **wren-core-base/** — Shared Rust crate with manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`). Has optional `python-binding` feature for PyO3 compatibility.
+- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin.
+- **ibis-server/** — FastAPI web server (Python 3.11). Provides REST API for query execution, validation, and metadata. Uses Ibis framework for data source connectivity.
+- **mcp-server/** — MCP server exposing Wren Engine to AI agents (Claude, Cline, Cursor).
+
+Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `mock-web-server/`, `benchmark/`, `example/`.
+
+## Build & Development Commands
 
-Supporting: `wren-core-legacy/` (Java fallback for v2), `mock-web-server/`, `benchmark/`, `example/`
+### wren-core (Rust)
+```bash
+cd wren-core
+cargo check --all-targets # Compile check
+cargo test --lib --tests --bins # Run tests (set RUST_MIN_STACK=8388608)
+cargo fmt --all # Format Rust code
+cargo clippy --all-targets --all-features -- -D warnings # Lint
+taplo fmt # Format Cargo.toml files
+```
+
+Most unit tests are in `wren-core/core/src/mdl/mod.rs`. SQL end-to-end tests use sqllogictest files in `wren-core/sqllogictest/test_files/`.
 
-## Build Quick Reference
+### wren-core-py (Python bindings)
+```bash
+cd wren-core-py
+just install # Poetry install
+just develop # Build dev wheel with maturin
+just test-rs # Rust tests (cargo test --no-default-features)
+just test-py # Python tests (pytest)
+just test # Both
+just format # cargo fmt + ruff + taplo
+```
+
+### ibis-server (FastAPI)
+```bash
+cd ibis-server
+just install # Poetry install + build wren-core-py wheel + cython rebuild
+just dev # Dev server on port 8000
+just run # Production server on port 8000
+just test # Run pytest with marker (e.g., just test postgres)
+just lint # ruff format check + ruff check
+just format # ruff auto-fix + taplo
+```
 
-| Module | Install | Test | Format / Lint |
-|--------|---------|------|---------------|
-| wren-core | — | `cargo test --lib --tests --bins` | `cargo fmt --all` / `cargo clippy` |
-| wren-core-py | `just install` | `just test` | `just format` |
-| ibis-server | `just install` | `just test ` | `just format` / `just lint` |
-| mcp-server | `uv sync` | — | — |
+Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.
+
+### mcp-server
+Uses `uv` for dependency management. See `mcp-server/README.md`.
+
+### Docker (ibis-server image)
+
+```bash
+cd ibis-server
+just docker-build # current platform, Rust built locally
+just docker-build linux/amd64 # single specific platform
+just docker-build linux/amd64,linux/arm64 --push # multi-arch (must --push, cannot load locally)
+```
+
+**Two build strategies controlled by `WHEEL_SOURCE` build-arg:**
+
+| Scenario | Strategy | Speed |
+|----------|----------|-------|
+| Target == host platform | `WHEEL_SOURCE=local` — Rust built on host via maturin+zig | Fast (reuses host cargo cache) |
+| Cross-platform / multi-arch | `WHEEL_SOURCE=docker` — Rust built inside Docker via BuildKit cache mounts | Slow first build, incremental after |
+
+`just docker-build` auto-detects host platform and chooses the right strategy.
+
+**Prerequisites for local strategy (one-time setup):**
+```bash
+brew install zig
+rustup target add aarch64-unknown-linux-gnu # Apple Silicon
+rustup target add x86_64-unknown-linux-gnu # Intel Mac
+```
 
-> Detailed commands, env vars, and test markers → see each module's README
+**Key constraints:**
+- Multi-arch builds always use `WHEEL_SOURCE=docker` (Rust compiled inside Docker)
+- Multi-arch images cannot be loaded locally — `--push` to a registry is required
+- BuildKit must be enabled (`DOCKER_BUILDKIT=1` or Docker Desktop default)
+- Build contexts required: `wren-core-py`, `wren-core`, `wren-core-base`, `mcp-server` (all relative to `ibis-server/`)
 
 ## Architecture: Query Flow
 
@@ -32,19 +98,83 @@ SQL Query → ibis-server (FastAPI v3 router)
  → wren-core-py (PyO3 FFI)
  → wren-core (Rust: MDL analysis → logical plan → optimization)
  → DataFusion (query planning)
- → Connector (Ibis/sqlglot → dialect SQL)
+ → Connector (data source-specific SQL via Ibis/sqlglot)
  → Native execution (Postgres, BigQuery, etc.)
+ → Response (with optional query caching)
 ```
 
-Fallback: if wren-core (v3) fails, ibis-server retries via wren-core-legacy (Java, v2).
+If wren-core (v3) fails, ibis-server falls back to the legacy Java engine (v2).
 
-Key files: `ibis-server/app/routers/v3/connector.py`, `wren-core/core/src/logical_plan/analyze/`
+## Key Architecture Details
+
+**wren-core internals** (`wren-core/core/src/`):
+- `mdl/` — Core MDL processing: `WrenMDL` (manifest + symbol table), `AnalyzedWrenMDL` (with lineage), function definitions (scalar/aggregate/window per dialect), type planning
+- `logical_plan/analyze/` — DataFusion analyzer rules: `ModelAnalyzeRule` (TableScan → ModelPlanNode), scope tracking, access control (RLAC/CLAC), view expansion, relationship chain resolution
+- `logical_plan/optimize/` — Optimization passes: type coercion, timestamp simplification
+- `sql/` — SQL parsing and analysis
+
+**ibis-server internals** (`ibis-server/app/`):
+- `routers/v3/connector.py` — Main API endpoints (query, validate, dry-plan, metadata)
+- `model/metadata/` — Per-connector implementations (22 connectors), each with its own metadata handling
+- `model/metadata/factory.py` — Connector instantiation
+- `mdl/` — MDL processing: `core.py` (session context), `rewriter.py` (query rewriting), `substitute.py` (model substitution)
+- `custom_ibis/`, `custom_sqlglot/` — Ibis and SQLGlot extensions for Wren-specific behavior
+
+**Manifest types** (`wren-core-base/src/mdl/`):
+- `manifest.rs` — `Manifest`, `Model`, `Column`, `Metric`, `Relationship`, `View`, `RowLevelAccessControl`, `ColumnLevelAccessControl`
+- `builder.rs` — Fluent `ManifestBuilder` API
+- Uses `wren-manifest-macro` for auto-generating Pydantic-compatible Python classes
+
+## Running ibis-server Tests Locally
+
+Required environment variables (see `.github/workflows/ibis-ci.yml` for CI values):
+```bash
+export QUERY_CACHE_STORAGE_TYPE=local
+export WREN_ENGINE_ENDPOINT=http://localhost:8080
+export WREN_WEB_ENDPOINT=http://localhost:3000
+export PROFILING_STORE_PATH=file:///tmp/profiling
+```
+
+On macOS, `psycopg2` may fail to build due to missing OpenSSL linkage:
+```bash
+LDFLAGS="-L$(brew --prefix openssl)/lib" CPPFLAGS="-I$(brew --prefix openssl)/include" just install
+```
+
+Connector tests use testcontainers (Docker required). Example running a single connector:
+```bash
+just test clickhouse # runs pytest -m clickhouse
+```
+
+TPCH test data is generated via DuckDB's TPCH extension (`CALL dbgen(sf=0.01)`) and loaded into the testcontainer at module scope. See `tests/routers/v3/connector/clickhouse/conftest.py` for the pattern.
+
+## Known wren-core Limitations
+
+**ModelAnalyzeRule — correlated subquery column resolution**: The `ModelAnalyzeRule` in `wren-core/core/src/logical_plan/analyze/` cannot resolve outer column references inside correlated subqueries. It only sees the subquery's own table scope. This affects TPCH Q2, Q4, Q15, Q17, Q20, Q21, Q22. See `ibis-server/tests/routers/v3/connector/clickhouse/TPCH_ISSUES.md`.
 
 ## Conventions
 
-- **Commits**: Conventional commits — `feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`, `deps:`
-- **Releases**: Automated via release-please
-- **Rust**: `cargo fmt`, `clippy -D warnings`, `taplo fmt` for TOML
-- **Python**: `ruff` (line-length 88, Python 3.11 target), Poetry for deps
-- **Snapshot tests**: wren-core uses `insta`
-- **CI**: Rust CI on `wren-core/**`; ibis CI on all PRs; core-py CI on `wren-core-py/**` or `wren-core/**`
+- **Commits**: Conventional commits (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`, `deps:`). Releases are automated via release-please.
+- **Rust**: Format with `cargo fmt`, lint with `clippy -D warnings`, TOML formatting with `taplo`.
+- **Python**: Format and lint with `ruff` (line-length 88, target Python 3.11). Poetry for dependency management.
+- **DataFusion fork**: `https://github.com/Canner/datafusion.git` branch `canner/v49.0.1`. Also forked Ibis: `https://github.com/Canner/ibis.git` branch `canner/10.8.1`.
+- **Snapshot testing**: wren-core uses `insta` for Rust snapshot tests.
+- **CI**: Rust CI runs on `wren-core/**` changes. ibis CI runs on all PRs. Core-py CI runs on `wren-core-py/**` or `wren-core/**` changes.
+
+## Skills
+
+Project-level skills are stored in `.claude/skills/`. Use these when working with Wren Engine SQL:
+
+- **wren-text-to-sql** — Rules for generating SQL queries targeting Wren Engine (MDL models, filter strategies, data types, aggregation). Trigger when asked to write SQL for Wren.
+- **wren-sql-correction** — Diagnostic workflow for fixing SQL errors across parsing, planning, transpiling, and execution stages. Trigger when debugging a Wren SQL error.
+- **wren-bigquery-dialect** — BigQuery-specific SQL rules (TIMESTAMP intervals, type casting, DATE_DIFF argument order, GROUP BY alias restrictions). Trigger when `dataSource` is BigQuery.
+- **wren-array-types** — ARRAY literal syntax and UNNEST patterns. Trigger when writing SQL involving array columns.
+- **wren-calculated-fields** — How to interpret and use pre-computed Calculated Field columns in MDL. Trigger when the schema contains columns marked as Calculated Fields.
+- **wren-date-time** — DATE_TRUNC, EXTRACT, DATE_DIFF, interval arithmetic, epoch conversion. Trigger when writing date/time SQL.
+- **wren-semi-structured-types** — GET_PATH, AS_VARCHAR/AS_INTEGER/AS_ARRAY for JSON/VARIANT/OBJECT columns. Trigger when querying semi-structured data.
+- **wren-structured-types** — STRUCT type definition and dot-notation field access. Trigger when querying STRUCT columns.
+- **wren-mcp-usage** — Wren MCP server setup, tool reference, connection config, and query workflow. Trigger when working on or asking about the MCP server.
+
+Additional skills in `skills/` (agentskills.io format, portable across agent tools):
+
+- **generate-mdl** (`skills/generate-mdl/SKILL.md`) — Step-by-step workflow for generating a Wren MDL manifest from a database via ibis-server metadata endpoints (no local DB drivers required). Trigger when a user wants to create a new MDL, onboard a new data source, or scaffold a manifest from an existing database.
+- **wren-project** (`skills/wren-project/SKILL.md`) — Save, load, and build Wren MDL manifests as YAML project directories. Trigger when a user wants to persist MDL as YAML files, load a YAML project, or compile to `target/mdl.json`.
From d6c5c9a4854ab19c1b4eb366919ef530a2601b19 Mon Sep 17 00:00:00 2001
From: Jax Liu
Date: Wed, 11 Mar 2026 11:02:27 +0800
Subject: [PATCH 2/3] chore: remove non-exist skill

---
 .claude/CLAUDE.md | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 6b3672491..7597e8d48 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -159,22 +159,3 @@ TPCH test data is generated via DuckDB's TPCH extension (`CALL dbgen(sf=0.01)`)
 - **DataFusion fork**: `https://github.com/Canner/datafusion.git` branch `canner/v49.0.1`. Also forked Ibis: `https://github.com/Canner/ibis.git` branch `canner/10.8.1`.
 - **Snapshot testing**: wren-core uses `insta` for Rust snapshot tests.
 - **CI**: Rust CI runs on `wren-core/**` changes. ibis CI runs on all PRs. Core-py CI runs on `wren-core-py/**` or `wren-core/**` changes.
-
-## Skills
-
-Project-level skills are stored in `.claude/skills/`. Use these when working with Wren Engine SQL:
-
-- **wren-text-to-sql** — Rules for generating SQL queries targeting Wren Engine (MDL models, filter strategies, data types, aggregation). Trigger when asked to write SQL for Wren.
-- **wren-sql-correction** — Diagnostic workflow for fixing SQL errors across parsing, planning, transpiling, and execution stages. Trigger when debugging a Wren SQL error.
-- **wren-bigquery-dialect** — BigQuery-specific SQL rules (TIMESTAMP intervals, type casting, DATE_DIFF argument order, GROUP BY alias restrictions). Trigger when `dataSource` is BigQuery.
-- **wren-array-types** — ARRAY literal syntax and UNNEST patterns. Trigger when writing SQL involving array columns.
-- **wren-calculated-fields** — How to interpret and use pre-computed Calculated Field columns in MDL. Trigger when the schema contains columns marked as Calculated Fields.
-- **wren-date-time** — DATE_TRUNC, EXTRACT, DATE_DIFF, interval arithmetic, epoch conversion. Trigger when writing date/time SQL.
-- **wren-semi-structured-types** — GET_PATH, AS_VARCHAR/AS_INTEGER/AS_ARRAY for JSON/VARIANT/OBJECT columns. Trigger when querying semi-structured data.
-- **wren-structured-types** — STRUCT type definition and dot-notation field access. Trigger when querying STRUCT columns.
-- **wren-mcp-usage** — Wren MCP server setup, tool reference, connection config, and query workflow. Trigger when working on or asking about the MCP server.
-
-Additional skills in `skills/` (agentskills.io format, portable across agent tools):
-
-- **generate-mdl** (`skills/generate-mdl/SKILL.md`) — Step-by-step workflow for generating a Wren MDL manifest from a database via ibis-server metadata endpoints (no local DB drivers required). Trigger when a user wants to create a new MDL, onboard a new data source, or scaffold a manifest from an existing database.
-- **wren-project** (`skills/wren-project/SKILL.md`) — Save, load, and build Wren MDL manifests as YAML project directories. Trigger when a user wants to persist MDL as YAML files, load a YAML project, or compile to `target/mdl.json`.
From e8d57a7a7439816a3afca928cefe75c2031c5676 Mon Sep 17 00:00:00 2001
From: Jax Liu
Date: Wed, 11 Mar 2026 11:05:33 +0800
Subject: [PATCH 3/3] docs: add doris test marker and clarify project description

Co-Authored-By: Claude Sonnet 4.6
---
 .claude/CLAUDE.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 7597e8d48..ba9eedd50 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-Wren Engine is a semantic engine for MCP clients and AI agents. It translates SQL queries through a semantic layer (MDL - Modeling Definition Language) and executes them against 22+ data sources (PostgreSQL, BigQuery, Snowflake, Spark, etc.). The engine is powered by Apache DataFusion (Canner fork).
+Wren Engine (OSS) is an open source semantic engine for MCP clients and AI agents. It translates SQL queries through a semantic layer (MDL - Modeling Definition Language) and executes them against 22+ data sources (PostgreSQL, BigQuery, Snowflake, Spark, etc.). The engine is powered by Apache DataFusion (Canner fork).
 
 ## Repository Structure
 
@@ -54,7 +54,7 @@ just lint # ruff format check + ruff check
 just format # ruff auto-fix + taplo
 ```
 
-Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.
+Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `doris`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.
 
 ### mcp-server
 Uses `uv` for dependency management. See `mcp-server/README.md`.