161 changes: 136 additions & 25 deletions .claude/CLAUDE.md
@@ -1,28 +1,94 @@
# CLAUDE.md

Wren Engine (OSS) - open-source semantic SQL engine for MCP clients and AI agents. Translates queries through MDL (Modeling Definition Language) against 22+ data sources, powered by Apache DataFusion (Canner fork).
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Wren Engine (OSS) is an open source semantic engine for MCP clients and AI agents. It translates SQL queries through a semantic layer (MDL - Modeling Definition Language) and executes them against 22+ data sources (PostgreSQL, BigQuery, Snowflake, Spark, etc.). The engine is powered by Apache DataFusion (Canner fork).

## Repository Structure

- **wren-core/** - Rust semantic engine: MDL analysis, query planning, DataFusion integration
- **wren-core-base/** - Shared Rust crate: manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`)
- **wren-core-py/** - PyO3 Python bindings for wren-core, built with Maturin
- **ibis-server/** - FastAPI REST server: query execution, validation, 22 connector backends
- **mcp-server/** - MCP server exposing Wren Engine to AI agents
- **skills/** - User-facing MCP skills (wren-sql, wren-quickstart, etc.)
Four main modules:

- **wren-core/** - Rust semantic engine (Cargo workspace: `core/`, `sqllogictest/`, `benchmarks/`, `wren-example/`). Handles MDL analysis, query planning, logical plan optimization, and SQL generation via DataFusion.
- **wren-core-base/** - Shared Rust crate with manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`). Has an optional `python-binding` feature for PyO3 compatibility.
- **wren-core-py/** - PyO3 bindings exposing wren-core to Python. Built with Maturin.
- **ibis-server/** - FastAPI web server (Python 3.11). Provides a REST API for query execution, validation, and metadata. Uses the Ibis framework for data source connectivity.
- **mcp-server/** - MCP server exposing Wren Engine to AI agents (Claude, Cline, Cursor).

Comment on lines +11 to +18

⚠️ Potential issue | 🟡 Minor

Fix the module count in this section.

Line 11 says "Four main modules," but the list contains five entries (wren-core, wren-core-base, wren-core-py, ibis-server, mcp-server). Please make the heading and bullets agree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/CLAUDE.md around lines 11 - 18, Update the heading text that
currently reads "Four main modules:" to accurately reflect the list length
(change it to "Five main modules:") in the .claude/CLAUDE.md section that lists
wren-core, wren-core-base, wren-core-py, ibis-server, and mcp-server so the
heading and the bullet list agree.

Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `mock-web-server/`, `benchmark/`, `example/`.

## Build & Development Commands

### wren-core (Rust)
```bash
cd wren-core
cargo check --all-targets # Compile check
cargo test --lib --tests --bins # Run tests (set RUST_MIN_STACK=8388608)
cargo fmt --all # Format Rust code
cargo clippy --all-targets --all-features -- -D warnings # Lint
taplo fmt # Format Cargo.toml files
```

Most unit tests are in `wren-core/core/src/mdl/mod.rs`. SQL end-to-end tests use sqllogictest files in `wren-core/sqllogictest/test_files/`.

### wren-core-py (Python bindings)
```bash
cd wren-core-py
just install # Poetry install
just develop # Build dev wheel with maturin
just test-rs # Rust tests (cargo test --no-default-features)
just test-py # Python tests (pytest)
just test # Both
just format # cargo fmt + ruff + taplo
```

### ibis-server (FastAPI)
```bash
cd ibis-server
just install # Poetry install + build wren-core-py wheel + cython rebuild
just dev # Dev server on port 8000
just run # Production server on port 8000
just test <MARKER> # Run pytest with marker (e.g., just test postgres)
just lint # ruff format check + ruff check
just format # ruff auto-fix + taplo
```

Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `doris`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.

### mcp-server
Uses `uv` for dependency management. See `mcp-server/README.md`.

### Docker (ibis-server image)

```bash
cd ibis-server
just docker-build # current platform, Rust built locally
just docker-build linux/amd64 # single specific platform
just docker-build linux/amd64,linux/arm64 --push # multi-arch (must --push, cannot load locally)
```

**Two build strategies controlled by `WHEEL_SOURCE` build-arg:**

Supporting: `wren-core-legacy/` (Java fallback for v2), `mock-web-server/`, `benchmark/`, `example/`
| Scenario | Strategy | Speed |
|----------|----------|-------|
| Target == host platform | `WHEEL_SOURCE=local` - Rust built on host via maturin+zig | Fast (reuses host cargo cache) |
| Cross-platform / multi-arch | `WHEEL_SOURCE=docker` - Rust built inside Docker via BuildKit cache mounts | Slow first build, incremental after |

## Build Quick Reference
`just docker-build` auto-detects host platform and chooses the right strategy.

| Module | Install | Test | Format / Lint |
|--------|---------|------|---------------|
| wren-core | - | `cargo test --lib --tests --bins` | `cargo fmt --all` / `cargo clippy` |
| wren-core-py | `just install` | `just test` | `just format` |
| ibis-server | `just install` | `just test <MARKER>` | `just format` / `just lint` |
| mcp-server | `uv sync` | - | - |
**Prerequisites for local strategy (one-time setup):**
```bash
brew install zig
rustup target add aarch64-unknown-linux-gnu # Apple Silicon
rustup target add x86_64-unknown-linux-gnu # Intel Mac
```
Comment on lines +80 to +85

⚠️ Potential issue | 🟡 Minor

Scope these prerequisites to macOS or add a Linux variant.

`brew install zig` is macOS-specific, but the heading reads like universal setup for the local build strategy. That will mislead non-macOS contributors unless this is labeled explicitly or paired with Linux instructions.

📝 Suggested wording
-**Prerequisites for local strategy (one-time setup):**
+**Prerequisites for local strategy on macOS (one-time setup):**

+If Linux is also supported for the local strategy, add the equivalent package-manager command there as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/CLAUDE.md around lines 80 - 85, The prerequisites under the
"Prerequisites for local strategy (one-time setup):" section currently list a
macOS-specific command ("brew install zig") as if universal; update this section
to either scope the macOS command (e.g., prefix "macOS:" before "brew install
zig") or add equivalent Linux package instructions (e.g., "Linux:" with
appropriate package-manager commands for Zig such as apt/yum/pacman or a link to
Zig install instructions) and keep the rustup target lines as a cross-platform
note (or label them "macOS / Linux"). Ensure the edit references the same
heading "Prerequisites for local strategy (one-time setup):" so readers on
non-macOS platforms see the correct install steps.


> Detailed commands, env vars, and test markers → see each module's README
**Key constraints:**
- Multi-arch builds always use `WHEEL_SOURCE=docker` (Rust compiled inside Docker)
- Multi-arch images cannot be loaded locally: `--push` to a registry is required
- BuildKit must be enabled (`DOCKER_BUILDKIT=1` or Docker Desktop default)
- Build contexts required: `wren-core-py`, `wren-core`, `wren-core-base`, `mcp-server` (all relative to `ibis-server/`)
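
The strategy table and constraints above can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the actual `just docker-build` recipe: `host_platform` and `choose_wheel_source` are hypothetical names, and the mapping from `uname -m` to a Docker platform string is assumed rather than taken from the repository.

```bash
# Sketch only: hypothetical helpers, not the real justfile logic.

# Map the host CPU architecture to a Docker platform string (assumed mapping).
host_platform() {
  case "$(uname -m)" in
    x86_64) echo "linux/amd64" ;;
    arm64 | aarch64) echo "linux/arm64" ;;
    *) echo "unknown" ;;
  esac
}

# choose_wheel_source <target-platforms> <host-platform> [--push]
# Prints the WHEEL_SOURCE value; returns 1 for multi-arch without --push.
choose_wheel_source() {
  target=$1
  host=$2
  push=${3:-}
  case "$target" in
    *,*)
      # Multi-arch: always built inside Docker and must be pushed.
      if [ "$push" != "--push" ]; then
        echo "multi-arch builds require --push" >&2
        return 1
      fi
      echo "docker"
      ;;
    "$host")
      # Single platform matching the host: Rust is built on the host.
      echo "local"
      ;;
    *)
      # Single platform that differs from the host: cross-build in Docker.
      echo "docker"
      ;;
  esac
}

choose_wheel_source linux/arm64 linux/arm64                      # prints: local
choose_wheel_source linux/amd64 linux/arm64                      # prints: docker
choose_wheel_source linux/amd64,linux/arm64 linux/arm64 --push   # prints: docker
```

In practice the second argument would come from `host_platform`; it is passed explicitly here so the three cases are easy to compare.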

## Architecture: Query Flow

@@ -32,19 +98,64 @@ SQL Query → ibis-server (FastAPI v3 router)
→ wren-core-py (PyO3 FFI)
→ wren-core (Rust: MDL analysis → logical plan → optimization)
→ DataFusion (query planning)
→ Connector (Ibis/sqlglot → dialect SQL)
→ Connector (data source-specific SQL via Ibis/sqlglot)
→ Native execution (Postgres, BigQuery, etc.)
→ Response (with optional query caching)
```

If wren-core (v3) fails, ibis-server falls back to the legacy Java engine (v2).

## Key Architecture Details

**wren-core internals** (`wren-core/core/src/`):
- `mdl/` - Core MDL processing: `WrenMDL` (manifest + symbol table), `AnalyzedWrenMDL` (with lineage), function definitions (scalar/aggregate/window per dialect), type planning
- `logical_plan/analyze/` - DataFusion analyzer rules: `ModelAnalyzeRule` (TableScan → ModelPlanNode), scope tracking, access control (RLAC/CLAC), view expansion, relationship chain resolution
- `logical_plan/optimize/` - Optimization passes: type coercion, timestamp simplification
- `sql/` - SQL parsing and analysis

**ibis-server internals** (`ibis-server/app/`):
- `routers/v3/connector.py` - Main API endpoints (query, validate, dry-plan, metadata)
- `model/metadata/` - Per-connector implementations (22 connectors), each with its own metadata handling
- `model/metadata/factory.py` - Connector instantiation
- `mdl/` - MDL processing: `core.py` (session context), `rewriter.py` (query rewriting), `substitute.py` (model substitution)
- `custom_ibis/`, `custom_sqlglot/` - Ibis and SQLGlot extensions for Wren-specific behavior

**Manifest types** (`wren-core-base/src/mdl/`):
- `manifest.rs` - `Manifest`, `Model`, `Column`, `Metric`, `Relationship`, `View`, `RowLevelAccessControl`, `ColumnLevelAccessControl`
- `builder.rs` - Fluent `ManifestBuilder` API
- Uses `wren-manifest-macro` for auto-generating Pydantic-compatible Python classes

## Running ibis-server Tests Locally

Required environment variables (see `.github/workflows/ibis-ci.yml` for CI values):
```bash
export QUERY_CACHE_STORAGE_TYPE=local
export WREN_ENGINE_ENDPOINT=http://localhost:8080
export WREN_WEB_ENDPOINT=http://localhost:3000
export PROFILING_STORE_PATH=file:///tmp/profiling
```
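
Before running the test suite, it can help to confirm that the four variables above are actually set. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper and is not part of the repository.

```bash
# Sketch only: print the name of each required variable that is unset or empty.
missing_env_vars() {
  for name in QUERY_CACHE_STORAGE_TYPE WREN_ENGINE_ENDPOINT \
              WREN_WEB_ENDPOINT PROFILING_STORE_PATH; do
    # Indirect lookup via eval keeps this compatible with plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

missing=$(missing_env_vars)
if [ -n "$missing" ]; then
  echo "Missing required variables:" >&2
  echo "$missing" >&2
fi
```

The check only reports what is missing; it does not validate that the endpoints are reachable.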

On macOS, `psycopg2` may fail to build due to missing OpenSSL linkage:
```bash
LDFLAGS="-L$(brew --prefix openssl)/lib" CPPFLAGS="-I$(brew --prefix openssl)/include" just install
```

Connector tests use testcontainers (Docker required). Example running a single connector:
```bash
just test clickhouse # runs pytest -m clickhouse
```

Fallback: if wren-core (v3) fails, ibis-server retries via wren-core-legacy (Java, v2).
TPCH test data is generated via DuckDB's TPCH extension (`CALL dbgen(sf=0.01)`) and loaded into the testcontainer at module scope. See `tests/routers/v3/connector/clickhouse/conftest.py` for the pattern.

## Known wren-core Limitations

Key files: `ibis-server/app/routers/v3/connector.py`, `wren-core/core/src/logical_plan/analyze/`
**ModelAnalyzeRule - correlated subquery column resolution**: The `ModelAnalyzeRule` in `wren-core/core/src/logical_plan/analyze/` cannot resolve outer column references inside correlated subqueries. It only sees the subquery's own table scope. This affects TPCH Q2, Q4, Q15, Q17, Q20, Q21, Q22. See `ibis-server/tests/routers/v3/connector/clickhouse/TPCH_ISSUES.md`.

## Conventions

- **Commits**: Conventional commits - `feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`, `deps:`
- **Releases**: Automated via release-please
- **Rust**: `cargo fmt`, `clippy -D warnings`, `taplo fmt` for TOML
- **Python**: `ruff` (line-length 88, Python 3.11 target), Poetry for deps
- **Snapshot tests**: wren-core uses `insta`
- **CI**: Rust CI on `wren-core/**`; ibis CI on all PRs; core-py CI on `wren-core-py/**` or `wren-core/**`
- **Commits**: Conventional commits (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`, `deps:`). Releases are automated via release-please.
- **Rust**: Format with `cargo fmt`, lint with `clippy -D warnings`, TOML formatting with `taplo`.
- **Python**: Format and lint with `ruff` (line-length 88, target Python 3.11). Poetry for dependency management.
- **DataFusion fork**: `https://github.com/Canner/datafusion.git`, branch `canner/v49.0.1`. Ibis is also forked: `https://github.com/Canner/ibis.git`, branch `canner/10.8.1`.
- **Snapshot testing**: wren-core uses `insta` for Rust snapshot tests.
- **CI**: Rust CI runs on `wren-core/**` changes. ibis CI runs on all PRs. Core-py CI runs on `wren-core-py/**` or `wren-core/**` changes.
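
The commit prefix convention listed above can be checked locally with a short shell function. This is a sketch only: `is_conventional_commit` is a hypothetical helper, and allowing an optional `(scope)` and `!` marker is an assumption taken from the Conventional Commits format rather than from this repository's tooling.

```bash
# Sketch only: succeed if the first line of a commit message starts with
# one of the prefixes listed in the conventions.
is_conventional_commit() {
  printf '%s\n' "$1" | head -n 1 |
    grep -Eq '^(feat|fix|chore|refactor|test|docs|perf|deps)(\([^)]+\))?!?: .+'
}

if is_conventional_commit "fix: handle empty manifest"; then
  echo "ok"
fi
```

A check like this could be wired into a `commit-msg` hook, though whether the project wants one is a separate decision.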