161 changes: 161 additions & 0 deletions AGENTS.md
@@ -0,0 +1,161 @@
# AGENTS.md

This file provides guidance to Codex (Codex.ai/code) when working with code in this repository.

## Project Overview

Wren Engine (OSS) is an open source semantic engine for MCP clients and AI agents. It translates SQL queries through a semantic layer (MDL - Modeling Definition Language) and executes them against 22+ data sources (PostgreSQL, BigQuery, Snowflake, Spark, etc.). The engine is powered by Apache DataFusion (Canner fork).

⚠️ Potential issue | 🟡 Minor

Inconsistent data source count.

Line 7 claims "22+ data sources" but README.md lines 162-178 only list 17 sources. Verify the correct count and ensure consistency across documentation.

Suggested verification
```bash
#!/bin/bash
# Description: Count actual connector implementations

echo "Connectors in ibis-server:"
fd -t f -e py . ibis-server/app/model/metadata/ --exec basename {} .py | grep -v '__' | grep -v 'factory' | grep -v 'base' | sort | wc -l

echo -e "\nListed in README:"
echo "17"
```
🧰 Tools
🪛 LanguageTool

[grammar] ~7-~7: Use a hyphen to join words.
Context: ...t Overview Wren Engine (OSS) is an open source semantic engine for MCP clients a...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 7, The AGENTS.md claim "22+ data sources" is inconsistent
with README.md's list of 17 connectors; verify the actual connector count by
counting connector implementations (e.g., files under
ibis-server/app/model/metadata) and either update AGENTS.md to the correct
number or expand the README list to match; specifically edit the phrase "Wren
Engine (OSS) ... executes them against 22+ data sources" in AGENTS.md and the
connector list in README.md (lines enumerating sources) so both reflect the
verified count and ensure any summary header (e.g., "22+ data sources") and the
detailed list stay consistent.
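
For readers without `fd` installed, the same count can be sketched in Python. This is a minimal sketch that mirrors the filters of the verification script above (dropping names containing `__`, `factory`, or `base`); the function name `count_connector_modules` is hypothetical, and the assumption that one `.py` file equals one connector comes from the script, not from the repository itself.

```python
from pathlib import Path

# Substrings excluded by the verification script (grep -v '__', 'factory', 'base').
EXCLUDED_SUBSTRINGS = ("__", "factory", "base")


def count_connector_modules(metadata_dir: str) -> int:
    """Count .py files under metadata_dir whose stem contains no excluded substring."""
    stems = (path.stem for path in Path(metadata_dir).rglob("*.py"))
    return sum(
        1 for stem in stems if not any(token in stem for token in EXCLUDED_SUBSTRINGS)
    )


if __name__ == "__main__":
    # Path taken from the review comment; run from the repository root.
    print(count_connector_modules("ibis-server/app/model/metadata"))
```

Whatever number this prints is only as reliable as the one-file-per-connector assumption, so it should be cross-checked against the README list before either document is updated.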


## Repository Structure

Five main modules:

- **wren-core/** — Rust semantic engine (Cargo workspace: `core/`, `sqllogictest/`, `benchmarks/`, `wren-example/`). Handles MDL analysis, query planning, logical plan optimization, and SQL generation via DataFusion.
- **wren-core-base/** — Shared Rust crate with manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`). Has optional `python-binding` feature for PyO3 compatibility.
- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin.
- **ibis-server/** — FastAPI web server (Python 3.11). Provides REST API for query execution, validation, and metadata. Uses Ibis framework for data source connectivity.
- **mcp-server/** — MCP server exposing Wren Engine to AI agents (Codex, Cline, Cursor).

Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `mock-web-server/`, `benchmark/`, `example/`.

## Build & Development Commands

### wren-core (Rust)
```bash
cd wren-core
cargo check --all-targets # Compile check
cargo test --lib --tests --bins # Run tests (set RUST_MIN_STACK=8388608)
cargo fmt --all # Format Rust code
cargo clippy --all-targets --all-features -- -D warnings # Lint
taplo fmt # Format Cargo.toml files
```

Most unit tests are in `wren-core/core/src/mdl/mod.rs`. SQL end-to-end tests use sqllogictest files in `wren-core/sqllogictest/test_files/`.

### wren-core-py (Python bindings)
```bash
cd wren-core-py
just install # Poetry install
just develop # Build dev wheel with maturin
just test-rs # Rust tests (cargo test --no-default-features)
just test-py # Python tests (pytest)
just test # Both
just format # cargo fmt + ruff + taplo
```

### ibis-server (FastAPI)
```bash
cd ibis-server
just install # Poetry install + build wren-core-py wheel + cython rebuild
just dev # Dev server on port 8000
just run # Production server on port 8000
just test <MARKER> # Run pytest with marker (e.g., just test postgres)
just lint # ruff format check + ruff check
just format # ruff auto-fix + taplo
```

Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `doris`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Description: Extract registered pytest markers from pyproject.toml

rg -A 50 '^\[tool\.pytest\.ini_options\]' ibis-server/pyproject.toml | rg 'markers\s*=' -A 30
```

Repository: Canner/wren-engine

Length of output: 1142


Update test marker list to match pytest configuration.

The listed markers contain inaccuracies: athena_spark, profile, cache, unit, and enterprise are not registered in the pytest configuration, while canner and redshift are missing from the list. Update to: postgres, mysql, mssql, bigquery, snowflake, clickhouse, trino, oracle, athena, duckdb, databricks, spark, doris, local_file, s3_file, gcs_file, minio_file, functions, beta, canner, redshift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 57, Update the test marker list string that starts with
"Available test markers:" to match the pytest config by replacing the inaccurate
entries and adding the missing ones so it reads: postgres, mysql, mssql,
bigquery, snowflake, clickhouse, trino, oracle, athena, duckdb, databricks,
spark, doris, local_file, s3_file, gcs_file, minio_file, functions, beta,
canner, redshift; modify the marker list text in the AGENTS.md line containing
that sentence to exactly reflect this corrected set.
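
The mismatch reported above can be reproduced as a plain set difference. This is a small sketch, not part of the PR: `DOCUMENTED` is copied from the AGENTS.md line under review, `REGISTERED` from the corrected list in this comment, and the helper names are hypothetical. It assumes pytest marker entries take the usual `name: description` form.

```python
# Markers as documented in AGENTS.md (the list under review).
DOCUMENTED = [
    "postgres", "mysql", "mssql", "bigquery", "snowflake", "clickhouse",
    "trino", "oracle", "athena", "duckdb", "athena_spark", "databricks",
    "spark", "doris", "local_file", "s3_file", "gcs_file", "minio_file",
    "functions", "profile", "cache", "unit", "enterprise", "beta",
]

# Markers the review reports as registered in ibis-server/pyproject.toml.
REGISTERED = [
    "postgres", "mysql", "mssql", "bigquery", "snowflake", "clickhouse",
    "trino", "oracle", "athena", "duckdb", "databricks", "spark", "doris",
    "local_file", "s3_file", "gcs_file", "minio_file", "functions", "beta",
    "canner", "redshift",
]


def marker_name(entry: str) -> str:
    """Keep only the name from a 'name: description' marker entry."""
    return entry.split(":", 1)[0].strip()


def diff_markers(documented, registered_entries):
    """Return (documented but not registered, registered but not documented)."""
    registered = {marker_name(entry) for entry in registered_entries}
    documented = set(documented)
    return sorted(documented - registered), sorted(registered - documented)


if __name__ == "__main__":
    extra, missing = diff_markers(DOCUMENTED, REGISTERED)
    print("Documented but not registered:", extra)
    print("Registered but not documented:", missing)
```

Feeding the function the raw `markers` entries from `pyproject.toml` instead of the hard-coded `REGISTERED` list would keep the check in step with the configuration.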


### mcp-server
Uses `uv` for dependency management. See `mcp-server/README.md`.

### Docker (ibis-server image)

```bash
cd ibis-server
just docker-build # current platform, Rust built locally
just docker-build linux/amd64 # single specific platform
just docker-build linux/amd64,linux/arm64 --push # multi-arch (must --push, cannot load locally)
```

**Two build strategies controlled by `WHEEL_SOURCE` build-arg:**

| Scenario | Strategy | Speed |
|----------|----------|-------|
| Target == host platform | `WHEEL_SOURCE=local` — Rust built on host via maturin+zig | Fast (reuses host cargo cache) |
| Cross-platform / multi-arch | `WHEEL_SOURCE=docker` — Rust built inside Docker via BuildKit cache mounts | Slow first build, incremental after |

`just docker-build` auto-detects host platform and chooses the right strategy.

**Prerequisites for local strategy (one-time setup):**
```bash
brew install zig
rustup target add aarch64-unknown-linux-gnu # Apple Silicon
rustup target add x86_64-unknown-linux-gnu # Intel Mac
```

**Key constraints:**
- Multi-arch builds always use `WHEEL_SOURCE=docker` (Rust compiled inside Docker)
- Multi-arch images cannot be loaded locally — `--push` to a registry is required
- BuildKit must be enabled (`DOCKER_BUILDKIT=1` or Docker Desktop default)
- Build contexts required: `wren-core-py`, `wren-core`, `wren-core-base`, `mcp-server` (all relative to `ibis-server/`)
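
The strategy table and constraints above can be restated as a small decision function. This is only a sketch of the documented rules: the actual `just docker-build` recipe is not shown in this diff, so the function names and the platform-string format (`linux/amd64`, comma-separated for multi-arch) are assumptions for illustration.

```python
def parse_platforms(target_platforms: str, host_platform: str) -> list[str]:
    """Split a comma-separated platform list; an empty value means the host platform."""
    targets = [p.strip() for p in target_platforms.split(",") if p.strip()]
    return targets or [host_platform]


def choose_wheel_source(target_platforms: str, host_platform: str) -> str:
    """Return 'local' only for a single target equal to the host, else 'docker'."""
    targets = parse_platforms(target_platforms, host_platform)
    if len(targets) == 1 and targets[0] == host_platform:
        return "local"
    return "docker"


def requires_push(target_platforms: str, host_platform: str) -> bool:
    """Multi-arch images cannot be loaded locally, so they must be pushed."""
    return len(parse_platforms(target_platforms, host_platform)) > 1


if __name__ == "__main__":
    host = "linux/arm64"  # example host platform
    for targets in ("", "linux/amd64", "linux/amd64,linux/arm64"):
        print(repr(targets), choose_wheel_source(targets, host), requires_push(targets, host))
```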

## Architecture: Query Flow

```
SQL Query → ibis-server (FastAPI v3 router)
→ MDL Processing (manifest cache, validation)
→ wren-core-py (PyO3 FFI)
→ wren-core (Rust: MDL analysis → logical plan → optimization)
→ DataFusion (query planning)
→ Connector (data source-specific SQL via Ibis/sqlglot)
→ Native execution (Postgres, BigQuery, etc.)
→ Response (with optional query caching)
```

If wren-core (v3) fails, ibis-server falls back to the legacy Java engine (v2).
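
The fallback behaviour can be pictured as a try/except wrapper around the two engines. This is a hedged sketch, not ibis-server's implementation: `run_with_fallback` and both callables are hypothetical stand-ins, and which exceptions actually trigger the fallback is an assumption.

```python
from typing import Callable


def run_with_fallback(
    sql: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Try the v3 (wren-core) path first; on any error, use the v2 (legacy) path.

    Returns a (engine_label, result) pair so callers can see which path ran.
    """
    try:
        return "v3", primary(sql)
    except Exception:
        # Assumption: any failure in the v3 path is treated as recoverable.
        return "v2", fallback(sql)
```

Returning the engine label alongside the result makes it visible when a query silently took the legacy path, which is useful when debugging differences between the two engines.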

## Key Architecture Details

**wren-core internals** (`wren-core/core/src/`):
- `mdl/` — Core MDL processing: `WrenMDL` (manifest + symbol table), `AnalyzedWrenMDL` (with lineage), function definitions (scalar/aggregate/window per dialect), type planning
- `logical_plan/analyze/` — DataFusion analyzer rules: `ModelAnalyzeRule` (TableScan → ModelPlanNode), scope tracking, access control (RLAC/CLAC), view expansion, relationship chain resolution
- `logical_plan/optimize/` — Optimization passes: type coercion, timestamp simplification
- `sql/` — SQL parsing and analysis

**ibis-server internals** (`ibis-server/app/`):
- `routers/v3/connector.py` — Main API endpoints (query, validate, dry-plan, metadata)
- `model/metadata/` — Per-connector implementations (22 connectors), each with its own metadata handling
- `model/metadata/factory.py` — Connector instantiation
- `mdl/` — MDL processing: `core.py` (session context), `rewriter.py` (query rewriting), `substitute.py` (model substitution)
- `custom_ibis/`, `custom_sqlglot/` — Ibis and SQLGlot extensions for Wren-specific behavior

**Manifest types** (`wren-core-base/src/mdl/`):
- `manifest.rs` — `Manifest`, `Model`, `Column`, `Metric`, `Relationship`, `View`, `RowLevelAccessControl`, `ColumnLevelAccessControl`
- `builder.rs` — Fluent `ManifestBuilder` API
- Uses `wren-manifest-macro` for auto-generating Pydantic-compatible Python classes
Comment on lines +110 to +126

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Description: Verify key architecture paths mentioned in documentation

paths=(
  "wren-core/core/src/mdl"
  "wren-core/core/src/logical_plan/analyze"
  "wren-core/core/src/logical_plan/optimize"
  "wren-core/core/src/sql"
  "ibis-server/app/routers/v3/connector.py"
  "ibis-server/app/model/metadata/factory.py"
  "ibis-server/app/mdl/core.py"
  "wren-core-base/src/mdl/manifest.rs"
  "wren-core-base/src/mdl/builder.rs"
)

for path in "${paths[@]}"; do
  if [ -e "$path" ]; then
    echo "$path"
  else
    echo "✗ Missing: $path"
  fi
done
```

Repository: Canner/wren-engine

Length of output: 394


🏁 Script executed:

```bash
# Find sql-related directories and files in wren-core/core/src
find wren-core/core/src -type d -name "*sql*" 2>/dev/null | head -20
```

Repository: Canner/wren-engine

Length of output: 44


🏁 Script executed:

```bash
# Search for sql parser/analyzer in the codebase
rg -l "sql.*pars|parser.*sql" wren-core/core/src --type rust | head -20
```

Repository: Canner/wren-engine

Length of output: 514


🏁 Script executed:

```bash
# Check the directory structure of wren-core/core/src to understand module organization
ls -la wren-core/core/src/
```

Repository: Canner/wren-engine

Length of output: 338


🏁 Script executed:

```bash
# Check the module structure in lib.rs
cat wren-core/core/src/lib.rs
```

Repository: Canner/wren-engine

Length of output: 384


🏁 Script executed:

```bash
# Search for sql-related code across wren-core/core/src
rg -i "sql" wren-core/core/src --type rust -l | head -20
```

Repository: Canner/wren-engine

Length of output: 989


Remove inaccurate path reference for SQL parsing.

The wren-core/core/src/sql/ module does not exist. SQL parsing is handled via DataFusion's sqlparser library (re-exported in lib.rs), not a dedicated wren-core module. Remove this bullet point or replace it with clarification that SQL parsing is delegated to DataFusion.

Current (incorrect) snippet
- `sql/` — SQL parsing and analysis

All other referenced paths are accurate and verified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 110 - 126, Remove or update the incorrect `- \`sql/\`
— SQL parsing and analysis` bullet in the list of wren-core internals: delete
that line and replace it with a brief clarification that SQL parsing is
delegated to DataFusion’s sqlparser (re-exported in `lib.rs`) rather than
handled by a `wren-core/core/src/sql/` module, so references to SQL parsing
should point to the DataFusion/sqlparser re-export in `lib.rs`.
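
A path check like the one run for this comment could also live in the repository as a small script, so that documented paths are re-verified whenever AGENTS.md changes. The sketch below assumes it is run from the repository root; `find_missing_paths` is a hypothetical helper, and the path list is copied from the verification script in this comment.

```python
from pathlib import Path

# Paths referenced in the "Key Architecture Details" section of AGENTS.md.
DOCUMENTED_PATHS = [
    "wren-core/core/src/mdl",
    "wren-core/core/src/logical_plan/analyze",
    "wren-core/core/src/logical_plan/optimize",
    "wren-core/core/src/sql",
    "ibis-server/app/routers/v3/connector.py",
    "ibis-server/app/model/metadata/factory.py",
    "ibis-server/app/mdl/core.py",
    "wren-core-base/src/mdl/manifest.rs",
    "wren-core-base/src/mdl/builder.rs",
]


def find_missing_paths(repo_root: str, relative_paths: list[str]) -> list[str]:
    """Return the documented paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    for missing in find_missing_paths(".", DOCUMENTED_PATHS):
        print(f"Missing: {missing}")
```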


## Running ibis-server Tests Locally

Required environment variables (see `.github/workflows/ibis-ci.yml` for CI values):
```bash
export QUERY_CACHE_STORAGE_TYPE=local
export WREN_ENGINE_ENDPOINT=http://localhost:8080
export WREN_WEB_ENDPOINT=http://localhost:3000
export PROFILING_STORE_PATH=file:///tmp/profiling
```

On macOS, `psycopg2` may fail to build due to missing OpenSSL linkage:
```bash
LDFLAGS="-L$(brew --prefix openssl)/lib" CPPFLAGS="-I$(brew --prefix openssl)/include" just install
```

Connector tests use testcontainers (Docker required). Example running a single connector:
```bash
just test clickhouse # runs pytest -m clickhouse
```

TPCH test data is generated via DuckDB's TPCH extension (`CALL dbgen(sf=0.01)`) and loaded into the testcontainer at module scope. See `tests/routers/v3/connector/clickhouse/conftest.py` for the pattern.

## Known wren-core Limitations

**ModelAnalyzeRule — correlated subquery column resolution**: The `ModelAnalyzeRule` in `wren-core/core/src/logical_plan/analyze/` cannot resolve outer column references inside correlated subqueries. It only sees the subquery's own table scope. This affects TPCH Q2, Q4, Q15, Q17, Q20, Q21, Q22. See `ibis-server/tests/routers/v3/connector/clickhouse/TPCH_ISSUES.md`.

## Conventions

- **Commits**: Conventional commits (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`, `deps:`). Releases are automated via release-please.
- **Rust**: Format with `cargo fmt`, lint with `clippy -D warnings`, TOML formatting with `taplo`.
- **Python**: Format and lint with `ruff` (line-length 88, target Python 3.11). Poetry for dependency management.
- **DataFusion fork**: `https://github.com/Canner/datafusion.git` branch `canner/v49.0.1`. Ibis is also forked: `https://github.com/Canner/ibis.git` branch `canner/10.8.1`.
- **Snapshot testing**: wren-core uses `insta` for Rust snapshot tests.
- **CI**: Rust CI runs on `wren-core/**` changes. ibis CI runs on all PRs. Core-py CI runs on `wren-core-py/**` or `wren-core/**` changes.
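
The commit convention above can be checked locally with a short regular expression. This is a minimal sketch: the allowed types come from the list above, while the optional `(scope)` and `!` marker follow the general Conventional Commits format and are an assumption about what this repository accepts. The function name `is_conventional` is hypothetical.

```python
import re

# Commit types listed in the Conventions section.
ALLOWED_TYPES = ("feat", "fix", "chore", "refactor", "test", "docs", "perf", "deps")

# type, optional "(scope)", optional "!", then ": " and a non-empty description.
_SUBJECT_PATTERN = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([^()\s]+\))?!?: \S.*"
)


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches the documented convention."""
    return _SUBJECT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    for subject in ("feat: add redshift marker", "update docs"):
        print(subject, "->", is_conventional(subject))
```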