diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..b6e049acb
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,161 @@
+# AGENTS.md
+
+This file provides guidance to AI coding agents (e.g. Codex) when working with code in this repository.
+
+## Project Overview
+
+Wren Engine (OSS) is an open source semantic engine for MCP clients and AI agents. It translates SQL queries through a semantic layer (MDL - Modeling Definition Language) and executes them against 22+ data sources (PostgreSQL, BigQuery, Snowflake, Spark, etc.). The engine is powered by Apache DataFusion (Canner fork).
+
+## Repository Structure
+
+Five main modules:
+
+- **wren-core/** — Rust semantic engine (Cargo workspace: `core/`, `sqllogictest/`, `benchmarks/`, `wren-example/`). Handles MDL analysis, query planning, logical plan optimization, and SQL generation via DataFusion.
+- **wren-core-base/** — Shared Rust crate with manifest types (`Model`, `Column`, `Metric`, `Relationship`, `View`). Has optional `python-binding` feature for PyO3 compatibility.
+- **wren-core-py/** — PyO3 bindings exposing wren-core to Python. Built with Maturin.
+- **ibis-server/** — FastAPI web server (Python 3.11). Provides REST API for query execution, validation, and metadata. Uses Ibis framework for data source connectivity.
+- **mcp-server/** — MCP server exposing Wren Engine to AI agents (Codex, Cline, Cursor).
+
+Supporting modules: `wren-core-legacy/` (Java engine, fallback for v2 queries), `mock-web-server/`, `benchmark/`, `example/`.
+
+## Build & Development Commands
+
+### wren-core (Rust)
+```bash
+cd wren-core
+cargo check --all-targets # Compile check
+cargo test --lib --tests --bins # Run tests (set RUST_MIN_STACK=8388608)
+cargo fmt --all # Format Rust code
+cargo clippy --all-targets --all-features -- -D warnings # Lint
+taplo fmt # Format Cargo.toml files
+```
+
+Most unit tests are in `wren-core/core/src/mdl/mod.rs`. 
SQL end-to-end tests use sqllogictest files in `wren-core/sqllogictest/test_files/`. + +### wren-core-py (Python bindings) +```bash +cd wren-core-py +just install # Poetry install +just develop # Build dev wheel with maturin +just test-rs # Rust tests (cargo test --no-default-features) +just test-py # Python tests (pytest) +just test # Both +just format # cargo fmt + ruff + taplo +``` + +### ibis-server (FastAPI) +```bash +cd ibis-server +just install # Poetry install + build wren-core-py wheel + cython rebuild +just dev # Dev server on port 8000 +just run # Production server on port 8000 +just test # Run pytest with marker (e.g., just test postgres) +just lint # ruff format check + ruff check +just format # ruff auto-fix + taplo +``` + +Available test markers: `postgres`, `mysql`, `mssql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `oracle`, `athena`, `duckdb`, `athena_spark`, `databricks`, `spark`, `doris`, `local_file`, `s3_file`, `gcs_file`, `minio_file`, `functions`, `profile`, `cache`, `unit`, `enterprise`, `beta`. + +### mcp-server +Uses `uv` for dependency management. See `mcp-server/README.md`. + +### Docker (ibis-server image) + +```bash +cd ibis-server +just docker-build # current platform, Rust built locally +just docker-build linux/amd64 # single specific platform +just docker-build linux/amd64,linux/arm64 --push # multi-arch (must --push, cannot load locally) +``` + +**Two build strategies controlled by `WHEEL_SOURCE` build-arg:** + +| Scenario | Strategy | Speed | +|----------|----------|-------| +| Target == host platform | `WHEEL_SOURCE=local` — Rust built on host via maturin+zig | Fast (reuses host cargo cache) | +| Cross-platform / multi-arch | `WHEEL_SOURCE=docker` — Rust built inside Docker via BuildKit cache mounts | Slow first build, incremental after | + +`just docker-build` auto-detects host platform and chooses the right strategy. 
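The auto-detection above boils down to a platform comparison. A minimal Python sketch of the assumed selection logic (illustrative only; the actual recipe lives in `ibis-server/justfile`):

```python
import platform

def wheel_source(target_platform=None):
    """Sketch of how `just docker-build` is assumed to pick a wheel strategy."""
    machine = platform.machine()
    arch = {"x86_64": "amd64", "AMD64": "amd64", "aarch64": "arm64"}.get(machine, machine)
    host = f"linux/{arch}"
    target = target_platform or host
    # Same single platform as the host -> build Rust on the host (maturin+zig);
    # anything else (a cross-compile target or a comma-separated multi-arch
    # list) -> build inside Docker with BuildKit cache mounts.
    return "local" if target == host else "docker"

print(wheel_source("linux/amd64,linux/arm64"))  # → docker
```

In other words, `WHEEL_SOURCE=local` is only safe when the wheel will run on the same platform it was built on; everything else falls back to the slower but portable in-Docker build.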
+ +**Prerequisites for local strategy (one-time setup):** +```bash +brew install zig +rustup target add aarch64-unknown-linux-gnu # Apple Silicon +rustup target add x86_64-unknown-linux-gnu # Intel Mac +``` + +**Key constraints:** +- Multi-arch builds always use `WHEEL_SOURCE=docker` (Rust compiled inside Docker) +- Multi-arch images cannot be loaded locally — `--push` to a registry is required +- BuildKit must be enabled (`DOCKER_BUILDKIT=1` or Docker Desktop default) +- Build contexts required: `wren-core-py`, `wren-core`, `wren-core-base`, `mcp-server` (all relative to `ibis-server/`) + +## Architecture: Query Flow + +``` +SQL Query → ibis-server (FastAPI v3 router) + → MDL Processing (manifest cache, validation) + → wren-core-py (PyO3 FFI) + → wren-core (Rust: MDL analysis → logical plan → optimization) + → DataFusion (query planning) + → Connector (data source-specific SQL via Ibis/sqlglot) + → Native execution (Postgres, BigQuery, etc.) + → Response (with optional query caching) +``` + +If wren-core (v3) fails, ibis-server falls back to the legacy Java engine (v2). 
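As a concrete walk through this flow, the snippet below sketches what a v3 query request could look like. The field names (`sql`, `manifestStr`, `connectionInfo`) and the `/v3/connector/postgres/query` route are assumptions; check `ibis-server/app/routers/v3/connector.py` for the actual request schema.

```python
import base64
import json

# Hypothetical minimal manifest; see wren-core-base/src/mdl/manifest.rs for the real types.
manifest = {
    "catalog": "wrenai",
    "schema": "public",
    "models": [
        {
            "name": "customers",
            "tableReference": {"schema": "public", "table": "customers"},
            "columns": [{"name": "id", "type": "integer"}],
        }
    ],
}

# Assumed body for POST http://localhost:8000/v3/connector/postgres/query
# (the dev server started with `just dev`). The manifest travels base64-encoded.
payload = {
    "sql": "SELECT id FROM customers LIMIT 10",
    "manifestStr": base64.b64encode(json.dumps(manifest).encode()).decode(),
    "connectionInfo": {"host": "localhost", "port": 5432, "database": "test"},
}

print(sorted(payload))  # → ['connectionInfo', 'manifestStr', 'sql']
```

The engine plans against the decoded manifest and only then generates data source-specific SQL via Ibis/sqlglot, per the flow above.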
+ +## Key Architecture Details + +**wren-core internals** (`wren-core/core/src/`): +- `mdl/` — Core MDL processing: `WrenMDL` (manifest + symbol table), `AnalyzedWrenMDL` (with lineage), function definitions (scalar/aggregate/window per dialect), type planning +- `logical_plan/analyze/` — DataFusion analyzer rules: `ModelAnalyzeRule` (TableScan → ModelPlanNode), scope tracking, access control (RLAC/CLAC), view expansion, relationship chain resolution +- `logical_plan/optimize/` — Optimization passes: type coercion, timestamp simplification +- `sql/` — SQL parsing and analysis + +**ibis-server internals** (`ibis-server/app/`): +- `routers/v3/connector.py` — Main API endpoints (query, validate, dry-plan, metadata) +- `model/metadata/` — Per-connector implementations (22 connectors), each with its own metadata handling +- `model/metadata/factory.py` — Connector instantiation +- `mdl/` — MDL processing: `core.py` (session context), `rewriter.py` (query rewriting), `substitute.py` (model substitution) +- `custom_ibis/`, `custom_sqlglot/` — Ibis and SQLGlot extensions for Wren-specific behavior + +**Manifest types** (`wren-core-base/src/mdl/`): +- `manifest.rs` — `Manifest`, `Model`, `Column`, `Metric`, `Relationship`, `View`, `RowLevelAccessControl`, `ColumnLevelAccessControl` +- `builder.rs` — Fluent `ManifestBuilder` API +- Uses `wren-manifest-macro` for auto-generating Pydantic-compatible Python classes + +## Running ibis-server Tests Locally + +Required environment variables (see `.github/workflows/ibis-ci.yml` for CI values): +```bash +export QUERY_CACHE_STORAGE_TYPE=local +export WREN_ENGINE_ENDPOINT=http://localhost:8080 +export WREN_WEB_ENDPOINT=http://localhost:3000 +export PROFILING_STORE_PATH=file:///tmp/profiling +``` + +On macOS, `psycopg2` may fail to build due to missing OpenSSL linkage: +```bash +LDFLAGS="-L$(brew --prefix openssl)/lib" CPPFLAGS="-I$(brew --prefix openssl)/include" just install +``` + +Connector tests use testcontainers (Docker 
required). Example running a single connector: +```bash +just test clickhouse # runs pytest -m clickhouse +``` + +TPCH test data is generated via DuckDB's TPCH extension (`CALL dbgen(sf=0.01)`) and loaded into the testcontainer at module scope. See `tests/routers/v3/connector/clickhouse/conftest.py` for the pattern. + +## Known wren-core Limitations + +**ModelAnalyzeRule — correlated subquery column resolution**: The `ModelAnalyzeRule` in `wren-core/core/src/logical_plan/analyze/` cannot resolve outer column references inside correlated subqueries. It only sees the subquery's own table scope. This affects TPCH Q2, Q4, Q15, Q17, Q20, Q21, Q22. See `ibis-server/tests/routers/v3/connector/clickhouse/TPCH_ISSUES.md`. + +## Conventions + +- **Commits**: Conventional commits (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`, `deps:`). Releases are automated via release-please. +- **Rust**: Format with `cargo fmt`, lint with `clippy -D warnings`, TOML formatting with `taplo`. +- **Python**: Format and lint with `ruff` (line-length 88, target Python 3.11). Poetry for dependency management. +- **DataFusion fork**: `https://github.com/Canner/datafusion.git` branch `canner/v49.0.1`. Also forked Ibis: `https://github.com/Canner/ibis.git` branch `canner/10.8.1`. +- **Snapshot testing**: wren-core uses `insta` for Rust snapshot tests. +- **CI**: Rust CI runs on `wren-core/**` changes. ibis CI runs on all PRs. Core-py CI runs on `wren-core-py/**` or `wren-core/**` changes. diff --git a/README.md b/README.md index a449994be..a8feb1a6f 100644 --- a/README.md +++ b/README.md @@ -2,12 +2,16 @@ - + Wren AI logo

Wren Engine

+

+ The open context engine for AI agents +

+

@@ -23,99 +27,217 @@

-> Wren Engine is the Semantic Engine for MCP Clients and AI Agents. -> [Wren AI](https://github.com/Canner/WrenAI) GenBI AI Agent is based on Wren Engine. - - +> Wren Engine is the open foundation behind Wren AI: a semantic, governed, agent-ready context layer for business data. -## 🔌 Supported Data Sources -- [BigQuery](https://docs.getwren.ai/oss/wren_engine_api#tag/BigQueryProjectConnectionInfo) -- [Google Cloud Storage](https://docs.getwren.ai/oss/wren_engine_api#tag/GcsFileConnectionInfo) -- [Local Files](https://docs.getwren.ai/oss/wren_engine_api#tag/LocalFileConnectionInfo) -- [MS SQL Server](https://docs.getwren.ai/oss/wren_engine_api#tag/MSSqlConnectionInfo) -- [Minio](https://docs.getwren.ai/oss/wren_engine_api#tag/MinioFileConnectionInfo) -- [MySQL Server](https://docs.getwren.ai/oss/wren_engine_api#tag/MySqlConnectionInfo) -- [Oracle Server](https://docs.getwren.ai/oss/wren_engine_api#tag/OracleConnectionInfo) -- [PostgreSQL Server](https://docs.getwren.ai/oss/wren_engine_api#tag/PostgresConnectionInfo) -- [Amazon S3](https://docs.getwren.ai/oss/wren_engine_api#tag/S3FileConnectionInfo) -- [Snowflake](https://docs.getwren.ai/oss/wren_engine_api#tag/SnowflakeConnectionInfo) -- [Trino](https://docs.getwren.ai/oss/wren_engine_api#tag/TrinoConnectionInfo) -- [Athena](https://docs.getwren.ai/oss/wren_engine_api#tag/AthenaConnectionInfo) -- [Databricks](https://docs.getwren.ai/oss/wren_engine_api#tag/DatabricksTokenConnectionInfo) -- [Redshift](https://docs.getwren.ai/oss/wren_engine_api#tag/RedshiftConnectionInfo) -- [Apache Spark](https://docs.getwren.ai/oss/wren_engine_api#tag/SparkConnectionInfo) - -## 😫 Challenge Today +

+ with_wren_engine +

-At the enterprise level, the stakes - and the complexity - are much higher. Businesses run on structured data stored in cloud warehouses, relational databases, and secure filesystems. From BI dashboards to CRM updates and compliance workflows, AI must not only execute commands but also **understand and retrieve the right data, with precision and in context**. +## Why Wren Engine -While many community and official MCP servers already support connections to major databases like PostgreSQL, MySQL, SQL Server, and more, there's a problem: **raw access to data isn't enough**. +AI agents can already call tools, browse docs, and write code. What they still struggle with is business context. -Enterprises need: -- Accurate semantic understanding of their data models -- Trusted calculations and aggregations in reporting -- Clarity on business terms, like "active customer," "net revenue," or "churn rate" -- User-based permissions and access control +Enterprise data is not just rows in a warehouse. It is definitions, metrics, relationships, permissions, lineage, and intent. An agent that can connect to PostgreSQL or Snowflake still does not know what "net revenue", "active customer", or "pipeline coverage" actually mean in your company.

without_wren_engine

-Natural language alone isn't enough to drive complex workflows across enterprise data systems. You need a layer that interprets intent, maps it to the correct data, applies calculations accurately, and ensures security. +Wren Engine exists to solve that gap. -## 🎯 Our Mission +It gives AI agents a semantic layer they can reason over, so they can: -Wren Engine is on a mission to power the future of MCP clients and AI agents through the Model Context Protocol (MCP) — a new open standard that connects LLMs with tools, databases, and enterprise systems. +- understand models instead of raw tables +- use trusted metrics instead of inventing SQL +- follow relationships instead of guessing joins +- respect governance instead of bypassing it +- turn natural language into accurate, explainable data access -As part of the MCP ecosystem, Wren Engine provides a **semantic engine** powered the next generation semantic layer that enables AI agents to access business data with accuracy, context, and governance. +This is the open source context engine for teams building the next generation of agent experiences. -By building the semantic layer directly into MCP clients, such as Claude, Cline, Cursor, etc. Wren Engine empowers AI Agents with precise business context and ensures accurate data interactions across diverse enterprise environments. +## The Vision -We believe the future of enterprise AI lies in **context-aware, composable systems**. That’s why Wren Engine is designed to be: +We believe the future of AI is not tool calling alone. It is context-rich systems where agents can reason, retrieve, plan, and act on top of a shared understanding of business reality. -- 🔌 **Embeddable** into any MCP client or AI agentic workflow -- 🔄 **Interoperable** with modern data stacks (PostgreSQL, MySQL, Snowflake, etc.) 
-- 🧠 **Semantic-first**, enabling AI to “understand” your data model and business logic -- 🔐 **Governance-ready**, respecting roles, access controls, and definitions +Wren Engine is our open source contribution to that future. -

- with_wren_engine -

+It is the semantic and execution foundation beneath Wren AI, and it is designed to be useful well beyond a single product: -With Wren Engine, you can scale AI adoption across teams — not just with better automation, but with better understanding. +- embedded in MCP servers and agent workflows +- connected to modern warehouses, databases, and file systems +- expressive enough to model business meaning through MDL +- robust enough to support governed enterprise use cases +- open enough for the community to extend, integrate, and build on -***Check our full article*** +If Wren AI is the full vision, Wren Engine is the open core that makes that vision interoperable. -🤩 [Our Mission - Fueling the Next Wave of AI Agents: Building the Foundation for Future MCP Clients and Enterprise Data Access](https://getwren.ai/post/fueling-the-next-wave-of-ai-agents-building-the-foundation-for-future-mcp-clients-and-enterprise-data-access) +## What Wren Engine Does -## 🚀 Get Started with MCP -[MCP Server README](mcp-server/README.md) +Wren Engine turns business data into agent-usable context. -https://github.com/user-attachments/assets/dab9b50f-70d7-4eb3-8fc8-2ab55dc7d2ec +At a high level: +1. You describe your business domain with Wren's semantic model and MDL. +2. Wren Engine analyzes intent, models, relationships, and access rules. +3. It plans and generates correct queries across your underlying data sources. +4. MCP clients and AI agents interact with that context through a clean interface. -👉 Blog Post Tutorial: [Powering AI-driven workflows with Wren Engine and Zapier via the Model Context Protocol (MCP)](https://getwren.ai/post/powering-ai-driven-workflows-with-wren-engine-and-zapier-via-the-model-context-protocol-mcp?utm_campaign=10904457-MCP&utm_content=330804773&utm_medium=social&utm_source=linkedin&hss_channel=lcp-89794921) +That means your agent is no longer asking, "Which raw table should I query?" 
-## 🤔 Concepts +It is asking, "Which business concept, metric, or governed slice of context do I need to answer this task correctly?" + +## Built For Agent Builders + +Wren Engine is especially useful for the open source community building agent-native workflows in tools like: + +- OpenClaw +- Cloud Code +- VS Code +- Claude Desktop +- Cline +- Cursor + +If your environment can speak MCP, call HTTP APIs, or embed a semantic service, Wren Engine can become the context layer behind your agent. + +Use it to power experiences like: + +- natural-language analytics with trusted business definitions +- AI copilots that can answer questions across governed enterprise data +- agents that generate dashboards, reports, and workflow decisions +- code assistants that need real business context, not just schema dumps +- internal AI tools that should be grounded in semantic models instead of ad hoc SQL + +## Why Open Source + +We think agent infrastructure should be composable. + +The world does not need one more closed box that only works in one UI, one cloud, or one workflow. It needs shared infrastructure that developers can inspect, extend, self-host, and integrate anywhere. + +Wren Engine is open source so the community can: + +- run it locally or in their own stack +- connect it to their preferred MCP client or IDE +- contribute connectors, optimizations, and semantic capabilities +- build opinionated agent products on a transparent foundation +- help define what a real context layer for AI should look like + +## Architecture At A Glance + +```text +User / Agent + -> MCP Client or App (OpenClaw, Cloud Code, VS Code, Claude Desktop, Cline, Cursor, etc.) 
+ -> Wren MCP Server or HTTP API + -> Wren Engine semantic layer + -> Query planning and optimization + -> Your warehouse, database, or file-backed data source +``` + +Core ideas: + +- `MDL` captures business meaning, not just physical schema +- `wren-core` performs semantic analysis and query planning in Rust +- `ibis-server` provides the execution and connector-facing API layer +- `mcp-server` makes Wren easy to use from MCP-compatible agents + +## Repository Map + +This repository contains the core engine modules: + +| Module | What it does | +| --- | --- | +| [`wren-core`](./wren-core) | Rust semantic engine powered by Apache DataFusion for MDL analysis, planning, and optimization | +| [`wren-core-base`](./wren-core-base) | Shared manifest and modeling types | +| [`wren-core-py`](./wren-core-py) | PyO3 bindings that expose the engine to Python | +| [`ibis-server`](./ibis-server/) | FastAPI server for query execution, validation, metadata, and connectors | +| [`mcp-server`](./mcp-server/) | MCP server for AI agents and MCP-compatible clients | + +Supporting modules include `wren-core-legacy`, `example`, `mock-web-server`, and benchmarking utilities. + +## Supported Data Sources + +Wren Engine is built to work across modern data stacks, including warehouses, databases, and file-based sources. + +Current open source support includes connectors such as: + +- Amazon S3 +- Apache Spark +- Athena +- BigQuery +- Databricks +- DuckDB +- Google Cloud Storage +- Local files +- MinIO +- MySQL +- Oracle +- PostgreSQL +- SQL Server +- Snowflake +- Trino +- Redshift + +See the connector API docs in the project documentation for the latest connection schemas and capabilities. 
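Whatever the backing source, the semantic model is described the same way. The sketch below is an illustrative MDL-style manifest, not the authoritative schema; field names are approximate, so see the MDL documentation for the real shape:

```json
{
  "catalog": "wrenai",
  "schema": "public",
  "models": [
    {
      "name": "orders",
      "tableReference": { "table": "orders" },
      "columns": [
        { "name": "order_id", "type": "integer" },
        { "name": "customer_id", "type": "integer" },
        { "name": "amount", "type": "double" }
      ]
    },
    {
      "name": "customers",
      "tableReference": { "table": "customers" },
      "columns": [ { "name": "customer_id", "type": "integer" } ]
    }
  ],
  "relationships": [
    {
      "name": "orders_customer",
      "models": [ "orders", "customers" ],
      "joinType": "MANY_TO_ONE",
      "condition": "orders.customer_id = customers.customer_id"
    }
  ]
}
```

With a model like this deployed, an agent can traverse `orders` to `customers` by relationship name instead of guessing the join.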
+ +## Get Started + +### Use Wren through MCP + +If you want to use Wren Engine from an AI agent or MCP-capable IDE, start here: + +- [MCP Server README](./mcp-server/README.md) + +The MCP server includes: + +- a local Web UI for connection and MDL setup +- read-only mode for safer agent usage +- manifest deployment and validation tools +- metadata tools for remote schema discovery + +### Learn the concepts -- [Powering Semantic SQL for AI Agents with Apache DataFusion](https://getwren.ai/post/powering-semantic-sql-for-ai-agents-with-apache-datafusion) - [Quick start with Wren Engine](https://docs.getwren.ai/oss/engine/get_started/quickstart) - [What is semantics?](https://docs.getwren.ai/oss/engine/concept/what_is_semantics) - [What is Modeling Definition Language (MDL)?](https://docs.getwren.ai/oss/engine/concept/what_is_mdl) - [Benefits of Wren Engine with LLMs](https://docs.getwren.ai/oss/engine/concept/benefits_llm) +- [Powering Semantic SQL for AI Agents with Apache DataFusion](https://getwren.ai/post/powering-semantic-sql-for-ai-agents-with-apache-datafusion) + +### Developer entry points + +- [`wren-core/README.md`](./wren-core/README.md) +- [`wren-core-py/README.md`](./wren-core-py/README.md) +- [`ibis-server/README.md`](./ibis-server/README.md) +- [`mcp-server/README.md`](./mcp-server/README.md) + +## Local Development + +Common workflows: + +```bash +# Rust semantic engine +cd wren-core +cargo check --all-targets + +# Python + connector server +cd ibis-server +just install +just dev + +# MCP server +cd mcp-server +# see module README for uv-based setup +``` + +## Project Status -## 🚧 Project Status -Wren Engine is currently in the beta version. The project team is actively working on progress and aiming to release new versions at least biweekly. +Wren Engine is actively evolving in the open. The current focus is to make the semantic layer, execution path, and MCP integration stronger for real-world agent workflows. 
-## 🛠️ Developer Guides -The project consists of 4 main modules: -1. [ibis-server](./ibis-server/): the Web server of Wren Engine powered by FastAPI and Ibis -2. [wren-core](./wren-core): the semantic core written in Rust powered by [Apache DataFusion](https://github.com/apache/datafusion) -3. [wren-core-py](./wren-core-py): the Python binding for wren-core -4. [mcp-server](./mcp-server/): the MCP server of Wren Engine powered by [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk) +If you are building with agents today, this is a great time to get involved. -## ⭐️ Community +## Community -- Welcome to our [Discord server](https://discord.gg/5DvshJqG8Z) to give us feedback! -- If there is any issues, please visit [Github Issues](https://github.com/Canner/wren-engine/issues). +- Join our [Discord community](https://discord.gg/5DvshJqG8Z) +- Open a [GitHub issue](https://github.com/Canner/wren-engine/issues) +- Explore [Wren AI](https://github.com/Canner/WrenAI) to see the broader product vision +- Read our mission piece: [Fueling the Next Wave of AI Agents](https://getwren.ai/post/fueling-the-next-wave-of-ai-agents-building-the-foundation-for-future-mcp-clients-and-enterprise-data-access) +Wren Engine is for builders who believe AI needs better context, not just better prompts.