
feat(wren): CTE-based SQL planning with per-model expansion#1479

Merged
douenergy merged 10 commits into Canner:main from goldmedal:feat/wren-cte-base-sql
Mar 26, 2026

Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Mar 25, 2026

Summary

  • Replace whole-query session_context.transform_sql() with a CTE-based rewriter that expands each MDL model independently, then injects results as CTEs into the original SQL
  • Uses sqlglot's qualify_tables + normalize_identifiers + qualify_columns pipeline to fully resolve column references (SELECT *, aliases, correlated subqueries)
  • Each model CTE is parsed from Wren dialect and output in the target data source dialect — eliminates the separate _transpile() step
  • Replace CLI transpile command with dry-plan; remove WrenEngine.transpile(), unify on dry_plan()
  • Add fallback parameter to CTERewriter/WrenEngine so tests can disable silent fallback to the legacy path
  • Fix relationship column filter to handle "relationship": null from extracted manifests

Test plan

  • All 19 unit tests pass (test_cte_rewriter.py + test_engine.py)
  • All tests use fallback=False to ensure CTE path is always exercised
  • Run connector integration tests (just test-connector duckdb, just test-connector postgres)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Breaking Changes

    • CLI command renamed from "transpile" to "dry-plan" (now produces a planning/expanded-SQL output).
  • New Features

    • Model-aware SQL planning: referenced models are expanded into injected CTEs during dry planning.
    • Improved SQL dialect handling for date/time arithmetic so more functions parse correctly.
  • Chores

    • Updated the combined database extras to reference the engine package variant.

goldmedal and others added 2 commits March 25, 2026 21:59
Replace the whole-query `session_context.transform_sql()` approach with a
CTE-based rewriter that:

1. Parses user SQL with sqlglot (source dialect)
2. Uses qualify_tables + normalize_identifiers + qualify_columns to
   resolve all column references (including SELECT *, aliases, and
   correlated subqueries)
3. Calls wren-core transform_sql once per model with a simple
   SELECT col1, col2 FROM model
4. Parses each wren-core output with Wren dialect and injects as CTEs
5. Outputs final SQL in the target data source dialect

This eliminates the need for a separate _transpile() step since dialect
conversion happens naturally via sqlglot AST generation.

Also:
- Replace CLI `transpile` command with `dry-plan`
- Remove WrenEngine.transpile(), unify on dry_plan()
- Add fallback parameter to CTERewriter/WrenEngine for test strictness
- Fix relationship column filter (handle null values from extracted manifests)
- All tests use fallback=False to catch silent CTE path failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Mar 25, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Replaces transpile flow with a dry-plan API, adds CTERewriter to expand MDL model references into injected CTEs using sqlglot, introduces a small Wren sqlglot dialect, updates engine planning to use the rewriter with an optional fallback, updates DuckDB query handling, and adjusts tests.

Changes

  • Dependency Management (wren/pyproject.toml)
    Updated the `all` extra to depend on wren-engine[...] instead of wren[...].
  • CLI Command Refactor (wren/src/wren/cli.py)
    Replaced the transpile command with dry-plan (transpile(...) → dry_plan(...)) and switched the engine call to engine.dry_plan.
  • Core Engine (wren/src/wren/engine.py)
    Removed transpile() and sqlglot write-dialect handling; added a keyword-only fallback: bool = True init arg; _plan() now uses CTERewriter and parses the input SQL for referenced tables.
  • SQL Rewriting & Dialect (wren/src/wren/mdl/cte_rewriter.py, wren/src/wren/mdl/wren_dialect.py)
    Added CTERewriter to expand model references into generated CTEs (with optional fallback) and get_sqlglot_dialect; added Wren sqlglot dialect extensions for date/time functions.
  • Connector Update (wren/src/wren/connector/duckdb.py)
    DuckDBConnector.query() now uses to_arrow_table() instead of fetch_arrow_table().
  • Tests: fixtures & units (wren/tests/... including connectors/*, suite/query.py, unit/test_engine.py, unit/test_cte_rewriter.py)
    Updated WrenEngine instantiations to pass fallback=False; replaced transpile tests with dry-plan equivalents; added comprehensive CTERewriter unit tests covering single/multi-model queries, edge cases, fallback, correlated subqueries, and dialect mapping.

Sequence Diagram(s)

sequenceDiagram
    participant User as User SQL
    participant CLI as CLI (dry-plan)
    participant Engine as WrenEngine
    participant Planner as _plan()
    participant Rewriter as CTERewriter
    participant Session as SessionContext
    participant DB as Connector/DB

    User->>CLI: run dry-plan(sql)
    CLI->>Engine: engine.dry_plan(sql)
    Engine->>Planner: _plan(sql, manifest, data_source)
    Planner->>Rewriter: instantiate(manifest, session, data_source, fallback)
    Planner->>Rewriter: rewrite(sql)
    Rewriter->>Session: transform per-model SELECTs
    Session-->>Rewriter: expanded model SQL
    Rewriter-->>Planner: rewritten SQL with injected CTEs
    Planner->>Engine: planned/executable SQL
    Engine->>DB: query/planned SQL (dry-run or execute)
    DB-->>Engine: result/rows or dry-run output
    Engine-->>CLI: print expanded SQL / results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested labels

core, python

Suggested reviewers

  • douenergy
  • onlyjackfrost

Poem

🐰 I hopped through manifests, stitched CTEs with care,
Turned model names to WITH clauses, sprouted SQL in the air.
Dry-plan hums softly where transpile used to be,
I nibble dialect quirks and leave a tidy tree. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 43.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: the introduction of CTE-based SQL planning with per-model expansion, the core architectural shift in this changeset.


…CTE rewriter

Models referenced in FROM clause but without specific column references
(e.g. SELECT COUNT(*) FROM model) now generate CTEs with all columns
instead of being treated as "no model references found".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@goldmedal goldmedal force-pushed the feat/wren-cte-base-sql branch from 3985317 to 432988c Compare March 25, 2026 14:06
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wren/src/wren/mdl/cte_rewriter.py`:
- Around line 141-144: The code only records user CTE names from the root WITH
via _collect_user_cte_names(), so nested WITHs' CTEs (e.g., a local "WITH
customer AS (...)") are not treated as shadowing and _collect_model_columns()
may incorrectly inject a top-level model CTE; update CTE tracking to be
scope-aware by walking the AST and collecting CTE names per scope (not just
root), then change the lookup in the table collection logic (the loop using
qualified.find_all(exp.Table) and the checks comparing table_name against
self.model_dict and user_cte_names) to consult the nearest-enclosing scope's CTE
set; ensure the same fix is applied to the other occurrence referenced (the
block around the other conditional that currently uses the flat user_cte_names
set).
- Around line 104-109: The current guard uses only used_columns and so treats
table-only model queries (e.g., SELECT COUNT(*) FROM "orders") as having no
model refs; update the check that decides fallback so it also considers
table/model references extracted earlier (e.g., any used_models/used_table_refs
or whatever symbol holds referenced models) and only trigger
self.fallback/session_context.transform_sql when there are truly no model
references at all; ensure queries with table-only references follow the CTE
serialization path (the same path taken by rewrite()/CTE generation) instead of
bypassing it.
- Around line 150-171: The current code collects used columns in a set (used:
dict[str, set[str]]) and then sorts them in _build_model_ctes (sorted(columns)),
which reorders SELECT * column output; change the collection to preserve
manifest/order and stop sorting before building CTEs: replace the per-model
value type from set to a list (or OrderedSet-like append-if-missing) where you
append columns in the order discovered, and in _build_model_ctes use the
incoming columns sequence directly (use columns rather than sorted(columns))
when computing resolved and col_list; update references to used, used_columns,
_build_model_ctes, and resolved to reflect the ordered collection.

In `@wren/tests/unit/test_cte_rewriter.py`:
- Around line 88-93: Change the helper _make_rewriter to default fallback to
False by adding a parameter fallback: bool = False, and forward that flag into
the session creation so the returned CTERewriter uses the new planner by
default; e.g. update the call to get_session_context(...) to include/propagate
the fallback argument (or named parameter) so tests can opt in with
_make_rewriter(..., fallback=True) when they need legacy behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7fdbe27a-6976-434e-9d83-9c81e7a47b57

📥 Commits

Reviewing files that changed from the base of the PR and between 0843fa2 and 13b7179.

⛔ Files ignored due to path filters (1)
  • wren/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • wren/pyproject.toml
  • wren/src/wren/cli.py
  • wren/src/wren/connector/duckdb.py
  • wren/src/wren/engine.py
  • wren/src/wren/mdl/cte_rewriter.py
  • wren/src/wren/mdl/wren_dialect.py
  • wren/tests/connectors/test_duckdb.py
  • wren/tests/connectors/test_postgres.py
  • wren/tests/suite/query.py
  • wren/tests/unit/test_cte_rewriter.py
  • wren/tests/unit/test_engine.py

The fallback path (session_context.transform_sql) outputs Wren dialect,
which needs to be transpiled to the target data source dialect before
returning — matching the CTE path's behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
wren/src/wren/mdl/cte_rewriter.py (2)

122-123: ⚠️ Potential issue | 🟠 Major

Preserve discovered column order; set + sorted() can change SELECT * schema order.

Line 153 stores columns in a set, and Line 174 sorts them. That can reorder exposed columns for star queries, changing visible output schema order.

Suggested fix (ordered de-dup instead of set)
- ) -> dict[str, set[str]]:
-     """Return ``{model_name: {col1, col2, ...}}`` for all referenced models.
+ ) -> dict[str, list[str]]:
+     """Return ``{model_name: [col1, col2, ...]}`` for all referenced models.
...
- used: dict[str, set[str]] = {m: set() for m in alias_to_model.values()}
+ used: dict[str, list[str]] = {m: [] for m in alias_to_model.values()}
+ seen: dict[str, set[str]] = {m: set() for m in alias_to_model.values()}
...
            if model_name:
-               used[model_name].add(col.name)
+               if col.name not in seen[model_name]:
+                   seen[model_name].add(col.name)
+                   used[model_name].append(col.name)
- def _build_model_ctes(self, used_columns: dict[str, set[str]]) -> list[exp.CTE]:
+ def _build_model_ctes(self, used_columns: dict[str, list[str]]) -> list[exp.CTE]:
...
-                resolved = [orig.get(c, c) for c in sorted(columns)]
+                resolved = [orig.get(c, c) for c in columns]

Also applies to: 153-160, 168-175

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/mdl/cte_rewriter.py` around lines 122 - 123, The code currently
accumulates columns into a set and later sorts them (the docstring shows Return
``{model_name: {col1, col2, ...}}``), which loses and reorders the discovered
column order; change the implementation to preserve discovery order by
collecting columns in an ordered container (e.g., use a list and de-duplicate
with a local seen set or use dict.fromkeys) instead of a set, remove the final
sorted() step, and update the return shape to map model_name to an ordered
sequence (e.g., dict[str, list[str]]) so SELECT * retains the original schema
order; adjust any callers expecting set[str] accordingly.

101-102: ⚠️ Potential issue | 🟠 Major

Make CTE shadowing scope-aware; root-only tracking is still incorrect.

Line 223 only reads the root WITH, but Line 144 uses that flat set to exclude model names globally. A nested WITH <model_name> AS (...) can still be mistaken for an MDL model and trigger wrong top-level CTE injection.

Suggested direction
- user_cte_names = self._collect_user_cte_names(ast)
- used_columns = self._collect_model_columns(ast, user_cte_names)
+ used_columns = self._collect_model_columns(ast)

- def _collect_model_columns(
-     self, ast: exp.Expression, user_cte_names: set[str]
- ) -> dict[str, set[str]]:
+ def _collect_model_columns(self, ast: exp.Expression) -> dict[str, set[str]]:
    ...
    for table in qualified.find_all(exp.Table):
        table_name = table.name
-       if table_name not in self.model_dict or table_name in user_cte_names:
+       if table_name not in self.model_dict or self._is_shadowed_by_cte_scope(table, table_name):
            continue
+ @staticmethod
+ def _is_shadowed_by_cte_scope(table: exp.Table, table_name: str) -> bool:
+     node: exp.Expression | None = table
+     target = table_name.lower()
+     while node is not None:
+         with_clause = node.args.get("with_")
+         if with_clause:
+             for cte in with_clause.expressions:
+                 alias = cte.args.get("alias")
+                 if alias and getattr(alias.this, "name", str(alias.this)).lower() == target:
+                     return True
+         node = node.parent
+     return False

Also applies to: 120-145, 219-233

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/mdl/cte_rewriter.py` around lines 101 - 102, The current logic
flattens CTE names by calling self._collect_user_cte_names(ast) and then passing
that global set into self._collect_model_columns(ast, user_cte_names), which
misses nested WITH shadowing; change the implementation so CTE collection is
scope-aware and the correct per-scope CTE set is used when examining a given
query block. Concretely: update _collect_user_cte_names to accept an AST node
(or add a new helper) that walks INTO each WITH block and returns CTE names
visible at that scope, and modify callers (the place that currently calls
_collect_user_cte_names(ast) and _collect_model_columns(ast, ...)) to compute
and pass the scope-specific CTE set to _collect_model_columns (or call
_collect_model_columns per query block with that node-specific set) so nested
WITH <name> correctly shadows MDL model names; ensure symbols
_collect_user_cte_names and _collect_model_columns are the ones updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@wren/src/wren/mdl/cte_rewriter.py`:
- Around line 122-123: The code currently accumulates columns into a set and
later sorts them (the docstring shows Return ``{model_name: {col1, col2,
...}}``), which loses and reorders the discovered column order; change the
implementation to preserve discovery order by collecting columns in an ordered
container (e.g., use a list and de-duplicate with a local seen set or use
dict.fromkeys) instead of a set, remove the final sorted() step, and update the
return shape to map model_name to an ordered sequence (e.g., dict[str,
list[str]]) so SELECT * retains the original schema order; adjust any callers
expecting set[str] accordingly.
- Around line 101-102: The current logic flattens CTE names by calling
self._collect_user_cte_names(ast) and then passing that global set into
self._collect_model_columns(ast, user_cte_names), which misses nested WITH
shadowing; change the implementation so CTE collection is scope-aware and the
correct per-scope CTE set is used when examining a given query block.
Concretely: update _collect_user_cte_names to accept an AST node (or add a new
helper) that walks INTO each WITH block and returns CTE names visible at that
scope, and modify callers (the place that currently calls
_collect_user_cte_names(ast) and _collect_model_columns(ast, ...)) to compute
and pass the scope-specific CTE set to _collect_model_columns (or call
_collect_model_columns per query block with that node-specific set) so nested
WITH <name> correctly shadows MDL model names; ensure symbols
_collect_user_cte_names and _collect_model_columns are the ones updated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddcf8c82-6e8f-4d1a-9fac-d934be809133

📥 Commits

Reviewing files that changed from the base of the PR and between 13b7179 and 1393bd8.

📒 Files selected for processing (1)
  • wren/src/wren/mdl/cte_rewriter.py

goldmedal and others added 5 commits March 25, 2026 22:27
…traction

The input SQL is in the target dialect (e.g. Postgres), not DataFusion.
Use sqlglot to parse and extract table names for manifest extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er, test strictness

- Collect user CTE names from all scopes (find_all(exp.With)), not just
  root, so nested WITH clauses properly shadow model names
- Preserve manifest column order in CTE generation instead of sorting
  alphabetically — fixes SELECT * column order
- Default test helper _make_rewriter to fallback=False so tests exercise
  the CTE path strictly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies that a WITH clause inside a subquery correctly shadows the
model name, preventing a model CTE from being generated for it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…servation

- Nested CTE in subquery correctly shadows model name
- WITH RECURSIVE keyword preserved after model CTE injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
wren/src/wren/engine.py (1)

163-172: ⚠️ Potential issue | 🟠 Major

Preserve manifest/planner failures instead of masking them as SQL errors.

This path now swallows every extractor failure by reverting to the full manifest, and anything raised by get_session_context() / CTERewriter() is then funneled into the generic planning wrapper. A malformed manifest or extractor regression will surface as bad user SQL instead of keeping its real code/phase.

Also applies to: 181-187

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/engine.py` around lines 163 - 172, Current broad except
Exception around manifest extraction masks manifest/extractor/planner failures
as SQL errors; instead, narrow the try/except so only SQL parsing failures are
handled locally and all other exceptions (from get_manifest_extractor,
extractor.extract_by, to_json_base64, get_session_context, CTERewriter, etc.)
are allowed to surface or are re-raised with context. Specifically, replace the
blanket except Exception with targeted except for sql parsing errors from
sqlglot (e.g., sqlglot.errors.ParseError) around parse_one/get_sqlglot_dialect,
and do not swallow exceptions from get_manifest_extractor, extractor.extract_by,
to_json_base64, get_session_context, or CTERewriter — either let them propagate
or wrap them in a clear, phase-specific error so they are not reported as
generic SQL planning errors; apply the same change to the other occurrence
(lines handling effective_manifest later).
♻️ Duplicate comments (1)
wren/src/wren/mdl/cte_rewriter.py (1)

145-146: ⚠️ Potential issue | 🟠 Major

CTE shadowing is still resolved globally.

_collect_user_cte_names() now walks every WITH, but it still flattens them into one set. That means a nested WITH customer AS (...) can suppress rewriting for an unrelated outer FROM customer, because the check at Lines 145-146 has no notion of the nearest enclosing scope. Track CTE names per scope instead of globally.

Also applies to: 221-234

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/mdl/cte_rewriter.py` around lines 145 - 146, The current global
CTE-name set from _collect_user_cte_names() causes nested WITH clauses to shadow
unrelated tables; change the algorithm to track CTE names per scope (e.g.,
maintain a stack/list of sets pushed when entering a WITH node and popped when
exiting) and replace the global membership checks (the conditional using
user_cte_names in the block that contains "if table_name not in self.model_dict
or table_name in user_cte_names: continue") with lookups against the innermost
scope set (or test against the stacked scopes for visibility rules you want).
Update the same pattern applied around the other block (the logic covering lines
221-234) so both places consult the current scope's CTE set rather than a
flattened global set and ensure _collect_user_cte_names() or its caller is
adjusted to build/return scoped sets (or the traversal itself manages the scope
stack).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@wren/src/wren/engine.py`:
- Around line 163-172: Current broad except Exception around manifest extraction
masks manifest/extractor/planner failures as SQL errors; instead, narrow the
try/except so only SQL parsing failures are handled locally and all other
exceptions (from get_manifest_extractor, extractor.extract_by, to_json_base64,
get_session_context, CTERewriter, etc.) are allowed to surface or are re-raised
with context. Specifically, replace the blanket except Exception with targeted
except for sql parsing errors from sqlglot (e.g., sqlglot.errors.ParseError)
around parse_one/get_sqlglot_dialect, and do not swallow exceptions from
get_manifest_extractor, extractor.extract_by, to_json_base64,
get_session_context, or CTERewriter — either let them propagate or wrap them in
a clear, phase-specific error so they are not reported as generic SQL planning
errors; apply the same change to the other occurrence (lines handling
effective_manifest later).

---

Duplicate comments:
In `@wren/src/wren/mdl/cte_rewriter.py`:
- Around line 145-146: The current global CTE-name set from
_collect_user_cte_names() causes nested WITH clauses to shadow unrelated tables;
change the algorithm to track CTE names per scope (e.g., maintain a stack/list
of sets pushed when entering a WITH node and popped when exiting) and replace
the global membership checks (the conditional using user_cte_names in the block
that contains "if table_name not in self.model_dict or table_name in
user_cte_names: continue") with lookups against the innermost scope set (or test
against the stacked scopes for visibility rules you want). Update the same
pattern applied around the other block (the logic covering lines 221-234) so
both places consult the current scope's CTE set rather than a flattened global
set and ensure _collect_user_cte_names() or its caller is adjusted to
build/return scoped sets (or the traversal itself manages the scope stack).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 647bef17-8a8e-4853-9563-5c1d44aa0489

📥 Commits

Reviewing files that changed from the base of the PR and between 1393bd8 and 6cfbb1a.

📒 Files selected for processing (3)
  • wren/src/wren/engine.py
  • wren/src/wren/mdl/cte_rewriter.py
  • wren/tests/unit/test_cte_rewriter.py
✅ Files skipped from review due to trivial changes (1)
  • wren/tests/unit/test_cte_rewriter.py

@goldmedal goldmedal requested a review from douenergy March 26, 2026 01:56
@douenergy
Contributor

douenergy commented Mar 26, 2026

Given the following SQL:

 WITH Orders AS (SELECT 1 AS o_orderkey, 'OPEN' AS o_orderstatus)
  SELECT * FROM Orders

The current rewrite process generates a duplicate CTE named Orders.
This should not happen — we expect at most one CTE with the same name (case-insensitive).

We should address this issue in a follow-up PR.
Below is a test snippet that reproduces and validates the problem:

rw = _make_rewriter(_SINGLE_MODEL_MANIFEST, fallback=True)
result = rw.rewrite(
    "WITH Orders AS (SELECT 1 AS o_orderkey, 'OPEN' AS o_orderstatus) "
    "SELECT * FROM Orders"
)
print("\n--- generated SQL ---")
print(result)
ast = sqlglot.parse_one(result, dialect="duckdb")
with_clause = ast.args.get("with_")
if with_clause:
    cte_names = [
        cte.args["alias"].this.name
        for cte in with_clause.expressions
        if cte.args.get("alias")
    ]
    print("--- CTE names in output ---")
    for name in cte_names:
        print(f"  {repr(name)}")
    orders_count = sum(1 for n in cte_names if n.lower() == "orders")
    print(f"--- 'orders' count (case-insensitive): {orders_count} ---")
    assert orders_count <= 1, (
        f"Duplicate 'orders' CTE (case-insensitive) in output:\n{result}"
    )


@douenergy douenergy merged commit fca2b11 into Canner:main Mar 26, 2026
10 checks passed
@goldmedal goldmedal deleted the feat/wren-cte-base-sql branch March 26, 2026 09:15
douenergy added a commit to douenergy/wren-engine that referenced this pull request Mar 26, 2026
…spile→dry-plan

- Rename default conn file from conn.json to connection_info.json to match
  mcp-server's ~/.wren/connection_info.json default
- Rename `wren transpile` CLI command to `wren dry-plan` (aligns with
  PR Canner#1479 which removed WrenEngine.transpile() in favour of dry_plan())
- Update all references in cli.py, README.md, docs/cli.md,
  docs/connections.md, and .claude/skills/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
goldmedal added a commit to goldmedal/wren-engine that referenced this pull request Mar 30, 2026
_collect_user_cte_names() collected raw CTE names (e.g. "Orders")
while _collect_model_columns() compared against normalized/lowercased
table names ("orders"). The case-sensitive check missed the shadow,
generating a duplicate CTE. Lowercase names on collection to match
sqlglot's normalize_identifiers behavior.

Closes Canner#1479 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
