feat(wren): add LanceDB-backed memory layer for schema and query retrieval by goldmedal · Pull Request #1494 · Canner/wren-engine

goldmedal · 2026-03-31T01:56:49Z

Summary

Add wren.memory module with LanceDB-backed schema and query memory for the standalone wren CLI
Hybrid retrieval strategy: small schemas returned as full plain text, large schemas use embedding search (paraphrase-multilingual-MiniLM-L12-v2, local)
7 CLI commands: index, fetch, describe, store, recall, status, reset
Agent skill (cli-skills/wren-usage/) with decision-driven workflow guide teaching agents when to fetch context, store confirmed queries, and handle errors
36 unit + integration tests, all passing locally
Optional dependency: pip install wren-engine[memory]

Key design decisions

Hybrid context strategy: wren memory fetch auto-selects full text vs embedding search based on a 30K-char threshold (~8K tokens). Full text gives better LLM results for small schemas; embedding search scales for large ones.
No auto-store: agents store NL-to-SQL pairs only after user confirmation — query success alone does not imply correctness.
Single fetch command instead of separate context + search — reduces agent confusion from two similar commands.
tableReference excluded from describe output — users query model names, not physical tables.

`cli-skills/` placement

Agent skills for the wren CLI are placed under cli-skills/ temporarily to avoid polluting the skills/ directory, which contains the already-released MCP server skills. Once the standalone CLI is officially released, cli-skills/ will be merged into skills/.

CI note

Memory integration tests (13 tests using LanceDB + sentence-transformers) are skipped in CI intentionally. The wren[memory] extras pull in torch (~200MB) and a model download (~80MB), which would add ~4 minutes to every CI run. Since:

The 23 pure function tests (schema_indexer, describe_schema, manifest_hash) run in CI and cover the most frequently changed code
LanceDB integration is stable and unlikely to break without a version bump
Local testing is easy via just install-memory && just test-memory

We keep CI lean and rely on local testing for the embedding/LanceDB layer.

Test plan

23 pure function tests in CI (schema_indexer, describe_schema, manifest_hash)
13 LanceDB integration tests locally via just test-memory (index, fetch, store, recall, status, reset, hybrid strategy)
Manual: wren memory index && wren memory fetch -q "customer orders" against a real MDL
Manual: verify wren memory describe output excludes tableReference

Summary by CodeRabbit

New Features
- Added a "memory" CLI group for semantic schema memory: index, describe, fetch (hybrid full vs embedding), recall, store, status, and reset; optional path/output flags and filters.
Documentation
- New end-to-end usage guides, decision trees, error-recovery steps, indexing/store/recall rules, and README/CLI docs for memory workflows.
Chores
- New optional "memory" install extra to enable LanceDB-based semantic memory.
Tests
- Added unit and integration tests covering indexing, search strategies, persistence, and CLI behaviors.

coderabbitai · 2026-03-31T01:57:04Z

📝 Walkthrough

Walkthrough

Adds a LanceDB-backed semantic memory subsystem: WrenMemory API, MemoryStore, schema indexing/describe/fetch (hybrid full vs embedding), NL→SQL store/recall, CLI subcommands, docs, tests, and an optional memory extra.

Changes

Cohort / File(s)	Summary
Documentation – Usage & Reference `cli-skills/wren-usage/SKILL.md`, `cli-skills/wren-usage/references/memory.md`	New end-to-end usage and reference docs covering configuration, workflows, error recovery, decision logic, and `wren memory` command behaviors.
Documentation – CLI & Installation `wren/README.md`, `wren/docs/cli.md`	Documented new `memory` extra and added `wren memory` CLI section (index/describe/fetch/store/recall/status/reset) and shared `--path` flag.
Packaging / Extras `wren/pyproject.toml`	Added optional extra `memory = ["lancedb>=0.6","sentence-transformers>=2.2"]`; updated `all` extra to include `wren-engine[...]` and added `spark` and `memory`.
Public API & Core Memory `wren/src/wren/memory/__init__.py`, `wren/src/wren/memory/store.py`, `wren/src/wren/memory/schema_indexer.py`, `wren/src/wren/memory/embeddings.py`	New `WrenMemory` class and LanceDB-backed `MemoryStore`; manifest hashing, schema extraction/describe, embedding abstraction, schema_items and query_history tables, indexing/search, store/recall, status, and reset.
CLI Integration `wren/src/wren/cli.py`, `wren/src/wren/memory/cli.py`	Runtime-optional registration of `memory` Typer sub-app (imports guarded); `memory_app` commands and CLI helpers for manifest loading, store construction, output formatting, and confirmation flow.
Tests & Build Targets `wren/tests/unit/test_memory.py`, `wren/justfile`	New unit/integration tests for indexer, store, and WrenMemory (skipped if extras missing); added `install-memory` and `test-memory` just targets.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as Wren CLI
    participant Memory as WrenMemory / MemoryStore
    participant Schema as Schema Indexer
    participant Emb as Embeddings
    participant DB as LanceDB

    User->>CLI: wren memory index --mdl manifest.json
    CLI->>Schema: extract_schema_items(manifest)
    Schema-->>CLI: schema items
    CLI->>Emb: compute_source_embeddings(texts)
    Emb-->>DB: store vectors
    DB-->>CLI: indexing complete

    User->>CLI: wren memory fetch --query "..."
    CLI->>Memory: get_context(manifest, query, ...)
    alt description length ≤ threshold
        Memory->>Schema: describe_schema(manifest)
        Schema-->>Memory: full schema text
        Memory-->>CLI: strategy=full + text
    else
        Memory->>Emb: compute_query_embeddings(query)
        Emb-->>DB: vector search
        DB-->>Memory: matching items
        Memory-->>CLI: strategy=search + items
    end

    User->>CLI: wren memory store --nl "..." --sql "..."
    CLI->>Emb: compute_query_embeddings(nl)
    Emb-->>DB: store query embedding in query_history
    DB-->>CLI: stored

    User->>CLI: wren memory recall --query "similar"
    CLI->>Emb: compute_query_embeddings(query)
    Emb-->>DB: similarity search in query_history
    DB-->>CLI: matching past queries

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(wren): extend standalone CLI with MySQL support and auto-discovery #1476: touches wren/src/wren/cli.py (runtime CLI hooks/subapp registration may interact).
feat(wren): CTE-based SQL planning with per-model expansion #1479: modifies pyproject extras (overlaps changes to all extra and dependency layout).

Suggested labels

core, python

Suggested reviewers

douenergy

Poem

🐰
I hopped through manifests, columns, and view,
Indexed the meadow so queries find you.
With LanceDB seeds and embeddings in tow,
Memory recalls what the users will know.
Hooray—now schema gardens merrily grow!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.45% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: adding a LanceDB-backed memory layer to the wren CLI for schema and query retrieval.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Introduce wren.memory with hybrid retrieval strategy: small schemas are returned as full plain text for LLM context, large schemas use embedding search (paraphrase-multilingual-MiniLM-L12-v2, local). - schema_indexer: extract models/columns/relationships/views into indexable records, plus describe_schema() for plain-text output - store: LanceDB CRUD with get_context() hybrid auto-selection based on a 30K-char threshold (~8K tokens) - embeddings: local sentence-transformers via LanceDB registry - WrenMemory public API for programmatic use - 37 unit + integration tests - Optional dependency: pip install wren-engine[memory] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Register wren memory sub-app with commands: index, describe, context, search, store, recall, status, reset. Gracefully no-ops when wren[memory] extra is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add wren memory section to cli.md with hybrid strategy explanation (30K-char threshold, char-based measurement rationale, CJK handling). Document all 8 subcommands with flags and examples. Update README with memory extra installation and quick-start step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Teach agents when to index schema, search for context, store confirmed NL-to-SQL pairs, and recall past queries. Key principle: the agent stores only after user confirmation — query success alone does not imply correctness. Also registers wren-memory in index.json, versions.json, and adds it to wren-usage delegation table and dependencies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Separate from the Docker-based MCP skills in skills/, these are for agents using the wren CLI directly. - wren-usage: decision-driven workflow guide covering 4 main flows: answering data questions, error recovery, data source onboarding, and MDL change handling. Teaches agents when/why to run each command, not just how. - wren-memory: copied from skills/wren-memory — memory-specific decisions (when to store, recall, index). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge wren-memory into wren-usage following the agent skills spec: - SKILL.md (138 lines): decision-driven workflows for answering questions, error recovery, onboarding, and MDL changes - references/memory.md: when to index, store, recall, and the hybrid context strategy Removes the separate wren-memory skill — memory decisions are now a reference under wren-usage, not a standalone skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Consolidate search_schema into get_context to give agents a single command for schema retrieval. context auto-selects full text vs embedding search, and now accepts --type and --model filters for the search strategy. Removes the standalone search CLI command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename the CLI command from `wren memory context` to `wren memory fetch` for consistency with other verb-based commands (index, store, recall, reset). Python API remains get_context() unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Memory skills are CLI-specific and live in cli-skills/ now. Remove from skills/, index.json, versions.json, and revert wren-usage dependency/delegation references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Users query model names (SELECT * FROM model_name), not physical tables. Table references are an internal MDL detail that adds noise to the LLM context without helping SQL generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The lancedb import happens inside MemoryStore.__init__, not at module import time. Move the try/except to wrap the full construction so CI without wren[memory] properly skips instead of erroring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Convenience for local testing of the memory module with LanceDB and sentence-transformers. CI skips these tests since the extras are not installed there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (5)

wren/tests/unit/test_memory.py (1)

167-209: Add a regression check for the removed tableReference output.

The changed contract here is absence, but these tests only assert positive fragments. A negative assertion on the raw table-reference values would lock down the behavior that still needs manual verification.

Suggested test addition

     def test_contains_primary_key(self):
         text = describe_schema(_MANIFEST)
         assert "Primary key: o_orderkey" in text

+    def test_excludes_table_reference(self):
+        text = describe_schema(_MANIFEST)
+        assert "tableReference" not in text
+        assert "test.public.orders" not in text
+        assert "test.public.customer" not in text
+
     def test_empty_manifest(self):
         text = describe_schema({})
         assert text == ""

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@wren/tests/unit/test_memory.py` around lines 167 - 209, Add a negative
regression test to ensure the removed tableReference output is not present: in
test_memory.py near the existing describe_schema(_MANIFEST) assertions (function
describe_schema and fixture/_MANIFEST), add an assertion that the raw table
reference string(s) (e.g., any "tableReference" tokens or specific raw table
identifiers present in the old contract) are not in the returned text (e.g.,
assert "tableReference" not in text or assert "<raw_table_name>" not in text) to
lock down the absence of that output.

cli-skills/wren-usage/SKILL.md (2)

7-7: Consider adding a references field to metadata.

The SKILL.md links to references/memory.md, but the frontmatter metadata doesn't declare this dependency. Adding a references list could help tooling discover related documentation.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cli-skills/wren-usage/SKILL.md` at line 7, The SKILL.md frontmatter only
includes version ("1.0") and should declare linked docs; add a references
metadata field in the frontmatter (e.g., references: - references/memory.md) so
tooling can discover related docs; update the SKILL.md frontmatter near the
existing version key and include any other referenced files in that list (for
example "references/memory.md") to keep metadata and links in sync.

37-38: Clarify "Skip if empty" behavior.

The instruction "Skip if empty" is brief. Consider specifying what "empty" means (no results returned) and what the agent should do instead (proceed directly to Step 3).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cli-skills/wren-usage/SKILL.md` around lines 37 - 38, The line "Use results
as few-shot examples. Skip if empty." is ambiguous; update the SKILL.md text so
"empty" is defined (e.g., results array is null, undefined, or has length 0 / no
results returned) and state the exact fallback behavior: if results are empty,
do not attempt to create few-shot examples and proceed directly to Step 3.
Reference the original sentence ("Use results as few-shot examples. Skip if
empty.") and the target flow ("Step 3") so readers know when to skip.

wren/src/wren/memory/schema_indexer.py (1)

104-113: Consider handling empty models list more explicitly.

The fallback to "?" for missing models (lines 105-106) is fine for display, but if models is empty, the relationship text will show ? → ? which may be confusing. This edge case is unlikely in valid MDL but worth noting.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/memory/schema_indexer.py` around lines 104 - 113, The
relationship rendering currently falls back to "?" for missing models which
yields "? → ?" for empty models; update the logic in schema_indexer.py (the
models / left / right handling and the lines.append for "Relationship: {name}")
to explicitly handle an empty models list—either skip emitting the relationship
entirely or replace the placeholders with a clear token like "<no models
specified>" and optionally emit a warning via the module logger; ensure you
update the code paths that compute left and right and the relationship line so
the output is unambiguous when models is empty.

wren/src/wren/memory/cli.py (1)

100-102: Overly broad exception handling hides errors.

Catching all exceptions with except Exception and silently falling back to plain output could mask real issues (e.g., pandas import errors vs. data formatting errors). Consider catching more specific exceptions or at least logging the error.

♻️ Suggested improvement

-        except Exception:
+        except ImportError:
+            # pandas not installed, fall back to plain output
             for r in results:
                 typer.echo(str(r))

If other pandas errors are expected, you could add logging:

except Exception as e:
    # Log for debugging but don't fail
    # logger.debug(f"Table formatting failed: {e}")
    for r in results:
        typer.echo(str(r))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/memory/cli.py` around lines 100 - 102, The except Exception
block that falls back to printing results is too broad and may hide real errors;
update the try/except in wren/src/wren/memory/cli.py to catch specific expected
errors (e.g., ImportError, pandas-related exceptions, ValueError) or capture the
exception as a variable (e) and log it before falling back, then continue to
iterate over results and call typer.echo(str(r)); reference the current block
handling results and typer.echo to locate and replace the generic except
Exception with a targeted except clause that logs the error (using your module
logger or typer) and then prints results.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli-skills/wren-usage/references/memory.md`:
- Line 37: The phrasing is inconsistent: replace the incorrect reference to
"context with full strategy" with "fetch with full strategy" so it matches the
actual CLI command; update the sentence that mentions `describe` or `context` to
read `describe` or `fetch` (or say "fetch with full strategy") ensuring the
mention of the "full strategy" variant aligns with the `fetch` command and its
usage.
- Line 7: The header text is inconsistent with the actual CLI command: change
the schema context header from using "context" to match the implemented command
name `wren memory fetch`; update the Markdown heading (the line showing "Schema
context: `context` and `describe`") so it reads "Schema context: `fetch` and
`describe`" or otherwise replace `context` with `fetch` to match the `wren
memory fetch` command referenced elsewhere.

In `@wren/pyproject.toml`:
- Around line 54-56: Update the "all" extras list in pyproject.toml to include
the missing "spark" connector and correct the self-reference to the actual
package name "wren-engine" (not "wren"); specifically modify the all extra entry
(the list assigned to all = [...]) so it contains
"wren-engine[postgres,mysql,bigquery,snowflake,clickhouse,trino,mssql,databricks,redshift,athena,oracle,spark,memory]"
ensuring pip install wren-engine[all] will resolve and include spark.

In `@wren/README.md`:
- Around line 13-14: The README examples use unquoted extras which are
interpreted as shell globs (e.g., the lines showing pip install
wren-engine[memory] and wren-engine[all]); update those two examples to quote
the package specs (wrap wren-engine[memory] and wren-engine[all] in single or
double quotes) so paste-and-run works across shells and match the CLI docs
style.

In `@wren/src/wren/cli.py`:
- Around line 362-367: The current bare except ImportError around importing
memory_app and calling app.add_typer will swallow unexpected import errors;
change it to only silence missing optional dependency errors by catching
ModuleNotFoundError (or inspect the caught ImportError and re-raise unless it's
a ModuleNotFoundError for the optional packages). Concretely, replace the bare
except ImportError with either "except ModuleNotFoundError: pass" or "except
ImportError as e: if not isinstance(e, ModuleNotFoundError): raise" so that
memory_app and app.add_typer still propagate syntax/other import failures while
only suppressing the absent optional extras.

In `@wren/src/wren/memory/cli.py`:
- Around line 84-89: The code currently mutates each result dict in results by
doing r[k] = v.isoformat() (in the loop inside the CLI JSON dump), which can
cause side effects if those dicts are used elsewhere; modify the logic to build
a new list of sanitized dictionaries (e.g., iterate results and for each r
produce a new dict copying keys/values but replacing datetime-like values using
v.isoformat()) and then json.dumps that new list instead of mutating the
original results; locate the loop that iterates "for r in results" and the inner
"for k, v in r.items()" and replace it with creating and serializing a
non-mutating copy.

In `@wren/src/wren/memory/store.py`:
- Around line 89-112: index_schema currently always recreates the _SCHEMA_TABLE
because create_table uses mode="overwrite" regardless of the replace argument;
update index_schema to (1) if items is empty and replace is True, explicitly
drop _SCHEMA_TABLE (using the existing _SCHEMA_TABLE in _table_names(self._db)
check) and return 0, (2) if items is empty and replace is False, return 0
without touching the table, and (3) when items exist, if replace is True
create/overwrite the table as now, but if replace is False and the table already
exists (check via _SCHEMA_TABLE in _table_names(self._db)) append the new items
using the DB API to add/insert rows (e.g., table.add or equivalent) instead of
calling create_table with mode="overwrite"; keep the embedding steps and
attaching of "vector" to each item before storing.
- Around line 176-182: The where clause builds predicate strings by splicing
user inputs (item_type, model_name, datasource) directly into q.where, which
breaks for values containing single quotes; before appending to where_parts (the
block that creates where_parts and calls q.where with " AND ".join(where_parts))
escape single quotes by doubling them (use value.replace("'", "''")) for
item_type and model_name in the shown block and likewise for the datasource
usage later (the similar code around lines 239-240) so all interpolated filter
values are safe when passed to q.where.
- Around line 24-52: The Arrow schemas _schema_items_arrow_schema and
_query_history_arrow_schema are hard-coded to _DEFAULT_DIM (384) which breaks
non-384-dim embedders; update callers (notably index_schema() and store_query())
to compute the actual embedding dimension from the embedding function (or model)
and pass it into these schema constructors (e.g., call
_schema_items_arrow_schema(dim=actual_dim) and
_query_history_arrow_schema(dim=actual_dim)); alternatively, until dimension
negotiation exists, enforce/reject unsupported model_name values in the
constructor so index_schema() and store_query() never build schemas with the
wrong dim.

In `@wren/tests/unit/test_memory.py`:
- Line 71: These three pure schema test classes (e.g., TestManifestHash and the
other two test classes around lines 86 and 166) are missing the pytest marker
used by the justfile filter; add the pytest marker by importing pytest if needed
and decorating each test class with `@pytest.mark.unit` (or apply the marker via
pytest.mark.unit on the class definition) so they are included when running
`just test-unit`; ensure the decorator appears immediately above each class
definition (e.g., above TestManifestHash).
- Around line 216-225: The fixtures memory_store and wren_memory currently catch
all ImportError which hides real import problems; change them to first check for
the required optional extras (test for presence of packages lancedb,
sentence-transformers, pyarrow as appropriate) and call pytest.skip if those
extras are missing, then perform the direct imports of wren.memory.store /
wren.memory.embeddings / wren.memory.schema_indexer (i.e., import MemoryStore,
embeddings, SchemaIndexer) without wrapping in a try/except so any internal
ImportError surfaces as a test failure; ensure references to the fixtures
(memory_store, wren_memory) and the class/function names (MemoryStore,
embeddings, SchemaIndexer) are used to locate and update the code.

---

Nitpick comments:
In `@cli-skills/wren-usage/SKILL.md`:
- Line 7: The SKILL.md frontmatter only includes version ("1.0") and should
declare linked docs; add a references metadata field in the frontmatter (e.g.,
references: - references/memory.md) so tooling can discover related docs; update
the SKILL.md frontmatter near the existing version key and include any other
referenced files in that list (for example "references/memory.md") to keep
metadata and links in sync.
- Around line 37-38: The line "Use results as few-shot examples. Skip if empty."
is ambiguous; update the SKILL.md text so "empty" is defined (e.g., results
array is null, undefined, or has length 0 / no results returned) and state the
exact fallback behavior: if results are empty, do not attempt to create few-shot
examples and proceed directly to Step 3. Reference the original sentence ("Use
results as few-shot examples. Skip if empty.") and the target flow ("Step 3") so
readers know when to skip.

In `@wren/src/wren/memory/cli.py`:
- Around line 100-102: The except Exception block that falls back to printing
results is too broad and may hide real errors; update the try/except in
wren/src/wren/memory/cli.py to catch specific expected errors (e.g.,
ImportError, pandas-related exceptions, ValueError) or capture the exception as
a variable (e) and log it before falling back, then continue to iterate over
results and call typer.echo(str(r)); reference the current block handling
results and typer.echo to locate and replace the generic except Exception with a
targeted except clause that logs the error (using your module logger or typer)
and then prints results.

In `@wren/src/wren/memory/schema_indexer.py`:
- Around line 104-113: The relationship rendering currently falls back to "?"
for missing models which yields "? → ?" for empty models; update the logic in
schema_indexer.py (the models / left / right handling and the lines.append for
"Relationship: {name}") to explicitly handle an empty models list—either skip
emitting the relationship entirely or replace the placeholders with a clear
token like "<no models specified>" and optionally emit a warning via the module
logger; ensure you update the code paths that compute left and right and the
relationship line so the output is unambiguous when models is empty.

In `@wren/tests/unit/test_memory.py`:
- Around line 167-209: Add a negative regression test to ensure the removed
tableReference output is not present: in test_memory.py near the existing
describe_schema(_MANIFEST) assertions (function describe_schema and
fixture/_MANIFEST), add an assertion that the raw table reference string(s)
(e.g., any "tableReference" tokens or specific raw table identifiers present in
the old contract) are not in the returned text (e.g., assert "tableReference"
not in text or assert "<raw_table_name>" not in text) to lock down the absence
of that output.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a02d04d-09c3-4641-985e-57b6305a4b02

📥 Commits

Reviewing files that changed from the base of the PR and between 3a02b13 and 273f42e.

⛔ Files ignored due to path filters (1)

wren/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

cli-skills/wren-usage/SKILL.md
cli-skills/wren-usage/references/memory.md
wren/README.md
wren/docs/cli.md
wren/justfile
wren/pyproject.toml
wren/src/wren/cli.py
wren/src/wren/memory/__init__.py
wren/src/wren/memory/cli.py
wren/src/wren/memory/embeddings.py
wren/src/wren/memory/schema_indexer.py
wren/src/wren/memory/store.py
wren/tests/unit/test_memory.py

cli-skills/wren-usage/references/memory.md

wren/pyproject.toml

wren/README.md

wren/src/wren/cli.py

wren/src/wren/memory/store.py

wren/tests/unit/test_memory.py

- Fix "context" → "fetch" naming in memory.md reference docs - Fix `all` extra: correct self-reference to wren-engine, add missing spark - Quote pip install extras in README for zsh compatibility - Narrow ImportError catch in cli.py to only missing optional deps - Avoid mutating input dicts during JSON serialization in memory/cli.py - Escape single quotes in LanceDB where-clause values in store.py - Add @pytest.mark.unit to TestManifestHash, TestExtractSchemaItems, TestDescribeSchema - Add regression test asserting tableReference is excluded from describe output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (4)

wren/src/wren/cli.py (1)
362-370: ⚠️ Potential issue | 🟠 Major

Only suppress known optional-module misses when registering memory_app.

This except ImportError also hides import failures coming from wren.memory.cli or broken installs of its dependencies, and the CLI just loses the command group. Narrow the handler to known optional-module ModuleNotFoundErrors and re-raise everything else.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/cli.py` around lines 362 - 370, Replace the broad "except
ImportError: pass" around the optional imports (lancedb, sentence_transformers,
and from wren.memory.cli import memory_app; app.add_typer(memory_app)) with a
handler that only swallows ModuleNotFoundError for those specific optional
modules: catch ModuleNotFoundError as e and if e.name is one of "lancedb",
"sentence_transformers", or "wren.memory.cli" then ignore, otherwise re-raise
the exception so real import errors in wren.memory.cli or its deps surface.
wren/tests/unit/test_memory.py (1)
225-233: ⚠️ Potential issue | 🟠 Major

These fixtures still hide import regressions by skipping.

The blanket ImportError catches failures in wren.memory.store / wren.memory themselves, so broken imports can show up as skipped tests instead of failures. Check the optional packages first, then import and construct the Wren objects without a broad catch so internal regressions fail the suite.

Also applies to: 328-335
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/tests/unit/test_memory.py` around lines 225 - 233, The fixture
memory_store currently catches ImportError broadly which masks import-time
regressions; change it to first check for the optional dependency (e.g., use
pytest.importorskip("wren.memory", reason=...) or
pytest.importorskip("wren.memory.store")) and only then import MemoryStore and
construct it (call MemoryStore(path=tmp_path)) without wrapping that
import/instantiation in a wide try/except so real import errors in wren.memory
or MemoryStore surface as test failures; update the same pattern for the other
fixture block at lines 328-335.
wren/src/wren/memory/store.py (2)
29-56: ⚠️ Potential issue | 🟠 Major

Custom embedding models still create 384-dim tables.

model_name is configurable, but both Arrow schema builders default to _DEFAULT_DIM and both create_table() call sites use that default. Any embedder that does not return 384 values will break once index_schema() or store_query() writes rows. Please derive the schema dimension from self._embed_fn or reject non-default models until that is supported end-to-end.

Also applies to: 79-90, 112-115, 222-225
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/memory/store.py` around lines 29 - 56, Both Arrow schema
builders (_schema_items_arrow_schema and _query_history_arrow_schema) always use
the hardcoded _DEFAULT_DIM which causes tables to be created for 384-dim only;
update create_table/code paths that call these (including index_schema() and
store_query()) to derive the vector dimension from the embedding function on the
store instance (self._embed_fn) or explicitly validate and reject non-default
embedding models until full support is implemented. Concretely: inspect
self._embed_fn (or its config) to obtain the returned embedding length, pass
that length into _schema_items_arrow_schema and _query_history_arrow_schema when
creating schemas/tables, and add a validation branch in the
create_table/index_schema/store_query call sites to raise a clear error if the
embedding dim cannot be determined or differs from _DEFAULT_DIM. Reference
symbols: _schema_items_arrow_schema, _query_history_arrow_schema, create_table,
index_schema, store_query, and self._embed_fn.
99-117: ⚠️ Potential issue | 🟠 Major

replace=False still overwrites the existing schema index.

create_table(..., mode="overwrite") on Lines 112-117 runs regardless of replace, so the non-replacing path still discards prior rows. The early return on Line 100 also leaves stale schema_items behind when replace=True and the new manifest extracts no items.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/memory/store.py` around lines 99 - 117, The code
unconditionally calls self._db.create_table(..., mode="overwrite") which
overwrites the schema index even when replace is False, and the early return
when items is empty prevents dropping the table when replace is True; fix by:
after items = extract_schema_items(manifest) do not return immediately on empty
items — if replace is True then drop _SCHEMA_TABLE when present via
_table_names/_SCHEMA_TABLE and return 0; otherwise (replace is False) simply
return 0 without touching the DB; only call self._db.create_table when items is
non-empty, and when replace is False ensure you do not use mode="overwrite" (use
append/default) while when replace is True you may create with mode="overwrite"
after dropping the table; reference symbols: extract_schema_items, items,
replace, _SCHEMA_TABLE, _table_names, self._db.drop_table,
self._db.create_table.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wren/src/wren/memory/cli.py`:
- Around line 82-93: The code currently treats any non-"json" value as the table
branch, so typos silently change output; update the output validation in the
json branch (the block that checks output = output.lower() and then if output ==
"json":) to explicitly reject unsupported values by mirroring the validation in
wren/src/wren/cli.py::_print_result — i.e., allow only the supported output
names (accept "json" and the other expected modes) and for any other value raise
a user-facing error (e.g., via typer.BadParameter or typer.echo + typer.Exit) so
typos no longer fall through to the table renderer. Ensure you reference the
same set of valid outputs and use the same error messaging/exit behavior as
_print_result.
- Around line 62-73: The ImportError handling should only cover importing
MemoryStore and its construction: move the MemoryStore(path=path) call into the
try block inside _get_store so missing optional deps (like lancedb or
sentence_transformers used elsewhere by get_embedding_function or __init__)
raise a typer.Exit with the "wren[memory] extras not installed" message; ensure
the except ImportError only wraps the import/constructor of MemoryStore (not
other module imports) so unrelated ImportError from embeddings or schema_indexer
aren't misreported.

---

Duplicate comments:
In `@wren/src/wren/cli.py`:
- Around line 362-370: Replace the broad "except ImportError: pass" around the
optional imports (lancedb, sentence_transformers, and from wren.memory.cli
import memory_app; app.add_typer(memory_app)) with a handler that only swallows
ModuleNotFoundError for those specific optional modules: catch
ModuleNotFoundError as e and if e.name is one of "lancedb",
"sentence_transformers", or "wren.memory.cli" then ignore, otherwise re-raise
the exception so real import errors in wren.memory.cli or its deps surface.

In `@wren/src/wren/memory/store.py`:
- Around line 29-56: Both Arrow schema builders (_schema_items_arrow_schema and
_query_history_arrow_schema) always use the hardcoded _DEFAULT_DIM which causes
tables to be created for 384-dim only; update create_table/code paths that call
these (including index_schema() and store_query()) to derive the vector
dimension from the embedding function on the store instance (self._embed_fn) or
explicitly validate and reject non-default embedding models until full support
is implemented. Concretely: inspect self._embed_fn (or its config) to obtain the
returned embedding length, pass that length into _schema_items_arrow_schema and
_query_history_arrow_schema when creating schemas/tables, and add a validation
branch in the create_table/index_schema/store_query call sites to raise a clear
error if the embedding dim cannot be determined or differs from _DEFAULT_DIM.
Reference symbols: _schema_items_arrow_schema, _query_history_arrow_schema,
create_table, index_schema, store_query, and self._embed_fn.
- Around line 99-117: The code unconditionally calls self._db.create_table(...,
mode="overwrite") which overwrites the schema index even when replace is False,
and the early return when items is empty prevents dropping the table when
replace is True; fix by: after items = extract_schema_items(manifest) do not
return immediately on empty items — if replace is True then drop _SCHEMA_TABLE
when present via _table_names/_SCHEMA_TABLE and return 0; otherwise (replace is
False) simply return 0 without touching the DB; only call self._db.create_table
when items is non-empty, and when replace is False ensure you do not use
mode="overwrite" (use append/default) while when replace is True you may create
with mode="overwrite" after dropping the table; reference symbols:
extract_schema_items, items, replace, _SCHEMA_TABLE, _table_names,
self._db.drop_table, self._db.create_table.

In `@wren/tests/unit/test_memory.py`:
- Around line 225-233: The fixture memory_store currently catches ImportError
broadly which masks import-time regressions; change it to first check for the
optional dependency (e.g., use pytest.importorskip("wren.memory", reason=...) or
pytest.importorskip("wren.memory.store")) and only then import MemoryStore and
construct it (call MemoryStore(path=tmp_path)) without wrapping that
import/instantiation in a wide try/except so real import errors in wren.memory
or MemoryStore surface as test failures; update the same pattern for the other
fixture block at lines 328-335.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e42fbc34-eb80-4db4-a537-10a031f27c2f

📥 Commits

Reviewing files that changed from the base of the PR and between 273f42e and ebd5699.

⛔ Files ignored due to path filters (1)

wren/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

cli-skills/wren-usage/references/memory.md
wren/README.md
wren/pyproject.toml
wren/src/wren/cli.py
wren/src/wren/memory/cli.py
wren/src/wren/memory/store.py
wren/tests/unit/test_memory.py

✅ Files skipped from review due to trivial changes (2)

wren/README.md
cli-skills/wren-usage/references/memory.md

🚧 Files skipped from review as they are similar to previous changes (1)

wren/pyproject.toml

wren/src/wren/memory/cli.py

- Fix index_schema replace logic: replace=False now appends instead of overwriting; empty items with replace=True drops stale table - Fix embedding dimension mismatch: derive actual dim from model probe at init instead of hardcoding _DEFAULT_DIM in Arrow schemas - Use pytest.importorskip in test fixtures to only skip on genuinely missing extras; internal ImportErrors now surface as failures - Reject unsupported --output format values in memory CLI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wren/src/wren/memory/cli.py`:
- Around line 172-183: The CLI always prints a plain "Strategy: ..." line and
ignores the --output flag for the "full" path; fix fetch so it honors the output
option and emits pure JSON when output == "json". In the block that calls
_load_manifest, _get_store and store.get_context, remove the plain
typer.echo("Strategy: ...") and instead, when output == "json", serialize a JSON
object containing at least {"strategy": strategy, "schema": <schema>} for
strategy == "full" or {"strategy": strategy, "results": <results>} for other
strategies (use json.dumps and typer.echo); otherwise preserve the current
human-readable behavior (print schema text for "full" and call
_print_results(manifest_or_results, output) for non-full). Ensure you reference
the result from store.get_context and pass the output flag through to
_print_results so machine consumers receive valid JSON.

In `@wren/src/wren/memory/store.py`:
- Around line 285-288: The reset() method currently iterates all names returned
by _table_names(self._db) and calls self._db.drop_table(name), which can erase
unrelated LanceDB tables under a caller-controlled path; change reset() to only
target the known internal tables (e.g. "schema_items" and "query_history"):
enumerate _table_names(self._db), filter to the allowed set of table names you
want to drop, and call self._db.drop_table(name) only for those names (or
alternatively use an explicit existence check like if "schema_items" in
_table_names(...) before dropping), keeping the rest of the database intact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e4715376-c7b1-495f-a672-7b8eeb2634b1

📥 Commits

Reviewing files that changed from the base of the PR and between ebd5699 and 5da354c.

📒 Files selected for processing (3)

wren/src/wren/memory/cli.py
wren/src/wren/memory/store.py
wren/tests/unit/test_memory.py

wren/src/wren/memory/cli.py

wren/src/wren/memory/store.py

- Fix fetch --output json: emit valid JSON for both full and search strategies instead of mixing plain text with JSON - Fix _get_store: move MemoryStore construction inside try block and narrow catch to ModuleNotFoundError with name guard - Scope reset() to only drop schema_items and query_history tables instead of all tables in the LanceDB directory - Scope _search_schema by mdl_hash so replace=False with multiple manifests returns only rows from the current manifest - Change schema_is_current to require ALL rows match current hash (was .any(), now .all()) to detect stale mixed-manifest state Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

wren/src/wren/memory/store.py (1)

300-304: Optional micro-refactor: avoid repeated table-name lookups in reset().

Line 303 recomputes _table_names(self._db) on each loop iteration. Cache once to reduce repeated metadata calls and simplify flow.

♻️ Suggested cleanup

 def reset(self) -> None:
     """Drop Wren memory tables."""
-    for name in (_SCHEMA_TABLE, _QUERY_TABLE):
-        if name in _table_names(self._db):
+    existing = set(_table_names(self._db))
+    for name in (_SCHEMA_TABLE, _QUERY_TABLE):
+        if name in existing:
             self._db.drop_table(name)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@wren/src/wren/memory/store.py` around lines 300 - 304, In reset(), avoid
calling _table_names(self._db) twice: call it once (e.g., assign to a local like
table_names = _table_names(self._db)) and then iterate over or check membership
for _SCHEMA_TABLE and _QUERY_TABLE against that cached table_names before
calling self._db.drop_table(name); update references to use the local variable
so metadata is fetched only once and the flow is simpler.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@wren/src/wren/memory/store.py`:
- Around line 300-304: In reset(), avoid calling _table_names(self._db) twice:
call it once (e.g., assign to a local like table_names = _table_names(self._db))
and then iterate over or check membership for _SCHEMA_TABLE and _QUERY_TABLE
against that cached table_names before calling self._db.drop_table(name); update
references to use the local variable so metadata is fetched only once and the
flow is simpler.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53ee8be5-0d65-4a3e-aa4d-6ce7db5f30da

📥 Commits

Reviewing files that changed from the base of the PR and between 5da354c and 886aa7e.

📒 Files selected for processing (2)

wren/src/wren/memory/cli.py
wren/src/wren/memory/store.py

🚧 Files skipped from review as they are similar to previous changes (1)

wren/src/wren/memory/cli.py

github-actions bot added documentation Improvements or additions to documentation dependencies Pull requests that update a dependency file ci labels Mar 31, 2026

goldmedal and others added 11 commits March 31, 2026 10:00

feat(wren): add CLI commands for memory module

91c5502

Register wren memory sub-app with commands: index, describe, context, search, store, recall, status, reset. Gracefully no-ops when wren[memory] extra is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore(skills): remove wren-memory from MCP skills

d5f80e7

Memory skills are CLI-specific and live in cli-skills/ now. Remove from skills/, index.json, versions.json, and revert wren-usage dependency/delegation references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

update lock

71e5428

goldmedal force-pushed the feat/wren-memory branch from 5e2262e to 71e5428 Compare March 31, 2026 02:00

goldmedal and others added 2 commits March 31, 2026 10:09

chore(wren): add just install-memory and test-memory scripts

273f42e

Convenience for local testing of the memory module with LanceDB and sentence-transformers. CI skips these tests since the extras are not installed there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

wren/src/wren/memory/cli.py Outdated Show resolved Hide resolved

wren/src/wren/memory/cli.py Show resolved Hide resolved

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

wren/src/wren/memory/cli.py Show resolved Hide resolved

wren/src/wren/memory/store.py Show resolved Hide resolved

wren/src/wren/memory/store.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

goldmedal requested a review from douenergy April 1, 2026 05:19

douenergy approved these changes Apr 1, 2026

View reviewed changes

douenergy merged commit dfdff01 into Canner:main Apr 1, 2026
8 checks passed

This was referenced Apr 2, 2026

feat(memory): seed query_history with canonical NL-SQL pairs on index #1510

Open

feat(wren): add context command group for YAML-based MDL project management #1511

Open

Conversation

goldmedal commented Mar 31, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key design decisions

cli-skills/ placement

CI note

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

goldmedal commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

`cli-skills/` placement

coderabbitai bot commented Mar 31, 2026 •

edited

Loading