feat(wren): add LanceDB-backed memory layer for schema and query retrieval#1494
feat(wren): add LanceDB-backed memory layer for schema and query retrieval#1494douenergy merged 16 commits intoCanner:mainfrom
Conversation
📝 WalkthroughWalkthroughAdds a LanceDB-backed semantic memory subsystem: WrenMemory API, MemoryStore, schema indexing/describe/fetch (hybrid full vs embedding), NL→SQL store/recall, CLI subcommands, docs, tests, and an optional Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as Wren CLI
participant Memory as WrenMemory / MemoryStore
participant Schema as Schema Indexer
participant Emb as Embeddings
participant DB as LanceDB
User->>CLI: wren memory index --mdl manifest.json
CLI->>Schema: extract_schema_items(manifest)
Schema-->>CLI: schema items
CLI->>Emb: compute_source_embeddings(texts)
Emb-->>DB: store vectors
DB-->>CLI: indexing complete
User->>CLI: wren memory fetch --query "..."
CLI->>Memory: get_context(manifest, query, ...)
alt description length ≤ threshold
Memory->>Schema: describe_schema(manifest)
Schema-->>Memory: full schema text
Memory-->>CLI: strategy=full + text
else
Memory->>Emb: compute_query_embeddings(query)
Emb-->>DB: vector search
DB-->>Memory: matching items
Memory-->>CLI: strategy=search + items
end
User->>CLI: wren memory store --nl "..." --sql "..."
CLI->>Emb: compute_query_embeddings(nl)
Emb-->>DB: store query embedding in query_history
DB-->>CLI: stored
User->>CLI: wren memory recall --query "similar"
CLI->>Emb: compute_query_embeddings(query)
Emb-->>DB: similarity search in query_history
DB-->>CLI: matching past queries
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Introduce wren.memory with hybrid retrieval strategy: small schemas are returned as full plain text for LLM context, large schemas use embedding search (paraphrase-multilingual-MiniLM-L12-v2, local). - schema_indexer: extract models/columns/relationships/views into indexable records, plus describe_schema() for plain-text output - store: LanceDB CRUD with get_context() hybrid auto-selection based on a 30K-char threshold (~8K tokens) - embeddings: local sentence-transformers via LanceDB registry - WrenMemory public API for programmatic use - 37 unit + integration tests - Optional dependency: pip install wren-engine[memory] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register wren memory sub-app with commands: index, describe, context, search, store, recall, status, reset. Gracefully no-ops when wren[memory] extra is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add wren memory section to cli.md with hybrid strategy explanation (30K-char threshold, char-based measurement rationale, CJK handling). Document all 8 subcommands with flags and examples. Update README with memory extra installation and quick-start step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teach agents when to index schema, search for context, store confirmed NL-to-SQL pairs, and recall past queries. Key principle: the agent stores only after user confirmation — query success alone does not imply correctness. Also registers wren-memory in index.json, versions.json, and adds it to wren-usage delegation table and dependencies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Separate from the Docker-based MCP skills in skills/, these are for agents using the wren CLI directly. - wren-usage: decision-driven workflow guide covering 4 main flows: answering data questions, error recovery, data source onboarding, and MDL change handling. Teaches agents when/why to run each command, not just how. - wren-memory: copied from skills/wren-memory — memory-specific decisions (when to store, recall, index). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge wren-memory into wren-usage following the agent skills spec: - SKILL.md (138 lines): decision-driven workflows for answering questions, error recovery, onboarding, and MDL changes - references/memory.md: when to index, store, recall, and the hybrid context strategy Removes the separate wren-memory skill — memory decisions are now a reference under wren-usage, not a standalone skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate search_schema into get_context to give agents a single command for schema retrieval. context auto-selects full text vs embedding search, and now accepts --type and --model filters for the search strategy. Removes the standalone search CLI command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename the CLI command from `wren memory context` to `wren memory fetch` for consistency with other verb-based commands (index, store, recall, reset). Python API remains get_context() unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Memory skills are CLI-specific and live in cli-skills/ now. Remove from skills/, index.json, versions.json, and revert wren-usage dependency/delegation references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users query model names (SELECT * FROM model_name), not physical tables. Table references are an internal MDL detail that adds noise to the LLM context without helping SQL generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5e2262e to
71e5428
Compare
The lancedb import happens inside MemoryStore.__init__, not at module import time. Move the try/except to wrap the full construction so CI without wren[memory] properly skips instead of erroring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convenience for local testing of the memory module with LanceDB and sentence-transformers. CI skips these tests since the extras are not installed there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (5)
wren/tests/unit/test_memory.py (1)
167-209: Add a regression check for the removedtableReferenceoutput.The changed contract here is absence, but these tests only assert positive fragments. A negative assertion on the raw table-reference values would lock down the behavior that still needs manual verification.
Suggested test addition
def test_contains_primary_key(self): text = describe_schema(_MANIFEST) assert "Primary key: o_orderkey" in text + def test_excludes_table_reference(self): + text = describe_schema(_MANIFEST) + assert "tableReference" not in text + assert "test.public.orders" not in text + assert "test.public.customer" not in text + def test_empty_manifest(self): text = describe_schema({}) assert text == ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/tests/unit/test_memory.py` around lines 167 - 209, Add a negative regression test to ensure the removed tableReference output is not present: in test_memory.py near the existing describe_schema(_MANIFEST) assertions (function describe_schema and fixture/_MANIFEST), add an assertion that the raw table reference string(s) (e.g., any "tableReference" tokens or specific raw table identifiers present in the old contract) are not in the returned text (e.g., assert "tableReference" not in text or assert "<raw_table_name>" not in text) to lock down the absence of that output.cli-skills/wren-usage/SKILL.md (2)
7-7: Consider adding areferencesfield to metadata.The SKILL.md links to
references/memory.md, but the frontmatter metadata doesn't declare this dependency. Adding areferenceslist could help tooling discover related documentation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli-skills/wren-usage/SKILL.md` at line 7, The SKILL.md frontmatter only includes version ("1.0") and should declare linked docs; add a references metadata field in the frontmatter (e.g., references: - references/memory.md) so tooling can discover related docs; update the SKILL.md frontmatter near the existing version key and include any other referenced files in that list (for example "references/memory.md") to keep metadata and links in sync.
37-38: Clarify "Skip if empty" behavior.The instruction "Skip if empty" is brief. Consider specifying what "empty" means (no results returned) and what the agent should do instead (proceed directly to Step 3).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli-skills/wren-usage/SKILL.md` around lines 37 - 38, The line "Use results as few-shot examples. Skip if empty." is ambiguous; update the SKILL.md text so "empty" is defined (e.g., results array is null, undefined, or has length 0 / no results returned) and state the exact fallback behavior: if results are empty, do not attempt to create few-shot examples and proceed directly to Step 3. Reference the original sentence ("Use results as few-shot examples. Skip if empty.") and the target flow ("Step 3") so readers know when to skip.wren/src/wren/memory/schema_indexer.py (1)
104-113: Consider handling emptymodelslist more explicitly.The fallback to
"?"for missing models (lines 105-106) is fine for display, but ifmodelsis empty, the relationship text will show? → ?which may be confusing. This edge case is unlikely in valid MDL but worth noting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/src/wren/memory/schema_indexer.py` around lines 104 - 113, The relationship rendering currently falls back to "?" for missing models which yields "? → ?" for empty models; update the logic in schema_indexer.py (the models / left / right handling and the lines.append for "Relationship: {name}") to explicitly handle an empty models list—either skip emitting the relationship entirely or replace the placeholders with a clear token like "<no models specified>" and optionally emit a warning via the module logger; ensure you update the code paths that compute left and right and the relationship line so the output is unambiguous when models is empty.wren/src/wren/memory/cli.py (1)
100-102: Overly broad exception handling hides errors.Catching all exceptions with
except Exceptionand silently falling back to plain output could mask real issues (e.g., pandas import errors vs. data formatting errors). Consider catching more specific exceptions or at least logging the error.♻️ Suggested improvement
- except Exception: + except ImportError: + # pandas not installed, fall back to plain output for r in results: typer.echo(str(r))If other pandas errors are expected, you could add logging:
except Exception as e: # Log for debugging but don't fail # logger.debug(f"Table formatting failed: {e}") for r in results: typer.echo(str(r))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/src/wren/memory/cli.py` around lines 100 - 102, The except Exception block that falls back to printing results is too broad and may hide real errors; update the try/except in wren/src/wren/memory/cli.py to catch specific expected errors (e.g., ImportError, pandas-related exceptions, ValueError) or capture the exception as a variable (e) and log it before falling back, then continue to iterate over results and call typer.echo(str(r)); reference the current block handling results and typer.echo to locate and replace the generic except Exception with a targeted except clause that logs the error (using your module logger or typer) and then prints results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli-skills/wren-usage/references/memory.md`:
- Line 37: The phrasing is inconsistent: replace the incorrect reference to
"context with full strategy" with "fetch with full strategy" so it matches the
actual CLI command; update the sentence that mentions `describe` or `context` to
read `describe` or `fetch` (or say "fetch with full strategy") ensuring the
mention of the "full strategy" variant aligns with the `fetch` command and its
usage.
- Line 7: The header text is inconsistent with the actual CLI command: change
the schema context header from using "context" to match the implemented command
name `wren memory fetch`; update the Markdown heading (the line showing "Schema
context: `context` and `describe`") so it reads "Schema context: `fetch` and
`describe`" or otherwise replace `context` with `fetch` to match the `wren
memory fetch` command referenced elsewhere.
In `@wren/pyproject.toml`:
- Around line 54-56: Update the "all" extras list in pyproject.toml to include
the missing "spark" connector and correct the self-reference to the actual
package name "wren-engine" (not "wren"); specifically modify the all extra entry
(the list assigned to all = [...]) so it contains
"wren-engine[postgres,mysql,bigquery,snowflake,clickhouse,trino,mssql,databricks,redshift,athena,oracle,spark,memory]"
ensuring pip install wren-engine[all] will resolve and include spark.
In `@wren/README.md`:
- Around line 13-14: The README examples use unquoted extras which are
interpreted as shell globs (e.g., the lines showing pip install
wren-engine[memory] and wren-engine[all]); update those two examples to quote
the package specs (wrap wren-engine[memory] and wren-engine[all] in single or
double quotes) so paste-and-run works across shells and match the CLI docs
style.
In `@wren/src/wren/cli.py`:
- Around line 362-367: The current bare except ImportError around importing
memory_app and calling app.add_typer will swallow unexpected import errors;
change it to only silence missing optional dependency errors by catching
ModuleNotFoundError (or inspect the caught ImportError and re-raise unless it's
a ModuleNotFoundError for the optional packages). Concretely, replace the bare
except ImportError with either "except ModuleNotFoundError: pass" or "except
ImportError as e: if not isinstance(e, ModuleNotFoundError): raise" so that
memory_app and app.add_typer still propagate syntax/other import failures while
only suppressing the absent optional extras.
In `@wren/src/wren/memory/cli.py`:
- Around line 84-89: The code currently mutates each result dict in results by
doing r[k] = v.isoformat() (in the loop inside the CLI JSON dump), which can
cause side effects if those dicts are used elsewhere; modify the logic to build
a new list of sanitized dictionaries (e.g., iterate results and for each r
produce a new dict copying keys/values but replacing datetime-like values using
v.isoformat()) and then json.dumps that new list instead of mutating the
original results; locate the loop that iterates "for r in results" and the inner
"for k, v in r.items()" and replace it with creating and serializing a
non-mutating copy.
In `@wren/src/wren/memory/store.py`:
- Around line 89-112: index_schema currently always recreates the _SCHEMA_TABLE
because create_table uses mode="overwrite" regardless of the replace argument;
update index_schema to (1) if items is empty and replace is True, explicitly
drop _SCHEMA_TABLE (using the existing _SCHEMA_TABLE in _table_names(self._db)
check) and return 0, (2) if items is empty and replace is False, return 0
without touching the table, and (3) when items exist, if replace is True
create/overwrite the table as now, but if replace is False and the table already
exists (check via _SCHEMA_TABLE in _table_names(self._db)) append the new items
using the DB API to add/insert rows (e.g., table.add or equivalent) instead of
calling create_table with mode="overwrite"; keep the embedding steps and
attaching of "vector" to each item before storing.
- Around line 176-182: The where clause builds predicate strings by splicing
user inputs (item_type, model_name, datasource) directly into q.where, which
breaks for values containing single quotes; before appending to where_parts (the
block that creates where_parts and calls q.where with " AND ".join(where_parts))
escape single quotes by doubling them (use value.replace("'", "''")) for
item_type and model_name in the shown block and likewise for the datasource
usage later (the similar code around lines 239-240) so all interpolated filter
values are safe when passed to q.where.
- Around line 24-52: The Arrow schemas _schema_items_arrow_schema and
_query_history_arrow_schema are hard-coded to _DEFAULT_DIM (384) which breaks
non-384-dim embedders; update callers (notably index_schema() and store_query())
to compute the actual embedding dimension from the embedding function (or model)
and pass it into these schema constructors (e.g., call
_schema_items_arrow_schema(dim=actual_dim) and
_query_history_arrow_schema(dim=actual_dim)); alternatively, until dimension
negotiation exists, enforce/reject unsupported model_name values in the
constructor so index_schema() and store_query() never build schemas with the
wrong dim.
In `@wren/tests/unit/test_memory.py`:
- Line 71: These three pure schema test classes (e.g., TestManifestHash and the
other two test classes around lines 86 and 166) are missing the pytest marker
used by the justfile filter; add the pytest marker by importing pytest if needed
and decorating each test class with `@pytest.mark.unit` (or apply the marker via
pytest.mark.unit on the class definition) so they are included when running
`just test-unit`; ensure the decorator appears immediately above each class
definition (e.g., above TestManifestHash).
- Around line 216-225: The fixtures memory_store and wren_memory currently catch
all ImportError which hides real import problems; change them to first check for
the required optional extras (test for presence of packages lancedb,
sentence-transformers, pyarrow as appropriate) and call pytest.skip if those
extras are missing, then perform the direct imports of wren.memory.store /
wren.memory.embeddings / wren.memory.schema_indexer (i.e., import MemoryStore,
embeddings, SchemaIndexer) without wrapping in a try/except so any internal
ImportError surfaces as a test failure; ensure references to the fixtures
(memory_store, wren_memory) and the class/function names (MemoryStore,
embeddings, SchemaIndexer) are used to locate and update the code.
---
Nitpick comments:
In `@cli-skills/wren-usage/SKILL.md`:
- Line 7: The SKILL.md frontmatter only includes version ("1.0") and should
declare linked docs; add a references metadata field in the frontmatter (e.g.,
references: - references/memory.md) so tooling can discover related docs; update
the SKILL.md frontmatter near the existing version key and include any other
referenced files in that list (for example "references/memory.md") to keep
metadata and links in sync.
- Around line 37-38: The line "Use results as few-shot examples. Skip if empty."
is ambiguous; update the SKILL.md text so "empty" is defined (e.g., results
array is null, undefined, or has length 0 / no results returned) and state the
exact fallback behavior: if results are empty, do not attempt to create few-shot
examples and proceed directly to Step 3. Reference the original sentence ("Use
results as few-shot examples. Skip if empty.") and the target flow ("Step 3") so
readers know when to skip.
In `@wren/src/wren/memory/cli.py`:
- Around line 100-102: The except Exception block that falls back to printing
results is too broad and may hide real errors; update the try/except in
wren/src/wren/memory/cli.py to catch specific expected errors (e.g.,
ImportError, pandas-related exceptions, ValueError) or capture the exception as
a variable (e) and log it before falling back, then continue to iterate over
results and call typer.echo(str(r)); reference the current block handling
results and typer.echo to locate and replace the generic except Exception with a
targeted except clause that logs the error (using your module logger or typer)
and then prints results.
In `@wren/src/wren/memory/schema_indexer.py`:
- Around line 104-113: The relationship rendering currently falls back to "?"
for missing models which yields "? → ?" for empty models; update the logic in
schema_indexer.py (the models / left / right handling and the lines.append for
"Relationship: {name}") to explicitly handle an empty models list—either skip
emitting the relationship entirely or replace the placeholders with a clear
token like "<no models specified>" and optionally emit a warning via the module
logger; ensure you update the code paths that compute left and right and the
relationship line so the output is unambiguous when models is empty.
In `@wren/tests/unit/test_memory.py`:
- Around line 167-209: Add a negative regression test to ensure the removed
tableReference output is not present: in test_memory.py near the existing
describe_schema(_MANIFEST) assertions (function describe_schema and
fixture/_MANIFEST), add an assertion that the raw table reference string(s)
(e.g., any "tableReference" tokens or specific raw table identifiers present in
the old contract) are not in the returned text (e.g., assert "tableReference"
not in text or assert "<raw_table_name>" not in text) to lock down the absence
of that output.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4a02d04d-09c3-4641-985e-57b6305a4b02
⛔ Files ignored due to path filters (1)
wren/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
cli-skills/wren-usage/SKILL.mdcli-skills/wren-usage/references/memory.mdwren/README.mdwren/docs/cli.mdwren/justfilewren/pyproject.tomlwren/src/wren/cli.pywren/src/wren/memory/__init__.pywren/src/wren/memory/cli.pywren/src/wren/memory/embeddings.pywren/src/wren/memory/schema_indexer.pywren/src/wren/memory/store.pywren/tests/unit/test_memory.py
- Fix "context" → "fetch" naming in memory.md reference docs - Fix `all` extra: correct self-reference to wren-engine, add missing spark - Quote pip install extras in README for zsh compatibility - Narrow ImportError catch in cli.py to only missing optional deps - Avoid mutating input dicts during JSON serialization in memory/cli.py - Escape single quotes in LanceDB where-clause values in store.py - Add @pytest.mark.unit to TestManifestHash, TestExtractSchemaItems, TestDescribeSchema - Add regression test asserting tableReference is excluded from describe output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (4)
wren/src/wren/cli.py (1)
362-370:⚠️ Potential issue | 🟠 MajorOnly suppress known optional-module misses when registering
memory_app.This
except ImportErroralso hides import failures coming fromwren.memory.clior broken installs of its dependencies, and the CLI just loses the command group. Narrow the handler to known optional-moduleModuleNotFoundErrors and re-raise everything else.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/src/wren/cli.py` around lines 362 - 370, Replace the broad "except ImportError: pass" around the optional imports (lancedb, sentence_transformers, and from wren.memory.cli import memory_app; app.add_typer(memory_app)) with a handler that only swallows ModuleNotFoundError for those specific optional modules: catch ModuleNotFoundError as e and if e.name is one of "lancedb", "sentence_transformers", or "wren.memory.cli" then ignore, otherwise re-raise the exception so real import errors in wren.memory.cli or its deps surface.wren/tests/unit/test_memory.py (1)
225-233:⚠️ Potential issue | 🟠 MajorThese fixtures still hide import regressions by skipping.
The blanket
ImportErrorcatches failures inwren.memory.store/wren.memorythemselves, so broken imports can show up as skipped tests instead of failures. Check the optional packages first, then import and construct the Wren objects without a broad catch so internal regressions fail the suite.Also applies to: 328-335
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/tests/unit/test_memory.py` around lines 225 - 233, The fixture memory_store currently catches ImportError broadly which masks import-time regressions; change it to first check for the optional dependency (e.g., use pytest.importorskip("wren.memory", reason=...) or pytest.importorskip("wren.memory.store")) and only then import MemoryStore and construct it (call MemoryStore(path=tmp_path)) without wrapping that import/instantiation in a wide try/except so real import errors in wren.memory or MemoryStore surface as test failures; update the same pattern for the other fixture block at lines 328-335.wren/src/wren/memory/store.py (2)
29-56:⚠️ Potential issue | 🟠 MajorCustom embedding models still create 384-dim tables.
model_nameis configurable, but both Arrow schema builders default to_DEFAULT_DIMand bothcreate_table()call sites use that default. Any embedder that does not return 384 values will break onceindex_schema()orstore_query()writes rows. Please derive the schema dimension fromself._embed_fnor reject non-default models until that is supported end-to-end.Also applies to: 79-90, 112-115, 222-225
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/src/wren/memory/store.py` around lines 29 - 56, Both Arrow schema builders (_schema_items_arrow_schema and _query_history_arrow_schema) always use the hardcoded _DEFAULT_DIM which causes tables to be created for 384-dim only; update create_table/code paths that call these (including index_schema() and store_query()) to derive the vector dimension from the embedding function on the store instance (self._embed_fn) or explicitly validate and reject non-default embedding models until full support is implemented. Concretely: inspect self._embed_fn (or its config) to obtain the returned embedding length, pass that length into _schema_items_arrow_schema and _query_history_arrow_schema when creating schemas/tables, and add a validation branch in the create_table/index_schema/store_query call sites to raise a clear error if the embedding dim cannot be determined or differs from _DEFAULT_DIM. Reference symbols: _schema_items_arrow_schema, _query_history_arrow_schema, create_table, index_schema, store_query, and self._embed_fn.
99-117:⚠️ Potential issue | 🟠 Major
replace=Falsestill overwrites the existing schema index.
create_table(..., mode="overwrite")on Lines 112-117 runs regardless ofreplace, so the non-replacing path still discards prior rows. The early return on Line 100 also leaves staleschema_itemsbehind whenreplace=Trueand the new manifest extracts no items.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/src/wren/memory/store.py` around lines 99 - 117, The code unconditionally calls self._db.create_table(..., mode="overwrite") which overwrites the schema index even when replace is False, and the early return when items is empty prevents dropping the table when replace is True; fix by: after items = extract_schema_items(manifest) do not return immediately on empty items — if replace is True then drop _SCHEMA_TABLE when present via _table_names/_SCHEMA_TABLE and return 0; otherwise (replace is False) simply return 0 without touching the DB; only call self._db.create_table when items is non-empty, and when replace is False ensure you do not use mode="overwrite" (use append/default) while when replace is True you may create with mode="overwrite" after dropping the table; reference symbols: extract_schema_items, items, replace, _SCHEMA_TABLE, _table_names, self._db.drop_table, self._db.create_table.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@wren/src/wren/memory/cli.py`:
- Around line 82-93: The code currently treats any non-"json" value as the table
branch, so typos silently change output; update the output validation in the
json branch (the block that checks output = output.lower() and then if output ==
"json":) to explicitly reject unsupported values by mirroring the validation in
wren/src/wren/cli.py::_print_result — i.e., allow only the supported output
names (accept "json" and the other expected modes) and for any other value raise
a user-facing error (e.g., via typer.BadParameter or typer.echo + typer.Exit) so
typos no longer fall through to the table renderer. Ensure you reference the
same set of valid outputs and use the same error messaging/exit behavior as
_print_result.
- Around line 62-73: The ImportError handling should only cover importing
MemoryStore and its construction: move the MemoryStore(path=path) call into the
try block inside _get_store so missing optional deps (like lancedb or
sentence_transformers used elsewhere by get_embedding_function or __init__)
raise a typer.Exit with the "wren[memory] extras not installed" message; ensure
the except ImportError only wraps the import/constructor of MemoryStore (not
other module imports) so unrelated ImportError from embeddings or schema_indexer
aren't misreported.
---
Duplicate comments:
In `@wren/src/wren/cli.py`:
- Around line 362-370: Replace the broad "except ImportError: pass" around the
optional imports (lancedb, sentence_transformers, and from wren.memory.cli
import memory_app; app.add_typer(memory_app)) with a handler that only swallows
ModuleNotFoundError for those specific optional modules: catch
ModuleNotFoundError as e and if e.name is one of "lancedb",
"sentence_transformers", or "wren.memory.cli" then ignore, otherwise re-raise
the exception so real import errors in wren.memory.cli or its deps surface.
In `@wren/src/wren/memory/store.py`:
- Around line 29-56: Both Arrow schema builders (_schema_items_arrow_schema and
_query_history_arrow_schema) always use the hardcoded _DEFAULT_DIM which causes
tables to be created for 384-dim only; update create_table/code paths that call
these (including index_schema() and store_query()) to derive the vector
dimension from the embedding function on the store instance (self._embed_fn) or
explicitly validate and reject non-default embedding models until full support
is implemented. Concretely: inspect self._embed_fn (or its config) to obtain the
returned embedding length, pass that length into _schema_items_arrow_schema and
_query_history_arrow_schema when creating schemas/tables, and add a validation
branch in the create_table/index_schema/store_query call sites to raise a clear
error if the embedding dim cannot be determined or differs from _DEFAULT_DIM.
Reference symbols: _schema_items_arrow_schema, _query_history_arrow_schema,
create_table, index_schema, store_query, and self._embed_fn.
- Around line 99-117: The code unconditionally calls self._db.create_table(...,
mode="overwrite") which overwrites the schema index even when replace is False,
and the early return when items is empty prevents dropping the table when
replace is True; fix by: after items = extract_schema_items(manifest) do not
return immediately on empty items — if replace is True then drop _SCHEMA_TABLE
when present via _table_names/_SCHEMA_TABLE and return 0; otherwise (replace is
False) simply return 0 without touching the DB; only call self._db.create_table
when items is non-empty, and when replace is False ensure you do not use
mode="overwrite" (use append/default) while when replace is True you may create
with mode="overwrite" after dropping the table; reference symbols:
extract_schema_items, items, replace, _SCHEMA_TABLE, _table_names,
self._db.drop_table, self._db.create_table.
In `@wren/tests/unit/test_memory.py`:
- Around line 225-233: The fixture memory_store currently catches ImportError
broadly which masks import-time regressions; change it to first check for the
optional dependency (e.g., use pytest.importorskip("wren.memory", reason=...) or
pytest.importorskip("wren.memory.store")) and only then import MemoryStore and
construct it (call MemoryStore(path=tmp_path)) without wrapping that
import/instantiation in a wide try/except so real import errors in wren.memory
or MemoryStore surface as test failures; update the same pattern for the other
fixture block at lines 328-335.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e42fbc34-eb80-4db4-a537-10a031f27c2f
⛔ Files ignored due to path filters (1)
wren/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
cli-skills/wren-usage/references/memory.mdwren/README.mdwren/pyproject.tomlwren/src/wren/cli.pywren/src/wren/memory/cli.pywren/src/wren/memory/store.pywren/tests/unit/test_memory.py
✅ Files skipped from review due to trivial changes (2)
- wren/README.md
- cli-skills/wren-usage/references/memory.md
🚧 Files skipped from review as they are similar to previous changes (1)
- wren/pyproject.toml
- Fix index_schema replace logic: replace=False now appends instead of overwriting; empty items with replace=True drops stale table - Fix embedding dimension mismatch: derive actual dim from model probe at init instead of hardcoding _DEFAULT_DIM in Arrow schemas - Use pytest.importorskip in test fixtures to only skip on genuinely missing extras; internal ImportErrors now surface as failures - Reject unsupported --output format values in memory CLI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@wren/src/wren/memory/cli.py`:
- Around line 172-183: The CLI always prints a plain "Strategy: ..." line and
ignores the --output flag for the "full" path; fix fetch so it honors the output
option and emits pure JSON when output == "json". In the block that calls
_load_manifest, _get_store and store.get_context, remove the plain
typer.echo("Strategy: ...") and instead, when output == "json", serialize a JSON
object containing at least {"strategy": strategy, "schema": <schema>} for
strategy == "full" or {"strategy": strategy, "results": <results>} for other
strategies (use json.dumps and typer.echo); otherwise preserve the current
human-readable behavior (print schema text for "full" and call
_print_results(manifest_or_results, output) for non-full). Ensure you reference
the result from store.get_context and pass the output flag through to
_print_results so machine consumers receive valid JSON.
In `@wren/src/wren/memory/store.py`:
- Around line 285-288: The reset() method currently iterates all names returned
by _table_names(self._db) and calls self._db.drop_table(name), which can erase
unrelated LanceDB tables under a caller-controlled path; change reset() to only
target the known internal tables (e.g. "schema_items" and "query_history"):
enumerate _table_names(self._db), filter to the allowed set of table names you
want to drop, and call self._db.drop_table(name) only for those names (or
alternatively use an explicit existence check like if "schema_items" in
_table_names(...) before dropping), keeping the rest of the database intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e4715376-c7b1-495f-a672-7b8eeb2634b1
📒 Files selected for processing (3)
wren/src/wren/memory/cli.pywren/src/wren/memory/store.pywren/tests/unit/test_memory.py
- Fix fetch --output json: emit valid JSON for both full and search strategies instead of mixing plain text with JSON - Fix _get_store: move MemoryStore construction inside try block and narrow catch to ModuleNotFoundError with name guard - Scope reset() to only drop schema_items and query_history tables instead of all tables in the LanceDB directory - Scope _search_schema by mdl_hash so replace=False with multiple manifests returns only rows from the current manifest - Change schema_is_current to require ALL rows match current hash (was .any(), now .all()) to detect stale mixed-manifest state Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
wren/src/wren/memory/store.py (1)
300-304: Optional micro-refactor: avoid repeated table-name lookups inreset().Line 303 recomputes
_table_names(self._db)on each loop iteration. Cache once to reduce repeated metadata calls and simplify flow.♻️ Suggested cleanup
def reset(self) -> None: """Drop Wren memory tables.""" - for name in (_SCHEMA_TABLE, _QUERY_TABLE): - if name in _table_names(self._db): + existing = set(_table_names(self._db)) + for name in (_SCHEMA_TABLE, _QUERY_TABLE): + if name in existing: self._db.drop_table(name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren/src/wren/memory/store.py` around lines 300 - 304, In reset(), avoid calling _table_names(self._db) twice: call it once (e.g., assign to a local like table_names = _table_names(self._db)) and then iterate over or check membership for _SCHEMA_TABLE and _QUERY_TABLE against that cached table_names before calling self._db.drop_table(name); update references to use the local variable so metadata is fetched only once and the flow is simpler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@wren/src/wren/memory/store.py`:
- Around line 300-304: In reset(), avoid calling _table_names(self._db) twice:
call it once (e.g., assign to a local like table_names = _table_names(self._db))
and then iterate over or check membership for _SCHEMA_TABLE and _QUERY_TABLE
against that cached table_names before calling self._db.drop_table(name); update
references to use the local variable so metadata is fetched only once and the
flow is simpler.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 53ee8be5-0d65-4a3e-aa4d-6ce7db5f30da
📒 Files selected for processing (2)
wren/src/wren/memory/cli.pywren/src/wren/memory/store.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wren/src/wren/memory/cli.py
Summary
wren.memorymodule with LanceDB-backed schema and query memory for the standalone wren CLIindex,fetch,describe,store,recall,status,resetcli-skills/wren-usage/) with decision-driven workflow guide teaching agents when to fetch context, store confirmed queries, and handle errorspip install wren-engine[memory]Key design decisions
wren memory fetchauto-selects full text vs embedding search based on a 30K-char threshold (~8K tokens). Full text gives better LLM results for small schemas; embedding search scales for large ones.fetchcommand instead of separatecontext+search— reduces agent confusion from two similar commands.cli-skills/placementAgent skills for the wren CLI are placed under
cli-skills/temporarily to avoid polluting theskills/directory, which contains the already-released MCP server skills. Once the standalone CLI is officially released,cli-skills/will be merged intoskills/.CI note
Memory integration tests (13 tests using LanceDB + sentence-transformers) are skipped in CI intentionally. The
wren[memory]extras pull intorch(~200MB) and a model download (~80MB), which would add ~4 minutes to every CI run. Since:just install-memory && just test-memoryWe keep CI lean and rely on local testing for the embedding/LanceDB layer.
Test plan
just test-memory(index, fetch, store, recall, status, reset, hybrid strategy)wren memory index && wren memory fetch -q "customer orders"against a real MDLwren memory describeoutput excludes tableReferenceSummary by CodeRabbit
New Features
Documentation
Chores
Tests