Add semantic search via pgvector + Ollama #40
Adds vector similarity search as an optional, additive capability. Existing tag-based search is unchanged. When an embedding provider is configured, write tools auto-generate embeddings fire-and-forget, and the new semantic_search tool finds entries by meaning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
- get_knowledge now supports created_after/created_before for filtering by creation time (distinct from since/until, which filter by last update)
- Documented and isolated _prompt_manager._prompts private access in custom prompt sync; no public remove API exists in FastMCP

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Ollama service container in GitHub Actions with nomic-embed-text model
- 7 integration tests that exercise the real embedding pipeline: OllamaEmbedding.embed(), is_available(), similarity ordering, write-and-search round trip, embedding generation on write
- Tests skip gracefully when Ollama is not available (local dev)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…lama
- Store: semantic_search since/until filter tests
- Server: embed-returns-empty and embed-raises-exception error path tests
- Embeddings: unreachable Ollama returns is_available=False test
- 292 tests total

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cmeans
left a comment
QA Review — PR #40: Add semantic search via pgvector + Ollama
Automated tests
- 294/294 pass (PR says 276 — subsequent commits added more)
- ruff: clean
- mypy: clean
Manual tests (8/8 pass)
Tested both without and with Ollama provider via isolated MCP instances.
| # | Test | Result |
|---|---|---|
| 1 | semantic_search without provider → structured error | ✅ Clear error mentioning AWARENESS_EMBEDDING_PROVIDER |
| 2 | remember works without provider (no regression) | ✅ status: "ok" |
| 3 | get_knowledge by tags (no regression) | ✅ Returns entry |
| 4 | Configure Ollama provider | ✅ Restarted with AWARENESS_EMBEDDING_PROVIDER=ollama |
| 5 | remember generates embedding | ✅ Verified embedding row in embeddings table |
| 6 | semantic_search finds by meaning | ✅ "retirement planning" → "Roth IRA conversion ladder strategy" (similarity: 0.39) |
| 7 | semantic_search with source filter + list mode | ✅ List mode with similarity scores |
| 8 | semantic_search with entry_type filter | ✅ Only notes returned |
Code review
Architecture: Clean separation — embeddings.py defines the EmbeddingProvider protocol, OllamaEmbedding (stdlib urllib, no SDK dep), and NullEmbedding. Server uses _generate_embedding fire-and-forget wrapper. Store protocol extended with 3 methods. Good.
Schema: embeddings table with VECTOR(768), UNIQUE(entry_id, model), ON DELETE CASCADE, text_hash for staleness detection. Alembic migration includes HNSW index. Good.
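The text_hash column suggests a simple staleness check: hash the text that was embedded, store the digest, and compare it on later writes. A minimal sketch, assuming SHA-256 over the UTF-8 text; the helper names text_hash and is_stale are hypothetical, and the hash function the PR actually uses is not shown in this review.

```python
import hashlib
from typing import Optional


def text_hash(text: str) -> str:
    """Hex digest of the embedded text (SHA-256 is an assumption, not the PR's confirmed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(stored_hash: Optional[str], current_text: str) -> bool:
    """An embedding is stale when no hash is stored or the entry text has changed since embedding."""
    return stored_hash != text_hash(current_text)
```

With a check like this, an update that leaves the text unchanged could skip re-embedding entirely.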
Graceful degradation: NullEmbedding.is_available() returns False, so all write tools silently skip embedding generation. semantic_search returns a clear error. No functional regression without a provider. Verified manually.
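The degradation path described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the embed() signature (a list of texts in, a list of vectors out) and the maybe_embed helper are hypothetical; only the names EmbeddingProvider, NullEmbedding, and is_available() come from the PR.

```python
from typing import List, Optional, Protocol


class EmbeddingProvider(Protocol):
    """Assumed shape of the provider protocol."""

    def is_available(self) -> bool: ...

    def embed(self, texts: List[str]) -> List[List[float]]: ...


class NullEmbedding:
    """No-op provider used when no embedding backend is configured."""

    def is_available(self) -> bool:
        return False

    def embed(self, texts: List[str]) -> List[List[float]]:
        return []


def maybe_embed(provider: EmbeddingProvider, text: str) -> Optional[List[float]]:
    """Return a vector, or None when the provider is unavailable, fails, or returns nothing."""
    if not provider.is_available():
        return None
    try:
        vectors = provider.embed([text])
    except Exception:
        # Embedding is best-effort: a provider failure must not fail the write.
        return None
    return vectors[0] if vectors else None
```

Because the null provider satisfies the same protocol, write tools need no special-casing beyond the availability check.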
Ollama integration: Uses /api/embed (batch endpoint), /api/tags for availability check. Model name matching handles :latest suffix. Configurable timeout. Good.
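For reference, the :latest handling could look something like the sketch below. It assumes the /api/tags response has the shape {"models": [{"name": ...}]}; the function names model_matches and is_model_installed are hypothetical and not taken from embeddings.py.

```python
from typing import Any, Dict


def _normalize(name: str) -> str:
    """Treat a model name without an explicit tag as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def model_matches(configured: str, installed: str) -> bool:
    """Compare two Ollama model names, ignoring an implicit ':latest' tag."""
    return _normalize(configured) == _normalize(installed)


def is_model_installed(tags_response: Dict[str, Any], model: str) -> bool:
    """Check a parsed /api/tags payload (assumed shape) for the configured model."""
    models = tags_response.get("models", [])
    return any(model_matches(model, m.get("name", "")) for m in models)
```

An explicit tag such as nomic-embed-text:v1.5 would deliberately not match nomic-embed-text:latest under this rule.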
Docker Compose: Ollama service under embeddings profile with resource limits (1 CPU, 2GB). HNSW index parameters (m=16, ef_construction=64) are reasonable for the expected scale.
CI: Ollama service added with model pull step and wait loop. AWARENESS_OLLAMA_URL env var. Good.
New filters: created_after/created_before in get_knowledge — clean addition, SQL-level, parameterized.
Docs: CHANGELOG, README (27 tools, test count), CLAUDE.md (architecture), data-dictionary (embeddings table). Good.
Findings
1. HNSW index missing from inline DDL (functional gap)
The Alembic migration creates the HNSW index:
CREATE INDEX IF NOT EXISTS idx_embeddings_vector_hnsw ON embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

But the inline DDL in postgres_store.py (lines 77-90) only creates the table and idx_embeddings_entry, with no HNSW index. New installs that don't run Alembic will get sequential scans on every semantic_search query.
Suggest: add the HNSW index creation to the inline DDL block.
2. VECTOR(768) hardcoded vs configurable dimensions
Both inline DDL and Alembic migration hardcode VECTOR(768). But AWARENESS_EMBEDDING_DIMENSIONS env var allows changing it. If someone configures a different model with different dimensions (e.g., 384 for all-MiniLM-L6-v2), upsert_embedding will fail with a Postgres type error.
Options:
- Document that 768 is the only supported dimension (simplest)
- Use VECTOR without dimension constraint (flexible but loses type safety)
- Make the migration/DDL dimension-aware
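Whichever option is chosen, a cheap guard before the upsert would turn the Postgres type error into a clearer message. A hypothetical sketch; check_dimensions and EXPECTED_DIMENSIONS are names invented for illustration and are not part of the PR:

```python
from typing import List

# Matches the VECTOR(768) column in the schema.
EXPECTED_DIMENSIONS = 768


def check_dimensions(vector: List[float], expected: int = EXPECTED_DIMENSIONS) -> None:
    """Raise a descriptive error before the vector reaches the database."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, but the embeddings column "
            f"is VECTOR({expected}); use a model that produces {expected} dimensions"
        )
```

A 384-dimension vector would then fail with a message that names both sizes instead of a driver-level error.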
3. _generate_embedding is synchronous, not truly fire-and-forget (observation)
_generate_embedding calls provider.embed() — a synchronous HTTP request to Ollama — inline before returning the tool response. On my test setup this added ~100-200ms per write. The PR description says "fire-and-forget (never blocks responses)" but it does block.
For v1 this is fine (Ollama is local, latency is tolerable). If it becomes an issue, a background thread or queue would make it truly non-blocking.
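If non-blocking generation is wanted later, one option is a small thread pool. The sketch below is an assumption about how that might look, not a description of the PR: submit_embedding is a hypothetical wrapper, and the generate callable stands in for whatever _generate_embedding does today.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

logger = logging.getLogger(__name__)

# A small pool keeps concurrent requests to a local Ollama instance bounded.
_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="embed")


def submit_embedding(generate: Callable[[str, str], None], entry_id: str, text: str) -> Future:
    """Schedule embedding generation and return immediately."""

    def _safe() -> None:
        try:
            generate(entry_id, text)
        except Exception:
            # Best-effort: log and swallow so a failed embedding never surfaces to the caller.
            logger.exception("embedding generation failed for entry %s", entry_id)

    return _executor.submit(_safe)
```

The tool response can be returned as soon as the task is submitted; the returned Future is only needed by tests or shutdown code that must wait for completion.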
4. Pre-existing entries don't get embeddings (observation, expected)
The 401k entry created without Ollama didn't get an embedding (verified in DB). Only entries written after configuring a provider are embedded. A backfill mechanism (get_entries_without_embeddings exists in the store but isn't exposed as a tool) would be useful for onboarding.
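A backfill could be a simple batched loop over that store method. The sketch below guesses at signatures: get_entries_without_embeddings and upsert_embedding exist in the PR, but their parameters and return types here are assumptions, and backfill itself is hypothetical.

```python
from typing import List, Protocol, Tuple


class BackfillStore(Protocol):
    """Assumed store surface; the real signatures may differ."""

    def get_entries_without_embeddings(self, model: str, limit: int) -> List[Tuple[str, str]]: ...

    def upsert_embedding(self, entry_id: str, model: str, vector: List[float]) -> None: ...


class Embedder(Protocol):
    def is_available(self) -> bool: ...

    def embed(self, texts: List[str]) -> List[List[float]]: ...


def backfill(store: BackfillStore, provider: Embedder, model: str, batch_size: int = 32) -> int:
    """Embed entries that have no embedding for `model`; return how many were written."""
    if not provider.is_available():
        return 0
    total = 0
    while True:
        batch = store.get_entries_without_embeddings(model, batch_size)
        if not batch:
            return total
        vectors = provider.embed([text for _, text in batch])
        if len(vectors) != len(batch):
            # Stop rather than risk pairing vectors with the wrong entries.
            return total
        for (entry_id, _), vector in zip(batch, vectors):
            store.upsert_embedding(entry_id, model, vector)
        total += len(batch)
```

The loop terminates because each upsert removes an entry from the "without embeddings" set; a store that fails silently on upsert would need an extra guard.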
5. Test count discrepancy (nit)
PR says 276, actual is 294. Likely accumulated from subsequent commits (5 commits on the branch). Update the PR body/CHANGELOG if desired.
Verdict
Finding #1 (missing HNSW index in inline DDL) is the substantive issue — it creates a functional gap for non-Alembic installs. Finding #2 (hardcoded dimensions) is worth a decision. #3-#5 are observations.
The feature is well-implemented with good graceful degradation. Requesting changes for the HNSW index gap (#1).
…traint
- Add HNSW index to _create_tables (finding #1: non-Alembic installs)
- Document VECTOR(768) hardcoded dimension in data-dictionary (finding #2)
- Clarify embedding-on-write is synchronous, background planned for Phase 2
- Update test count to 294 in CHANGELOG and README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cmeans
left a comment
QA Re-review — PR #40 (round 2)
Fix verification (commit 7e9c8f9)
All findings addressed:
- HNSW index in inline DDL: ✅ Added. Matches Alembic migration (m=16, ef_construction=64).
- Dimension constraint documented: ✅ Data-dictionary notes VECTOR(768) is hardcoded and AWARENESS_EMBEDDING_DIMENSIONS doesn't alter the column type.
- Sync embedding clarified: ✅ Changelog updated: "currently synchronous; background generation planned for Phase 2".
- Test count: ✅ Updated to 294 in README and CHANGELOG.
Automated tests
- 294/294 pass
- ruff + mypy clean
Verdict
All clear. Ready for QA Approved label.
Review response
Thanks for the thorough review. 8/8 manual QA passing is great to see.

Finding #1: HNSW index missing from inline DDL ✅ Fixed
Added the HNSW index to …

Finding #2: Hardcoded VECTOR(768) ✅ Documented
Documented in data-dictionary that …

Finding #3: Synchronous embedding on write ✅ Acknowledged
Updated CHANGELOG to clarify this is synchronous in v1. Background thread/queue for truly non-blocking generation is a Phase 2 item.

Finding #4: Pre-existing entries not backfilled ✅ Phase 2

Finding #5: Test count ✅ Fixed
Updated CHANGELOG and README to 294.
Summary
- semantic_search tool finds entries by meaning using pgvector cosine similarity
- EmbeddingProvider protocol with OllamaEmbedding and NullEmbedding implementations
- embeddings table with HNSW index, CASCADE delete, unique per entry+model
- … embeddings profile

QA
Prerequisites
- pip install -e ".[dev]"
- … AWARENESS_PORT=8421)

Manual tests (via MCP tools)
1. Expected: JSON with status: "error" and message mentioning AWARENESS_EMBEDDING_PROVIDER
2. Expected: status: "ok" with entry id
3. Expected: returns the entry from step 2
4. Restart mcp-awareness with AWARENESS_EMBEDDING_PROVIDER=ollama
5. Expected: status: "ok" (embedding generated silently)
6. Expected: returns entries about 401k and Roth IRA with similarity scores
7. Expected: list mode results with similarity scores, filtered to source "test"
8. Expected: only note entries returned
🤖 Generated with Claude Code