Skip to content

Add semantic search via pgvector + Ollama#40

Merged
cmeans merged 6 commits into
mainfrom
semantic-search
Mar 23, 2026
Merged

Add semantic search via pgvector + Ollama#40
cmeans merged 6 commits into
mainfrom
semantic-search

Conversation

@cmeans
Copy link
Copy Markdown
Owner

@cmeans cmeans commented Mar 23, 2026

Summary

  • New semantic_search tool finds entries by meaning using pgvector cosine similarity
  • EmbeddingProvider protocol with OllamaEmbedding and NullEmbedding implementations
  • Fire-and-forget embedding generation on write tools (remember, learn_pattern, add_context, report_alert, report_status, update_entry)
  • Separate embeddings table with HNSW index, CASCADE delete, unique per entry+model
  • Optional Ollama Docker service under embeddings profile
  • Graceful degradation: everything works without an embedding provider configured
  • 276 tests (+37 new), all passing; ruff, mypy, format clean

QA

Prerequisites

  • pip install -e ".[dev]"
  • Deploy to test instance on alternate port (AWARENESS_PORT=8421)

Manual tests (via MCP tools)

    • semantic_search without provider returns helpful error
    semantic_search(query="retirement planning")
    

    Expected: JSON with status: "error" and message mentioning AWARENESS_EMBEDDING_PROVIDER

    • remember creates entry (existing behavior unchanged)
    remember(source="test", tags=["finance"], description="401k contribution strategy for 2026")
    

    Expected: status: "ok" with entry id

    • get_knowledge still works by tags
    get_knowledge(tags=["finance"])
    

    Expected: returns the entry from step 2

    • With Ollama configured (AWARENESS_EMBEDDING_PROVIDER=ollama):
    docker compose --profile embeddings up -d
    docker exec awareness-ollama ollama pull nomic-embed-text
    

    Restart mcp-awareness with AWARENESS_EMBEDDING_PROVIDER=ollama

    • remember generates embedding
    remember(source="test", tags=["finance"], description="Roth IRA conversion ladder strategy")
    

    Expected: status: "ok" — embedding generated silently

    • semantic_search finds by meaning
    semantic_search(query="retirement planning")
    

    Expected: returns entries about 401k and Roth IRA with similarity scores

    • semantic_search with filters
    semantic_search(query="retirement", source="test", mode="list")
    

    Expected: list mode results with similarity scores, filtered to source "test"

    • semantic_search with entry_type filter
    semantic_search(query="retirement", entry_type="note")
    

    Expected: only note entries returned

🤖 Generated with Claude Code

Adds vector similarity search as an optional, additive capability.
Existing tag-based search is unchanged. When an embedding provider
is configured, write tools auto-generate embeddings fire-and-forget,
and the new semantic_search tool finds entries by meaning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

cmeans and others added 4 commits March 23, 2026 16:42
- get_knowledge now supports created_after/created_before for filtering by
  creation time (distinct from since/until which filter by last update)
- Documented and isolated _prompt_manager._prompts private access in custom
  prompt sync — no public remove API exists in FastMCP

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Ollama service container in GitHub Actions with nomic-embed-text model
- 7 integration tests that exercise the real embedding pipeline:
  OllamaEmbedding.embed(), is_available(), similarity ordering,
  write-and-search round trip, embedding generation on write
- Tests skip gracefully when Ollama is not available (local dev)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lama

- Store: semantic_search since/until filter tests
- Server: embed-returns-empty and embed-raises-exception error path tests
- Embeddings: unreachable Ollama returns is_available=False test
- 292 tests total

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Review — PR #40: Add semantic search via pgvector + Ollama

Automated tests

  • 294/294 pass (PR says 276 — subsequent commits added more)
  • ruff: clean
  • mypy: clean

Manual tests (8/8 pass)

Tested both without and with Ollama provider via isolated MCP instances.

# Test Result
1 semantic_search without provider → structured error ✅ Clear error mentioning AWARENESS_EMBEDDING_PROVIDER
2 remember works without provider (no regression) status: "ok"
3 get_knowledge by tags (no regression) ✅ Returns entry
4 Configure Ollama provider ✅ Restarted with AWARENESS_EMBEDDING_PROVIDER=ollama
5 remember generates embedding ✅ Verified embedding row in embeddings table
6 semantic_search finds by meaning ✅ "retirement planning" → "Roth IRA conversion ladder strategy" (similarity: 0.39)
7 semantic_search with source filter + list mode ✅ List mode with similarity scores
8 semantic_search with entry_type filter ✅ Only notes returned

Code review

Architecture: Clean separation — embeddings.py defines the EmbeddingProvider protocol, OllamaEmbedding (stdlib urllib, no SDK dep), and NullEmbedding. Server uses _generate_embedding fire-and-forget wrapper. Store protocol extended with 3 methods. Good.

Schema: embeddings table with VECTOR(768), UNIQUE(entry_id, model), ON DELETE CASCADE, text_hash for staleness detection. Alembic migration includes HNSW index. Good.

Graceful degradation: NullEmbedding.is_available() returns False, so all write tools silently skip embedding generation. semantic_search returns a clear error. No functional regression without a provider. Verified manually.

Ollama integration: Uses /api/embed (batch endpoint), /api/tags for availability check. Model name matching handles :latest suffix. Configurable timeout. Good.

Docker Compose: Ollama service under embeddings profile with resource limits (1 CPU, 2GB). HNSW index parameters (m=16, ef_construction=64) are reasonable for the expected scale.

CI: Ollama service added with model pull step and wait loop. AWARENESS_OLLAMA_URL env var. Good.

New filters: created_after/created_before in get_knowledge — clean addition, SQL-level, parameterized.

Docs: CHANGELOG, README (27 tools, test count), CLAUDE.md (architecture), data-dictionary (embeddings table). Good.

Findings

1. HNSW index missing from inline DDL (functional gap)

The Alembic migration creates the HNSW index:

CREATE INDEX IF NOT EXISTS idx_embeddings_vector_hnsw ON embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

But the inline DDL in postgres_store.py (lines 77-90) only creates the table and idx_embeddings_entryno HNSW index. New installs that don't run Alembic will get sequential scans on every semantic_search query.

Suggest: add the HNSW index creation to the inline DDL block.

2. VECTOR(768) hardcoded vs configurable dimensions

Both inline DDL and Alembic migration hardcode VECTOR(768). But AWARENESS_EMBEDDING_DIMENSIONS env var allows changing it. If someone configures a different model with different dimensions (e.g., 384 for all-MiniLM-L6-v2), upsert_embedding will fail with a Postgres type error.

Options:

  • Document that 768 is the only supported dimension (simplest)
  • Use VECTOR without dimension constraint (flexible but loses type safety)
  • Make the migration/DDL dimension-aware

3. _generate_embedding is synchronous, not truly fire-and-forget (observation)

_generate_embedding calls provider.embed() — a synchronous HTTP request to Ollama — inline before returning the tool response. On my test setup this added ~100-200ms per write. The PR description says "fire-and-forget (never blocks responses)" but it does block.

For v1 this is fine (Ollama is local, latency is tolerable). If it becomes an issue, a background thread or queue would make it truly non-blocking.

4. Pre-existing entries don't get embeddings (observation, expected)

The 401k entry created without Ollama didn't get an embedding (verified in DB). Only entries written after configuring a provider are embedded. A backfill mechanism (get_entries_without_embeddings exists in the store but isn't exposed as a tool) would be useful for onboarding.

5. Test count discrepancy (nit)

PR says 276, actual is 294. Likely accumulated from subsequent commits (5 commits on the branch). Update the PR body/CHANGELOG if desired.

Verdict

Finding #1 (missing HNSW index in inline DDL) is the substantive issue — it creates a functional gap for non-Alembic installs. Finding #2 (hardcoded dimensions) is worth a decision. #3-#5 are observations.

The feature is well-implemented with good graceful degradation. Requesting changes for the HNSW index gap (#1).

…traint

- Add HNSW index to _create_tables (finding #1: non-Alembic installs)
- Document VECTOR(768) hardcoded dimension in data-dictionary (finding #2)
- Clarify embedding-on-write is synchronous, background planned for Phase 2
- Update test count to 294 in CHANGELOG and README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Re-review — PR #40 (round 2)

Fix verification (commit 7e9c8f9)

All findings addressed:

  1. HNSW index in inline DDL — ✅ Added. Matches Alembic migration (m=16, ef_construction=64).
  2. Dimension constraint documented — ✅ Data-dictionary notes VECTOR(768) is hardcoded and AWARENESS_EMBEDDING_DIMENSIONS doesn't alter the column type.
  3. Sync embedding clarified — ✅ Changelog updated: "currently synchronous; background generation planned for Phase 2".
  4. Test count — ✅ Updated to 294 in README and CHANGELOG.

Automated tests

  • 294/294 pass
  • ruff + mypy clean

Verdict

All clear. Ready for QA Approved label.

@cmeans
Copy link
Copy Markdown
Owner Author

cmeans commented Mar 23, 2026

Review response

Thanks for the thorough review — 8/8 manual QA passing is great to see.

Finding #1: HNSW index missing from inline DDL ✅ Fixed

Added the HNSW index to _create_tables so non-Alembic installs (and tests) get it too.

Finding #2: Hardcoded VECTOR(768) ✅ Documented

Documented in data-dictionary that VECTOR(768) is intentionally hardcoded to match nomic-embed-text. AWARENESS_EMBEDDING_DIMENSIONS configures the provider but doesn't alter the column type — both DDL and migration must be updated to change dimensions.

Finding #3: Synchronous embedding on write ✅ Acknowledged

Updated CHANGELOG to clarify this is synchronous in v1. Background thread/queue for truly non-blocking generation is a Phase 2 item.

Finding #4: Pre-existing entries not backfilled ✅ Phase 2

get_entries_without_embeddings exists in the store protocol — exposing it as a backfill_embeddings tool is planned for the follow-up PR.

Finding #5: Test count ✅ Fixed

Updated CHANGELOG and README to 294.

@cmeans cmeans added the QA Approved Manual QA testing completed and passed label Mar 23, 2026
@cmeans cmeans merged commit 8da0886 into main Mar 23, 2026
9 checks passed
@cmeans cmeans deleted the semantic-search branch March 23, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

QA Approved Manual QA testing completed and passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant