Fixes critical bug where hybrid search functions referenced non-existent cp.embedding and ce.embedding columns instead of dimension-specific columns.

Changes:
- Add new multi-dimensional hybrid search functions with dynamic column selection
- Maintain backward compatibility with existing legacy functions
- Support all embedding dimensions: 384, 768, 1024, 1536, 3072
- Proper error handling for unsupported dimensions

Resolves: #675 - RAG queries now work with multi-dimensional embeddings

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Walkthrough
Introduces multi-dimensional hybrid search SQL functions with dynamic selection of embedding columns by dimension. Adds backward-compatible wrappers fixed to 1536D. Applies the same updates in add_hybrid_search_tsvector.sql and complete_setup.sql, replacing prior single-column references and addressing the non-existent cp.embedding issue.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Client
participant DB as Postgres
participant HF as hybrid_search_*_multi
Client->>DB: SELECT * FROM hybrid_search_*_multi(query_embedding, embedding_dimension, query_text, match_count, filter, source_filter)
activate DB
DB->>HF: Invoke function
activate HF
HF->>HF: Map embedding_dimension → embedding column (e.g., embedding_1536)
HF->>DB: Dynamic SQL: vector similarity + full-text tsquery, apply filters
DB-->>HF: Result rows (id, url, chunk_number, content, ... , similarity, match_type)
deactivate HF
DB-->>Client: Top N matches
deactivate DB
sequenceDiagram
autonumber
actor Client
participant DB as Postgres
participant Legacy as hybrid_search_* (1536D wrapper)
participant Multi as hybrid_search_*_multi
Client->>DB: SELECT * FROM hybrid_search_*(query_embedding_1536, query_text, ...)
activate DB
DB->>Legacy: Call legacy wrapper
activate Legacy
Legacy->>Multi: Delegate with embedding_dimension = 1536
activate Multi
Multi->>DB: Execute dynamic query using embedding_1536
DB-->>Multi: Rows
deactivate Multi
Legacy-->>DB: Return rows
deactivate Legacy
DB-->>Client: Rows
deactivate DB
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
✅ Passed checks (5 passed)
Actionable comments posted: 0
🧹 Nitpick comments (7)
migration/add_hybrid_search_tsvector.sql (4)
66-75: Guard against embedding dimension mismatches.
Add a runtime check so a 768D vector isn’t used against a 1536D column (currently would error at execution).
Apply this diff near the start of the function body:
 BEGIN
+  -- Validate query vector dimension
+  IF vector_dims(query_embedding) != embedding_dimension THEN
+    RAISE EXCEPTION 'Query embedding dimension (%) does not match embedding_dimension (%)',
+      vector_dims(query_embedding), embedding_dimension;
+  END IF;
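The same guard can also be applied on the caller's side before the query is sent. The following is a minimal Python sketch of that idea, not code from this PR: the function name `check_query_embedding` is hypothetical, and it assumes the query embedding is available as a plain sequence of floats.

```python
def check_query_embedding(query_embedding, embedding_dimension):
    """Mirror of the suggested SQL guard (hypothetical client-side helper).

    Raises ValueError when the vector length differs from the declared
    embedding_dimension, so the mismatch is reported before any SQL runs.
    """
    actual = len(query_embedding)
    if actual != embedding_dimension:
        raise ValueError(
            f"Query embedding dimension ({actual}) does not match "
            f"embedding_dimension ({embedding_dimension})"
        )
    return embedding_dimension


if __name__ == "__main__":
    # A 768D vector declared as 768D passes the check.
    check_query_embedding([0.0] * 768, 768)
    # A 768D vector declared as 1536D is rejected.
    try:
        check_query_embedding([0.0] * 768, 1536)
    except ValueError as exc:
        print(exc)
```

A client-side check like this would complement the in-function guard rather than replace it, since other callers can still reach the SQL function directly.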
100-115: Text-search rank vs. vector similarity scale mismatch.
ts_rank_cd and cosine similarity are on different scales; ordering the combined set by a single similarity can bias results. Optional: blend scores with weights.
Example change inside combined_results:
- COALESCE(v.vector_sim, t.text_sim, 0)::float8 AS similarity,
+ (0.7 * COALESCE(v.vector_sim, 0) + 0.3 * COALESCE(t.text_sim, 0))::float8 AS similarity,
Consider making weights configurable later.
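To see how the two expressions can order results differently, here is a small Python sketch of the arithmetic. The row values are made up for illustration, and the function names `coalesce_score` and `blend_scores` are hypothetical. The 0.7/0.3 weights are the ones from the suggested diff.

```python
def coalesce_score(vector_sim, text_sim):
    """Model of COALESCE(v.vector_sim, t.text_sim, 0): first non-null value."""
    if vector_sim is not None:
        return vector_sim
    if text_sim is not None:
        return text_sim
    return 0.0


def blend_scores(vector_sim, text_sim, vector_weight=0.7, text_weight=0.3):
    """Model of the suggested weighted blend; missing scores count as 0."""
    v = 0.0 if vector_sim is None else vector_sim
    t = 0.0 if text_sim is None else text_sim
    return vector_weight * v + text_weight * t


# Illustrative rows only: (id, vector_sim, text_sim); None means "no match".
rows = [
    ("a", 0.80, None),  # vector match only
    ("b", None, 0.90),  # text match only
    ("c", 0.60, 0.50),  # matched by both branches
]

by_coalesce = [r[0] for r in sorted(rows, key=lambda r: coalesce_score(r[1], r[2]), reverse=True)]
by_blend = [r[0] for r in sorted(rows, key=lambda r: blend_scores(r[1], r[2]), reverse=True)]

if __name__ == "__main__":
    print(by_coalesce)  # text-only row "b" ranks first
    print(by_blend)     # row "c", matched by both branches, ranks first
```

With the blend, a row found by both branches is credited for both scores, whereas the COALESCE form ignores the text score whenever a vector score exists.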
171-209: Repeat the dimension guard in code_examples.
Mirror the earlier dimension check to prevent mismatches here as well.
 BEGIN
+  IF vector_dims(query_embedding) != embedding_dimension THEN
+    RAISE EXCEPTION 'Query embedding dimension (%) does not match embedding_dimension (%)',
+      vector_dims(query_embedding), embedding_dimension;
+  END IF;
252-276: Same optional score fusion note for code_examples.
Consider weighted blending as above.
migration/complete_setup.sql (3)
477-514: Deduplicate CASE via helper function and add dimension check.
You already define get_embedding_column_name(dimension) and can also validate dimension early.
 BEGIN
-  -- Determine which embedding column to use based on dimension
-  CASE embedding_dimension
-    WHEN 384 THEN embedding_column := 'embedding_384';
-    WHEN 768 THEN embedding_column := 'embedding_768';
-    WHEN 1024 THEN embedding_column := 'embedding_1024';
-    WHEN 1536 THEN embedding_column := 'embedding_1536';
-    WHEN 3072 THEN embedding_column := 'embedding_3072';
-    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
-  END CASE;
+  IF vector_dims(query_embedding) != embedding_dimension THEN
+    RAISE EXCEPTION 'Query embedding dimension (%) does not match embedding_dimension (%)',
+      vector_dims(query_embedding), embedding_dimension;
+  END IF;
+  embedding_column := get_embedding_column_name(embedding_dimension);
610-647: Apply the same helper+dimension guard to code_examples.
Reduce duplication and add safety.
 BEGIN
-  -- Determine which embedding column to use based on dimension
-  CASE embedding_dimension
-    WHEN 384 THEN embedding_column := 'embedding_384';
-    WHEN 768 THEN embedding_column := 'embedding_768';
-    WHEN 1024 THEN embedding_column := 'embedding_1024';
-    WHEN 1536 THEN embedding_column := 'embedding_1536';
-    WHEN 3072 THEN embedding_column := 'embedding_3072';
-    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
-  END CASE;
+  IF vector_dims(query_embedding) != embedding_dimension THEN
+    RAISE EXCEPTION 'Query embedding dimension (%) does not match embedding_dimension (%)',
+      vector_dims(query_embedding), embedding_dimension;
+  END IF;
+  embedding_column := get_embedding_column_name(embedding_dimension);
227-234: Performance caveat for 3072D embeddings.
No index can be created for 3072D; expect seq scans and sorting. Consider a config flag to disallow 3072D at scale or set ivfflat.probes higher for other dims to improve recall.
Run EXPLAIN ANALYZE comparing 1536D vs 3072D to validate impact on your dataset size.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
migration/add_hybrid_search_tsvector.sql (10 hunks)
migration/complete_setup.sql (9 hunks)
🔇 Additional comments (14)
migration/add_hybrid_search_tsvector.sql (7)
38-75: Good fix: dynamic embedding column selection and safe SQL construction.
- Correctly maps dimension → column and uses %I for identifiers.
- Addresses the original cp/ce.embedding bug.
Please run EXPLAIN on a 3072D query to confirm acceptable performance without an index.
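The dimension → column mapping praised here follows an allow-list pattern: only a fixed set of dimensions resolves to a column name, and anything else is rejected before an identifier reaches the dynamic SQL. A minimal Python sketch of that pattern follows. It is a model of the logic, not the PR's PL/pgSQL; the Python function borrows the name of the SQL helper `get_embedding_column_name` purely for illustration.

```python
# Allow-list of supported dimensions and their column names, taken from the
# dimensions listed in this PR (384, 768, 1024, 1536, 3072).
EMBEDDING_COLUMNS = {
    384: "embedding_384",
    768: "embedding_768",
    1024: "embedding_1024",
    1536: "embedding_1536",
    3072: "embedding_3072",
}


def get_embedding_column_name(dimension):
    """Return the column for a supported dimension, else raise ValueError.

    Because the result always comes from the fixed table above, caller
    input never becomes part of an identifier directly.
    """
    try:
        return EMBEDDING_COLUMNS[dimension]
    except KeyError:
        raise ValueError(f"Unsupported embedding dimension: {dimension}") from None


if __name__ == "__main__":
    print(get_embedding_column_name(1536))
    try:
        get_embedding_column_name(512)
    except ValueError as exc:
        print(exc)
```

In the SQL functions, the selected name is additionally passed through %I, which quotes it as an identifier. The allow-list and the quoting are independent safeguards.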
80-98: Cosine distance to similarity conversion is correct (pgvector <=>).
Using 1 - (col <=> $1) yields cosine similarity when <=> is cosine distance. Keep as-is.
Confirm your pgvector version uses <=> for cosine distance; if not, we should switch operators.
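The conversion rests on the identity cosine distance = 1 - cosine similarity, so subtracting the distance from 1 recovers the similarity. A short Python sketch with toy vectors illustrates it. The helper names are hypothetical, and the code only models the arithmetic and does not call pgvector.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, 1 - cos(a, b): the quantity `<=>` is taken to return."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_from_distance(distance):
    """Model of the SQL expression 1 - (col <=> $1)."""
    return 1.0 - distance


if __name__ == "__main__":
    same = similarity_from_distance(cosine_distance([1.0, 2.0], [2.0, 4.0]))
    orthogonal = similarity_from_distance(cosine_distance([1.0, 0.0], [0.0, 1.0]))
    opposite = similarity_from_distance(cosine_distance([1.0, 0.0], [-1.0, 0.0]))
    print(same, orthogonal, opposite)  # approximately 1, 0 and -1
```

Note that the resulting similarity ranges over [-1, 1], which is one reason it does not sit on the same scale as a text-search rank.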
146-170: Legacy wrapper: compatibility preserved.
Wrapper cleanly delegates to 1536D. No issues.
214-233: OK: vector branch mirrors crawled_pages.
Identifier formatting and null filter are correct.
235-251: OK: full‑text search joins summary + content.
content_search_vector already indexes both; good.
283-307: Legacy wrapper: compatibility preserved.
Looks good.
316-321: Helpful function comments.
Clear docstrings for future maintainers.
migration/complete_setup.sql (7)
519-579: Vector branch and dynamic SQL look correct; placeholders align.
No correctness issues spotted.
580-582: Parameter order verification.
USING query_embedding, max_vector_results, max_text_results, filter, source_filter, query_text matches placeholders $1..$6.
585-609: Legacy wrapper: OK.
Back-compat maintained.
653-716: OK: dynamic SQL mirrors crawled_pages; identifiers escaped.
Looks good.
717-719: Placeholder ordering verified.
USING argument list matches $1..$6.
722-746: Legacy wrapper: OK.
Back-compat maintained.
749-752: Doc comments helpful; keep consistent across migrations.
Good documentation.
Summary
Fixes #675 - Updates hybrid search functions in migration scripts to properly handle multi-dimensional vector fields instead of referencing non-existent cp.embedding and ce.embedding columns.
Problem
The hybrid search functions in both migration scripts were referencing columns that don't exist:
- Referenced: cp.embedding and ce.embedding
- Actual columns: embedding_384, embedding_768, embedding_1024, embedding_1536, embedding_3072
This caused RAG queries to fail with "column does not exist" database errors.
Solution
Changes
- migration/complete_setup.sql: Updated hybrid search functions to use multi-dimensional approach
- migration/add_hybrid_search_tsvector.sql: Applied identical fixes for consistency
Testing
✅ Verified RAG queries work without database errors
✅ Confirmed hybrid search mode is active
✅ No "column does not exist" errors in logs
Test Plan
POST /api/rag/query
🤖 Generated with Claude Code
Summary by CodeRabbit