feat: add step-by-step migration strategy for multi-dimensional embeddings #683
Conversation
Adds an alternative migration approach for users on the Supabase free tier who encounter timeouts when running the full upgrade_database.sql script. Breaks the migration into 4 manageable steps to avoid memory/timeout issues when creating vector indexes on large datasets.
Walkthrough

Introduces a new migration guide and four SQL scripts enabling a staged database migration for multi-dimensional embeddings. Adds columns, migrates existing data, creates search functions with legacy wrappers, and optionally builds vector and metadata indexes. Includes guidance for execution paths and verification.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin/Operator
    participant Client as App/Service
    participant DB as PostgreSQL
    participant T1 as archon_crawled_pages
    participant T2 as archon_code_examples
    rect rgb(232,240,254)
        note over Admin,DB: Migration Flow (multi-path)
        Admin->>DB: Step 1: add embedding_* and metadata columns
        Admin->>DB: Step 2: migrate legacy embedding -> embedding_1536<br/>drop old column and indexes
        Admin->>DB: Step 3: create helper + multi search + legacy wrappers
        alt Optional
            Admin->>DB: Step 4: create IVFFLAT and B-tree indexes
        else Skipped or partial
            note over DB: Falls back to brute-force scan when IVFFLAT absent
        end
    end
    rect rgb(240,255,240)
        note over Client,DB: Query Flow (search)
        Client->>DB: SELECT FROM match_archon_*_multi(query_embedding, dim, ...)
        DB->>DB: get_embedding_column_name(dim)
        DB->>T1: Scan/vector search on chosen embedding column (or T2)
        T1-->>DB: Top-k rows with distance
        DB-->>Client: Rows with similarity and metadata
    end
```
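The query flow above can be sketched as a direct SQL call. This is an illustrative invocation only: the exact parameter order of `match_archon_crawled_pages_multi` is an assumption based on the `$1`-`$4` placeholders discussed later in this review, and the vector literal is truncated.

```sql
-- Hypothetical call: top 10 matches against the 1536-D column,
-- no metadata filter, restricted to one source
SELECT id, url, similarity
FROM match_archon_crawled_pages_multi(
    '[0.01, 0.02, ...]'::vector,   -- query_embedding (truncated for illustration)
    1536,                          -- embedding dimension to search
    10,                            -- match_count
    '{}'::jsonb,                   -- metadata containment filter
    'docs.example.com'             -- source_id (NULL to search all sources)
);
```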
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (17)
migration/step1_add_columns.sql (2)
11-21: Constrain allowed embedding_dimension values. A CHECK constraint guards against accidental writes (e.g., 512) and helps query planners.
Append these constraints after the ALTER (either table-by-table or once per table):

```diff
 ALTER TABLE archon_crawled_pages
     ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER;
+
+-- Constrain dimension to known values
+ALTER TABLE archon_crawled_pages
+    ADD CONSTRAINT archon_crawled_pages_embedding_dimension_chk
+    CHECK (embedding_dimension IN (384, 768, 1024, 1536, 3072));
```

Repeat for `archon_code_examples`:

```diff
 ALTER TABLE archon_code_examples
     ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER;
+
+ALTER TABLE archon_code_examples
+    ADD CONSTRAINT archon_code_examples_embedding_dimension_chk
+    CHECK (embedding_dimension IN (384, 768, 1024, 1536, 3072));
```
6-8: maintenance_work_mem isn't needed for ADD COLUMN. It's harmless but unused here; consider moving it to the indexing step only, to avoid confusion.
migration/step2_migrate_data.sql (2)
16-20: Filter by schema in information_schema lookups. This avoids false positives if similarly named tables exist in other schemas.

Apply:

```diff
-    WHERE table_name = 'archon_crawled_pages'
+    WHERE table_schema = 'public' AND table_name = 'archon_crawled_pages'
@@
-    WHERE table_name = 'archon_code_examples'
+    WHERE table_schema = 'public' AND table_name = 'archon_code_examples'
```

Also applies to: 41-45
29-34: Don't guess the embedding_model. Setting `'text-embedding-3-small'` may be incorrect for legacy data (e.g., ada-002). Prefer leaving it as-is or marking it as `'legacy-1536'` for later curation.

Apply:

```diff
-    embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
+    embedding_model = COALESCE(embedding_model, 'legacy-1536')
```

(Repeat for both tables.)

Also applies to: 53-58
migration/step3_create_functions.sql (4)
16-29: Use the helper to avoid duplicate CASE logic. `get_embedding_column_name` exists; leverage it as the single source of truth.

```sql
CREATE OR REPLACE FUNCTION get_embedding_column_name(dimension INTEGER)
RETURNS TEXT AS $$
BEGIN
    CASE dimension
        WHEN 384 THEN RETURN 'embedding_384';
        WHEN 768 THEN RETURN 'embedding_768';
        WHEN 1024 THEN RETURN 'embedding_1024';
        WHEN 1536 THEN RETURN 'embedding_1536';
        WHEN 3072 THEN RETURN 'embedding_3072';
        ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', dimension;
    END CASE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
```

(Keep the function, then use it below.)
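Once the helper is the single source of truth, resolving a column reduces to a one-line call (illustrative; the return value follows directly from the function body above):

```sql
-- Illustrative: resolve the column name for 768-dimension embeddings
SELECT get_embedding_column_name(768);  -- 'embedding_768'
```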
53-61: De-duplicate CASE blocks inside search functions. Resolve the column with the helper to reduce divergence risk.

Apply in both functions:

```diff
-    CASE embedding_dimension
-        WHEN 384 THEN embedding_column := 'embedding_384';
-        WHEN 768 THEN embedding_column := 'embedding_768';
-        WHEN 1024 THEN embedding_column := 'embedding_1024';
-        WHEN 1536 THEN embedding_column := 'embedding_1536';
-        WHEN 3072 THEN embedding_column := 'embedding_3072';
-        ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
-    END CASE;
+    embedding_column := get_embedding_column_name(embedding_dimension);
```

Also applies to: 101-109
63-75: Compute distance once; order by the alias. This saves recomputation and can help planners. Also guard against dimension mismatch.

Apply to both functions:

```diff
+    -- Optional: verify query embedding dimension matches requested dimension
+    IF detect_embedding_dimension(query_embedding) <> embedding_dimension THEN
+        RAISE EXCEPTION 'Query embedding dimension (%) does not match requested dimension (%)',
+            detect_embedding_dimension(query_embedding), embedding_dimension;
+    END IF;
+
     sql_query := format('
         SELECT id, url, chunk_number, content, metadata, source_id,
-               1 - (%I <=> $1) AS similarity
+               (%I <=> $1) AS distance,
+               1 - (%I <=> $1) AS similarity
         FROM archon_crawled_pages
         WHERE (%I IS NOT NULL)
           AND metadata @> $3
           AND ($4 IS NULL OR source_id = $4)
-        ORDER BY %I <=> $1
+        ORDER BY distance
         LIMIT $2',
         embedding_column, embedding_column, embedding_column);
```

And for code examples:

```diff
-        SELECT id, url, chunk_number, content, summary, metadata, source_id,
-               1 - (%I <=> $1) AS similarity
+        SELECT id, url, chunk_number, content, summary, metadata, source_id,
+               (%I <=> $1) AS distance,
+               1 - (%I <=> $1) AS similarity
@@
-        ORDER BY %I <=> $1
+        ORDER BY distance
@@
-        embedding_column, embedding_column, embedding_column);
+        embedding_column, embedding_column, embedding_column);
```

Also applies to: 111-123
39-46: Prefer DOUBLE PRECISION over FLOAT for clarity. Postgres maps `float` ambiguously; be explicit.

```diff
-    similarity FLOAT
+    similarity DOUBLE PRECISION
```

(Apply in all RETURNS TABLE signatures.)

Also applies to: 86-94
migration/step4_create_indexes_optional.sql (3)
16-55: Consider adding optional 3072D indexes (commented out) for users of text-embedding-3-large. This keeps parity with the available columns; users can uncomment when needed.

Append after Index 8:

```diff
+-- (Optional) Index 9 of 10
+-- CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_3072
+--     ON archon_crawled_pages USING ivfflat (embedding_3072 vector_cosine_ops)
+--     WITH (lists = 100);
+
+-- (Optional) Index 10 of 10
+-- CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_3072
+--     ON archon_code_examples USING ivfflat (embedding_3072 vector_cosine_ops)
+--     WITH (lists = 100);
```
56-63: Add GIN indexes on metadata to accelerate the JSONB filter. `metadata @> filter` benefits heavily from GIN.

Add:

```diff
 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_model
     ON archon_crawled_pages (embedding_model);
@@
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_llm_chat_model
     ON archon_code_examples (llm_chat_model);
+
+-- JSONB metadata indexes
+CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_metadata_gin
+    ON archon_crawled_pages USING GIN (metadata);
+CREATE INDEX IF NOT EXISTS idx_archon_code_examples_metadata_gin
+    ON archon_code_examples USING GIN (metadata);
```
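For context, the containment filter used by the search functions has this shape; a GIN index on `metadata` lets Postgres serve it without a full scan (the `"language"` key and values here are hypothetical examples, not part of the migration):

```sql
-- Illustrative JSONB containment filter served by a GIN index on metadata
SELECT id, url
FROM archon_crawled_pages
WHERE metadata @> '{"language": "python"}'::jsonb
LIMIT 10;
```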
10-12: Analyze after building IVFFLAT indexes. IVFFLAT needs ANALYZE for good recall/performance; also consider CONCURRENTLY in production.

Apply:

```diff
 SET statement_timeout = '10min';
@@
 RESET maintenance_work_mem;
 RESET statement_timeout;
+
+-- Gather stats for planner / IVFFLAT
+ANALYZE archon_crawled_pages;
+ANALYZE archon_code_examples;
```

Note: If minimizing write locks is critical, consider `CREATE INDEX CONCURRENTLY` outside transactions (may run longer on free tier).

Also applies to: 64-66
migration/MIGRATION_GUIDE.md (6)
20-28: Explicitly require the pgvector extension. This avoids confusion when the VECTOR type/functions are missing.

Add after the "Direct Database Connection" section intro:

````diff
+> Prerequisite: Ensure pgvector is installed
+>
+> ```sql
+> CREATE EXTENSION IF NOT EXISTS vector;
+> ```
````
63-66: Clarify skipping indexes and the performance caveat. Mention that 3072D specifically won't be indexed by default; add an ANALYZE note.

Add:

```diff
 If you have a small dataset (<10,000 documents), you can skip Step 4 entirely.
 The system will use brute-force search which is fast enough for small datasets.
+
+Note: 3072-dimension embeddings (e.g., text-embedding-3-large) are not indexed
+by default in Step 4. Expect brute-force search for 3072D unless you add those
+indexes later.
```
69-82: The verification query may count metadata indexes too; that's OK. Clarify the expected ranges so users aren't surprised by counts > 8.

```diff
-  - `index_count`: 8+ (or 0 if you skipped Step 4)
+  - `index_count`: 8–14 (0 if you skipped Step 4), depending on which vector/metadata indexes you created
```
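The guide's verification query is not reproduced in this review; a catalog query of roughly this shape would produce such a count (table names are from this migration, but the exact query is an assumption):

```sql
-- Count indexes created on the two migrated tables
SELECT count(*) AS index_count
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename IN ('archon_crawled_pages', 'archon_code_examples');
```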
109-122: Recommend ANALYZE after creating IVFFLAT indexes. This improves recall/performance immediately after indexing.

Add:

````diff
 2. **Test the system**:
@@
 3. **Monitor performance**:
    - If searches are slow without indexes, create them via direct connection
    - Consider using smaller embedding dimensions (384 or 768) for faster performance
+   - After creating IVFFLAT indexes, run:
+     ```sql
+     ANALYZE archon_crawled_pages;
+     ANALYZE archon_code_examples;
+     ```
````
91-99: Timeout guidance: prefer smaller batches and a direct connection; avoid the transaction pooler for long DDL. Add a note about connecting via the Session pooler or the direct port to support `SET` and long-running operations.

```diff
   - Use direct database connection
   - Increase `statement_timeout` setting
+  - Ensure you connect via the Session pooler (not the Transaction pooler) or directly to Postgres for long-running DDL and `SET` commands
```
52-58: Minor: the "db push" wording can confuse; clarify both workflows. Distinguish pushing migration files from executing ad-hoc SQL.

```diff
 # Login and link project
@@
-# Run migration
-supabase db push migration/upgrade_database_with_memory_fix.sql
+# Option A: apply local migrations in supabase/migrations
+# supabase db push
+# Option B: run a specific SQL file
+# supabase db query -f migration/upgrade_database_with_memory_fix.sql
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `migration/MIGRATION_GUIDE.md` (1 hunks)
- `migration/step1_add_columns.sql` (1 hunks)
- `migration/step2_migrate_data.sql` (1 hunks)
- `migration/step3_create_functions.sql` (1 hunks)
- `migration/step4_create_indexes_optional.sql` (1 hunks)
```shell
# Run migration
supabase db push migration/upgrade_database_with_memory_fix.sql
```
The Supabase CLI command is incorrect for running a single SQL file.
Use `db query -f` (or `psql -f`) to execute a script; `db push` applies migration files from the migrations directory.
Apply:

```diff
-# Run migration
-supabase db push migration/upgrade_database_with_memory_fix.sql
+# Run a specific SQL script
+supabase db query -f migration/upgrade_database_with_memory_fix.sql
+
+# Or with psql
+# psql "$CONNECTION_STRING" -f migration/upgrade_database_with_memory_fix.sql
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In migration/MIGRATION_GUIDE.md around lines 59 to 61, the suggested Supabase
CLI command is wrong: replace the `supabase db push
migration/upgrade_database_with_memory_fix.sql` line with the correct command to
run a single SQL file (e.g., `supabase db query -f
migration/upgrade_database_with_memory_fix.sql`) or note the alternative `psql
-f migration/upgrade_database_with_memory_fix.sql` so the guide shows how to
execute a standalone script rather than using `db push` which applies migrations
from the migrations directory.
```sql
-- Increase memory for this session
SET maintenance_work_mem = '256MB';

BEGIN;
```
Ensure the pgvector extension is installed before using VECTOR.
Without `CREATE EXTENSION vector`, the type `VECTOR(n)` and its operators won't exist on some setups (especially fresh DBs or new environments).

Apply this diff near the top (before `BEGIN;`):

```diff
 -- Increase memory for this session
 SET maintenance_work_mem = '256MB';
+
+-- Ensure pgvector is available
+CREATE EXTENSION IF NOT EXISTS vector;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
-- Increase memory for this session
SET maintenance_work_mem = '256MB';

-- Ensure pgvector is available
CREATE EXTENSION IF NOT EXISTS vector;

BEGIN;
```
🤖 Prompt for AI Agents
In migration/step1_add_columns.sql around lines 6 to 10, the migration assumes
the pgvector extension is present but does not ensure it is installed; add a
statement before BEGIN to create the extension if missing (e.g., run CREATE
EXTENSION IF NOT EXISTS vector) so the VECTOR(n) type and operators are
available on fresh DBs or new environments.
```sql
IF crawled_pages_count > 0 THEN
    -- Detect dimension
    SELECT vector_dims(embedding) INTO dimension_detected
    FROM archon_crawled_pages
    WHERE embedding IS NOT NULL
    LIMIT 1;

    IF dimension_detected = 1536 THEN
        UPDATE archon_crawled_pages
        SET embedding_1536 = embedding,
            embedding_dimension = 1536,
            embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
        WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
    END IF;

    -- Drop old column
    ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
END IF;
```
Data‑loss risk: non‑1536 embeddings are dropped.
If the old embedding column isn’t 1536D (e.g., 384/768/1024/3072), the script skips copying but still drops the column, losing data.
Replace the DO block with dimension‑aware copy and safe drop:
DO $$
DECLARE
- crawled_pages_count INTEGER;
- code_examples_count INTEGER;
- dimension_detected INTEGER;
+ crawled_pages_count INTEGER;
+ code_examples_count INTEGER;
+ dimension_detected INTEGER;
BEGIN
@@
- IF crawled_pages_count > 0 THEN
- -- Detect dimension
- SELECT vector_dims(embedding) INTO dimension_detected
- FROM archon_crawled_pages
- WHERE embedding IS NOT NULL
- LIMIT 1;
-
- IF dimension_detected = 1536 THEN
- UPDATE archon_crawled_pages
- SET embedding_1536 = embedding,
- embedding_dimension = 1536,
- embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
- WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
- END IF;
-
- -- Drop old column
- ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
- END IF;
+ IF crawled_pages_count > 0 THEN
+ SELECT vector_dims(embedding) INTO dimension_detected
+ FROM archon_crawled_pages
+ WHERE embedding IS NOT NULL
+ LIMIT 1;
+
+ IF dimension_detected IS NOT NULL THEN
+ CASE dimension_detected
+ WHEN 384 THEN
+ UPDATE archon_crawled_pages
+ SET embedding_384 = embedding,
+ embedding_dimension = 384
+ WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
+ WHEN 768 THEN
+ UPDATE archon_crawled_pages
+ SET embedding_768 = embedding,
+ embedding_dimension = 768
+ WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
+ WHEN 1024 THEN
+ UPDATE archon_crawled_pages
+ SET embedding_1024 = embedding,
+ embedding_dimension = 1024
+ WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
+ WHEN 1536 THEN
+ UPDATE archon_crawled_pages
+ SET embedding_1536 = embedding,
+ embedding_dimension = 1536
+ WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
+ WHEN 3072 THEN
+ UPDATE archon_crawled_pages
+ SET embedding_3072 = embedding,
+ embedding_dimension = 3072
+ WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
+ ELSE
+ RAISE NOTICE 'Unsupported embedding dimension % in archon_crawled_pages; keeping old column', dimension_detected;
+ END CASE;
+ END IF;
+
+ -- Only drop after successful copy of a supported dimension
+ IF dimension_detected IN (384,768,1024,1536,3072) THEN
+ ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
+ END IF;
+ END IF;
@@
- IF code_examples_count > 0 THEN
- SELECT vector_dims(embedding) INTO dimension_detected
- FROM archon_code_examples
- WHERE embedding IS NOT NULL
- LIMIT 1;
-
- IF dimension_detected = 1536 THEN
- UPDATE archon_code_examples
- SET embedding_1536 = embedding,
- embedding_dimension = 1536,
- embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
- WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
- END IF;
-
- ALTER TABLE archon_code_examples DROP COLUMN IF EXISTS embedding;
- END IF;
+ IF code_examples_count > 0 THEN
+ SELECT vector_dims(embedding) INTO dimension_detected
+ FROM archon_code_examples
+ WHERE embedding IS NOT NULL
+ LIMIT 1;
+
+ IF dimension_detected IS NOT NULL THEN
+ CASE dimension_detected
+ WHEN 384 THEN
+ UPDATE archon_code_examples
+ SET embedding_384 = embedding,
+ embedding_dimension = 384
+ WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
+ WHEN 768 THEN
+ UPDATE archon_code_examples
+ SET embedding_768 = embedding,
+ embedding_dimension = 768
+ WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
+ WHEN 1024 THEN
+ UPDATE archon_code_examples
+ SET embedding_1024 = embedding,
+ embedding_dimension = 1024
+ WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
+ WHEN 1536 THEN
+ UPDATE archon_code_examples
+ SET embedding_1536 = embedding,
+ embedding_dimension = 1536
+ WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
+ WHEN 3072 THEN
+ UPDATE archon_code_examples
+ SET embedding_3072 = embedding,
+ embedding_dimension = 3072
+ WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
+ ELSE
+ RAISE NOTICE 'Unsupported embedding dimension % in archon_code_examples; keeping old column', dimension_detected;
+ END CASE;
+ END IF;
+
+ IF dimension_detected IN (384,768,1024,1536,3072) THEN
+ ALTER TABLE archon_code_examples DROP COLUMN IF EXISTS embedding;
+ END IF;
+ END IF;
END $$;

📝 Committable suggestion
```sql
DO $$
DECLARE
    crawled_pages_count INTEGER;
    code_examples_count INTEGER;
    dimension_detected INTEGER;
BEGIN
    IF crawled_pages_count > 0 THEN
        SELECT vector_dims(embedding) INTO dimension_detected
        FROM archon_crawled_pages
        WHERE embedding IS NOT NULL
        LIMIT 1;

        IF dimension_detected IS NOT NULL THEN
            CASE dimension_detected
                WHEN 384 THEN
                    UPDATE archon_crawled_pages
                    SET embedding_384 = embedding,
                        embedding_dimension = 384
                    WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
                WHEN 768 THEN
                    UPDATE archon_crawled_pages
                    SET embedding_768 = embedding,
                        embedding_dimension = 768
                    WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
                WHEN 1024 THEN
                    UPDATE archon_crawled_pages
                    SET embedding_1024 = embedding,
                        embedding_dimension = 1024
                    WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
                WHEN 1536 THEN
                    UPDATE archon_crawled_pages
                    SET embedding_1536 = embedding,
                        embedding_dimension = 1536
                    WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
                WHEN 3072 THEN
                    UPDATE archon_crawled_pages
                    SET embedding_3072 = embedding,
                        embedding_dimension = 3072
                    WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
                ELSE
                    RAISE NOTICE 'Unsupported embedding dimension % in archon_crawled_pages; keeping old column', dimension_detected;
            END CASE;
        END IF;

        -- Only drop after successful copy of a supported dimension
        IF dimension_detected IN (384, 768, 1024, 1536, 3072) THEN
            ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
        END IF;
    END IF;

    IF code_examples_count > 0 THEN
        SELECT vector_dims(embedding) INTO dimension_detected
        FROM archon_code_examples
        WHERE embedding IS NOT NULL
        LIMIT 1;

        IF dimension_detected IS NOT NULL THEN
            CASE dimension_detected
                WHEN 384 THEN
                    UPDATE archon_code_examples
                    SET embedding_384 = embedding,
                        embedding_dimension = 384
                    WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
                WHEN 768 THEN
                    UPDATE archon_code_examples
                    SET embedding_768 = embedding,
                        embedding_dimension = 768
                    WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
                WHEN 1024 THEN
                    UPDATE archon_code_examples
                    SET embedding_1024 = embedding,
                        embedding_dimension = 1024
                    WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
                WHEN 1536 THEN
                    UPDATE archon_code_examples
                    SET embedding_1536 = embedding,
                        embedding_dimension = 1536
                    WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
                WHEN 3072 THEN
                    UPDATE archon_code_examples
                    SET embedding_3072 = embedding,
                        embedding_dimension = 3072
                    WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
                ELSE
                    RAISE NOTICE 'Unsupported embedding dimension % in archon_code_examples; keeping old column', dimension_detected;
            END CASE;
        END IF;

        IF dimension_detected IN (384, 768, 1024, 1536, 3072) THEN
            ALTER TABLE archon_code_examples DROP COLUMN IF EXISTS embedding;
        END IF;
    END IF;
END $$;
```
Thanks for this Rasmus! I actually took this split and incorporated it into my latest PR that redoes a lot of our migrations/ folder setup - #718. I'll be closing this PR because of that but I did use this work directly. |
Pull Request
Summary
Adds an alternative migration strategy for users on the Supabase free tier who encounter timeouts when running the full `migration/upgrade_database.sql` script. This PR provides a step-by-step approach that breaks the migration into smaller, manageable chunks.

Changes Made

- `MIGRATION_GUIDE.md` with detailed instructions for handling Supabase SQL editor timeouts
- `step1_add_columns.sql` - Adds new columns for multi-dimensional embeddings (~5 seconds)
- `step2_migrate_data.sql` - Migrates existing data to new columns (~10 seconds)
- `step3_create_functions.sql` - Creates search functions (~5 seconds)
- `step4_create_indexes_optional.sql` - Creates vector indexes (may timeout - optional)

Type of Change
Affected Services
Testing
Test Evidence
Migration scripts have been tested on Supabase free tier instances with large datasets (>10k documents) where the original `upgrade_database.sql` would time out.

Checklist
Breaking Changes
None - this is an alternative migration path that doesn't change the existing `upgrade_database.sql` approach.

Additional Notes
This migration strategy is specifically designed for:
The migration can be run via:
If Step 4 (index creation) times out, the system will still work using brute-force search, which is acceptable for smaller datasets (<10k documents).