feat: add step-by-step migration strategy for multi-dimensional embeddings#683

Closed
Wirasm wants to merge 1 commit into main from feat/multi-dimension-embedding-migration-workaround

Conversation

Collaborator

@Wirasm Wirasm commented Sep 17, 2025

Pull Request

Summary

Adds an alternative migration strategy for users on Supabase free tier who encounter timeouts when running the full migration/upgrade_database.sql script. This PR provides a step-by-step approach that breaks the migration into smaller, manageable chunks.

Changes Made

  • Added MIGRATION_GUIDE.md with detailed instructions for handling Supabase SQL editor timeouts
  • Created 4 step-by-step SQL scripts that can be run individually:
    • step1_add_columns.sql - Adds new columns for multi-dimensional embeddings (~5 seconds)
    • step2_migrate_data.sql - Migrates existing data to new columns (~10 seconds)
    • step3_create_functions.sql - Creates search functions (~5 seconds)
    • step4_create_indexes_optional.sql - Creates vector indexes (may timeout - optional)
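For orientation, the Step 1 pattern can be sketched as below. This is a hedged illustration rather than the script's exact contents; the table and column names (archon_crawled_pages, embedding_1536, embedding_model, embedding_dimension) follow the file summaries elsewhere in this PR.

```sql
-- Sketch of the Step 1 approach: idempotent, transactional column additions.
-- Adding nullable columns rewrites no data, so this completes in seconds.
BEGIN;

ALTER TABLE archon_crawled_pages
  ADD COLUMN IF NOT EXISTS embedding_1536 VECTOR(1536),
  ADD COLUMN IF NOT EXISTS embedding_model TEXT,
  ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER;

COMMIT;
```

Because every statement uses IF NOT EXISTS, the step stays safe to re-run if the SQL editor disconnects partway through.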

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring

Affected Services

  • Frontend (React UI)
  • Server (FastAPI backend)
  • MCP Server (Model Context Protocol)
  • Agents (PydanticAI service)
  • Database (migrations/schema)
  • Docker/Infrastructure
  • Documentation site

Testing

  • Manually tested affected user flows
  • Docker builds succeed for all services

Test Evidence

Migration scripts have been tested on Supabase free tier instances with large datasets (>10k documents) where the original upgrade_database.sql would timeout.

Checklist

  • My code follows the service architecture patterns
  • I have verified no regressions in existing features
  • I have updated relevant documentation

Breaking Changes

None - this is an alternative migration path that doesn't change the existing upgrade_database.sql approach.

Additional Notes

This migration strategy is specifically designed for:

  • Users on Supabase free tier with timeout limitations
  • Large datasets where vector index creation exceeds memory limits
  • Users who prefer a step-by-step approach to monitor migration progress

The migration can be run via:

  1. Supabase SQL editor (step by step)
  2. Direct database connection (psql, TablePlus, etc.)
  3. Supabase CLI

If Step 4 (index creation) times out, the system will still work using brute-force search, which is acceptable for smaller datasets (<10k documents).
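When the indexes are absent, pgvector falls back to a sequential scan. As a hedged sketch (assuming the embedding_1536 column created in the earlier steps), a brute-force cosine search is just an ORDER BY over the distance operator:

```sql
-- Without an IVFFLAT index, this ORDER BY triggers a full-table scan,
-- which pgvector handles correctly (just more slowly on large tables).
-- $1 stands for the query embedding parameter.
SELECT id, url, 1 - (embedding_1536 <=> $1) AS similarity
FROM archon_crawled_pages
WHERE embedding_1536 IS NOT NULL
ORDER BY embedding_1536 <=> $1
LIMIT 10;
```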

Summary by CodeRabbit

  • New Features

    • Added support for multiple embedding dimensions (including 1536 and others) with backward-compatible vector search.
    • Enhanced similarity search with flexible filtering and improved performance via optional indexes.
    • Safer, staged database migration with options for direct connection, CLI, or GUI tools.
    • Option to skip vector indexes for small datasets.
  • Documentation

    • Introduced a comprehensive migration guide with step-by-step instructions, verification steps, troubleshooting for timeouts/memory/permissions, and post-migration checks.

…dings

Adds alternative migration approach for users on Supabase free tier who encounter timeouts when running the full upgrade_database.sql script. Breaks migration into 4 manageable steps to avoid memory/timeout issues when creating vector indexes on large datasets.

coderabbitai Bot commented Sep 17, 2025

Walkthrough

Introduces a new migration guide and four SQL scripts enabling a staged database migration for multi-dimensional embeddings. Adds columns, migrates existing data, creates search functions with legacy wrappers, and optionally builds vector and metadata indexes. Includes guidance for execution paths and verification.

Changes

Cohort / File(s) — Summary

  • Documentation: Migration Guide (migration/MIGRATION_GUIDE.md)
    Adds a migration guide detailing four-step execution, alternative execution methods (psql, GUI tools, Supabase CLI), verification SQL, troubleshooting, and post-migration checks.
  • Schema Expansion, Step 1 (migration/step1_add_columns.sql)
    Adds embedding columns for 384/768/1024/1536/3072-dimension vectors and metadata fields (llm_chat_model, embedding_model, embedding_dimension) to archon_crawled_pages and archon_code_examples. Transactional and idempotent.
  • Data Migration & Cleanup, Step 2 (migration/step2_migrate_data.sql)
    Copies 1536-dimension embeddings from the legacy embedding column to embedding_1536, sets embedding_dimension and a default embedding_model, then drops the legacy column and related indexes if present. Transactional with safeguards.
  • Search Functions, Step 3 (migration/step3_create_functions.sql)
    Adds helper functions to detect an embedding's dimension and map it to a column, plus dimension-parameterized search functions and 1536-dimension compatibility wrappers for crawled pages and code examples. Transactional.
  • Optional Indexes, Step 4 (migration/step4_create_indexes_optional.sql)
    Creates IVFFLAT vector indexes for multiple dimensions on both tables and B-tree indexes on embedding metadata; adjusts session memory/timeout. Idempotent, sequential index creation.
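The optional Step 4 indexes follow the standard pgvector IVFFLAT form. A hedged sketch of one such index (the name and options mirror the suggestions quoted in the review comments):

```sql
-- One of the optional Step 4 indexes; IF NOT EXISTS keeps it idempotent.
CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_1536
ON archon_crawled_pages USING ivfflat (embedding_1536 vector_cosine_ops)
WITH (lists = 100);

-- Fresh statistics help the planner use the new index effectively.
ANALYZE archon_crawled_pages;
```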

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Admin as Admin/Operator
  participant DB as PostgreSQL
  participant T1 as archon_crawled_pages
  participant T2 as archon_code_examples

  rect rgb(232,240,254)
  note over Admin,DB: Migration Flow (multi-path)
  Admin->>DB: Step 1: add embedding_* and metadata columns
  Admin->>DB: Step 2: migrate legacy embedding -> embedding_1536<br/>drop old column and indexes
  Admin->>DB: Step 3: create helper + multi search + legacy wrappers
  alt Optional
    Admin->>DB: Step 4: create IVFFLAT and B-tree indexes
  else Skipped or partial
    note over DB: Falls back to brute-force scan when IVFFLAT absent
  end
  end

  rect rgb(240,255,240)
  participant Client as App/Service
  note over Client,DB: Query Flow (search)
  Client->>DB: SELECT FROM match_archon_*_multi(query_embedding, dim, ...)
  DB->>DB: get_embedding_column_name(dim)
  DB->>T1: Scan/vector search on chosen embedding column (or T2)
  T1-->>DB: Top-k rows with distance
  DB-->>Client: Rows with similarity and metadata
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibbled through columns, five vectors wide,
Hopped past old indexes set aside.
New functions burrow, seeking near,
Optional IVFFLAT—no fear!
If timeouts loom, I pause my run—
Still find the carrots, one by one. 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check — ✅ Passed. The title "feat: add step-by-step migration strategy for multi-dimensional embeddings" is concise, accurately describes the primary change (a stepwise migration strategy for multi-dimensional embeddings), and maps directly to the added MIGRATION_GUIDE.md and four migration scripts, so a reviewer scanning history will understand the main intent.
  • Description Check — ✅ Passed. The PR description follows the repository template and includes a Summary, explicit Changes Made, Type of Change with checked boxes, Affected Services, Testing notes, Checklist, Breaking Changes, and Additional Notes describing workflows and when Step 4 may be skipped; it sufficiently documents the four-step migration and target users. Concrete Test Evidence (specific commands and sample outputs or logs) would make verification easier, but the description is otherwise complete and informative for reviewers.
  • Docstring Coverage — ✅ Passed. No functions found in the changes; docstring coverage check skipped.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (17)
migration/step1_add_columns.sql (2)

11-21: Constrain allowed embedding_dimension values.

A CHECK guards accidental writes (e.g., 512) and helps query planners.

Append these constraints after the ALTER (either table-by-table or once per table):

 ALTER TABLE archon_crawled_pages
@@
   ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER;
+
+-- Constrain dimension to known values
+ALTER TABLE archon_crawled_pages
+  ADD CONSTRAINT archon_crawled_pages_embedding_dimension_chk
+  CHECK (embedding_dimension IN (384, 768, 1024, 1536, 3072));

Repeat for archon_code_examples:

 ALTER TABLE archon_code_examples
@@
   ADD COLUMN IF NOT EXISTS embedding_dimension INTEGER;
+
+ALTER TABLE archon_code_examples
+  ADD CONSTRAINT archon_code_examples_embedding_dimension_chk
+  CHECK (embedding_dimension IN (384, 768, 1024, 1536, 3072));

6-8: maintenance_work_mem isn’t needed for ADD COLUMN.

Harmless but not used here; consider moving this to the indexing step only to avoid confusion.

migration/step2_migrate_data.sql (2)

16-20: Filter by schema in information_schema lookups.

Avoid false positives if similarly named tables exist in other schemas.

Apply:

-    WHERE table_name = 'archon_crawled_pages'
+    WHERE table_schema = 'public' AND table_name = 'archon_crawled_pages'
@@
-    WHERE table_name = 'archon_code_examples'
+    WHERE table_schema = 'public' AND table_name = 'archon_code_examples'

Also applies to: 41-45


29-34: Don’t guess the embedding_model.

Setting 'text-embedding-3-small' may be incorrect for legacy data (e.g., ada-002). Prefer leaving as-is or marking as 'legacy-1536' for later curation.

Apply:

-                embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
+                embedding_model = COALESCE(embedding_model, 'legacy-1536')

(Repeat for both tables.)

Also applies to: 53-58

migration/step3_create_functions.sql (4)

16-29: Use the helper to avoid duplicate CASE logic.

get_embedding_column_name exists; leverage it as single source of truth.

Apply:

 CREATE OR REPLACE FUNCTION get_embedding_column_name(dimension INTEGER)
 RETURNS TEXT AS $$
 BEGIN
-    CASE dimension
-        WHEN 384 THEN RETURN 'embedding_384';
-        WHEN 768 THEN RETURN 'embedding_768';
-        WHEN 1024 THEN RETURN 'embedding_1024';
-        WHEN 1536 THEN RETURN 'embedding_1536';
-        WHEN 3072 THEN RETURN 'embedding_3072';
-        ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', dimension;
-    END CASE;
+    CASE dimension
+        WHEN 384 THEN RETURN 'embedding_384';
+        WHEN 768 THEN RETURN 'embedding_768';
+        WHEN 1024 THEN RETURN 'embedding_1024';
+        WHEN 1536 THEN RETURN 'embedding_1536';
+        WHEN 3072 THEN RETURN 'embedding_3072';
+        ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', dimension;
+    END CASE;
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;

(Keep function, then use it below.)


53-61: De-duplicate CASE blocks inside search functions.

Resolve column with the helper to reduce divergence risk.

Apply in both functions:

-  CASE embedding_dimension
-    WHEN 384 THEN embedding_column := 'embedding_384';
-    WHEN 768 THEN embedding_column := 'embedding_768';
-    WHEN 1024 THEN embedding_column := 'embedding_1024';
-    WHEN 1536 THEN embedding_column := 'embedding_1536';
-    WHEN 3072 THEN embedding_column := 'embedding_3072';
-    ELSE RAISE EXCEPTION 'Unsupported embedding dimension: %', embedding_dimension;
-  END CASE;
+  embedding_column := get_embedding_column_name(embedding_dimension);

Also applies to: 101-109


63-75: Compute distance once; order by the alias.

Saves recomputation and can help planners. Also guard against dimension mismatch.

Apply to both functions:

-  sql_query := format('
-    SELECT id, url, chunk_number, content, metadata, source_id,
-           1 - (%I <=> $1) AS similarity
+  -- Optional: verify query embedding dimension matches requested dimension
+  IF detect_embedding_dimension(query_embedding) <> embedding_dimension THEN
+    RAISE EXCEPTION 'Query embedding dimension (%) does not match requested dimension (%)',
+      detect_embedding_dimension(query_embedding), embedding_dimension;
+  END IF;
+
+  sql_query := format('
+    SELECT id, url, chunk_number, content, metadata, source_id,
+           (%I <=> $1) AS distance,
+           1 - (%I <=> $1) AS similarity
     FROM archon_crawled_pages
     WHERE (%I IS NOT NULL)
       AND metadata @> $3
       AND ($4 IS NULL OR source_id = $4)
-    ORDER BY %I <=> $1
+    ORDER BY distance
     LIMIT $2',
-    embedding_column, embedding_column, embedding_column);
+    embedding_column, embedding_column, embedding_column);

And for code examples:

-    SELECT id, url, chunk_number, content, summary, metadata, source_id,
-           1 - (%I <=> $1) AS similarity
+    SELECT id, url, chunk_number, content, summary, metadata, source_id,
+           (%I <=> $1) AS distance,
+           1 - (%I <=> $1) AS similarity
@@
-    ORDER BY %I <=> $1
+    ORDER BY distance
@@
-    embedding_column, embedding_column, embedding_column);
+    embedding_column, embedding_column, embedding_column);

Also applies to: 111-123


39-46: Prefer DOUBLE PRECISION over FLOAT for clarity.

Postgres maps float ambiguously; be explicit.

-  similarity FLOAT
+  similarity DOUBLE PRECISION

(Apply in all RETURNS TABLE signatures.)

Also applies to: 86-94

migration/step4_create_indexes_optional.sql (3)

16-55: Consider adding optional 3072D indexes (commented) for users of text-embedding-3-large.

Keeps parity with available columns; users can uncomment when needed.

Append after Index 8:

+-- (Optional) Index 9 of 10
+-- CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_3072
+-- ON archon_crawled_pages USING ivfflat (embedding_3072 vector_cosine_ops)
+-- WITH (lists = 100);
+
+-- (Optional) Index 10 of 10
+-- CREATE INDEX IF NOT EXISTS idx_archon_code_examples_embedding_3072
+-- ON archon_code_examples USING ivfflat (embedding_3072 vector_cosine_ops)
+-- WITH (lists = 100);

56-63: Add GIN indexes on metadata to accelerate JSONB filter.

metadata @> filter benefits heavily from GIN.

Add:

 CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_embedding_model ON archon_crawled_pages (embedding_model);
@@
 CREATE INDEX IF NOT EXISTS idx_archon_code_examples_llm_chat_model ON archon_code_examples (llm_chat_model);
+
+-- JSONB metadata indexes
+CREATE INDEX IF NOT EXISTS idx_archon_crawled_pages_metadata_gin ON archon_crawled_pages USING GIN (metadata);
+CREATE INDEX IF NOT EXISTS idx_archon_code_examples_metadata_gin ON archon_code_examples USING GIN (metadata);

10-12: Analyze after building IVFFLAT indexes.

IVFFLAT needs ANALYZE for good recall/perf; also consider CONCURRENTLY in production.

Apply:

 SET statement_timeout = '10min';
@@
 RESET maintenance_work_mem;
 RESET statement_timeout;
+
+-- Gather stats for planner / IVFFLAT
+ANALYZE archon_crawled_pages;
+ANALYZE archon_code_examples;

Note: If minimizing write locks is critical, consider CREATE INDEX CONCURRENTLY outside transactions (may run longer on free tier).

Also applies to: 64-66

migration/MIGRATION_GUIDE.md (6)

20-28: Explicitly require pgvector extension.

Avoids confusion when VECTOR type/functions are missing.

Add after “Direct Database Connection” section intro:

+> Prerequisite: Ensure pgvector is installed
+>
+> ```sql
+> CREATE EXTENSION IF NOT EXISTS vector;
+> ```

63-66: Clarify skipping indexes and performance caveat.

Mention 3072D specifically won’t be indexed by default; add ANALYZE note.

Add:

 If you have a small dataset (<10,000 documents), you can skip Step 4 entirely. The system will use brute-force search which is fast enough for small datasets.
+
+Note: 3072‑dimension embeddings (e.g., text‑embedding‑3‑large) are not indexed by default in Step 4. Expect brute‑force for 3072D unless you add those indexes later.

69-82: Verification query may count metadata indexes too; that’s OK—clarify expected ranges.

Minor doc tweak so users aren’t surprised by counts > 8.

- - `index_count`: 8+ (or 0 if you skipped Step 4)
+ - `index_count`: 8–14 (0 if you skipped Step 4), depending on which vector/metadata indexes you created

109-122: Recommend ANALYZE after creating IVFFLAT indexes.

Improves recall/performance immediately after indexing.

Add:

 2. **Test the system**:
@@
 3. **Monitor performance**:
    - If searches are slow without indexes, create them via direct connection
    - Consider using smaller embedding dimensions (384 or 768) for faster performance
+   - After creating IVFFLAT indexes, run:
+     ```sql
+     ANALYZE archon_crawled_pages;
+     ANALYZE archon_code_examples;
+     ```

91-99: Timeout guidance: prefer smaller batches and direct connection; avoid transaction pooler for long DDL.

Add a note about connecting via the Session pooler or direct port to support SET and long‑running ops.

 - Use direct database connection
 - Increase `statement_timeout` setting
+ - Ensure you connect via the Session pooler (not Transaction pooler) or directly to Postgres for long‑running DDL and `SET` commands

52-58: Minor: “db push” wording can confuse; clarify both workflows.

Clarify difference between pushing migration files vs. executing ad‑hoc SQL.

 # Login and link project
@@
-# Run migration
-supabase db push migration/upgrade_database_with_memory_fix.sql
+# Option A: apply local migrations in supabase/migrations
+# supabase db push
+# Option B: run a specific SQL file
+# supabase db query -f migration/upgrade_database_with_memory_fix.sql
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2ec7df and 1a0a850.

📒 Files selected for processing (5)
  • migration/MIGRATION_GUIDE.md (1 hunks)
  • migration/step1_add_columns.sql (1 hunks)
  • migration/step2_migrate_data.sql (1 hunks)
  • migration/step3_create_functions.sql (1 hunks)
  • migration/step4_create_indexes_optional.sql (1 hunks)

Comment on lines +59 to +61
# Run migration
supabase db push migration/upgrade_database_with_memory_fix.sql

⚠️ Potential issue

Supabase CLI command is incorrect for running a single SQL file.

Use db query -f (or psql -f) to execute a script; db push applies migration files from the migrations dir.

Apply:

-# Run migration
-supabase db push migration/upgrade_database_with_memory_fix.sql
+# Run a specific SQL script
+supabase db query -f migration/upgrade_database_with_memory_fix.sql
+
+# Or with psql
+# psql "$CONNECTION_STRING" -f migration/upgrade_database_with_memory_fix.sql

Committable suggestion skipped: line range outside the PR's diff.


Comment on lines +6 to +10
-- Increase memory for this session
SET maintenance_work_mem = '256MB';

BEGIN;


⚠️ Potential issue

Ensure pgvector extension is installed before using VECTOR.

Without CREATE EXTENSION vector, the type VECTOR(n) and ops won’t exist on some setups (esp. fresh DBs or new environments).

Apply this diff near the top (before BEGIN;):

 -- Increase memory for this session
-SET maintenance_work_mem = '256MB';
+SET maintenance_work_mem = '256MB';
+
+-- Ensure pgvector is available
+CREATE EXTENSION IF NOT EXISTS vector;

Comment on lines +21 to +38
IF crawled_pages_count > 0 THEN
    -- Detect dimension
    SELECT vector_dims(embedding) INTO dimension_detected
    FROM archon_crawled_pages
    WHERE embedding IS NOT NULL
    LIMIT 1;

    IF dimension_detected = 1536 THEN
        UPDATE archon_crawled_pages
        SET embedding_1536 = embedding,
            embedding_dimension = 1536,
            embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
        WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
    END IF;

    -- Drop old column
    ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
END IF;

⚠️ Potential issue

Data‑loss risk: non‑1536 embeddings are dropped.

If the old embedding column isn’t 1536D (e.g., 384/768/1024/3072), the script skips copying but still drops the column, losing data.

Replace the DO block with dimension‑aware copy and safe drop:

 DO $$
 DECLARE
-    crawled_pages_count INTEGER;
-    code_examples_count INTEGER;
-    dimension_detected INTEGER;
+    crawled_pages_count INTEGER;
+    code_examples_count INTEGER;
+    dimension_detected INTEGER;
 BEGIN
@@
-    IF crawled_pages_count > 0 THEN
-        -- Detect dimension
-        SELECT vector_dims(embedding) INTO dimension_detected
-        FROM archon_crawled_pages
-        WHERE embedding IS NOT NULL
-        LIMIT 1;
-
-        IF dimension_detected = 1536 THEN
-            UPDATE archon_crawled_pages
-            SET embedding_1536 = embedding,
-                embedding_dimension = 1536,
-                embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
-            WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
-        END IF;
-
-        -- Drop old column
-        ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
-    END IF;
+    IF crawled_pages_count > 0 THEN
+        SELECT vector_dims(embedding) INTO dimension_detected
+        FROM archon_crawled_pages
+        WHERE embedding IS NOT NULL
+        LIMIT 1;
+
+        IF dimension_detected IS NOT NULL THEN
+            CASE dimension_detected
+                WHEN 384 THEN
+                    UPDATE archon_crawled_pages
+                    SET embedding_384 = embedding,
+                        embedding_dimension = 384
+                    WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
+                WHEN 768 THEN
+                    UPDATE archon_crawled_pages
+                    SET embedding_768 = embedding,
+                        embedding_dimension = 768
+                    WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
+                WHEN 1024 THEN
+                    UPDATE archon_crawled_pages
+                    SET embedding_1024 = embedding,
+                        embedding_dimension = 1024
+                    WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
+                WHEN 1536 THEN
+                    UPDATE archon_crawled_pages
+                    SET embedding_1536 = embedding,
+                        embedding_dimension = 1536
+                    WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
+                WHEN 3072 THEN
+                    UPDATE archon_crawled_pages
+                    SET embedding_3072 = embedding,
+                        embedding_dimension = 3072
+                    WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
+                ELSE
+                    RAISE NOTICE 'Unsupported embedding dimension % in archon_crawled_pages; keeping old column', dimension_detected;
+            END CASE;
+        END IF;
+
+        -- Only drop after successful copy of a supported dimension
+        IF dimension_detected IN (384,768,1024,1536,3072) THEN
+            ALTER TABLE archon_crawled_pages DROP COLUMN IF EXISTS embedding;
+        END IF;
+    END IF;
@@
-    IF code_examples_count > 0 THEN
-        SELECT vector_dims(embedding) INTO dimension_detected
-        FROM archon_code_examples
-        WHERE embedding IS NOT NULL
-        LIMIT 1;
-
-        IF dimension_detected = 1536 THEN
-            UPDATE archon_code_examples
-            SET embedding_1536 = embedding,
-                embedding_dimension = 1536,
-                embedding_model = COALESCE(embedding_model, 'text-embedding-3-small')
-            WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
-        END IF;
-
-        ALTER TABLE archon_code_examples DROP COLUMN IF EXISTS embedding;
-    END IF;
+    IF code_examples_count > 0 THEN
+        SELECT vector_dims(embedding) INTO dimension_detected
+        FROM archon_code_examples
+        WHERE embedding IS NOT NULL
+        LIMIT 1;
+
+        IF dimension_detected IS NOT NULL THEN
+            CASE dimension_detected
+                WHEN 384 THEN
+                    UPDATE archon_code_examples
+                    SET embedding_384 = embedding,
+                        embedding_dimension = 384
+                    WHERE embedding IS NOT NULL AND embedding_384 IS NULL;
+                WHEN 768 THEN
+                    UPDATE archon_code_examples
+                    SET embedding_768 = embedding,
+                        embedding_dimension = 768
+                    WHERE embedding IS NOT NULL AND embedding_768 IS NULL;
+                WHEN 1024 THEN
+                    UPDATE archon_code_examples
+                    SET embedding_1024 = embedding,
+                        embedding_dimension = 1024
+                    WHERE embedding IS NOT NULL AND embedding_1024 IS NULL;
+                WHEN 1536 THEN
+                    UPDATE archon_code_examples
+                    SET embedding_1536 = embedding,
+                        embedding_dimension = 1536
+                    WHERE embedding IS NOT NULL AND embedding_1536 IS NULL;
+                WHEN 3072 THEN
+                    UPDATE archon_code_examples
+                    SET embedding_3072 = embedding,
+                        embedding_dimension = 3072
+                    WHERE embedding IS NOT NULL AND embedding_3072 IS NULL;
+                ELSE
+                    RAISE NOTICE 'Unsupported embedding dimension % in archon_code_examples; keeping old column', dimension_detected;
+            END CASE;
+        END IF;
+
+        IF dimension_detected IN (384,768,1024,1536,3072) THEN
+            ALTER TABLE archon_code_examples DROP COLUMN IF EXISTS embedding;
+        END IF;
+    END IF;
 END $$;

@coleam00 coleam00 mentioned this pull request Sep 20, 2025
24 tasks
@coleam00
Owner

Thanks for this Rasmus! I actually took this split and incorporated it into my latest PR that redoes a lot of our migrations/ folder setup - #718. I'll be closing this PR because of that but I did use this work directly.

@coleam00 coleam00 closed this Sep 20, 2025
@Wirasm Wirasm deleted the feat/multi-dimension-embedding-migration-workaround branch April 6, 2026 07:37
coleam00 added a commit that referenced this pull request Apr 7, 2026
…at-state

fix: welcoming empty chat state and suppress Disconnected/No project
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
…empty-chat-state

fix: welcoming empty chat state and suppress Disconnected/No project
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
…empty-chat-state

fix: welcoming empty chat state and suppress Disconnected/No project