Feat/data ingestion system #2
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| node.exe : Initialising login role... | ||
| At C:\Program Files\nodejs\npx.ps1:28 char: | ||
| 12 | ||
| + $input | & $NODE_EXE $NPX_CLI_JS $arg | ||
| s | ||
| + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| ~ | ||
| + CategoryInfo : NotSpecifi | ||
| ed: (Initialising login role...:Stri | ||
| ng) [], RemoteException | ||
| + FullyQualifiedErrorId : NativeComm | ||
| andError | ||
| | ||
| Connecting to remote database... | ||
| Do you want to push these migrations to t | ||
| he remote database? | ||
| ??20260225141500_add_hnsw_index.sql | ||
| | ||
| [Y/n] y | ||
| Applying migration 20260225141500_add_hns | ||
| w_index.sql... | ||
| Finished supabase db push. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -8,3 +8,4 @@ sse-starlette | ||
| pydantic>=2.7.0 | ||
| pydantic-settings | ||
| python-dotenv | ||
| fastembed | ||
43 changes: 43 additions & 0 deletions
supabase/migrations/20260226140500_update_vector_to_mini_lm.sql
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| -- This migration changes the `embedding` column dimension from 3072 to 384 | ||
| -- to support the local `all-MiniLM-L6-v2` model. | ||
| | ||
| -- 1. Drop the existing HNSW index and match_documents function | ||
| DROP INDEX IF EXISTS documents_embedding_idx; | ||
| DROP FUNCTION IF EXISTS match_documents; | ||
| | ||
| -- 2. Clear existing incompatible 3072-dimension vectors to avoid casting errors | ||
| TRUNCATE TABLE documents; | ||
| | ||
| -- 3. Alter the column type now that the table is empty | ||
| ALTER TABLE documents | ||
| ALTER COLUMN embedding TYPE vector(384); | ||
| | ||
| -- 4. Recreate the match_documents function with the new dimension | ||
| create or replace function match_documents ( | ||
| query_embedding vector(384), | ||
| match_count int DEFAULT null, | ||
| filter jsonb DEFAULT '{}' | ||
| ) returns table ( | ||
| id uuid, | ||
| content text, | ||
| metadata jsonb, | ||
| similarity float | ||
| ) | ||
| language plpgsql | ||
| as $$ | ||
| begin | ||
| return query | ||
| select | ||
| documents.id, | ||
| documents.content, | ||
| documents.metadata, | ||
| 1 - (documents.embedding <=> query_embedding) as similarity | ||
| from documents | ||
| where documents.metadata @> filter | ||
| order by documents.embedding <=> query_embedding | ||
| limit match_count; | ||
| end; | ||
| $$; | ||
| | ||
| -- 5. Recreate the HNSW index for the 384 dimension | ||
| CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops); | ||
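For readers less familiar with pgvector, the ranking logic of `match_documents` can be sketched in plain Python. `<=>` is pgvector's cosine-distance operator, so the `similarity` column is one minus that distance. This is only an illustration under stated assumptions: the helper names and the in-memory `documents` list are hypothetical, the metadata filter approximates `@>` with flat top-level key equality, and the toy vectors are 2-dimensional rather than 384.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, the quantity pgvector's <=> operator computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(query_embedding, documents, match_count=None, metadata_filter=None):
    """Hypothetical in-memory analogue of the SQL function in the migration."""
    metadata_filter = metadata_filter or {}
    scored = []
    for doc in documents:
        meta = doc["metadata"]
        # Simplified stand-in for `metadata @> filter`: flat key equality only.
        if not all(k in meta and meta[k] == v for k, v in metadata_filter.items()):
            continue
        scored.append((cosine_distance(doc["embedding"], query_embedding), doc))
    # ORDER BY embedding <=> query_embedding (closest first)
    scored.sort(key=lambda pair: pair[0])
    # LIMIT match_count; a null limit means "no limit"
    if match_count is not None:
        scored = scored[:match_count]
    return [
        {
            "id": doc["id"],
            "content": doc["content"],
            "metadata": doc["metadata"],
            "similarity": 1.0 - distance,
        }
        for distance, doc in scored
    ]
```

Note that a `match_count` of `null` in the SQL version leaves the result set unlimited, which this sketch mirrors with `None`.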
🧩 Analysis chain
🏁 Script executed:
Repository: SanghunYun95/philo-rag
Length of output: 1600
🏁 Script executed:
Repository: SanghunYun95/philo-rag
Length of output: 88
🏁 Script executed:
Repository: SanghunYun95/philo-rag
Length of output: 533
🏁 Script executed:
# Check supabase config
cat supabase/config.toml
Repository: SanghunYun95/philo-rag
Length of output: 14313
🏁 Script executed:
Repository: SanghunYun95/philo-rag
Length of output: 48
🏁 Script executed:
Repository: SanghunYun95/philo-rag
Length of output: 131
Add an environment guard or an explicit production migration plan for `TRUNCATE TABLE documents`.

This migration irreversibly removes all documents during deployment. Clearing the table is what lets the `embedding` column change from `vector(3072)` to `vector(384)`, since the existing 3072-dimension vectors are incompatible with the new type, but the migration file contains no environment checks. If it is deployed to production without explicit intent, all indexed documents will be deleted.

Either add a conditional guard (e.g., check an environment variable) or document this as a breaking change that requires explicit approval for production deployments.
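One possible shape for such a guard is a small pre-deployment check that runs before migrations are pushed, sketched below in Python. Everything named here is an assumption for illustration: the environment variables `APP_ENV` and `ALLOW_DESTRUCTIVE_MIGRATIONS`, the helper functions, and the list of statements treated as destructive are hypothetical and are not part of this repository or the Supabase CLI.

```python
import re

# Statements this sketch treats as destructive (an assumed, incomplete list).
DESTRUCTIVE_PATTERN = re.compile(
    r"\b(TRUNCATE(?:\s+TABLE)?|DROP\s+TABLE|DELETE\s+FROM)\b",
    re.IGNORECASE,
)


def strip_sql_line_comments(sql):
    """Remove `-- ...` comments so commented-out statements are ignored.

    Limitation: a `--` inside a string literal is also stripped.
    """
    return re.sub(r"--[^\n]*", "", sql)


def find_destructive_statements(sql):
    """Return the destructive keywords found, normalised to upper case."""
    cleaned = strip_sql_line_comments(sql)
    return [
        " ".join(match.group(1).upper().split())
        for match in DESTRUCTIVE_PATTERN.finditer(cleaned)
    ]


def migration_allowed(sql, env):
    """Allow destructive migrations in production only with explicit opt-in.

    `env` is a mapping such as os.environ; both keys are hypothetical names.
    """
    if not find_destructive_statements(sql):
        return True
    if env.get("APP_ENV") != "production":
        return True
    return env.get("ALLOW_DESTRUCTIVE_MIGRATIONS") == "1"
```

A deployment script could call `migration_allowed(path.read_text(), os.environ)` for each pending migration file and abort before pushing when it returns `False`, which makes the approval for a production data wipe an explicit, reviewable step.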