Migration files delete vector tables with pgvector in PostgreSQL #11835

Open
5 tasks done
faye1225 opened this issue Dec 19, 2024 · 1 comment
Labels
🐞 bug Something isn't working

Comments

@faye1225
Contributor

faye1225 commented Dec 19, 2024

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

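When Dify is deployed with pgvector as the vector store, the vector tables are stored in PostgreSQL with names starting with embedding_vector_index_. However, when a new migration file is generated, it contains operations that drop all of these vector tables, presumably because they exist in the database but are not part of the model metadata the migration is compared against.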

There are two potential solutions to address this issue:

1. Store the base data in one database and store the vectors in another database.
2. Add an include_name filter to the migration environment file so that these vector tables are excluded:

/api/migrations/env.py

.....
def include_name(name, type_, parent_names):
    if type_ == "table":
        # takin command: ignore all tables whose names start with embedding_vector_index
        return not name.startswith('embedding_vector_index_')
    return True

...


def run_migrations_online():
...
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=get_metadata(),
            process_revision_directives=process_revision_directives,
            include_object=include_object,
            include_name=include_name,  # add this
            **current_app.extensions['migrate'].configure_args
        )


....
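
To make the effect of the filter concrete, here is a minimal, standalone sketch that applies the same predicate to a handful of table names. The table names and the VECTOR_TABLE_PREFIX constant are made up for illustration; only the include_name signature (name, type_, parent_names) follows the snippet above.

```python
# Prefix taken from the issue text; the constant name itself is an assumption.
VECTOR_TABLE_PREFIX = "embedding_vector_index_"


def include_name(name, type_, parent_names):
    """Return False for reflected names that autogenerate should ignore."""
    if type_ == "table":
        # Skip every table whose name starts with the vector-table prefix.
        return not name.startswith(VECTOR_TABLE_PREFIX)
    # Schemas, columns, indexes, and constraints are left untouched.
    return True


# Hypothetical table names, standing in for what reflection might return.
reflected_tables = [
    "accounts",
    "datasets",
    "embedding_vector_index_3f2a_node",
]

# Tables that would still be compared against the model metadata.
kept = [t for t in reflected_tables if include_name(t, "table", {})]
print(kept)
```

Only the names that pass the filter are compared against the models, so the vector tables should no longer appear as drop operations in a newly generated migration.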

✔️ Expected Behavior

Migration files should not delete vector tables, or they should provide a way to manage these tables properly in the migration process.

❌ Actual Behavior

No response


dosubot bot commented Dec 19, 2024

To prevent migration files from deleting vector tables with names starting with embedding_vector_index_ in PostgreSQL, you can implement a custom function in the env.py file to exclude these tables from migrations. The include_name function you've mentioned is a suitable approach. Here's how it works:

  1. Custom Function: The include_name function is consulted for each name reflected from the database when a migration is autogenerated. If a table name starts with embedding_vector_index_, it returns False, so that table is skipped and no drop operation is generated for it.

  2. Integration in env.py: You need to ensure that this function is integrated into the migration context configuration. This is done by adding include_name=include_name to the context.configure call in the run_migrations_online function.

Here's a brief overview of the code snippet you provided:

def include_name(name, type_, parent_names):
    if type_ == "table":
        # Ignore all tables starting with 'embedding_vector_index_'
        return not name.startswith('embedding_vector_index_')
    return True

def run_migrations_online():
    ...
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=get_metadata(),
            process_revision_directives=process_revision_directives,
            include_object=include_object,
            include_name=include_name,  # Add this line
            **current_app.extensions['migrate'].configure_args
        )
    ...

By implementing this change, you ensure that the migration process respects the presence of these vector tables and does not delete them inadvertently. This approach allows you to manage your vector tables effectively without needing to separate them into a different database.
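
As a possible alternative, the same exclusion could in principle be expressed through Alembic's include_object hook, which receives a reflected flag and the compare_to counterpart. The sketch below is hypothetical: the body of the project's existing include_object is not shown in this thread, so the function name include_object_sketch and the VECTOR_TABLE_PREFIX constant are assumptions, and this is not a drop-in replacement for it.

```python
# Prefix taken from the issue text; the constant name is an assumption.
VECTOR_TABLE_PREFIX = "embedding_vector_index_"


def include_object_sketch(object, name, type_, reflected, compare_to):
    """Hypothetical include_object-style filter for prefixed vector tables."""
    # Skip a table only when it was reflected from the database, has no
    # counterpart in the model metadata, and carries the vector prefix.
    if (
        type_ == "table"
        and reflected
        and compare_to is None
        and name.startswith(VECTOR_TABLE_PREFIX)
    ):
        return False
    return True


# Quick check with placeholder arguments (object is unused in this sketch).
print(include_object_sketch(None, "embedding_vector_index_abc", "table", True, None))
print(include_object_sketch(None, "accounts", "table", True, None))
```

The include_name approach from the issue filters names before the tables are reflected at all, whereas this variant filters after reflection; which one fits better depends on how the rest of env.py is set up.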


@dosubot dosubot bot added the 🐞 bug Something isn't working label Dec 19, 2024